# Local Model

In this tutorial, we'll explore working with local LLMs on your own computer using Spring-AI and Kotlin.
While we've focused on remote providers like OpenAI and Anthropic in previous tutorials,
running models locally offers several distinct advantages.
Let's explore how to set up and use local models with the same Spring-AI abstractions we've been learning throughout this series.

In [1]:
%useLatestDescriptors
%use spring-ai-ollama

## Why use local models?

Local models may not match the power of cloud-based alternatives, but they offer compelling benefits:

- **Free to use** — no API costs or rate limits
- **Work offline** — no internet connection required
- **Privacy** — process sensitive data without sending it to third parties
- **Full control** — you determine model settings and deployment

Spring-AI supports local models through Ollama,
which makes running LLMs on your machine remarkably straightforward.

## Setting up Ollama

First,
visit [ollama.com](https://ollama.com/)
and install the application by following the simple instructions for your operating system.

Next,
explore the [Models page](https://ollama.com/search)
to find a model that fits your needs.
You'll see options ranging from small,
fast models to larger, more capable ones.
Consider your hardware specifications when choosing —
larger models require more RAM and GPU resources.
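
As a rough guide to what "more RAM" means, the memory needed just for a model's weights is about the parameter count times the bits per weight, divided by 8. The sketch below encodes that rule of thumb. The function name and the 7-billion-parameter figures are illustrative assumptions, and the estimate ignores the context cache and runtime overhead, so treat it as a lower bound.

```kotlin
// Rule of thumb: weight memory ≈ parameter count × bits per weight / 8 bytes.
// This ignores the context (KV) cache and runtime overhead, so it is a lower bound.
fun estimateWeightMemoryGb(parameterCount: Double, bitsPerWeight: Int): Double {
    val bytes = parameterCount * bitsPerWeight / 8.0
    return bytes / 1e9 // decimal gigabytes
}

fun main() {
    // Hypothetical example: a 7-billion-parameter model at two precisions.
    println("4-bit weights:  ${estimateWeightMemoryGb(7e9, 4)} GB")
    println("16-bit weights: ${estimateWeightMemoryGb(7e9, 16)} GB")
}
```

Comparing the result with your free RAM or VRAM gives a quick first filter before downloading several gigabytes.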

For this tutorial, I've selected `deepseek-r1`,
which demonstrates _"thinking"_ capabilities similar to reasoning modes in commercial models.

Let's download and start the model:

```bash
ollama run deepseek-r1
```

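This command downloads the model (if you don't already have it) and starts it.
Without an explicit tag, Ollama pulls the model's default `latest` tag,
which resolved to `deepseek-r1:7b` when this tutorial was written.
The first run may take some time as it downloads the model files.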
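
Before wiring up a client, it can help to confirm that the Ollama server is answering. The following is a minimal sketch, assuming Ollama listens on its default address `http://localhost:11434` and using its `GET /api/tags` endpoint, which lists locally available models. The helper names `tagsEndpoint` and `isOllamaReachable` are made up for this example.

```kotlin
import java.net.URI
import java.net.http.HttpClient
import java.net.http.HttpRequest
import java.net.http.HttpResponse
import java.time.Duration

// Assumption: Ollama is listening on its default local address.
val defaultOllamaUrl = "http://localhost:11434"

// Builds the URL of Ollama's "list local models" endpoint (GET /api/tags).
fun tagsEndpoint(baseUrl: String): String = baseUrl.trimEnd('/') + "/api/tags"

// Returns true if the server answers with HTTP 200, false on any error or timeout.
fun isOllamaReachable(baseUrl: String = defaultOllamaUrl): Boolean =
    try {
        val client = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(2))
            .build()
        val request = HttpRequest.newBuilder(URI.create(tagsEndpoint(baseUrl)))
            .timeout(Duration.ofSeconds(5))
            .GET()
            .build()
        client.send(request, HttpResponse.BodyHandlers.ofString()).statusCode() == 200
    } catch (e: Exception) {
        false
    }

fun main() {
    println(if (isOllamaReachable()) "Ollama is running" else "Ollama is not reachable")
}
```

If the check fails, make sure the Ollama application is started before continuing.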

## Creating a Spring-AI Client for a Local Model

Now that our local model is running, let's set up the Spring-AI client:

In [2]:
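// Connect to the local Ollama server; by default the client targets http://localhost:11434.
val ollamaApi = OllamaApi.builder().build()

// Select the model pulled above and set the sampling temperature.
val ollamaOptions = OllamaOptions.builder()
    .model("deepseek-r1")
    .temperature(0.7)
    .build()

// Wire the API client and the default options into a chat model.
val ollamaModel = OllamaChatModel.builder()
    .ollamaApi(ollamaApi)
    .defaultOptions(ollamaOptions)
    .build()

// ChatClient is the same fluent entry point we used with remote providers.
val chatClient = ChatClient.create(ollamaModel)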

Notice that we don't need any API keys — the model is already running on our device.
The process for creating a ChatModel is very similar to what we've done with OpenAI,
Anthropic, and other providers.

With everything set up, let's test our model with a simple query:

In [3]:
chatClient.prompt("tell me a joke about Kotlin").call().content()

<think>
Okay, so I need to come up with a joke about Kotlin. Hmm, where do I start? Well, I know that Kotlin is a programming language, and it's often compared to Java because it runs on the Java Virtual Machine (JVM). So maybe I can play around with that.

Let me think about the differences between Java and Kotlin. One of the main selling points of Kotlin is that it's more modern and concise, which makes coding easier and less error-prone. There are also some features like null safety, which was a big issue in Java. Oh, right, in Java you can have nullable types with 'null' being allowed, but that can lead to NullPointerExceptions if not handled properly.

In Kotlin, they introduced the concept of 'nullable' and 'non-nullable' variables using the '?' symbol. So, for example, a variable like String? would be nullable, meaning it could hold a String or null. But if you try to call a method on a null value, it will throw an exception automatically, which is called a NullPointerException 

You'll notice the response starts with a section wrapped in `<think>` tags.
This is the reasoning block, where you can see how the model worked toward its answer.
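
If you only want the final answer, the reasoning block can be separated from the rest of the text. The sketch below assumes the reply has the shape `<think>...</think>` followed by the answer, as in the output above. `ReasonedReply` and `splitReasoning` are names invented for this example, not part of Spring-AI.

```kotlin
// Holds the two parts of a reasoning-model reply.
data class ReasonedReply(val reasoning: String, val answer: String)

// Splits a reply of the form "<think>...</think>answer".
// If no complete <think> block is present, the whole text is treated as the answer.
fun splitReasoning(raw: String): ReasonedReply {
    val open = "<think>"
    val close = "</think>"
    val start = raw.indexOf(open)
    val end = raw.indexOf(close)
    if (start == -1 || end == -1 || end < start) {
        return ReasonedReply(reasoning = "", answer = raw.trim())
    }
    val reasoning = raw.substring(start + open.length, end).trim()
    val answer = raw.substring(end + close.length).trim()
    return ReasonedReply(reasoning, answer)
}

fun main() {
    // Hypothetical reply text, used only to illustrate the split.
    val reply = splitReasoning("<think>Kotlin runs on the JVM...</think>Here is a joke.")
    println("Reasoning: ${reply.reasoning}")
    println("Answer: ${reply.answer}")
}
```

In the notebook, you could pass the string returned by `content()` to `splitReasoning` and display only the `answer` part.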

Given the amount of text generated, the response arrived quickly.
Generation speed depends on your hardware and the model size,
but a local model never has to wait on a network round trip to a remote provider.

## Streaming Responses with Local Models

Let's see how streaming works with local models,
allowing us to watch the reasoning process in real time.

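Spring-AI returns streamed content as a Reactor `Flux`,
so first we'll add the `kotlinx-coroutines-reactive` dependency, which provides `asFlow()` for converting it to a Kotlin `Flow`: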

In [4]:
%useLatestDescriptors
%use coroutines
@file:DependsOn("org.jetbrains.kotlinx:kotlinx-coroutines-reactive:1.10.1")

Now, let's create a more complex prompt and stream the response:

In [7]:
import kotlinx.coroutines.reactive.asFlow

val streamResponse = chatClient
    .prompt(
        """
        Generate a response in the format of a fully valid HTML5 document, answering the following question:

        <Question>
        Create a colorful periodic table of Kotlin operators showing precedence levels.
        Include descriptions with examples of each operator in action.
        </Question>

        <Instructions>
        Your answer must include:
        1. The complete structure of an HTML document with <html>, <head> (including <title>) and <body> tags
        2. Semantic markup (h1, h2, p, ul/ol, etc.)
        3. Use tables or lists where appropriate to structure the data.
        4. Include minimalist CSS styles to improve readability.
        5. The page must be fully valid HTML5.

        Do not add explanations outside the HTML markup. The entire answer must be only in HTML format!
        </Instructions>
        """
    )
    .stream()
    .content()
    .asFlow()

val answer = StringBuilder()
runBlocking {
    streamResponse.collect {
        answer.append(it)
        print(it)
    }
}

<think>
Okay, I need to create a response that's a fully valid HTML5 document answering how to generate a colorful periodic table of Kotlin operators with precedence levels and examples. 

First, I'll start by setting up the basic structure: html, head, title, and body tags. The head should include meta tags for charset and viewport, and a title like "Kotlin Operators Periodic Table."

Next, styling is important for readability. I'll add internal CSS styles. Maybe define classes for precedence levels as different colors—like yellow for highest, green for next, etc. Also, style the table with borders and alternating row colors to make it visually appealing.

In the body, an h1 heading will introduce the periodic table. Then, a p to explain what's included—precedence levels and examples.

The main content is a table. I'll structure it with operators as rows, columns for precedence level, symbol, description, and example. Each operator should have its own row. For precedence, assign color
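
While streaming, it can be useful to know when the model has stopped reasoning and started writing the answer. The sketch below is one way to track that, assuming the stream contains a single `</think>` closing tag. `ReasoningTracker` is a hypothetical helper, and it rescans the whole buffer on each call, which is simple but not optimized for very long outputs.

```kotlin
// Accumulates streamed chunks and reports whether the reasoning block has closed.
// Searching the accumulated buffer (not each chunk) handles a tag split across chunks.
class ReasoningTracker {
    private val buffer = StringBuilder()
    private val closeTag = "</think>"

    fun accept(chunk: String) {
        buffer.append(chunk)
    }

    val reasoningFinished: Boolean
        get() = buffer.indexOf(closeTag) >= 0

    // Text after the closing tag, or an empty string while the model is still reasoning.
    fun answerSoFar(): String {
        val end = buffer.indexOf(closeTag)
        return if (end >= 0) buffer.substring(end + closeTag.length) else ""
    }
}

fun main() {
    // Hypothetical chunks; note that the closing tag is split between two of them.
    val chunks = listOf("<think>plan the page", "</thi", "nk><html>", "</html>")
    val tracker = ReasoningTracker()
    for (chunk in chunks) {
        tracker.accept(chunk)
        println("finished=${tracker.reasoningFinished} answer='${tracker.answerSoFar()}'")
    }
}
```

In the streaming cell, calling `tracker.accept(it)` inside `collect` would let you print or render only the answer part as it arrives.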

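The response consists of two parts: the reasoning and the final HTML answer.
The cell below assumes the model wrapped the HTML in a Markdown code fence labeled `html`;
it takes the text between the last such fence and the closing fence and renders it in our notebook: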

In [8]:
val html = answer.toString().substringAfterLast("```html").substringBeforeLast("```")

HTML(html)

| Precedence Level | Operator Symbol | Description | Example |
| --- | --- | --- | --- |
| Level 1 | `++` | Increment operator (prefix) | `val a = 5 println(a++) // Outputs 5 then increments to 6` |
| Level 2 | `-` | Unary minus | `val b = -5 // Represents negative five` |
| Level 3 | `*` | Multiplication | `val c = 2 * 3 // Outputs 6` |
| Level 4 | `+` | Addition | `val d = 2 + 3 // Outputs 5` |
| Level 5 | `=` | Assignment | `var x = 10 // Assigns value 10 to x` |
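
Local models do not always follow formatting instructions, so the fenced block that the extraction above relies on may be missing. The sketch below is a more defensive variant under that assumption: it drops the reasoning, prefers the last fenced `html` block, and otherwise falls back to the span from `<!DOCTYPE` or `<html` to the last `</html>`. `extractHtml` is a hypothetical helper written for this example.

```kotlin
// Extracts an HTML document from a model reply, or returns null if none is found.
fun extractHtml(reply: String): String? {
    // Ignore the reasoning block; if there is no </think>, the whole reply is used.
    val text = reply.substringAfterLast("</think>")

    // 1. Prefer the last fenced block labeled "html".
    val fence = "`".repeat(3) // built at runtime to avoid a literal fence in this example
    val fenceStart = text.lastIndexOf(fence + "html")
    if (fenceStart >= 0) {
        val bodyStart = fenceStart + fence.length + "html".length
        val fenceEnd = text.indexOf(fence, bodyStart)
        val body = if (fenceEnd >= 0) text.substring(bodyStart, fenceEnd) else text.substring(bodyStart)
        return body.trim()
    }

    // 2. Fall back to the span from <!DOCTYPE or <html to the last </html>.
    val docStart = listOf("<!doctype", "<html")
        .map { text.indexOf(it, ignoreCase = true) }
        .filter { it >= 0 }
        .minOrNull() ?: return null
    val closeTag = "</html>"
    val docEnd = text.lastIndexOf(closeTag, ignoreCase = true)
    return if (docEnd > docStart) text.substring(docStart, docEnd + closeTag.length) else null
}

fun main() {
    // Hypothetical reply with no fenced block, so the fallback path is used.
    val reply = "<think>plan</think><!DOCTYPE html><html><body><h1>Hi</h1></body></html>"
    println(extractHtml(reply) ?: "no HTML found")
}
```

Returning `null` instead of the raw reply makes it obvious when the model produced no usable HTML, so the notebook can show a message rather than render the reasoning text.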


Local models offer an excellent alternative when you need privacy, offline capability, or want to avoid API costs.
With Spring-AI, the programming interface remains consistent whether you're using cloud-based or local models,
making it easy to switch between them as your needs change.

You can apply all the techniques from our previous tutorials (prompts, streaming, tools, structured outputs, advisors, RAG) with local models as well —
the same Spring-AI abstractions work across providers.