# Local Model

In this tutorial, we'll explore working with local LLMs on your own computer using Spring-AI and Kotlin.
While we've focused on remote providers like OpenAI and Anthropic in previous tutorials,
running models locally offers several distinct advantages.
Let's explore how to set up and use local models with the same Spring-AI abstractions we've been learning throughout this series.

In [1]:
@file:DependsOn("org.springframework.ai:spring-ai-ollama-spring-boot-starter:1.0.0-M6")
@file:DependsOn("com.fasterxml.jackson.module:jackson-module-kotlin:2.18.2")

## Why use local models?

Local models may not match the power of cloud-based alternatives, but they offer compelling benefits:

- **Free to use** — no API costs or rate limits
- **Work offline** — no internet connection required
- **Privacy** — process sensitive data without sending it to third parties
- **Full control** — you determine model settings and deployment

Spring-AI supports local models through Ollama,
which makes running LLMs on your machine remarkably straightforward.

## Setting up Ollama

First,
visit [ollama.com](https://ollama.com/)
and install the application by following the simple instructions for your operating system.

Next,
explore the [Models page](https://ollama.com/search)
to find a model that fits your needs.
You'll see options ranging from small,
fast models to larger, more capable ones.
Consider your hardware specifications when choosing —
larger models require more RAM and GPU resources.

For this tutorial, I've selected `deepseek-r1`,
which demonstrates _"thinking"_ capabilities similar to reasoning modes in commercial models.

Let's download and start the model:

```bash
ollama run deepseek-r1
```

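This command pulls the default tag for `deepseek-r1`
(which resolved to `deepseek-r1:7b` when this tutorial was written),
skips the download if you already have it, and then starts an interactive session with the model.
The first run may take some time while the model files download.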
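
Before wiring up Spring-AI, it can help to confirm that the Ollama server is actually listening. The sketch below uses only the JDK's HTTP client to call Ollama's `GET /api/tags` endpoint, which lists the models available locally. The helper name `listLocalModelsJson`, the timeouts, and the default URL variable are illustrative assumptions, not part of Spring-AI or this tutorial's original code.

```kotlin
import java.io.IOException
import java.net.URI
import java.net.http.HttpClient
import java.net.http.HttpRequest
import java.net.http.HttpResponse
import java.time.Duration

// Ollama's default local endpoint; adjust it if you changed the host or port.
val defaultOllamaUrl = "http://localhost:11434"

/**
 * Hypothetical helper: returns the raw JSON body of GET /api/tags,
 * or null when the server cannot be reached or answers with a non-200 status.
 */
fun listLocalModelsJson(baseUrl: String = defaultOllamaUrl): String? {
    val client = HttpClient.newBuilder()
        .connectTimeout(Duration.ofSeconds(2)) // assumed timeout, tune as needed
        .build()
    val request = HttpRequest.newBuilder(URI.create("$baseUrl/api/tags"))
        .timeout(Duration.ofSeconds(5))
        .GET()
        .build()
    return try {
        val response = client.send(request, HttpResponse.BodyHandlers.ofString())
        if (response.statusCode() == 200) response.body() else null
    } catch (e: IOException) {
        // Connection refused or timed out: Ollama is probably not running.
        null
    }
}
```

Calling `println(listLocalModelsJson() ?: "Ollama is not reachable")` should print a JSON document whose `models` array includes the model you just pulled, or the fallback message if the server is down.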

## Creating a Spring-AI Client for Local Model

Now that our local model is running, let's set up the Spring-AI client:

In [2]:
import org.springframework.ai.chat.client.ChatClient
import org.springframework.ai.ollama.OllamaChatModel
import org.springframework.ai.ollama.api.OllamaApi
import org.springframework.ai.ollama.api.OllamaOptions

// The no-argument constructor targets the default local endpoint (http://localhost:11434)
val ollamaApi = OllamaApi()

// Default options applied to every request made through this model
val ollamaOptions = OllamaOptions.builder()
    .model("deepseek-r1") // must match the model name pulled with Ollama
    .temperature(0.7)
    .build()

val ollamaModel = OllamaChatModel.builder()
    .ollamaApi(ollamaApi)
    .defaultOptions(ollamaOptions)
    .build()

// Same ChatClient abstraction as with remote providers
val chatClient = ChatClient.create(ollamaModel)

Notice that we don't need any API keys — the model is already running on our device.
The process for creating a ChatModel is very similar to what we've done with OpenAI,
Anthropic, and other providers.

With everything set up, let's test our model with a simple query:

In [3]:
chatClient.prompt("tell me a joke about Kotlin").call().content()

<think>
Alright, so I need to come up with a joke about Kotlin. Hmm, where do I start? Well, first, I should think about what makes Kotlin unique or special compared to other programming languages.

I remember that Kotlin is often referred to as the "Kotlin language" but sometimes people jokingly call it "Java's little brother." Maybe I can use that in a joke.

Oh wait, there's also something about Kotlin being statically typed. That could be funny because of the way the word 'static' is used. Like maybe someone says they've been static for a while, meaning they're old or something. But that might not fit well.

Wait, another angle: Kotlin is known for its syntax being similar to Java but with some improvements. So maybe I can compare it to something else in a funny way. For example, comparing it to a more complex system like an operating system.

Oh! Maybe something about Kotlin being the OS of the future. That's a common phrase people use. So if someone says "Kotlin is the next OS," 

You'll notice the response includes a new section under the `<think>` tag.
This is the reasoning block where you can see how the model formed its response to your request.
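
If you only want the final answer, the reasoning block can be separated from the rest of the reply with plain string handling. This is a minimal sketch that assumes the model emits exactly one `<think>...</think>` pair, as `deepseek-r1` did in the output above. `ReasonedReply` and `splitReasoning` are hypothetical names introduced here for illustration.

```kotlin
/** Hypothetical holder for the two parts of a reasoning-model reply. */
data class ReasonedReply(val reasoning: String, val answer: String)

/**
 * Splits a raw model reply into its reasoning and answer parts.
 * Assumes at most one <think>...</think> pair; if the pair is missing
 * or incomplete, the whole text is treated as the answer.
 */
fun splitReasoning(raw: String): ReasonedReply {
    val open = "<think>"
    val close = "</think>"
    val start = raw.indexOf(open)
    val end = raw.indexOf(close)
    if (start == -1 || end == -1 || end < start) {
        return ReasonedReply(reasoning = "", answer = raw.trim())
    }
    val reasoning = raw.substring(start + open.length, end).trim()
    // Keep any text before the opening tag together with the text after the closing tag.
    val answer = (raw.substring(0, start) + raw.substring(end + close.length)).trim()
    return ReasonedReply(reasoning, answer)
}
```

With the client from the previous cell, something like `splitReasoning(chatClient.prompt("tell me a joke about Kotlin").call().content() ?: "").answer` would return just the joke.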

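Given the amount of text returned,
the response arrived quickly on my machine.
Keep in mind that generation speed for a local model depends on your hardware, although no network round-trip to a remote API is involved.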

## Streaming Responses with Local Models

Let's see how streaming works with local models,
allowing us to watch the reasoning process in real-time.

First, we'll need to add dependencies for working with `Flow`:

In [4]:
%useLatestDescriptors
%use coroutines
@file:DependsOn("org.jetbrains.kotlinx:kotlinx-coroutines-reactive:1.10.1")

Now, let's create a more complex prompt and stream the response:

In [5]:
import kotlinx.coroutines.reactive.asFlow

val streamResponse = chatClient
    .prompt(
        """
        Generate a response in the format of a fully valid HTML5 document, answering the following question:

        <Question>
        Create a colorful periodic table of Kotlin operators showing precedence levels.
        Include descriptions with examples of each operator in action.
        </Question>

        <Instructions>
        Your answer must include:
        1. The complete structure of an HTML document with <html>, <head> (including <title>) and <body> tags
        2. Semantic markup (h1, h2, p, ul/ol, etc.)
        3. Use tables or lists where appropriate to structure the data.
        4. Include minimalist CSS styles to improve readability.
        5. The page must be fully valid HTML5.

        Do not add explanations outside the HTML markup. The entire answer must be only in HTML format!
        </Instructions>
        """.trimIndent()
    )
    .stream()
    .content()
    .asFlow()

val answer = StringBuilder()
runBlocking {
    streamResponse.collect {
        answer.append(it)
        print(it)
    }
}

<think>
Alright, so I need to figure out how to create a colorful periodic table of Kotlin operators that shows their precedence levels with examples. Let me break down the task and see how I can approach it.

First, I know that a periodic table of operators typically lists each operator along with its precedence level. In this case, since it's for Kotlin, I should list all the relevant operators in order from highest to lowest precedence.

I'll start by outlining the structure of an HTML document. It needs to have html, head, and body tags. The head will include a title and some CSS styles. The body will contain the content, which is the periodic table itself.

Next, I need to decide on the visual elements. Using tables makes sense because each operator can be a row with its precedence level and description. Maybe using a grid or table within the body could help organize the information neatly.

For styling, I want it to look clean and readable. I'll use a sans-serif font for readabil

The response consists of two parts: the reasoning and the final HTML answer.
Let's extract the HTML part and render it in our notebook:

In [6]:
val html = answer.toString().substringAfterLast("```html").substringBeforeLast("```")

HTML(html)

| Operator | Precedence Level | Description and Example |
|----------|------------------|-------------------------|
| `++` | 9 | Increment operator. Adds or subtracts one from variable. |
| `--` | 9 | |
| `+` | 8 | Addition operator. Adds two values. |
| `?` | 1 | |
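
The one-line extraction above relies on the model wrapping its HTML in a fenced `html` block. When that fence is missing, `substringAfterLast` returns the whole string, so the reasoning text would end up in the rendered page. A slightly more defensive sketch follows. `extractHtml` is a hypothetical helper, and the fallback rule (slice from the doctype or `<html>` tag to the last `</html>`) is an assumption about what a reasonable reply looks like.

```kotlin
/**
 * Hypothetical helper: pulls an HTML document out of a model reply.
 * 1. Drops everything up to the closing </think> tag, if present.
 * 2. Prefers the content of the last fenced html block.
 * 3. Otherwise falls back to the span from <!doctype or <html to the last </html>.
 */
fun extractHtml(raw: String): String {
    val fence = "`".repeat(3) // three backticks, built this way to keep this listing readable
    val withoutThink = raw.substringAfterLast("</think>")

    val fenced = withoutThink.substringAfterLast(fence + "html", missingDelimiterValue = "")
    if (fenced.isNotEmpty()) {
        // Take everything up to the closing fence (or to the end if the fence is unclosed).
        return fenced.substringBefore(fence).trim()
    }

    val start = listOf(
        withoutThink.indexOf("<!doctype", ignoreCase = true),
        withoutThink.indexOf("<html", ignoreCase = true)
    ).filter { it >= 0 }.minOrNull() ?: return withoutThink.trim()

    val closing = "</html>"
    val end = withoutThink.lastIndexOf(closing, ignoreCase = true)
    return if (end > start) {
        withoutThink.substring(start, end + closing.length).trim()
    } else {
        withoutThink.substring(start).trim()
    }
}
```

In the notebook you could then write `HTML(extractHtml(answer.toString()))` in place of the chained `substring` calls.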


Local models offer an excellent alternative when you need privacy, offline capability, or want to avoid API costs.
With Spring-AI, the programming interface remains consistent whether you're using cloud-based or local models,
making it easy to switch between them as your needs change.

You can apply all the techniques from our previous tutorials (prompts, streaming, tools, structured outputs, advisors, RAG) with local models as well —
the same Spring-AI abstractions work across providers.