# Local Model

In this tutorial, we'll explore working with local LLMs on your own computer using Spring-AI and Kotlin.
While we've focused on remote providers like OpenAI and Anthropic in previous tutorials,
running models locally offers several distinct advantages.
Let's explore how to set up and use local models with the same Spring-AI abstractions we've been learning throughout this series.

In [1]:
@file:DependsOn("org.springframework.ai:spring-ai-ollama-spring-boot-starter:1.0.0-M6")
@file:DependsOn("com.fasterxml.jackson.module:jackson-module-kotlin:2.18.2")

## Why use local models?

Local models may not match the power of cloud-based alternatives, but they offer compelling benefits:

- **Free to use** — no API costs or rate limits
- **Work offline** — no internet connection required
- **Privacy** — process sensitive data without sending it to third parties
- **Full control** — you determine model settings and deployment

Spring-AI supports local models through Ollama,
which makes running LLMs on your machine remarkably straightforward.

## Setting up Ollama

First,
visit [ollama.com](https://ollama.com/)
and install the application by following the simple instructions for your operating system.

Next,
explore the [Models page](https://ollama.com/search)
to find a model that fits your needs.
You'll see options ranging from small,
fast models to larger, more capable ones.
Consider your hardware specifications when choosing —
larger models require more RAM and GPU resources.
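
As a rough rule of thumb, the memory needed just to hold a model's weights is the parameter count multiplied by the bytes used per weight. The sketch below turns that into a quick estimate. `estimateWeightMemoryGb` is a hypothetical helper, not part of Ollama or Spring-AI, and it ignores runtime overhead such as the context cache, so treat the result as a lower bound.

```kotlin
// Rough lower bound on the memory needed to hold a model's weights.
// Assumption: every weight uses the same number of bits (e.g. 4 for a 4-bit quantized model).
fun estimateWeightMemoryGb(parametersInBillions: Double, bitsPerWeight: Int): Double =
    parametersInBillions * bitsPerWeight / 8.0  // billions of params x bytes per param = GB

val sevenBAt4Bit = estimateWeightMemoryGb(7.0, 4)    // 3.5
val sevenBAt16Bit = estimateWeightMemoryGb(7.0, 16)  // 14.0
```

By this estimate, a 7-billion-parameter model quantized to 4 bits needs at least about 3.5 GB for its weights, while the same model at 16 bits needs about 14 GB.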

For this tutorial, I've selected `deepseek-r1`,
which demonstrates _"thinking"_ capabilities similar to reasoning modes in commercial models.

Let's download and start the model:

```bash
ollama run deepseek-r1
```

This command downloads the model's default tag (`deepseek-r1:7b` at the time of writing)
if you don't already have it, then starts an interactive session with it.
The first run may take some time while the model files download.
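
Before wiring up Spring-AI, it can help to confirm that the Ollama server is actually reachable. The sketch below is one way to do that from Kotlin with the JDK's built-in HTTP client. It assumes Ollama's default address `http://localhost:11434` and its `/api/tags` endpoint (which lists locally available models); `isOllamaReachable` is a hypothetical helper name.

```kotlin
import java.net.URI
import java.net.http.HttpClient
import java.net.http.HttpRequest
import java.net.http.HttpResponse
import java.time.Duration

// Returns true if an Ollama server answers with HTTP 200 at baseUrl.
// Assumption: default Ollama address and the /api/tags endpoint.
fun isOllamaReachable(baseUrl: String = "http://localhost:11434"): Boolean {
    val client = HttpClient.newBuilder()
        .connectTimeout(Duration.ofSeconds(2))
        .build()
    val request = HttpRequest.newBuilder(URI.create("$baseUrl/api/tags"))
        .timeout(Duration.ofSeconds(5))
        .GET()
        .build()
    return try {
        client.send(request, HttpResponse.BodyHandlers.ofString()).statusCode() == 200
    } catch (e: Exception) {
        false // connection refused, timeout, and similar failures
    }
}
```

If this returns `false`, check that the Ollama application is running before continuing.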

## Creating a Spring-AI Client for a Local Model

Now that our local model is running, let's set up the Spring-AI client:

In [2]:
import org.springframework.ai.chat.client.ChatClient
import org.springframework.ai.ollama.OllamaChatModel
import org.springframework.ai.ollama.api.OllamaApi
import org.springframework.ai.ollama.api.OllamaOptions

// Connects to the local Ollama server at its default address (http://localhost:11434)
val ollamaApi = OllamaApi()

// Default options applied to every request made through this model
val ollamaOptions = OllamaOptions.builder()
    .model("deepseek-r1")   // must match a model available in your local Ollama
    .temperature(0.7)
    .build()

val ollamaModel = OllamaChatModel.builder()
    .ollamaApi(ollamaApi)
    .defaultOptions(ollamaOptions)
    .build()

// The same ChatClient abstraction we used with remote providers
val chatClient = ChatClient.create(ollamaModel)

Notice that we don't need any API keys — the model is already running on our device.
The process for creating a ChatModel is very similar to what we've done with OpenAI,
Anthropic, and other providers.

With everything set up, let's test our model with a simple query:

In [3]:
chatClient.prompt("tell me a joke about Kotlin").call().content()

<think>
Alright, the user asked for a joke about Kotlin. Hmm, I need to make sure it's appropriate and not offensive. Let me think of some Kotlin-related puns or wordplay.

Maybe something with "knotting" since that's a feature in Kotlin 1.3. Or perhaps something about programming terms like "null pointer" or "throw".

Wait, here's an idea: "Why don't developers ever get tired? Because they only work with null and errors!" That plays on the idea of common issues in development.

Does that make sense? It's a bit light-hearted and uses a play on words. I think it should be okay for the user.
</think>

Sure! Here's a joke about Kotlin:

Why don’t developers ever get tired?

Because they only work with null and errors!

😄

You'll notice the response begins with a section wrapped in `<think>` tags.
This is the reasoning block, where you can see how the model arrived at its response to your request.
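
If you only want to show the final answer to a user, you can separate the two parts with plain string handling. The following is a minimal sketch, assuming the reasoning is wrapped in a single `<think>...</think>` block as in the output above; `ReasonedResponse` and `splitReasoning` are hypothetical names, not part of Spring-AI.

```kotlin
// Holds the two parts of a deepseek-r1 style response.
data class ReasonedResponse(val reasoning: String, val answer: String)

// Splits the raw model output into reasoning and final answer.
// Assumption: at most one <think>...</think> block, as in the output above.
fun splitReasoning(raw: String): ReasonedResponse {
    val open = "<think>"
    val close = "</think>"
    val start = raw.indexOf(open)
    val end = raw.indexOf(close)
    // No complete think block: treat the whole text as the answer.
    if (start == -1 || end == -1 || end < start) return ReasonedResponse("", raw.trim())
    val reasoning = raw.substring(start + open.length, end).trim()
    val answer = (raw.substring(0, start) + raw.substring(end + close.length)).trim()
    return ReasonedResponse(reasoning, answer)
}
```

You could then pass the text returned by `content()` to `splitReasoning` and display only the `answer` property.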

Even with the reasoning block included,
the response came back quickly on my machine. Actual speed depends on your hardware and the size of the model you choose.

## Streaming Responses with Local Models

Let's see how streaming works with local models,
allowing us to watch the reasoning process in real-time.

First, we'll need to add dependencies for working with `Flow`:

In [4]:
%useLatestDescriptors
%use coroutines
@file:DependsOn("org.jetbrains.kotlinx:kotlinx-coroutines-reactive:1.10.1")

Now, let's create a more complex prompt and stream the response:

In [9]:
import kotlinx.coroutines.reactive.asFlow

val streamResponse = chatClient
    .prompt(
        """
        Generate a response in the format of a fully valid HTML5 document, answering the following question:

        <Question>
        Create a colorful periodic table of Kotlin operators showing precedence levels.
        Include descriptions with examples of each operator in action.
        </Question>

        <Instructions>
        Your answer must include:
        1. The complete structure of an HTML document with <html>, <head> (including <title>) and <body> tags
        2. Semantic markup (h1, h2, p, ul/ol, etc.)
        3. Use tables or lists where appropriate to structure the data.
        4. Include minimalist CSS styles to improve readability.
        5. The page must be fully valid HTML5.

        Do not add explanations outside the HTML markup. The entire answer must be only in HTML format!
        </Instructions>
        """
    )
    .stream()
    .content()
    .asFlow()

val answer = StringBuilder()
runBlocking {
    streamResponse.collect {
        answer.append(it)
        print(it)
    }
}

<think>
Alright, so I need to create a colorful periodic table of Kotlin operators with precedence levels and examples. Let me break down how I approached this.

First, understanding the user's requirements was key. They want an HTML5 document that's visually appealing, uses semantic tags, and includes operator precedence information along with examples. The structure should be clear, so using tables or lists makes sense for organizing the operators.

I started by outlining the sections: a header, a table of operators with their precedence levels, descriptions, examples, and a note on associativity. Using semantic HTML tags like h1, h2, p, and ul will help with structure and SEO.

Next, I considered the styling. Minimalist CSS is important for readability. I decided on using a clean font, alternating row colors for clarity, and padding to make the table neat. Including a subtle shadow on the table adds depth without being distracting.

For the operators themselves, each needs a precede

The response consists of two parts: the reasoning and the final HTML answer.
Let's extract the HTML part and render it in our notebook:

In [10]:
val html = answer.toString().substringAfter("```html").substringBeforeLast("```")

HTML(html)

| Precedence Level | Operator | Description | Example |
|---|---|---|---|
| 10 Special Operators | `.::` | Naming member of type in qualified expression. | Example: `ClassA.::method()` |
| 9 Member Operators | `..` | Accessing members of type (class, interface, enum, member, nested class) | Example: `obj..property` |
| 8 Cast Operator | `(type)` | Casting value to specified type | Example: `(Int) 5.5` |
| 7 Concatenation | `$$` | String concatenation | Example: `"Hello" $$ "World"` |
| 6 Comparison | `,!` | Component-wise comparison (for arrays) | Example: `arr1 ,! arr2` |
| 5 Logical Operators | `&&` | Logical AND | Example: `a && b` |
| 4 Increment/Decrement | `++` | Increment value by one | Example: `i++` |
| 3 Multiplication | `*,*` | Multiplication and repeated concatenation | Example: `2 * 3` or `"a" * 3` |
| 2 Addition/Subtraction | `+-` | Arithmetic addition and subtraction | Example: `5 + 3` or `"5" - "2">` |
| 1 Unary Operators | `-!` | Negation and logical NOT | Example: `-5` or `!b` |

| Operator | Precedence Level | Nesting Level |
|---|---|---|

| Precedence Level | Operator | Description | Example |
|---|---|---|---|
| 10 | `.::` | Naming member of type in qualified expression. | Example: `ClassA.::method()` |
| 9 | `..` | Accessing members of type (class, interface, enum, member, nested class) | Example: `obj..property` |
| 8 | `(type)` | Casting value to specified type | Example: `(Int) 5.5` |
| 7 | `$$` | String concatenation | Example: `"Hello" $$ "World"` |
| 6 | `,!` | Component-wise comparison (for arrays) | Example: `a, b = [1,2], [3,4]` |
| 5 | `-!` | Negation and logical NOT | Example: `-5` or `!b` |
| 4 | `x` | Multiplication operator | Example: `3 * 4` |
| 3 | `+!` | Increment operator | Example: `x+` |

Local models offer an excellent alternative when you need privacy, offline capability, or want to avoid API costs.
With Spring-AI, the programming interface remains consistent whether you're using cloud-based or local models,
making it easy to switch between them as your needs change.

You can apply the techniques from our previous tutorials (prompts, streaming, tools, structured outputs, advisors, RAG) with local models as well,
because the same Spring-AI abstractions work across providers. Keep in mind that some features, such as tool calling, depend on the chosen model supporting them.