# Learning the basics with Ollama

[Ollama](https://ollama.com/) is an open-source application that allows you to download, run, and interact with LLMs on your own hardware. By running models locally, you maintain complete control over your data and can use LLMs without an internet connection. It also allows you to easily experiment with different models.



## Prerequisites
If you haven't done so already, [install Ollama](https://ollama.com/download) on your computer. You can then download any of the models available in the [Ollama library](https://ollama.com/library) using the `ollama pull` command. Models come in different sizes, indicated by the number of parameters (e.g. 2B or 7B) learned during training. Larger models are usually more capable but also require more computational resources, such as RAM. Ollama's GitHub repo has some advice on [selecting models based on the available RAM](https://github.com/ollama/ollama#:~:text=You%20should%20have%20at%20least%208%20GB%20of%20RAM%20available%20to%20run%20the%207B%20models%2C%2016%20GB%20to%20run%20the%2013B%20models%2C%20and%2032%20GB%20to%20run%20the%2033B%20models.).
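As a rough illustration of that RAM guidance, here is a minimal sketch that maps the available memory to the largest model size class it covers. The thresholds (8 GB for 7B, 16 GB for 13B, 32 GB for 33B) come from the linked sentence in the Ollama README; the function name `largest_model_class` is made up for this example, and the result is only a rule of thumb.

```python
from typing import Optional

# Thresholds from the linked README sentence: (minimum RAM in GB, model size class).
RAM_GUIDANCE = [(32, "33B"), (16, "13B"), (8, "7B")]


def largest_model_class(ram_gb: float) -> Optional[str]:
    """Return the largest model size class the guidance covers for this much RAM."""
    for min_ram_gb, size_class in RAM_GUIDANCE:
        if ram_gb >= min_ram_gb:
            return size_class
    # The guidance says nothing about machines with less than 8 GB of RAM.
    return None


for ram in (4, 8, 16, 64):
    print(f"{ram} GB RAM -> {largest_model_class(ram)}")
```

Smaller models, such as the 3B model used in this notebook, fall below the lowest tier in that guidance, so the helper says nothing about them.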

For the code examples in this notebook, we will be using Meta's Llama 3.2, which is a relatively small but capable model (3B parameters, 2 GB in size). Download it with:

```sh
ollama pull llama3.2
```

Also, ensure you've followed the README instructions to set up your Python environment.

## Using `ollama run`
We will use Ollama's `run` command to ask the Llama model to tell us a kid-friendly joke, just like in the OpenAI notebook.

In [1]:
!ollama run llama3.2 "Tell me a silly joke for a kid."

Here's one:

What do you call a group of cows playing instruments?

A moo-sical band!

## Raw JSON response using `curl`

Ollama creates an API service, which is available at `http://localhost:11434` by default. Let's use `curl` to make the same request and see the raw JSON response from the model.

In [2]:
%%bash --out curl_response

curl http://localhost:11434/api/generate -s -d '{
  "model": "llama3.2",
  "prompt": "Tell me a silly joke for a kid.",
  "stream": false
}'

By default, the `/api/generate` endpoint returns a stream of responses. Adding `"stream": false` in our request ensures we get a single JSON response.

In [3]:
import json

from rich import print as rich_print

data = json.loads(curl_response)  # noqa
rich_print(data)

We can extract the actual text from the JSON with:

In [4]:
rich_print(data["response"])
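The `curl` call above can also be reproduced with Python's standard library. The sketch below builds the same non-streaming request with `urllib`; the helper names `build_generate_request` and `generate` are hypothetical, and `generate` assumes an Ollama server is listening at the default address.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # default address of the local Ollama API


def build_generate_request(prompt: str, model: str = "llama3.2") -> urllib.request.Request:
    """Build (but do not send) a non-streaming POST request to /api/generate."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "llama3.2") -> str:
    """Send the request and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_generate_request(prompt, model)) as resp:
        return json.loads(resp.read())["response"]


# Inspect the request without contacting the server.
request = build_generate_request("Tell me a silly joke for a kid.")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

With the server running, `generate("Tell me a silly joke for a kid.")` should return the same kind of text as `data["response"]`.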

Without `"stream": false`, we get a series of JSON objects, one per line, each carrying a fragment of the text, followed by a final object with some extra data about the request.

In [5]:
%%bash --out curl_response

curl http://localhost:11434/api/generate -sd '{
  "model": "llama3.2",
  "prompt": "Tell me a silly joke for a kid."
}'


In [6]:
rich_print(curl_response)  # noqa
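To turn a streamed reply back into a single string, each line can be parsed as its own JSON object and the `response` fragments joined. This is a minimal sketch: `join_stream` is a hypothetical helper, and the sample below is hand-written in the shape of a streamed reply rather than real model output.

```python
import json
from typing import Any, Dict, Tuple


def join_stream(raw: str) -> Tuple[str, Dict[str, Any]]:
    """Join the text fragments of a streamed reply; also return the last object."""
    chunks = [json.loads(line) for line in raw.splitlines() if line.strip()]
    if not chunks:
        raise ValueError("no JSON objects found in the stream")
    text = "".join(chunk.get("response", "") for chunk in chunks)
    return text, chunks[-1]


# Hand-written sample (not real model output): one JSON object per line.
sample = "\n".join([
    '{"model": "llama3.2", "response": "Here", "done": false}',
    '{"model": "llama3.2", "response": " is", "done": false}',
    '{"model": "llama3.2", "response": " one", "done": false}',
    '{"model": "llama3.2", "response": "", "done": true, "eval_count": 3}',
])

text, final = join_stream(sample)
print(text)
print(final["done"])
```

In the notebook, `join_stream(curl_response)` would apply the same logic to the output captured by the previous cell.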

## Using the Python SDK

The same can be done using Ollama's Python SDK:

In [7]:
import ollama

In [8]:
response = ollama.generate(model="llama3.2", prompt="Tell me a silly joke for a kid.")
rich_print(response)

As we can see, the response is very similar to what we got using `curl` and the API is quite straightforward to use. We can also extract the actual text from the response.

In [9]:
print(response.response)

Here's one:

What do you call a group of cows playing instruments?

A moo-sical band!

I hope that made the kiddo giggle!
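The SDK can also stream the reply instead of returning it in one piece. The sketch below assumes that `ollama.generate(..., stream=True)` returns an iterator of chunks and that each chunk supports `chunk["response"]`; the helper `print_stream` is hypothetical and is exercised here with stand-in dictionaries so it runs without a model.

```python
from typing import Any, Iterable


def print_stream(chunks: Iterable[Any]) -> str:
    """Print each text fragment as it arrives and return the full text."""
    parts = []
    for chunk in chunks:
        fragment = chunk["response"]
        print(fragment, end="", flush=True)
        parts.append(fragment)
    print()  # finish the line once the stream ends
    return "".join(parts)


# Stand-in chunks (not real model output) so the helper can be tried offline.
fake_chunks = [{"response": "A moo"}, {"response": "-sical band!"}]
full_text = print_stream(fake_chunks)
```

With Ollama running, the call would look like `print_stream(ollama.generate(model="llama3.2", prompt="Tell me a silly joke for a kid.", stream=True))`.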


## Using the `llm` CLI tool and Python library

[`llm`](https://llm.datasette.io/) is an open-source CLI tool and Python library for interacting with LLMs, created by Simon Willison. It works with both local models and remote APIs (from OpenAI, Anthropic’s Claude, Google’s Gemini, etc.). It should already be installed in your environment along with the [`llm-ollama` plugin](https://github.com/taketwo/llm-ollama), which allows us to query any Ollama-installed models.

Let's use it to run the same prompt we've been running. The output should be similar to what we got from `ollama run`.

In [10]:
!llm -m llama3.2 "Tell me a silly joke for a kid."

Here's one:

What do you call a group of cows playing instruments?

A moo-sical band!

Kids love puns, and this one is sure to make them giggle!


### Using the Python library

We can also use the `llm` Python API to interact with the model.

In [11]:
import llm

model = llm.get_model("llama3.2")
response = model.prompt("Tell me a silly joke for a kid.")

The prompt will not be evaluated until you call `response.text()` or `print(response)`.

In [12]:
print(response)

Here's one that kids usually love:

What do you call a group of cows playing instruments?

A moo-sical band!

I hope that made you giggle!


Unlike the `ollama` Python SDK, however, `llm` does not seem to give us the raw JSON response here. According to [the docs](https://llm.datasette.io/en/stable/python-api.html#accessing-the-underlying-json), some model plugins make the JSON available through the `response.json()` method, but with this plugin the call returns `None`:

In [13]:
print(response.json())

None


## Conclusion

Ollama makes it easy to run LLMs locally. Pairing it with `llm` adds some powerful features, such as the ability to [log all prompts and responses to a SQLite database](https://llm.datasette.io/en/stable/logging.html). It also lets us switch between local models and remote APIs when needed, which is useful when experimenting with various LLMs.