# ChatOllama

[Ollama](https://ollama.ai/) allows you to run open-source large language models, such as Llama 2, locally.

Ollama bundles model weights, configuration, and data into a single package, defined by a Modelfile.

It optimizes setup and configuration details, including GPU usage.

For a complete list of supported models and model variants, see the [Ollama model library](https://github.com/jmorganca/ollama#model-library).

## Overview
### Integration details

| Class | Package | Local | Serializable | [JS support](https://js.langchain.com/v0.2/docs/integrations/chat/ollama) | Package downloads | Package latest |
| :--- | :--- | :---: | :---: |  :---: | :---: | :---: |
| [ChatOllama](https://python.langchain.com/v0.2/api_reference/ollama/chat_models/langchain_ollama.chat_models.ChatOllama.html) | [langchain-ollama](https://python.langchain.com/v0.2/api_reference/ollama/index.html) | ✅ | ❌ | ✅ | ![PyPI - Downloads](https://img.shields.io/pypi/dm/langchain-ollama?style=flat-square&label=%20) | ![PyPI - Version](https://img.shields.io/pypi/v/langchain-ollama?style=flat-square&label=%20) |

### Model features
| [Tool calling](/docs/how_to/tool_calling/) | [Structured output](/docs/how_to/structured_output/) | JSON mode | [Image input](/docs/how_to/multimodal_inputs/) | Audio input | Video input | [Token-level streaming](/docs/how_to/chat_streaming/) | Native async | [Token usage](/docs/how_to/chat_token_usage_tracking/) | [Logprobs](/docs/how_to/logprobs/) |
| :---: |:----------------------------------------------------:| :---: | :---: |  :---: | :---: | :---: | :---: | :---: | :---: |
| ✅ |                          ✅                           | ✅ | ❌ | ❌ | ❌ | ✅ | ✅ | ❌ | ❌ |

## Setup

First, follow [these instructions](https://github.com/ollama/ollama?tab=readme-ov-file#ollama) to set up and run a local Ollama instance:

* [Download](https://ollama.ai/download) and install Ollama on any supported platform (including macOS, Linux, and Windows Subsystem for Linux, aka WSL)
    * macOS users can install via Homebrew with `brew install ollama` and start with `brew services start ollama`
* Fetch an available LLM via `ollama pull <name-of-model>`
    * View a list of available models via the [model library](https://ollama.ai/library)
    * e.g., `ollama pull llama3`
* Pulling a model without a tag downloads its default tagged version. Typically, the default tag points to the latest model with the smallest parameter count.

> On macOS, the models will be downloaded to `~/.ollama/models`
>
> On Linux (or WSL), the models will be stored at `/usr/share/ollama/.ollama/models`

* To pull an exact version of a model, include its tag, e.g., `ollama pull vicuna:13b-v1.5-16k-q4_0` (see the [tags for the `Vicuna` model](https://ollama.ai/library/vicuna/tags) in this instance)
* To view all pulled models, use `ollama list`
* To chat directly with a model from the command line, use `ollama run <name-of-model>`
* View the [Ollama documentation](https://github.com/ollama/ollama/tree/main/docs) for more commands. You can run `ollama help` in the terminal to see available commands.
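
The tag behaviour described in the list above can be sketched in a few lines of Python. This is only an illustration of the `name:tag` convention, not Ollama's own parsing code; the helper name `split_model_ref` is hypothetical, and the sketch assumes that a reference without a tag resolves to the `latest` tag.

```python
def split_model_ref(ref: str) -> tuple[str, str]:
    """Split a model reference into (name, tag).

    Assumption: a reference without an explicit tag resolves to "latest".
    """
    # Only look for a tag in the last path segment, so that a colon
    # earlier in the reference (e.g. in a host:port prefix) is not
    # mistaken for a tag separator.
    last_segment = ref.rsplit("/", 1)[-1]
    if ":" in last_segment:
        name, tag = ref.rsplit(":", 1)
        return name, tag or "latest"
    return ref, "latest"


# The two pull commands from the list above:
print(split_model_ref("llama3"))
print(split_model_ref("vicuna:13b-v1.5-16k-q4_0"))
```

Under that assumption, `ollama pull llama3` and `ollama pull llama3:latest` refer to the same model, while the Vicuna example pins one specific variant.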


To enable automated tracing of your model calls, set your [LangSmith](https://docs.smith.langchain.com/) API key:

In [None]:
# import getpass
# import os
# os.environ["LANGSMITH_TRACING"] = "true"
# os.environ["LANGSMITH_API_KEY"] = getpass.getpass("Enter your LangSmith API key: ")

### Installation

The LangChain Ollama integration lives in the `langchain-ollama` package:

In [1]:
%pip install -qU langchain-ollama

Make sure you're using the latest Ollama version for structured outputs. Update by running:

In [2]:
%pip install -U ollama



## Instantiation

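Once Ollama is running and a model has been pulled, we can instantiate our model object and generate chat completions. In this notebook, the cells under Invocation first install and start Ollama, then create the model object: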


## Invocation

In [12]:
!curl https://ollama.ai/install.sh | sh

>>> Cleaning up old version at /usr/local/lib/ollama
>>> Installing ollama to /usr/local
>>> Downloading Linux amd64 bundle
######################################################################## 100.0%
>>> Creating ollama user...
>>> Adding ollama user to video group...
>>> Adding current user to ollama group...
>>> Creating ollama systemd service...
>>> The Ollama API is now available at 127.0.0.1:11434.
>>> Install complete. Run "ollama" from the command line.


In [18]:
!nohup ollama serve > ollama.log 2>&1 &
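
Because `ollama serve` is started in the background, the next cell can run before the server is accepting connections. One way to guard against that is to poll the port first. The sketch below uses only the standard library; the helper name `wait_for_server` and its default timeout are illustrative assumptions, and the address is the `127.0.0.1:11434` reported by the installer output above.

```python
import socket
import time


def wait_for_server(host="127.0.0.1", port=11434, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds.

    Returns False if no connection could be made within `timeout` seconds.
    This only checks that something is listening on the port; it does not
    verify that the listener is actually Ollama.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Open and immediately close a connection as a reachability probe.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Calling `wait_for_server()` before `ollama pull` gives the background process time to start. If it returns `False`, `ollama.log` is the place to look.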

In [19]:
!ollama pull llama3:latest



In [20]:
from langchain_ollama import ChatOllama

llm = ChatOllama(
    model="llama3:latest",
    temperature=0,
    # other params...
)
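
As a rough sketch of what other parameters might look like, the settings can be collected in one dictionary and passed to `ChatOllama`. The `base_url` below is the address reported by the installer output earlier; the `num_predict` value and the helper names `resolve_settings` and `make_llm` are illustrative assumptions, not part of the original notebook.

```python
# Connection and generation settings, collected in one place.
OLLAMA_SETTINGS = {
    "model": "llama3:latest",
    "temperature": 0,
    # Address printed by the installer; change it if your server listens elsewhere.
    "base_url": "http://127.0.0.1:11434",
    # Illustrative cap on the number of generated tokens (assumed value).
    "num_predict": 256,
}


def resolve_settings(**overrides):
    """Return the default settings with any keyword overrides applied."""
    return {**OLLAMA_SETTINGS, **overrides}


def make_llm(**overrides):
    """Build a ChatOllama instance from the resolved settings."""
    # Imported lazily so this cell can be defined before the package is installed.
    from langchain_ollama import ChatOllama

    return ChatOllama(**resolve_settings(**overrides))
```

For example, `make_llm(temperature=0.7)` would create a model with the same connection settings but a higher sampling temperature.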

In [22]:
# Read the message to summarize from a local text file
with open("/content/Porsche+Macan+July+5+2018+(1).txt", "r", encoding="utf-8") as file:
    human_message = file.read().strip()

# Construct the messages list
messages = [
    (
        "system",
        "You are a helpful assistant that summarizes the message in short.",
    ),
    ("human", human_message),
]

# Invoke the LLM
ai_msg = llm.invoke(messages)

# Output the result
print(ai_msg)


content="Here's a summary:\n\nThe Porsche Macan combines performance and practicality, offering an exceptional driving experience. Despite its SUV design, it delivers sports car-like performance. Leasing options are available starting at 3.99% with conditions applying." additional_kwargs={} response_metadata={'model': 'llama3:latest', 'created_at': '2025-07-23T20:44:03.669285008Z', 'done': True, 'done_reason': 'stop', 'total_duration': 116471834530, 'load_duration': 83970274, 'prompt_eval_count': 142, 'prompt_eval_duration': 65212206617, 'eval_count': 50, 'eval_duration': 51173972390, 'model_name': 'llama3:latest'} id='run--7adb2dc4-aff0-43f6-96bd-9f52a2efd5d2-0' usage_metadata={'input_tokens': 142, 'output_tokens': 50, 'total_tokens': 192}
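
The duration fields in `response_metadata` are easier to read once converted. Ollama reports durations in nanoseconds, so the sketch below turns the figures printed above into seconds and tokens per second. The helper name `generation_stats` is hypothetical, and the dictionary simply copies the relevant values from the output.

```python
NS_PER_SECOND = 1_000_000_000


def generation_stats(meta):
    """Convert nanosecond durations to seconds and derive token rates."""
    prompt_seconds = meta["prompt_eval_duration"] / NS_PER_SECOND
    eval_seconds = meta["eval_duration"] / NS_PER_SECOND
    return {
        "total_seconds": meta["total_duration"] / NS_PER_SECOND,
        "prompt_tokens_per_second": meta["prompt_eval_count"] / prompt_seconds,
        "output_tokens_per_second": meta["eval_count"] / eval_seconds,
    }


# Values copied from the response_metadata printed above.
meta = {
    "total_duration": 116471834530,
    "prompt_eval_count": 142,
    "prompt_eval_duration": 65212206617,
    "eval_count": 50,
    "eval_duration": 51173972390,
}

stats = generation_stats(meta)
print({key: round(value, 2) for key, value in stats.items()})
```

For this run the call took roughly 116 seconds in total and generated about one output token per second, which is typical of CPU-only inference. With a live response, `generation_stats(ai_msg.response_metadata)` should work the same way as long as these keys are present.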


In [24]:
!ollama -v

ollama version is 0.9.6


In [23]:
print(ai_msg.content)

Here's a summary:

The Porsche Macan combines performance and practicality, offering an exceptional driving experience. Despite its SUV design, it delivers sports car-like performance. Leasing options are available starting at 3.99% with conditions applying.


## Chaining

We can [chain](/docs/how_to/sequence/) our model with a prompt template like so:

In [25]:
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a helpful assistant that translates {input_language} to {output_language}.",
        ),
        ("human", "{input}"),
    ]
)

chain = prompt | llm
chain.invoke(
    {
        "input_language": "English",
        "output_language": "German",
        "input": ai_msg.content,
    }
)

AIMessage(content='Hier ist eine Zusammenfassung auf Deutsch:\n\nDer Porsche Macan kombiniert Leistung und Praktikabilität, um eine außergewöhnliche Fahrerlebnis zu bieten. Trotz seines SUV-Designs liefert er ein Sportwagenähnliches Leistungsvermögen. Mietoptionen sind ab 3,99% verfügbar, unterliegen jedoch bestimmten Bedingungen.', additional_kwargs={}, response_metadata={'model': 'llama3:latest', 'created_at': '2025-07-23T20:46:09.274794598Z', 'done': True, 'done_reason': 'stop', 'total_duration': 125364917642, 'load_duration': 73313634, 'prompt_eval_count': 75, 'prompt_eval_duration': 29169387564, 'eval_count': 95, 'eval_duration': 96120246844, 'model_name': 'llama3:latest'}, id='run--23cb91f6-a28f-42d3-ae0e-c8a84285968a-0', usage_metadata={'input_tokens': 75, 'output_tokens': 95, 'total_tokens': 170})
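
If only the translated text is needed rather than the full `AIMessage`, one option is to append an output parser to the chain. The sketch below assumes `StrOutputParser` from `langchain_core.output_parsers`; the helper name `build_translation_chain` is hypothetical, and the templates are the same ones used above.

```python
SYSTEM_TEMPLATE = (
    "You are a helpful assistant that translates "
    "{input_language} to {output_language}."
)
HUMAN_TEMPLATE = "{input}"


def build_translation_chain(llm):
    """Return a chain that maps the prompt variables to a plain string."""
    # Imported lazily so the helper can be defined before the packages are installed.
    from langchain_core.output_parsers import StrOutputParser
    from langchain_core.prompts import ChatPromptTemplate

    prompt = ChatPromptTemplate.from_messages(
        [("system", SYSTEM_TEMPLATE), ("human", HUMAN_TEMPLATE)]
    )
    # The parser extracts the message content, so invoke() returns a str.
    return prompt | llm | StrOutputParser()
```

Invoking `build_translation_chain(llm)` with the same dictionary as above should return just the German text, without the metadata.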