# Ollama

[Ollama](https://ollama.com/) is a local inference engine that enables you to run open-weight LLMs in your environment. It has native support for a large number of models such as Google's Gemma, Meta's Llama 2/3/3.1, Microsoft's Phi 3, Mistral AI's Mistral/Mixtral, and Cohere's Command R models.

Note: Previously, using Ollama with AutoGen required LiteLLM. Ollama can now be used directly, and the client supports tool calling.

## Features

The Ollama client class tailors messages to the specific requirements of Ollama's API. This covers message role sequences, support for function/tool calling, and token usage.

## Installing Ollama

For Mac and Windows, [download Ollama](https://ollama.com/download).

For Linux:

```bash
curl -fsSL https://ollama.com/install.sh | sh
```

## Downloading models for Ollama

Ollama has a [library of models](https://ollama.com/library) to choose from.

Before you can use a model, you need to download it (using the name of the model from the library):

```bash
ollama pull llama3.1
```

To view the models you have downloaded and can use:

```bash
ollama list
```

## Getting started with AutoGen and Ollama

When installing AutoGen, install the `pyautogen` package with the `ollama` extra.

```bash
pip install "pyautogen[ollama]"
```

See the sample `OAI_CONFIG_LIST` below showing how the Ollama client class is used by specifying the `api_type` as `ollama`.

```python
[
    {
        "model": "llama3.1",
        "api_type": "ollama"
    },
    {
        "model": "llama3.1:8b-instruct-q6_K",
        "api_type": "ollama"
    },
    {
        "model": "mistral-nemo",
        "api_type": "ollama"
    }
]
```
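As a minimal sketch of how such a list could be loaded and filtered, the snippet below uses only the standard library. The helper `load_ollama_configs`, the temporary file, and the extra `openai` entry are hypothetical and exist only for illustration; AutoGen itself offers `autogen.config_list_from_json` for loading a config list from an environment variable or file.

```python
import json
import tempfile
from pathlib import Path


def load_ollama_configs(path):
    """Hypothetical helper: read a JSON config list and keep only Ollama entries."""
    configs = json.loads(Path(path).read_text())
    return [c for c in configs if c.get("api_type") == "ollama"]


# Sample config list; the last entry is illustrative and gets filtered out
sample = [
    {"model": "llama3.1", "api_type": "ollama"},
    {"model": "mistral-nemo", "api_type": "ollama"},
    {"model": "gpt-4o", "api_type": "openai"},
]

# Write the sample to a temporary file and load it back
with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / "OAI_CONFIG_LIST"
    config_path.write_text(json.dumps(sample))
    ollama_configs = load_ollama_configs(config_path)

print([c["model"] for c in ollama_configs])
```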

If you need to specify the URL for your Ollama install, use the `client_host` key in your config, as in the example below:

```python
[
    {
        "model": "llama3.1",
        "api_type": "ollama",
        "client_host": "http://192.168.0.1:11434"
    }
]
```
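Before pointing AutoGen at a `client_host`, it can help to confirm that the server answers there. The sketch below is not part of AutoGen: it assumes Ollama's REST endpoint `GET /api/tags` (which lists locally available models) and the default port 11434, and the helper name `list_ollama_models` is hypothetical.

```python
import http.client
import json
import urllib.request


def list_ollama_models(client_host="http://localhost:11434", timeout=2.0):
    """Return the model names served at client_host, or None if it cannot be reached."""
    try:
        with urllib.request.urlopen(f"{client_host}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError, http.client.HTTPException):
        # Connection errors and timeouts are OSError subclasses;
        # ValueError covers malformed URLs and invalid JSON
        return None
    return [m["name"] for m in payload.get("models", [])]


# Example usage (requires a reachable Ollama server):
# models = list_ollama_models("http://192.168.0.1:11434")
# print(models if models is not None else "Ollama is not reachable")
```

A `None` result means the host did not respond, while an empty list means the server is up but no models have been pulled yet.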

## API parameters

The following Ollama parameters can be added to your config. See the [Ollama API documentation](https://github.com/ollama/ollama/blob/main/docs/api.md#parameters) for further information on them.

- num_predict (integer): -1 is infinite, -2 is fill context, 128 is default
- repeat_penalty (float)
- seed (integer)
- stream (boolean)
- temperature (float)
- top_k (integer)
- top_p (float)

Example:
```python
[
    {
        "model": "llama3.1:instruct",
        "api_type": "ollama",
        "num_predict": -1,
        "repeat_penalty": 1.1,
        "seed": 42,
        "stream": False,
        "temperature": 1,
        "top_k": 50,
        "top_p": 0.8
    }
]
```
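Since a mistyped parameter name is easy to miss in a plain dictionary, a small guard can catch it early. The following is a sketch only: `build_ollama_config` and `OLLAMA_PARAMS` are hypothetical names, and the accepted parameters and types are simply those from the list above, not an exhaustive view of what the client supports.

```python
# Parameter names and expected types, taken from the list above
OLLAMA_PARAMS = {
    "num_predict": int,
    "repeat_penalty": float,
    "seed": int,
    "stream": bool,
    "temperature": float,
    "top_k": int,
    "top_p": float,
}


def build_ollama_config(model, **params):
    """Hypothetical helper: build a config entry, rejecting unknown or mistyped parameters."""
    for name, value in params.items():
        if name not in OLLAMA_PARAMS:
            raise ValueError(f"Unsupported Ollama parameter: {name}")
        expected = OLLAMA_PARAMS[name]
        # Accept ints where floats are expected (e.g. temperature=1)
        ok = isinstance(value, expected) or (expected is float and isinstance(value, int))
        # bool is a subclass of int, so reject it explicitly for numeric parameters
        if isinstance(value, bool) and expected is not bool:
            ok = False
        if not ok:
            raise TypeError(f"{name} should be {expected.__name__}, got {type(value).__name__}")
    return {"model": model, "api_type": "ollama", **params}


config_list = [build_ollama_config("llama3.1:instruct", seed=42, temperature=1, top_p=0.8)]
print(config_list)
```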

## Two-Agent Coding Example

In this example, we run a two-agent chat in which an `AssistantAgent` (primarily a coding agent) generates code to count the prime numbers between 1 and 10,000, and a `UserProxyAgent` then executes it.

We'll use Meta's Llama 3.1 model, which is suitable for coding.

In this example we will specify the URL for the Ollama installation using `client_host`.

```python
config_list = [
    {
        # Let's choose Meta's Llama 3.1 model (model names must match Ollama exactly)
        "model": "llama3.1",
        # We specify the API Type as 'ollama' so it uses the Ollama client class
        "api_type": "ollama",
        "stream": False,
        "client_host": "http://192.168.0.1:11434",
    }
]
```

Importantly, we have changed the termination keyword to `FINISH` and tweaked the system message so that the model doesn't return that keyword in the same message as the code block.

```python
from pathlib import Path

from autogen import AssistantAgent, UserProxyAgent
from autogen.coding import LocalCommandLineCodeExecutor

# Setting up the code executor
workdir = Path("coding")
workdir.mkdir(exist_ok=True)
code_executor = LocalCommandLineCodeExecutor(work_dir=workdir)

# Setting up the agents

# The UserProxyAgent will execute the code that the AssistantAgent provides
user_proxy_agent = UserProxyAgent(
    name="User",
    code_execution_config={"executor": code_executor},
    # Fall back to an empty string so a message with content=None doesn't raise a TypeError
    is_termination_msg=lambda msg: "FINISH" in (msg.get("content") or ""),
)

system_message = """You are a helpful AI assistant who writes code and the user
executes it. Solve tasks using your python coding skills.
In the following cases, suggest python code (in a python coding block) for the
user to execute. When using code, you must indicate the script type in the code block.
You only need to create one working sample.
Do not suggest incomplete code which requires users to modify it.
Don't use a code block if it's not intended to be executed by the user. Don't
include multiple code blocks in one response. Do not ask users to copy and
paste the result. Instead, use 'print' function for the output when relevant.
Check the execution result returned by the user.

If the result indicates there is an error, fix the error.

IMPORTANT: If it has executed successfully, ONLY output 'FINISH'."""

# The AssistantAgent, using the Ollama config, will take the coding request and return code
assistant_agent = AssistantAgent(
    name="Ollama Assistant",
    system_message=system_message,
    llm_config={"config_list": config_list},
)
```
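The termination check passed to the `UserProxyAgent` above can be exercised on its own, without a model. This is a small sketch: `is_finish_message` is a hypothetical stand-alone version of the lambda, written on the assumption that some messages (for example tool-call messages) may carry `content=None`.

```python
def is_finish_message(msg, keyword="FINISH"):
    """Return True when the message content contains the termination keyword."""
    # Fall back to an empty string when content is missing or None
    content = msg.get("content") or ""
    return keyword in content


print(is_finish_message({"content": "FINISH"}))  # True
print(is_finish_message({"content": "```python\nprint(1)\n```"}))  # False
print(is_finish_message({"content": None}))  # False
```

Because the check is a plain substring match, a reply that contains both a code block and the word `FINISH` would end the chat before the code runs, which is why the system message asks the model to output the keyword on its own.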

We can now start the chat.

```python
# Start the chat, with the UserProxyAgent asking the AssistantAgent the message
chat_result = user_proxy_agent.initiate_chat(
    assistant_agent,
    message="Provide code to count the number of prime numbers from 1 to 10000.",
)
```

````text
User (to Ollama Assistant):

Provide code to count the number of prime numbers from 1 to 10000.

--------------------------------------------------------------------------------
Ollama Assistant (to User):

```python
def is_prime(n):
    """Check if a number is prime."""
    if n < 2:
        return False
    for i in range(2, int(n**0.5) + 1):
        if n % i == 0:
            return False
    return True


def count_primes():
    """Count the number of prime numbers from 1 to 10000."""
    count = sum(1 for num in range(1, 10001) if is_prime(num))
    print(count)


# Execute the function
count_primes()
```

--------------------------------------------------------------------------------

>>>>>>>> NO HUMAN INPUT RECEIVED.

>>>>>>>> USING AUTO REPLY...

>>>>>>>> EXECUTING CODE BLOCK (inferred language is python)...
User (to Ollama Assistant):

exitcode: 0 (execution succeeded)
Code output: 1229
````
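The reported count can be cross-checked independently of the model's trial-division code. The sketch below uses a different technique, the Sieve of Eratosthenes; the function name `count_primes_sieve` is ours and not part of the chat output.

```python
def count_primes_sieve(limit):
    """Count the primes in [2, limit] with the Sieve of Eratosthenes."""
    if limit < 2:
        return 0
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    for n in range(2, int(limit**0.5) + 1):
        if is_prime[n]:
            # Mark every multiple of n, starting at n*n, as composite
            for multiple in range(n * n, limit + 1, n):
                is_prime[multiple] = False
    return sum(is_prime)


print(count_primes_sieve(10_000))  # 1229
```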


----------------------------------

## Tool Call Example

In this example, instead of writing code, we will have an agent assist with some trip planning using multiple tool calls.

Again, we'll use Meta's versatile Llama 3.1.

```python
import json
from typing import Literal

from typing_extensions import Annotated

import autogen

config_list = [
    {
        # Let's choose Meta's Llama 3.1 model (model names must match Ollama exactly)
        "model": "llama3.1",
        "api_type": "ollama",
        "stream": False,
        "client_host": "http://192.168.0.1:11434",
    }
]
```

Next, we'll create our agents.

```python
# Create the agent for tool calling
chatbot = autogen.AssistantAgent(
    name="chatbot",
    system_message="""For currency exchange and weather forecasting tasks,
        only use the functions you have been provided with.
        Output 'HAVE FUN!' when an answer has been provided.""",
    llm_config={"config_list": config_list},
)

# Note that we have changed the termination string to be "HAVE FUN!"
user_proxy = autogen.UserProxyAgent(
    name="user_proxy",
    is_termination_msg=lambda x: x.get("content", "") and "HAVE FUN!" in x.get("content", ""),
    human_input_mode="NEVER",
    max_consecutive_auto_reply=1,
)
```

Create and register our functions (tools). See the [tutorial chapter on tool use](/docs/tutorial/tool-use) 
for more information.

```python
# Currency Exchange function

CurrencySymbol = Literal["USD", "EUR"]


# Define our function that we expect to call
def exchange_rate(base_currency: CurrencySymbol, quote_currency: CurrencySymbol) -> float:
    if base_currency == quote_currency:
        return 1.0
    elif base_currency == "USD" and quote_currency == "EUR":
        return 1 / 1.1
    elif base_currency == "EUR" and quote_currency == "USD":
        return 1.1
    else:
        raise ValueError(f"Unknown currencies {base_currency}, {quote_currency}")


# Register the function with the agent
@user_proxy.register_for_execution()
@chatbot.register_for_llm(description="Currency exchange calculator.")
def currency_calculator(
    base_amount: Annotated[float, "Amount of currency in base_currency"],
    base_currency: Annotated[CurrencySymbol, "Base currency"] = "USD",
    quote_currency: Annotated[CurrencySymbol, "Quote currency"] = "EUR",
) -> str:
    quote_amount = exchange_rate(base_currency, quote_currency) * base_amount
    return f"{format(quote_amount, '.2f')} {quote_currency}"


# Weather function


# Example function to make available to model
def get_current_weather(location, unit="fahrenheit"):
    """Get the weather for some location"""
    if "chicago" in location.lower():
        return json.dumps({"location": "Chicago", "temperature": "13", "unit": unit})
    elif "san francisco" in location.lower():
        return json.dumps({"location": "San Francisco", "temperature": "55", "unit": unit})
    elif "new york" in location.lower():
        return json.dumps({"location": "New York", "temperature": "11", "unit": unit})
    else:
        # Include the unit here as well so weather_forecast can always read it
        return json.dumps({"location": location, "temperature": "unknown", "unit": unit})


# Register the function with the agent
@user_proxy.register_for_execution()
@chatbot.register_for_llm(description="Weather forecast for US cities.")
def weather_forecast(
    location: Annotated[str, "City name"],
) -> str:
    weather_details = get_current_weather(location=location)
    weather = json.loads(weather_details)
    return f"{weather['location']} will be {weather['temperature']} degrees {weather['unit']}"
```
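The `Annotated` strings on the tool parameters above are what give the model a description of each argument. As a rough illustration of how such metadata can be read back from a signature, here is a standard-library sketch; it is not AutoGen's actual implementation, `describe_parameters` is a hypothetical helper, and it imports `Annotated` from `typing` (Python 3.9+) rather than `typing_extensions`.

```python
from typing import Annotated, Literal, get_args, get_origin, get_type_hints

CurrencySymbol = Literal["USD", "EUR"]


# Same signature as the registered tool, with the body omitted
def currency_calculator(
    base_amount: Annotated[float, "Amount of currency in base_currency"],
    base_currency: Annotated[CurrencySymbol, "Base currency"] = "USD",
    quote_currency: Annotated[CurrencySymbol, "Quote currency"] = "EUR",
) -> str:
    ...


def describe_parameters(func):
    """Map each Annotated parameter to its (type, description) pair."""
    # include_extras=True keeps the Annotated metadata instead of stripping it
    hints = get_type_hints(func, include_extras=True)
    described = {}
    for name, hint in hints.items():
        if name != "return" and get_origin(hint) is Annotated:
            base_type, description = get_args(hint)[:2]
            described[name] = (base_type, description)
    return described


for name, (base_type, description) in describe_parameters(currency_calculator).items():
    print(f"{name}: {description}")
```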

And run it!

```python
# start the conversation
res = user_proxy.initiate_chat(
    chatbot,
    message="What's the weather in New York and can you tell me how much is 123.45 EUR in USD so I can spend it on my holiday? Throw a few holiday tips in as well.",
    summary_method="reflection_with_llm",
)

print(f"LLM SUMMARY: {res.summary['content']}")
```

```text
user_proxy (to chatbot):

What's the weather in New York and can you tell me how much is 123.45 EUR in USD so I can spend it on my holiday? Throw a few holiday tips in as well.

--------------------------------------------------------------------------------
chatbot (to user_proxy):


***** Suggested tool call (ollama_func_2863): weather_forecast *****
Arguments:
{"location": "New York"}
********************************************************************
***** Suggested tool call (ollama_func_2864): currency_calculator *****
Arguments:
{"base_amount": 123.45, "quote_currency": "USD", "base_currency": "EUR"}
***********************************************************************

--------------------------------------------------------------------------------

>>>>>>>> EXECUTING FUNCTION weather_forecast...

>>>>>>>> EXECUTING FUNCTION currency_calculator...
user_proxy (to chatbot):
```

Great, we can see that Llama 3.1 has chosen the right functions and their parameters, and then summarised the results for us.
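Each suggested tool call in the transcript above is a function name plus JSON-encoded arguments, which the executing agent maps to a registered Python function. A minimal sketch of that dispatch step follows; it is an illustration rather than AutoGen's internals, and `TOOLS`, `dispatch_tool_call`, and the simplified stand-in tools are hypothetical.

```python
import json


def weather_forecast(location):
    """Stand-in tool with canned data, mirroring the New York value in the example."""
    return f"{location} will be 11 degrees fahrenheit"


def currency_calculator(base_amount, base_currency="USD", quote_currency="EUR"):
    """Stand-in tool using the fixed rate of 1.1 USD per EUR from the example."""
    rates = {("USD", "EUR"): 1 / 1.1, ("EUR", "USD"): 1.1}
    rate = 1.0 if base_currency == quote_currency else rates[(base_currency, quote_currency)]
    return f"{base_amount * rate:.2f} {quote_currency}"


# Registry mapping tool names to callables
TOOLS = {"weather_forecast": weather_forecast, "currency_calculator": currency_calculator}


def dispatch_tool_call(name, arguments_json):
    """Look up a tool by name and call it with JSON-encoded keyword arguments."""
    if name not in TOOLS:
        raise KeyError(f"Unknown tool: {name}")
    return TOOLS[name](**json.loads(arguments_json))


print(dispatch_tool_call("weather_forecast", '{"location": "New York"}'))
# A round amount keeps the converted value easy to check by hand
print(
    dispatch_tool_call(
        "currency_calculator",
        '{"base_amount": 100, "quote_currency": "USD", "base_currency": "EUR"}',
    )
)
```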