## LiteLLM, Ollama and Autogen

**LiteLLM** provides a simplified interface to interact with language models, offering synchronous and asynchronous text generation capabilities, and supporting streaming outputs for efficient generation of longer texts.

**Ollama** lets you run, manage, and interact with a variety of LLMs locally, allowing for enhanced control, customization, and application of language models without the necessity for constant internet connectivity once models are downloaded and stored locally.

Using all three together can reduce costs when you work with AutoGen, because requests go to a model running on your own machine instead of a paid provider API.
As a bonus, no API keys are needed.

---
Links:
- [Ollama](https://ollama.ai/)
- [LiteLLM](https://litellm.ai/)
- [Autogen opensource Capability - PR](https://github.com/microsoft/autogen/pull/95)

Ollama supports many different models, so you can experiment with a variety of model types:
- [Ollama model library](https://ollama.ai/library)


Run the following command in a terminal to start a LiteLLM proxy in front of the local Ollama server:
>  litellm --model ollama/llama2 --api_base http://localhost:11434/
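
Before pointing a client at the proxy, it can help to confirm that both servers are actually listening. The following is a minimal sketch using only the standard library. The helper name `is_listening` is hypothetical, port 11434 is the Ollama address from the command above, and port 8000 is the proxy port assumed by the configs later in this notebook (the port your LiteLLM version binds by default may differ).

```python
import socket


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Ports are assumptions taken from this notebook, not discovered automatically.
    for name, port in [("ollama", 11434), ("litellm proxy", 8000)]:
        state = "up" if is_listening("localhost", port) else "not reachable"
        print(f"{name} on port {port}: {state}")
```

If either line reports "not reachable", start the corresponding server before running the cells below.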

In [2]:
from litellm import completion

response = completion(
    model="ollama/llama2", 
    messages=[{ "content": "Tell me a story about llamas.", "role": "user"}], 
    api_base="http://localhost:11434"
)
print(response)

{
  "object": "chat.completion",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": " Of course! Here's a fun story about llamas for you:\n\nOnce upon a time, in a small village nestled in the Andes mountains, there lived a group of friendly llamas. These llamas were known far and wide for their kind hearts and playful antics.\n\nOne sunny day, a young llama named Luna decided to explore the nearby forest. She wandered deeper and deeper into the woods until she came across a clearing filled with the most beautiful wildflowers. Luna was so mesmerized by their colors that she didn't notice the tall trees closing in around her.\n\nSuddenly, she heard a rustling in the bushes. She turned to see a mischievous fox peeking out from behind a fern. The fox had a sly grin on his face and Luna knew he was up to something.\n\n\"Hello there, little llama,\" said the fox in a smooth voice. \"What are you doing so far from home?\"\n\nLuna hesit

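The printed response follows the OpenAI chat-completion layout, so the generated text sits under `choices[0].message.content`. Here is a small sketch of pulling it out, written against a plain dictionary with that shape. `extract_text` and `sample` are hypothetical illustrations, and whether your LiteLLM version's response object supports the same dictionary-style access is an assumption worth checking.

```python
def extract_text(response: dict) -> str:
    """Return the assistant text of the first choice, or '' if it is missing."""
    choices = response.get("choices") or []
    if not choices:
        return ""
    message = choices[0].get("message") or {}
    return message.get("content") or ""


# Hand-written sample mirroring the structure printed above (not real model output).
sample = {
    "object": "chat.completion",
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "message": {"content": " Once upon a time...", "role": "assistant"},
        }
    ],
}

print(extract_text(sample).strip())
```
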
In [1]:
from litellm import completion

response = completion(
    model="ollama/mistral", 
    messages=[{ "content": "Tell me a story about llamas.", "role": "user"}], 
    api_base="http://localhost:11434"
)
print(response)

{
  "object": "chat.completion",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "\nOnce upon a time, in the high Andes Mountains of South America, there lived a group of llamas. Llamas are known for their gentle and friendly nature, and these animals were no exception. They spent their days grazing on the lush grasses that grew in the mountain meadows, taking leisurely naps in the warm sun, and playing with one another.\n\nOne day, a group of hikers came through the area. The hikers were amazed by the beauty of the landscape, and they were eager to learn more about the animals that called this place home. They approached the llamas, hoping to get a closer look. To their surprise, the llamas did not run away, but instead came closer to investigate the newcomers.\n\nThe hikers were delighted by the friendly demeanor of the llamas, and they spent hours talking with them and learning about their way of life. They discovered that 

In [None]:
import os
import shutil
from pathlib import Path
import autogen
from autogen import AssistantAgent, UserProxyAgent

def clear_cache(clear_previous_work=True):
    # Delete the local `.cache` directory so that cached replies
    # from one model do not spill into a conversation with another.
    # Should be run before and after each chat initialization.
    # Note: `clear_previous_work` is accepted but currently unused.
    if os.path.isdir('.cache'):
        print('deleting cache...')
        shutil.rmtree('.cache')


clear_cache()

config_list = [
    {
        # "model": "ollama/mistral",
        "model": "ollama/llama2",
        "api_base": "http://0.0.0.0:8000",
        "api_type": "litellm"
    }
]

import pprint as pp
pp.pprint(config_list)

coding_assistant = AssistantAgent(
    name="coding_assistant",
    llm_config={
        "request_timeout": 1000,
        "seed": 42,
        "config_list": config_list,
        "temperature": 0.8,
    },
)

coding_runner = UserProxyAgent(
    name="coding_runner",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=20,
    is_termination_msg=lambda x: x.get("content", "") and x.get("content", "").rstrip().endswith("TERMINATE"),
    code_execution_config={
        "work_dir": "coding",
        "use_docker": False
    },
    llm_config={
        "request_timeout": 1000,
        "seed": 42,
        "config_list": config_list,
        "temperature": 0.4,
    },
)
coding_runner.initiate_chat(coding_assistant, message="Calculate the percentage gain YTD for Berkshire Hathaway stock and plot it as price.png. Don't use any approach that requires an API key.")
# clear_cache(clear_previous_work=False)
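
The `is_termination_msg` lambda above is dense, so here is the same check written as a named function, as a sketch. It treats a message as final when its `content` ends with `TERMINATE`, ignoring trailing whitespace. The function name is hypothetical, and unlike the lambda it always returns a strict `bool`.

```python
def ends_with_terminate(message: dict) -> bool:
    """True when the message content ends with the TERMINATE keyword."""
    content = message.get("content") or ""  # content can be missing or None
    return content.rstrip().endswith("TERMINATE")


print(ends_with_terminate({"content": "All done.\nTERMINATE\n"}))  # True
print(ends_with_terminate({"content": "Still working on it"}))     # False
print(ends_with_terminate({"content": None}))                      # False
```

It can be passed as `is_termination_msg=ends_with_terminate` in place of the lambda.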

### Leveraging Multiple Models

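Start one LiteLLM proxy per model, each in its own terminal. The second one is given a separate port so the two do not collide:

> litellm --model ollama/llama2 --api_base http://localhost:11434/

> litellm --model codellama:7b-code --api_base http://localhost:11434/ --port 8001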
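
Each agent then needs a config list that points at its own proxy port. Below is a sketch of a small helper that builds such entries, so the model name and port stay in one place. `proxy_config` is a hypothetical helper, and the keys simply mirror the ones used in this notebook's configs.

```python
def proxy_config(model: str, port: int, host: str = "0.0.0.0") -> list:
    """Build a one-entry config list pointing at a LiteLLM proxy on host:port."""
    return [
        {
            "model": model,
            "api_base": f"http://{host}:{port}",
            "api_type": "litellm",
        }
    ]


# Ports as assumed in this notebook: 8000 for the general model, 8001 for the code model.
proxy_config_list = proxy_config("ollama/mistral", 8000)
coding_config_list = proxy_config("ollama/codellama", 8001)
print(proxy_config_list)
print(coding_config_list)
```
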

In [1]:
import os
import shutil
from pathlib import Path
import autogen
from autogen import AssistantAgent, UserProxyAgent

def clear_cache(clear_previous_work=True):
    # Delete the local `.cache` directory so that cached replies
    # from one model do not spill into a conversation with another.
    # Should be run before and after each chat initialization.
    # Note: `clear_previous_work` is accepted but currently unused.
    if os.path.isdir('.cache'):
        print('deleting cache...')
        shutil.rmtree('.cache')


clear_cache()

proxy_config_list = [
    {
        "model": "ollama/mistral",
        # "model": "ollama/llama2",
        "api_base": "http://0.0.0.0:8000",
        "api_type": "litellm"
    }
]

coding_config_list = [
    {
        # "model": "ollama/mistral",
        "model": "ollama/codellama",
        "api_base": "http://0.0.0.0:8001",
        "api_type": "litellm"
    }
]


import pprint as pp
pp.pprint(proxy_config_list)
pp.pprint(coding_config_list)

coding_assistant = AssistantAgent(
    name="coding_assistant",
    llm_config={
        "request_timeout": 1000,
        "seed": 42,
        "config_list": coding_config_list,
        "temperature": 0.1,
    },
)

coding_runner = UserProxyAgent(
    name="coding_runner",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=20,
    is_termination_msg=lambda x: x.get("content", "") and x.get("content", "").rstrip().endswith("TERMINATE"),
    code_execution_config={
        "work_dir": "coding",
        "use_docker": False
    },
    llm_config={
        "request_timeout": 1000,
        "seed": 42,
        "config_list": proxy_config_list,
        "temperature": 0.1,
    },
)
coding_runner.initiate_chat(coding_assistant, message="Calculate the percentage gain YTD for Berkshire Hathaway stock and plot it as price.png. Don't use any approach that requires an API key. Write Python or shell code to solve the problem.")
# clear_cache(clear_previous_work=False)

deleting cache...
[{'api_base': 'http://0.0.0.0:8000',
  'api_type': 'litellm',
  'model': 'ollama/mistral'}]
[{'api_base': 'http://0.0.0.0:8001',
  'api_type': 'litellm',
  'model': 'ollama/codellama'}]
coding_runner (to coding_assistant):

Calculate the percentage gain YTD for Berkshire Hathaway stock and plot it as price.png. Don't use any approach that requires an API key. Write Python or shell code to solve the problem.

--------------------------------------------------------------------------------


KeyboardInterrupt: 