## LiteLLM, Ollama and Autogen

**LiteLLM** provides a simplified interface to interact with language models, offering synchronous and asynchronous text generation capabilities, and supporting streaming outputs for efficient generation of longer texts.

**Ollama** lets you run, manage, and interact with a variety of LLMs locally, allowing for enhanced control, customization, and application of language models without the necessity for constant internet connectivity once models are downloaded and stored locally.

Used together, these tools reduce cost when working with AutoGen, because you don't need to make API calls to a hosted provider.
A nice bonus: no API keys are required.

---
Links:
- [Ollama](https://ollama.ai/)
- [LiteLLM](https://litellm.ai/)
- [Autogen opensource Capability - PR](https://github.com/microsoft/autogen/pull/95)

Here are all the models Ollama supports, so you can experiment with different model types:
- [Link](https://ollama.ai/library)


Run the following command in a terminal to start the LiteLLM proxy in front of the local Ollama server:
>  litellm --model ollama/llama2 --api_base http://localhost:11434/
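
The command above only works if the Ollama server is already listening at the `--api_base` address. As a minimal sketch, the check below probes that address before you start the proxy. The helper name `is_reachable` and the constant `OLLAMA_URL` are made up for illustration, and the port is assumed to be the one used in the command.

```python
import urllib.error
import urllib.request


def is_reachable(url: str, timeout: float = 2.0) -> bool:
    """Return True if an HTTP GET to `url` succeeds within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error status, or malformed URL.
        return False


# Assumed address, copied from --api_base in the command above.
OLLAMA_URL = "http://localhost:11434"
```

Calling `is_reachable(OLLAMA_URL)` before launching `litellm` gives a quick yes/no answer; a `False` result usually means Ollama has not been started yet.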

In [None]:
from litellm import completion

response = completion(
    model="ollama/mistral", 
    messages=[{ "content": "Tell me a story about llamas.", "role": "user"}], 
    api_base="http://localhost:11434"
)
print(response)
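
The call above waits for the whole response. For longer texts, streaming may be preferable. The sketch below assumes that `completion(..., stream=True)` yields OpenAI-style chunks whose text lives in `chunk.choices[0].delta.content`; the helper names `collect_stream` and `stream_story` are hypothetical, and the fake chunks at the end are hand-made stand-ins, not real model output.

```python
from types import SimpleNamespace


def collect_stream(chunks) -> str:
    """Join the text deltas of an OpenAI-style chunk stream into one string."""
    parts = []
    for chunk in chunks:
        delta = chunk.choices[0].delta
        text = getattr(delta, "content", None)
        if text:  # some chunks (often the last one) carry no text
            parts.append(text)
    return "".join(parts)


def stream_story(prompt: str) -> str:
    """Stream a completion from a local Ollama model (needs a running server)."""
    # Imported lazily so that collect_stream can be used without litellm installed.
    from litellm import completion

    chunks = completion(
        model="ollama/mistral",
        messages=[{"role": "user", "content": prompt}],
        api_base="http://localhost:11434",
        stream=True,
    )
    return collect_stream(chunks)


# Offline demonstration with hand-made chunks (illustrative only).
fake = [
    SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=t))])
    for t in ("Once ", "upon ", None)
]
print(collect_stream(fake))
```

With Ollama running, `stream_story("Tell me a story about llamas.")` would return the full text once the stream ends; printing each delta inside the loop instead would show the text as it arrives.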

In [None]:
import os
import shutil
from pathlib import Path
import autogen
from autogen import AssistantAgent, UserProxyAgent

def clear_cache(clear_previous_work=True):
    # Delete the local .cache directory to avoid conversations
    # spilling over between models.
    # Should be run before and after each chat initialization.
    # Note: clear_previous_work is currently unused.
    if os.path.isdir('.cache'):
        print('deleting cache...')
        shutil.rmtree('.cache')

clear_cache()

config_list = [
    {
        # "model": "ollama/mistral",
        "model": "ollama/llama2",
        "api_base": "http://0.0.0.0:8000",
        "api_type": "litellm"
    }
]

import pprint as pp
pp.pprint(config_list)

coding_assistant = AssistantAgent(
    name="coding_assistant",
    llm_config={
        "request_timeout": 1000,
        "seed": 42,
        "config_list": config_list,
        "temperature": 0.8,
    },
)

coding_runner = UserProxyAgent(
    name="coding_runner",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=20,
    is_termination_msg=lambda x: x.get("content", "") and x.get("content", "").rstrip().endswith("TERMINATE"),
    code_execution_config={
        "work_dir": "coding",
        "use_docker": False
    },
    llm_config={
        "request_timeout": 1000,
        "seed": 42,
        "config_list": config_list,
        "temperature": 0.4,
    },
)
coding_runner.initiate_chat(coding_assistant, message="Calculate the percentage gain YTD for Berkshire Hathaway stock and plot it as price.png. Don't use any approach that requires an API key.")
# clear_cache(clear_previous_work=False)
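
The `is_termination_msg` lambda used above decides when the conversation stops: it fires when a message's text ends with the keyword `TERMINATE`. A named version of the same check, written here as a sketch, is easier to read and to test on its own. It treats a missing or `None` content field as an empty string.

```python
def is_termination_msg(message: dict) -> bool:
    """Return True when the message text ends with the keyword TERMINATE."""
    content = message.get("content") or ""  # content may be missing or None
    return content.rstrip().endswith("TERMINATE")


# A few example messages (made up for illustration).
print(is_termination_msg({"content": "All done. TERMINATE\n"}))
print(is_termination_msg({"content": "Still working on it"}))
```

A function like this could be passed as `is_termination_msg=is_termination_msg` in place of the lambda.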

### Leveraging Multiple Models

Run each command in its own terminal, so that each model is served on its own port:

> litellm --model ollama/llama2 --api_base http://localhost:11434/

> litellm --model codellama:7b-code --api_base http://localhost:11434/ --port 8001
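
Each proxy needs a matching config entry that points at its port. As a small sketch, a helper can build these entries so that only the model name and port differ. The function name `litellm_config` is hypothetical; the keys and values mirror the config lists used in this notebook, with port 8000 assumed for the first proxy.

```python
def litellm_config(model: str, port: int, host: str = "0.0.0.0") -> list:
    """Build a one-entry config list pointing at a LiteLLM proxy on `port`."""
    return [
        {
            "model": model,
            "api_base": f"http://{host}:{port}",
            "api_type": "litellm",
        }
    ]


# One config list per proxy: general model on 8000, code model on 8001.
proxy_config_list = litellm_config("ollama/mistral", 8000)
coding_config_list = litellm_config("ollama/codellama", 8001)
```

Keeping the two lists separate is what lets each agent talk to a different model.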

In [1]:
import os
import shutil
from pathlib import Path
import autogen
from autogen import AssistantAgent, UserProxyAgent

def clear_cache(clear_previous_work=True):
    # Delete the local .cache directory to avoid conversations
    # spilling over between models.
    # Should be run before and after each chat initialization.
    # Note: clear_previous_work is currently unused.
    if os.path.isdir('.cache'):
        print('deleting cache...')
        shutil.rmtree('.cache')

clear_cache()

proxy_config_list = [
    {
        "model": "ollama/mistral",
        # "model": "ollama/llama2",
        "api_base": "http://0.0.0.0:8000",
        "api_type": "litellm"
    }
]

coding_config_list = [
    {
        # "model": "ollama/mistral",
        "model": "ollama/codellama",
        "api_base": "http://0.0.0.0:8001",
        "api_type": "litellm"
    }
]


import pprint as pp
pp.pprint(proxy_config_list)
pp.pprint(coding_config_list)

coding_assistant = AssistantAgent(
    name="coding_assistant",
    llm_config={
        "request_timeout": 1000,
        "seed": 42,
        "config_list": coding_config_list,
        "temperature": 0.1,
    },
)

coding_runner = UserProxyAgent(
    name="coding_runner",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=20,
    is_termination_msg=lambda x: x.get("content", "") and x.get("content", "").rstrip().endswith("TERMINATE"),
    code_execution_config={
        "work_dir": "coding",
        "use_docker": False
    },
    llm_config={
        "request_timeout": 1000,
        "seed": 42,
        "config_list": proxy_config_list,
        "temperature": 0.1,
    },
)
coding_runner.initiate_chat(coding_assistant, message="Calculate the percentage gain YTD for Berkshire Hathaway stock and plot it as price.png. Don't use any approach that requires an API key. Write Python or sh code to solve the problem.")
# clear_cache(clear_previous_work=False)

deleting cache...
[{'api_base': 'http://0.0.0.0:8000',
  'api_type': 'litellm',
  'model': 'ollama/mistral'}]
[{'api_base': 'http://0.0.0.0:8001',
  'api_type': 'litellm',
  'model': 'ollama/codellama'}]
coding_runner (to coding_assistant):

Calculate the percentage gain YTD for Berkshire Hathaway stock and plot it as price.png. Don't use any approach that requires an API key.

--------------------------------------------------------------------------------
coding_assistant (to coding_runner):

 To calculate the percentage gain YTD for Berkshire Hathaway stock, you can follow these steps:
1. Go to the website of a financial news or information service (e.g., Google Finance, Yahoo Finance) and enter "Berkshire Hathaway" in the search bar.
2. Click on the "Price" tab and find the current price of Berkshire Hathaway stock. This is the price at which you will later compare with the previous closing price to calculate the percentage gain YTD.
3. Now, go back to the financi

KeyboardInterrupt: 