# Ollama - pure, Langchain, litellm, LMStudio

Examples, how-tos

- call via Python API: OpenAI, local models (Ollama, LM Studio)
- call via REST: OpenAI, local models (Ollama)

see
- https://litellm.vercel.app/docs
- https://litellm.vercel.app/docs/tutorials/first_playground

In [None]:
#!uv pip install jupyterlab==4.2.1 langgraph==0.1.15 langchain-ollama==0.1.1 langchain-openai==0.1.19 langsmith==0.1.93 langchainhub==0.1.20 litellm 'litellm[proxy]'

In [None]:
#!uv pip freeze > requirements.txt 

In [1]:
# Warning control
import warnings
#warnings.simplefilter("ignore", category=Warning)
warnings.filterwarnings("ignore")

In [10]:
from IPython.display import Image, Markdown, display
import json
import os
import shutil
import psutil
import re
import requests
from getpass import getpass
import openai
from os import environ

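IN_NOTEBOOK = any("jupyter-lab" in arg for arg in psutil.Process().parent().cmdline())
if IN_NOTEBOOK:
    # Prompt for the secrets and cache them in the environment for later use
    CREDS = json.loads(getpass("Secrets (JSON string): "))
    os.environ['CREDS'] = json.dumps(CREDS)
else:
    # Outside JupyterLab, expect the secrets in the CREDS environment variable
    CREDS = json.loads(os.getenv('CREDS', '{}'))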

if environ.get('OPENAI_API_KEY') is None:
    print('Environment variable not found. Setting values...')
    os.environ["OPENAI_API_KEY"] = CREDS['OpenAI']['v1']['credential'] 
    os.environ["TOGETHERAI_API_KEY"] = CREDS['together-ai']['key']['credential']
    os.environ['ANTHROPIC_API_KEY'] = CREDS['anthropic']['key']['credential']
    os.environ["SERPER_API_KEY"] = CREDS['serper_dev']['key']['credential']
    #openai.api_key = CREDS['OpenAI']['v2']['credential'] 

Secrets (JSON string):  ········
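
The cell above expects the secrets as a nested JSON object. The following is a minimal sketch of that shape, inferred from the `CREDS[...]` lookups in the cell, together with a hypothetical helper `export_keys` that copies each credential into the environment only when the variable is not already set. All credential values are placeholders, and `EXAMPLE_SECRETS`, `KEY_PATHS` and `demo_env` are names introduced for illustration.

```python
import json
import os

# Shape inferred from the CREDS[...] lookups above; the values are placeholders.
EXAMPLE_SECRETS = """
{
  "OpenAI":      {"v1":  {"credential": "sk-placeholder"}},
  "together-ai": {"key": {"credential": "tg-placeholder"}},
  "anthropic":   {"key": {"credential": "an-placeholder"}},
  "serper_dev":  {"key": {"credential": "sp-placeholder"}}
}
"""

# Environment variable -> path into the secrets object.
KEY_PATHS = {
    "OPENAI_API_KEY": ("OpenAI", "v1", "credential"),
    "TOGETHERAI_API_KEY": ("together-ai", "key", "credential"),
    "ANTHROPIC_API_KEY": ("anthropic", "key", "credential"),
    "SERPER_API_KEY": ("serper_dev", "key", "credential"),
}

def export_keys(creds, env=None):
    """Set each variable that is missing from env; return the names that were set."""
    env = os.environ if env is None else env
    newly_set = []
    for var, path in KEY_PATHS.items():
        if env.get(var):
            continue  # keep a value that was exported before the notebook started
        value = creds
        for part in path:
            value = value[part]  # walk down the nested object
        env[var] = value
        newly_set.append(var)
    return newly_set

# Demo on a plain dict so the real environment is left untouched.
demo_env = {"OPENAI_API_KEY": "sk-already-set"}
print(export_keys(json.loads(EXAMPLE_SECRETS), env=demo_env))
```

Unlike the cell above, this sketch checks every variable separately, so a key that is already exported is kept while the missing ones are filled in.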


In [None]:
# Get list of available models
client = openai.OpenAI()
model_list = client.models.list()
for model in model_list:
  print(model.id)

# Ollama via Langchain

- Simple test for Ollama: use the local Llama 3.1 model
- The LLM is created by the factory `get_llm`, so it can easily be switched between Ollama and OpenAI

In [12]:
from langchain_ollama import ChatOllama
from langchain_ollama import OllamaEmbeddings
from langchain_openai import ChatOpenAI
from langchain_openai import OpenAIEmbeddings

def get_llm(llm_type):
    if llm_type == "ollama":
        return ChatOllama(model="llama3.1", temperature=0)
    else:
        return ChatOpenAI(temperature=0, model="gpt-4o-mini")

def get_embeddings(embedding_type):
    if embedding_type == "ollama":
        return OllamaEmbeddings(model="llama3.1:8b")
    else:
        return OpenAIEmbeddings()

In [13]:
llm = get_llm('ollama')
result = llm.invoke(
    "Question: What do dogs like to eat?\n"  # newline keeps the two adjacent literals from running together
    "Answer: Let's think step by step."
)
print(result.content)
print(result)

Let's break down what dogs like to eat, step by step.

**Step 1: Protein is a Must**
Dogs are carnivores, which means they primarily thrive on protein-rich foods. Their natural diet consists of meat, bones, and organs from animals. So, we can conclude that dogs love to eat protein-rich foods.

**Step 2: Meat and Poultry**
Within the category of protein-rich foods, dogs tend to enjoy meat and poultry products like chicken, beef, lamb, and pork. These are common ingredients in dog food and treats.

**Step 3: Variety is Key**
While dogs have a strong preference for meat-based foods, they also appreciate variety in their diet. This can include fruits, vegetables, and grains, but these should not be the primary components of their meals.

**Step 4: Avoid Human Foods (Mostly)**
While it's tempting to share our snacks with our furry friends, many human foods are toxic or unhealthy for dogs. For example, chocolate, onions, garlic, grapes, and raisins can be poisonous to dogs. It's best to stic

## Function calling

In [14]:
from typing import List

def validate_user(user_id: int, addresses: List) -> bool:
    """Validate user using historical addresses.

    Args:
        user_id: (int) the user ID.
        addresses: Previous addresses.
    """
    return True

llm = get_llm('ollama')
llm_with_tools = llm.bind_tools([validate_user])

result = llm_with_tools.invoke(
    "Could you validate user 123? They previously lived at "
    "123 Fake St in Boston MA and 234 Pretend Boulevard in "
    "Houston TX."
)
print(result.tool_calls)
#print(result.content)
#print(result)

[{'name': 'validate_user', 'args': {'addresses': '[{"street":"Fake St","city":"Boston","state":"MA"},{"street":"Pretend Boulevard","city":"Houston","state":"TX"}]', 'user_id': 123}, 'id': 'f98e1893-c9f0-4eb7-9676-3c46ad5737e0', 'type': 'tool_call'}]
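
In the output above, the model passed `addresses` as a JSON-encoded string rather than as a list. The following is a minimal sketch of how such a tool call could be dispatched to the matching Python function, decoding string arguments that contain JSON first. `TOOLS` and `run_tool_call` are hypothetical names, `validate_user` is a stub with the same signature as the tool above, and the `tool_call` dict is copied from the output.

```python
import json
from typing import List

def validate_user(user_id: int, addresses: List) -> bool:
    """Stub with the same signature as the tool defined above."""
    return True

# Hypothetical registry: tool name -> Python callable.
TOOLS = {"validate_user": validate_user}

def run_tool_call(tool_call: dict):
    """Call the requested tool; return its result and the decoded arguments."""
    func = TOOLS[tool_call["name"]]
    args = dict(tool_call["args"])  # copy so the original call is not modified
    for key, value in list(args.items()):
        # Some models return list/dict arguments as JSON text; decode those.
        if isinstance(value, str) and value.lstrip().startswith(("[", "{")):
            try:
                args[key] = json.loads(value)
            except json.JSONDecodeError:
                pass  # keep the raw string if it is not valid JSON
    return func(**args), args

# Copied from the tool_calls output above.
tool_call = {
    "name": "validate_user",
    "args": {
        "addresses": '[{"street":"Fake St","city":"Boston","state":"MA"},'
                     '{"street":"Pretend Boulevard","city":"Houston","state":"TX"}]',
        "user_id": 123,
    },
    "id": "f98e1893-c9f0-4eb7-9676-3c46ad5737e0",
    "type": "tool_call",
}

result, decoded_args = run_tool_call(tool_call)
print(result, decoded_args["addresses"][0]["city"])
```

In the notebook, the same helper could be applied to each entry of `result.tool_calls`.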


# LiteLLM

In [6]:
from litellm import completion
import openai

# Call OpenAI
Requirement: OpenAI API key available (see the credentials cell above)

In [7]:
response = completion(
  model="gpt-4o-mini",
  messages=[{ "content": "Hello, how are you?","role": "user"}]
)
print(response.choices[0].message.content)
print(response)

Hello! I'm just a computer program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?
ModelResponse(id='chatcmpl-9s8bvZBMevwfAnV4UbXXtS3eX56qW', choices=[Choices(finish_reason='stop', index=0, message=Message(content="Hello! I'm just a computer program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?", role='assistant', tool_calls=None, function_call=None))], created=1722689839, model='gpt-4o-mini-2024-07-18', object='chat.completion', system_fingerprint='fp_0f03d4f0ee', usage=Usage(completion_tokens=30, prompt_tokens=13, total_tokens=43), service_tier=None)


# Call Ollama
Requirement: Ollama running

In [8]:
response = completion(
  # api_base="http://localhost:11434",   # optional: works without it when Ollama runs on its default local address
  model="ollama/llama3.1:8b", 
  messages=[{ "content": "Hello, how are you?","role": "user"}]
)
print(response.choices[0].message.content)
print(response)

I'm just a computer program, so I don't have feelings or emotions like humans do. However, I'm functioning properly and ready to help with any questions or tasks you may have! How about you? How's your day going?
ModelResponse(id='chatcmpl-0b92b0e6-8ddb-4324-9b62-1770872b411b', choices=[Choices(finish_reason='stop', index=0, message=Message(content="I'm just a computer program, so I don't have feelings or emotions like humans do. However, I'm functioning properly and ready to help with any questions or tasks you may have! How about you? How's your day going?", role='assistant', tool_calls=None, function_call=None))], created=1722689891, model='ollama/llama3.1:8b', object='chat.completion', system_fingerprint=None, usage=Usage(prompt_tokens=19, completion_tokens=49, total_tokens=68))


# Call LM Studio

Requirement: LM Studio and server running

In [None]:
# see https://litellm.vercel.app/docs/providers/openai_compatible
response = completion(
  api_base="http://localhost:1234/v1",
  model="openai/just-a-dummy-model", # prefix 'openai/' is required! model name is currently unused in LM Studio
  messages=[{ "content": "Hello, how are you?","role": "user"}]
)
print(response.choices[0].message.content)
print(response)
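
The three LiteLLM calls above differ only in the `model` string and, for LM Studio, the `api_base`. The following is a minimal sketch that collects those differences in one table. `BACKENDS` and `completion_kwargs` are hypothetical names, and the values are copied from the cells above. The returned dict is meant to be passed on as `completion(**kwargs)`; that call is left as a comment because it needs LiteLLM and a running backend.

```python
# Per-backend arguments, copied from the cells above.
BACKENDS = {
    "openai": {"model": "gpt-4o-mini"},
    "ollama": {"model": "ollama/llama3.1:8b"},
    "lmstudio": {
        "model": "openai/just-a-dummy-model",  # 'openai/' prefix is required
        "api_base": "http://localhost:1234/v1",
    },
}

def completion_kwargs(backend: str, prompt: str) -> dict:
    """Build the keyword arguments for one backend and one user prompt."""
    if backend not in BACKENDS:
        raise ValueError(f"unknown backend {backend!r}; expected one of {sorted(BACKENDS)}")
    kwargs = dict(BACKENDS[backend])  # copy so the table is not mutated
    kwargs["messages"] = [{"role": "user", "content": prompt}]
    return kwargs

kwargs = completion_kwargs("lmstudio", "Hello, how are you?")
print(kwargs["model"], kwargs["api_base"])

# With LiteLLM installed and the backend running:
# from litellm import completion
# response = completion(**kwargs)
# print(response.choices[0].message.content)
```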

# Call Ollama via litellm Proxy

**The proxy routes all requests (e.g. those addressed to ChatGPT or Anthropic) to Ollama!**

Requirement: Type in Terminal:
- `source myenv/bin/activate` (only if you use a virtualenv)
- `litellm --model ollama/mixtral:8x7b`

## Step 1: Use OpenAI API and route to proxy (Mixtral)

In [15]:
client = openai.OpenAI(base_url="http://0.0.0.0:4000", api_key="anything")
#client = openai.OpenAI()
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages = [{"role": "user", "content": "this is a test request, write a short poem"}]
)
print(response)

ChatCompletion(id='chatcmpl-f0477143-332e-43e4-bbac-060779405cd6', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content=" In the heart of the night, under stars' gentle light,\nA whispering breeze sings a lullaby so bright.\nThe moon in the sky, with its glow so pure,\nWatches over the world, in tranquility's allure.\n\nIn this quiet moment, let your worries cease,\nLet the beauty of nature bring you peace.\nTake a deep breath, and just be,\nFor life is but a fleeting decree.", role='assistant', function_call=None, tool_calls=None))], created=1711466425, model='ollama/mixtral:8x7b', object='chat.completion', system_fingerprint=None, usage=CompletionUsage(completion_tokens=102, prompt_tokens=20, total_tokens=122))


## Step 2: Use OpenAI API without proxy (same code, but no base_url)

In [11]:
#client = openai.OpenAI(base_url="http://0.0.0.0:4000", api_key="anything")
client = openai.OpenAI()
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages = [{"role": "user", "content": "this is a test request, write a short poem"}]
)
print(response)

ChatCompletion(id='chatcmpl-972lbLZCHmYH0sDV6x49jeexxba07', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content="In the quiet of the night,\nStars glisten with soft light.\nWhispers of the wind so light,\nSing a song of pure delight.\n\nNature's beauty all around,\nEvery sight and every sound.\nIn this moment, we are bound,\nTo the wonders we have found.\n\nSo let us pause and take it in,\nThis world of magic, free from sin.\nFor in these moments, we begin,\nTo see the beauty deep within.", role='assistant', function_call=None, tool_calls=None))], created=1711465959, model='gpt-3.5-turbo-0125', object='chat.completion', system_fingerprint='fp_3bc1b5746c', usage=CompletionUsage(completion_tokens=89, prompt_tokens=17, total_tokens=106))


# Use local litellm-Server to provide all LLMs via one URL

**Important to know: the REST call and the structure of the response JSON are identical for OpenAI and litellm!**

## Reference: OpenAI REST Call and Result-JSON

### CURL via Terminal:
```
curl --location 'https://api.openai.com/v1/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer sk-...YOUR_KEY_HERE' \
--data ' {
      "model": "gpt-3.5-turbo",
      "messages": [
        {
          "role": "user",
          "content": "what llm are you"
        }
      ]
    }
'
```

### Response:
```
{
  "id": "chatcmpl-8RHu3ncira10vxZVRjmVTDZv2qZxM",
  "object": "chat.completion",
  "created": 1701514367,
  "model": "gpt-3.5-turbo-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "I am an AI language model developed by OpenAI, with capabilities to assist with various tasks and provide information across different domains."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 25,
    "total_tokens": 37
  },
  "system_fingerprint": null
}
```

## For comparison: litellm Proxy REST Call and Result-JSON

### CURL via Terminal:
```
curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--data ' {
      "model": "gpt-3.5-turbo",
      "messages": [
        {
          "role": "user",
          "content": "what llm are you"
        }
      ]
    }
'
```

### Response:
```
{
   "id":"chatcmpl-aa9d6a1f-71bf-42fd-abad-2c2f713fd09a",
   "choices":[
      {
         "finish_reason":null,
         "index":0,
         "message":{
            "content":"I am not a physical entity, but rather an artificial intelligence language model (LLM) created by a computer program. My main function is to process and respond to human input in natural language text, based on the patterns and knowledge that I have been trained on during my creation. I do not have consciousness or feelings, and I do not exist beyond the digital realm in which I operate. However, I can understand and provide helpful responses to a wide range of queries and requests, just like any other human communication partner.",
            "role":"assistant"
         }
      }
   ],
   "created":1701513387,
   "model":"ollama/zephyr",
   "object":"chat.completion",
   "system_fingerprint":null,
   "usage":{
      "prompt_tokens":5,
      "completion_tokens":104,
      "total_tokens":109
   }
}

```
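
Because both responses share one structure, a single helper can read either of them. The following is a minimal sketch that uses abbreviated copies of the two responses above (the message content is shortened); `summarize` is a hypothetical name.

```python
def summarize(response: dict) -> dict:
    """Pull the fields that both backends provide out of a chat completion response."""
    choice = response["choices"][0]
    return {
        "model": response["model"],
        "role": choice["message"]["role"],
        "content": choice["message"]["content"],
        "finish_reason": choice.get("finish_reason"),
        "total_tokens": response["usage"]["total_tokens"],
    }

# Abbreviated copy of the OpenAI response above.
openai_response = {
    "id": "chatcmpl-8RHu3ncira10vxZVRjmVTDZv2qZxM",
    "object": "chat.completion",
    "created": 1701514367,
    "model": "gpt-3.5-turbo-0613",
    "choices": [{
        "index": 0,
        "message": {"role": "assistant", "content": "I am an AI language model ..."},
        "finish_reason": "stop",
    }],
    "usage": {"prompt_tokens": 12, "completion_tokens": 25, "total_tokens": 37},
    "system_fingerprint": None,
}

# Abbreviated copy of the litellm proxy response above.
proxy_response = {
    "id": "chatcmpl-aa9d6a1f-71bf-42fd-abad-2c2f713fd09a",
    "choices": [{
        "finish_reason": None,
        "index": 0,
        "message": {"content": "I am not a physical entity ...", "role": "assistant"},
    }],
    "created": 1701513387,
    "model": "ollama/zephyr",
    "object": "chat.completion",
    "system_fingerprint": None,
    "usage": {"prompt_tokens": 5, "completion_tokens": 104, "total_tokens": 109},
}

for response in (openai_response, proxy_response):
    print(summarize(response))
```

The keys are the same in both cases; only the key order and some values differ, for example `finish_reason`, which is `null` in the Ollama response shown above.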


## Step 1: Reference: OpenAI REST Call (direct, without litellm)

In [11]:
url = "https://api.openai.com/v1/chat/completions"
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer " + CREDS['OpenAI']['v1']['credential']
}
data = {
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Hello, how are you?"}]
}
response = requests.post(url, headers=headers, json=data)
response_data = response.json()
output = response_data['choices'][0]['message']['content']
print(response_data)

{'id': 'chatcmpl-9s8e990saIn4hbBERovaIUj3e3a5y', 'object': 'chat.completion', 'created': 1722689977, 'model': 'gpt-3.5-turbo-0125', 'choices': [{'index': 0, 'message': {'role': 'assistant', 'content': "Hello! I'm just a computer program, so I don't have feelings, but I'm here to help you. How can I assist you today?"}, 'logprobs': None, 'finish_reason': 'stop'}], 'usage': {'prompt_tokens': 13, 'completion_tokens': 31, 'total_tokens': 44}, 'system_fingerprint': None}


## Step 2: OpenAI REST Call via local litellm-Server

see https://litellm.vercel.app/docs/tutorials/first_playground

Requirement: Type in Terminal: `python3 lite_llm_playground_server.py`

Use the following cell as source for 'lite_llm_playground_server.py'

In [None]:
%%writefile lite_llm_playground_server.py

#
# Local litellm server
#
# save as 'lite_llm_playground_server.py'
#
# run in terminal with 'python3 lite_llm_playground_server.py'
#
# Dependencies:
# pip install flask waitress

import os
import json
from flask import Flask, jsonify, request
from litellm import completion_with_retries

## set ENV variables
os.environ["OPENAI_API_KEY"] = "sk-...-YOUR-KEY"

app = Flask(__name__)

# Example route
@app.route('/', methods=['GET'])
def hello():
    return jsonify(message="Hello, Flask!")

@app.route('/chat/completions', methods=["POST"])
def api_completion():
    data = request.json
    data["max_tokens"] = 256 # By default let's set max_tokens to 256
    try:
        # COMPLETION CALL
        response = completion_with_retries(**data)
    except Exception as e:
        # Log the error and report it to the caller. Without this early return
        # the function would fail on the undefined responseJSON below.
        print(e)
        return jsonify(error=str(e)), 500

    # Rebuild the OpenAI-style response structure
    responseJSON = {
        "id": response["id"], # response.id
        "choices": [
            {
                "index": response.choices[0].index,
                "finish_reason": response.choices[0].finish_reason,
                "message" : {
                    "content" : response.choices[0].message.content,
                    "role" : response.choices[0].message.role
                }
            }
        ],
        "created": response["created"], # response.created
        "model": response.model,
        "object": response.object,
        "system_fingerprint": response.system_fingerprint,
        "usage": {
            "prompt_tokens" : response["usage"]["prompt_tokens"], #response.usage.prompt_tokens
            "completion_tokens" : response["usage"]["completion_tokens"], # response.usage.completion_tokens
            "total_tokens" : response["usage"]["total_tokens"] # response.usage.total_tokens
        }
    }
    return responseJSON

if __name__ == '__main__':
    from waitress import serve
    serve(app, host="0.0.0.0", port=4000, threads=500)


Overwriting lite_llm_playground_server.py


## Call server

In [None]:
url = "http://localhost:4000/chat/completions" ## COMPLETION CALL -- assumes your server is running on port 4000
headers = {
    "Content-Type": "application/json"
}
data = {
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Hello, how are you?"}]
}
response = requests.post(url, headers=headers, json=data)
response_data = response.json()
output = response_data['choices'][0]['message']['content']
print(response_data)

{'choices': [{'finish_reason': 'stop', 'index': 0, 'message': {'content': "Hello! I'm an AI language model, so I don't have feelings, but I'm here to assist you. How can I help you today?", 'role': 'assistant'}}], 'created': 1701518377, 'id': 'chatcmpl-8RIwkICMDesOHslMZaHeh2i0AbPZu', 'model': 'gpt-3.5-turbo-0613', 'object': 'chat.completion', 'system_fingerprint': None, 'usage': {'completion_tokens': 31, 'prompt_tokens': 13, 'total_tokens': 44}}


## Step 3: Ollama/Zephyr REST Call via local litellm-Server

Requirement: Type in Terminal: `python3 lite_llm_playground_server.py`

In [None]:
url = "http://localhost:4000/chat/completions" ## COMPLETION CALL -- assumes your server is running on port 4000
headers = {
    "Content-Type": "application/json"
}
data = {
    "model": "ollama/zephyr",
    # "api_base": "http://localhost:11434",   # optional: works without it when Ollama runs on its default local address
    "messages": [{"role": "user", "content": "Hello, how are you?"}]
}
response = requests.post(url, headers=headers, json=data)
response_data = response.json()
output = response_data['choices'][0]['message']['content']
print(response_data)

{'choices': [{'finish_reason': None, 'index': 0, 'message': {'content': 'I do not have the ability to feel emotions or sensations like a human does. However, I am always happy to assist you with your requests and inquiries! please let me know how I can be of service to you today.', 'role': 'assistant'}}], 'created': 1701518586, 'id': 'chatcmpl-371f8675-1f6f-49ca-86d2-3d765cd4d60c', 'model': 'ollama/zephyr', 'object': 'chat.completion', 'system_fingerprint': None, 'usage': {'completion_tokens': 45, 'prompt_tokens': 6, 'total_tokens': 51}}
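
The three REST cells above differ only in the URL, the model name, and whether an `Authorization` header is sent. The following is a minimal sketch that builds the arguments for `requests.post` in one place and adds a timeout. `chat_request` is a hypothetical helper, and the 60-second timeout is an assumed default, not a value from the cells above. The actual request is left as a comment because it needs a running server.

```python
from typing import Optional

def chat_request(url: str, model: str, prompt: str,
                 api_key: Optional[str] = None, timeout: float = 60.0) -> dict:
    """Return keyword arguments for requests.post for one chat completion call."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Only the direct OpenAI call above sends a bearer token.
        headers["Authorization"] = "Bearer " + api_key
    return {
        "url": url,
        "headers": headers,
        "json": {"model": model, "messages": [{"role": "user", "content": prompt}]},
        "timeout": timeout,
    }

local = chat_request("http://localhost:4000/chat/completions",
                     "ollama/zephyr", "Hello, how are you?")
print(local["json"]["model"], sorted(local["headers"]))

# With the local server running:
# import requests
# response = requests.post(**local)
# response.raise_for_status()  # surface HTTP errors instead of a KeyError on 'choices'
# print(response.json()["choices"][0]["message"]["content"])
```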
