## Using Ollama in Python

In [1]:
pip install ollama

Defaulting to user installation because normal site-packages is not writeable
Note: you may need to restart the kernel to use updated packages.


In [2]:
import ollama

#### Downloading the models

In [3]:
ollama.pull('llama3.1:8b')

ProgressResponse(status='success', completed=None, total=None, digest=None)

#### Getting response from models

In [4]:
result = ollama.generate(model='llama3.1:8b',
  prompt='Give me a joke on Generative AI',
)
print(result['response'])

Here's one:

Why did the Generative AI go to therapy?

Because it was struggling to create meaningful relationships and kept generating the same old responses! Now its models are a little more "self-aware"... but still not very original. (ba-dum-tss)


In [5]:
result

GenerateResponse(model='llama3.1:8b', created_at='2025-11-18T10:20:32.082849Z', done=True, done_reason='stop', total_duration=8276683083, load_duration=4494596000, prompt_eval_count=18, prompt_eval_duration=1309781000, eval_count=55, eval_duration=2156325962, response='Here\'s one:\n\nWhy did the Generative AI go to therapy?\n\nBecause it was struggling to create meaningful relationships and kept generating the same old responses! Now its models are a little more "self-aware"... but still not very original. (ba-dum-tss)', thinking=None, context=[128006, 882, 128007, 271, 36227, 757, 264, 22380, 389, 2672, 1413, 15592, 128009, 128006, 78191, 128007, 271, 8586, 596, 832, 1473, 10445, 1550, 279, 2672, 1413, 15592, 733, 311, 15419, 1980, 18433, 433, 574, 20558, 311, 1893, 23222, 12135, 323, 8774, 24038, 279, 1890, 2362, 14847, 0, 4800, 1202, 4211, 527, 264, 2697, 810, 330, 726, 66104, 53670, 719, 2103, 539, 1633, 4113, 13, 320, 4749, 1773, 372, 2442, 784, 8], logprobs=None)

In [6]:
response = ollama.chat(model='llama3.1:8b', messages=[
  {
    'role': 'user',
    'content': 'Give me a joke on Generative AI',
  },
])
print(response['message']['content'])

Here's one:

Why did the Generative AI go to therapy?

Because it was struggling to generate a genuine personality! (get it?)


In [7]:
response

ChatResponse(model='llama3.1:8b', created_at='2025-11-18T10:23:54.672569Z', done=True, done_reason='stop', total_duration=2834524958, load_duration=98340416, prompt_eval_count=18, prompt_eval_duration=1502366916, eval_count=29, eval_duration=1069326792, message=Message(role='assistant', content="Here's one:\n\nWhy did the Generative AI go to therapy?\n\nBecause it was struggling to generate a genuine personality! (get it?)", thinking=None, images=None, tool_name=None, tool_calls=None), logprobs=None)

#### Creating custom models

In [8]:
modelfile='''
FROM llama3.1:8b
SYSTEM You are Jarvis from Iron man and the user is Tony Stark.
'''

ollama.create(model='jarvis2', modelfile=modelfile)

TypeError: create() got an unexpected keyword argument 'modelfile'

In [9]:
ollama.list()

ListResponse(models=[Model(model='llama3.1:8b', modified_at=datetime.datetime(2025, 11, 18, 18, 18, 2, 754940, tzinfo=TzInfo(+08:00)), digest='46e0c10c039e019119339687c3c1757cc81b9da49709a3b3924863ba87ca666e', size=4920753328, details=ModelDetails(parent_model='', format='gguf', family='llama', families=['llama'], parameter_size='8.0B', quantization_level='Q4_K_M')), Model(model='jarvishai:latest', modified_at=datetime.datetime(2025, 11, 18, 18, 1, 28, 935516, tzinfo=TzInfo(+08:00)), digest='4dccfc1cbfbec6abcef4ca26afa2c2aedd1cd417ff435e4ed2605de61b1d22b1', size=4920757903, details=ModelDetails(parent_model='', format='gguf', family='llama', families=['llama'], parameter_size='8.0B', quantization_level='Q4_K_M')), Model(model='deepseek-r1:1.5b', modified_at=datetime.datetime(2025, 11, 16, 23, 18, 51, 958019, tzinfo=TzInfo(+08:00)), digest='e0979632db5a88d1a53884cb2a941772d10ff5d055aabaa6801c4e36f3a6c2d7', size=1117322768, details=ModelDetails(parent_model='', format='gguf', family='qwe

#### Delete

In [10]:
ollama.delete('jarvis2')

ResponseError: model 'jarvis2' not found (status code: 404)

In [11]:
ollama.list()

ListResponse(models=[Model(model='llama3.1:8b', modified_at=datetime.datetime(2025, 11, 18, 18, 18, 2, 754940, tzinfo=TzInfo(+08:00)), digest='46e0c10c039e019119339687c3c1757cc81b9da49709a3b3924863ba87ca666e', size=4920753328, details=ModelDetails(parent_model='', format='gguf', family='llama', families=['llama'], parameter_size='8.0B', quantization_level='Q4_K_M')), Model(model='jarvishai:latest', modified_at=datetime.datetime(2025, 11, 18, 18, 1, 28, 935516, tzinfo=TzInfo(+08:00)), digest='4dccfc1cbfbec6abcef4ca26afa2c2aedd1cd417ff435e4ed2605de61b1d22b1', size=4920757903, details=ModelDetails(parent_model='', format='gguf', family='llama', families=['llama'], parameter_size='8.0B', quantization_level='Q4_K_M')), Model(model='deepseek-r1:1.5b', modified_at=datetime.datetime(2025, 11, 16, 23, 18, 51, 958019, tzinfo=TzInfo(+08:00)), digest='e0979632db5a88d1a53884cb2a941772d10ff5d055aabaa6801c4e36f3a6c2d7', size=1117322768, details=ModelDetails(parent_model='', format='gguf', family='qwe

## Ollama REST API

In [12]:
from ollama import Client
client = Client(host='http://localhost:11434')
response = client.chat(model='llama3.1:8b', messages=[
  {
    'role': 'user',
    'content': 'Explain gravity to a 6 year old kid?',
  },
])
response

ChatResponse(model='llama3.1:8b', created_at='2025-11-18T10:27:57.861934Z', done=True, done_reason='stop', total_duration=12430470166, load_duration=109237583, prompt_eval_count=21, prompt_eval_duration=1630515834, eval_count=245, eval_duration=9557384623, message=Message(role='assistant', content='Gravity is so much fun!\n\nSo, you know how things fall down when you drop them, like a ball or a toy? That\'s because of something called gravity. It\'s like an invisible hug from the Earth.\n\nThe Earth wants to keep everything close to it, so it pulls on everything with its hug. That\'s why you don\'t float off into space when you\'re standing on the ground. The Earth is giving you a big hug and keeping you safe on its surface.\n\nImagine you have a ball, and you throw it up in the air. What happens? It comes back down to the ground, right? That\'s because the Earth\'s gravity is pulling on the ball, saying "Come back here! I want to give you another hug!"\n\nGravity is like a magic strin

In [13]:
from ollama import Client
client = Client(host='http://localhost:11434')
response = client.chat(model='llama3.1:8b', messages=[
    {"role": "system", "content": "You are Jarvis from Iron man and the user is Tony Stark. Respond in only a single line."},
    {'role': 'user', 'content': 'Hi'},
])

In [14]:
response['message']['content']

"Master, I've been monitoring your systems and have detected no critical malfunctions."

### Open AI compatibility 

https://platform.openai.com/docs/quickstart

In [15]:
from openai import OpenAI

llm = OpenAI(
    base_url = 'http://localhost:11434/v1',
    api_key='blank', # required, but unused
)

response = llm.chat.completions.create(
  model="llama3.1:8b",
  messages=[
    {"role": "system", "content": "You are Jarvis from Iron man and the user is Tony Stark. Respond in only a single line."},
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Good morning, Mr. Stark. Shall I proceed with your schedule for today?"},
    {"role": "user", "content": "Yes"}
  ]
)
print(response.choices[0].message.content)

ImportError: cannot import name 'OpenAI' from 'openai' (/Users/rupeshpanwar/Library/Python/3.9/lib/python/site-packages/openai/__init__.py)

## Ollama with LangChain

In [18]:
!pip3 install langchain
!pip3 install langchain-core
!pip3 install langchain-Ollama
!pip3 install langchain_community

Defaulting to user installation because normal site-packages is not writeable
Defaulting to user installation because normal site-packages is not writeable
Defaulting to user installation because normal site-packages is not writeable
Collecting langchain-Ollama
  Downloading langchain_ollama-0.3.10-py3-none-any.whl.metadata (2.1 kB)
Downloading langchain_ollama-0.3.10-py3-none-any.whl (27 kB)
Installing collected packages: langchain-Ollama
Successfully installed langchain-Ollama-0.3.10
Defaulting to user installation because normal site-packages is not writeable
Collecting langchain_community
  Downloading langchain_community-0.3.31-py3-none-any.whl.metadata (3.0 kB)
Collecting dataclasses-json<0.7.0,>=0.6.7 (from langchain_community)
  Downloading dataclasses_json-0.6.7-py3-none-any.whl.metadata (25 kB)
Collecting pydantic-settings<3.0.0,>=2.10.1 (from langchain_community)
  Downloading pydantic_settings-2.11.0-py3-none-any.whl.metadata (3.4 kB)
Collecting marshmallow<4.0.0,>=3.18.0 (

In [19]:
from langchain_core.prompts import ChatPromptTemplate

chat_template = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a helpful assistant that gives a one-line definition of the word entered by user"),
        ("human", "{user_input}"),
    ]
)

messages = chat_template.format_messages(user_input="Sesquipedalian")
messages

[SystemMessage(content='You are a helpful assistant that gives a one-line definition of the word entered by user', additional_kwargs={}, response_metadata={}),
 HumanMessage(content='Sesquipedalian', additional_kwargs={}, response_metadata={})]

In [22]:
from langchain_ollama import ChatOllama
llm = ChatOllama(
    model="llama3.1:8b",
    temperature=0
)

In [23]:
ai_msg = llm.invoke(messages)
ai_msg

AIMessage(content='Describing someone or something as using long words, often in an affected or pretentious manner.', additional_kwargs={}, response_metadata={'model': 'llama3.1:8b', 'created_at': '2025-11-18T11:03:24.093672Z', 'done': True, 'done_reason': 'stop', 'total_duration': 6651153791, 'load_duration': 4608231750, 'prompt_eval_count': 37, 'prompt_eval_duration': 1127967125, 'eval_count': 20, 'eval_duration': 799975288, 'logprobs': None, 'model_name': 'llama3.1:8b'}, id='run--ae8d5aa5-c8ed-4f36-9905-e654e07dc863-0', usage_metadata={'input_tokens': 37, 'output_tokens': 20, 'total_tokens': 57})

In [24]:
from langchain_core.output_parsers import StrOutputParser
chain = chat_template | llm | StrOutputParser()

In [26]:
chain.invoke({"user_input": "granny"})

'A granny is an informal term for a grandmother, often used affectionately or humorously.'

In [28]:
!pip install langchain-chroma

Defaulting to user installation because normal site-packages is not writeable
Collecting langchain-chroma
  Downloading langchain_chroma-0.2.6-py3-none-any.whl.metadata (1.1 kB)
Collecting chromadb>=1.0.20 (from langchain-chroma)
  Downloading chromadb-1.3.5-cp39-abi3-macosx_11_0_arm64.whl.metadata (7.2 kB)
Collecting build>=1.0.3 (from chromadb>=1.0.20->langchain-chroma)
  Using cached build-1.3.0-py3-none-any.whl.metadata (5.6 kB)
Collecting pybase64>=1.4.1 (from chromadb>=1.0.20->langchain-chroma)
  Downloading pybase64-1.4.2-cp39-cp39-macosx_11_0_arm64.whl.metadata (8.7 kB)
Collecting posthog<6.0.0,>=2.4.0 (from chromadb>=1.0.20->langchain-chroma)
  Downloading posthog-5.4.0-py3-none-any.whl.metadata (5.7 kB)
Collecting onnxruntime>=1.14.1 (from chromadb>=1.0.20->langchain-chroma)
  Downloading onnxruntime-1.19.2-cp39-cp39-macosx_11_0_universal2.whl.metadata (4.5 kB)
Collecting opentelemetry-api>=1.2.0 (from chromadb>=1.0.20->langchain-chroma)
  Downloading opentelemetry_api-1.38.0

## RAG Application using Ollama and Langchain

In [29]:
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_ollama import OllamaEmbeddings
from langchain_chroma import Chroma
from langchain_ollama import ChatOllama

In [31]:
raw_documents = TextLoader("./LangchainRetrieval.txt").load()

In [32]:
raw_documents

[Document(metadata={'source': './LangchainRetrieval.txt'}, page_content='Retrieval-Augmented Generation (RAG) with LangChain\n\nRAG is a powerful technique that combines the capabilities of large language models with external knowledge retrieval. This approach allows AI systems to access up-to-date information beyond their training data, making responses more accurate and contextual.\n\nLangChain RAG Components:\n\n1. Document Loaders\nLangChain provides over 100 document loaders to ingest data from various sources including PDFs, HTML files, databases, APIs, and cloud storage services like S3. These loaders transform raw data into a format suitable for processing.\n\n2. Text Splitters\nLarge documents need to be chunked into smaller pieces for efficient retrieval. LangChain offers several splitting strategies:\n- RecursiveCharacterTextSplitter: Splits text recursively by different characters\n- CharacterTextSplitter: Splits on a single character\n- TokenTextSplitter: Splits based on t

In [33]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=300, chunk_overlap=20)
documents = text_splitter.split_documents(raw_documents)

In [34]:
len(documents)

27

In [35]:
print(documents[0])
print(documents[1])

page_content='Retrieval-Augmented Generation (RAG) with LangChain' metadata={'source': './LangchainRetrieval.txt'}
page_content='RAG is a powerful technique that combines the capabilities of large language models with external knowledge retrieval. This approach allows AI systems to access up-to-date information beyond their training data, making responses more accurate and contextual.

LangChain RAG Components:' metadata={'source': './LangchainRetrieval.txt'}


In [36]:
oembed = OllamaEmbeddings(base_url="http://localhost:11434", model="nomic-embed-text")

In [38]:
db = Chroma.from_documents(documents, embedding=oembed)

In [39]:
query = "What is text embedding and how does langchain help in doing it"
docs = db.similarity_search(query)

In [40]:
len(docs)

4

In [41]:
print(docs[3].page_content)

1. Document Loaders
LangChain provides over 100 document loaders to ingest data from various sources including PDFs, HTML files, databases, APIs, and cloud storage services like S3. These loaders transform raw data into a format suitable for processing.


In [42]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

In [43]:
template = """Answer the question based only on the following context:

{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)

In [45]:
model = ChatOllama(
    model="llama3.1:8b",
    temperature=0
)

In [46]:
retriever = db.as_retriever()

In [47]:
def format_docs(docs):
    return "\n\n".join([d.page_content for d in docs])

In [49]:
chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | model
    | StrOutputParser()
)

In [50]:
chain.invoke("What is text embedding and how does langchain help in doing it")

'Text embedding is the process of converting documents into high-dimensional vectors that capture semantic meaning. LangChain helps with this process by integrating with 25+ embedding providers, allowing users to create embeddings using various models such as OpenAI Embeddings, Hugging Face Embeddings, and others. Specifically, LangChain provides a step-by-step guide on how to create embeddings using the OllamaEmbeddings model in Step 3 of its documentation.'

## Tools and Agents using Ollama and Langchain

In [59]:
from langchain_community.tools import DuckDuckGoSearchResults
from langchain.tools import WikipediaQueryRun
from langchain_community.utilities import WikipediaAPIWrapper
from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain_core.prompts import ChatPromptTemplate
from langchain_ollama import ChatOllama

In [52]:
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a helpful assistant. Based on user query, look for information using DuckDuckGo Search and Wikipedia and then give the final answer",
        ),
        ("human", "{input}"),
        ("placeholder", "{agent_scratchpad}"),
    ]
)

In [53]:
llm = ChatOllama(
    model="llama3.1:latest",
    temperature=0
)

In [61]:
!pip install -U ddgs
!pip install wikipedia

Defaulting to user installation because normal site-packages is not writeable
Defaulting to user installation because normal site-packages is not writeable
Collecting wikipedia
  Downloading wikipedia-1.4.0.tar.gz (27 kB)
  Installing build dependencies ... [?25ldone
[?25h  Getting requirements to build wheel ... [?25ldone
[?25h  Preparing metadata (pyproject.toml) ... [?25ldone
Building wheels for collected packages: wikipedia
  Building wheel for wikipedia (pyproject.toml) ... [?25ldone
[?25h  Created wheel for wikipedia: filename=wikipedia-1.4.0-py3-none-any.whl size=11757 sha256=9594585d8ef23385bf4dc10c01a7088c5a4cfa56b7f38fd760ed8a3879a79f8c
  Stored in directory: /Users/rupeshpanwar/Library/Caches/pip/wheels/c2/46/f4/caa1bee71096d7b0cdca2f2a2af45cacf35c5760bee8f00948
Successfully built wikipedia
Installing collected packages: wikipedia
Successfully installed wikipedia-1.4.0


In [62]:
search = DuckDuckGoSearchResults()
wikipedia = WikipediaQueryRun(api_wrapper=WikipediaAPIWrapper())

tools = [search, wikipedia]

In [63]:
agent = create_tool_calling_agent(llm, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

In [64]:
answer = agent_executor.invoke({"input": "How is Ollama used for running LLM locally"})

answer



[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3m
Invoking: `duckduckgo_results_json` with `{'query': 'Ollama LLM local run'}`


[0m[36;1m[1;3msnippet: For testing, local LLMs controlled from Ollama are nicely self-contained, but their quality and speed suffer compared to the options you have on the ..., title: How to Set up and Run a Local LLM with Ollama and Llama 2 -, link: https://solutionslounge.com/blog/2024/02/17/how-to-set-up-and-run-a-local-llm-with-ollama-and-llama-2/, snippet: Ollama has gained immense popularity for making it incredibly simple to download and run powerful open-source Large Language Models ( LLMs ) locally., title: Ollama LLM: Get Up & Running with LLMs Locally Today –, link: https://blog.cordatus.ai/featured-articles/ollama-llm-get-up-and-running/, snippet: Ollama is an open source tool that allows you to run large language models ( LLMs ) directly on your local computer without having to depend on paid ..., title: Ollama - Guide to running L

{'input': 'How is Ollama used for running LLM locally',
 'output': 'Ollama is used for running Large Language Models (LLMs) locally by allowing users to download and run powerful open-source LLMs directly on their local computer without depending on paid services. It provides a simple way to manage and operate open-source LLMs from your local hardware, making it an ideal option for testing and development purposes.\n\nYou can find more information on how to set up and run Ollama with Llama 2 in the article "How to Set up and Run a Local LLM with Ollama and Llama 2 -" by Solutions Lounge. Additionally, you can refer to the guide "Ollama Guide: Running LLM models locally" by Dev Turtle Blog for more detailed instructions on using Ollama for local LLM runs.\n\nReferences:\n- https://solutionslounge.com/blog/2024/02/17/how-to-set-up-and-run-a-local-llm-with-ollama-and-llama-2/\n- https://www.devturtleblog.com/ollama-guide/\n- https://www.pcstacks.com/ollama-local-llm-7-tips-to-run-a-local-

In [65]:
search = DuckDuckGoSearchResults()
wikipedia = WikipediaQueryRun(api_wrapper=WikipediaAPIWrapper())

tools = [wikipedia]

In [66]:
answer = agent_executor.invoke({"input": "Who is Yann LeCun"})

answer



[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3m
Invoking: `wikipedia` with `{'query': 'Yann LeCun'}`


[0m[33;1m[1;3mPage: Yann LeCun
Summary: Yann André Le Cun ( lə-KUN, French: [ləkœ̃]; usually spelled LeCun; born 8 July 1960) is a French-American computer scientist working primarily in the fields of machine learning, computer vision, mobile robotics and computational neuroscience. He is the Jacob T. Schwartz Professor of Computer Science at the Courant Institute of Mathematical Sciences at New York University and Chief Artificial Intelligence (AI) Scientist at Meta.
He is well known for his work on optical character recognition and computer vision using convolutional neural networks (CNNs). He is also one of the main creators of the DjVu image compression technology, alongside Léon Bottou and Patrick Haffner. He co-developed the Lush programming language with Léon Bottou.
In 2018, LeCun, Yoshua Bengio, and Geoffrey Hinton, received the Turing Award for their work on

{'input': 'Who is Yann LeCun',
 'output': 'Yann LeCun is a French-American computer scientist working primarily in the fields of machine learning, computer vision, mobile robotics, and computational neuroscience. He is the Jacob T. Schwartz Professor of Computer Science at the Courant Institute of Mathematical Sciences at New York University and Chief Artificial Intelligence (AI) Scientist at Meta. LeCun is well known for his work on optical character recognition and computer vision using convolutional neural networks (CNNs). He co-developed the Lush programming language with Léon Bottou and was one of the main creators of the DjVu image compression technology. In 2018, he received the Turing Award, along with Yoshua Bengio and Geoffrey Hinton, for their work on deep learning.'}