<a href="https://colab.research.google.com/github/SahDavies/commons/blob/main/Ollama_Setup.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Run Ollama in Colab
---

[![5aharsh/collama](https://raw.githubusercontent.com/5aharsh/collama/main/assets/banner.png)](https://github.com/5aharsh/collama)

This example notebook demonstrates how to run Ollama inside a Colab instance. With it you can run most of the small to medium-sized models offered by Ollama for free.

For the list of available models, see the [Ollama model library](https://ollama.com/library).


## Before you proceed
---

Colab runtimes are CPU-based by default. To use LLMs, change your runtime type to T4 GPU (or better, if you're a paid Colab user) by going to **Runtime > Change runtime type**.

While your code is running, be mindful of the resources you're using. You can track them at **Runtime > View resources**.
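
As a quick sanity check before going further, the runtime can be asked whether a GPU is actually visible. The sketch below is not part of the original notebook; the helper name `gpu_summary` is hypothetical, and it assumes that `nvidia-smi` is on the `PATH` whenever an NVIDIA GPU runtime is attached.

```python
import shutil
import subprocess


def gpu_summary():
    """Return one line per visible NVIDIA GPU, or None if none can be queried."""
    # Assumption: nvidia-smi is only present on runtimes with an NVIDIA driver.
    if shutil.which("nvidia-smi") is None:
        return None
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
            capture_output=True, text=True, check=True, timeout=30,
        ).stdout
    except (subprocess.SubprocessError, OSError):
        # The tool exists but could not be run or reported an error.
        return None
    return [line.strip() for line in out.splitlines() if line.strip()]


gpus = gpu_summary()
print(gpus if gpus else "No GPU detected; switch the runtime type to a GPU.")
```

If the message about a missing GPU appears, change the runtime type as described above and run the cell again.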

## Running the notebook
---

After configuring the runtime, run the notebook with **Runtime > Run all** and start tinkering. This example uses [Llama 3.2](https://ollama.com/library/llama3.2) to generate a response to a prompted question using the [LangChain Ollama Integration](https://python.langchain.com/docs/integrations/chat/ollama/).

## Installing Dependencies
---

1. `pciutils` is required by Ollama to detect the GPU type.
2. Ollama itself is installed in the runtime instance by `curl -fsSL https://ollama.com/install.sh | sh`.




In [1]:
!sudo apt update
!sudo apt install -y pciutils
!curl -fsSL https://ollama.com/install.sh | sh

[33m0% [Working][0m            Hit:1 https://cli.github.com/packages stable InRelease
[33m0% [Connecting to archive.ubuntu.com] [Connecting to security.ubuntu.com (185.1[0m                                                                               Hit:2 https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/ InRelease
Hit:3 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  InRelease
Get:4 http://security.ubuntu.com/ubuntu jammy-security InRelease [129 kB]
Hit:5 https://r2u.stat.illinois.edu/ubuntu jammy InRelease
Hit:6 http://archive.ubuntu.com/ubuntu jammy InRelease
Get:7 http://archive.ubuntu.com/ubuntu jammy-updates InRelease [128 kB]
Get:8 http://archive.ubuntu.com/ubuntu jammy-backports InRelease [127 kB]
Hit:9 https://ppa.launchpadcontent.net/deadsnakes/ppa/ubuntu jammy InRelease
Hit:10 https://ppa.launchpadcontent.net/graphics-drivers/ppa/ubuntu jammy InRelease
Hit:11 https://ppa.launchpadcontent.net/ubuntugis/ppa/ubuntu jammy InRelease
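
Once the install cell has finished, it can be useful to confirm that both tools ended up on the `PATH`. This is a minimal sketch rather than part of the original notebook; `missing_tools` is a hypothetical helper, and it assumes `lspci` is the command provided by `pciutils`.

```python
import shutil


def missing_tools(names):
    """Return the subset of command names that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]


# lspci comes from pciutils; ollama is installed by the install script.
missing = missing_tools(["lspci", "ollama"])
print("Missing tools:", missing or "none")
```

An empty result means both commands are available; anything listed should be reinstalled before continuing.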


## Running Ollama
---

To use Ollama, its server must run in the background alongside your code. Because Jupyter notebooks run cells one at a time, a cell that keeps a server running in the foreground would block every cell after it. As a workaround, we start the server from Python using `subprocess`, so it doesn't block any cell from running.

The server is started with the command `ollama serve`.

`time.sleep(5)` adds a short delay so the Ollama service has time to come up before the model is downloaded.

In [2]:
import threading
import subprocess
import time

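def run_ollama_serve():
  # Popen starts the server as a child process and returns immediately.
  subprocess.Popen(["ollama", "serve"])

# Launch the server from a helper thread so this cell is never blocked.
thread = threading.Thread(target=run_ollama_serve)
thread.start()
# Give the server a few seconds to start listening before it is used.
time.sleep(5)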
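
A fixed five-second delay can be too short on a slow runtime and longer than necessary on a fast one. As an alternative, the sketch below polls the server until it answers. It is not part of the original notebook: `wait_for_server` is a hypothetical helper, and it assumes Ollama is listening on its default address, `http://127.0.0.1:11434`.

```python
import time
import urllib.error
import urllib.request


def wait_for_server(url="http://127.0.0.1:11434", timeout=30.0, interval=0.5):
    """Poll url until it answers with HTTP 200 or timeout seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # the server is not accepting connections yet
        time.sleep(interval)
    return False
```

In the cell above, `time.sleep(5)` could then be replaced by a call such as `wait_for_server()`, which returns `False` if the server never becomes reachable within the timeout.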

## Pulling Model
---

Download the LLM model using `ollama pull llama3.2`.

For other models, see the [Ollama library](https://ollama.com/library).

In [3]:
!ollama pull llama3.2
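
To confirm that the pull succeeded, the running server can be asked which models it holds locally through its `/api/tags` endpoint. The following is a sketch under stated assumptions, not part of the original notebook: the helper names are hypothetical, the server is assumed to be on the default address, and names are assumed to be reported with a tag suffix such as `llama3.2:latest`.

```python
import json
import urllib.request


def local_model_names(tags_payload):
    """Extract model names from a parsed /api/tags response."""
    return [entry["name"] for entry in tags_payload.get("models", [])]


def has_model(tags_payload, wanted):
    """True if a local model has this exact name or this base name (any tag)."""
    names = local_model_names(tags_payload)
    return any(name == wanted or name.split(":")[0] == wanted for name in names)


def fetch_tags(base_url="http://127.0.0.1:11434"):
    """GET /api/tags from a running Ollama server and parse the JSON body."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=10) as response:
        return json.load(response)


# Illustrative payload only; a real one comes from fetch_tags().
sample = {"models": [{"name": "llama3.2:latest"}]}
print(has_model(sample, "llama3.2"))
```

With the server running, `has_model(fetch_tags(), "llama3.2")` performs the same check against the live list.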

[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l


## And that's it!
---

With this you should be able to freely play around with the models in your scripts. The following example uses `langchain-ollama` to answer a simple prompt.

If you have a use case that could help others, feel free to add your notebook to [Collama](https://github.com/5aharsh/collama/fork).

In [4]:
!pip install langchain-ollama



In [5]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_ollama.llms import OllamaLLM
from IPython.display import Markdown

template = """Question: {question}

Answer: Let's think step by step."""

prompt = ChatPromptTemplate.from_template(template)

model = OllamaLLM(model="llama3.2")

chain = prompt | model

display(Markdown(chain.invoke({"question": "What's the length of hypotenuse in a right angled triangle"})))

To find the length of the hypotenuse (the longest side) in a right-angled triangle, we can use the Pythagorean theorem.

Here are the steps:

1. We have two sides that form the right angle, which we'll call the legs (a and b).
2. The Pythagorean theorem states that: c² = a² + b²
3. Where c is the length of the hypotenuse (the side opposite the right angle).
4. To find c, we can rearrange the equation to get: c = √(a² + b²)
5. Now, if we have the lengths of a and b, we can plug them into this formula to calculate the length of the hypotenuse.

Example:
Suppose we want to find the length of the hypotenuse in a right-angled triangle with legs of length 3cm and 4cm. 

We use the Pythagorean theorem: c² = 3² + 4²
= 9 + 16
= 25

Now, we take the square root of both sides to find c:
c = √25
= 5cm

So, in this example, the length of the hypotenuse is 5cm.
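
For readers who want to see the same idea without LangChain, the sketch below fills the template by hand and sends it to Ollama's `/api/generate` endpoint. It is not part of the original notebook, and it is only a rough counterpart of the chain above: LangChain may format the final prompt differently. The helper names are hypothetical, and the server is assumed to be on its default address.

```python
import json
import urllib.request

TEMPLATE = """Question: {question}

Answer: Let's think step by step."""


def build_generate_request(question, model="llama3.2"):
    """Fill the template and build the JSON body for POST /api/generate."""
    return {
        "model": model,
        "prompt": TEMPLATE.format(question=question),
        "stream": False,  # ask for one JSON object instead of a stream of chunks
    }


def generate(question, base_url="http://127.0.0.1:11434"):
    """Send the request to a running Ollama server and return the generated text."""
    body = json.dumps(build_generate_request(question)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=300) as response:
        return json.load(response)["response"]
```

With the server running and the model pulled, `generate("What's the length of hypotenuse in a right angled triangle")` returns the answer as a plain string.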

In [1]:
!pip uninstall -y langchain langchain-core langchain-community langchain-ollama

Found existing installation: langchain 1.0.7
Uninstalling langchain-1.0.7:
  Successfully uninstalled langchain-1.0.7
Found existing installation: langchain-core 1.0.5
Uninstalling langchain-core-1.0.5:
  Successfully uninstalled langchain-core-1.0.5
Found existing installation: langchain-community 0.4.1
Uninstalling langchain-community-0.4.1:
  Successfully uninstalled langchain-community-0.4.1
Found existing installation: langchain-ollama 1.0.0
Uninstalling langchain-ollama-1.0.0:
  Successfully uninstalled langchain-ollama-1.0.0


In [2]:
!pip install "langchain>=1.0.0" langchain-community langchain-ollama

Collecting langchain>=1.0.0
  Using cached langchain-1.0.7-py3-none-any.whl.metadata (4.9 kB)
Collecting langchain-community
  Using cached langchain_community-0.4.1-py3-none-any.whl.metadata (3.0 kB)
Collecting langchain-ollama
  Using cached langchain_ollama-1.0.0-py3-none-any.whl.metadata (2.1 kB)
Collecting langchain-core<2.0.0,>=1.0.4 (from langchain>=1.0.0)
  Using cached langchain_core-1.0.5-py3-none-any.whl.metadata (3.6 kB)
Using cached langchain-1.0.7-py3-none-any.whl (93 kB)
Using cached langchain_community-0.4.1-py3-none-any.whl (2.5 MB)
Using cached langchain_ollama-1.0.0-py3-none-any.whl (29 kB)
Using cached langchain_core-1.0.5-py3-none-any.whl (471 kB)
Installing collected packages: langchain-core, langchain-ollama, langchain-community, langchain
Successfully installed langchain-1.0.7 langchain-community-0.4.1 langchain-core-1.0.5 langchain-ollama-1.0.0


In [3]:
!pip install --upgrade \
  langchain \
  langchain-core \
  langchain-community \
  langchain-openai \
  langchain-ollama \
  langchain-nomic




In [4]:
import threading
import subprocess
import time

def run_ollama_serve():
  subprocess.Popen(["ollama", "serve"])

thread = threading.Thread(target=run_ollama_serve)
thread.start()
time.sleep(5)

In [5]:
!ollama pull llama3.2

[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l


In [6]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_ollama.llms import OllamaLLM
from IPython.display import Markdown

template = """Question: {question}

Answer: Let's think step by step."""

prompt = ChatPromptTemplate.from_template(template)

model = OllamaLLM(model="llama3.2")

chain = prompt | model

display(Markdown(chain.invoke({"question": "What's the length of hypotenuse in a right angled triangle"})))

To find the length of the hypotenuse (the longest side) in a right-angled triangle, we can use the Pythagorean theorem.

The Pythagorean theorem states that:

a² + b² = c²

where:

* a is one of the shorter sides (called the legs)
* b is the other leg
* c is the hypotenuse (the longest side)

To find the length of the hypotenuse, we need to know the lengths of both legs. Let's call them "a" and "b".

Can you give me the lengths of one or both of the legs?

In [7]:
import faiss
print(faiss.__version__)

1.12.0


In [8]:
import faiss
import numpy as np

# Create random vectors
d = 64                  # dimension
xb = np.random.random((1000, d)).astype('float32')
xq = np.random.random((5, d)).astype('float32')

# Build index
index = faiss.IndexFlatL2(d)
index.add(xb)

# Search
D, I = index.search(xq, 3)

print("Distances:\n", D)
print("Indices:\n", I)


Distances:
 [[7.0624356 7.4036016 7.46629  ]
 [6.2473993 6.271619  6.275965 ]
 [6.276194  7.197847  7.2003193]
 [7.049431  7.2449436 7.40722  ]
 [6.930308  7.419937  7.489576 ]]
Indices:
 [[163 954  91]
 [766 163 502]
 [500 849 684]
 [922  71  44]
 [492 996 959]]
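
To make the output above easier to interpret, here is a NumPy-only sketch of an exhaustive nearest-neighbour search. It is meant as an illustration of what a flat L2 index computes, not as FAISS's implementation; `flat_l2_search` is a hypothetical helper. Note that the distances are squared Euclidean distances, which is also what `IndexFlatL2` reports.

```python
import numpy as np


def flat_l2_search(xb, xq, k):
    """Exhaustive k-nearest-neighbour search using squared L2 distance."""
    # Pairwise squared distances, shape (n_queries, n_database).
    diff = xq[:, None, :] - xb[None, :, :]
    dist = np.einsum("qnd,qnd->qn", diff, diff)
    # Indices of the k smallest distances per query, in ascending order.
    idx = np.argsort(dist, axis=1)[:, :k]
    return np.take_along_axis(dist, idx, axis=1), idx


rng = np.random.default_rng(0)
xb = rng.random((1000, 64), dtype=np.float32)  # database vectors
xq = rng.random((5, 64), dtype=np.float32)     # query vectors
D, I = flat_l2_search(xb, xq, 3)
print(D.shape, I.shape)  # (5, 3) (5, 3)
```

Each row of `D` is sorted from nearest to farthest, and the matching row of `I` gives the positions of those neighbours in `xb`.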


In [9]:
from langchain_nomic import NomicEmbeddings

print("LangChain-Nomic OK")

LangChain-Nomic OK


In [10]:
!pip install "nomic[local]"



In [11]:
!pip list

Package                                  Version
---------------------------------------- --------------------
absl-py                                  1.4.0
absolufy-imports                         0.3.1
accelerate                               1.11.0
aiofiles                                 24.1.0
aiohappyeyeballs                         2.6.1
aiohttp                                  3.13.2
aiosignal                                1.4.0
alabaster                                1.0.0
albucore                                 0.0.24
albumentations                           2.0.8
ale-py                                   0.11.2
alembic                                  1.17.1
altair                                   5.5.0
annotated-doc                            0.0.4
annotated-types                          0.7.0
antlr4-python3-runtime                   4.9.3
anyio                                    4.11.0
anywidget                                0.9.19
argon2-cffi                        

In [12]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_ollama.llms import OllamaLLM
from IPython.display import Markdown

template = """Question: {question}

Answer: Let's think step by step."""

prompt = ChatPromptTemplate.from_template(template)

model = OllamaLLM(model="llama3.2")

chain = prompt | model

display(Markdown(chain.invoke({"question": "What's the length of hypotenuse in a right angled triangle"})))

To find the length of the hypotenuse (the longest side) in a right-angled triangle, we can use the Pythagorean theorem.

The formula for the Pythagorean theorem is:

a² + b² = c²

where 'c' is the length of the hypotenuse, and 'a' and 'b' are the lengths of the other two sides (the ones that form the right angle).

We can rearrange this formula to solve for 'c':

c² = a² + b²

Now, we need more information about the triangle. What are the lengths of the other two sides (a and b)?

In [13]:
!pip install gpt4all



In [14]:
!pip install nomic --upgrade



In [17]:
from nomic import embed

result = embed.text(
    texts=["hello world"],
    model="nomic-embed-text-v1.5",
    dimensionality=768,
    inference_mode='local'
)

print(result["embeddings"][0][:5])

Embedding texts: 100%|██████████| 1/1 [00:00<00:00, 19.01inputs/s]

[-0.024404408410191536, -0.002609707647934556, -0.15477925539016724, 0.01661060005426407, 0.018586883321404457]





In [18]:
from nomic import embed
from langchain_community.vectorstores import FAISS
from langchain_core.documents import Document
from langchain_community.docstore.in_memory import InMemoryDocstore
import faiss
from langchain_nomic import NomicEmbeddings # Import NomicEmbeddings

# ----- 1. Initialize Nomic Embeddings -----
# Create an instance of NomicEmbeddings
embedding_model = NomicEmbeddings(
    model='nomic-embed-text-v1.5',
    inference_mode='local',
    # Other parameters if needed, like nomic_api_key etc.
)

# Use the embedding_model to get the dimension
# embed_query is a method of NomicEmbeddings, not the result of embed.text
dummy_embedding = embedding_model.embed_query(" ")
index_dimension = len(dummy_embedding)

# Create FAISS index with correct dimension
index = faiss.IndexFlatL2(index_dimension)

# Build FAISS vector store
vector_store = FAISS(
    embedding_function=embedding_model, # Pass the embedding_model instance
    index=index,
    docstore=InMemoryDocstore(),
    index_to_docstore_id={}
)

Embedding texts: 100%|██████████| 1/1 [00:00<00:00,  6.91inputs/s]
