# Ollama + LlamaIndex + Kaggle GPU

**Author**: [korkridake (Korkrid Kyle Akepanidtaworn ) · GitHub](https://github.com/korkridake)

# ❔ About

The purpose of this notebook is to **show how easy it is to use** [`ollama`](https://github.com/jmorganca/ollama):

- Locally on small configurations (CPUs are supported as well as GPUs; `ollama` picks the hardware for you)
- Interactively from the terminal
- Through its API from LlamaIndex
- Complete blog post: [🆓 Local & Open Source AI: a kind ollama & LlamaIndex intro](https://dev.to/adriens/local-open-source-ai-a-kind-ollama-llamaindex-intro-1nnc)

[![IMAGE ALT TEXT](http://img.youtube.com/vi/MroeN4aTjF4/0.jpg)](http://www.youtube.com/watch?v=MroeN4aTjF4 "ollama intro on kaggle w/ LlamaIndex (CPU)")

## 🔖 Resources

- [Running Mixtral 8x7 locally with LlamaIndex](https://blog.llamaindex.ai/running-mixtral-8x7-locally-with-llamaindex-e6cebeabe0ab)
- [Ollama Python Library](https://pypi.org/project/ollama/) : _The Ollama Python library provides the easiest way to integrate your Python 3 project with Ollama_

## 📑 Cool Kaggle Notebooks

- [How to use Mistral from Kaggle Models](https://www.kaggle.com/code/paultimothymooney/how-to-use-mistral-from-kaggle-models)
- [Talking Papers with `Mistral-7B`](https://www.kaggle.com/code/philculliton/talking-papers-with-mistral-7b)
- [Chain-Strategy-RAG-Github-Indexing](https://www.kaggle.com/code/ahmedelsayedrashad/chain-strategy-rag-github-indexing)
- [Finetune Mistral](https://www.kaggle.com/code/simonstorf/finetune-mistral)
- [How to Work with Mistral-7B?](https://www.kaggle.com/code/tirendazacademy/how-to-work-with-mistral-7b)

## 🎫 `ollama` GitHub issues

Below are some GitHub issues related to this notebook:

- [💭 Feature request > make it possible to remove the "thinking" animation "⠙ ⠹ ⠸" ](https://github.com/jmorganca/ollama/issues/1681)
- [📊 API Model Trends (Pulls, tags, lastUpdate, memory requirements,...) ❔](https://github.com/jmorganca/ollama/issues/1473)
- [❔ Run a given LLM/model within docker/podman/cloud run 👶 ](https://github.com/jmorganca/ollama/issues/1322)

# 📚 An LLM selection

Here is a non-exhaustive list of models that run on small hardware setups (e.g. 8 GB of RAM, no GPU required):

- [`llama2:7b`](https://ollama.ai/library/llama2)
- [`mistral:7b`](https://ollama.ai/library/mistral)
- [`llava:7b`](https://ollama.ai/library/llava)
- [`neural-chat:7b`](https://ollama.ai/library/neural-chat)
- [`llama2-uncensored:7b`](https://ollama.ai/library/llama2-uncensored)
- [`orca-mini:7b`](https://ollama.ai/library/orca-mini)
- [`orca-mini:3b`](https://ollama.ai/library/orca-mini)
- [`wizard-vicuna-uncensored:7b`](https://ollama.ai/library/wizard-vicuna-uncensored)
- [`zephyr:7b`](https://ollama.ai/library/zephyr)
- [`mistral-openorca:7b`](https://ollama.ai/library/mistral-openorca)
- [`orca2:7b`](https://ollama.ai/library/orca2)
- [`medllama2:7b`](https://ollama.ai/library/medllama2)
- [`phi`](https://ollama.ai/library/phi)
- [`meditron:7b`](https://ollama.ai/library/meditron)
- [`openhermes2-mistral:7b`](https://ollama.ai/library/openhermes2-mistral)
- [`dolphin2.2-mistral:7b`](https://ollama.ai/library/dolphin2.2-mistral)
- [`dolphin-phi:2.7b`](https://ollama.ai/library/dolphin-phi)
- [`nous-hermes:7b`](https://ollama.ai/library/nous-hermes:7b)
- [`tinyllama`](https://ollama.ai/library/tinyllama)
- [`ifioravanti/neuralbeagle14-7b`](https://ollama.ai/ifioravanti/neuralbeagle14-7b)
- [`ifioravanti/alphamonarch`](https://ollama.com/ifioravanti/alphamonarch)
- [`gemma`](https://ollama.com/library/gemma)
- ...
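
To get a feel for why 7B-parameter models can fit on a machine with 8 GB of RAM, a back-of-the-envelope estimate may help. The sketch below assumes that the weights dominate memory use and that a model is stored at roughly 4 bits per weight; that figure is an assumption, so check each model's tags page for the actual quantization. The helper name `estimate_weight_gb` is hypothetical, and the estimate ignores the context cache and runtime overhead.

```python
def estimate_weight_gb(n_params: float, bits_per_weight: float = 4.0) -> float:
    """Rough size of the model weights in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


# Compare an assumed 4-bit quantization with unquantized 16-bit weights
for name, n_params in [("3b", 3e9), ("7b", 7e9)]:
    print(f"{name}: ~{estimate_weight_gb(n_params):.1f} GB at 4 bits, "
          f"~{estimate_weight_gb(n_params, 16):.1f} GB at 16 bits")
```

Under these assumptions a 7B model needs a few gigabytes for its weights when quantized, which leaves headroom on an 8 GB machine, while 16-bit weights would not fit.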

In [1]:
# Use the IPython magic command to execute the nvidia-smi command and capture the output
gpu_info = !nvidia-smi

# Join the list of strings into a single string, separated by newline characters
gpu_info = '\n'.join(gpu_info)

# Check if the output contains the word 'failed,' indicating a failure in connecting to a GPU
if gpu_info.find('failed') >= 0:
    print('Not connected to a GPU')
else:
    # Print the GPU information
    print(gpu_info)

Sat Mar 30 09:55:58 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.03              Driver Version: 535.54.03    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  NVIDIA A100 80GB PCIe          On  | 00000001:00:00.0 Off |                    0 |
| N/A   38C    P0              61W / 300W |  16528MiB / 81920MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
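
The `!nvidia-smi` syntax only works inside IPython. Outside a notebook, the same check could be sketched with the standard library as below. The function name `gpu_summary` is hypothetical, and treating a non-zero exit code or the word "failed" as "no GPU" is an assumption carried over from the cell above.

```python
import shutil
import subprocess
from typing import Optional


def gpu_summary(executable: str = "nvidia-smi") -> Optional[str]:
    """Return the tool's text output, or None when no usable GPU is reported."""
    if shutil.which(executable) is None:
        return None  # the tool is not installed or not on PATH
    result = subprocess.run([executable], capture_output=True, text=True)
    if result.returncode != 0 or "failed" in result.stdout:
        return None
    return result.stdout


info = gpu_summary()
print(info if info else "Not connected to a GPU")
```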
                                                                    

# 🧰 Install `ollama`

* [Releases · ollama/ollama](https://github.com/ollama/ollama/releases)

In [2]:
!curl https://ollama.ai/install.sh | sh

# https://github.com/jmorganca/ollama/issues/1997#issuecomment-1892948729
!curl https://ollama.ai/install.sh | sed 's#https://ollama.ai/download#https://github.com/jmorganca/ollama/releases/download/v0.1.28#' | sh
!sudo apt install -y neofetch

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0>>> Downloading ollama...
100 10044    0 10044    0     0  17932      0 --:--:-- --:--:-- --:--:-- 17967
######################################################################## 100.0%##O=#  #                                                                      
>>> Installing ollama to /usr/local/bin...
>>> Adding ollama user to render group...
>>> Adding ollama user to video group...
>>> Adding current user to ollama group...
>>> Creating ollama systemd service...
>>> Enabling and starting ollama service...
>>> NVIDIA GPU installed.
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 10044    0 10044    0     0  31256      0 --:--:-- --:--:-- -

In [11]:

!neofetch

[?25l[?7l[0m[31m[1m            .-/+oossssoo+/-.
        `:+ssssssssssssssssss+:`
      -+ssssssssssssssssssyyssss+-
    .ossssssssssssssssss[37m[0m[1mdMMMNy[0m[31m[1msssso.
   /sssssssssss[37m[0m[1mhdmmNNmmyNMMMMh[0m[31m[1mssssss/
  +sssssssss[37m[0m[1mhm[0m[31m[1myd[37m[0m[1mMMMMMMMNddddy[0m[31m[1mssssssss+
 /ssssssss[37m[0m[1mhNMMM[0m[31m[1myh[37m[0m[1mhyyyyhmNMMMNh[0m[31m[1mssssssss/
.ssssssss[37m[0m[1mdMMMNh[0m[31m[1mssssssssss[37m[0m[1mhNMMMd[0m[31m[1mssssssss.
+ssss[37m[0m[1mhhhyNMMNy[0m[31m[1mssssssssssss[37m[0m[1myNMMMy[0m[31m[1msssssss+
oss[37m[0m[1myNMMMNyMMh[0m[31m[1mssssssssssssss[37m[0m[1mhmmmh[0m[31m[1mssssssso
oss[37m[0m[1myNMMMNyMMh[0m[31m[1msssssssssssssshmmmh[0m[31m[1mssssssso
+ssss[37m[0m[1mhhhyNMMNy[0m[31m[1mssssssssssss[37m[0m[1myNMMMy[0m[31m[1msssssss+
.ssssssss[37m[0m[1mdMMMNh[0m[31m[1mssssssssss[37m[0m[1mhNMMMd[0m[31m[1mssssssss.
 /ssssssss[37m[0m[1mh

# 🧠 Choose a model

`ollama` supports an ever-increasing [collection of models](https://ollama.ai/library) and is designed to make it easy to embed fresh custom ones. In our case, we're
going to pick a ready-to-use one.

See the collection [here](https://ollama.ai/library).

Pay close attention to the tags, since each tag can carry different system requirements. Before playing with a model, it is therefore recommended to:

- check the [model homepage](https://ollama.ai/library/zephyr)
- check the [model tags](https://ollama.ai/library/zephyr/tags)
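
Model references such as `phi:latest` or `gemma:2b` combine a name and a tag. The sketch below splits a reference and builds the two pages to check, following the URL pattern of the links used in this notebook. The helper names are hypothetical. Two points are assumptions: that a missing tag means `latest`, and that user-published models (names containing `/`) expose a `/tags` page in the same way.

```python
from typing import Tuple


def split_model_ref(ref: str) -> Tuple[str, str]:
    """Split 'name:tag' into (name, tag); the tag defaults to 'latest'."""
    name, sep, tag = ref.partition(":")
    return name, (tag if sep and tag else "latest")


def model_pages(ref: str, base: str = "https://ollama.ai") -> Tuple[str, str]:
    """Return (homepage, tags page) for a model reference."""
    name, _ = split_model_ref(ref)
    # Official models live under /library/, user models under /<user>/<model>
    path = name if "/" in name else f"library/{name}"
    home = f"{base}/{path}"
    return home, f"{home}/tags"


for ref in ["phi:latest", "tinyllama", "ifioravanti/alphamonarch"]:
    print(split_model_ref(ref), model_pages(ref))
```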


In [12]:
# Setup the target model globally
#OLLAMA_MODEL='zephyr'
#OLLAMA_MODEL='dolphin2.2-mistral:7b'
#OLLAMA_MODEL='neural-chat:7b'
#OLLAMA_MODEL='tinyllama'
#OLLAMA_MODEL='nous-hermes:7b'
#OLLAMA_MODEL='openhermes2-mistral:7b'
#OLLAMA_MODEL='ifioravanti/neuralbeagle14-7b'
#OLLAMA_MODEL='ifioravanti/alphamonarch'
#OLLAMA_MODEL='gemma:2b'
#OLLAMA_MODEL='gemma:7b'
OLLAMA_MODEL='phi:latest'

# Set it at the OS level
import os
os.environ['OLLAMA_MODEL'] = OLLAMA_MODEL
!echo $OLLAMA_MODEL

phi:latest


# 🏁 Startup `ollama`

Now, let's start `ollama` locally in the background.

In [7]:
import subprocess
import time

# Start ollama as a background process
command = "nohup ollama serve&"

# Use subprocess.Popen to start the process in the background
process = subprocess.Popen(command,
                           shell=True,
                           stdout=subprocess.PIPE,
                           stderr=subprocess.PIPE)
print("Process ID:", process.pid)
# Let's use fly.io resources
#!OLLAMA_HOST=https://ollama-demo.fly.dev:443
time.sleep(5)  # Makes Python wait for 5 seconds

Process ID: 5556
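
A fixed five-second sleep may be too short on a slow machine and needlessly long on a fast one. As an alternative sketch, the server could be polled until it answers. The address `http://localhost:11434` is the one that appears in the request logs of this notebook. The function names are hypothetical, and treating an HTTP 200 on the root URL as "ready" is an assumption.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def server_is_up(url: str = "http://localhost:11434") -> bool:
    """Return True if an HTTP GET on the server root succeeds."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_until(probe: Callable[[], bool],
               timeout: float = 30.0,
               interval: float = 0.5) -> bool:
    """Call probe() repeatedly until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

In the startup cell, `wait_until(server_is_up, timeout=30)` could then stand in for `time.sleep(5)`.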


In [13]:
!ollama -v

ollama version is 0.1.30


# 🐚 Try `ollama` from shell (`cli`)

Ask a first question so that the target model is pulled (downloaded) from the remote registry.

In [15]:
!ollama run $OLLAMA_MODEL "Translate the following emoji sentence to a meaningful text : 💙🤓💻📱🔧🌀💡"

[?25l[?25l[?25h[2K[1G[?25h I[?25l[?25h am[?25l[?25h feeling[?25l[?25h excited[?25l[?25h,[?25l[?25h intelligent[?25l[?25h,[?25l[?25h tech[?25l[?25h-[?25l[?25hsav[?25l[?25hvy[?25l[?25h,[?25l[?25h connected[?25l[?25h,[?25l[?25h creative[?25l[?25h,[?25l[?25h and[?25l[?25h in[2D[K
innovative[?25l[?25h today[?25l[?25h.[?25l[?25h
[?25l[?25h

[?25l[?25h

# 😜 Ask AI for a joke

In [16]:
!ollama run $OLLAMA_MODEL "Say something fun about open source AI."

[?25l[?25l[?25h[2K[1G[?25h Open[?25l[?25h source[?25l[?25h AI[?25l[?25h is[?25l[?25h like[?25l[?25h a[?25l[?25h never[?25l[?25h-[?25l[?25hending[?25l[?25h game[?25l[?25h of[?25l[?25h ping[?25l[?25h p[?25l[?25hong[?25l[?25h -[?25l[?25h it[?25l[?25h keeps[?25l[?25h evolvin[7D[K
evolving[?25l[?25h,[?25l[?25h getting[?25l[?25h better[?25l[?25h with[?25l[?25h each[?25l[?25h round[?25l[?25h.[?25l[?25h Plus[?25l[?25h,[?25l[?25h it[?25l[?25h's[?25l[?25h free[?25l[?25h and[?25l[?25h available[?25l[?25h for[?25l[?25h[3D[K
for everyone[?25l[?25h to[?25l[?25h use[?25l[?25h![?25l[?25h #[?25l[?25hOpen[?25l[?25hSource[?25l[?25hAI[?25l[?25h #[?25l[?25hTech[?25l[?25hChat[?25l[?25h
[?25l[?25h

[?25l[?25h

# 🚀 Call ollama from `LlamaIndex`

[`LlamaIndex`](https://www.llamaindex.ai/) is a _data framework for LLM applications_.
It helps implement many use cases in Python on top of any LLM.

Here, we're going to use the LlamaIndex abstraction on top of the `ollama` abstraction so
we can very easily switch LLMs with just a configuration change (which is awesome, by the way).
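
To illustrate the "configuration change" idea, here is a small sketch that reads the model name from the `OLLAMA_MODEL` environment variable set earlier and only then builds the LlamaIndex wrapper. The helper names `pick_model` and `make_llm` are hypothetical; `Ollama(model=...)` is the same constructor call used later in this notebook.

```python
import os
from typing import Mapping

DEFAULT_MODEL = "phi:latest"  # the model selected earlier in this notebook


def pick_model(env: Mapping[str, str] = os.environ) -> str:
    """Read the model name from OLLAMA_MODEL, falling back to a default."""
    return env.get("OLLAMA_MODEL", "").strip() or DEFAULT_MODEL


def make_llm(model: str):
    """Build the LlamaIndex wrapper; imported lazily so pick_model has no dependency."""
    from llama_index.llms.ollama import Ollama
    return Ollama(model=model)


print("Using model:", pick_model())
```

Switching to another model then only requires changing the environment variable; the rest of the code stays the same.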

In [17]:
# Check my Python version
!python --version

Python 3.10.14


In [18]:
# Install prerequisites
!pip install llama-index
!pip install llama-index-llms-ollama
!pip install llama-index-embeddings-huggingface
!pip install llama-index ipywidgets
!pip install llama-index-llms-huggingface
!pip install chromadb

Collecting llama-index-llms-ollama
  Downloading llama_index_llms_ollama-0.1.2-py3-none-any.whl.metadata (636 bytes)
Downloading llama_index_llms_ollama-0.1.2-py3-none-any.whl (3.2 kB)
Installing collected packages: llama-index-llms-ollama
Successfully installed llama-index-llms-ollama-0.1.2
Collecting llama-index-llms-huggingface
  Downloading llama_index_llms_huggingface-0.1.4-py3-none-any.whl.metadata (741 bytes)
Collecting huggingface-hub<0.21.0,>=0.20.3 (from llama-index-llms-huggingface)
  Using cached huggingface_hub-0.20.3-py3-none-any.whl.metadata (12 kB)
Downloading llama_index_llms_huggingface-0.1.4-py3-none-any.whl (7.2 kB)
Using cached huggingface_hub-0.20.3-py3-none-any.whl (330 kB)
Installing collected packages: huggingface-hub, llama-index-llms-huggingface
  Attempting uninstall: huggingface-hub
    Found existing installation: huggingface-hub 0.22.2
    Uninstalling huggingface-hub-0.22.2:
      Successfully uninstalled huggingface-hub-0.22.2
Successfully insta

In [26]:
!pip install llama-index-vector-stores-postgres
!pip install llama-index-vector-stores-chroma



Collecting llama-index-vector-stores-chroma
  Downloading llama_index_vector_stores_chroma-0.1.6-py3-none-any.whl.metadata (654 bytes)
Downloading llama_index_vector_stores_chroma-0.1.6-py3-none-any.whl (4.7 kB)
Installing collected packages: llama-index-vector-stores-chroma
Successfully installed llama-index-vector-stores-chroma-0.1.6

In [24]:
!pip install llama_index



In [27]:
# Import pandas library
import pandas as pd

# Import os module for operating system functionalities
import os

# Import logging module for logging messages
import logging

# Import sys module for system-specific parameters and functions
import sys

# Configure logging to display INFO level messages on the console.
# basicConfig already attaches a stdout handler; adding a second StreamHandler
# would print every message twice, so it is left out here.
logging.basicConfig(stream=sys.stdout, level=logging.INFO)

# Import Markdown and display functions from IPython.display module
from IPython.display import Markdown, display

# Import required modules from the llama_index library
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core import Settings
from llama_index.llms.ollama import Ollama
from llama_index.core import StorageContext

# Import ChromaVectorStore and chromadb module
from llama_index.vector_stores.chroma import ChromaVectorStore
import chromadb

# LLM with no prior knowledge base...

In [28]:
# Import the Ollama class from the llama_index.llms.ollama module
from llama_index.llms.ollama import Ollama

# OLLAMA_MODEL was set in the "Choose a model" cell above
llm = Ollama(model=OLLAMA_MODEL)

# Perform a query using the complete method with the specified input text and print the response
response = llm.complete("""Who is Grigori Perelman and why is he so important in mathematics?
(Answer with markdown sections; the markdown will be GitHub flavored.)""")
print(response)

INFO:httpx:HTTP Request: POST http://localhost:11434/api/generate "HTTP/1.1 200 OK"
 ## Introduction 
Grigori Perelman was a Russian mathematician who made significant contributions to the field of topology. He is best known for solving the Poincaré conjecture, which had remained unsolved for over 100 years.

## The Poincaré Conjecture 
The Poincaré conjecture states that any simply connected 3-dimensional manifold is homeomorphic to a 3-sphere. This means that any space that can be smoothly deformed into a sphere can also be transformed into a sphere with

In [29]:
# Perform another query using the complete method with a different input text and print the response
response = llm.complete("""Who is the current Prime Minister of Thailand?""")
print(response)

INFO:httpx:HTTP Request: POST http://localhost:11434/api/generate "HTTP/1.1 200 OK"
 The current Prime Minister of Thailand is Prayut Chan-o-cha. He took office on September 19, 2014, following the resignation of Yingluck Shinawatra. Prayut is a member of the Palang Pracharath Party and has been involved in Thai politics since 2011.



# LLM with RAG

Retrieval augmented generation (RAG) is a strategy that helps address the two weaknesses of an LLM queried without a knowledge base, as in the previous section: its knowledge is frozen at training time, and it can present outdated or incorrect facts with confidence. RAG pairs information retrieval with a set of carefully designed system prompts to anchor LLMs on precise, up-to-date, and pertinent information retrieved from an external knowledge store. Prompting LLMs with this contextual knowledge makes it possible to create domain-specific applications that require a deep and evolving understanding of facts, despite LLM training data remaining static.
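
The "retrieve, then prompt" idea can be sketched in a few lines of plain Python. This is a toy illustration and not how LlamaIndex works internally: the relevance score is a simple word overlap rather than an embedding similarity, and the passages are invented for the example.

```python
from typing import List


def score(query: str, passage: str) -> int:
    """Toy relevance score: number of distinct query words found in the passage."""
    return len(set(query.lower().split()) & set(passage.lower().split()))


def retrieve(query: str, passages: List[str], k: int = 1) -> List[str]:
    """Return the k passages with the highest toy score."""
    return sorted(passages, key=lambda p: score(query, p), reverse=True)[:k]


def build_prompt(query: str, context: List[str]) -> str:
    """Anchor the question on the retrieved context."""
    joined = "\n".join(f"- {c}" for c in context)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{joined}\n"
        f"Question: {query}\n"
    )


# Invented passages standing in for an external knowledge store
passages = [
    "The museum opens at 9 am.",
    "The bus to the airport leaves every 20 minutes.",
]
question = "When does the museum open?"
print(build_prompt(question, retrieve(question, passages)))
```

The resulting prompt is what would be sent to the LLM, so the answer can draw on the retrieved text instead of relying only on what the model memorized during training.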

## Loading Data (Ingestion)

In [57]:
# Load documents
reader = SimpleDirectoryReader("/root/BeijingTravelGuidebook")
docs = reader.load_data()
print(f"Loaded {len(docs)} docs")

Loaded 3 docs


In [58]:
# Initialize a HuggingFace Embedding model
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

INFO:sentence_transformers.SentenceTransformer:Load pretrained SentenceTransformer: BAAI/bge-small-en-v1.5
INFO:sentence_transformers.SentenceTransformer:2 prompts are loaded, with the keys: ['query', 'text']


In [59]:
# Specify required settings
Settings.llm = llm
Settings.embed_model = embed_model

## Indexing & Storing

In [60]:
# Create the client and the collection
# (get_or_create_collection avoids an error if this cell is re-run)
db = chromadb.PersistentClient(path="./chroma_db")
chroma_collection = db.get_or_create_collection("poc-llamaindex-ops-thaipm2")

In [61]:
# Set up ChromaVectorStore and load in data
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
    docs,
    storage_context=storage_context,
    embed_model=embed_model,
)

Batches:   0%|          | 0/1 [00:00<?, ?it/s]
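
Once the document chunks are stored as embedding vectors, answering a query comes down to finding the stored vectors closest to the query vector. The sketch below shows that idea with cosine similarity over a tiny in-memory dictionary. It is only an illustration under simplifying assumptions: the two-dimensional vectors are invented, and a real vector store relies on optimized indexes rather than a full scan.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec: Sequence[float],
          store: Dict[str, Sequence[float]],
          k: int = 2) -> List[Tuple[str, float]]:
    """Return the k stored ids most similar to the query vector."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in store.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Invented 2-D "embeddings" for three chunks
store = {"doc-a": [1.0, 0.0], "doc-b": [0.0, 1.0], "doc-c": [1.0, 1.0]}
print(top_k([1.0, 0.1], store, k=2))
```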

In [62]:
print(chroma_collection)
print(chroma_collection.name)
print(chroma_collection.id)

name='poc-llamaindex-ops-thaipm2' id=UUID('3145c823-e24f-4089-a352-99ab54cff4ca') metadata=None tenant='default_tenant' database='default_database'
poc-llamaindex-ops-thaipm2
3145c823-e24f-4089-a352-99ab54cff4ca


## Querying

In [63]:
# Build a query engine on top of the index
query_engine = index.as_query_engine()

In [75]:
response = query_engine.query("How much money will it take of 4 days Beijing City Highlights Tour")
display(Markdown(f"<b>{response}</b>"))

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

INFO:httpx:HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 200 OK"


<b> From the provided context, there is no direct mention of the total cost for a 4-day tour in Beijing. However, you mentioned that the price per person is CN¥2,600. To calculate the total cost for a group of people, you need to know how many people are going on the tour.
</b>
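
To see which retrieved chunks an answer was based on, the response's source nodes can be inspected. The helper below is a hypothetical sketch: it assumes each item exposes `score`, `node.metadata`, and `node.get_content()`, and that the reader stored a `file_name` entry in the metadata. Stand-in objects with invented data are used so that the snippet runs without LlamaIndex.

```python
from types import SimpleNamespace
from typing import Iterable, List


def summarize_sources(source_nodes: Iterable, max_chars: int = 80) -> List[str]:
    """Format each retrieved chunk as '[score] file: beginning of the text'."""
    lines = []
    for item in source_nodes:
        name = item.node.metadata.get("file_name", "unknown file")
        shown = f"{item.score:.3f}" if item.score is not None else "n/a"
        text = item.node.get_content().strip().replace("\n", " ")
        lines.append(f"[{shown}] {name}: {text[:max_chars]}")
    return lines


# Stand-in for a retrieved chunk (invented data, not taken from the guidebook)
fake = SimpleNamespace(
    score=0.5,
    node=SimpleNamespace(
        metadata={"file_name": "guide.pdf"},
        get_content=lambda: "Example passage text.",
    ),
)
print(summarize_sources([fake]))
```

With a real query result, the call would be `summarize_sources(response.source_nodes)`.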

In [1]:
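# Note: this cell needs `query_engine` from the Querying cell above. After a kernel
# restart, re-run the earlier cells first; otherwise a NameError is raised.
response = query_engine.query("Tell me about MuTianyu and how long it will take from beijing to MuTianyu")
display(Markdown(f"<b>{response}</b>"))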

NameError: name 'query_engine' is not defined