# The basics of RAG from scratch

With this notebook you can ask questions about your own document.
It uses ollama to run LLMs locally.

Make sure you have downloaded and installed ollama from www.ollama.com.

#### Contents
0. Install and import packages
1. Check available models in ollama + set system prompt
2. Get the text document and split into paragraphs (chunks of text)
3. Embeddings
4. Set the prompt, create prompt embeddings and do similarity search
5. Get response from the LLM

#### Source
source & acknowledgements: https://decoder.sh/videos/rag-from-the-ground-up-with-python-and-ollama 

This notebook follows the flow shown in the diagram below. Please study it carefully.
![slide1](slide_flow.png)

## 0. Install and import packages

In [1]:
# json, os and time are part of the Python standard library and need no install
%pip install ollama numpy

Note: you may need to restart the kernel to use updated packages.


In [2]:
#import packages
import ollama
import time
import os
import json
import numpy as np
from numpy.linalg import norm

## 1. Check available models in ollama + set system prompt

For this notebook you'll need two models:
- an embedding model: nomic-embed-text
- an LLM of your choice, such as tinyllama or mistral (you set it in section 5)

In [3]:
!ollama list

NAME                            	ID          	SIZE  	MODIFIED     
bramvanroy/fietje-2b-chat:Q3_K_M	6dd6525c1e6c	1.4 GB	2 weeks ago 	
llava:latest                    	8dd30f6b0cb1	4.7 GB	4 hours ago 	
mistral:latest                  	61e88e884507	4.1 GB	2 weeks ago 	
nomic-embed-text:latest         	0a109f422b47	274 MB	2 weeks ago 	
tinyllama:latest                	2644915ede35	637 MB	4 months ago	


In [4]:
#uncomment if necessary
#!ollama pull nomic-embed-text

In [5]:
#set the system prompt
SYSTEM_PROMPT = """You are a helpful reading assistant who answers questions 
        based on snippets of text provided in context. Answer only using the context provided, 
        being as concise as possible. If you're unsure, just say that you don't know."""

## 2. Get the text document and split into paragraphs (chunks of text)

In this case, we simply use a plain-text (.txt) file from the Project Gutenberg website, www.gutenberg.org.

In [6]:
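# function to open a file and return its paragraphs
# (a paragraph is a run of non-empty lines, separated by one or more blank lines)
def parse_file(filename):
    # utf-8-sig strips the byte order mark that some .txt files start with
    with open(filename, encoding="utf-8-sig") as f:
        paragraphs = []
        buffer = []
        for line in f:
            line = line.strip()
            if line:
                buffer.append(line)
            elif buffer:
                # blank line: close the current paragraph
                paragraphs.append(" ".join(buffer))
                buffer = []
        # flush the last paragraph if the file does not end with a blank line
        if buffer:
            paragraphs.append(" ".join(buffer))
        return paragraphs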

In [7]:
# open file as provided in the GitHub repo
filename = "peter-pan.txt"
paragraphs = parse_file(filename)
print(paragraphs[0:3]) #print first 3 paragraphs
print(f'Total number of paragraphs: {len(paragraphs)}')

['The Project Gutenberg eBook of Peter Pan', 'This ebook is for the use of anyone anywhere in the United States and most other parts of the world at no cost and with almost no restrictions whatsoever. You may copy it, give it away or re-use it under the terms of the Project Gutenberg License included with this ebook or online at www.gutenberg.org. If you are not located in the United States, you will have to check the laws of the country where you are located before using this eBook.', 'Title: Peter Pan']
Total number of paragraphs: 1736
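
To see what the paragraph splitting does without opening a file, here is a minimal sketch of the same logic applied to an in-memory string. The helper name `split_paragraphs` and the sample text are assumptions made for illustration; they are not part of the original notebook.

```python
def split_paragraphs(text):
    """Split text on blank lines, joining wrapped lines with a single space."""
    paragraphs = []
    buffer = []
    for line in text.splitlines():
        line = line.strip()
        if line:
            buffer.append(line)
        elif buffer:
            # a blank line closes the current paragraph
            paragraphs.append(" ".join(buffer))
            buffer = []
    if buffer:
        # flush the final paragraph
        paragraphs.append(" ".join(buffer))
    return paragraphs

# hypothetical sample: two paragraphs, the first wrapped over two lines
sample = "All children, except one,\ngrow up.\n\n\nThey soon know that they will grow up."
chunks = split_paragraphs(sample)
print(chunks)
```

Note that several blank lines in a row still produce only one paragraph break, and that very short lines such as titles become chunks of their own.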


## 3. Embeddings

In [8]:
# functions to save, load and get the embeddings
def save_embeddings(filename, embeddings):
    # create dir if it doesn't exist
    os.makedirs("embeddings", exist_ok=True)
    # dump embeddings to json
    with open(f"embeddings/{filename}.json", "w") as f:
        json.dump(embeddings, f)

def load_embeddings(filename):
    # check if file exists
    if not os.path.exists(f"embeddings/{filename}.json"):
        return False
    # load embeddings from json
    with open(f"embeddings/{filename}.json", "r") as f:
        return json.load(f)

def get_embeddings(filename, modelname, chunks):
    # check if embeddings are already saved
    if (embeddings := load_embeddings(filename)) is not False:
        return embeddings
    # get embeddings from ollama
    embeddings = [
        ollama.embeddings(model=modelname, prompt=chunk)["embedding"]
        for chunk in chunks
    ]
    # save embeddings
    save_embeddings(filename, embeddings)
    return embeddings

In [9]:
#get the embeddings
embeddings = get_embeddings(filename, "nomic-embed-text", paragraphs)
len(embeddings) #should be same number as paragraphs

1736
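
The caching pattern used above (load from disk if present, otherwise compute and save) can be tried out without ollama. The sketch below swaps in a stand-in embedding function; `cached_embeddings`, `fake_embed` and the temporary file name are hypothetical names introduced for this example only.

```python
import json
import os
import tempfile

def cached_embeddings(cache_path, chunks, embed_fn):
    """Return embeddings from cache_path if it exists, else compute and store them."""
    if os.path.exists(cache_path):
        with open(cache_path, "r") as f:
            return json.load(f)
    embeddings = [embed_fn(chunk) for chunk in chunks]
    with open(cache_path, "w") as f:
        json.dump(embeddings, f)
    return embeddings

calls = []

def fake_embed(text):
    # stand-in for the real embedding call; these numbers are NOT real embeddings
    calls.append(text)
    return [float(len(text)), float(text.count(" "))]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.json")
    first = cached_embeddings(path, ["one two", "three"], fake_embed)   # computed
    second = cached_embeddings(path, ["one two", "three"], fake_embed)  # read from disk

print(first == second, len(calls))
```

Because the cache is looked up by file name only, it will not notice a different embedding model or a different way of splitting the text. In that case, delete the saved JSON file so the embeddings are computed again.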

## 4. Set the prompt, create prompt embeddings, do similarity search

In [10]:
#set the prompt
prompt = "Tell me about tinke bell?"

In [11]:
# Create the prompt embeddings. Use the same embedding model!
prompt_embedding = ollama.embeddings(model="nomic-embed-text", prompt=prompt)["embedding"]

In [12]:
# find cosine similarity of every chunk to a given embedding
def find_most_similar(needle, haystack):
    needle_norm = norm(needle)
    similarity_scores = [
        np.dot(needle, item) / (needle_norm * norm(item)) for item in haystack
    ]
    return sorted(zip(similarity_scores, range(len(haystack))), reverse=True)
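
Cosine similarity is the dot product of two vectors divided by the product of their lengths, so it measures direction and ignores magnitude. As a small worked example, the sketch below computes the same quantity for all chunks at once with NumPy array operations. The function name `cosine_similarities` and the toy vectors are assumptions for illustration.

```python
import numpy as np

def cosine_similarities(needle, haystack):
    """Cosine similarity between one vector and each row of a 2-D array."""
    needle = np.asarray(needle, dtype=float)
    haystack = np.asarray(haystack, dtype=float)
    # dot product of every row with the needle, divided by the product of the norms
    return haystack @ needle / (np.linalg.norm(haystack, axis=1) * np.linalg.norm(needle))

# toy 2-D vectors: same direction, orthogonal, and 45 degrees apart
query = [1.0, 0.0]
docs = [[2.0, 0.0], [0.0, 3.0], [1.0, 1.0]]

scores = cosine_similarities(query, docs)
ranking = np.argsort(-scores)  # indices from most to least similar
print(scores)
print(ranking)
```

The first toy vector points in the same direction as the query and scores 1, the orthogonal one scores 0, and the one at 45 degrees scores about 0.707.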

In [13]:
# find paragraphs most similar to the prompt
most_similar_chunks = find_most_similar(prompt_embedding, embeddings)[:5]
most_similar_chunks

[(0.6463232225153855, 1576),
 (0.6360647070442861, 252),
 (0.6153480329856226, 237),
 (0.6082050651637667, 1024),
 (0.6079815685485532, 176)]
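
Each result is a `(score, paragraph index)` pair. Before sending anything to the LLM, it can help to look at the text behind those indices. Below is a minimal sketch of such an inspection step, shown on toy data. `format_hits` is a hypothetical helper, not part of the original notebook.

```python
def format_hits(hits, paragraphs, width=60):
    """Return one line per hit: score, paragraph index and a truncated preview."""
    lines = []
    for score, index in hits:
        preview = paragraphs[index][:width]
        lines.append(f"{score:.3f}  [{index}]  {preview}")
    return lines

# toy data standing in for the notebook's paragraphs and most_similar_chunks
toy_paragraphs = ["first chunk", "second chunk", "third chunk"]
toy_hits = [(0.9, 2), (0.5, 0)]

for line in format_hits(toy_hits, toy_paragraphs):
    print(line)
```

In the notebook itself, the same helper could be called as `format_hits(most_similar_chunks, paragraphs)`.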

## 5. Get a response from the LLM

In [14]:
#set your model here!
model='mistral'

In [15]:
response = ollama.chat(
    model=model,
    messages=[
        {
            "role": "system",
            # a blank line separates the instructions from the retrieved paragraphs
            "content": SYSTEM_PROMPT
            + "\n\n"
            + "\n".join(paragraphs[item[1]] for item in most_similar_chunks),
        },
        {"role": "user", "content": prompt},
    ],
)
print("\n\n")
print(response["message"]["content"])




 Tinker Bell is a fairy known for mending pots and kettles. She can be quite temperamental, as shown by her reaction when called a "silly ass." She enjoys unusual places like being in a jug.
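
The chat call above sends two messages: a system message holding the instructions plus the retrieved paragraphs, and a user message holding the question. The sketch below shows one possible way to assemble them in a small helper. The name `build_messages` and the `Context:` label are assumptions for illustration, and the toy strings stand in for the notebook's real data.

```python
def build_messages(system_prompt, context_chunks, question):
    """Combine instructions, retrieved chunks and the question into chat messages."""
    context = "\n".join(context_chunks)
    return [
        # instructions first, then the retrieved text under an explicit label
        {"role": "system", "content": f"{system_prompt}\n\nContext:\n{context}"},
        {"role": "user", "content": question},
    ]

# toy values standing in for SYSTEM_PROMPT, the retrieved paragraphs and the prompt
messages = build_messages("Be brief.", ["chunk A", "chunk B"], "Who is it?")
for message in messages:
    print(message["role"], "->", repr(message["content"]))
```

The resulting list has the same shape as the `messages` argument used in the chat call, so it could be passed there directly.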
