# The basics of RAG from scratch

With this notebook you can ask questions about your own document.
It uses ollama to run LLMs locally.

Make sure you have downloaded and installed ollama from www.ollama.com.

#### Contents
0. Install and import packages
1. Check available models in ollama + set system prompt
2. Get the text document and split into paragraphs (chunks of text)
3. Embeddings
4. Set the prompt, create prompt embeddings and do similarity search
5. Get a response from the LLM

#### Source
source & acknowledgements: https://decoder.sh/videos/rag-from-the-ground-up-with-python-and-ollama 

This notebook follows the flow shown in the flow diagram below. Please study it carefully.
![slide1](slide_flow.png)

## 0. Install and import packages

In [None]:
%pip install ollama
# json is part of the Python standard library and needs no install; numpy is imported below
%pip install numpy

In [None]:
#import packages
import ollama
import time
import os
import json
import numpy as np
from numpy.linalg import norm

## 1. Check available models in ollama + set system prompt

For this notebook you'll need two models:
- an embedding model: nomic-embed-text
- any LLM: tinyllama, mistral or other (your own choice; you set it in section 5)

In [None]:
!ollama list

In [None]:
#uncomment if necessary
#!ollama pull nomic-embed-text

In [None]:
#set the system prompt
# built from adjacent string literals so no line breaks or indentation end up inside the prompt
SYSTEM_PROMPT = (
    "You are a helpful reading assistant who answers questions "
    "based on snippets of text provided in context. Answer only using the context provided, "
    "being as concise as possible. If you're unsure, just say that you don't know."
)

## 2. Get the text document and split into paragraphs (chunks of text)

In this case, we simply use .txt files from the Project Gutenberg website (www.gutenberg.org).

In [None]:
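# function to open a file and return its paragraphs
# a paragraph is a run of non-empty lines; blank lines separate paragraphs
def parse_file(filename):
    paragraphs = []
    buffer = []  # lines of the paragraph currently being collected
    # utf-8-sig drops the byte order mark that some Gutenberg files start with
    with open(filename, encoding="utf-8-sig") as f:
        for line in f:
            line = line.strip()
            if line:
                buffer.append(line)
            elif buffer:
                # blank line: close the current paragraph
                paragraphs.append(" ".join(buffer))
                buffer = []
    # the file may end without a trailing blank line
    if buffer:
        paragraphs.append(" ".join(buffer))
    return paragraphs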

In [None]:
# open file as provided in the GitHub repo
filename = "peter-pan.txt"
paragraphs = parse_file(filename)
print(paragraphs[0:3]) #print first 3 paragraphs
print(f'Total number of paragraphs: {len(paragraphs)}')
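
To see what the paragraph splitting does on a small input, here is a minimal sketch that applies the same blank-line rule to an in-memory string instead of a file. The helper name `split_paragraphs` and the sample text are made up for illustration; they are not part of the original notebook.

```python
def split_paragraphs(text):
    """Split text into paragraphs on blank lines (hypothetical helper)."""
    paragraphs = []
    buffer = []  # lines of the paragraph currently being collected
    for line in text.splitlines():
        line = line.strip()
        if line:
            buffer.append(line)
        elif buffer:
            paragraphs.append(" ".join(buffer))
            buffer = []
    if buffer:
        paragraphs.append(" ".join(buffer))
    return paragraphs


# two paragraphs separated by two blank lines; the first spans two lines
sample = "First line\nstill first paragraph\n\n\nSecond paragraph\n"
chunks = split_paragraphs(sample)
print(chunks)
```

Lines inside a paragraph are joined with a single space, and consecutive blank lines do not produce empty chunks.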

## 3. Embeddings

In [None]:
# functions to save, load and get the embeddings
def save_embeddings(filename, embeddings):
    # create dir if it doesn't exist
    os.makedirs("embeddings", exist_ok=True)
    # dump embeddings to json
    with open(f"embeddings/{filename}.json", "w") as f:
        json.dump(embeddings, f)

def load_embeddings(filename):
    # check if file exists
    if not os.path.exists(f"embeddings/{filename}.json"):
        return False
    # load embeddings from json
    with open(f"embeddings/{filename}.json", "r") as f:
        return json.load(f)

def get_embeddings(filename, modelname, chunks):
    # check if embeddings are already saved
    if (embeddings := load_embeddings(filename)) is not False:
        return embeddings
    # get embeddings from ollama
    embeddings = [
        ollama.embeddings(model=modelname, prompt=chunk)["embedding"]
        for chunk in chunks
    ]
    # save embeddings
    save_embeddings(filename, embeddings)
    return embeddings

In [None]:
#get the embeddings
embeddings = get_embeddings(filename, "nomic-embed-text", paragraphs)
len(embeddings) #should be same number as paragraphs
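
Note that the cache is looked up by the name passed as `filename` only. If you switch to another embedding model or edit the document, the saved file is still found and the old embeddings are returned. One possible way around this, sketched below under assumptions, is to derive the cache name from the file name, the model name and a hash of the chunks. The helper `cache_name` is hypothetical and not part of the original notebook.

```python
import hashlib


def cache_name(filename, modelname, chunks):
    """Build a cache name from file name, model name and chunk contents (hypothetical helper)."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk.encode("utf-8"))
        digest.update(b"\0")  # separator, so ["ab", "c"] hashes differently from ["a", "bc"]
    # model names can contain ':' or '/', which are awkward in file names
    safe_model = modelname.replace(":", "_").replace("/", "_")
    return f"{filename}.{safe_model}.{digest.hexdigest()[:12]}"


name_a = cache_name("peter-pan.txt", "nomic-embed-text", ["chunk one", "chunk two"])
name_b = cache_name("peter-pan.txt", "nomic-embed-text", ["chunk one", "chunk 2"])
print(name_a)
print(name_a != name_b)  # a changed chunk gives a different cache name
```

The result could then be passed as the first argument of `get_embeddings` in place of the plain file name.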

## 4. Set the prompt, create prompt embeddings, do similarity search

In [None]:
#set the prompt
prompt = "Tell me about Tinker Bell."

In [None]:
# Create the prompt embeddings. Use the same embedding model!
prompt_embedding = ollama.embeddings(model="nomic-embed-text", prompt=prompt)["embedding"]

In [None]:
# find cosine similarity of every chunk to a given embedding
def find_most_similar(needle, haystack):
    needle_norm = norm(needle)
    similarity_scores = [
        np.dot(needle, item) / (needle_norm * norm(item)) for item in haystack
    ]
    return sorted(zip(similarity_scores, range(len(haystack))), reverse=True)
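
The score computed here is the cosine similarity: the dot product of two vectors divided by the product of their lengths. It is 1 for vectors pointing in the same direction, 0 for perpendicular vectors and -1 for opposite vectors, regardless of their lengths. The sketch below spells out the arithmetic in plain Python on two-dimensional toy vectors; the function name `cosine_similarity` and the vectors are chosen for illustration only.

```python
import math


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    length_a = math.sqrt(sum(x * x for x in a))
    length_b = math.sqrt(sum(y * y for y in b))
    return dot / (length_a * length_b)


query = [1.0, 0.0]
same_direction = [2.0, 0.0]  # longer, but same direction as query
perpendicular = [0.0, 3.0]
opposite = [-1.0, 0.0]

print(cosine_similarity(query, same_direction))
print(cosine_similarity(query, perpendicular))
print(cosine_similarity(query, opposite))
```

Real embeddings have hundreds of dimensions, but the calculation is the same.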

In [None]:
# find paragraphs most similar to the prompt
most_similar_chunks = find_most_similar(prompt_embedding, embeddings)[:5]
most_similar_chunks
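
The result is a list of `(score, index)` pairs, which is hard to judge by eye. A small formatting helper can show the score next to the start of each retrieved paragraph. This is only a sketch: `format_hits`, the toy chunks and the toy scores are invented for illustration.

```python
def format_hits(hits, chunks, width=60):
    """Return one line per hit: score, chunk index and the first `width` characters (hypothetical helper)."""
    lines = []
    for score, index in hits:
        preview = chunks[index][:width]
        lines.append(f"{score:.3f}  [{index}]  {preview}")
    return lines


# made-up chunks and scores, in the same (score, index) shape as the search result
toy_chunks = ["Peter flew in.", "Wendy sewed the shadow.", "The crocodile ticked."]
toy_hits = [(0.82, 1), (0.41, 0)]
for line in format_hits(toy_hits, toy_chunks):
    print(line)
```

In the notebook the same helper could be called with `most_similar_chunks` and `paragraphs`.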

## 5. Get a response from the LLM

In [None]:
#set your model here!
model = "mistral"

In [None]:
response = ollama.chat(
    model=model,
    messages=[
        {
            "role": "system",
            # system prompt, a blank line, then the retrieved paragraphs as context
            "content": SYSTEM_PROMPT
            + "\n\n"
            + "\n".join(paragraphs[item[1]] for item in most_similar_chunks),
        },
        {"role": "user", "content": prompt},
    ],
)
print("\n\n")
print(response["message"]["content"])
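
Assembling the messages can also be factored into a small function, which makes it easy to inspect exactly what the model receives before calling it. The sketch below is one possible layout, not the original author's: the function name `build_messages` and the `Context:` label are assumptions, and the example strings are placeholders.

```python
def build_messages(system_prompt, context_chunks, question):
    """Combine system prompt, retrieved chunks and user question into chat messages (hypothetical helper)."""
    # blank lines between chunks keep paragraph boundaries visible to the model
    context = "\n\n".join(context_chunks)
    return [
        {"role": "system", "content": f"{system_prompt}\n\nContext:\n{context}"},
        {"role": "user", "content": question},
    ]


messages = build_messages(
    "Answer only from the context.",
    ["Chunk A.", "Chunk B."],
    "Who is A?",
)
print(messages[0]["content"])
```

The returned list has the same shape as the `messages` argument passed to `ollama.chat` above.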