### Create and run a local RAG pipeline from scratch

The goal of this notebook is to build a RAG (Retrieval Augmented Generation) pipeline from scratch and have it run on a local machine.

W we'd like to be able to open a PDF file, ask questio ) of it and have them answered by a Large Language Model (LLM).

There are framewofor this workflow such asding LlamaIndex and LangChain, however, the goal of building from scratch is to be able to inspect and customize all the parts.


### What is RAG and why do we use it

RAG stands for Retrieval Augmented Generation.
This just means "when given a prompt, search relevant sources for the answer to that prompt and give an answer"
"Here's a breakdown of each step:

Retrieval - Get relevant resources given a query. For example, if the query is "what are the macronutrients?" the ideal results will contain information about protein, carbohydrates and fats (and possibly alcohol) rather than information about which tractors are the best for farming (though that is also cool information).

Augmentation - LLMs are capable of generating text given a prompt. However, this generated text is designed to look right. And it often has some correct information, however, they are prone to hallucination (generating a result that looks like legit text but is factually wrong). In augmentation, we pass relevant information into the prompt and get an LLM to use that relevant information as the basis of its generation

.
Generation - This is where the LLM will generate a response that has been flavoured/augmented with the retrieved resources. In turn, this not only gives us a potentially more correct answer, it also gives us resources to investigate more (since we know which resources went into the prompt). prompt)..


In [12]:
import os

# Get PDF document
pdf_path = "H:/mongol_empire.pdf"

# What is a token 
A sub-word piece of text. For example, "hello, world!" could be split into ["hello", ",", "world", "!"]. A token can be a whole word,
part of a word or group of punctuation characters. 1 token ~= 4 characters in English, 100 tokens ~= 75 words.
Text gets broken into tokens before being passed to an LLM. see: https://help.openai.com/en/articles/4936856-what-are-tokens-and-how-to-count-them

### Token Count and why we care?
When we pass these tokens to our LLMs and embeddings, there needs to be a size limit to how many tokens we pass

In [101]:
# Requires !pip install PyMuPDF, see: https://github.com/pymupdf/pymupdf
!pip install PyMuPDF
import fitz 
from tqdm.auto import tqdm # for progress bars, requires !pip install tqdm 

def text_formatter(text: str) -> str:
    """Performs minor formatting on text."""
    cleaned_text = text.replace("\n", " ").strip() # note: this might be different for each doc (best to experiment)

    # Other potential text formatting functions can go here
    return cleaned_text

# Open PDF and get lines/pages
# Note: this only focuses on text, rather than images/figures etc
def open_and_read_pdf(pdf_path: str) -> list[dict]:
    """
    Opens a PDF file, reads its text content page by page, and lists stats.

    Parameters:
        pdf_path (str): The file path to the PDF document to be opened and read.

    Returns:
        list[dict]: A list of dictionaries, each containing the page number
        (adjusted), character count, word count, sentence count, token count, and the extracted text
        for each page.
    """
    doc = fitz.open(pdf_path)  # open a document
    pages_and_texts = []
    for page_number, page in tqdm(enumerate(doc)):  # iterate the document pages
        text = page.get_text()  # get plain text encoded as UTF-8
        text = text_formatter(text)
        pages_and_texts.append({"page_number": page_number,  # adjust page numbers since our PDF starts on page 42
                                "page_char_count": len(text),
                                "page_word_count": len(text.split(" ")),
                                "page_sentence_count_raw": len(text.split(". ")),
                                "page_token_count": len(text) / 4,  # 1 token = ~4 chars, see: https://help.openai.com/en/articles/4936856-what-are-tokens-and-how-to-count-them
                                "text": text})
    return pages_and_texts

pages_and_texts = open_and_read_pdf(pdf_path=pdf_path)
pages_and_texts[:2]



0it [00:00, ?it/s]

[{'page_number': 0,
  'page_char_count': 1999,
  'page_word_count': 344,
  'page_sentence_count_raw': 13,
  'page_token_count': 499.75,
  'text': 'Mongol empire, empire founded by Genghis Khan in 1206. Originating from the Mongol heartland in the Steppe of central Asia, by the late 13th century it spanned from the Pacific Ocean in the east to the Danube River and the shores of the Persian Gulf in the west. At its peak, it covered some 9 million square miles (23 million square km) of territory, making it the largest contiguous land empire in world history The year 1206, when Temüjin, son of Yesügei, was elected Genghis Khan of a federation of tribes on the banks of the Onon River, must be regarded as the beginning of the Mongol empire. This federation not only consisted of Mongols in the proper sense—that is, Mongol-speaking tribes—but also other Turkic tribes. Before 1206 Genghis Khan was but one of the tribal leaders fighting for supremacy in the steppe regions south and southeast of 

### Splitting Pages Into Sentences

Easier to handle than larger pages of text, especially if pages are densely filled with text.
There is no one best way of processing text before embedding and there are multiple ways to do it

A simple method I've found helpful is to break the text into chunks. As in, chunk a page of text into groups of 10 or more sentences (this value is variable and can be changed to fit your needs).

We can use an NLP library such as spaCy to do this 



In [16]:
!pip install -U pip setuptools wheel
!pip install -U spacy
-m pip install -U pip setuptools wheel

Collecting pip
  Downloading pip-24.0-py3-none-any.whl.metadata (3.6 kB)
Collecting setuptools
  Downloading setuptools-69.2.0-py3-none-any.whl.metadata (6.3 kB)
Collecting wheel
  Using cached wheel-0.43.0-py3-none-any.whl.metadata (2.2 kB)
Downloading pip-24.0-py3-none-any.whl (2.1 MB)
   ---------------------------------------- 0.0/2.1 MB ? eta -:--:--
   ---------------------------------------- 0.0/2.1 MB ? eta -:--:--
    --------------------------------------- 0.0/2.1 MB 495.5 kB/s eta 0:00:05
   -- ------------------------------------- 0.1/2.1 MB 1.0 MB/s eta 0:00:02
   ---- ----------------------------------- 0.3/2.1 MB 1.6 MB/s eta 0:00:02
   -------- ------------------------------- 0.5/2.1 MB 2.3 MB/s eta 0:00:01
   ----------- ---------------------------- 0.6/2.1 MB 2.4 MB/s eta 0:00:01
   --------------- ------------------------ 0.8/2.1 MB 2.8 MB/s eta 0:00:01
   ----------------------- ---------------- 1.2/2.1 MB 3.6 MB/s eta 0:00:01
   ---------------------------------- -

ERROR: To modify pip, please run the following command:
C:\Users\user\anaconda3_2\python.exe -m pip install -U pip setuptools wheel


Collecting spacy
  Downloading spacy-3.7.4-cp311-cp311-win_amd64.whl.metadata (27 kB)
Collecting spacy-legacy<3.1.0,>=3.0.11 (from spacy)
  Downloading spacy_legacy-3.0.12-py2.py3-none-any.whl.metadata (2.8 kB)
Collecting spacy-loggers<2.0.0,>=1.0.0 (from spacy)
  Downloading spacy_loggers-1.0.5-py3-none-any.whl.metadata (23 kB)
Collecting murmurhash<1.1.0,>=0.28.0 (from spacy)
  Downloading murmurhash-1.0.10-cp311-cp311-win_amd64.whl.metadata (2.0 kB)
Collecting cymem<2.1.0,>=2.0.2 (from spacy)
  Downloading cymem-2.0.8-cp311-cp311-win_amd64.whl.metadata (8.6 kB)
Collecting preshed<3.1.0,>=3.0.2 (from spacy)
  Downloading preshed-3.0.9-cp311-cp311-win_amd64.whl.metadata (2.2 kB)
Collecting thinc<8.3.0,>=8.2.2 (from spacy)
  Downloading thinc-8.2.3-cp311-cp311-win_amd64.whl.metadata (15 kB)
Collecting wasabi<1.2.0,>=0.9.1 (from spacy)
  Downloading wasabi-1.1.2-py3-none-any.whl.metadata (28 kB)
Collecting srsly<3.0.0,>=2.4.3 (from spacy)
  Downloading srsly-2.4.8-cp311-cp311-win_amd64.

In [102]:
from spacy.lang.en import English 

nlp = English()

# Add a sentencizer pipeline
nlp.add_pipe("sentencizer")

# Create a document instance as an example
doc = nlp("This is a sentence. This another sentence.")
assert len(list(doc.sents)) == 2

# Access the sentences of the document
list(doc.sents)

[This is a sentence., This another sentence.]

In [20]:
for item in tqdm(pages_and_texts):
    item["sentences"] = list(nlp(item["text"]).sents)
    
    # Make sure all sentences are strings
    item["sentences"] = [str(sentence) for sentence in item["sentences"]]
    
    # Count the sentences 
    item["page_sentence_count_spacy"] = len(item["sentences"])

  0%|          | 0/2 [00:00<?, ?it/s]

In [28]:
import random
random.sample(pages_and_texts, k=1)

[{'page_number': 1,
  'page_char_count': 1664,
  'page_word_count': 260,
  'page_sentence_count_raw': 10,
  'page_token_count': 416.0,
  'text': 'sacked by Mongol armies (1220–21). Advance troops (after crossing the Caucasus) even penetrated into southern Russia and raided cities in Crimea (1223). The once prosperous region of Khwārezm suffered for centuries from the effects of the Mongol invasion which brought about not only the destruction of the prosperous towns but also the disintegration of the irrigation system on which agriculture in those parts depended. A similarly destructive campaign was launched against Xi Xia in 1226–27 because the Xi Xia king had refused to assist the Mongols in their expedition against Khwārezm. The death of Genghis Khan during that campaign (1227) increased the vindictiveness of the Mongols. The Xi Xia culture, a mixture of Chinese and Tibetan elements, with Buddhism as the state religion, was virtually annihilated. In 1227 the Mongol dominions stretche

# Chunking Sentences

Why do we do this?

Easier to manage similar sized chunks of text.
Don't overload the embedding models capacity for tokens (e.g. if an embedding model has a capacity of 384 tokens, there could be information loss if you try to embed a sequence of 400+ tokens).
Our LLM context window (the amount of tokens an LLM can take in) may be limited and requires compute power so we want to make sure we're using it as well as possible.


In [29]:
# Define split size to turn groups of sentences into chunks
num_sentence_chunk_size = 10 

# Create a function that recursively splits a list into desired sizes
def split_list(input_list: list, 
               slice_size: int) -> list[list[str]]:
    """
    Splits the input_list into sublists of size slice_size (or as close as possible).

    For example, a list of 17 sentences would be split into two lists of [[10], [7]]
    """
    return [input_list[i:i + slice_size] for i in range(0, len(input_list), slice_size)]

# Loop through pages and texts and split sentences into chunks
for item in tqdm(pages_and_texts):
    item["sentence_chunks"] = split_list(input_list=item["sentences"],
                                         slice_size=num_sentence_chunk_size)
    item["num_chunks"] = len(item["sentence_chunks"])

  0%|          | 0/2 [00:00<?, ?it/s]

In [30]:
random.sample(pages_and_texts, k=1)

[{'page_number': 0,
  'page_char_count': 1999,
  'page_word_count': 344,
  'page_sentence_count_raw': 13,
  'page_token_count': 499.75,
  'text': 'Mongol empire, empire founded by Genghis Khan in 1206. Originating from the Mongol heartland in the Steppe of central Asia, by the late 13th century it spanned from the Pacific Ocean in the east to the Danube River and the shores of the Persian Gulf in the west. At its peak, it covered some 9 million square miles (23 million square km) of territory, making it the largest contiguous land empire in world history The year 1206, when Temüjin, son of Yesügei, was elected Genghis Khan of a federation of tribes on the banks of the Onon River, must be regarded as the beginning of the Mongol empire. This federation not only consisted of Mongols in the proper sense—that is, Mongol-speaking tribes—but also other Turkic tribes. Before 1206 Genghis Khan was but one of the tribal leaders fighting for supremacy in the steppe regions south and southeast of 

In [31]:
import re

# Split each chunk into its own item
pages_and_chunks = []
for item in tqdm(pages_and_texts):
    for sentence_chunk in item["sentence_chunks"]:
        chunk_dict = {}
        chunk_dict["page_number"] = item["page_number"]
        
        # Join the sentences together into a paragraph-like structure, aka a chunk (so they are a single string)
        joined_sentence_chunk = "".join(sentence_chunk).replace("  ", " ").strip()
        joined_sentence_chunk = re.sub(r'\.([A-Z])', r'. \1', joined_sentence_chunk) # ".A" -> ". A" for any full-stop/capital letter combo 
        chunk_dict["sentence_chunk"] = joined_sentence_chunk

        # Get stats about the chunk
        chunk_dict["chunk_char_count"] = len(joined_sentence_chunk)
        chunk_dict["chunk_word_count"] = len([word for word in joined_sentence_chunk.split(" ")])
        chunk_dict["chunk_token_count"] = len(joined_sentence_chunk) / 4 # 1 token = ~4 characters
        
        pages_and_chunks.append(chunk_dict)

# How many chunks do we have?
len(pages_and_chunks)

  0%|          | 0/2 [00:00<?, ?it/s]

3

In [32]:
random.sample(pages_and_chunks, k=1)

[{'page_number': 1,
  'sentence_chunk': 'sacked by Mongol armies (1220–21). Advance troops (after crossing the Caucasus) even penetrated into southern Russia and raided cities in Crimea (1223). The once prosperous region of Khwārezm suffered for centuries from the effects of the Mongol invasion which brought about not only the destruction of the prosperous towns but also the disintegration of the irrigation system on which agriculture in those parts depended. A similarly destructive campaign was launched against Xi Xia in 1226–27 because the Xi Xia king had refused to assist the Mongols in their expedition against Khwārezm. The death of Genghis Khan during that campaign (1227) increased the vindictiveness of the Mongols. The Xi Xia culture, a mixture of Chinese and Tibetan elements, with Buddhism as the state religion, was virtually annihilated. In 1227 the Mongol dominions stretched over the vast regions between the Caspian and China seas, bordering in the north on the sparsely popula

## Embedding our text chunks
While humans understand text, machines understand numbers best

The most powerful thing about modern embeddings is that they are learned representations.

Meaning rather than directly mapping words/tokens/characters to numbers directly (e.g. {"a": 0, "b": 1, "c": 3...}), the numerical representation of tokens is learned by going through large corpuses of text and figuring out how different tokens relate to each other.

Ideally, embeddings of text will mean that similar meaning texts have similar numerical representati
Our goal is to turn each of our chunks into a numerical representation (an embedding vector, where a vector is a sequence of numbers arranged in order).

Once our text samples are in embedding vectors, us humans will no longer be able to understand the.

However, we don't needto.

The embedding vectors are for our computers to undertand.

We'll use our computers to find patterns in the embeddings and then we can use their text mappings to further our unders

To do so, we'll use the sentence-transformers library which contains many pre-trained embedding models.

Specifically, we'll get the all-mpnet-base-v2 model (you can see the model's intended use on the Hugging Face model card).tanding.on.
.

In [33]:
!pip install sentence-transformers

Collecting sentence-transformers
  Downloading sentence_transformers-2.6.1-py3-none-any.whl.metadata (11 kB)
Downloading sentence_transformers-2.6.1-py3-none-any.whl (163 kB)
   ---------------------------------------- 0.0/163.3 kB ? eta -:--:--
   -- ------------------------------------- 10.2/163.3 kB ? eta -:--:--
   ------- ------------------------------- 30.7/163.3 kB 640.0 kB/s eta 0:00:01
   ------------------- ------------------- 81.9/163.3 kB 762.6 kB/s eta 0:00:01
   ------------------------------------- -- 153.6/163.3 kB 1.0 MB/s eta 0:00:01
   -------------------------------------- 163.3/163.3 kB 975.0 kB/s eta 0:00:00
Installing collected packages: sentence-transformers
Successfully installed sentence-transformers-2.6.1


In [104]:
from sentence_transformers import SentenceTransformer
embedding_model = SentenceTransformer(model_name_or_path="all-mpnet-base-v2", 
                                      device="cpu") # choose the device to load the model to (note: GPU will often be *much* faster than CPU)

# Create a list of sentences to turn into numbers
sentences = [
    "The Sentences Transformers library provides an easy and open-source way to create embeddings."
]

# Sentences are encoded/embedded by calling model.encode()
embeddings = embedding_model.encode(sentences)
embeddings_dict = dict(zip(sentences, embeddings))

# See the embeddings
for sentence, embedding in embeddings_dict.items():
    print("Sentence:", sentence)
    print("Embedding:", embedding)
    print("")

Sentence: The Sentences Transformers library provides an easy and open-source way to create embeddings.
Embedding: [-2.07982566e-02  3.03164721e-02 -2.01217886e-02  6.86484799e-02
 -2.55256146e-02 -8.47687945e-03 -2.07216086e-04 -6.32377639e-02
  2.81606950e-02 -3.33353765e-02  3.02634221e-02  5.30721694e-02
 -5.03526740e-02  2.62288526e-02  3.33313718e-02 -4.51577567e-02
  3.63045111e-02 -1.37120660e-03 -1.20171243e-02  1.14947073e-02
  5.04510999e-02  4.70856875e-02  2.11913846e-02  5.14606386e-02
 -2.03746744e-02 -3.58889401e-02 -6.67769345e-04 -2.94393916e-02
  4.95859198e-02 -1.05639435e-02 -1.52014121e-02 -1.31760724e-03
  4.48197052e-02  1.56023102e-02  8.60379259e-07 -1.21393567e-03
 -2.37978958e-02 -9.09396622e-04  7.34487548e-03 -2.53931922e-03
  5.23370430e-02 -4.68043461e-02  1.66214649e-02  4.71579656e-02
 -4.15599123e-02  9.01964959e-04  3.60278338e-02  3.42214145e-02
  9.68227461e-02  5.94829284e-02 -1.64984427e-02 -3.51249203e-02
  5.92515618e-03 -7.07914936e-04 -2.4103

### Note: No matter the size of the text input to our all-mpnet-base-v2 model, it will be turned into an embedding size of (768,). This value is fixed. So whether a sentence is 1 token long or 1000 tokens long, it will be truncated/padded with zeros to size 384 and then turned into an embedding vector of size (768,). Of course, other embedding models may have different input/output shapes.

In [40]:
# Uncomment to see how long it takes to create embeddings on CPU
# Make sure the model is on the CPU
embedding_model.to("cpu")

# Embed each chunk one by one
for item in tqdm(pages_and_chunks):
    item["embedding"] = embedding_model.encode(item["sentence_chunk"])

  0%|          | 0/3 [00:00<?, ?it/s]

In [97]:
# Turn text chunks into a single list
text_chunks = [item["sentence_chunk"] for item in pages_and_chunks]
text_chunks

['Mongol empire, empire founded by Genghis Khan in 1206. Originating from the Mongol heartland in the Steppe of central Asia, by the late 13th century it spanned from the Pacific Ocean in the east to the Danube River and the shores of the Persian Gulf in the west. At its peak, it covered some 9 million square miles (23 million square km) of territory, making it the largest contiguous land empire in world history The year 1206, when Temüjin, son of Yesügei, was elected Genghis Khan of a federation of tribes on the banks of the Onon River, must be regarded as the beginning of the Mongol empire. This federation not only consisted of Mongols in the proper sense—that is, Mongol-speaking tribes—but also other Turkic tribes. Before 1206 Genghis Khan was but one of the tribal leaders fighting for supremacy in the steppe regions south and southeast of Lake Baikal; his victories over the Kereit and then the Naiman Turks, however, gave him undisputed authority over the whole of what is now Mongol

In [38]:
%%time

# Embed all texts in batches
text_chunk_embeddings = embedding_model.encode(text_chunks,
                                               batch_size=32, # you can use different batch sizes here for speed/performance, I found 32 works well for this use case
                                               convert_to_tensor=True) # optional to return embeddings as tensor instead of array

text_chunk_embeddings

CPU times: total: 7.86 s
Wall time: 1.06 s


tensor([[-0.0156, -0.0241, -0.0123,  ...,  0.0545, -0.0024, -0.0283],
        [ 0.0277, -0.0080,  0.0032,  ...,  0.0597, -0.0321,  0.0079],
        [ 0.0235, -0.0350,  0.0030,  ...,  0.0582, -0.0390, -0.0166]])

### Saving embeddings to csv for later use

In [98]:
# Save embeddings to file
import pandas as pd
text_chunks_and_embeddings_df = pd.DataFrame(pages_and_chunks)
embeddings_df_save_path = "text_chunks_and_embeddings_df.csv"
text_chunks_and_embeddings_df.to_csv(embeddings_df_save_path, index=False)

Unnamed: 0,page_number,sentence_chunk,chunk_char_count,chunk_word_count,chunk_token_count,embedding
0,0,"Mongol empire, empire founded by Genghis Khan ...",1698,295,424.5,"[-0.0156282205, -0.0241277367, -0.0123113962, ..."
1,0,In 1218 the Khara-Khitai state in east Turkist...,300,49,75.0,"[0.0276959464, -0.00797973946, 0.00317786518, ..."
2,1,sacked by Mongol armies (1220–21). Advance tro...,1664,260,416.0,"[0.0234714299, -0.0350027271, 0.00303653511, -..."


### Turning embeddings to a tensor

In [94]:
import random

import torch
import numpy as np 

device = "cpu"

# Import texts and embedding df
text_chunks_and_embedding_df = pd.read_csv("text_chunks_and_embeddings_df.csv")

# Convert embedding column back to np.array (it got converted to string when it got saved to CSV)
text_chunks_and_embedding_df["embedding"] = text_chunks_and_embedding_df["embedding"].apply(lambda x: np.fromstring(x.strip("[]"), sep=" "))

# Convert texts and embedding df to list of dicts
pages_and_chunks = text_chunks_and_embedding_df.to_dict(orient="records")

# Convert embeddings to torch tensor and send to device (note: NumPy arrays are float64, torch tensors are float32 by default)
embeddings = torch.tensor(np.array(text_chunks_and_embedding_df["embedding"].tolist()), dtype=torch.float32).to(device)
embeddings.shape

torch.Size([3, 768])

In [95]:
from sentence_transformers import util, SentenceTransformer

"""Readying Our Embedding Model"""
embedding_model = SentenceTransformer(model_name_or_path="all-mpnet-base-v2", 
                                      device=device) # choose the device to load the model to

## Similarity measures: dot product
- Measure of magnitude and direction between two vectors
- Vectors that are aligned in direction and magnitude have a higher positive value
- Vectors that are opposite in direction and magnitude have a higher negative value

In [105]:
# 1. Define the query
query = "Attack"
print(f"Query: {query}")

# 2. Embed the query to the same numerical space as the text examples 
# Note: It's important to embed your query with the same model you embedded your examples with.
query_embedding = embedding_model.encode(query, convert_to_tensor=True)

# 3. Get similarity scores with the dot product (wLet's time this to see)
from time import perf_counter as timer

start_time = timer()
dot_scores = util.dot_score(a=query_embedding, b=embeddings)[0]
end_time = timer()

print(f"Time take to get scores on {len(embeddings)} embeddings: {end_time-start_time:.5f} seconds.")

# 4. Get the top-k results (we'll keep this to 5)
top_results_dot_product = torch.topk(dot_scores, k=1)
top_results_dot_product

Query: Attack
Time take to get scores on 1 embeddings: 0.00090 seconds.


torch.return_types.topk(
values=tensor([0.0568]),
indices=tensor([0]))

In [60]:
# Define helper function to print wrapped text 
import textwrap

def print_wrapped(text, wrap_length=80):
    wrapped_text = textwrap.fill(text, wrap_length)
    print(wrapped_text)

In [106]:
print(f"Query: '{query}'\n")
print("Results:")
# Loop through zipped together scores and indicies from torch.topk
for score, idx in zip(top_results_dot_product[0], top_results_dot_product[1]):
    print(f"Score: {score:.4f}")
    # Print relevant sentence chunk (since the scores are in descending order, the most relevant chunk will be first)
    print("Text:")
    print_wrapped(pages_and_chunks[idx]["sentence_chunk"])

Query: 'Attack'

Results:
Score: 0.0568
Text:
Mongol empire, empire founded by Genghis Khan in 1206. Originating from the
Mongol heartland in the Steppe of central Asia, by the late 13th century it
spanned from the Pacific Ocean in the east to the Danube River and the shores of
the Persian Gulf in the west. At its peak, it covered some 9 million square
miles (23 million square km) of territory, making it the largest contiguous land
empire in world history The year 1206, when Temüjin, son of Yesügei, was elected
Genghis Khan of a federation of tribes on the banks of the Onon River, must be
regarded as the beginning of the Mongol empire. This federation not only
consisted of Mongols in the proper sense—that is, Mongol-speaking tribes—but
also other Turkic tribes. Before 1206 Genghis Khan was but one of the tribal
leaders fighting for supremacy in the steppe regions south and southeast of Lake
Baikal; his victories over the Kereit and then the Naiman Turks, however, gave
him undisputed au

### Running Our LLM

In [68]:
from ctransformers import AutoModelForCausalLM
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained(
    "K:/llama2_GGUF_cacheDir/models--TheBloke--Llama-2-7B-Chat-GGUF/TheBloke/Mistral-7B-OpenOrca-GGUF",
    model_file="mistral-7b-openorca.Q5_K_M.gguf",
    model_type="mistral",
    gpu_layers = 12                                                               
    )

' long and has a rhyme scheme of ABABCDCD.\n'

In [110]:
def retrieve_relevant_resources(query: str,
                                embeddings: torch.tensor,
                                model: SentenceTransformer=embedding_model,
                                n_resources_to_return: int=1,
                                print_time: bool=True):
    """
    Embeds a query with model and returns top k scores and indices from embeddings.
    """

    # Embed the query
    query_embedding = model.encode(query, 
                                   convert_to_tensor=True) 

    # Get dot product scores on embeddings
    start_time = timer()
    dot_scores = util.dot_score(query_embedding, embeddings)
    end_time = timer()

    if print_time:
        print(f"[INFO] Time taken to get scores on {len(embeddings)} embeddings: {end_time-start_time:.5f} seconds.")

    scores, indices = torch.topk(input=dot_scores, 
                                 k=n_resources_to_return)

    return scores, indices

def print_top_results_and_scores(query: str,
                                 embeddings: torch.tensor,
                                 pages_and_chunks: list[dict]=pages_and_chunks,
                                 n_resources_to_return: int=1):
    """
    Takes a query, retrieves most relevant resources and prints them out in descending order.
    """
    
    scores, indices = retrieve_relevant_resources(query=query,
                                                  embeddings=embeddings,
                                                  n_resources_to_return=n_resources_to_return)
    
    print(f"Query: {query}\n")
    print("Results:")
    # Loop through zipped together scores and indicies
    for score, index in zip(scores, indices):
        print(f"Score: {score:.4f}")
        # Print relevant sentence chunk (since the scores are in descending order, the most relevant chunk will be first)
        print_wrapped(pages_and_chunks[index]["sentence_chunk"])
        # Print the page number too so we can reference the textbook further and check the results
        print(f"Page number: {pages_and_chunks[index]['page_number']}")
        print("\n")

In [111]:
query = "Loss of territory"

# Get just the scores and indices of top related results
scores, indices = retrieve_relevant_resources(query=query,
                                              embeddings=embeddings)
scores, indices

[INFO] Time taken to get scores on 1 embeddings: 0.00012 seconds.


(tensor([[-0.0059]]), tensor([[0]]))

In [80]:
# Print out the texts of the top scores
print_top_results_and_scores(query=query,
                             embeddings=embeddings)

[INFO] Time taken to get scores on 3 embeddings: 0.00006 seconds.
Query: Loss of territory

Results:
Score: 0.2860
In 1218 the Khara-Khitai state in east Turkistan was absorbed into the empire.
The assassination of Muslim subjects of Genghis Khan by the Khwārezmians in
Otrar led to a war with the sultanate of Khwārezm (Khiva) in west Turkistan
(1219–25). Bukhara, Samarkand, and the capital Urgench were taken and
Page number: 0


Score: 0.2619
sacked by Mongol armies (1220–21). Advance troops (after crossing the Caucasus)
even penetrated into southern Russia and raided cities in Crimea (1223). The
once prosperous region of Khwārezm suffered for centuries from the effects of
the Mongol invasion which brought about not only the destruction of the
prosperous towns but also the disintegration of the irrigation system on which
agriculture in those parts depended. A similarly destructive campaign was
launched against Xi Xia in 1226–27 because the Xi Xia king had refused to assist
the Mongols 

In [112]:
context_items = [pages_and_chunks[i] for i in indices]
context = "- " + "\n- ".join([item["sentence_chunk"] for item in context_items])

In [113]:
context

'- Mongol empire, empire founded by Genghis Khan in 1206. Originating from the Mongol heartland in the Steppe of central Asia, by the late 13th century it spanned from the Pacific Ocean in the east to the Danube River and the shores of the Persian Gulf in the west. At its peak, it covered some 9 million square miles (23 million square km) of territory, making it the largest contiguous land empire in world history The year 1206, when Temüjin, son of Yesügei, was elected Genghis Khan of a federation of tribes on the banks of the Onon River, must be regarded as the beginning of the Mongol empire. This federation not only consisted of Mongols in the proper sense—that is, Mongol-speaking tribes—but also other Turkic tribes. Before 1206 Genghis Khan was but one of the tribal leaders fighting for supremacy in the steppe regions south and southeast of Lake Baikal; his victories over the Kereit and then the Naiman Turks, however, gave him undisputed authority over the whole of what is now Mongo

In [114]:
prompt = "Based on the given context, answer the following question delimited by backticks: `Where did the mongols attack first?` context : {context}".format(context = context)

In [115]:
prompt

'Based on the given context, answer the following question delimited by backticks: `Where did the mongols attack first?` context : - Mongol empire, empire founded by Genghis Khan in 1206. Originating from the Mongol heartland in the Steppe of central Asia, by the late 13th century it spanned from the Pacific Ocean in the east to the Danube River and the shores of the Persian Gulf in the west. At its peak, it covered some 9 million square miles (23 million square km) of territory, making it the largest contiguous land empire in world history The year 1206, when Temüjin, son of Yesügei, was elected Genghis Khan of a federation of tribes on the banks of the Onon River, must be regarded as the beginning of the Mongol empire. This federation not only consisted of Mongols in the proper sense—that is, Mongol-speaking tribes—but also other Turkic tribes. Before 1206 Genghis Khan was but one of the tribal leaders fighting for supremacy in the steppe regions south and southeast of Lake Baikal; h