# Local Retrieval-Augmented Generation (RAG) Basics


### What is Retrieval-Augmented Generation (RAG)?
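RAG combines retrieval (finding information relevant to a question in a collection of documents) with generation (having a language model produce a response from that information).
It is useful for question answering, creative text generation, and similar tasks.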
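
To make the two halves concrete, here is a minimal, library-free sketch of the flow. The retriever is a deliberately naive stand-in that ranks documents by the words they share with the question (the rest of this notebook replaces it with embeddings). `build_prompt` only assembles the text that would be sent to a language model; the generation call itself is not shown. The function names, the prompt template, and the toy documents are all hypothetical.

```python
import re

# Toy document collection (hypothetical example data)
DOCUMENTS = [
    "Chroma is an open-source vector database.",
    "Sentence Transformers turns sentences into embedding vectors.",
    "A sum of 7 is the most likely result when throwing two dice.",
]


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, documents: list[str], k: int = 1) -> list[str]:
    """Naive retriever (stand-in for embeddings): rank documents by words shared with the query."""
    query_words = tokenize(query)
    ranked = sorted(documents, key=lambda d: len(query_words & tokenize(d)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Augment the question with the retrieved context (hypothetical prompt template)."""
    context_block = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{context_block}\n\nQuestion: {query}"


question = "What does Sentence Transformers produce?"
prompt = build_prompt(question, retrieve(question, DOCUMENTS))
print(prompt)  # this string would then be passed to a language model for generation
```

Swapping the word-overlap retriever for embedding similarity is exactly what the following sections build up.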

# Retrieval

First, let's focus on the retrieval part.

## Embeddings Basics

We use Sentence Transformers for our embeddings.  
Sentence Transformers (a.k.a. SBERT) is the go-to Python module for accessing, using, and training state-of-the-art embedding and reranker models.  
You can check it out here:  
https://www.sbert.net/index.html 

In [None]:
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2') # a small general purpose model (80MB)
# this will download the model to your machine, might take a while

Next, let's define some sentences and encode them.
Encoding means converting our sentences into vectors (embeddings).

In [None]:
sentences = [
    "Ping me if you need anything.",
    "Let me know if you have any questions.",
    "this email thread was used to train a drone",
]

embeddings = model.encode(sentences)

When we print the shape of our embeddings, we can see that we have three sentences with 384 dimensions each:

In [None]:
print(embeddings.shape)

With our embeddings ready, we can now compute the semantic similarity between all pairs of sentences.

In [None]:
similarities = model.similarity(embeddings, embeddings)
print(similarities)
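
By default, `model.similarity` uses cosine similarity: the dot product of two vectors divided by the product of their lengths, cos(a, b) = (a · b) / (‖a‖ ‖b‖). A value of 1 means the vectors point in the same direction, and values near 0 mean they are unrelated. Below is a minimal NumPy sketch of the same computation. The function name and the small made-up vectors (standing in for real 384-dimensional embeddings) are our own assumptions.

```python
import numpy as np


def cosine_similarity_matrix(vectors: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of `vectors`."""
    # Scale every row to unit length; the dot product of unit vectors is their cosine
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / norms
    return unit @ unit.T


# Made-up 3-dimensional "embeddings" (real ones from all-MiniLM-L6-v2 have 384 dimensions)
toy_embeddings = np.array([
    [1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 2.0],
])

toy_sim = cosine_similarity_matrix(toy_embeddings)
print(np.round(toy_sim, 3))
```

Rows 0 and 1 partly share a direction (similarity 1/√2 ≈ 0.707), row 2 is orthogonal to both (similarity 0), and the diagonal is 1.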

Let's visualize the similarity matrix as a heatmap:

In [None]:
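import matplotlib.pyplot as plt

sim = similarities.cpu().numpy()  # torch tensor -> NumPy array for plotting

plt.imshow(sim, cmap="copper")
plt.colorbar()
plt.xticks(range(len(sentences)))
plt.yticks(range(len(sentences)))
# Write each similarity value into its cell (x = column j, y = row i)
for i in range(len(sentences)):
    for j in range(len(sentences)):
        plt.text(j, i, f"{sim[i, j]:.2f}", ha="center", va="center", color="white")
plt.title("Similarity Matrix")
plt.show()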

Each row and column corresponds to one sentence, and the value at position [i, j] is the similarity between sentence i and sentence j.
As you can see, sentences 0 and 1 are somewhat related, while sentence 2 is not related to either of them. The diagonal values are (approximately) 1 because each sentence is identical to itself.
Sentence 0 (“ping me”) and Sentence 1 (“let me know”) → similarity 0.435 (related).
Sentence 0 and Sentence 2 (“train a drone”) → similarity 0.1194 (unrelated).

## Chunking (splitting text)

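When working with longer texts, it's a good idea to split, or *chunk*, them into smaller parts. Embedding models only read a limited amount of text (`all-MiniLM-L6-v2` truncates input longer than 256 word pieces), and a single vector for a long text mixes many topics together, which makes precise retrieval harder.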

In [None]:
example_text = """
A measure of uncertainty of an outcome, rather than the perceived lack of order. 
A random sequence of events, symbols or steps often has no order and does not follow an intelligible pattern or combination. 
Randomness exists when some outcomes occur without any order, unpredictably, or by chance. 
These notions are distinct, but they all have a close connection to probability. 
Individual random events are unpredictable, but since they often follow a probability distribution, the frequency of different outcomes over numerous events (or “trials”) is predictable: 
when throwing two dice, the outcome of any particular roll is unpredictable, but a sum of 7 will occur twice as often as 4.
"""

We can write our own chunking functions, for example one that limits the number of characters per chunk:

In [None]:
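def chunk_text_by_length(text, max_length):
    """Split text into chunks of at most max_length characters, breaking only between words."""
    chunks = []
    current_chunk = ""

    for word in text.split():
        # The chunk as it would look with this word appended (space-separated)
        candidate = f"{current_chunk} {word}" if current_chunk else word
        if len(candidate) <= max_length:
            current_chunk = candidate
        else:
            # Close the current chunk (never emit an empty one) and start a new one.
            # A single word longer than max_length becomes its own oversized chunk.
            if current_chunk:
                chunks.append(current_chunk)
            current_chunk = word

    if current_chunk:
        chunks.append(current_chunk)

    return chunks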

In [None]:
print(chunk_text_by_length(example_text, 50))

As you can see, this is not ideal: chunks end in the middle of sentences, so a lot of meaning gets lost at the chunk boundaries.
Another approach is to split on periods, so that each chunk ideally holds a single sentence.

In [None]:
def chunk_text_by_period(text):
    sentences = text.split('.')
    chunks = []

    for sentence in sentences:
        sentence = sentence.strip()
        if sentence:
            chunks.append(sentence + '.')

    return chunks

In [None]:
print(chunk_text_by_period(example_text))

This is already an improvement, but the chunks can vary a lot in length. In many cases it is also useful to have bigger chunks, such as whole paragraphs. Since text comes in so many forms, there is no one-size-fits-all solution.
There are also ready-made chunking methods in existing libraries, which we will cover later.
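
Two of the alternatives mentioned above can be sketched in a few lines. `chunk_text_by_paragraph` splits on blank lines, and `group_sentences` packs consecutive sentences into chunks up to a character budget, repeating the last sentence of each chunk at the start of the next one (an *overlap*) so that context is not lost at the boundary. Both function names, the blank-line convention for paragraphs, and the one-sentence default overlap are our own assumptions, not a library API.

```python
import re


def chunk_text_by_paragraph(text: str) -> list[str]:
    """Split text into paragraphs separated by one or more blank lines."""
    paragraphs = re.split(r"\n\s*\n", text)
    return [p.strip() for p in paragraphs if p.strip()]


def group_sentences(sentences: list[str], max_chars: int, overlap: int = 1) -> list[str]:
    """Pack consecutive sentences into chunks of at most max_chars characters.

    The last `overlap` sentences of a chunk are repeated at the start of the next one.
    A single sentence longer than max_chars becomes its own chunk.
    """
    chunks = []
    current = []
    for sentence in sentences:
        if current and len(" ".join(current + [sentence])) > max_chars:
            chunks.append(" ".join(current))
            # Start the next chunk with the overlapping tail of the previous one
            current = current[-overlap:] if overlap else []
            # Drop the overlap if it would not leave room for the new sentence
            if current and len(" ".join(current + [sentence])) > max_chars:
                current = []
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks


demo = ["One two.", "Three four.", "Five six.", "Seven."]
print(group_sentences(demo, max_chars=25))
# ['One two. Three four.', 'Three four. Five six.', 'Five six. Seven.']
```

In the notebook, `group_sentences(chunk_text_by_period(example_text), max_chars=200)` would combine this with the period-based splitter. `example_text` contains no blank lines, so `chunk_text_by_paragraph` would return it as a single chunk.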

In [None]:
# Use the sentence-based chunks for the rest of this notebook

chunks = chunk_text_by_period(example_text)

## Vector Databases (Chroma DB)

When working with a large number of embeddings, it's a good idea to store them in a vector database that can search them efficiently. We use Chroma, an open-source search and retrieval database that you can easily run locally.  
You can check it out here: https://www.trychroma.com/ 

In [None]:
import chromadb

# Create client and collection
client = chromadb.Client()
collection = client.create_collection("example_collection")

# Add embeddings manually
collection.add(
    embeddings=embeddings,
    documents=chunks,
    ids=["chunk1", "chunk2", "chunk3", "chunk4", "chunk5"] # improve this
)

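## Similarity Search

To retrieve relevant chunks, we embed the query with the same model and ask Chroma for the stored chunks whose embeddings are closest to it.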

In [None]:
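query = "What is random?"
# Embed the query with the same model that was used for the chunks
query_embedding = model.encode(query)

results = collection.query(
    query_embeddings=[query_embedding.tolist()],
    n_results=2,  # return the two closest chunks
)

# Results are nested per query; [0] selects the results for our single query
for doc, dist in zip(results["documents"][0], results["distances"][0]):
    print(f"{dist:.3f}  {doc}")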
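
Conceptually, `collection.query` returns the stored chunks whose embeddings are closest to the query embedding. Chroma uses an approximate HNSW index to do this quickly at scale. For a handful of chunks, the same ranking can be sketched as a brute-force search. The sketch below uses cosine similarity on made-up 2-dimensional vectors, and the function name is hypothetical. Chroma's default distance is squared L2 rather than cosine, but for unit-length vectors the squared L2 distance equals 2 - 2 * cosine, so both produce the same ranking.

```python
import numpy as np


def top_k_cosine(query_vec: np.ndarray, stored_vecs: np.ndarray, k: int) -> list[tuple[int, float]]:
    """Brute-force search: (index, cosine similarity) of the k stored rows closest to query_vec."""
    stored_unit = stored_vecs / np.linalg.norm(stored_vecs, axis=1, keepdims=True)
    query_unit = query_vec / np.linalg.norm(query_vec)
    scores = stored_unit @ query_unit   # one cosine similarity per stored vector
    best = np.argsort(-scores)[:k]      # indices sorted from most to least similar
    return [(int(i), float(scores[i])) for i in best]


# Made-up 2-dimensional "chunk embeddings" and a query vector
stored_vecs = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
query_vec = np.array([1.0, 0.2])

for index, score in top_k_cosine(query_vec, stored_vecs, k=2):
    print(index, round(score, 3))
```

The query points mostly along the first axis, so chunk 0 ranks first and the diagonal chunk 2 second. Brute force compares the query against every stored vector, which is why larger collections rely on an index instead.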

In [None]:
# Next Steps: Parsing PDF & Using LangChain (covered later)
