## **Baby steps: Introduction to Retrieval-Augmented Generation (RAG)**

Retrieval-Augmented Generation (RAG) combines **retrieval of relevant information** with **generative language models**. Relevant passages are fetched first and passed to the model as context, which helps it give more accurate, factual, and context-aware responses.

### Objective

In this lesson, we’ll walk through the core idea of RAG by:
- Exploring token counting
- Creating embeddings for a question and a document
- Calculating their similarity using cosine similarity


### 1. **Understanding the Input**
We begin with two simple text snippets:

In [1]:
question = "What kinds of pets do I like?"
document = "My favorite pet is a cat."

### 2. **Counting Tokens**

Large Language Models (LLMs) work with *tokens* (sub-word units produced by a tokenizer), not raw text. Knowing how many tokens your input contains matters because context windows and API pricing are both measured in tokens.


In [2]:
import tiktoken

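def num_tokens_from_string(string: str, encoding_name: str) -> int:
    """Return the number of tokens in `string` under the named tiktoken encoding."""
    encoding = tiktoken.get_encoding(encoding_name)
    return len(encoding.encode(string))

# "cl100k_base" is the encoding used by the GPT-4 and GPT-3.5-turbo models
num_tokens = num_tokens_from_string(question, "cl100k_base")
print("Number of tokens:", num_tokens)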

### 3. **Creating Embeddings**

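Embeddings are vector representations of text that capture the meaning of the input. We use the `all-MiniLM-L6-v2` sentence-transformers model through LangChain's `HuggingFaceEmbeddings` wrapper, which maps each text to a 384-dimensional vector.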

In [3]:
from langchain_huggingface import HuggingFaceEmbeddings

embedding = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
query_result = embedding.embed_query(question)
document_result = embedding.embed_query(document)

In [None]:
# View embeddings

document_result

[-0.007852751761674881,
 -0.01840461790561676,
 0.060350432991981506,
 0.01421717181801796,
 -0.09984562546014786,
 ...]

(Output truncated.)

### 4. **Measuring Similarity**

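We then measure how similar the question is to the document using **cosine similarity**, the cosine of the angle between the two embedding vectors. Scores range from -1 to 1. *A cosine similarity close to 1 means the vectors point in nearly the same direction, so the question and document are very similar in meaning*, while a score near 0 means they are largely unrelated.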
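
For reference, this is the standard definition written out for two vectors $\mathbf{a}$ and $\mathbf{b}$ of length $n$. The symbols are introduced here for illustration and do not appear in the notebook:

```latex
\[
\cos(\theta)
  = \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert}
  = \frac{\sum_{i=1}^{n} a_i b_i}
         {\sqrt{\sum_{i=1}^{n} a_i^{2}} \; \sqrt{\sum_{i=1}^{n} b_i^{2}}}
\]
```

The numerator is the dot product and the denominator is the product of the two vector magnitudes, so the result depends only on direction and not on vector length.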


In [6]:
import numpy as np 

def cosine_similarity(vec1, vec2):
    # Compute the dot product (numerator of cosine similarity)
    dot_product = np.dot(vec1, vec2)
    
    # Compute the magnitude of each vector using Linear Algebra (linalg)
    norm_vec1 = np.linalg.norm(vec1)
    norm_vec2 = np.linalg.norm(vec2)
    
    # Divide the dot product by the product of the magnitudes
    return dot_product / (norm_vec1 * norm_vec2)
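
Before applying the function to real embeddings, it can help to check it on vectors whose geometry is obvious. The sketch below repeats a compact version of the same formula so that it runs on its own. The 2-D vectors and the variable names are hypothetical, chosen only for illustration.

```python
import numpy as np

def cosine_similarity(vec1, vec2):
    # Same formula as the notebook cell: dot product over product of magnitudes
    return np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))

# Hypothetical toy vectors (not real embeddings)
same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])   # parallel vectors
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])       # perpendicular vectors
opposite = cosine_similarity([1.0, 2.0], [-1.0, -2.0])       # reversed direction

print("same direction:", same_direction)
print("orthogonal:", orthogonal)
print("opposite:", opposite)
```

Up to floating-point rounding, the three scores should come out at 1, 0, and -1, which are the two ends and the midpoint of the cosine similarity range.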

#### Calculate the cosine similarity between the embedded question and document


In [None]:
similarity = cosine_similarity(query_result, document_result)

print("Cosine Similarity:", similarity)

Cosine Similarity: 0.5782657054688973


##### **Explanation:**
The cosine similarity score of **0.578** shows that the question *"What kinds of pets do I like?"* and the document *"My favorite pet is a cat."* are **moderately similar in meaning**. Both talk about **pets**, and the document is a partial answer to the question. In a retrieval system like RAG, scores of this kind are used to rank candidate documents so that the best **semantic matches** are passed to the language model as context.
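
To show how such scores drive retrieval, here is a minimal sketch that ranks several documents against one query and keeps the top matches. It is an illustration under stated assumptions, not part of the original notebook: `rank_documents` is a hypothetical helper, and the 3-D vectors are made-up stand-ins for real embeddings.

```python
import numpy as np

def cosine_similarity(vec1, vec2):
    # Dot product divided by the product of the vector magnitudes
    return np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))

def rank_documents(query_vec, doc_vecs, top_k=2):
    """Hypothetical helper: return (index, score) pairs for the top_k closest documents."""
    scores = [cosine_similarity(query_vec, d) for d in doc_vecs]
    # argsort is ascending, so reverse it to put the highest score first
    order = np.argsort(scores)[::-1][:top_k]
    return [(int(i), float(scores[i])) for i in order]

# Made-up 3-D vectors standing in for real embeddings (assumption for illustration)
query_vec = np.array([0.9, 0.1, 0.0])
doc_vecs = [
    np.array([0.8, 0.2, 0.1]),  # doc 0: points almost the same way as the query
    np.array([0.0, 0.1, 0.9]),  # doc 1: nearly orthogonal to the query
    np.array([0.5, 0.5, 0.0]),  # doc 2: partially aligned with the query
]

top = rank_documents(query_vec, doc_vecs, top_k=2)
for index, score in top:
    print(f"doc {index}: {score:.3f}")
```

In a full RAG pipeline the texts behind the top-ranked indices would be inserted into the prompt. With many documents, a vector index usually replaces the linear scan used here.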

### Summary

| Concept | What It Does |
|--------|--------------|
| Tokenization | Breaks input into model-understandable pieces |
| Embedding | Converts text into vector form |
| Cosine Similarity | Measures closeness of meaning |