---

## 📚 Step 1: Install Required Libraries

### What Does This Code Do?
Installs all the Python libraries we need to build our RAG system.

### 🤖 How to Ask the Jupyter Agent:
```
/generate Install the libraries I need to build a RAG system: langchain, chromadb, pypdf, openai, and sentence-transformers
```

### Explanation of Each Library:
| Library | What Is It For? |
|---------|----------------|
| `langchain` | Main framework for working with LLMs |
| `langchain-community` | Connectors for ChromaDB and other services |
| `chromadb` | Vector database (stores the embeddings) |
| `pypdf` | Read PDF files |
| `openai` | Client to connect with LM Studio |
| `sentence-transformers` | Embedding models |

In [28]:
# Install all required libraries
# The -q flag means "quiet" (less output)

!pip install -q langchain langchain-community chromadb pypdf openai sentence-transformers tiktoken

---

## 📄 Step 2: Load the PDF Document

### What Does This Code Do?
Reads a PDF file and extracts all the text it contains. Each page becomes a separate "document".

### 🤖 How to Ask the Jupyter Agent:
```
/generate Load a PDF file called "Company Policies.pdf" that is in the Docs folder
```

### Key Concepts:
- **Document Loader**: Tool that reads files and converts them into objects that LangChain can use
- **docs**: List of documents, where each document contains the content of one page

In [29]:
from langchain_community.document_loaders import PyPDFLoader
import os

# First, let's check where we are
print(f"📂 Current directory: {os.getcwd()}")

# Use absolute paths to be safe (defined once and reused below)
docs_folder = "/home/user/RAG Course Enhaced/Docs"
file_name = os.path.join(docs_folder, "Company Policies.pdf")

# Verify the file exists
if os.path.exists(file_name):
    print(f"✅ File found: {file_name}")
else:
    print(f"❌ File not found: {file_name}")
    print("Available files in Docs folder:")
    if os.path.exists(docs_folder):
        for f in os.listdir(docs_folder):
            print(f"  - {f}")

# Create the PDF loader
loader = PyPDFLoader(file_name)

# Load the document (this reads all pages)
docs = loader.load()

# Show how many pages were loaded
print(f"✅ Loaded {len(docs)} pages from the PDF")

✅ Loaded 8 pages from the PDF
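
The hard-coded absolute path in the cell above only works on the machine where the notebook was written. As a minimal sketch, assuming the notebook is started from a folder that contains `Docs/`, the path can be resolved with `pathlib` instead. The helper name `find_pdf` is hypothetical and is not part of LangChain.

```python
from pathlib import Path


def find_pdf(folder: str, name: str) -> Path:
    """Return the absolute path to `name` inside `folder`.

    Raises FileNotFoundError listing the PDFs that do exist,
    so a typo in the file name is easy to spot.
    """
    folder_path = Path(folder)
    pdf_path = (folder_path / name).resolve()
    if pdf_path.is_file():
        return pdf_path

    # Collect the PDF names that are actually present (empty if the folder is missing)
    available = sorted(p.name for p in folder_path.glob("*.pdf")) if folder_path.is_dir() else []
    raise FileNotFoundError(f"{pdf_path} not found. PDFs in {folder!r}: {available}")
```

The returned path could then be passed to the loader as `PyPDFLoader(str(find_pdf("Docs", "Company Policies.pdf")))`.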


### 🔍 Let's Explore What a Document Contains

Let's look at the first page to understand the structure:

In [30]:
# Let's see the content of the first page
print("=" * 50)
print("CONTENT OF THE FIRST PAGE:")
print("=" * 50)
print(docs[0].page_content[:500])  # First 500 characters
print("\n...")
print("\n📋 Metadata:", docs[0].metadata)

CONTENT OF THE FIRST PAGE:
COMPANY POLICIES 
Employee Handbook 
TABLE OF CONTENTS 
1. Introduction and Purpose 
2. Code of Conduct 
3. Attendance and Punctuality 
4. Leave Policy 
5. Workplace Health and Safety 
6. Anti-Harassment and Non-Discrimination 
7. Dress Code 
8. Conflict of Interest 
9. Disciplinary Procedures 
10. Grievance Procedures 
11. Employee Benefits Overview

...

📋 Metadata: {'producer': 'PyPDF', 'creator': 'Microsoft Word', 'creationdate': '2025-12-17T10:11:12-08:00', 'title': '(anonymous)', 'author': '(anonymous)', 'subject': '(unspecified)', 'moddate': '2025-12-17T10:11:12-08:00', 'source': './Docs/Company Policies.pdf', 'total_pages': 8, 'page': 0, 'page_label': '1'}


---

## ✂️ Step 3: Split the Text into Chunks

### Why Do We Need to Split the Text?

Embedding models can only process a limited number of tokens at once (their **token limit**). Text beyond that limit is typically cut off, so the information it carries never makes it into the embedding.

### What Is a "Chunk"?
A chunk is a fragment of the original document. We split the text into smaller chunks to:
1. Fit within the embedding model's limit
2. Make searches more precise (find exactly the relevant part)
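
To get a feel for how far a chunk is from a model's limit, a rough estimate is often enough. The sketch below assumes about 4 characters per token, a common rule of thumb for English text and not the embedding model's real tokenizer. `estimate_tokens` and `fits_limit` are hypothetical helper names.

```python
# Assumption: ~4 characters per token on average for English text.
# A real tokenizer will give different numbers, so treat this as a ballpark only.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Rough token count: character count divided by CHARS_PER_TOKEN, rounded up."""
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division


def fits_limit(text: str, token_limit: int) -> bool:
    """True if the estimated token count is within `token_limit`."""
    return estimate_tokens(text) <= token_limit


sample_chunk = "x" * 500  # a chunk at the 500-character maximum
print(estimate_tokens(sample_chunk))
print(fits_limit(sample_chunk, token_limit=8192))
```

Under this assumption, a 500-character chunk sits far below a limit of several thousand tokens, so in this notebook the smaller chunk size mainly serves search precision.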

### 🤖 How to Ask the Jupyter Agent:
```
/generate Split the docs variable into smaller chunks using langchain. Use 500 characters per chunk with 50 character overlap.
```

### Important Parameters:
| Parameter | Value | Explanation |
|-----------|-------|-------------|
| `chunk_size` | 500 | Maximum characters per chunk |
| `chunk_overlap` | 50 | Characters that repeat between consecutive chunks |

### Why Do We Use Overlap?
Overlap reduces the risk of cutting ideas in half. When a sentence falls on the boundary between two chunks, the overlap repeats the text around that boundary, so a short sentence there can still appear complete in at least one chunk.
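
The effect of `chunk_size` and `chunk_overlap` is easiest to see on a tiny example. The sketch below is a simplified fixed-width sliding window written for illustration. It is not how `RecursiveCharacterTextSplitter` works internally, since that splitter prefers to break on separators such as paragraphs, lines, and spaces. The function name `split_with_overlap` is hypothetical.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut `text` into windows of `chunk_size` characters.

    Each window starts `chunk_size - chunk_overlap` characters after the
    previous one, so consecutive windows share `chunk_overlap` characters.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


text = "Employees must request leave in advance. Approval comes from a supervisor."
pieces = split_with_overlap(text, chunk_size=30, chunk_overlap=10)
for piece in pieces:
    print(repr(piece))
```

The last 10 characters of each printed piece are the first 10 characters of the next one.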

In [31]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Create the text splitter
# chunk_size=500: each fragment will have a maximum of 500 characters
# chunk_overlap=50: fragments overlap by 50 characters
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=50,
    length_function=len
)

# Split the documents into chunks
chunks = text_splitter.split_documents(docs)

# Show how many chunks were created
print(f"✅ Created {len(chunks)} chunks from {len(docs)} pages")

✅ Created 32 chunks from 8 pages


### 🔍 Let's See Some Example Chunks

In [32]:
# Let's see the first chunk
print("FIRST CHUNK:")
print("-" * 40)
print(chunks[0].page_content)
print(f"\n📏 Length: {len(chunks[0].page_content)} characters")

FIRST CHUNK:
----------------------------------------
COMPANY POLICIES 
Employee Handbook 
TABLE OF CONTENTS 
1. Introduction and Purpose 
2. Code of Conduct 
3. Attendance and Punctuality 
4. Leave Policy 
5. Workplace Health and Safety 
6. Anti-Harassment and Non-Discrimination 
7. Dress Code 
8. Conflict of Interest 
9. Disciplinary Procedures 
10. Grievance Procedures 
11. Employee Benefits Overview

📏 Length: 352 characters


In [33]:
# Let's see the last chunk
print("LAST CHUNK:")
print("-" * 40)
print(chunks[-1].page_content)
print(f"\n📏 Length: {len(chunks[-1].page_content)} characters")

LAST CHUNK:
----------------------------------------
company property may result in deductions from final pay as permitted by law. 
11.8 Final Pay 
Final pay will be processed in accordance with applicable state and federal laws. This includes 
payment for all hours worked, accrued but unused vacation time as applicable, and any other earned 
compensation. Information about benefit continuation options will be provided.

📏 Length: 370 characters


---

## 🧮 Step 4: Create the Embedding Function

### What Are Embeddings?

**Embeddings** are numerical representations of text. They convert words and sentences into vectors (lists of numbers) that capture semantic meaning.

**Simple Example:**
- "dog" and "puppy" will have similar embeddings (they're close in vector space)
- "dog" and "mathematics" will have very different embeddings (they're far apart)

### What Model Will We Use?

We'll use **nomic-embed-text-v1.5** through LM Studio. This model:
- Is open source and free
- Has good performance for English and Spanish texts
- Can process up to 8192 tokens

### 🤖 How to Ask the Jupyter Agent:
```
/generate Create an embedding function that connects to LM Studio running on localhost port 1234. Use the nomic-embed-text model.
```

### ⚠️ Important:
Make sure you have the `nomic-embed-text-v1.5` model loaded in LM Studio before running this code.

In [34]:
import requests
from typing import List
from langchain_core.embeddings import Embeddings

class LMStudioEmbeddings(Embeddings):
    """Custom embedding class for LM Studio compatibility."""
    
    def __init__(self, base_url: str, model: str):
        self.base_url = base_url
        self.model = model
        self.url = f"{base_url}/embeddings"
    
    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        """Embed a list of documents."""
        embeddings = []
        for text in texts:
            response = requests.post(
                self.url,
                json={"input": text, "model": self.model}
            )
            response.raise_for_status()
            embeddings.append(response.json()["data"][0]["embedding"])
        return embeddings
    
    def embed_query(self, text: str) -> List[float]:
        """Embed a single query."""
        response = requests.post(
            self.url,
            json={"input": text, "model": self.model}
        )
        response.raise_for_status()
        return response.json()["data"][0]["embedding"]

# Configure the embedding function to connect with LM Studio
embedding_function = LMStudioEmbeddings(
    base_url="http://127.0.0.1:1234/v1",
    model="nomic-embed-text-v1.5"
)

print("✅ Embedding function configured successfully")

✅ Embedding function configured successfully


### 🧪 Let's Test the Embeddings

Let's see how embeddings work with a simple example:

In [35]:
# Test with a simple sentence
test_text = "I love programming in Python"

# Get the embedding
embedding = embedding_function.embed_query(test_text)

# Show embedding information
print(f"📝 Text: '{test_text}'")
print(f"📊 Embedding dimensions: {len(embedding)}")
print(f"🔢 First 5 values: {embedding[:5]}")

📝 Text: 'I love programming in Python'
📊 Embedding dimensions: 768
🔢 First 5 values: [-0.016399774700403214, 0.08437343686819077, -0.12448632717132568, -0.020135633647441864, 0.02330748550593853]


### 🎯 Let's See the Similarity Between Texts

Embeddings allow us to measure how similar two texts are:

In [36]:
import numpy as np

def cosine_similarity(vec1, vec2):
    """Calculate cosine similarity between two vectors.
    Returns a value between -1 and 1, where 1 = identical."""
    vec1 = np.array(vec1)
    vec2 = np.array(vec2)
    return np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))

# Test texts
texts = [
    "I really enjoy coffee in the morning",
    "I like drinking coffee when I wake up",
    "Programming is my passion",
    "Cats are independent animals"
]

# Get the embeddings
embeddings = embedding_function.embed_documents(texts)

# Compare the first text with all others
print(f"📝 Base text: '{texts[0]}'\n")
print("Similarity with other texts:")
print("-" * 50)

for i in range(1, len(texts)):
    sim = cosine_similarity(embeddings[0], embeddings[i])
    print(f"  '{texts[i]}'")
    print(f"  → Similarity: {sim:.4f}\n")

📝 Base text: 'I really enjoy coffee in the morning'

Similarity with other texts:
--------------------------------------------------
  'I like drinking coffee when I wake up'
  → Similarity: 0.8733

  'Programming is my passion'
  → Similarity: 0.5398

  'Cats are independent animals'
  → Similarity: 0.3898



---

## 🗄️ Step 5: Create the Vector Database (ChromaDB)

### What Is a Vector Database?

A vector database stores embeddings and allows you to quickly search for the most similar ones to a query. It's like a very efficient index for finding similar texts.

### What Does ChromaDB Do?
1. **Stores** the chunks and their embeddings
2. **Indexes** the embeddings for fast searches
3. **Searches** for the most similar chunks to a question

### 🤖 How to Ask the Jupyter Agent:
```
/generate Create a chromadb database using the chunks and embedding_function variables. Save it to a folder called my_database.
```

### 💡 Note:
This step may take a few minutes depending on how many chunks you have.

In [37]:
from langchain_community.vectorstores import Chroma

# Directory where the database will be saved
db_directory = "./my_database"

print("🔄 Creating vector database...")
print("   (This may take a few moments)")

# Create the vector database
# This: 1) generates embeddings for each chunk, 2) saves them in ChromaDB
vector_database = Chroma.from_documents(
    documents=chunks,
    embedding=embedding_function,
    persist_directory=db_directory
)

print(f"\n✅ Database created successfully!")
print(f"📁 Saved to: {db_directory}")
print(f"📊 Total documents indexed: {len(chunks)}")

🔄 Creating vector database...
   (This may take a few moments)

✅ Database created successfully!
📁 Saved to: ./my_database
📊 Total documents indexed: 32
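
One thing to watch when rerunning this cell: `Chroma.from_documents` with a `persist_directory` adds to whatever is already stored in that folder, so a second run would likely index every chunk again and later searches could return the same chunk more than once. A minimal sketch for starting from a clean folder, using only the standard library, is shown here. The helper name `reset_directory` is hypothetical, and it is assumed that no database object from the current session still has the folder open.

```python
import shutil
from pathlib import Path


def reset_directory(path: str) -> bool:
    """Delete `path` and everything inside it.

    Returns True if a folder was removed, False if there was nothing to remove.
    """
    folder = Path(path)
    if folder.is_dir():
        shutil.rmtree(folder)
        return True
    return False


# Hypothetical usage, run in a fresh kernel before the database is created:
# reset_directory("./my_database")
```

This permanently deletes the folder, so it is only appropriate when the database can be rebuilt from the original PDF.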


---

## 🔎 Step 6: Search for Relevant Documents

### How Does the Search Work?

When you ask a question:
1. The question is converted into an embedding
2. ChromaDB searches for chunks whose embeddings are most similar
3. It returns the K most relevant chunks
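
The three steps above can be sketched as a brute-force search over a handful of vectors. This is an illustration only: the 3-dimensional vectors are made up (real embeddings from this notebook have 768 dimensions), the function names are hypothetical, and ChromaDB relies on an index and its own distance settings instead of comparing the query against every stored vector.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], stored: list[tuple[str, list[float]]], k: int) -> list[tuple[float, str]]:
    """Score every stored (text, vector) pair against the query and keep the k best."""
    scored = [(cosine(query_vec, vec), text) for text, vec in stored]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Made-up vectors standing in for chunk embeddings
stored = [
    ("Annual leave accrues from the first day", [0.9, 0.1, 0.0]),
    ("Dress code applies on client visits", [0.0, 0.2, 0.9]),
    ("Sick leave may require a certificate", [0.7, 0.6, 0.1]),
]
query_vec = [1.0, 0.2, 0.0]  # stand-in for the embedded question

results = top_k(query_vec, stored, k=2)
for score, text in results:
    print(f"{score:.3f}  {text}")
```

The two leave-related texts score highest because their vectors point in nearly the same direction as the query vector.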

### 🤖 How to Ask the Jupyter Agent:
```
/generate Search the vector_database for documents related to "What is the vacation policy?" and show me the results.
```

In [38]:
# Define our question
question = "What is the vacation policy?"

# Search for the 3 most relevant chunks
relevant_documents = vector_database.similarity_search(question, k=3)

# Display the results
print(f"🔍 Question: '{question}'")
print(f"\n📚 Found {len(relevant_documents)} relevant documents:\n")

for i, doc in enumerate(relevant_documents, 1):
    print(f"{'='*50}")
    print(f"📄 Document {i}:")
    print(f"{'='*50}")
    print(doc.page_content)
    print()

🔍 Question: 'What is the vacation policy?'

📚 Found 3 relevant documents:

📄 Document 1:
4. LEAVE POLICY 
4.1 Annual Leave 
Full-time employees are entitled to paid annual leave based on their length of service. Leave accrual 
begins from the first day of employment. Employees must submit leave requests through the 
appropriate system and obtain approval from their supervisor before taking time off. 
4.2 Sick Leave 
Sick leave is provided to employees who are unable to work due to illness or injury. A medical certificate

📄 Document 2:
4. LEAVE POLICY 
4.1 Annual Leave 
Full-time employees are entitled to paid annual leave based on their length of service. Leave accrual 
begins from the first day of employment. Employees must submit leave requests through the 
appropriate system and obtain approval from their supervisor before taking time off. 
4.2 Sick Leave 
Sick leave is provided to employees who are unable to work due to illness or injury. A medical certificate

📄 Do

---

## 🔌 Step 7: Connect to the LLM in LM Studio

### What Does This Step Do?

We create a client to communicate with the language model running in LM Studio. We use the `openai` library because LM Studio exposes a compatible API.

### 🤖 How to Ask the Jupyter Agent:
```
/generate Connect to the LLM running in LM Studio on localhost port 1234. Test it with a simple question.
```

### ⚠️ Important:
Make sure LM Studio is running and has a model loaded.
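
A quick way to check that the server is reachable before going further is to ask it which models it knows about. The sketch below assumes LM Studio's OpenAI-compatible server answers `GET /v1/models` at the same base URL used in this notebook. The function name `list_available_models` is hypothetical, and depending on the LM Studio version the list may include models that are downloaded but not currently loaded.

```python
import json
import urllib.request


def list_available_models(base_url: str = "http://127.0.0.1:1234/v1", timeout: float = 3.0) -> list[str]:
    """Return the model ids reported by an OpenAI-compatible server.

    Returns an empty list if the server is unreachable or the reply
    is not the expected JSON shape.
    """
    try:
        with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError):
        # OSError covers refused connections, timeouts, and HTTP errors;
        # ValueError covers a reply that is not valid JSON
        return []

    if not isinstance(payload, dict):
        return []
    return [item.get("id", "") for item in payload.get("data", []) if isinstance(item, dict)]


models = list_available_models()
if models:
    print("Server is up. Models:", models)
else:
    print("No models reported. Is LM Studio's local server running?")
```

An empty list here points to a server or port problem, which is worth fixing before debugging the client code.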

In [39]:
from openai import OpenAI

# Configure the client to connect with LM Studio
llm_client = OpenAI(
    base_url="http://127.0.0.1:1234/v1",  # LM Studio URL
    api_key="lm-studio"  # No real API key required
)

print("✅ LLM client configured")
print("🔗 Connected to: http://127.0.0.1:1234")

✅ LLM client configured
🔗 Connected to: http://127.0.0.1:1234


### 🧪 Let's Test the Connection with a Simple Question

In [40]:
# Test with a simple question
test_messages = [
    {"role": "system", "content": "You are a helpful assistant. Answer briefly."},
    {"role": "user", "content": "What is Python in one sentence?"}
]

try:
    response = llm_client.chat.completions.create(
        model="lm-studio",  # LM Studio uses whatever model is loaded
        messages=test_messages,
        temperature=0.7,
        max_tokens=100
    )
    
    print("✅ Connection successful!")
    print(f"\n🤖 Response: {response.choices[0].message.content}")
    
except Exception as e:
    print(f"❌ Connection error: {e}")
    print("\n💡 Make sure LM Studio is running and has a model loaded.")

✅ Connection successful!

🤖 Response: Python is a high-level, interpreted programming language for general-purpose computing, guiding the development of efficient and readable code.


---

## 📝 Step 8: Create the Augmented Prompt

### What Is an Augmented Prompt?

It's the combination of:
1. **Context**: The relevant documents we found
2. **Instructions**: How we want the model to respond
3. **Question**: What the user wants to know

This is the **"A" in RAG** - the **Augmentation**.

### 🤖 How to Ask the Jupyter Agent:
```
/generate Create a function that combines a question with the retrieved documents into a prompt for the LLM.
```

In [41]:
def create_augmented_prompt(question, documents):
    """
    Creates an augmented prompt by combining the question with context.
    
    Args:
        question: The user's question
        documents: List of relevant documents from ChromaDB
    
    Returns:
        String with the complete prompt to send to the LLM
    """
    # Combine the content of all documents
    context = "\n\n".join([doc.page_content for doc in documents])
    
    # Create the augmented prompt
    prompt = f"""Use the following information to answer the user's question.
If you can't find the answer in the provided information, say you don't know.
Answer clearly and concisely.

=== CONTEXT INFORMATION ===
{context}
=== END OF CONTEXT ===

Question: {question}

Answer:"""
    
    return prompt

print("✅ create_augmented_prompt function defined")

✅ create_augmented_prompt function defined


### 🔍 Let's See What an Augmented Prompt Looks Like

In [42]:
# Create an example prompt
example_question = "What is the vacation policy?"
example_docs = vector_database.similarity_search(example_question, k=2)

augmented_prompt = create_augmented_prompt(example_question, example_docs)

print("📝 AUGMENTED PROMPT:")
print("=" * 60)
print(augmented_prompt)
print("=" * 60)

📝 AUGMENTED PROMPT:
Use the following information to answer the user's question.
If you can't find the answer in the provided information, say you don't know.
Answer clearly and concisely.

=== CONTEXT INFORMATION ===
4. LEAVE POLICY 
4.1 Annual Leave 
Full-time employees are entitled to paid annual leave based on their length of service. Leave accrual 
begins from the first day of employment. Employees must submit leave requests through the 
appropriate system and obtain approval from their supervisor before taking time off. 
4.2 Sick Leave 
Sick leave is provided to employees who are unable to work due to illness or injury. A medical certificate

4. LEAVE POLICY 
4.1 Annual Leave 
Full-time employees are entitled to paid annual leave based on their length of service. Leave accrual 
begins from the first day of employment. Employees must submit leave requests through the 
appropriate system and obtain approval from their supervisor before taking time off. 
4.2 Sick Leave 
Sick leav

---

## 🤖 Step 9: Get the LLM Response

### What Does This Step Do?

We send the augmented prompt to the language model and get its response. This is the **"G" in RAG** - the **Generation**.

### 🤖 How to Ask the Jupyter Agent:
```
/generate Create a function that sends a prompt to the LLM and returns the response.
```

In [43]:
def get_response(client, prompt):
    """
    Sends the prompt to the LLM and gets the response.
    
    Args:
        client: OpenAI client configured for LM Studio
        prompt: The augmented prompt with context and question
    
    Returns:
        String with the model's response, or None if there's an error
    """
    messages = [
        {
            "role": "system", 
            "content": "You are a helpful assistant that answers questions based on the provided context."
        },
        {
            "role": "user", 
            "content": prompt
        }
    ]
    
    try:
        response = client.chat.completions.create(
            model="lm-studio",
            messages=messages,
            temperature=0.3,  # Lower = more consistent responses
            max_tokens=500    # Response length limit
        )
        
        return response.choices[0].message.content
        
    except Exception as e:
        print(f"❌ Error getting response: {e}")
        return None

print("✅ get_response function defined")

✅ get_response function defined


---

## 🔗 Step 10: Create the Complete RAG Pipeline

### What Is a Pipeline?

It's a function that joins all the previous steps into a continuous flow:

```
Question → Search → Create Prompt → Get Response → Final Answer
```

### 🤖 How to Ask the Jupyter Agent:
```
/generate Create a function that takes a question, searches for relevant documents, creates the prompt, and gets the LLM response.
```

In [44]:
def rag_pipeline(question, database, client, num_documents=3):
    """
    Complete RAG pipeline: Retrieval → Augmentation → Generation
    
    Args:
        question: The user's question
        database: ChromaDB vector database
        client: OpenAI client for LM Studio
        num_documents: Number of documents to retrieve
    
    Returns:
        String with the model's response
    """
    print("🔄 Starting RAG pipeline...")
    print(f"\n📝 Question: {question}")
    
    # Step 1: RETRIEVAL - Search for relevant documents
    print("\n🔍 Step 1: Searching for relevant documents...")
    documents = database.similarity_search(question, k=num_documents)
    print(f"   ✅ Found {len(documents)} documents")
    
    # Step 2: AUGMENTATION - Create the augmented prompt
    print("\n📋 Step 2: Creating augmented prompt...")
    prompt = create_augmented_prompt(question, documents)
    print("   ✅ Prompt created")
    
    # Step 3: GENERATION - Get response from LLM
    print("\n🤖 Step 3: Generating response...")
    response = get_response(client, prompt)
    
    # get_response returns None on failure; an empty string is still a valid reply
    if response is not None:
        print("   ✅ Response generated")
    else:
        print("   ❌ Error generating response")
    
    return response

print("✅ rag_pipeline function defined")

✅ rag_pipeline function defined


---

## 🎉 Step 11: Let's Test Our RAG System!

Now let's ask questions about our documents and see the RAG system's responses.

In [45]:
# Ask a question about the document
my_question = "What is the difference between the vacation policy and sick day policy?"

# Run the RAG pipeline
response = rag_pipeline(
    question=my_question,
    database=vector_database,
    client=llm_client
)

# Display the response
print("\n" + "=" * 60)
print("📣 FINAL RESPONSE:")
print("=" * 60)
print(response)

🔄 Starting RAG pipeline...

📝 Question: What is the difference between the vacation policy and sick day policy?

🔍 Step 1: Searching for relevant documents...
   ✅ Found 3 documents

📋 Step 2: Creating augmented prompt...
   ✅ Prompt created

🤖 Step 3: Generating response...
   ✅ Response generated

📣 FINAL RESPONSE:
The difference between the vacation policy and sick leave policy is as follows:

* Annual Leave (Vacation): Full-time employees are entitled to paid annual leave based on their length of service. Leave accrual begins from the first day of employment, and employees must submit leave requests through the appropriate system and obtain approval from their supervisor before taking time off.
* Sick Leave: Sick leave is provided to employees who are unable to work due to illness or injury. A medical certificate may be required for absences exceeding three consecutive working days, and sick leave should not be used for purposes other than legitimate medical

### 🔄 Let's Ask More Questions

In [None]:
# Try with different questions
test_questions = [
    "How many vacation days do I get per year?",
    "What should I do if I'm sick?",
    "Can I work from home?"
]

for question in test_questions:
    print("\n" + "#" * 70)
    response = rag_pipeline(question, vector_database, llm_client)
    print("\n📣 RESPONSE:")
    print(response)
    print("#" * 70)

---

## 📚 Summary: What Did We Learn?

### The 3 Components of RAG:

1. **R - Retrieval**
   - We loaded documents (PDF)
   - We split them into chunks
   - We created embeddings and stored them in ChromaDB
   - We searched for the most relevant chunks for each question

2. **A - Augmentation**
   - We combined the found documents with the question
   - We created a prompt with context for the LLM

3. **G - Generation**
   - We sent the prompt to the LLM
   - The model generated a response based on the context

### Technologies Used:
| Technology | Use |
|------------|-----|
| LM Studio | Run the LLM locally |
| nomic-embed-text | Convert text to embeddings |
| ChromaDB | Store and search embeddings |
| LangChain | Load the PDF, split the text, and connect to ChromaDB |

