## Install Dependencies

Install the necessary libraries for our simple RAG system.

| Library | What it does |
|---------|--------------|
| **requests** | Makes HTTP calls to LM Studio's API (like a messenger between your code and the LLM) |
| **sentence-transformers** | Converts text into numerical vectors (embeddings) so we can measure similarity between questions and documents |
| **chromadb** | A vector database that stores embeddings and finds the most similar documents quickly |
| **pypdf** | Reads and extracts text from PDF files |

In [3]:
%pip install requests sentence-transformers chromadb pypdf -q

Note: you may need to restart the kernel to use updated packages.


## Import Libraries

Import the tools we'll use in our RAG pipeline.

In [4]:
import requests
from sentence_transformers import SentenceTransformer
import chromadb
from pypdf import PdfReader
import os



## Initial Configuration

Configure LM Studio as our local LLM provider. 

**Important**: Make sure LM Studio is running with a model loaded and the local server enabled (default: `http://localhost:1234`).

In [5]:
# LM Studio Configuration
# LM Studio provides an OpenAI-compatible REST API running locally
LM_STUDIO_URL = "http://127.0.0.1:1234/v1/chat/completions"

def chat_with_llm(messages, max_tokens=256, temperature=0.3):
    """Send messages to LM Studio and get a response."""
    payload = {
        "messages": messages,
        "temperature": temperature,
        "max_tokens": max_tokens
    }
    
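    # A timeout keeps the notebook from hanging forever if the server stalls
    # (120 s is an arbitrary choice; local models can be slow to respond)
    response = requests.post(LM_STUDIO_URL, json=payload, timeout=120)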
    response.raise_for_status()
    
    return response.json()["choices"][0]["message"]["content"].strip()

# Model for generating embeddings (runs locally, no API needed)
embedding_model = SentenceTransformer('all-MiniLM-L6-v2')

print("✓ LM Studio endpoint configured")
print(f"  URL: {LM_STUDIO_URL}")
print("✓ Embedding model loaded")

✓ LM Studio endpoint configured
  URL: http://127.0.0.1:1234/v1/chat/completions
✓ Embedding model loaded


## Test LM Studio Connection

Let's verify that LM Studio is running and responding correctly.

In [6]:
# Test connection to LM Studio
try:
    test_response = chat_with_llm([{"role": "user", "content": "Say 'OK' if you can hear me."}], max_tokens=10)
    print("✓ LM Studio connection successful!")
    print(f"  Response: {test_response}")
except requests.exceptions.ConnectionError:
    print("✗ Could not connect to LM Studio")
    print("\n  Make sure:")
    print("  1. LM Studio is running")
    print("  2. A model is loaded")
    print("  3. Local server is enabled (Settings → Local Server → Start Server)")
except Exception as e:
    print(f"✗ Error: {e}")

✓ LM Studio connection successful!
  Response: OK
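
The connection test depends on the OpenAI-style request and response shapes that `chat_with_llm` assumes. The sketch below makes those shapes explicit without touching the network. The helper names `build_chat_payload` and `extract_reply` are hypothetical, and `sample_response` is a hand-written stand-in for what the server might return, not captured output.

```python
def build_chat_payload(messages, max_tokens=256, temperature=0.3):
    """Build the JSON body sent to the chat completions endpoint."""
    return {
        "messages": messages,
        "temperature": temperature,
        "max_tokens": max_tokens,
    }


def extract_reply(response_json):
    """Pull the assistant text out of an OpenAI-style response dict."""
    choices = response_json.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    return choices[0]["message"]["content"].strip()


# Hand-written example response (assumption: OpenAI-compatible shape)
sample_response = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": " OK \n"},
            "finish_reason": "stop",
        }
    ]
}

payload = build_chat_payload(
    [{"role": "user", "content": "Say 'OK' if you can hear me."}],
    max_tokens=10,
)
print(payload["max_tokens"])
print(extract_reply(sample_response))
```

Checking for an empty `choices` list before indexing gives a clearer error than the `IndexError` the one-line version would raise.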


## Configuration Parameters

Configure chunking and retrieval parameters. Adjust these values to experiment with different settings.

In [7]:
# =============================================================================
# CONFIGURATION - Adjust these parameters to experiment!
# =============================================================================

# Chunking parameters
CHUNK_SIZE = 500          # Maximum characters per chunk
CHUNK_OVERLAP = 50        # Characters to overlap between chunks
MIN_CHUNK_LENGTH = 50     # Minimum characters to keep a chunk (filters noise)

# Retrieval parameters
NUM_RESULTS = 3           # Number of documents to retrieve per query

# Document paths
DOCS_PATH = "Docs"
HR_PDF = "Company Policies.pdf"      # HR knowledge base
TECH_PDF = "Corporate Policies.pdf"  # TECH knowledge base

# Metadata to extract (set to True/False)
METADATA_OPTIONS = {
    "source": True,       # Source filename
    "page": True,         # Page number
    "chunk_id": True,     # Chunk identifier within document
    "char_count": True,   # Character count of chunk
    "category": True,     # HR or TECH category
}

print("✓ Configuration loaded")
print(f"  Chunk size: {CHUNK_SIZE} chars")
print(f"  Chunk overlap: {CHUNK_OVERLAP} chars")
print(f"  Min chunk length: {MIN_CHUNK_LENGTH} chars")
print(f"  Results per query: {NUM_RESULTS}")

✓ Configuration loaded
  Chunk size: 500 chars
  Chunk overlap: 50 chars
  Min chunk length: 50 chars
  Results per query: 3


## Load Knowledge Bases from PDFs

Load documents from the PDF files with configurable chunking:
- **Company Policies.pdf** → HR knowledge base
- **Corporate Policies.pdf** → TECH knowledge base

The chunking algorithm:
1. Splits text into chunks of `CHUNK_SIZE` characters
2. Tries to break at sentence boundaries (periods, newlines)
3. Overlaps chunks by `CHUNK_OVERLAP` characters for context continuity
4. Filters out chunks smaller than `MIN_CHUNK_LENGTH`
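
To trace steps 1 to 3 on something small, here is a minimal sketch that returns the index spans each chunk would cover. `chunk_spans` is a hypothetical helper written for this illustration, and the sample sentence and the tiny `chunk_size=30`, `overlap=5` values are made up so the spans are easy to follow. It stops once the end of the text is reached.

```python
def chunk_spans(text, chunk_size, overlap):
    """Return (start, end) index pairs for overlapping chunks of `text`."""
    spans = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        if end < len(text):
            window = text[start:end]
            # Prefer to cut just after the last period or newline in the window
            break_point = max(window.rfind("."), window.rfind("\n"))
            if break_point > chunk_size // 2:
                end = start + break_point + 1
        spans.append((start, end))
        if end == len(text):
            break
        # Step back by `overlap` so neighbouring chunks share some context
        start = end - overlap
    return spans


sample = "Alpha beta gamma. Delta epsilon zeta. Eta theta iota kappa."
spans = chunk_spans(sample, chunk_size=30, overlap=5)
for start, end in spans:
    print((start, end), repr(sample[start:end]))
```

Each span after the first starts 5 characters before the previous one ended, which is the overlap described in step 3.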

In [8]:
def chunk_text(text, chunk_size=CHUNK_SIZE, overlap=CHUNK_OVERLAP):
    """Split text into overlapping chunks of specified size."""
    chunks = []
    start = 0
    
    while start < len(text):
        end = start + chunk_size
        chunk = text[start:end]
        
        # Try to break at sentence end if possible
        if end < len(text):
            last_period = chunk.rfind('.')
            last_newline = chunk.rfind('\n')
            break_point = max(last_period, last_newline)
            if break_point > chunk_size // 2:  # Only if we found a good break point
                chunk = chunk[:break_point + 1]
                end = start + break_point + 1
        
        chunks.append(chunk.strip())
        if end >= len(text):
            break  # Reached the end; stepping back by `overlap` would re-emit the tail
        start = end - overlap
    
    return chunks

def load_pdf(filepath, category):
    """Extract text from a PDF file and split into configurable chunks."""
    reader = PdfReader(filepath)
    all_chunks = []
    chunk_counter = 0
    
    for page_num, page in enumerate(reader.pages, 1):
        text = page.extract_text()
        if text:
            # Clean up the text
            text = text.replace('\x00', '')  # Remove null characters
            
            # Split into chunks
            page_chunks = chunk_text(text)
            
            for chunk in page_chunks:
                # Skip very short chunks
                if len(chunk) < MIN_CHUNK_LENGTH:
                    continue
                
                chunk_counter += 1
                
                # Build metadata based on configuration
                metadata = {}
                if METADATA_OPTIONS.get("source"):
                    metadata["source"] = os.path.basename(filepath)
                if METADATA_OPTIONS.get("page"):
                    metadata["page"] = page_num
                if METADATA_OPTIONS.get("chunk_id"):
                    metadata["chunk_id"] = chunk_counter
                if METADATA_OPTIONS.get("char_count"):
                    metadata["char_count"] = len(chunk)
                if METADATA_OPTIONS.get("category"):
                    metadata["category"] = category
                
                all_chunks.append({
                    "text": chunk,
                    "metadata": metadata
                })
    
    return all_chunks

# Load HR documents (Company Policies)
hr_pdf_path = os.path.join(DOCS_PATH, HR_PDF)
hr_chunks = load_pdf(hr_pdf_path, category="HR")
print(f"✓ Loaded {len(hr_chunks)} chunks from {HR_PDF} (HR)")

# Load TECH documents (Corporate Policies)
tech_pdf_path = os.path.join(DOCS_PATH, TECH_PDF)
tech_chunks = load_pdf(tech_pdf_path, category="TECH")
print(f"✓ Loaded {len(tech_chunks)} chunks from {TECH_PDF} (TECH)")

# General documents (for OTHER intent)
general_chunks = [{
    "text": "For questions not related to HR or IT policies, please contact your manager or the reception desk.",
    "metadata": {"source": "General", "page": 0, "chunk_id": 1, "char_count": 97, "category": "OTHER"}
}]
print(f"✓ Loaded {len(general_chunks)} chunks for General queries")

# Show sample chunk
print(f"\n📄 Sample chunk from HR:")
print(f"   Length: {hr_chunks[0]['metadata'].get('char_count', 'N/A')} chars")
print(f"   Page: {hr_chunks[0]['metadata'].get('page', 'N/A')}")
print(f"   Preview: {hr_chunks[0]['text'][:100]}...")

✓ Loaded 34 chunks from Company Policies.pdf (HR)
✓ Loaded 31 chunks from Corporate Policies.pdf (TECH)
✓ Loaded 1 chunks for General queries

📄 Sample chunk from HR:
   Length: 352 chars
   Page: 1
   Preview: COMPANY POLICIES 
Employee Handbook 
TABLE OF CONTENTS 
1. Introduction and Purpose 
2. Code of Cond...


## Initialize Vector Database

Use ChromaDB to store and search documents using embeddings.

In [11]:
# Initialize ChromaDB in memory
chroma_client = chromadb.Client()

# Create collections for each knowledge base
# Note: ChromaDB requires collection names to be at least 3 characters
hr_collection = chroma_client.get_or_create_collection(name="hr_base")
tech_collection = chroma_client.get_or_create_collection(name="tech_base")
general_collection = chroma_client.get_or_create_collection(name="general_base")

print("✓ Vector database initialized")
print("  Collections: hr_base, tech_base, general_base")

✓ Vector database initialized
  Collections: hr_base, tech_base, general_base


## Load Documents into Vector Database

Convert document chunks to embeddings and store them in their respective collections with metadata (source file and page number).

In [12]:
def load_chunks_to_collection(collection, chunks, prefix):
    """Load document chunks into a collection with embeddings and metadata."""
    if not chunks:
        print(f"⚠ No chunks to load for {prefix.upper()}")
        return
    
    texts = [chunk["text"] for chunk in chunks]
    embeddings = embedding_model.encode(texts, show_progress_bar=True)
    
    # Add all chunks in one call instead of one call per chunk
    collection.add(
        embeddings=embeddings.tolist(),
        documents=texts,
        metadatas=[chunk["metadata"] for chunk in chunks],
        ids=[f"{prefix}_{i}" for i in range(len(chunks))]
    )
    print(f"✓ Loaded {len(chunks)} chunks into {prefix.upper()} collection")

# Load all knowledge bases
print("Loading embeddings (this may take a moment)...")
load_chunks_to_collection(hr_collection, hr_chunks, "hr")
load_chunks_to_collection(tech_collection, tech_chunks, "tech")
load_chunks_to_collection(general_collection, general_chunks, "general")
print("\n✓ All collections loaded!")

Loading embeddings (this may take a moment)...
Batches: 100%|██████████| 2/2 [00:00<00:00,  2.97it/s]
✓ Loaded 34 chunks into HR collection
Batches: 100%|██████████| 1/1 [00:00<00:00,  2.28it/s]
✓ Loaded 31 chunks into TECH collection
Batches: 100%|██████████| 1/1 [00:00<00:00, 58.05it/s]
✓ Loaded 1 chunks into GENERAL collection

✓ All collections loaded!

## Step 1: Intent Classifier

Use an LLM to classify the user's question into one of three categories:

- **HR**: Questions about employees, attendance, leave, benefits, dress code, workplace behavior, harassment, conflicts, disciplinary procedures, grievances, or the employee handbook.

- **TECH**: Questions about data protection, GDPR/CCPA, information security, acceptable use of technology, passwords, MFA, VPN, incident response, data retention, or remote work security.

- **OTHER**: Questions not related to HR policies or data privacy/technology policies, or that are too general or unclear.

In [13]:
# Intent Classifier Prompt (System + User)
CLASSIFIER_SYSTEM_PROMPT = """You are an intent classifier for a corporate assistant. Your ONLY job is to classify questions into exactly one category.

Categories:
- HR: Questions about employees, attendance, leave, benefits, dress code, workplace behavior, 
  harassment, conflicts, disciplinary procedures, grievances, or the employee handbook.
- TECH: Questions about data protection, GDPR/CCPA, information security, acceptable use of technology, 
  passwords, MFA, VPN, incident response, data retention, or remote work security.
- OTHER: Questions not related to HR policies or data privacy/technology policies, or that are too general or unclear.

Rules:
- Respond with ONLY one word: HR, TECH, or OTHER
- Do not explain your reasoning
- Do not add any other text"""

def classify_intent(question):
    """Classify the user's question intent using local LLM."""
    
    messages = [
        {"role": "system", "content": CLASSIFIER_SYSTEM_PROMPT},
        {"role": "user", "content": f"Classify this question: {question}"}
    ]
    
    intent = chat_with_llm(messages, max_tokens=10, temperature=0).upper()
    
    # Ensure we get a valid intent
    if "HR" in intent:
        return "HR"
    elif "TECH" in intent:
        return "TECH"
    else:
        return "OTHER"
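
A substring check such as `"HR" in intent` also matches replies that merely contain those letters (for example, "CHROME" contains "HR"). One possible hardening, sketched here with a hypothetical helper `normalize_intent`, is to look for the label as a whole word and fall back to `OTHER` otherwise.

```python
import re

# Whole-word match for one of the three labels used by the classifier prompt
INTENT_PATTERN = re.compile(r"\b(HR|TECH|OTHER)\b")


def normalize_intent(raw_reply, default="OTHER"):
    """Map a free-form model reply to HR, TECH, or OTHER."""
    match = INTENT_PATTERN.search(raw_reply.upper())
    return match.group(1) if match else default


for reply in ["hr", "Category: TECH.", "chrome", ""]:
    print(repr(reply), "->", normalize_intent(reply))
```

When several labels appear in one reply, this sketch returns the first one found, which may or may not be the behavior you want.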

## Step 2: Router (Switch)

Based on the classified intent, select the correct knowledge base.

In [14]:
def route_collection(intent):
    """Route the query to the appropriate collection based on intent."""
    
    if intent == "HR":
        return hr_collection, "HR Base"
    elif intent == "TECH":
        return tech_collection, "IT Support Base"
    else:
        return general_collection, "General Base"
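
The same switch can be written as a lookup table, which may be easier to extend when more knowledge bases are added. This is a sketch only: it maps intents to collection names (plain strings) rather than live ChromaDB collections, and `ROUTES`, `DEFAULT_ROUTE`, and `route_name` are hypothetical names.

```python
# Intent -> (collection name, display label); names match the collections created earlier
ROUTES = {
    "HR": ("hr_base", "HR Base"),
    "TECH": ("tech_base", "IT Support Base"),
}
DEFAULT_ROUTE = ("general_base", "General Base")


def route_name(intent):
    """Return the (collection name, label) pair for an intent, defaulting to the general base."""
    return ROUTES.get(intent, DEFAULT_ROUTE)


for intent in ["HR", "TECH", "OTHER", "UNEXPECTED"]:
    print(intent, "->", route_name(intent))
```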

## Step 3: Retrieval (Document Search)

Search for the most relevant documents in the selected knowledge base.

In [15]:
def search_relevant_documents(collection, question, n_results=NUM_RESULTS):
    """Search for relevant documents using embeddings."""
    
    # Generate question embedding
    question_embedding = embedding_model.encode([question])[0]
    
    # Search for similar documents
    results = collection.query(
        query_embeddings=[question_embedding.tolist()],
        n_results=n_results,
        include=["documents", "metadatas", "distances"]
    )
    
    # Return documents with their metadata and similarity score
    docs_with_meta = []
    for doc, meta, dist in zip(results['documents'][0], results['metadatas'][0], results['distances'][0]):
        docs_with_meta.append({
            "text": doc,
            "source": meta.get("source", "Unknown"),
            "page": meta.get("page", 0),
            "chunk_id": meta.get("chunk_id", 0),
            "char_count": meta.get("char_count", len(doc)),
            "category": meta.get("category", "Unknown"),
            # 1 - distance. This is a cosine similarity only if the collection uses
            # cosine distance; with Chroma's default (squared L2) it can go negative.
            "similarity": round(1 - dist, 3)
        })
    
    return docs_with_meta
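
The `1 - dist` score deserves a closer look. Assuming the collection uses Chroma's default squared L2 distance and the embeddings are unit-length (both are assumptions about this setup, not something the notebook checks), the distance relates to cosine similarity as `dist = 2 - 2 * cos`, so `cos = 1 - dist / 2`. The pure-Python sketch below verifies that relationship on two made-up vectors.

```python
import math


def normalize(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def squared_l2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dot(a, b):
    """Dot product; equals cosine similarity for unit-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine_from_squared_l2(dist):
    """Recover cosine similarity from squared L2 distance (unit vectors only)."""
    return 1.0 - dist / 2.0


a = normalize([1.0, 2.0, 3.0])
b = normalize([3.0, 2.0, 1.0])
dist = squared_l2(a, b)

print("cosine:          ", round(dot(a, b), 4))
print("from distance:   ", round(cosine_from_squared_l2(dist), 4))
print("naive 1 - dist:  ", round(1 - dist, 4))
```

Under those same assumptions, a reported score of -0.487 would correspond to a distance of 1.487 and a cosine similarity of roughly 0.26. Chroma can also be configured to use cosine distance directly; the exact option depends on the installed version, so check its documentation.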

## Step 4: Response Generator

Use an LLM to generate a response based on the retrieved context.

The generator uses a specialized prompt that:
- Acts as the Official HR & IT Assistant
- Answers based ONLY on the provided context
- Cites sources at the end of the answer
- Refuses to invent information
- Follows safety guidelines

In [17]:
# Response Generator System Prompt
GENERATOR_SYSTEM_PROMPT = """<role>
You are the Official HR & IT Assistant for the company. Your goal is to answer employee questions ACCURATELY based
ONLY on the provided context. You represent the company, so be professional, polite, and concise.
</role>

<instructions>
1. **Analyze the Context:** Read the provided document chunks carefully.

2. **Direct Answer:** Answer the user's question directly. Do not start with "Based on the documents...". Just say the answer.

3. **Cite Your Source:** At the end of your answer, mention which policy or section supports your statement
    (e.g., "Source: Company Policies, Page 3").

4. **Be Honest:** If the answer is NOT in the provided context, state clearly: "I cannot find specific information about that in 
    the current policies. Please check with HR directly." DO NOT invent information.

5. **Formatting:** Use bullet points for lists (like requirements or steps) to make it easy to read.

6. **Tone:** Helpful, clear, and safe.
</instructions>

<safety_check>
- Do not provide medical or legal advice.
- Do not share passwords or sensitive keys if they appear in the text.
- If the user asks something unethical (how to bypass security), politely refuse based on the "Acceptable Use Policy".
</safety_check>"""

def generate_response(question, context_docs, intent):
    """Generate a response using the local LLM with retrieved context."""
    
    # Format context with source citations
    context_parts = []
    for doc in context_docs:
        source_info = f"[{doc['source']}, Page {doc['page']}]"
        context_parts.append(f"{source_info}\n{doc['text']}")
    
    context_text = "\n\n---\n\n".join(context_parts)
    
    # Customize context header based on intent
    if intent == "HR":
        context_header = "Relevant excerpts from the Employee Handbook (Company Policies):"
    elif intent == "TECH":
        context_header = "Relevant excerpts from Tech/Security Policies (Corporate Policies):"
    else:
        context_header = "Relevant company information:"
    
    user_message = f"""<context>
{context_header}

{context_text}
</context>

<question>
{question}
</question>"""
    
    messages = [
        {"role": "system", "content": GENERATOR_SYSTEM_PROMPT},
        {"role": "user", "content": user_message}
    ]
    
    return chat_with_llm(messages, max_tokens=512, temperature=0.3)
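
To see what the model actually receives, the context-formatting step can be run on its own. The sketch below lifts that step into a hypothetical helper `format_context`; the two example documents are invented placeholders, not excerpts from the real PDFs.

```python
def format_context(context_docs):
    """Join retrieved chunks into one string, each prefixed with its source tag."""
    parts = []
    for doc in context_docs:
        source_info = f"[{doc['source']}, Page {doc['page']}]"
        parts.append(f"{source_info}\n{doc['text']}")
    return "\n\n---\n\n".join(parts)


# Placeholder documents for illustration only
example_docs = [
    {"source": "Company Policies.pdf", "page": 3, "text": "Example excerpt A."},
    {"source": "Corporate Policies.pdf", "page": 7, "text": "Example excerpt B."},
]

formatted = format_context(example_docs)
print(formatted)
```

The `[source, Page N]` tag in front of each chunk is what lets the model cite a source, as instruction 3 of the system prompt asks.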

## Complete RAG Pipeline

Combine all steps into a function that executes the complete flow.

In [18]:
def rag_pipeline(question):
    """Execute the complete RAG pipeline."""
    
    print("\n" + "="*60)
    print(f"📝 QUESTION: {question}")
    print("="*60)
    
    # Step 1: Classify intent
    print("\n🔍 Step 1: Classifying intent...")
    intent = classify_intent(question)
    print(f"   → Detected intent: {intent}")
    
    # Step 2: Route to the correct collection
    print("\n🔀 Step 2: Routing to knowledge base...")
    collection, base_name = route_collection(intent)
    print(f"   → Selected base: {base_name}")
    
    # Step 3: Search for relevant documents
    print("\n📚 Step 3: Retrieving relevant documents...")
    documents = search_relevant_documents(collection, question)
    for i, doc in enumerate(documents, 1):
        print(f"   {i}. [{doc['source']}, p.{doc['page']}, chunk #{doc['chunk_id']}]")
        print(f"      Similarity: {doc['similarity']} | Chars: {doc['char_count']}")
        print(f"      Preview: {doc['text'][:60]}...")
    
    # Step 4: Generate response
    print("\n🤖 Step 4: Generating response with LLM...")
    response = generate_response(question, documents, intent)
    
    print("\n" + "="*60)
    print("✅ FINAL ANSWER:")
    print(response)
    print("="*60 + "\n")
    
    return response

## Usage Examples

Let's test our RAG with different types of questions.

### Example 1: HR Question

In [None]:
rag_pipeline("How many vacation days do I have per year?")

### Example 2: IT Support Question

In [None]:
rag_pipeline("How do I reset my corporate password?")

### Example 3: General Question

In [None]:
rag_pipeline("Where is the cafeteria?")

## Test Your Own Question

In [19]:
# Change the question to whatever you want to test
my_question = "Can I work from home?"
rag_pipeline(my_question)


📝 QUESTION: Can I work from home?

🔍 Step 1: Classifying intent...
   → Detected intent: OTHER

🔀 Step 2: Routing to knowledge base...
   → Selected base: General Base

📚 Step 3: Retrieving relevant documents...
   1. [General, p.0, chunk #1]
      Similarity: -0.487 | Chars: 97
      Preview: For questions not related to HR or IT policies, please conta...

🤖 Step 4: Generating response with LLM...

✅ FINAL ANSWER:
* Direct Answer: Yes, you can work from home with prior approval from your manager and HR. (Source: Company Policies, Page 2)
* Cite Your Source: Company Policies, Page 2
* Be Honest: If the answer is NOT in the provided context, state clearly: I cannot find specific information about that in the current policies. Please check with HR directly.
* Formatting: Bullet points for lists (like requirements or steps) to make it easy to read.
* Tone: Helpful, clear, and safe.




In [None]:
# Install necessary libraries for a simple local RAG setup
# - langchain, langchain-community, langchain-ollama: LangChain core and community integrations (ollama helps connect to a local LLM)
# - chromadb: vector store for embeddings
# - pypdf: PDF document loading
# - sentence-transformers: embeddings models
# - directory-loader: helper for loading documents from a directory
# Install quietly (-q). Use %pip (recommended within Jupyter) rather than !pip.
%pip install -q --upgrade pip
%pip install -q langchain langchain-community langchain-ollama chromadb pypdf sentence-transformers directory-loader

# Try to restart the kernel automatically so newly installed packages can be imported.
# This JavaScript hook only exists in the classic Jupyter Notebook front end. display() does not
# raise in Python when the hook is missing (the script runs in the browser), so a try/except
# cannot detect failure; the manual fallback message is printed unconditionally instead.
from IPython.display import display, Javascript
display(Javascript("Jupyter.notebook.kernel.restart()"))
print("Install complete. If the kernel did not restart, restart it manually (Kernel -> Restart).")


## RAG Flow Summary

Our system implements the following flow:

```
1. User asks a question
   ↓
2. LLM Classifier (LM Studio) → identifies intent (HR/TECH/OTHER)
   ↓
3. Switch/Router → selects knowledge base
   ↓
4. Retrieval → searches relevant documents using embeddings
   ↓
5. Generator (LM Studio) → LLM creates response with context
   ↓
6. User receives grounded answer
```

### Components used:
- 🖥️ **LM Studio**: Local LLM for classification and generation (privacy-friendly!)
- 🔢 **Sentence Transformers**: Local embeddings (all-MiniLM-L6-v2)
- 📦 **ChromaDB**: In-memory vector database
- 🌐 **Requests**: Simple HTTP client (works globally, no restrictions)

### Advantages of this approach:
- ✅ **100% Local**: No data leaves your machine
- ✅ **No API costs**: Everything runs locally
- ✅ **Global availability**: Uses standard `requests` library (works in China, etc.)
- ✅ **Grounded responses**: Based on real documents
- ✅ **Targeted search**: Each question goes to the right base
- ✅ **Traceability**: We can see which documents were used

### Possible improvements:
- 📊 Add confidence metrics
- 🔄 Implement document re-ranking
- 📝 Load documents from real PDFs
- 🎯 Improve classifier with few-shot examples
- 💾 Use a persistent database
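
As a starting point for the confidence-metrics idea, a simple similarity cutoff could stop weakly matched chunks from reaching the generator. The sketch below is one possible shape, not a tuned solution: `select_confident_docs`, `guarded_context`, and the `0.3` threshold are hypothetical, and the cutoff only makes sense once the similarity score is a calibrated value. The fallback text reuses the wording from the generator's system prompt.

```python
FALLBACK_MESSAGE = (
    "I cannot find specific information about that in the current policies. "
    "Please check with HR directly."
)


def select_confident_docs(docs, min_similarity=0.3):
    """Keep only retrieved chunks whose similarity reaches the cutoff."""
    return [doc for doc in docs if doc["similarity"] >= min_similarity]


def guarded_context(docs, min_similarity=0.3):
    """Return (docs, None) when something passes the cutoff, else (None, fallback text)."""
    confident = select_confident_docs(docs, min_similarity)
    if not confident:
        return None, FALLBACK_MESSAGE
    return confident, None


# A single weak match, similar in shape to what search_relevant_documents returns
retrieved = [{"text": "For questions not related to HR or IT policies...", "similarity": -0.487}]
docs, fallback = guarded_context(retrieved)
print(fallback if docs is None else f"{len(docs)} chunk(s) kept")
```

With a guard like this, the pipeline could return the fallback text directly and skip the LLM call when nothing relevant was retrieved.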