# Task
Refactored Notebook: Orchestration and Demo for RAG System

This notebook is a thin orchestration and demonstration layer for a Retrieval-Augmented Generation (RAG) system. Core functionality (document processing, embedding generation, and FAISS indexing) lives in external Python scripts (`scripts/chunk_data.py`, `scripts/embed_data.py`, `scripts/rag_ollama.py`), which the notebook calls rather than reimplements.

**Goal**: Demonstrate the end-to-end RAG workflow: set up the environment, build the necessary artifacts (a one-time process), and query the RAG system through the external scripts without duplicating their logic in the notebook.

**Key Features**:
- Repository cloning and dependency installation.
- One-time artifact generation (chunking, embedding, FAISS index creation) from actual documents.
- End-to-end RAG query demonstration using a local Ollama LLM.

# SETUP CELL

## Verify Repository Structure

Verify that the repository `/content/mcp-local-llm` exists and clone it if it doesn't. After ensuring its presence, print its directory tree.

In [1]:
import os

repo_path = "/content/mcp-local-llm"

# Clone the repository if it doesn't exist
if not os.path.exists(repo_path):
    print(f"The directory '{repo_path}' does not exist. Cloning the repository now...")
    !git clone https://github.com/AniketRajSingh/mcp-local-llm.git {repo_path}
    print("Repository cloned successfully.")
else:
    print(f"The directory '{repo_path}' already exists. Skipping cloning.")

# Verify again and print directory tree
if os.path.exists(repo_path):
    print(f"\nVerification: The directory '{repo_path}' now exists. Printing its directory tree:\n")
    !ls -R {repo_path}
else:
    print(f"Verification failed: The directory '{repo_path}' still does not exist after attempted cloning.")


The directory '/content/mcp-local-llm' does not exist. Cloning the repository now...
Cloning into '/content/mcp-local-llm'...
remote: Enumerating objects: 22, done.
remote: Counting objects: 100% (22/22), done.
remote: Compressing objects: 100% (16/16), done.
remote: Total 22 (delta 3), reused 18 (delta 2), pack-reused 0 (from 0)
Receiving objects: 100% (22/22), 13.48 KiB | 1.92 MiB/s, done.
Resolving deltas: 100% (3/3), done.
Repository cloned successfully.

Verification: The directory '/content/mcp-local-llm' now exists. Printing its directory tree:

/content/mcp-local-llm:
artifacts  notebooks  README.md  requirements.txt  scripts

/content/mcp-local-llm/artifacts:
faiss.index  metadata.json

/content/mcp-local-llm/notebooks:
colab_rag.ipynb

/content/mcp-local-llm/scripts:
embed.py  ingest.py  rag.py  retrieve.py
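
The listing above can also be checked programmatically instead of by eye. The following is a minimal sketch, not part of the repository: the helper name `missing_paths` and the `EXPECTED` list are hypothetical, and the file names are simply copied from the `ls -R` output above.

```python
from pathlib import Path


def missing_paths(repo_root, expected):
    """Return the expected relative paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in expected if not (root / rel).exists()]


# File names copied from the directory listing; adjust if the repository changes.
EXPECTED = [
    "requirements.txt",
    "scripts/ingest.py",
    "scripts/embed.py",
    "scripts/rag.py",
    "scripts/retrieve.py",
]

missing = missing_paths("/content/mcp-local-llm", EXPECTED)
print("All expected files present." if not missing else f"Missing: {missing}")
```

A check like this fails fast when a script has been renamed, which is easier to diagnose than an import error several cells later.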


## Install Dependencies

Check whether `requirements.txt` exists in the repository and, if it does, install the dependencies it lists. Then install the additional libraries the pipeline needs (`sentence-transformers`, `faiss-cpu`, `accelerate`, and `transformers[torch]`); `pip` skips any that are already present. Note that `faiss-cpu` performs index operations on the CPU, so a GPU runtime matters only for the embedding and generation models. Finally, verify that the key libraries import correctly.

In [2]:
import os

# repo_path is already defined from previous steps
requirements_path = os.path.join(repo_path, "requirements.txt")

print(f"Checking for requirements.txt at: {requirements_path}")
if os.path.exists(requirements_path):
    print("requirements.txt found. Installing dependencies...")
    !pip install -r {requirements_path}
    print("Dependencies from requirements.txt installed.")
else:
    print("requirements.txt not found. Skipping installation from file.")

print("Installing essential libraries: sentence-transformers, faiss-cpu, accelerate, transformers[torch]...")
# Install essential libraries, ensuring accelerate for GPU if available
!pip install sentence-transformers faiss-cpu accelerate "transformers[torch]"

print("All specified dependencies and essential libraries are being installed.")


Checking for requirements.txt at: /content/mcp-local-llm/requirements.txt
requirements.txt found. Installing dependencies...
Dependencies from requirements.txt installed.
Installing essential libraries: sentence-transformers, faiss-cpu, accelerate, transformers[torch]...
Collecting faiss-cpu
  Downloading faiss_cpu-1.13.1-cp310-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (7.6 kB)
Downloading faiss_cpu-1.13.1-cp310-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (23.7 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 23.7/23.7 MB 54.7 MB/s eta 0:00:00
Installing collected packages: faiss-cpu
Successfully installed faiss-cpu-1.13.1
All specified dependencies and essential libraries are being installed.


In [None]:
import sentence_transformers
import faiss
import accelerate
import requests

print(f"sentence-transformers version: {sentence_transformers.__version__}")
print(f"faiss-cpu version: {faiss.__version__}")
print(f"accelerate version: {accelerate.__version__}")

# Test Ollama connection
print("Testing Ollama connection...")
try:
    response = requests.get('http://localhost:11434/api/tags', timeout=5)
    if response.status_code == 200:
        models = response.json().get('models', [])
        print(f"Ollama is running. Available models: {[m['name'] for m in models] if models else 'None listed'}")
    else:
        print(f"Ollama API returned status {response.status_code}")
except Exception as e:
    print(f"Error connecting to Ollama: {e}")

print("Verification complete: Essential libraries are imported and Ollama connection tested.")



sentence-transformers version: 5.1.2
faiss-cpu version: 1.13.1
accelerate version: 1.12.0
Testing HuggingFace model loading...


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



google/flan-t5-base model and tokenizer loaded successfully for verification.
Verification complete: Essential libraries are imported and their versions/basic functionality are displayed.
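
Beyond confirming that Ollama responds, it can help to check that the model you intend to query has actually been pulled. The sketch below works on the JSON shape the verification cell already reads (a `models` list whose entries have a `name`). The helper `has_model` and the model names in `sample` are hypothetical and used only for illustration.

```python
def has_model(tags_payload, wanted):
    """Check whether `wanted` appears among the models in an /api/tags payload."""
    names = [m.get("name", "") for m in tags_payload.get("models", [])]
    # Names usually carry a tag suffix such as ":latest"; match with or without it.
    return any(n == wanted or n.split(":", 1)[0] == wanted for n in names)


# Hypothetical payload in the same shape the cell above parses.
sample = {"models": [{"name": "llama3:latest"}, {"name": "mistral:7b"}]}
print(has_model(sample, "llama3"))  # True
print(has_model(sample, "phi3"))    # False
```

In the notebook, `response.json()` from the connection test could be passed in place of `sample`.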


# BUILD ARTIFACTS (ONE TIME)

This section runs the external scripts to process documents, generate embeddings, and build the FAISS index. These artifacts (`metadata.json` and `faiss.index`) will be saved in the `artifacts/` directory and will be reused for RAG queries without re-processing.
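
Since the build is meant to run once, a small guard can decide whether the chunking and embedding cells need to run at all. This is a sketch under assumptions: `artifacts_ready` and its `force` flag are hypothetical names, and "ready" is taken to mean that both files exist and are non-empty.

```python
import os


def artifacts_ready(artifacts_dir, names=("metadata.json", "faiss.index"), force=False):
    """Return True when every artifact exists and is non-empty, unless force is set."""
    if force:
        return False  # Caller explicitly asked for a rebuild.
    for name in names:
        path = os.path.join(artifacts_dir, name)
        if not os.path.isfile(path) or os.path.getsize(path) == 0:
            return False
    return True


if artifacts_ready("/content/mcp-local-llm/artifacts"):
    print("Artifacts found; the build cells can be skipped.")
else:
    print("Artifacts missing or rebuild forced; run the build cells.")
```

The directory listing printed after cloning shows `faiss.index` and `metadata.json` already present in a fresh checkout, so a guard like this would skip the build unless `force=True` is passed or the files are removed first.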

## Parse Actual Documents

Use the `doc_parser` function to parse documents from the directories you specify and save the results to `data/raw/`. List those directories in the `document_dirs` variable in the cell below.

In [None]:
import os
import sys

# Add scripts to path
sys.path.append(os.path.join(repo_path, "scripts"))

from doc_parser import doc_parser

# repo_path is already defined from previous steps
data_raw_path = os.path.join(repo_path, "data", "raw")
artifacts_dir = os.path.join(repo_path, "artifacts")
output_metadata_path = os.path.join(artifacts_dir, "metadata.json")
faiss_index_path = os.path.join(artifacts_dir, "faiss.index")

print(f"Ensuring data/raw directory at: {data_raw_path}")
os.makedirs(data_raw_path, exist_ok=True)
print(f"Directory '{data_raw_path}' ensured to exist.")

print(f"Ensuring artifacts directory at: {artifacts_dir}")
os.makedirs(artifacts_dir, exist_ok=True)
print(f"Directory '{artifacts_dir}' ensured to exist.")

# Specify directories containing your documents
# Example: document_dirs = ["/path/to/your/docs1", "/path/to/your/docs2"]
# For demonstration, using a placeholder - replace with actual paths
document_dirs = []  # Add your document directories here

if document_dirs:
    print(f"Parsing documents from: {document_dirs}")
    parsed_docs = doc_parser(*document_dirs, output_dir=data_raw_path)
    print(f"Successfully parsed {len(parsed_docs)} documents.")
else:
    print("No document directories specified. Please add paths to 'document_dirs' list above.")
    print("For testing, you can manually add documents to data/raw/ or use dummy data.")

print("Document parsing complete.")

Ensuring data/raw directory at: /content/mcp-local-llm/data/raw
Directory '/content/mcp-local-llm/data/raw' ensured to exist.
Ensuring artifacts directory at: /content/mcp-local-llm/artifacts
Directory '/content/mcp-local-llm/artifacts' ensured to exist.
Created dummy document: /content/mcp-local-llm/data/raw/document1.txt
Created dummy document: /content/mcp-local-llm/data/raw/document2.txt
Sample documents created successfully in data/raw.


## Run `chunk_data.py`

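Execute the external chunking script to read documents from `data/raw/`, split them into chunks, and save the chunk metadata to `artifacts/metadata.json`. In the cloned repository this script is `scripts/ingest.py`, which is the file the cell below invokes.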
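
To make the `chunk_size` and `chunk_overlap` parameters concrete, here is a minimal sliding-window sketch. It illustrates the general technique and is not the script's actual implementation, which is not shown in this notebook. The function name `chunk_tokens` is hypothetical, and the input is assumed to be an already tokenized list.

```python
def chunk_tokens(tokens, chunk_size=400, chunk_overlap=50):
    """Split a token list into windows of chunk_size that overlap by chunk_overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - chunk_overlap  # How far each window advances.
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # The last window already reached the end of the input.
    return chunks


chunks = chunk_tokens(list(range(1000)))
print([len(c) for c in chunks])  # [400, 400, 300]
```

With the defaults, each window starts 350 tokens after the previous one, so the last 50 tokens of one chunk reappear at the start of the next. That overlap keeps a sentence that straddles a boundary retrievable from at least one chunk.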

In [5]:
import os

scripts_dir = os.path.join(repo_path, "scripts")
chunk_data_script = os.path.join(scripts_dir, "ingest.py") # Corrected script name

chunk_size = 400
chunk_overlap = 50

print(f"Executing {chunk_data_script} to chunk documents...")
print(f"Input directory: {data_raw_path}")
print(f"Output metadata path: {output_metadata_path}")
print(f"Chunk size: {chunk_size}, Chunk overlap: {chunk_overlap}")

# Store original working directory and change to repo_path
original_cwd = os.getcwd()
os.chdir(repo_path)
print(f"Changed current working directory to: {os.getcwd()}")

# Execute the ingest.py script, passing absolute paths for robustness
!python {chunk_data_script} --input_dir {data_raw_path} --output_metadata_path {output_metadata_path} --chunk_size {chunk_size} --chunk_overlap {chunk_overlap}

# Restore original working directory
os.chdir(original_cwd)
print(f"Restored current working directory to: {os.getcwd()}")

print("Document chunking complete and metadata saved to artifacts/metadata.json.")

Executing /content/mcp-local-llm/scripts/ingest.py to chunk documents...
Input directory: /content/mcp-local-llm/data/raw
Output metadata path: /content/mcp-local-llm/artifacts/metadata.json
Chunk size: 400, Chunk overlap: 50
Changed current working directory to: /content/mcp-local-llm
tokenizer_config.json: 100% 48.0/48.0 [00:00<00:00, 196kB/s]
config.json: 100% 570/570 [00:00<00:00, 2.52MB/s]
vocab.txt: 100% 232k/232k [00:00<00:00, 2.31MB/s]
tokenizer.json: 100% 466k/466k [00:00<00:00, 7.64MB/s]
Chunks created: 2
Restored current working directory to: /content
Document chunking complete and metadata saved to artifacts/metadata.json.
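
The store, change, and restore pattern for the working directory recurs in several cells, and as written the directory is not restored if the cell raises partway through. A context manager handles both concerns. This is a sketch with a hypothetical helper name, `working_directory`.

```python
import os
from contextlib import contextmanager


@contextmanager
def working_directory(path):
    """Temporarily change the working directory, restoring it even on error."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(previous)


# Example: run something from the home directory, then return automatically.
with working_directory(os.path.expanduser("~")):
    print(f"Inside the block: {os.getcwd()}")
print(f"After the block: {os.getcwd()}")
```

On Python 3.11 and later, the standard library's `contextlib.chdir` offers the same behavior.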


## Run `embed_data.py`

Execute the external embedding script to generate embeddings for the chunks in `metadata.json`, build a FAISS index from them, and save both the index (`faiss.index`) and the updated metadata (`metadata.json`) to the `artifacts/` directory. In the cloned repository this script is `scripts/embed.py`, which is the file the cell below invokes.

In [6]:
import os

scripts_dir = os.path.join(repo_path, "scripts")
embed_data_script = os.path.join(scripts_dir, "embed.py") # Corrected script name

model_name_for_embedding = 'all-MiniLM-L6-v2'

print(f"Executing {embed_data_script} to generate embeddings and build FAISS index...")
print(f"Input metadata path: {output_metadata_path}")
print(f"Output FAISS index path: {faiss_index_path}")
print(f"Embedding model: {model_name_for_embedding}")

# Store original working directory and change to repo_path
original_cwd = os.getcwd()
os.chdir(repo_path)
print(f"Changed current working directory to: {os.getcwd()}")

# Execute the embed.py script, passing absolute paths for robustness
!python {embed_data_script} --metadata_path {output_metadata_path} --faiss_index_path {faiss_index_path} --model_name {model_name_for_embedding}

# Restore original working directory
os.chdir(original_cwd)
print(f"Restored current working directory to: {os.getcwd()}")

print("Embeddings generated, FAISS index built, and artifacts saved.")

Executing /content/mcp-local-llm/scripts/embed.py to generate embeddings and build FAISS index...
Input metadata path: /content/mcp-local-llm/artifacts/metadata.json
Output FAISS index path: /content/mcp-local-llm/artifacts/faiss.index
Embedding model: all-MiniLM-L6-v2
Changed current working directory to: /content/mcp-local-llm
2025-12-16 11:26:55.253206: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:467] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1765884415.293760    1282 cuda_dnn.cc:8579] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1765884415.306533    1282 cuda_blas.cc:1407] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
W0000 00:00:1765884415.340609    1282 computation_placer.cc:177] computation placer already registered. Pl
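
For intuition about what the index is used for at query time, the sketch below performs the exhaustive nearest-neighbor search that a flat index carries out, using plain Python. It rests on assumptions: the notebook does not show which FAISS index type the script builds, the two-dimensional vectors and chunk texts are toy data, and positions in the index are assumed to line up with entries in the metadata list. Real `all-MiniLM-L6-v2` embeddings have 384 dimensions.

```python
import math


def l2_search(index_vectors, query, k=3):
    """Exhaustive search: return (distance, position) for the k closest vectors."""
    scored = [(math.dist(vec, query), pos) for pos, vec in enumerate(index_vectors)]
    return sorted(scored)[:k]


# Toy stand-ins for chunk embeddings and their metadata (hypothetical values).
vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
chunk_texts = ["chunk A", "chunk B", "chunk C"]

hits = l2_search(vectors, [0.9, 0.1], k=2)
print([chunk_texts[pos] for _, pos in hits])  # ['chunk B', 'chunk A']
```

A vector index returns positions and distances, not text, which is why the metadata file is saved alongside it: the positions are looked up there to recover the chunk contents.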

# RAG DEMO

This section demonstrates the RAG system by importing the `answer` function from `scripts/rag_ollama.py` and allowing interactive queries based on the parsed documents.

## Import and Call `answer` Function

Import the `answer` function from the `rag_ollama.py` script and configure the environment for its execution. This function orchestrates the retrieval and generation steps.

In [None]:
import os
import sys

# Add the scripts directory to the Python path so rag_ollama can be imported
sys.path.append(os.path.join(repo_path, "scripts"))

# Store original working directory and change to repo_path for script import and execution consistency
original_cwd = os.getcwd()
os.chdir(repo_path)
print(f"Changed current working directory to: {os.getcwd()}")

# Import the answer function from rag_ollama.py
try:
    from rag_ollama import answer
    print("Successfully imported 'answer' function from rag_ollama.py.")
except ImportError as e:
    print(f"Error importing 'answer' from rag_ollama.py: {e}")
    print("Please ensure that rag_ollama.py exists in the scripts directory and contains an 'answer' function.")
    # Define a dummy answer function to prevent further errors during demonstration
    def answer(query, artifacts_dir=None):
        return "Error: RAG answer function not loaded due to import error. Check console for details."

# Define the artifacts directory (though the answer function should load them internally)
artifacts_dir = os.path.join(repo_path, "artifacts")

# Restore original working directory
os.chdir(original_cwd)
print(f"Restored current working directory to: {os.getcwd()}")

print("RAG system ready for queries.")

Changed current working directory to: /content/mcp-local-llm
Successfully imported 'answer' function from rag.py.
Restored current working directory to: /content
RAG system ready for queries.
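
The internals of `answer` are not shown in this notebook. As a rough picture of what such a function typically does after retrieval, the sketch below assembles a prompt from retrieved chunks and sends it to a local Ollama server. Everything here is an assumption for illustration: the helper names `build_prompt` and `generate`, the prompt wording, and the model name `llama3`. Only the `/api/generate` endpoint and its `model`, `prompt`, and `stream` fields come from Ollama's HTTP API.

```python
import json
import urllib.request


def build_prompt(question, chunks):
    """Combine retrieved chunks and the question into a single grounded prompt."""
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


def generate(prompt, model="llama3", host="http://localhost:11434"):
    """Send a non-streaming generation request to Ollama and return the text."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    request = urllib.request.Request(
        f"{host}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=120) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]


# Building the prompt needs no server; generate() requires Ollama to be running.
print(build_prompt("What is RAG?", ["RAG retrieves facts from a knowledge base."]))
```

Numbering the chunks in the prompt makes it possible to ask the model to cite which passage supports its answer.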


## Interactive RAG Queries

Ask questions based on your parsed documents. Run this cell to start an interactive session.

In [None]:
import os

# Store current working directory
current_cwd_for_query = os.getcwd()
os.chdir(repo_path)  # Change to repository root for correct artifact loading
print(f"Changed current working directory to: {os.getcwd()} for RAG query execution.")

print("Interactive RAG Query Session")
print("Type your questions about the documents. Type 'quit' or 'exit' to stop.")
print("-" * 50)

try:
    while True:
        query = input("Your question: ").strip()
        if query.lower() in ['quit', 'exit', 'q']:
            print("Exiting interactive session.")
            break
        if not query:
            continue

        print(f"\nQuery: {query}")
        print("Generating answer...")
        rag_response = answer(query=query)
        print(f"RAG Answer:\n{rag_response}")
        print("-" * 50)
finally:
    os.chdir(current_cwd_for_query)  # Restore original working directory
    print(f"Restored current working directory to: {os.getcwd()}")

print("Interactive session complete.")

Executing test queries...
Changed current working directory to: /content/mcp-local-llm for RAG query execution.

Query 1: What is Retrieval Augmented Generation and its process?
Generated RAG Answer 1:
[CLS] retrieval augmented generation ( rag ) is an ai framework that retrieves facts from an external knowledge base to ground large language models ( llms ) on the most accurate and up - to - date information. this helps to reduce hallucinations and allows llms to access knowledge beyond their training data. rag combines the strengths of retrieval - based models and generative models. traditional llms are trained on vast amounts of data, but their knowledge is static and limited to their training cutoff. [SEP] [CLS] the process of rag involves several key steps : first, a query is received, and

Query 2: How does RAG help with LLM limitations?
Generated RAG Answer 2:
this helps to reduce hallucinations and allows llms to access knowledge beyond their training data

Query 3: What makes R