This notebook demonstrates how to build a Retrieval-Augmented Generation (RAG) system using:

- ChromaDB for efficient vector-based retrieval.
- OpenAI API for response generation.
- T5 (FLAN-T5-small) for query expansion.
- Sentence-BERT (SBERT) for embedding generation.

# 1. Install and Import Dependencies

First, install the required libraries.

In [None]:
!pip install chromadb openai==0.28 sentence-transformers transformers



Now, import the necessary libraries:

In [None]:
import os
import openai
import chromadb
import json
import numpy as np
from chromadb.utils.embedding_functions import SentenceTransformerEmbeddingFunction
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from google.colab import files  # Colab-only; used in step 9 to download the result files

# 2. Set Up OpenAI API Key

Replace 'your-api-key' with your actual OpenAI API key.




In [None]:
openai.api_key = "your-api-key"  # placeholder; replace with your actual key
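
Hard-coding a key in a notebook makes it easy to leak when the notebook is shared. As a minimal sketch, the key could instead be read from an environment variable. The variable name `OPENAI_API_KEY` and the helper `load_api_key` are assumptions made for illustration, not part of the original notebook.

```python
import os

def load_api_key(env_var="OPENAI_API_KEY"):
    """Return the API key stored in an environment variable, or raise if it is missing."""
    # Hypothetical helper: the variable name is an assumption, adjust it to your setup.
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key

# Usage in the notebook (commented out so this sketch runs without a key):
# openai.api_key = load_api_key()
```

Failing early with a clear message is usually easier to debug than an authentication error raised later by the API call.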


# 3. Initialize ChromaDB

We use ChromaDB as a vector database and SBERT (all-MiniLM-L6-v2) for text embeddings.


In [None]:
# Initialize ChromaDB client and collection
chroma_client = chromadb.PersistentClient(path="./chroma_db")
collection = chroma_client.get_or_create_collection(
    name="rag_collection",
    embedding_function=SentenceTransformerEmbeddingFunction(model_name="all-MiniLM-L6-v2")
)



# 4. Query Expansion using FLAN-T5
FLAN-T5 helps rewrite the user query for better document retrieval.

In [None]:
# Load query expansion model (FLAN-T5)
query_expansion_model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-small")
tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-small")

# Expand the query to improve search results
def expand_query(query):
    input_text = f"Expand this query: {query}"
    input_ids = tokenizer(input_text, return_tensors="pt").input_ids
    output_ids = query_expansion_model.generate(input_ids, max_length=50)
    expanded_query = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    return expanded_query
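
A small model such as FLAN-T5-small can drift away from the user's intent, as the expanded query in the sample run in section 8 shows. One possible safeguard, sketched here under that assumption, is to keep the original query and append only the new words from the expansion. The helper `combine_queries` and its `max_extra_words` parameter are hypothetical and not part of the original notebook.

```python
def combine_queries(original, expanded, max_extra_words=30):
    """Append new words from the expansion to the original query, capped and de-duplicated."""
    # Words already present in the original query (case-insensitive).
    seen = {word.lower() for word in original.split()}
    extra = []
    for word in expanded.split():
        key = word.lower()
        if key not in seen:
            seen.add(key)
            extra.append(word)
        if len(extra) >= max_extra_words:
            break
    # The original wording always comes first, so it is never lost.
    return " ".join([original] + extra)

print(combine_queries("vector search", "vector search with embeddings"))
```

The combined string could then be passed to retrieval in place of the raw expansion, for example `retrieve_documents(combine_queries(query, expand_query(query)))`.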

# 5. Store Documents in ChromaDB
Here, we add sample knowledge base documents to ChromaDB.

In [None]:
documents = [
    "AI is transforming the world with automation and intelligence.",
    "Machine learning helps in predictive analytics and decision-making.",
    "Retrieval-Augmented Generation (RAG) improves LLM response accuracy.",
    "Vector databases store embeddings for efficient information retrieval."
]

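# Store documents in ChromaDB in one batch.
# upsert (rather than add) lets the cell be re-run against the persistent
# collection without duplicate-ID problems.
collection.upsert(
    ids=[str(i) for i in range(len(documents))],
    documents=documents
)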
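
Sequential IDs such as "0" and "1" depend on list order, so reordering the knowledge base would silently reassign them. A minimal sketch of one alternative is to derive each ID from the document text. The helper `stable_id` and the 16-character truncation are assumptions chosen for illustration.

```python
import hashlib

def stable_id(text):
    """Derive a deterministic ID from the document text (first 16 hex chars of SHA-256)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]

docs = [
    "Retrieval-Augmented Generation (RAG) improves LLM response accuracy.",
    "Vector databases store embeddings for efficient information retrieval.",
]
ids = [stable_id(doc) for doc in docs]
print(ids)

# With a ChromaDB collection, the IDs and documents could be stored in one batch:
# collection.upsert(ids=ids, documents=docs)
```

The same text always maps to the same ID, so storing a document twice overwrites it instead of creating a second entry.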



# 6. Retrieve Similar Documents
Now, we retrieve the top K most relevant documents using semantic search.



In [None]:
def retrieve_documents(query, k=2):
    results = collection.query(
        query_texts=[query],
        n_results=k
    )
    return results["documents"][0] if results["documents"] else []
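
To make "semantic search" concrete, the following is a brute-force sketch of ranking documents by cosine similarity between embedding vectors. It is an illustration only: ChromaDB uses its own index and configured distance metric internally, and the three-dimensional vectors below are toy values, not real SBERT embeddings.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=2):
    """Return the indices of the k document vectors most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), i) for i, vec in enumerate(doc_vecs)]
    scored.sort(reverse=True)  # highest similarity first
    return [i for _, i in scored[:k]]

# Toy embeddings: documents 0 and 1 point in nearly the same direction as the query.
toy_docs = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]
toy_query = [1.0, 0.05, 0.0]
print(top_k(toy_query, toy_docs, k=2))  # → [0, 1]
```

In `retrieve_documents`, the parameter `n_results=k` plays the same role as `k` here.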

# 7. Generate Responses using OpenAI API
The retrieved documents act as context for the AI model.

In [None]:
def generate_response(query, retrieved_docs):
    context = "\n".join(retrieved_docs)
    prompt = f"Context: {context}\n\nUser Query: {query}\n\nAnswer:"
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "system", "content": prompt}]
    )
    return response['choices'][0]['message']['content']
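
The function above sends the context and the question together in a single system message. A common alternative, sketched here as an assumption rather than the notebook's method, keeps the instructions in the system message and puts the context and question in a user message. The helper `build_messages` and the instruction wording are hypothetical.

```python
def build_messages(query, retrieved_docs):
    """Build a chat message list with instructions separated from context and question."""
    # One bullet per retrieved document keeps the context easy to scan.
    context = "\n".join(f"- {doc}" for doc in retrieved_docs)
    system = "Answer using only the provided context. If the context is insufficient, say so."
    user = f"Context:\n{context}\n\nQuestion: {query}"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]

messages = build_messages(
    "How does RAG improve AI?",
    ["Retrieval-Augmented Generation (RAG) improves LLM response accuracy."],
)
print(messages[1]["content"])
```

The returned list could be passed as the `messages` argument of `openai.ChatCompletion.create` in place of the single system message.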


# 8. Run a Sample Query


This step runs the full pipeline on one question:

*   Expands the query
*   Retrieves relevant documents
*   Generates a response





In [None]:
query = "How does RAG improve AI?"
expanded_query = expand_query(query)
retrieved_docs = retrieve_documents(expanded_query)
response = generate_response(query, retrieved_docs)

# Prepare result data
result_data = {
    "Original Query": query,
    "Expanded Query": expanded_query,
    "Retrieved Documents": retrieved_docs,
    "Generated Response": response
}

# Print the results
print("Original Query:", query)
print("Expanded Query:", expanded_query)
print("Retrieved Docs:", retrieved_docs)
print("Generated Response:", response)

Original Query: How does RAG improve AI?
Expanded Query: RAG is a software company that provides software to companies that provide services to the public. RAG is a software company that provides software to companies that provide services to the public.
Retrieved Docs: ['Retrieval-Augmented Generation (RAG) improves LLM response accuracy.', 'AI is transforming the world with automation and intelligence.']
Generated Response: RAG, or Retrieval-Augmented Generation, improves AI by combining the strengths of both retrieval and generative models. Retrieval models are excellent at retrieving relevant information from large datasets, while generative models are skilled at producing human-like text. By combining these two approaches, RAG can provide more accurate and relevant responses compared to traditional language models. RAG uses a retriever to search for relevant information and then a generator to create a response based on that information, resulting in more contextually accurate and

# 9. Code to Save and Download Results
Run this cell after the sample query to save the results as text and JSON files and download them:

In [None]:
# Save as a text file
with open("RAG_result.txt", "w") as f:
    for key, value in result_data.items():
        f.write(f"{key}: {value}\n\n")

# Save as a JSON file
with open("RAG_result.json", "w") as f:
    json.dump(result_data, f, indent=4)

# Download the files
files.download("RAG_result.txt")
files.download("RAG_result.json")
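
The text file stores the retrieved documents as a printed Python list, while the JSON file preserves them as a real list. A minimal sketch of checking that the JSON output round-trips follows. The `sample` dictionary is a stand-in for `result_data`, and a temporary directory is used so the sketch does not touch the notebook's files.

```python
import json
import os
import tempfile

# Stand-in for result_data from the sample query.
sample = {
    "Original Query": "How does RAG improve AI?",
    "Retrieved Documents": ["doc one", "doc two"],
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "RAG_result.json")
    # Write, then read back, using an explicit encoding for portability.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(sample, f, indent=4, ensure_ascii=False)
    with open(path, encoding="utf-8") as f:
        loaded = json.load(f)

print(loaded == sample)
```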


# What This Code Does
1. Uses ChromaDB instead of FAISS to store and retrieve vector embeddings.
2. Expands user queries with FLAN-T5.
3. Embeds documents and queries with SBERT (all-MiniLM-L6-v2).
4. Uses the OpenAI API to generate responses from the retrieved context.
5. Persists the vector database to disk at `./chroma_db`.


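# Why Use ChromaDB?
1. Easier to manage than FAISS (no index files to save and load manually).
2. Scalable (runs in-process, as in this notebook, or as a separate server, including in cloud environments).
3. Flexible (supports multiple embedding models through pluggable embedding functions).
4. Persistent storage (`PersistentClient` writes data to disk automatically, so it survives restarts; a FAISS index stays in memory unless you save it explicitly).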