<a href="https://colab.research.google.com/github/apoorvapu/data_science/blob/main/rag_tutorial_gist.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
import numpy as np
from sentence_transformers import SentenceTransformer

# Load a compact general-purpose sentence-embedding model (384-dimensional output)
model = SentenceTransformer('all-MiniLM-L6-v2')

# Example knowledge base documents and user query
documents = [
    "Neural networks learn through backpropagation and gradient descent",
    "Deep learning models require large datasets for effective training",
    "Transformers use attention mechanisms to process sequential data"
]
user_query = "How do AI models improve their performance?"

# Encode the documents and the query as dense embedding vectors
doc_embeddings = model.encode(documents)
query_embedding = model.encode(user_query)

print(f"Query embedding shape: {query_embedding.shape}")
print(f"Each document embedding shape: {doc_embeddings[0].shape}")

Query embedding shape: (384,)
Each document embedding shape: (384,)



In [2]:
from sklearn.metrics.pairwise import cosine_similarity

# Calculate semantic similarity between query and documents
similarities = cosine_similarity([query_embedding], doc_embeddings)[0]

# Report each document's similarity to the query (in document order, not sorted)
for i, (doc, score) in enumerate(zip(documents, similarities)):
    print(f"Document {i+1} (Similarity: {score:.3f}):")

Document 1 (Similarity: 0.335):
Document 2 (Similarity: 0.476):
Document 3 (Similarity: 0.243):
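
The scores above are printed in document order rather than ranked. As a minimal sketch of the ranking step, the cell below computes cosine similarity directly with NumPy and sorts the results. The helper name `cosine_rank` and the toy 3-dimensional vectors are assumptions made for illustration; they are not part of the original notebook and stand in for real 384-dimensional embeddings.

```python
import numpy as np

def cosine_rank(query_vec, doc_vecs):
    """Return (indices sorted by descending cosine similarity, similarity scores)."""
    query_vec = np.asarray(query_vec, dtype=float)
    doc_vecs = np.asarray(doc_vecs, dtype=float)
    # Cosine similarity = dot product divided by the product of the vector norms
    sims = doc_vecs @ query_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec)
    )
    # Negate so that argsort yields the highest similarity first
    order = np.argsort(-sims)
    return order, sims

# Toy vectors standing in for real embeddings (illustrative only)
toy_query = [1.0, 0.0, 0.0]
toy_docs = [[0.0, 1.0, 0.0], [1.0, 1.0, 0.0], [1.0, 0.0, 0.0]]

order, sims = cosine_rank(toy_query, toy_docs)
for rank, idx in enumerate(order, start=1):
    print(f"Rank {rank}: document {idx + 1} (similarity {sims[idx]:.3f})")
```

With these toy vectors, the third document ranks first because it is identical to the query. With real data, the same function could be called as `cosine_rank(query_embedding, doc_embeddings)`.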


In [4]:
!pip install chromadb

In [5]:
import chromadb
from chromadb.config import Settings

# Initialize an in-memory Chroma vector store for retrieval (telemetry disabled)
client = chromadb.Client(Settings(anonymized_telemetry=False))
collection = client.create_collection(
    name="knowledge_base",
    metadata={"description": "RAG document storage"}
)

# Store documents with automatic embedding generation
collection.add(
    documents=documents,
    ids=[f"doc_{i}" for i in range(len(documents))],
    metadatas=[{"source": f"document_{i}"} for i in range(len(documents))]
)

def rag_retrieve(query, n_results=2):
    """Return the top n_results documents and their distances (lower means closer)."""
    results = collection.query(
        query_texts=[query],
        n_results=n_results
    )
    return results['documents'][0], results['distances'][0]

# Demonstrate retrieval
retrieved_docs, relevance_scores = rag_retrieve(
    "What techniques help neural networks learn effectively?"
)

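print("Retrieved documents for RAG:")
for doc, score in zip(retrieved_docs, relevance_scores):
    # Chroma returns distances (lower is closer). Assuming its default squared-L2
    # metric, 1 - distance is only a rough relevance proxy and can go negative.
    print(f"Relevance: {1-score:.3f} | Content: {doc}")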



Retrieved documents for RAG:
Relevance: 0.261 | Content: Neural networks learn through backpropagation and gradient descent
Relevance: -0.108 | Content: Deep learning models require large datasets for effective training
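
The negative "relevance" above suggests the returned distances are not cosine distances. A hedged sketch of one interpretation follows: it assumes the collection uses squared L2 distance (which I believe is Chroma's default) and that the stored embeddings are unit-length. Under those assumptions the distance maps exactly to cosine similarity. The helper name `squared_l2_to_cosine` is hypothetical.

```python
def squared_l2_to_cosine(distance):
    """Convert a squared L2 distance between unit-length vectors to cosine similarity."""
    # For unit vectors a and b: ||a - b||^2 = 2 - 2 * cos(a, b)
    return 1.0 - distance / 2.0

# Recover the distances from the values printed above (relevance = 1 - distance)
for printed_relevance in (0.261, -0.108):
    distance = 1.0 - printed_relevance
    similarity = squared_l2_to_cosine(distance)
    print(f"distance={distance:.3f} -> cosine similarity={similarity:.3f}")
```

An alternative, if the assumption about the default metric holds, is to create the collection with the `hnsw:space` metadata key set to `"cosine"`, so that `1 - distance` is itself a cosine similarity. Check the Chroma documentation for the installed version before relying on either approach.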


In [6]:
def complete_rag_pipeline(user_question, max_context_docs=3):
    """Retrieve context and build the RAG prompt (the generation call is not included)"""

    # Step 1: Retrieve relevant context
    context_docs, scores = rag_retrieve(user_question, max_context_docs)

    # Step 2: Construct RAG prompt
    context_section = "\n".join([
        f"[Source {i+1}]: {doc}"
        for i, doc in enumerate(context_docs)
    ])

    rag_prompt = f"""Context Information:
{context_section}

User Question: {user_question}

Instructions: Answer the question using the provided context. If the context doesn't contain relevant information, acknowledge this limitation.

Response:"""

    return rag_prompt, context_docs, scores

# Example usage
question = "What methods do neural networks use to learn from data?"
prompt, sources, relevance = complete_rag_pipeline(question)

print("Generated RAG Prompt:")
print(prompt)
print(f"\nRetrieved {len(sources)} relevant sources")

Generated RAG Prompt:
Context Information:
[Source 1]: Neural networks learn through backpropagation and gradient descent
[Source 2]: Deep learning models require large datasets for effective training
[Source 3]: Transformers use attention mechanisms to process sequential data

User Question: What methods do neural networks use to learn from data?

Instructions: Answer the question using the provided context. If the context doesn't contain relevant information, acknowledge this limitation.

Response:

Retrieved 3 relevant sources
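
The pipeline above stops at the prompt. As a rough sketch of the remaining step, the cell below passes a prompt to a text generator and returns the answer together with its sources. Everything here is an assumption for illustration: `answer_with_rag`, `fake_build_prompt`, and `fake_generate` are hypothetical names, the placeholder score is not a real result, and the stub generator stands in for whatever language model would actually be called.

```python
from typing import Callable, List, Tuple

def answer_with_rag(
    question: str,
    build_prompt: Callable[[str], Tuple[str, List[str], List[float]]],
    generate_fn: Callable[[str], str],
) -> dict:
    """Build a RAG prompt, pass it to a text generator, and return answer plus sources."""
    prompt, sources, scores = build_prompt(question)
    return {"answer": generate_fn(prompt), "sources": sources, "scores": scores}

# Stand-ins so the sketch runs without a vector store or a language model
def fake_build_prompt(question):
    docs = ["Neural networks learn through backpropagation and gradient descent"]
    prompt = (
        f"Context Information:\n[Source 1]: {docs[0]}\n\n"
        f"User Question: {question}\n\nResponse:"
    )
    return prompt, docs, [0.0]  # placeholder score, not a measured distance

def fake_generate(prompt):
    # A real implementation would call a language model here
    return f"(model output for a {len(prompt)}-character prompt)"

result = answer_with_rag(
    "How do neural networks learn?", fake_build_prompt, fake_generate
)
print(result["answer"])
print(result["sources"])
```

In the notebook, `complete_rag_pipeline` has the same return shape as `fake_build_prompt`, so it could be passed as `build_prompt` directly. Keeping the generator as a parameter leaves the choice of model open.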
