<img src="https://drive.google.com/uc?export=view&id=1wYSMgJtARFdvTt5g7E20mE4NmwUFUuog" width="200">

[![Gen AI Experiments](https://img.shields.io/badge/Gen%20AI%20Experiments-GenAI%20Bootcamp-blue?style=for-the-badge&logo=artificial-intelligence)](https://github.com/buildfastwithai/gen-ai-experiments)
[![Gen AI Experiments GitHub](https://img.shields.io/github/stars/buildfastwithai/gen-ai-experiments?style=for-the-badge&logo=github&color=gold)](http://github.com/buildfastwithai/gen-ai-experiments)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/[NOTEBOOK_ID])

## Master Generative AI in 8 Weeks
**What You'll Learn:**
- Master cutting-edge AI tools & frameworks
- 6 weeks of hands-on, project-based learning
- Weekly live mentorship sessions
- No coding experience required
- Join Innovation Community

Transform your AI ideas into reality through hands-on projects and expert mentorship.

[Start Your Journey](https://www.buildfastwithai.com/genai-course)

---

# Gemini 3 Pro - Simple RAG Implementation

**Created by:** @BuildFastWithAI  
**Model:** Google Gemini 3 Pro  
**Last Updated:** November 2025

Complete guide to building a RAG (Retrieval-Augmented Generation) system with Gemini 3 Pro.

In [None]:
!pip install -q google-generativeai langchain langchain-community langchain-google-genai faiss-cpu pypdf

In [None]:
import google.generativeai as genai
from google.colab import userdata
from langchain_google_genai import GoogleGenerativeAIEmbeddings, ChatGoogleGenerativeAI
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain.chains import RetrievalQA
from langchain_community.document_loaders import TextLoader, PyPDFLoader

GOOGLE_API_KEY = userdata.get('GOOGLE_API_KEY')
genai.configure(api_key=GOOGLE_API_KEY)

## 1. RAG Fundamentals & Architecture

RAG combines:
- **Retrieval**: Finding relevant documents from a knowledge base
- **Augmentation**: Adding retrieved context to the prompt
- **Generation**: Using an LLM to generate a response grounded in that context
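
The three stages can be sketched without any external service. The example below is a toy illustration only: word counts stand in for real embeddings, and `generate` is a hypothetical stub where the LLM call would go. None of these helper names come from LangChain or the Gemini SDK.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, knowledge_base: list[str], k: int = 1) -> list[str]:
    """Retrieval: rank passages by similarity to the question, keep the top k."""
    q = bag_of_words(question)
    ranked = sorted(knowledge_base, key=lambda p: cosine(q, bag_of_words(p)), reverse=True)
    return ranked[:k]


def augment(question: str, passages: list[str]) -> str:
    """Augmentation: place the retrieved passages in the prompt."""
    context = "\n".join(passages)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


def generate(prompt: str) -> str:
    """Generation: hypothetical stub; a real system would call the LLM here."""
    return f"[LLM response to a {len(prompt)}-character prompt]"


knowledge_base = [
    "The context window is up to 1 million tokens.",
    "Use cases include customer support automation and code assistance.",
]
question = "How large is the context window?"

top_passages = retrieve(question, knowledge_base, k=1)
prompt = augment(question, top_passages)
print(prompt)
print(generate(prompt))
```

The rest of the notebook replaces each toy piece with a real component: Gemini embeddings for `bag_of_words`, FAISS for `retrieve`, and the Gemini chat model for `generate`.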

## 2. Document Loading

In [None]:
# Sample documents
sample_docs = """
Gemini 3 Pro Overview:
Gemini 3 Pro is Google's latest large language model featuring enhanced reasoning capabilities.
It supports multimodal inputs including text, images, audio, and video.

Key Features:
- Context window up to 1 million tokens
- Advanced function calling
- Native multimodal understanding
- Low latency responses
- Cost-effective pricing

Performance:
Gemini 3 Pro excels at complex reasoning tasks, code generation, and data analysis.
It outperforms previous models on mathematical reasoning and scientific tasks.

Use Cases:
- Customer support automation
- Content generation
- Code assistance
- Data analysis and insights
- Research and summarization
"""

# Save to file
with open('gemini_docs.txt', 'w') as f:
    f.write(sample_docs)

print("✅ Documents created")

## 3. Text Chunking

In [None]:
# Load and split documents
from langchain_core.documents import Document

# Create documents
docs = [Document(page_content=sample_docs)]

# Split into chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=50,
    length_function=len,
    separators=["\n\n", "\n", " ", ""]
)

chunks = text_splitter.split_documents(docs)

print(f"Created {len(chunks)} chunks")
print(f"\nFirst chunk:\n{chunks[0].page_content}")
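
To see what `chunk_overlap` does, here is a simplified sketch that cuts text into fixed-size character windows. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter prefers to break on the listed separators); `sliding_chunks` is a hypothetical helper written only to make the overlap visible.

```python
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into character windows that share `chunk_overlap` characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    windows = []
    for start in range(0, len(text), step):
        windows.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return windows


demo_text = "abcdefghijklmnopqrstuvwxyz"
demo_chunks = sliding_chunks(demo_text, chunk_size=10, chunk_overlap=3)
for chunk in demo_chunks:
    print(repr(chunk))
```

Each window starts with the last three characters of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk when the overlap is large enough.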

## 4. Embedding Generation

In [None]:
# Initialize embeddings
embeddings = GoogleGenerativeAIEmbeddings(
    model="models/embedding-001",
    google_api_key=GOOGLE_API_KEY
)

# Test embedding
test_embedding = embeddings.embed_query("What is Gemini 3 Pro?")
print(f"Embedding dimension: {len(test_embedding)}")
print(f"First 5 values: {test_embedding[:5]}")

## 5. Vector Store Setup

In [None]:
# Create FAISS vector store
vectorstore = FAISS.from_documents(chunks, embeddings)

# Save vector store
vectorstore.save_local("faiss_index")
print("✅ Vector store created and saved")

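# Load vector store (for later use)
# Recent langchain-community releases require opting in to pickle deserialization;
# only enable it for index files you created yourself.
# vectorstore = FAISS.load_local("faiss_index", embeddings, allow_dangerous_deserialization=True)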

## 6. Retrieval - Similarity Search

In [None]:
# Simple similarity search
query = "What are the key features of Gemini 3 Pro?"
docs = vectorstore.similarity_search(query, k=3)

print(f"Query: {query}\n")
for i, doc in enumerate(docs, 1):
    print(f"Result {i}:")
    print(doc.page_content)
    print("---")

In [None]:
# Search with scores
docs_with_scores = vectorstore.similarity_search_with_score(query, k=3)

for doc, score in docs_with_scores:
    print(f"Score: {score:.4f}")
    print(f"Content: {doc.page_content[:100]}...\n")
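
Assuming the vector store uses an L2 (Euclidean) index, which is my understanding of the LangChain FAISS wrapper's default and worth verifying against your installed version, the scores returned by `similarity_search_with_score` are distances: lower means closer. The sketch below contrasts that with cosine similarity, where higher means closer. The vectors are made-up two-dimensional examples, not real embeddings.

```python
import math


def l2_distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance: 0.0 for identical vectors, larger means farther apart."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 for the same direction, 0.0 for orthogonal vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


query_vec = [1.0, 0.0]   # illustrative values only
near_vec = [0.9, 0.1]
far_vec = [0.0, 1.0]

for name, vec in [("near", near_vec), ("far", far_vec)]:
    print(
        f"{name}: L2 distance={l2_distance(query_vec, vec):.4f}, "
        f"cosine similarity={cosine_similarity(query_vec, vec):.4f}"
    )
```

When you sort or threshold results, check which convention your store uses before deciding whether a "high" score is good.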

## 7. Generation - RAG Pipeline

In [None]:
# Initialize LLM
llm = ChatGoogleGenerativeAI(
    model="gemini-3-pro",
    google_api_key=GOOGLE_API_KEY,
    temperature=0.3
)

# Create QA chain
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectorstore.as_retriever(search_kwargs={"k": 3}),
    return_source_documents=True
)

# Query
query = "What makes Gemini 3 Pro unique?"
result = qa_chain.invoke({"query": query})

print(f"Question: {query}")
print(f"\nAnswer: {result['result']}")
print(f"\nSources used: {len(result['source_documents'])}")

## 8. Complete RAG System

In [None]:
class SimpleRAG:
    def __init__(self, api_key: str):
        self.embeddings = GoogleGenerativeAIEmbeddings(
            model="models/embedding-001",
            google_api_key=api_key
        )
        self.llm = ChatGoogleGenerativeAI(
            model="gemini-3-pro",
            google_api_key=api_key,
            temperature=0.3
        )
        self.vectorstore = None
    
    def load_documents(self, texts: list):
        """Load and process documents."""
        docs = [Document(page_content=text) for text in texts]
        
        text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=500,
            chunk_overlap=50
        )
        chunks = text_splitter.split_documents(docs)
        
        self.vectorstore = FAISS.from_documents(chunks, self.embeddings)
        print(f"✅ Loaded {len(chunks)} chunks")
    
    def query(self, question: str, k: int = 3) -> dict:
        """Query the RAG system."""
        if not self.vectorstore:
            return {"error": "No documents loaded"}
        
        # Retrieve
        docs = self.vectorstore.similarity_search(question, k=k)
        context = "\n\n".join([doc.page_content for doc in docs])
        
        # Generate
        prompt = f"""
Answer the question based on the context below.

Context:
{context}

Question: {question}

Answer:
"""
        
        # invoke() returns a message object; .content holds the generated text
        response = self.llm.invoke(prompt).content
        
        return {
            "question": question,
            "answer": response,
            "sources": [doc.page_content[:100] for doc in docs]
        }

# Initialize and test
rag = SimpleRAG(GOOGLE_API_KEY)
rag.load_documents([sample_docs])

# Test queries
questions = [
    "What is the context window size?",
    "What are the main use cases?",
    "How does it perform on reasoning tasks?"
]

for q in questions:
    result = rag.query(q)
    print(f"\nQ: {result['question']}")
    print(f"A: {result['answer']}")
    print("---")
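
The prompt inside `SimpleRAG.query` can be made more explicit about grounding. The sketch below is one possible variant, not a required pattern: `build_prompt` is a hypothetical helper that numbers each retrieved passage and asks the model to admit when the context lacks the answer.

```python
def build_prompt(question: str, passages: list[str]) -> str:
    """Build a grounded prompt with numbered sources (hypothetical helper)."""
    numbered = "\n\n".join(
        f"[{i}] {passage.strip()}" for i, passage in enumerate(passages, 1)
    )
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say you don't know. "
        "Cite the source numbers you used.\n\n"
        f"Context:\n{numbered}\n\n"
        f"Question: {question}\n\n"
        "Answer:"
    )


example_prompt = build_prompt(
    "What is the context window size?",
    ["Context window up to 1 million tokens", "  Low latency responses  "],
)
print(example_prompt)
```

Numbered sources make it easier to check afterwards which retrieved chunk supported each part of the answer.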

## 9. Key Takeaways

✅ **RAG Pipeline:**
1. Load and chunk documents
2. Generate embeddings
3. Store in vector database
4. Retrieve relevant chunks
5. Generate answer with context

📌 **Best Practices:**
- Choose an appropriate chunk size (this notebook uses 500 characters; with `length_function=len`, `chunk_size` counts characters, not tokens)
- Use overlap to maintain context
- Retrieve 3-5 most relevant chunks
- Persist vector store for reuse

🔗 **Resources:**
- Follow [@BuildFastWithAI](https://twitter.com/BuildFastWithAI)