<a href="https://colab.research.google.com/github/Shabeehak/AI_Agent-Personal-Finance-Analyzer-Agent/blob/main/RAG_Gemini.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 📄 RAG: Document Q&A with Generative AI
### Project: Retrieval-Augmented Generation (RAG)
**Model: Google Gemini 2.5 Flash (free API tier) | New google-genai SDK**

---

## What is this project?

This notebook demonstrates **RAG (Retrieval-Augmented Generation)**:

1. Load a **text document** (your knowledge base)
2. User asks a **question**
3. System finds the **most relevant chunks** using TF-IDF similarity
4. **Gemini LLM generates** a natural language answer from those chunks

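> 💡 **Why RAG instead of fine-tuning?**  
> Fine-tuning changes the model's weights, which takes GPU time and has to be repeated whenever the knowledge changes.  
> RAG leaves the model as-is and puts the relevant text into the prompt at query time, so updating the knowledge base only means editing the document. Search-grounded assistants rely on the same idea.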

---

## RAG Pipeline

```
Document ──► Split into Chunks ──► TF-IDF Index
                                        │
Question ──────────────────► Similarity Search
                                        │
                              Relevant Chunks
                                        │
                    ┌───────────────────▼───────────────┐
                    │  Gemini LLM generates the answer  │
                    └───────────────────────────────────┘
```

---

## How to get a FREE Gemini API Key

1. Go to üëâ https://aistudio.google.com/apikey
2. Sign in with your Google account
3. Click **Create API Key**
4. Copy and paste it in **Step 2** below

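> ✅ No credit card required. The free tier is rate-limited; the limits differ by model and change over time, so check the current quota in AI Studio before running many questions.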

## Step 1: Install & Import Libraries

> ⚠️ Note: The old `google-generativeai` library is **deprecated** as of Nov 2025.  
> We use the new official SDK: `google-genai`

In [None]:
# Install the NEW official Google GenAI SDK + other libraries
!pip install google-genai scikit-learn numpy --quiet




In [None]:
import os
import time
import numpy as np
from google import genai                              # New SDK
from google.genai import types                        # For config
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

print('✓ All libraries imported successfully')
print('  SDK: google-genai (new official SDK)')

✓ All libraries imported successfully
  SDK: google-genai (new official SDK)


## Step 2: Set Your FREE Gemini API Key

Get your free key from 👉 https://aistudio.google.com/apikey
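Typing the key on every run gets tedious. A minimal sketch, assuming you store the key in an environment variable named `GEMINI_API_KEY`: the hypothetical helper `get_api_key` reads it from the environment and only falls back to a hidden prompt when it is missing. (The `google-genai` client can also pick up `GEMINI_API_KEY` on its own when `genai.Client()` is created without `api_key`.)

```python
import getpass
import os

def get_api_key(env_var="GEMINI_API_KEY", prompt=getpass.getpass):
    """Hypothetical helper: return the key from the environment, else ask for it."""
    key = os.environ.get(env_var, "").strip()
    if key:
        return key
    # Fall back to a hidden interactive prompt (getpass by default)
    return prompt(f"Enter {env_var}: ")

# Demo: an unset variable name plus a stand-in prompt shows the fallback path
key = get_api_key(env_var="DEMO_KEY_THAT_IS_NOT_SET", prompt=lambda msg: "typed-key")
print(key)  # typed-key
```

In the notebook you would call `GEMINI_API_KEY = get_api_key()` and pass the result to `genai.Client(api_key=GEMINI_API_KEY)`.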

In [None]:
# Enter your Gemini API key (input is hidden)
import getpass
GEMINI_API_KEY = getpass.getpass("Enter Gemini API Key: ")

# Initialize the Gemini client (new google-genai SDK)
client = genai.Client(api_key=GEMINI_API_KEY)

MODEL = 'gemini-2.5-flash'

print('✓ Gemini client ready')
print(f'✓ Model: {MODEL}')

Enter Gemini API Key:  ········


✓ Gemini client ready
✓ Model: gemini-2.5-flash


In [None]:
# List the models this key can use for text generation
# (runs after the key cell, so `client` already exists)
print("Available Gemini models that support generateContent:\n")
for model in client.models.list():
    # supported_actions can be empty/None for some entries, so guard it
    if 'generateContent' in (model.supported_actions or []):
        print(f"  {model.name}")

Available Gemini models that support generateContent:

  models/gemini-2.5-flash
  models/gemini-2.5-pro
  models/gemini-2.0-flash
  models/gemini-2.0-flash-001
  models/gemini-2.0-flash-exp-image-generation
  models/gemini-2.0-flash-lite-001
  models/gemini-2.0-flash-lite
  models/gemini-2.5-flash-preview-tts
  models/gemini-2.5-pro-preview-tts
  models/gemma-3-1b-it
  models/gemma-3-4b-it
  models/gemma-3-12b-it
  models/gemma-3-27b-it
  models/gemma-3n-e4b-it
  models/gemma-3n-e2b-it
  models/gemini-flash-latest
  models/gemini-flash-lite-latest
  models/gemini-pro-latest
  models/gemini-2.5-flash-lite
  models/gemini-2.5-flash-image
  models/gemini-2.5-flash-lite-preview-09-2025
  models/gemini-3-pro-preview
  models/gemini-3-flash-preview
  models/gemini-3.1-pro-preview
  models/gemini-3.1-pro-preview-customtools
  models/gemini-3-pro-image-preview
  models/nano-banana-pro-preview
  models/gemini-robotics-er-1.5-preview
  models/gemini-2.5-computer-use-preview-10-2025
  models/dee


## Step 3: Load the Document

A sample LLM study document is ready to go.  
Replace `DOCUMENT` with any text ‚Äî research paper, notes, textbook chapter.

In [None]:
DOCUMENT = """
Large Language Models (LLMs) are AI models trained on massive amounts of text data.
They learn patterns in language and can generate human-like text.
Examples include GPT-4, Claude, Gemini, and LLaMA.

Architecture: LLMs are based on the Transformer architecture introduced in the 2017 paper
Attention Is All You Need. Transformers use self-attention to weigh the importance of
different words when processing each word. The architecture has encoder and decoder layers
with multiple attention heads that capture different relationships in text.

Training: LLMs are trained using unsupervised pre-training on massive text datasets from
the internet, books, and other sources. The model learns to predict the next word in a
sequence. This forces the model to understand grammar, facts, and reasoning. Training
requires thousands of GPUs running for weeks and costs millions of dollars.

Fine-tuning: After pre-training, LLMs can be fine-tuned on specific tasks or datasets.
This adapts the model for use cases like medical diagnosis or customer support.
RLHF (Reinforcement Learning from Human Feedback) is a popular fine-tuning method
used by ChatGPT to make models more helpful and safe.

Advantages: LLMs can understand and generate text in many languages, perform zero-shot
learning, follow complex instructions, write code, summarize documents, and hold
multi-turn conversations. They are highly versatile across many industries.

Disadvantages: LLMs can hallucinate, generating plausible but false information.
They have a knowledge cutoff date. They are expensive to train and run.
They can reflect biases in training data and struggle with precise math.

RAG (Retrieval-Augmented Generation): RAG combines information retrieval with text
generation. Instead of relying only on training knowledge, RAG fetches relevant documents
at query time and includes them in the prompt. This lets LLMs answer questions about
documents they were never trained on and reduces hallucination.

LLM Comparison: GPT-4 by OpenAI is known for strong reasoning and code generation.
Claude by Anthropic focuses on safety and long context windows. Gemini by Google is
multimodal and integrates with Google services. LLaMA by Meta is open-source and free.
Each model has different strengths, pricing, and context window sizes.

Text Generation: LLMs generate text token by token. A token is roughly 4 characters.
The model predicts the probability of the next token given all previous tokens.
Temperature controls randomness - low temperature gives predictable output,
high temperature gives more creative and varied output.
"""

print(f'‚úì Document loaded: {len(DOCUMENT.split())} words')
print('Preview:', DOCUMENT.strip()[:200], '...')

‚úì Document loaded: 382 words
Preview: Large Language Models (LLMs) are AI models trained on massive amounts of text data.
They learn patterns in language and can generate human-like text.
Examples include GPT-4, Claude, Gemini, and LLaMA. ...


## Step 4: Split Document into Chunks

We split the document into paragraphs (blocks separated by a blank line). Only the **most relevant chunks**  
are sent to Gemini, not the whole document, which keeps the prompt short and within token limits for longer documents.

In [None]:
def split_into_chunks(text):
    """Split document into paragraph-level chunks."""
    paragraphs = [p.strip() for p in text.strip().split('\n\n') if p.strip()]
    return paragraphs

chunks = split_into_chunks(DOCUMENT)

print(f'‚úì Document split into {len(chunks)} chunks\n')
for i, chunk in enumerate(chunks):
    print(f'Chunk {i+1}: {chunk[:80]}...')

‚úì Document split into 9 chunks

Chunk 1: Large Language Models (LLMs) are AI models trained on massive amounts of text da...
Chunk 2: Architecture: LLMs are based on the Transformer architecture introduced in the 2...
Chunk 3: Training: LLMs are trained using unsupervised pre-training on massive text datas...
Chunk 4: Fine-tuning: After pre-training, LLMs can be fine-tuned on specific tasks or dat...
Chunk 5: Advantages: LLMs can understand and generate text in many languages, perform zer...
Chunk 6: Disadvantages: LLMs can hallucinate, generating plausible but false information....
Chunk 7: RAG (Retrieval-Augmented Generation): RAG combines information retrieval with te...
Chunk 8: LLM Comparison: GPT-4 by OpenAI is known for strong reasoning and code generatio...
Chunk 9: Text Generation: LLMs generate text token by token. A token is roughly 4 charact...
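Paragraph splitting only works when the text has blank lines between sections. A common alternative is fixed-size word windows that overlap, so text near a boundary appears in two neighbouring chunks instead of being cut in half. A minimal sketch; `split_into_word_windows` and its `window`/`overlap` sizes are illustrative choices, not part of this notebook.

```python
def split_into_word_windows(text, window=40, overlap=10):
    """Hypothetical chunker: windows of `window` words, each sharing `overlap` words with the previous one."""
    if window <= 0 or not 0 <= overlap < window:
        raise ValueError("need window > 0 and 0 <= overlap < window")
    words = text.split()
    step = window - overlap
    windows = []
    for start in range(0, len(words), step):
        windows.append(" ".join(words[start:start + window]))
        if start + window >= len(words):  # this window already reaches the last word
            break
    return windows

sample = " ".join(f"w{i}" for i in range(100))
windows = split_into_word_windows(sample, window=40, overlap=10)
print(len(windows))  # 3 windows: w0-w39, w30-w69, w60-w99
```

To try it here you would replace `split_into_chunks(DOCUMENT)` with `split_into_word_windows(DOCUMENT)` before building the TF-IDF index.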


## Step 5: Build TF-IDF Retrieval Index

**TF-IDF** converts text into numeric vectors so we can measure how similar a question is to each chunk.

| Term | Meaning |
|------|---------|
| TF (Term Frequency) | How often a word appears in a chunk |
| IDF (Inverse Document Frequency) | How rare the word is across all chunks (rarer words get more weight) |
| Cosine Similarity | Cosine of the angle between the question vector and a chunk vector; for TF-IDF it ranges from 0 to 1, and higher means more relevant |
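To see the table in numbers, the sketch below recomputes TF-IDF by hand with NumPy on three made-up sentences and checks it against scikit-learn. It assumes `TfidfVectorizer`'s defaults: raw counts for TF, smoothed IDF `ln((1 + n) / (1 + df)) + 1`, then L2 normalization of each row. The notebook's own index also removes English stop words, which changes the vocabulary but not the formula. Names prefixed `demo_` are illustrative so they do not overwrite the notebook's `vectorizer`.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

demo_docs = [
    "fine-tuning adapts a pre-trained model to a task",
    "rag retrieves documents at query time",
    "temperature controls randomness of generated text",
]

# TF: raw count of each term per chunk (same tokenizer and column order as TfidfVectorizer)
tf = CountVectorizer().fit_transform(demo_docs).toarray().astype(float)

# IDF with scikit-learn's default smoothing: ln((1 + n) / (1 + df)) + 1
n = tf.shape[0]
df = (tf > 0).sum(axis=0)
idf = np.log((1 + n) / (1 + df)) + 1

# TF-IDF = TF * IDF, then scale every row to unit length (L2 norm)
tfidf = tf * idf
tfidf /= np.linalg.norm(tfidf, axis=1, keepdims=True)

demo_vectorizer = TfidfVectorizer().fit(demo_docs)
demo_matrix = demo_vectorizer.transform(demo_docs)
print(np.allclose(tfidf, demo_matrix.toarray()))  # True

# Rows are unit length, so cosine similarity reduces to a dot product
query = demo_vectorizer.transform(["how does rag retrieve documents"])
scores = cosine_similarity(query, demo_matrix)[0]
print(np.allclose(scores, (query @ demo_matrix.T).toarray()[0]))  # True
print(int(np.argmax(scores)))  # 1: only the RAG sentence shares terms with the query
```

Note that "retrieve" and "retrieves" count as different terms: TF-IDF matches exact tokens, which is why the retrieval scores in this notebook can miss paraphrases.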

In [None]:
vectorizer = TfidfVectorizer(stop_words='english')
chunk_vectors = vectorizer.fit_transform(chunks)

print('✓ TF-IDF index built')
print(f'  Vocabulary size : {len(vectorizer.vocabulary_)} unique terms')
print(f'  Matrix shape    : {chunk_vectors.shape}  (chunks x terms)')

✓ TF-IDF index built
  Vocabulary size : 187 unique terms
  Matrix shape    : (9, 187)  (chunks x terms)


## Step 6: Retrieval Function

Finds the **top-N most relevant chunks** for any question using cosine similarity.

In [None]:
def retrieve_chunks(question, top_n=3):
    """
    Find most relevant chunks for a question.
    1. Convert question to TF-IDF vector
    2. Compute cosine similarity with all chunks
    3. Return top N most similar chunks
    """
    question_vector = vectorizer.transform([question])
    similarities = cosine_similarity(question_vector, chunk_vectors)[0]
    top_indices = np.argsort(similarities)[::-1][:top_n]
    return [{'chunk': chunks[i], 'score': round(float(similarities[i]), 4)} for i in top_indices]

# Quick test
test_results = retrieve_chunks('What is fine-tuning?')
print('Test: "What is fine-tuning?"\n')
for i, r in enumerate(test_results):
    print(f'Rank {i+1} [score: {r["score"]}]: {r["chunk"][:100]}...')

Test: "What is fine-tuning?"

Rank 1 [score: 0.5734]: Fine-tuning: After pre-training, LLMs can be fine-tuned on specific tasks or datasets.
This adapts t...
Rank 2 [score: 0.0]: Text Generation: LLMs generate text token by token. A token is roughly 4 characters.
The model predi...
Rank 3 [score: 0.0]: LLM Comparison: GPT-4 by OpenAI is known for strong reasoning and code generation.
Claude by Anthrop...
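The test output shows a weakness of fixed top-N retrieval: ranks 2 and 3 score 0.0, yet they would still be pasted into the prompt. A minimal sketch of one fix, assuming a hypothetical `min_score` cut-off (0.05 is an arbitrary illustration, not a tuned value): keep only chunks whose similarity clears it. The index is passed in explicitly and the demo names are prefixed `demo_` so the notebook's globals are left alone.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def retrieve_with_threshold(question, chunk_list, vec, matrix, top_n=3, min_score=0.05):
    """Hypothetical variant of retrieve_chunks(): top-N by cosine similarity, minus weak matches."""
    scores = cosine_similarity(vec.transform([question]), matrix)[0]
    ranked = np.argsort(scores)[::-1][:top_n]
    return [{'chunk': chunk_list[i], 'score': round(float(scores[i]), 4)}
            for i in ranked if scores[i] >= min_score]

demo_chunks = [
    "Fine-tuning adapts a pre-trained model to a specific task.",
    "RAG fetches relevant documents at query time.",
    "Temperature controls randomness in text generation.",
]
demo_vec = TfidfVectorizer(stop_words='english')
demo_matrix = demo_vec.fit_transform(demo_chunks)

hits = retrieve_with_threshold("What is fine-tuning?", demo_chunks, demo_vec, demo_matrix)
print(len(hits))  # 1: the two unrelated chunks score 0 and are dropped
# A question with no known terms becomes an all-zero vector, so nothing passes
print(retrieve_with_threshold("bananas", demo_chunks, demo_vec, demo_matrix))  # []
```

When the list comes back empty, `ask()` could skip the API call and report that the document has nothing relevant, which also saves free-tier quota.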


## Step 7: Full RAG Pipeline with Gemini 🤖

```
R (Retrieve) → TF-IDF finds relevant chunks
A (Augment)  → Chunks are injected into the prompt
G (Generate) → Gemini LLM generates the answer
```
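The Augment step is plain string assembly, so it can be pulled out and tested without calling the API. A minimal sketch: `build_prompt` is a hypothetical helper that reuses this notebook's prompt wording and adds an assumed `max_context_chars` budget (by the roughly 4-characters-per-token rule of thumb in the sample document, 2000 characters is about 500 tokens), so long chunks cannot push the prompt past a size you choose.

```python
def build_prompt(question, retrieved, max_context_chars=2000):
    """Hypothetical helper: assemble the augmented prompt from chunks sorted best-first."""
    kept, used = [], 0
    for r in retrieved:
        cost = len(r["chunk"]) + 2  # +2 for the blank line that joins chunks
        if used + cost > max_context_chars:
            break  # stop at the first chunk that does not fit, preserving rank order
        kept.append(r["chunk"])
        used += cost
    context = "\n\n".join(kept)
    return (
        "You are an academic assistant. Answer the question using ONLY the context below.\n"
        "If the answer is not in the context, say: This information is not in the document.\n\n"
        f"CONTEXT FROM DOCUMENT:\n{context}\n\n"
        f"QUESTION: {question}\n\n"
        "ANSWER:"
    )

retrieved = [
    {"chunk": "RAG fetches relevant documents at query time.", "score": 0.41},
    {"chunk": "Temperature controls randomness.", "score": 0.02},
]
# With a 60-character budget only the first (best-scoring) chunk fits
prompt = build_prompt("What does RAG do?", retrieved, max_context_chars=60)
print(prompt)
```

Stopping at the first chunk that does not fit (rather than skipping ahead to smaller, lower-ranked ones) keeps the context in relevance order; either choice is defensible.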

In [None]:
def ask(question, top_n=3):
    """
    Full RAG Pipeline:
    R - Retrieve relevant chunks from document
    A - Augment the prompt with those chunks
    G - Generate answer using Gemini LLM
    """
    print('=' * 60)
    print(f'QUESTION: {question}')
    print('=' * 60)

    # ── R: Retrieve ──────────────────────────────────────────
    relevant = retrieve_chunks(question, top_n=top_n)
    context = '\n\n'.join([r['chunk'] for r in relevant])

    print('\nRELEVANT CHUNKS RETRIEVED:')
    for i, r in enumerate(relevant):
        print(f'  [{i+1}] score={r["score"]} | {r["chunk"][:80]}...')

    # ── A: Augment (build the prompt) ────────────────────────
    prompt = f"""You are an academic assistant. Answer the question using ONLY the context below.
If the answer is not in the context, say: This information is not in the document.

CONTEXT FROM DOCUMENT:
{context}

QUESTION: {question}

ANSWER:"""

    # ── G: Generate (call the Gemini API, new SDK) ───────────
    response = client.models.generate_content(
        model=MODEL,
        contents=prompt,
        config=types.GenerateContentConfig(
            max_output_tokens=300,
            temperature=0.3,     # Low = focused, factual answers
            # gemini-2.5-flash "thinks" before answering, and thinking tokens count
            # against max_output_tokens, so answers can be cut off mid-sentence.
            # A budget of 0 turns thinking off (supported on 2.5 Flash, not on 2.5 Pro).
            thinking_config=types.ThinkingConfig(thinking_budget=0)
        )
    )

    print('\nGEMINI GENERATED ANSWER:')
    print('-' * 40)
    print(response.text)
    time.sleep(5)   # Pause between calls to stay under the free-tier rate limit
    print()

print('✓ RAG pipeline ready!')

✓ RAG pipeline ready!
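`ask()` waits a fixed 5 seconds after every call. An alternative is to retry only when a call fails, waiting longer each time (exponential backoff). A minimal sketch with a hypothetical `call_with_backoff` helper; with the real SDK you would probably narrow `retry_on` to the SDK's error class (`google-genai` exposes `google.genai.errors.APIError`) instead of every `Exception`.

```python
import time

def call_with_backoff(fn, max_attempts=4, base_delay=1.0, retry_on=(Exception,), sleep=time.sleep):
    """Hypothetical helper: call fn(); on a listed error wait base_delay * 2**attempt and retry."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)

# Demo with a stand-in function that fails twice, then succeeds
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("simulated rate-limit error")
    return "ok"

delays = []  # record the waits instead of actually sleeping
result = call_with_backoff(flaky, sleep=delays.append)
print(result)  # ok
print(delays)  # [1.0, 2.0]
```

Inside `ask()` the generate step would then become `response = call_with_backoff(lambda: client.models.generate_content(model=MODEL, contents=prompt, config=config))`, where `config` is the same `types.GenerateContentConfig(...)` object the notebook already builds.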


## Step 8: Ask Questions! 🚀

In [None]:
ask('What is a Large Language Model?')

QUESTION: What is a Large Language Model?

RELEVANT CHUNKS RETRIEVED:
  [1] score=0.412 | Large Language Models (LLMs) are AI models trained on massive amounts of text da...
  [2] score=0.0992 | Training: LLMs are trained using unsupervised pre-training on massive text datas...
  [3] score=0.0458 | LLM Comparison: GPT-4 by OpenAI is known for strong reasoning and code generatio...

GEMINI GENERATED ANSWER:
----------------------------------------
Large Language Models (LLMs) are AI models trained on massive amounts of text data.



In [None]:
ask('What are the advantages and disadvantages of LLMs?')

QUESTION: What are the advantages and disadvantages of LLMs?

RELEVANT CHUNKS RETRIEVED:
  [1] score=0.1897 | Disadvantages: LLMs can hallucinate, generating plausible but false information....
  [2] score=0.175 | Advantages: LLMs can understand and generate text in many languages, perform zer...
  [3] score=0.026 | Large Language Models (LLMs) are AI models trained on massive amounts of text da...

GEMINI GENERATED ANSWER:
----------------------------------------
Advantages of LLMs include their ability to understand and generate text in many languages, perform zero-shot learning, follow complex instructions, write code, summarize documents, and hold multi-turn conversations. They are also highly versatile across many industries.

Disadvantages of LLMs include their tendency to hallucinate (generating plausible but false information), having a knowledge cutoff date, being expensive to train and run, reflecting biases in training data, and struggling with precise math.



In [None]:
ask('How does RAG work and why is it better than fine-tuning?')

QUESTION: How does RAG work and why is it better than fine-tuning?

RELEVANT CHUNKS RETRIEVED:
  [1] score=0.4682 | Fine-tuning: After pre-training, LLMs can be fine-tuned on specific tasks or dat...
  [2] score=0.2893 | RAG (Retrieval-Augmented Generation): RAG combines information retrieval with te...
  [3] score=0.0 | Text Generation: LLMs generate text token by token. A token is roughly 4 charact...

GEMINI GENERATED ANSWER:
----------------------------------------
RAG combines information retrieval with text generation. It fetches



In [None]:
ask('Compare GPT-4, Claude and Gemini')

QUESTION: Compare GPT-4, Claude and Gemini

RELEVANT CHUNKS RETRIEVED:
  [1] score=0.3126 | Large Language Models (LLMs) are AI models trained on massive amounts of text da...
  [2] score=0.2474 | LLM Comparison: GPT-4 by OpenAI is known for strong reasoning and code generatio...
  [3] score=0.0 | Text Generation: LLMs generate text token by token. A token is roughly 4 charact...

GEMINI GENERATED ANSWER:
----------------------------------------
GPT-4 by OpenAI is known for strong reasoning and code generation. Claude by Anthropic focuses on safety and long context windows. Gemini by Google is multimodal and integrates with Google services.



## Step 9: Ask Your Own Question 💬

In [None]:
# Change the question and run!
ask('How does text generation work in LLMs?')

QUESTION: How does text generation work in LLMs?

RELEVANT CHUNKS RETRIEVED:
  [1] score=0.2567 | RAG (Retrieval-Augmented Generation): RAG combines information retrieval with te...
  [2] score=0.1795 | Text Generation: LLMs generate text token by token. A token is roughly 4 charact...
  [3] score=0.1547 | Large Language Models (LLMs) are AI models trained on massive amounts of text da...

GEMINI GENERATED ANSWER:
----------------------------------------
LLMs generate text token by token. A token is roughly 4 characters. The model predicts the probability of the next token given all previous tokens. Temperature controls randomness; low temperature gives predictable output, while high temperature gives more creative and varied output.



## Step 10: Use Your Own Document 📁

In [None]:
# Option A: Load from a .txt file
# with open('my_document.txt', 'r', encoding='utf-8') as f:
#     DOCUMENT = f.read()

# Option B: Paste your own text.
# split_into_chunks() splits on blank lines, so separate sections with an empty line;
# text without blank lines becomes a single chunk and retrieval has nothing to choose from.
DOCUMENT = """
A non-linear data structure is a data structure in which data elements are not arranged sequentially or linearly. Instead, elements can be connected to multiple other elements, forming complex relationships, hierarchies, or networks.

Key Characteristics
Non-sequential arrangement: Elements are arranged in random order and not in a straight line.
Multiple connections: Each element (node) can be linked to more than one other element, allowing for multiple paths between nodes.
Multi-level storage: Data elements are present at multiple levels, typically in a hierarchical manner.
Complex traversal: Traversing all elements often requires specialized algorithms like Depth First Search (DFS) or Breadth First Search (BFS), and cannot be done in a single linear run.
Efficient memory usage: Non-linear structures can utilize memory more efficiently by dynamically allocating space based on the data's structure, reducing memory wastage seen in some fixed-size linear structures like arrays.

Common Examples and Applications
The main types of non-linear data structures are trees and graphs.

Trees
Description: A hierarchical structure with a single root node at the top, and subsequent nodes organized in a parent-child relationship.
Examples: Binary trees, Binary Search Trees (BST), AVL trees, B-trees, and heaps.
Applications:
Organizing file systems
Organizational charts
Indexing in databases
Syntax trees in compilers
Source: GeeksforGeeks, Naukri Code 360, CMU School of Computer Science

Graphs
Description: A collection of vertices (nodes) connected by edges, used to model relationships between entities.
Examples: Directed graphs, undirected graphs, weighted graphs.
Applications:
Modeling social networks
Transportation and road networks
Mapping web pages and links (World Wide Web)
Artificial Intelligence and image processing
"""

# Rebuild the index with the new document
chunks = split_into_chunks(DOCUMENT)
chunk_vectors = vectorizer.fit_transform(chunks)
print(f'✓ New document loaded: {len(chunks)} chunks ready')

✓ New document loaded: 5 chunks ready
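Step 10 rebuilds `chunks` and `chunk_vectors` by hand, and forgetting either line leaves them out of sync. A minimal sketch that keeps them together: `build_index` is a hypothetical helper that reads a UTF-8 text file (as in Option A), splits it on blank lines the same way `split_into_chunks` does, and returns a freshly fitted vectorizer and matrix.

```python
import tempfile
from pathlib import Path
from sklearn.feature_extraction.text import TfidfVectorizer

def build_index(path):
    """Hypothetical helper: read a text file, split on blank lines, fit a fresh TF-IDF index."""
    text = Path(path).read_text(encoding='utf-8')
    new_chunks = [p.strip() for p in text.strip().split('\n\n') if p.strip()]
    if not new_chunks:
        raise ValueError(f"{path} contains no text")
    new_vectorizer = TfidfVectorizer(stop_words='english')
    new_vectors = new_vectorizer.fit_transform(new_chunks)
    return new_chunks, new_vectorizer, new_vectors

# Demo with a throwaway file standing in for my_document.txt
with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / 'notes.txt'
    sample_path.write_text("Trees are hierarchical.\n\nGraphs model networks.\n", encoding='utf-8')
    demo_chunks, demo_vectorizer, demo_vectors = build_index(sample_path)

print(len(demo_chunks), demo_vectors.shape[0])  # 2 2
```

In the notebook this would be `chunks, vectorizer, chunk_vectors = build_index('my_document.txt')`; `retrieve_chunks()` and `ask()` then use the new document because they read those global names.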


## Summary

| Task | Concept | Demonstrated Here |
|------|---------|-------------------|
| Task 1 | Text Generation | Gemini generates answers token by token |
| Task 2 | Generative Model Working | `client.models.generate_content()` = text generation API |
| Task 3 | Large Language Models | Used Gemini 2.5 Flash (free-tier LLM) |
| Task 4 | Architecture & Fine-tuning | RAG as alternative; `temperature` controls output |
| Task 5 | LLM Comparison | Change `MODEL`, e.g. `gemini-2.5-flash` vs `gemini-2.5-flash-lite`, and compare the answers |

---

### RAG vs Fine-Tuning

| | Fine-Tuning | RAG |
|---|---|---|
| Cost | Expensive ($$$) | Low: no training, and free here on the Gemini free tier ✅ |
| Time | Hours of GPU training | Seconds to build the index |
| Update knowledge | Retrain the model | Just change the text |

---
### Key Formula
```
RAG = Retrieve (TF-IDF) + Augment (prompt) + Generate (Gemini LLM)
```