## Install the SDK

The Python SDK for the Gemini API is contained in the [`google-generativeai`](https://pypi.org/project/google-generativeai/) package. Install the dependency using pip:

In [None]:
!pip install -q -U google-generativeai

## Set up your API key

To use the Gemini API, you'll need an API key. If you don't already have one, create a key in Google AI Studio.

<a class="button" href="https://aistudio.google.com/app/apikey" target="_blank" rel="noopener noreferrer">Get an API key</a>

In Colab, add the key to the secrets manager under the "🔑" in the left panel. Give it the name `GOOGLE_API_KEY`. Then pass the key to the SDK:

In [None]:
# Import the Python SDK
import google.generativeai as genai
# Used to securely store your API key
from google.colab import userdata

GOOGLE_API_KEY=userdata.get('GOOGLE_API_KEY')
genai.configure(api_key=GOOGLE_API_KEY)
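
Outside Colab, `google.colab.userdata` is not importable, so the cell above fails at the import. The sketch below shows one possible fallback: try the Colab secrets manager first, then an environment variable of the same name. The helper name `load_api_key` is hypothetical and not part of the SDK.

```python
import os


def load_api_key(name="GOOGLE_API_KEY"):
    """Return the key from Colab secrets if available, else from the environment."""
    try:
        # This module is only importable inside Google Colab.
        from google.colab import userdata
    except ImportError:
        userdata = None

    if userdata is not None:
        try:
            key = userdata.get(name)
            if key:
                return key
        except Exception:
            # The secret is missing or notebook access is denied; fall through.
            pass

    key = os.environ.get(name)
    if not key:
        raise RuntimeError(f"{name} is not set in Colab secrets or the environment")
    return key
```

The returned value can then be passed to `genai.configure(api_key=...)` exactly as in the cell above.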

## Initialize the Generative Model

Before you can make any API calls, you need to initialize the Generative Model.

In [None]:
model = genai.GenerativeModel('gemini-2.0-flash')

In [None]:
!pip install -q faiss-cpu

In [None]:
from huggingface_hub import login

# Log in to your Hugging Face account
login(token="key")  # Replace "key" with your Hugging Face access token

In [None]:
import faiss
import numpy as np
import torch
from transformers import AutoModel, AutoModelForCausalLM, AutoTokenizer

# 1. Load tokenizer and models
# The same Gemma checkpoint is used for both embedding and answer generation.
embedding_model_id = "google/gemma-2-9b-it"
llm_model_id = "google/gemma-2-9b-it"

tokenizer = AutoTokenizer.from_pretrained(embedding_model_id)

# Base model without a language-modeling head; its hidden states serve as embeddings.
embedding_model = AutoModel.from_pretrained(
    embedding_model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

# Causal language model used to generate the answer.
qa_model = AutoModelForCausalLM.from_pretrained(
    llm_model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)


# 2. Sample documents
documents = [
    "The Eiffel Tower is located in Paris and was built in 1889.",
    "The Great Wall of China is over 13,000 miles long.",
    "Python is a popular programming language for data science.",
    "The Moon is Earth's only natural satellite."
]

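# 3. Generate embeddings
def embed(texts):
    inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    # Move the inputs to the device that holds the model's input layer.
    inputs = inputs.to(embedding_model.device)
    with torch.no_grad():
        outputs = embedding_model(**inputs)
    # Mean-pool over non-padding tokens. In a causal model the first position
    # attends only to itself, so its hidden state ignores the rest of the text.
    hidden = outputs.last_hidden_state.float()
    mask = inputs["attention_mask"].unsqueeze(-1).to(device=hidden.device, dtype=hidden.dtype)
    embeddings = (hidden * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1)
    # FAISS expects float32 NumPy arrays on the CPU.
    return embeddings.cpu().numpy()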

doc_embeddings = embed(documents)

# 4. Create FAISS index
dimension = doc_embeddings.shape[1]
index = faiss.IndexFlatL2(dimension)
index.add(doc_embeddings)

faiss.write_index(index, "/content/drive/MyDrive/my_faiss_index.bin")
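
`faiss.IndexFlatL2` performs exact, brute-force search and reports squared L2 distances. As a rough illustration of what `index.search` computes, here is a NumPy-only sketch; the function name `l2_search` and the toy vectors are made up for this example and are not part of FAISS.

```python
import numpy as np


def l2_search(index_vectors, queries, k):
    """Exact k-nearest-neighbour search by squared L2 distance."""
    # Pairwise differences, shape (n_queries, n_vectors, dim).
    diffs = queries[:, None, :] - index_vectors[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=2)
    # Indices of the k smallest distances per query, nearest first.
    order = np.argsort(sq_dists, axis=1, kind="stable")[:, :k]
    return np.take_along_axis(sq_dists, order, axis=1), order


# Three toy 2-D vectors standing in for document embeddings.
vectors = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 3.0]], dtype=np.float32)
query = np.array([[0.9, 0.1]], dtype=np.float32)

distances, indices = l2_search(vectors, query, k=2)
print(indices)    # row per query, nearest vector first
print(distances)  # matching squared L2 distances
```

Like `index.search`, the sketch returns one row per query, which is why the retrieval code reads `indices[0]` for a single query.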


In [None]:
index = faiss.read_index("/content/drive/MyDrive/my_faiss_index.bin")


# 5. Retrieve relevant docs
def retrieve(query, k=1):
    query_embedding = embed([query])
    distances, indices = index.search(query_embedding, k)
    # FAISS pads the result with -1 when k exceeds the number of indexed vectors.
    return [documents[i] for i in indices[0] if i != -1]

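# 6. RAG: Retrieve, then generate
def answer_question(query, k=1):
    context_docs = retrieve(query, k)
    context = "\n".join(context_docs)
    prompt = f"Context: {context}\n\nQuestion: {query}\nAnswer:"

    # Move the inputs to the device that holds the model's input layer.
    inputs = tokenizer(prompt, return_tensors="pt").to(qa_model.device)
    with torch.no_grad():
        output = qa_model.generate(**inputs, max_new_tokens=100)

    # Decode only the newly generated tokens. Slicing the decoded string by
    # len(prompt) can misalign, since decoding may not reproduce the prompt exactly.
    new_tokens = output[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()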

# 🔍 Example
question = "Where is the Eiffel Tower located?"
response = answer_question(question, k=2)
print(f"Q: {question}\nA: {response}")


Q: Where is the Eiffel Tower located?
A: Paris
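
To inspect the retrieve-then-prompt flow without loading a 9B-parameter model, the sketch below swaps in a toy bag-of-words embedder. `toy_embed` and `build_prompt` are hypothetical names introduced only for this illustration; the notebook itself uses Gemma hidden states and a FAISS index.

```python
import re

import numpy as np

documents = [
    "The Eiffel Tower is located in Paris and was built in 1889.",
    "The Great Wall of China is over 13,000 miles long.",
    "Python is a popular programming language for data science.",
    "The Moon is Earth's only natural satellite.",
]


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


# Vocabulary built from the documents only; unseen query words are ignored.
vocab = {w: i for i, w in enumerate(sorted({w for d in documents for w in tokenize(d)}))}


def toy_embed(texts):
    """Hypothetical stand-in for embed(): unit-length bag-of-words vectors."""
    vectors = np.zeros((len(texts), len(vocab)), dtype=np.float32)
    for row, text in enumerate(texts):
        for word in tokenize(text):
            if word in vocab:
                vectors[row, vocab[word]] += 1.0
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / np.maximum(norms, 1e-12)


doc_vectors = toy_embed(documents)


def retrieve(query, k=1):
    # Squared L2 distance between the query and every document vector.
    sq_dists = np.sum((doc_vectors - toy_embed([query])) ** 2, axis=1)
    return [documents[i] for i in np.argsort(sq_dists)[:k]]


def build_prompt(query, k=1):
    # Same prompt layout as answer_question() above.
    context = "\n".join(retrieve(query, k))
    return f"Context: {context}\n\nQuestion: {query}\nAnswer:"


print(build_prompt("Where is the Eiffel Tower located?"))
```

Printing the prompt this way makes it easy to check which documents reach the model before any text is generated.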
