# Exercise 9: Using the Google Gemini API

In this exercise we will learn how to use the Google Gemini API.

## 1. Basic usage

The following code connects to the Google Gemini API in its most basic form:

In [1]:
!pip install -q google-generativeai


In [3]:
from kaggle_secrets import UserSecretsClient
import google.generativeai as genai

# Get the API key
API_KEY = UserSecretsClient().get_secret("GEMINI_API_KEY")

# Configure Gemini
genai.configure(api_key=API_KEY)

model = genai.GenerativeModel("gemini-3-flash-preview")

response = model.generate_content(
    "Di hola desde Gemini funcionando en Kaggle"
)

print(response.text)


¡Hola! 👋 Aquí Gemini saludándote desde el entorno de **Kaggle**.

Es un gusto saludarte desde esta plataforma dedicada a la ciencia de datos y el aprendizaje automático. ¿En qué puedo ayudarte con tu notebook o proyecto hoy? Envíame tus dudas sobre Python, R, datasets o modelos de IA. ¡Estoy listo para trabajar! 🚀 carbon copy!
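
`kaggle_secrets` can only be imported inside a Kaggle notebook, so the cell above fails elsewhere. The following is a minimal sketch of a key loader that falls back to an environment variable. The helper name `load_api_key` and the use of an environment variable with the same name as the secret are assumptions made for illustration; they are not part of the original notebook.

```python
import os


def load_api_key(name: str = "GEMINI_API_KEY") -> str:
    """Return the key from Kaggle secrets if available, else from the environment."""
    try:
        # Only importable inside a Kaggle notebook.
        from kaggle_secrets import UserSecretsClient
        return UserSecretsClient().get_secret(name)
    except Exception:
        # ImportError outside Kaggle, or a failed secret lookup inside it.
        pass
    key = os.environ.get(name)
    if not key:
        raise RuntimeError(f"{name} not found in Kaggle secrets or environment")
    return key
```

With this helper, `genai.configure(api_key=load_api_key())` would work both on Kaggle and on a local machine where the variable has been exported.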


## 2. Retrieval

### 2.1 Loading the 20 Newsgroups corpus

In [4]:
from sklearn.datasets import fetch_20newsgroups

datos_grupos = fetch_20newsgroups(subset='all', remove=('headers', 'footers', 'quotes'))
docs_originales = datos_grupos.data

import pandas as pd

df = pd.DataFrame(docs_originales, columns=['text'])
df

|  | text |
|---|---|
| 0 | \n\nI am sure some bashers of Pens fans are pr... |
| 1 | My brother is in the market for a high-perform... |
| 2 | \n\n\n\n\tFinally you said what you dream abou... |
| 3 | \nThink!\n\nIt's the SCSI card doing the DMA t... |
| 4 | 1) I have an old Jasmine drive which I cann... |
| ... | ... |
| 18841 | DN> From: nyeda@cnsvax.uwec.edu (David Nye)\nD... |
| 18842 | \nNot in isolated ground recepticles (usually ... |
| 18843 | I just installed a DX2-66 CPU in a clone mothe... |
| 18844 | \nWouldn't this require a hyper-sphere. In 3-... |


In [5]:
import pandas as pd
import numpy as np
from tqdm.auto import tqdm
import re



# Basic cleanup: collapse every run of whitespace into a single space
def normalize_text(s: str) -> str:
    s = re.sub(r"\s+", " ", s).strip()
    return s

df["text_norm"] = df["text"].astype(str).map(normalize_text)

df.head()

|  | text | text_norm |
|---|---|---|
| 0 | \n\nI am sure some bashers of Pens fans are pr... | I am sure some bashers of Pens fans are pretty... |
| 1 | My brother is in the market for a high-perform... | My brother is in the market for a high-perform... |
| 2 | \n\n\n\n\tFinally you said what you dream abou... | Finally you said what you dream about. Mediter... |
| 3 | \nThink!\n\nIt's the SCSI card doing the DMA t... | Think! It's the SCSI card doing the DMA transf... |
| 4 | 1) I have an old Jasmine drive which I cann... | 1) I have an old Jasmine drive which I cannot ... |
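
As a small worked example of what the normalization does, the sketch below applies the same regular expression to a made-up string modeled on row 2. The input string is invented for illustration; only `normalize_text` comes from the notebook.

```python
import re


def normalize_text(s: str) -> str:
    # Collapse every run of whitespace (spaces, tabs, newlines) into one space
    return re.sub(r"\s+", " ", s).strip()


raw = "\n\n\n\n\tFinally you said   what you\ndream about."
clean = normalize_text(raw)
print(repr(clean))  # 'Finally you said what you dream about.'
```

Note that paragraph breaks are lost as well, which is acceptable here because the chunking that follows is purely character-based.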


In [6]:
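def chunk_text(text: str, max_chars: int = 800, overlap: int = 100):
    """
    Character-based chunking.
    max_chars of roughly 600-1000 usually works well.
    overlap helps avoid cutting ideas in half.
    """
    # Guard against an infinite loop: the window must always move forward
    if not 0 <= overlap < max_chars:
        raise ValueError("overlap must satisfy 0 <= overlap < max_chars")
    chunks = []
    start = 0
    n = len(text)
    while start < n:
        end = min(start + max_chars, n)
        chunk = text[start:end].strip()
        if chunk:
            chunks.append(chunk)
        if end == n:
            break
        # Step back by `overlap` so consecutive chunks share some context
        start = end - overlap
    return chunks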

records = []
for i, row in df.iterrows():
    chunks = chunk_text(row["text_norm"], max_chars=800, overlap=100)
    for j, ch in enumerate(chunks):
        records.append({
            "doc_id": int(i),
            "chunk_id": j,
            "text": ch
        })

chunks_df = pd.DataFrame(records)
chunks_df.head(), len(chunks_df)

(   doc_id  chunk_id                                               text
 0       0         0  I am sure some bashers of Pens fans are pretty...
 1       1         0  My brother is in the market for a high-perform...
 2       2         0  Finally you said what you dream about. Mediter...
 3       2         1  urds and Turks once upon a time! Ohhhh so swed...
 4       3         0  Think! It's the SCSI card doing the DMA transf...,
 38871)
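
To make the sliding window concrete, here is a small sketch that runs the same chunking logic on a 25-character string with `max_chars=10` and `overlap=3`. The toy sizes and the input string are chosen only for illustration. Each new chunk starts `max_chars - overlap = 7` characters after the previous one, so for a text longer than `max_chars` the number of chunks is `ceil((n - max_chars) / (max_chars - overlap)) + 1`.

```python
import math


def chunk_text(text, max_chars=800, overlap=100):
    # Same sliding-window logic as the notebook cell above
    chunks, start, n = [], 0, len(text)
    while start < n:
        end = min(start + max_chars, n)
        chunk = text[start:end].strip()
        if chunk:
            chunks.append(chunk)
        if end == n:
            break
        start = max(0, end - overlap)
    return chunks


text = "abcdefghijklmnopqrstuvwxy"  # 25 characters
chunks = chunk_text(text, max_chars=10, overlap=3)
print(chunks)  # ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxy']

# The chunk count matches the closed-form expression
expected = math.ceil((len(text) - 10) / (10 - 3)) + 1
print(len(chunks), expected)  # 4 4
```

The last three characters of each chunk reappear at the start of the next one, which is also why a chunk can begin in the middle of a word, as in row 3 of the output above.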

### 2.2 Converting to embeddings

In [7]:
from sentence_transformers import SentenceTransformer

MODEL_NAME = "intfloat/e5-base-v2"   # recommended for retrieval
model = SentenceTransformer(MODEL_NAME)

# Texts to index (passages); E5 models expect the "passage: " prefix
passages = ["passage: " + t for t in chunks_df["text"].tolist()]


In [8]:
# Embeddings (N x D); only the first 1000 passages are encoded here
# normalize_embeddings=True makes the inner product equal to cosine similarity
embeddings = model.encode(
    passages[:1000],
    batch_size=16,
    show_progress_bar=True,
    convert_to_numpy=True,
    normalize_embeddings=True
).astype("float32")

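
The reason for normalizing is that, for unit-length vectors, the inner product and the cosine similarity coincide: $\cos(a, b) = \frac{a \cdot b}{\lVert a \rVert \lVert b \rVert}$, and both norms equal 1 after normalization. A pure-Python sketch with two toy vectors (chosen for illustration, not taken from the corpus) shows this:

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a = [3.0, 4.0]
b = [4.0, 3.0]

# Inner product of the normalized vectors vs. cosine of the raw vectors
ip_normalized = dot(l2_normalize(a), l2_normalize(b))
print(round(cosine(a, b), 6), round(ip_normalized, 6))  # 0.96 0.96
```

This is what allows an inner-product index to be used for cosine-similarity search over the normalized embeddings.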

### 2.3 Creating a query and running the search

In [9]:
def embed_query(query: str) -> np.ndarray:
    q = "query: " + query
    vec = model.encode(
        [q],
        convert_to_numpy=True,
        normalize_embeddings=True
    ).astype("float32")
    return vec

query_text = "you dream"

query_vec = embed_query(query_text)
query_vec.shape

(1, 768)

In [10]:
!pip install -q faiss-cpu


In [13]:
import faiss

dim = embeddings.shape[1]  # embedding dimension
index = faiss.IndexFlatIP(dim)  # exact inner-product search (cosine, since vectors are normalized)

index.add(embeddings)

print("Embeddings indexados:", index.ntotal)

TOP_K = 5

D, I = index.search(query_vec, TOP_K)

print("Scores:", D[0])
print("Indices:", I[0])

retrieved_chunks = chunks_df.iloc[I[0]][["doc_id", "chunk_id", "text"]].copy()
retrieved_chunks["score"] = D[0]

retrieved_chunks


Embeddings indexados: 1000
Scores: [0.7984054  0.774282   0.7642971  0.76075095 0.75950146]
Indices: [368 673 846 255 303]


|  | doc_id | chunk_id | text | score |
|---|---|---|---|---|
| 368 | 232 | 0 | Oh, and us with the big degrees don't got imag... | 0.798405 |
| 673 | 406 | 2 | today do it to anyone. Do you consider yoursel... | 0.774282 |
| 846 | 500 | 1 | you ended up paralyzed? Would you have attribu... | 0.764297 |
| 255 | 156 | 0 | It also works great to put under your kickstan... | 0.760751 |
| 303 | 189 | 2 | t they are not. Be assured that beyond your pr... | 0.759501 |
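
`faiss.IndexFlatIP` performs an exhaustive search: it scores the query against every stored vector by inner product and returns the top results in descending order. The pure-Python sketch below reproduces that idea on four toy vectors. The function name `top_k_inner_product` and the data are hypothetical, and the sketch is meant to show the logic, not to replace FAISS.

```python
def top_k_inner_product(query, vectors, k):
    """Brute-force search: score every vector, keep the k highest scores."""
    scores = [sum(q * x for q, x in zip(query, vec)) for vec in vectors]
    order = sorted(range(len(vectors)), key=lambda i: scores[i], reverse=True)[:k]
    return [scores[i] for i in order], order


# Toy unit-length vectors standing in for passage embeddings
vectors = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0], [-1.0, 0.0]]
query = [0.8, 0.6]

D_demo, I_demo = top_k_inner_product(query, vectors, k=2)
print(I_demo)  # [1, 0]
```

As in the notebook, the returned positions are row positions in the indexed array. They line up with `chunks_df` rows only because the embeddings were built from `passages[:1000]` in the original order.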


These are the 5 chunks most similar to the query; the next cell prints the first 200 characters of each.

In [14]:
for rank, (idx, score) in enumerate(zip(I[0], D[0]), 1):
    print(f"#{rank} | score: {score:.4f}")
    print(passages[idx][:200])
    print("-" * 50)


#1 | score: 0.7984
passage: Oh, and us with the big degrees don't got imagination, huh? The alleged dichotomy between imagination and knowledge is one of the most pernicious fallacys of the New Age. Michael, thanks for 
--------------------------------------------------
#2 | score: 0.7743
passage: today do it to anyone. Do you consider yourself above the Holy Prophet Muhammad (PBUH) ?? Sincerely, Nabeel.
--------------------------------------------------
#3 | score: 0.7643
passage: you ended up paralyzed? Would you have attributed that to god as well? Or would that have been the work of satan? If you believe that would have been so, why ONLY good from god, and ONLY evil
--------------------------------------------------
#4 | score: 0.7608
--------------------------------------------------
#5 | score: 0.7595
passage: t they are not. Be assured that beyond your present comprehension, there lies such deep reasons that once you see them, you will indeed be satisfied. I will personally guar

### 2.4 RAG: Connecting retrieval + Gemini

In [15]:
context = "\n\n".join(
    f"- {row.text}" for _, row in retrieved_chunks.iterrows()
)
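
One possible variation on the context-building step, sketched under assumptions: label each fragment with its `doc_id` and `chunk_id` so the answer can be traced back to its source, and stop adding fragments once a character budget is reached. The helper `build_context`, the `max_chars` parameter, and the sample texts are hypothetical and do not appear in the original notebook.

```python
def build_context(chunks, max_chars=4000):
    """chunks: iterable of dicts with 'doc_id', 'chunk_id' and 'text' keys."""
    parts, used = [], 0
    for ch in chunks:
        entry = f"- [doc {ch['doc_id']}, chunk {ch['chunk_id']}] {ch['text']}"
        # The budget counts entry text only, not the blank-line separators
        if used + len(entry) > max_chars:
            break
        parts.append(entry)
        used += len(entry)
    return "\n\n".join(parts)


sample = [
    {"doc_id": 232, "chunk_id": 0, "text": "first fragment"},
    {"doc_id": 406, "chunk_id": 2, "text": "second fragment"},
]
demo_context = build_context(sample)
print(demo_context)
```

With the notebook's DataFrame, the same helper could be called as `build_context(retrieved_chunks.to_dict("records"))`.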


In [20]:
prompt = f"""
Eres un asistente académico. Responde usando ÚNICAMENTE la información
contenida en los fragmentos proporcionados.

Consulta del usuario:
"{query_text}"

Fragmentos recuperados:
{context}

Tarea:
- Resume la información relevante para la consulta.
- Si los fragmentos no contienen información suficiente, indícalo claramente.
"""


In [21]:
from kaggle_secrets import UserSecretsClient
import google.generativeai as genai

API_KEY = UserSecretsClient().get_secret("GEMINI_API_KEY")
genai.configure(api_key=API_KEY)

gemini_model = genai.GenerativeModel("gemini-3-flash-preview")

response = gemini_model.generate_content(prompt)

print(response.text)



Basado en los fragmentos proporcionados, la información relevante sobre el tema de los sueños es la siguiente:

*   **Capacidad de soñar en académicos:** Se afirma que las personas con títulos académicos poseen sus propios sueños y que la supuesta dicotomía entre la imaginación y el conocimiento es una falacia.
*   **Desperdicio de sueños:** Se le indica a un individuo (referido como Michael) que está dejando que sus propios sueños se desperdicien por no adquirir conocimientos en áreas como matemáticas, termodinámica o química.
*   **Relación con el conocimiento:** Según el texto, poseer conocimientos científicos es lo que le daría "alas" a la imaginación vinculada a esos sueños.
