# Exercise 9: Using the Google Gemini API

In this exercise we will learn how to use the Google Gemini API.

## 1. Basic usage

The following code connects to the Google Gemini API and sends a single prompt.

In [2]:
from kaggle_secrets import UserSecretsClient
user_secrets = UserSecretsClient()
secret_value_0 = user_secrets.get_secret("Gemini_API_KEY")


In [3]:
from google import genai

# 1. Configure the client with the API key loaded above
client = genai.Client(api_key=secret_value_0)

# 2. Send the query to the model
# This notebook uses 'gemini-3-flash-preview'
response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents="¿Cuál es la capital de Ecuador?"
)

# 3. Print the response
print(response.text)



La capital de Ecuador es **Quito**.
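
The secret lookup at the start of this section relies on `kaggle_secrets`, which is only importable inside a Kaggle notebook. As a rough sketch, the helper below tries the Kaggle secret first and falls back to an environment variable. The function name `load_gemini_api_key` and the variable name `GEMINI_API_KEY` are assumptions made for illustration; they are not part of the exercise.

```python
import os


def load_gemini_api_key(secret_name: str = "Gemini_API_KEY",
                        env_var: str = "GEMINI_API_KEY") -> str:
    """Return the API key from Kaggle secrets, else from an environment variable."""
    try:
        # Only importable inside a Kaggle notebook.
        from kaggle_secrets import UserSecretsClient
        return UserSecretsClient().get_secret(secret_name)
    except Exception:
        # Hypothetical fallback: `env_var` is an assumed name, not from the exercise.
        key = os.environ.get(env_var)
        if not key:
            raise RuntimeError(
                f"No API key found in Kaggle secrets or in ${env_var}"
            )
        return key


# Usage (inside or outside Kaggle):
#   client = genai.Client(api_key=load_gemini_api_key())
```

Keeping the key out of the notebook source, whichever store it comes from, avoids leaking it when the notebook is shared.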


## 2. Retrieval

### 2.1 Loading the 20 Newsgroups corpus

In [4]:
from sklearn.datasets import fetch_20newsgroups

newsgroups = fetch_20newsgroups(subset='all', remove=('headers', 'footers', 'quotes'))
newsgroupsdocs = newsgroups.data

In [5]:
import pandas as pd

df = pd.DataFrame(newsgroupsdocs, columns=['text'])
df

Unnamed: 0,text
0,\n\nI am sure some bashers of Pens fans are pr...
1,My brother is in the market for a high-perform...
2,\n\n\n\n\tFinally you said what you dream abou...
3,\nThink!\n\nIt's the SCSI card doing the DMA t...
4,1) I have an old Jasmine drive which I cann...
...,...
18841,DN> From: nyeda@cnsvax.uwec.edu (David Nye)\nD...
18842,\nNot in isolated ground recepticles (usually ...
18843,I just installed a DX2-66 CPU in a clone mothe...
18844,\nWouldn't this require a hyper-sphere. In 3-...


### 2.2 Converting the text to embeddings

In [6]:
import pandas as pd
import numpy as np
import re
from tqdm.auto import tqdm

# Drop rows without text
df = df.dropna(subset=["text"]).reset_index(drop=True)

def normalize_text(s: str) -> str:
    s = str(s)
    s = re.sub(r"\s+", " ", s).strip()
    return s

# Apply the normalization (collapse runs of whitespace into single spaces)
df["text_norm"] = df["text"].map(normalize_text)

In [7]:
def chunk_text(text: str, max_chars: int = 800, overlap: int = 100):
    chunks = []
    start = 0
    n = len(text)
    while start < n:
        end = min(start + max_chars, n)
        chunk = text[start:end].strip()
        if len(chunk) > 0:
            chunks.append(chunk)
        if end == n:
            break
        start = max(0, end - overlap)
    return chunks

# Build the new table of chunks, one row per (document, chunk) pair
records = []
for i, row in tqdm(df.iterrows(), total=len(df), desc="Fragmentando textos"):
    chunks = chunk_text(row["text_norm"], max_chars=800, overlap=100)
    for j, ch in enumerate(chunks):
        records.append({
            "doc_id": i,
            "chunk_id": j,
            "text": ch
        })

chunks_df = pd.DataFrame(records)
print(f"Total de chunks generados: {len(chunks_df)}")

Total de chunks generados: 38871
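
To make the sliding window concrete, the sketch below reruns the same `chunk_text` logic on a synthetic 2000-character string (an assumption chosen so the arithmetic is easy to follow) and checks that consecutive chunks share exactly `overlap` characters.

```python
def chunk_text(text: str, max_chars: int = 800, overlap: int = 100):
    """Same sliding-window chunker as in the cell above."""
    chunks = []
    start = 0
    n = len(text)
    while start < n:
        end = min(start + max_chars, n)
        chunk = text[start:end].strip()
        if len(chunk) > 0:
            chunks.append(chunk)
        if end == n:
            break
        # Step back by `overlap` so neighbouring chunks share context.
        start = max(0, end - overlap)
    return chunks


# Synthetic text without whitespace, so strip() does not change chunk lengths.
text = "".join(chr(ord("a") + i % 26) for i in range(2000))
chunks = chunk_text(text, max_chars=800, overlap=100)

print([len(c) for c in chunks])            # [800, 800, 600]
print(chunks[0][-100:] == chunks[1][:100])  # True
```

The windows start at offsets 0, 700 and 1400, so each chunk repeats the last 100 characters of the previous one. Note that the loop only advances when `overlap` is smaller than `max_chars`; with `overlap >= max_chars` the start offset never moves forward.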


In [8]:
from sentence_transformers import SentenceTransformer

# Load the model (Kaggle downloads it automatically on first use)
MODEL_NAME = "intfloat/e5-base-v2"
model = SentenceTransformer(MODEL_NAME)

# E5 models expect indexed texts to carry the "passage: " prefix
passages = ["passage: " + t for t in chunks_df["text"].tolist()]

# Generate the embeddings
# With a GPU enabled this step runs much faster
embeddings = model.encode(
    passages,
    batch_size=64,
    show_progress_bar=True,
    convert_to_numpy=True,
    normalize_embeddings=True
).astype("float32")

print("Matriz de embeddings creada con éxito:", embeddings.shape)

Matriz de embeddings creada con éxito: (38871, 768)


### 2.3 Building a query and running the search

In [10]:
def embed_query(query: str) -> np.ndarray:
    # E5 expects the "query: " prefix on search queries
    q = "query: " + query
    vec = model.encode(
        [q],
        convert_to_numpy=True,
        normalize_embeddings=True
    ).astype("float32")
    return vec

# Example usage:
pregunta_vector = embed_query("Opiniones sobre la regularizacino de armas")


In [11]:
!pip -q install faiss-cpu


In [12]:
# Baseline FAISS code: exact (brute-force) L2 search
import faiss
import numpy as np

# Assumes `embeddings` is an N x D float32 array
index = faiss.IndexFlatL2(embeddings.shape[1])
index.add(embeddings)

D, I = index.search(pregunta_vector, k=10)

In [13]:
print(I[0][0])

28857
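
The index above uses `faiss.IndexFlatL2`, while the embeddings were produced with `normalize_embeddings=True`. For unit-length vectors the squared L2 distance equals `2 - 2 * cosine`, so ranking by smallest L2 distance gives the same order as ranking by largest cosine similarity. The NumPy sketch below illustrates this on random toy vectors (the sizes and the seed are arbitrary assumptions); it does not use FAISS or the real embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "document" vectors and one query vector, all scaled to unit length.
docs = rng.normal(size=(50, 8))
docs /= np.linalg.norm(docs, axis=1, keepdims=True)
query = rng.normal(size=8)
query /= np.linalg.norm(query)

# Cosine similarity is a plain dot product for unit vectors.
cos = docs @ query
# Squared Euclidean distance between the query and every document.
sq_l2 = ((docs - query) ** 2).sum(axis=1)

# Identity for unit vectors: ||d - q||^2 = 2 - 2 * (d . q)
identity_holds = np.allclose(sq_l2, 2 - 2 * cos)

# Top-10 by smallest distance vs. top-10 by largest cosine similarity.
top_by_l2 = np.argsort(sq_l2)[:10]
top_by_cos = np.argsort(-cos)[:10]
same_ranking = np.array_equal(top_by_l2, top_by_cos)

print(identity_holds, same_ranking)
```

If similarity scores are preferred over distances, `faiss.IndexFlatIP` (inner product) is an alternative for normalized vectors; the notebook itself keeps `IndexFlatL2`.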


In [23]:
passages[I[0][0]]
# Next: fetch the top retrieved passages
# and send the LLM a prompt that shows the query together with them

'passage: striction, or a nuke and draw a 5 mile restriction? To me they al suffer from the fundamental flaw that they restrict based upon the instrument rather than placing the responsibility for usage squarely upon the shoulders of the user. Perhaps Sen. Metzenbaum declaring the Barrett Light Fifty an assault rifle has made this more apparent to me, since the Barrett has only range and acurracy going for it. I disagree, on the grounds that a house can be rebuilt much more easily than my family once I have died. I assume that word would get to the citizens that such an attack was planned. If this is not the case, the tactical and strategic implications change quite a bit. Personally, my home is worth, say, twenty Martians intent on taking over the world. My family? All of them. The balancing act he'

In [24]:
top_5_indices = I[0][:5]

contextos_recuperados = chunks_df.iloc[top_5_indices]['text'].tolist()

print(contextos_recuperados)

['striction, or a nuke and draw a 5 mile restriction? To me they al suffer from the fundamental flaw that they restrict based upon the instrument rather than placing the responsibility for usage squarely upon the shoulders of the user. Perhaps Sen. Metzenbaum declaring the Barrett Light Fifty an assault rifle has made this more apparent to me, since the Barrett has only range and acurracy going for it. I disagree, on the grounds that a house can be rebuilt much more easily than my family once I have died. I assume that word would get to the citizens that such an attack was planned. If this is not the case, the tactical and strategic implications change quite a bit. Personally, my home is worth, say, twenty Martians intent on taking over the world. My family? All of them. The balancing act he', "just exaggeration for effect following one or more incidents of someone firing a handful of shots from something that may or may not be an Uzi, semi- or full-auto? Until the root conditions that
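
Printing the raw list makes it hard to see which document each chunk came from. As a possible alternative, the sketch below builds a small results table that keeps `doc_id`, `chunk_id` and the distance next to each text. The helper name `results_table` and the toy `chunks_df`, `D` and `I` values are assumptions that stand in for the real objects from the cells above.

```python
import numpy as np
import pandas as pd

# Toy stand-ins with the same shapes as the notebook's objects.
chunks_df = pd.DataFrame({
    "doc_id": [0, 0, 1, 2, 2],
    "chunk_id": [0, 1, 0, 0, 1],
    "text": ["alpha", "beta", "gamma", "delta", "epsilon"],
})
D = np.array([[0.10, 0.25, 0.40]], dtype="float32")  # distances, ascending
I = np.array([[3, 0, 4]])                            # row positions in chunks_df


def results_table(D, I, chunks_df, top_k=5):
    """Join search output (distances, positions) with the chunk metadata."""
    idx = I[0][:top_k]
    out = chunks_df.iloc[idx][["doc_id", "chunk_id", "text"]].copy()
    out.insert(0, "rank", np.arange(1, len(idx) + 1))
    out["distance"] = D[0][:top_k]
    # Keep the original row position so it can be traced back to the index.
    return out.reset_index().rename(columns={"index": "chunk_row"})


table = results_table(D, I, chunks_df, top_k=5)
print(table)
```

With the real data, `results_table(D, I, chunks_df)` would list the five nearest chunks; several of them may share a `doc_id` because neighbouring chunks overlap.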

In [27]:
contexto_fusionado = "\n".join(contextos_recuperados)
query_usuario = "Opiniones sobre la regularizacino de armas"
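
Joining the passages with a bare newline leaves the model no way to tell where one passage ends and the next begins. One option, sketched below, is to number each passage and separate the question from the context. The function `build_rag_prompt` and its wording are illustrative assumptions, not the prompt used in this exercise.

```python
def build_rag_prompt(query: str, passages: list[str],
                     max_chars_per_passage: int = 800) -> str:
    """Format a question and numbered passages into one prompt string."""
    numbered = "\n".join(
        f"[{i}] {p[:max_chars_per_passage]}"
        for i, p in enumerate(passages, start=1)
    )
    return (
        "Answer using only the numbered passages below.\n\n"
        f"Question: {query}\n\n"
        f"Passages:\n{numbered}\n\n"
        "Summarize the main points and cite passages by number."
    )


# Toy passages standing in for the retrieved chunks.
example_prompt = build_rag_prompt(
    "Opinions on gun regulation",
    ["first retrieved chunk", "second retrieved chunk"],
)
print(example_prompt)
```

The resulting string can be passed as `contents` to `client.models.generate_content` in the same way as the prompt in the next cell.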

### 2.4 Final phase: using the API

In [30]:
prompt_pizarra = f""" La query fue: {query_usuario} y los resultados son: {contexto_fusionado} Dame un resumen."""

response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents=prompt_pizarra
)


print(response.text)



Los resultados de la búsqueda presentan una serie de argumentos y debates, mayoritariamente con una **postura favorable al derecho a poseer armas** y crítica hacia las regulaciones estrictas. Aquí tienes un resumen de los puntos principales:

**1. Responsabilidad individual vs. Restricción del objeto:**
Varios usuarios argumentan que las leyes actuales fallan al centrarse en restringir el "instrumento" (el arma) en lugar de poner la responsabilidad en el "usuario". Se critica que los políticos a menudo no entienden técnicamente las armas que intentan prohibir (citando el ejemplo del fusil Barrett .50).

**2. La autodefensa como necesidad básica:**
Se sostiene que mientras existan condiciones de criminalidad o posibles abusos gubernamentales, no hay razón para desarmar a la población. Se utiliza el argumento de que es preferible proteger a la familia que a la propiedad, y que el desarme deja a los ciudadanos vulnerables (citando el caso de los no-serbios que no pudieron defenderse).

**