# Exercise 9: Using the Google Gemini API

In this exercise we will learn how to use the Google Gemini API.

## 1. Basic usage

The following code configures the Google Gemini API client, lists the available models, and sends a first prompt.

In [1]:
import google.generativeai as genai
from kaggle_secrets import UserSecretsClient

# Configuration: read the API key from Kaggle secrets
user_secrets = UserSecretsClient()
api_key = user_secrets.get_secret("GEMINI_API_KEY")
genai.configure(api_key=api_key)

# 1. List the models available to your account that support generateContent
print("Available models:")
for m in genai.list_models():
    if 'generateContent' in m.supported_generation_methods:
        print(f"- {m.name}")

Available models:
- models/gemini-2.5-flash
- models/gemini-2.5-pro
- models/gemini-2.0-flash-exp
- models/gemini-2.0-flash
- models/gemini-2.0-flash-001
- models/gemini-2.0-flash-exp-image-generation
- models/gemini-2.0-flash-lite-001
- models/gemini-2.0-flash-lite
- models/gemini-2.0-flash-lite-preview-02-05
- models/gemini-2.0-flash-lite-preview
- models/gemini-exp-1206
- models/gemini-2.5-flash-preview-tts
- models/gemini-2.5-pro-preview-tts
- models/gemma-3-1b-it
- models/gemma-3-4b-it
- models/gemma-3-12b-it
- models/gemma-3-27b-it
- models/gemma-3n-e4b-it
- models/gemma-3n-e2b-it
- models/gemini-flash-latest
- models/gemini-flash-lite-latest
- models/gemini-pro-latest
- models/gemini-2.5-flash-lite
- models/gemini-2.5-flash-image-preview
- models/gemini-2.5-flash-image
- models/gemini-2.5-flash-preview-09-2025
- models/gemini-2.5-flash-lite-preview-09-2025
- models/gemini-3-pro-preview
- models/gemini-3-flash-preview
- models/gemini-3-pro-image-preview
- models/nano-banana-pr

In [2]:
model = genai.GenerativeModel('gemini-2.5-flash-lite')

try:
    response = model.generate_content("Escribe un poema corto sobre Python.")
    print(response.text)
except Exception as e:
    print(f"Error: {e}")

Con código que fluye, claro y gentil,
Python danza, un lenguaje ágil y sutil.
De sintaxis sencilla, un placer al leer,
Ideas complejas, fácil de tejer.

De ciencia a web, su alcance es profundo,
Un amigo fiel, que ayuda en todo el mundo.


## 2. Retrieval

In [3]:
import pandas as pd
import numpy as np
from tqdm.auto import tqdm
import re

### 2.1 Loading the 20 Newsgroups corpus

In [4]:
from sklearn.datasets import fetch_20newsgroups

newsgroups = fetch_20newsgroups(subset='all', remove=('headers', 'footers', 'quotes'))
newsgroupsdocs = newsgroups.data

In [5]:
df = pd.DataFrame(newsgroupsdocs, columns=['text'])
df

| | text |
|---|---|
| 0 | \n\nI am sure some bashers of Pens fans are pr... |
| 1 | My brother is in the market for a high-perform... |
| 2 | \n\n\n\n\tFinally you said what you dream abou... |
| 3 | \nThink!\n\nIt's the SCSI card doing the DMA t... |
| 4 | 1) I have an old Jasmine drive which I cann... |
| ... | ... |
| 18841 | DN> From: nyeda@cnsvax.uwec.edu (David Nye)\nD... |
| 18842 | \nNot in isolated ground recepticles (usually ... |
| 18843 | I just installed a DX2-66 CPU in a clone mothe... |
| 18844 | \nWouldn't this require a hyper-sphere. In 3-... |


### 2.2 Converting to embeddings

In [6]:
# Basic cleanup: collapse every whitespace run into a single space
def normalize_text(s: str) -> str:
    s = re.sub(r"\s+", " ", s).strip()
    return s

df["text_norm"] = df["text"].astype(str).map(normalize_text)

df.head()

| | text | text_norm |
|---|---|---|
| 0 | \n\nI am sure some bashers of Pens fans are pr... | I am sure some bashers of Pens fans are pretty... |
| 1 | My brother is in the market for a high-perform... | My brother is in the market for a high-perform... |
| 2 | \n\n\n\n\tFinally you said what you dream abou... | Finally you said what you dream about. Mediter... |
| 3 | \nThink!\n\nIt's the SCSI card doing the DMA t... | Think! It's the SCSI card doing the DMA transf... |
| 4 | 1) I have an old Jasmine drive which I cann... | 1) I have an old Jasmine drive which I cannot ... |


In [7]:
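def chunk_text(text: str, max_chars: int = 800, overlap: int = 100):
    """
    Character-based chunking.
    max_chars ~ 600-1000 usually works well.
    overlap helps avoid cutting ideas in half.
    """
    # Without this guard the window would never advance (infinite loop)
    if overlap >= max_chars:
        raise ValueError("overlap must be smaller than max_chars")
    chunks = []
    start = 0
    n = len(text)
    while start < n:
        end = min(start + max_chars, n)
        chunk = text[start:end]
        chunk = chunk.strip()
        if len(chunk) > 0:
            chunks.append(chunk)
        if end == n:
            break
        # Step back by `overlap` characters so consecutive chunks share context
        start = max(0, end - overlap)
    return chunks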

records = []
for i, row in df.iterrows():
    chunks = chunk_text(row["text_norm"], max_chars=800, overlap=100)
    for j, ch in enumerate(chunks):
        records.append({
            "doc_id": int(i),
            "chunk_id": j,
            "text": ch
        })

chunks_df = pd.DataFrame(records)
chunks_df.head(), len(chunks_df)

(   doc_id  chunk_id                                               text
 0       0         0  I am sure some bashers of Pens fans are pretty...
 1       1         0  My brother is in the market for a high-perform...
 2       2         0  Finally you said what you dream about. Mediter...
 3       2         1  urds and Turks once upon a time! Ohhhh so swed...
 4       3         0  Think! It's the SCSI card doing the DMA transf...,
 38871)
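
To make the sliding window concrete, here is a minimal sketch that computes only the character offsets of each chunk. The helper `chunk_spans` is a hypothetical name introduced for illustration; unlike `chunk_text` above, it does not strip whitespace or drop empty chunks.

```python
def chunk_spans(n_chars: int, max_chars: int, overlap: int):
    """Yield (start, end) character offsets of each sliding-window chunk."""
    if not 0 <= overlap < max_chars:
        raise ValueError("overlap must satisfy 0 <= overlap < max_chars")
    start = 0
    while start < n_chars:
        end = min(start + max_chars, n_chars)
        yield start, end
        if end == n_chars:
            break
        # The next window starts `overlap` characters before the previous end
        start = end - overlap


# A 2000-character text with the same settings used in the notebook
spans = list(chunk_spans(n_chars=2000, max_chars=800, overlap=100))
print(spans)  # [(0, 800), (700, 1500), (1400, 2000)]
```

Each window advances by `max_chars - overlap` characters (700 here), so the last 100 characters of one chunk reappear at the start of the next.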

In [8]:
from sentence_transformers import SentenceTransformer

MODEL_NAME = "intfloat/e5-base-v2"   # recommended for retrieval
model = SentenceTransformer(MODEL_NAME)

# Texts to index (passages); e5 models expect a "passage: " prefix on indexed texts
passages = ["passage: " + t for t in chunks_df["text"].tolist()]

In [9]:
# Embeddings (N x D)
# normalize_embeddings=True yields unit-length vectors, so a dot product equals cosine similarity
embeddings = model.encode(
    passages,
    batch_size=16,
    show_progress_bar=True,
    convert_to_numpy=True,
    normalize_embeddings=True
).astype("float32")

In [10]:
print(embeddings.shape, embeddings.dtype)

(38871, 768) float32
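
The reason for `normalize_embeddings=True` can be checked numerically. The sketch below uses random vectors as stand-ins for real embeddings (an assumption for illustration only) and compares the textbook cosine formula with a plain dot product of the unit-length versions.

```python
import numpy as np

rng = np.random.default_rng(0)
# Random stand-ins for two 768-dimensional embeddings (not real model output)
a = rng.normal(size=768).astype("float32")
b = rng.normal(size=768).astype("float32")

# Cosine similarity from its definition
cosine = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Normalize once, then use a plain dot product
a_unit = a / np.linalg.norm(a)
b_unit = b / np.linalg.norm(b)
dot_of_units = float(a_unit @ b_unit)

print(np.isclose(cosine, dot_of_units, atol=1e-5))
```

Normalizing at indexing time moves the division out of the search loop, so each query only needs one matrix-vector product.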


In [11]:
def embed_query(query: str) -> np.ndarray:
    # e5 models expect a "query: " prefix on search queries
    q = "query: " + query
    # `model` must be the SentenceTransformer from In [8]; if it is rebound
    # to a genai.GenerativeModel, encode() raises AttributeError
    vec = model.encode(
        [q],
        convert_to_numpy=True,
        normalize_embeddings=True
    ).astype("float32")
    return vec

query_text = "Battery measuring"

query_vec = embed_query(query_text)
query_vec.shape

(1, 768)

In [12]:
query_vec

array([[-1.07935071e-03, -2.07032822e-03, -4.41495404e-02,
        -7.61255762e-03,  3.96465361e-02, -2.97406577e-02,
         4.55089770e-02, -6.22369675e-03, -3.77077386e-02,
        -2.15643505e-03,  2.97867134e-02,  2.51323860e-02,
        -5.72931841e-02, -5.28431730e-03, -4.42375205e-02,
         6.99782521e-02,  5.34066968e-02, -2.66060755e-02,
         2.87609082e-02, -4.07173075e-02, -3.77815701e-02,
        -2.91277766e-02,  7.67262131e-02,  6.65487815e-03,
        -1.75847691e-02, -7.17045739e-03,  1.26900319e-02,
         3.81892733e-02, -6.41361475e-02, -1.40672009e-02,
         1.57396570e-02,  5.44323102e-02,  2.12979708e-02,
        -5.14489561e-02, -1.10170217e-02,  1.95845198e-02,
        -3.99559364e-02, -3.30117345e-02, -5.14741205e-02,
        -3.39408778e-02, -2.94993073e-02, -8.13559722e-03,
        -3.25340964e-02,  2.50287484e-02, -3.93804386e-02,
        -9.49668931e-04, -3.21114920e-02, -6.22754870e-03,
        -2.85882037e-02, -8.97681713e-03, -5.10003194e-0

### 2.3 Creating a query and running the search

I retrieve the 5 chunks most similar to the query.

In [13]:
# Compute the similarity (dot product) between the query and every embedding.
# Because the vectors are already normalized, this equals cosine similarity.
scores = np.dot(embeddings, query_vec.T).flatten()

# Get the indices of the 5 highest-scoring results
top_k = 5
top_indices = np.argsort(scores)[::-1][:top_k]

print(f"Search finished for: '{query_text}'")

Search finished for: 'Battery measuring'
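
A full `argsort` is fine for 38871 scores. For larger indexes, one possible alternative is `np.argpartition`, which selects the top k without sorting everything. The helper `top_k_indices` below is a hypothetical name for illustration, not part of the notebook.

```python
import numpy as np


def top_k_indices(scores: np.ndarray, k: int) -> np.ndarray:
    """Return the indices of the k largest scores, best first."""
    k = min(k, scores.size)
    if k == 0:
        return np.empty(0, dtype=np.intp)
    # argpartition puts the k largest scores in the last k slots, unordered
    candidates = np.argpartition(scores, -k)[-k:]
    # Sort only those k candidates by descending score
    return candidates[np.argsort(scores[candidates])[::-1]]


# Toy scores for illustration (not taken from the corpus)
toy_scores = np.array([0.10, 0.82, 0.35, 0.79, 0.05, 0.81], dtype="float32")
print(top_k_indices(toy_scores, 3))  # [1 5 3]
```

When there are no tied scores, the result matches `np.argsort(scores)[::-1][:k]`; with ties, the order among equal scores may differ.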


In [14]:
print(f"Most relevant results for: {query_text}\n")

for i, idx in enumerate(top_indices):
    score = scores[idx]
    # Named `chunk` so it does not shadow the chunk_text() function from In [7]
    chunk = chunks_df.iloc[idx]["text"]
    doc_id = chunks_df.iloc[idx]["doc_id"]

    print(f"{i+1}. [Similarity: {score:.4f}] (Doc ID: {doc_id})")
    print(f"Text: {chunk}")
    print("-" * 80)

Most relevant results for: Battery measuring

1. [Similarity: 0.8232] (Doc ID: 4128)
Text: bit more of the [mind-boggling] theory? Take care. P.S. My goal is 12V @ ~25A in (car battery) -> 250VAC out and (on the other end) 250V -> +5VDC @ 5A, -5V @ 1A, +12VDC @8A and -12VDC @1A... the distance between the two will be more than 100 feet (of 14-16 gauge) but less than 300 feet. Would like to have a working model in a year or so... :-) (Do I have a chance to make it?)
--------------------------------------------------------------------------------
2. [Similarity: 0.8222] (Doc ID: 2078)
Text: I hope David isn't going to be too upset with me for sticking my nose in here again, but here goes......:-) It isn't the average temperature that is the key factor here, but rather which is better at transferring the heat out of the (presumably warmer than ground temperature) battery. Call it a question of thermal conductivity, or of insulating ability, or "thermal mass" - whatever you like. Ques

In [15]:
# Build the context from the retrieved chunks
context = "\n\n".join(chunks_df.iloc[top_indices]["text"].tolist())

prompt = f"""
Use the following information, extracted from a newsgroup corpus,
to summarize the results. If the information is insufficient, say so.

CONTEXT:
{context}

USER QUESTION:
{query_text}

ANSWER:
"""

# Use a separate name so `model` keeps pointing to the SentenceTransformer
gemini_model = genai.GenerativeModel('gemini-2.5-flash-lite')

try:
    response = gemini_model.generate_content(prompt)
    print(response.text)
except Exception as e:
    print(f"Error: {e}")

The information provided is not sufficient to offer a summary about "Battery measuring". The text fragments cover topics such as:

*   **Electronics projects:** The theory of voltage conversion (12V to 250VAC and back) is mentioned, with specific power and distance requirements.
*   **Heat transfer and materials:** There is a discussion of why a concrete floor feels colder than the surrounding ground, relating it to thermal conductivity and thermal mass.
*   **Building homemade batteries:** There are questions about how to build a homemade battery for a school project, asking about easy-to-obtain electrolytes and metals, and a childhood example is even mentioned involving a galvanized bucket, a copper float, and sauerkraut as the electrolyte.
*   **Installing battery holders:** A general indication is given about the location of connection points for replacing a battery holder.

None of these fragments deals directly wi
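
One possible refinement of the prompt-building step is to tag each retrieved chunk with its document ID and cap the context length, so the model can point at sources and the prompt stays bounded. The function `build_prompt` and its `max_context_chars` parameter are assumptions introduced for this sketch, and the chunk texts in the usage lines are placeholders.

```python
def build_prompt(question: str, chunks: list[tuple[int, str]],
                 max_context_chars: int = 4000) -> str:
    """Assemble a RAG prompt whose context entries carry their doc ID."""
    parts = []
    used = 0
    for doc_id, text in chunks:
        entry = f"[doc {doc_id}] {text}"
        if used + len(entry) > max_context_chars:
            break  # stop before exceeding the context budget
        parts.append(entry)
        used += len(entry)
    context = "\n\n".join(parts)
    return (
        "Use the following passages to answer. "
        "If they are insufficient, say so.\n\n"
        f"CONTEXT:\n{context}\n\n"
        f"USER QUESTION:\n{question}\n\n"
        "ANSWER:\n"
    )


# Placeholder chunk texts; in the notebook these would come from chunks_df
example = build_prompt(
    "Battery measuring",
    [(4128, "first retrieved chunk"), (2078, "second retrieved chunk")],
)
print(example)
```

Because chunks arrive sorted by similarity, truncating at the budget drops the least relevant ones first.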