# Exercise 9: Using the Google Gemini API

In this exercise we will learn how to use the Google Gemini API.

## 1. Basic usage

The following code connects to the Google Gemini API in the simplest possible way.

In [None]:
!pip install python-dotenv google-genai



In [None]:
from dotenv import load_dotenv
import os

load_dotenv()
api_key = os.getenv("API_KEY")
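
`os.getenv` returns `None` when the variable is missing, so a small guard can fail early with a clear message instead of a confusing authentication error later. The following is only a minimal sketch: the helper name `require_env` and the variable `DEMO_API_KEY` are hypothetical and used purely for illustration.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name!r} is not set; "
            "add it to your .env file or export it before running."
        )
    return value


# Demo with a placeholder value (hypothetical name, not a real key)
os.environ["DEMO_API_KEY"] = "placeholder-key"
demo_key = require_env("DEMO_API_KEY")
```

In the notebook, the same helper could be called as `require_env("API_KEY")` right after `load_dotenv()`.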

In [None]:
from google import genai

# Use the key loaded from the environment; never hard-code API keys in a notebook
client = genai.Client(api_key=api_key)

response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents="Explain how AI works in a few words"
)

print(response.text)

AI analyzes data to find patterns and make predictions.


## 2. Retrieval

### 2.1 Load the 20 Newsgroups corpus

In [None]:
from sklearn.datasets import fetch_20newsgroups

newsgroups = fetch_20newsgroups(subset='all', remove=('headers', 'footers', 'quotes'))
newsgroupsdocs = newsgroups.data

In [None]:
type(newsgroupsdocs), len(newsgroupsdocs)

(list, 18846)

In [None]:
import pandas as pd
df = pd.DataFrame(newsgroupsdocs, columns=["text"])
df.head()

| | text |
|---|---|
| 0 | \n\nI am sure some bashers of Pens fans are pr... |
| 1 | My brother is in the market for a high-perform... |
| 2 | \n\n\n\n\tFinally you said what you dream abou... |
| 3 | \nThink!\n\nIt's the SCSI card doing the DMA t... |
| 4 | 1) I have an old Jasmine drive which I cann... |


### 2.2 Convert to embeddings

In [None]:
import pandas as pd
import numpy as np
from tqdm.auto import tqdm
import re

df = df.dropna(subset=["text"]).reset_index(drop=True)

# Basic cleanup: collapse runs of whitespace into single spaces
def normalize_text(s: str) -> str:
    s = re.sub(r"\s+", " ", s).strip()
    return s

df["text_norm"] = df["text"].astype(str).map(normalize_text)

df.head()

| | text | text_norm |
|---|---|---|
| 0 | \n\nI am sure some bashers of Pens fans are pr... | I am sure some bashers of Pens fans are pretty... |
| 1 | My brother is in the market for a high-perform... | My brother is in the market for a high-perform... |
| 2 | \n\n\n\n\tFinally you said what you dream abou... | Finally you said what you dream about. Mediter... |
| 3 | \nThink!\n\nIt's the SCSI card doing the DMA t... | Think! It's the SCSI card doing the DMA transf... |
| 4 | 1) I have an old Jasmine drive which I cann... | 1) I have an old Jasmine drive which I cannot ... |
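
To make the effect of the cleanup step concrete, here is a small sketch that applies the same regular expression to a made-up string. The sample text is hypothetical and not taken from the corpus.

```python
import re


def normalize_text(s: str) -> str:
    # Collapse any run of whitespace (spaces, tabs, newlines) into one space,
    # then trim leading and trailing whitespace
    return re.sub(r"\s+", " ", s).strip()


sample = "  Hello,\n\n\tworld!  How   are you?\n"
cleaned = normalize_text(sample)
print(cleaned)  # Hello, world! How are you?
```

Punctuation and casing are left untouched; only the whitespace changes.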


In [None]:
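def chunk_text(text: str, max_chars: int = 800, overlap: int = 100):
    """
    Character-based chunking.
    A max_chars of roughly 600-1000 usually works well.
    The overlap helps avoid cutting ideas in half.
    """
    # Guard against an infinite loop: the window must advance on every step
    if overlap >= max_chars:
        raise ValueError("overlap must be smaller than max_chars")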
    chunks = []
    start = 0
    n = len(text)
    while start < n:
        end = min(start + max_chars, n)
        chunk = text[start:end]
        chunk = chunk.strip()
        if len(chunk) > 0:
            chunks.append(chunk)
        if end == n:
            break
        start = max(0, end - overlap)
    return chunks

records = []
for i, row in df.iterrows():
    chunks = chunk_text(row["text_norm"], max_chars=800, overlap=100)
    for j, ch in enumerate(chunks):
        records.append({
            "doc_id": int(i),
            "chunk_id": j,
            "text": ch
        })

chunks_df = pd.DataFrame(records)
chunks_df.head(), len(chunks_df)

(   doc_id  chunk_id                                               text
 0       0         0  I am sure some bashers of Pens fans are pretty...
 1       1         0  My brother is in the market for a high-perform...
 2       2         0  Finally you said what you dream about. Mediter...
 3       2         1  urds and Turks once upon a time! Ohhhh so swed...
 4       3         0  Think! It's the SCSI card doing the DMA transf...,
 38871)
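
To see how the overlap behaves, the same sliding-window logic can be run on a short string with deliberately small values. This is an illustrative sketch: `max_chars=10` and `overlap=3` are chosen only to keep the output readable, not as recommended settings.

```python
def chunk_text(text: str, max_chars: int = 800, overlap: int = 100):
    """Split text into character windows that share `overlap` characters."""
    chunks = []
    start = 0
    n = len(text)
    while start < n:
        end = min(start + max_chars, n)
        chunk = text[start:end].strip()
        if chunk:
            chunks.append(chunk)
        if end == n:
            break
        # Step back by `overlap` so consecutive chunks share some context
        start = max(0, end - overlap)
    return chunks


demo_chunks = chunk_text("abcdefghijklmnopqrstuvwxyz", max_chars=10, overlap=3)
print(demo_chunks)  # ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Each window starts `max_chars - overlap` characters after the previous one, so the last three characters of one chunk reappear at the start of the next.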

In [None]:
from sentence_transformers import SentenceTransformer

MODEL_NAME = "intfloat/e5-base-v2"   # recommended for retrieval
model = SentenceTransformer(MODEL_NAME)

# Texts to index (passages); E5 models expect the "passage: " prefix
passages = ["passage: " + t for t in chunks_df["text"].tolist()]


In [None]:
# Embeddings (N x D)
# normalize_embeddings=True is required so that the inner product equals cosine similarity
embeddings = model.encode(
    passages,
    batch_size=16,
    show_progress_bar=True,
    convert_to_numpy=True,
    normalize_embeddings=True
).astype("float32")


In [None]:
print(embeddings.shape, embeddings.dtype)

(38871, 768) float32
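
The reason for normalizing the embeddings is that, for unit-length vectors, the inner product and the cosine similarity coincide. The sketch below checks this on two hypothetical 2-D vectors in plain Python; the helper names are illustrative and not part of any library used in this notebook.

```python
import math


def dot(a, b):
    # Inner product of two equally sized vectors
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v):
    # Scale the vector so that its Euclidean length is 1
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def cosine(a, b):
    # Cosine similarity computed directly from the unnormalized vectors
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a = [3.0, 4.0]
b = [4.0, 3.0]

cos_ab = cosine(a, b)                          # 24 / 25 = 0.96
ip_ab = dot(l2_normalize(a), l2_normalize(b))  # inner product after normalization
```

Both values agree up to floating-point error, so an inner-product search over normalized embeddings ranks results by cosine similarity.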


### 2.3 Create a query and run the search

In [None]:
def embed_query(query: str) -> np.ndarray:
    q = "query: " + query
    vec = model.encode(
        [q],
        convert_to_numpy=True,
        normalize_embeddings=True
    ).astype("float32")
    return vec

query_text = "Battery measuring"

query_vec = embed_query(query_text)
query_vec.shape

(1, 768)

Retrieve the 5 documents most similar to the query.

In [None]:
!pip install faiss-cpu

import numpy as np
import faiss

# Dimensionality of the embeddings
D = embeddings.shape[1]

# Create a FAISS flat inner-product (IP) index; since the embeddings are
# normalized, the inner product is the cosine similarity
index = faiss.IndexFlatIP(D)

# Add the embeddings to the index
index.add(embeddings)

# Number of documents to retrieve
k = 5

# Run the search
distances, indices = index.search(query_vec, k)

print(f"Top {k} most relevant documents for the query '{query_text}':")
for i in range(k):
    doc_index = indices[0][i]
    score = distances[0][i]
    print(f"\n--- Document {i+1} (Score: {score:.4f}) ---")
    print(f"{passages[doc_index]}")

Top 5 most relevant documents for the query 'Battery measuring':

--- Document 1 (Score: 0.8232) ---
passage: bit more of the [mind-boggling] theory? Take care. P.S. My goal is 12V @ ~25A in (car battery) -> 250VAC out and (on the other end) 250V -> +5VDC @ 5A, -5V @ 1A, +12VDC @8A and -12VDC @1A... the distance between the two will be more than 100 feet (of 14-16 gauge) but less than 300 feet. Would like to have a working model in a year or so... :-) (Do I have a chance to make it?)

--- Document 2 (Score: 0.8222) ---
passage: I hope David isn't going to be too upset with me for sticking my nose in here again, but here goes......:-) It isn't the average temperature that is the key factor here, but rather which is better at transferring the heat out of the (presumably warmer than ground temperature) battery. Call it a question of thermal conductivity, or of insulating ability, or "thermal mass" - whatever you like. Question - why does a concrete floor feel cooler than the surrou
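
`IndexFlatIP` performs an exact, exhaustive search: it scores every stored vector against the query by inner product and keeps the best `k`. The following toy sketch reproduces that idea in plain Python on hypothetical 2-D vectors. It is meant to illustrate the logic only and is not how FAISS is implemented internally.

```python
import heapq


def top_k_inner_product(vectors, query, k):
    """Exhaustive search: score every vector against the query, keep the k best."""
    scores = [
        (sum(x * y for x, y in zip(vec, query)), idx)
        for idx, vec in enumerate(vectors)
    ]
    best = heapq.nlargest(k, scores)
    # Return (index, score) pairs, highest score first
    return [(idx, score) for score, idx in best]


# Hypothetical unit-length vectors standing in for the real embeddings
toy_vectors = [
    [1.0, 0.0],
    [0.0, 1.0],
    [0.6, 0.8],
    [0.8, 0.6],
]
toy_query = [1.0, 0.0]

results = top_k_inner_product(toy_vectors, toy_query, k=2)
print(results)  # [(0, 1.0), (3, 0.8)]
```

For a corpus of this size an exhaustive index is a reasonable choice; approximate indexes trade some accuracy for speed on much larger collections.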

Use the LLM (Gemini) with the results above

In [None]:
context_docs = []
for i in range(k):
    doc_index = indices[0][i]
    context_docs.append(passages[doc_index])

context = "\n\n".join(context_docs)

prompt = f"Based on the following documents and the query '{query_text}', provide a concise summary.\n\nDocuments:\n{context}"

response_gemini = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents=prompt
)

print("Summary from Gemini:")
print(response_gemini.text)

Summary from Gemini:
Based on the provided documents, there is **no specific information regarding the measurement of batteries** (such as testing capacity, state of charge, or using a multimeter).

The documents instead focus on:
*   **Power Conversion Goals:** One passage outlines a technical goal to convert a 12V car battery's output to 250VAC and then back to various DC voltages (+5V, -5V, +12V, -12V) over a long distance.
*   **Thermal Properties:** Another passage discusses the thermal conductivity and heat transfer of batteries compared to ground materials like concrete.
*   **DIY Construction:** Two passages describe how to build a simple battery for a science project, suggesting materials like an ice cube tray, galvanized buckets (zinc), copper toilet floats, and sauerkraut as an electrolyte.
*   **Hardware Installation:** A brief mention is made regarding installing a battery holder on a circuit board.
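
Note that the retrieved passages are sent to the model with their `passage: ` prefix still attached, and without any numbering. One possible refinement, sketched below under the assumption that the prefix is only useful for the embedding model, is to strip it and number the documents before building the prompt. The function name `build_prompt` and the demo strings are hypothetical.

```python
def build_prompt(query: str, passages: list[str], prefix: str = "passage: ") -> str:
    """Assemble a retrieval-augmented prompt from a query and retrieved passages."""
    # Drop the embedding-model prefix, which carries no meaning for the LLM
    cleaned = [
        p[len(prefix):] if p.startswith(prefix) else p
        for p in passages
    ]
    # Number the documents so the model can refer to them individually
    numbered = "\n\n".join(
        f"[{i}] {text}" for i, text in enumerate(cleaned, start=1)
    )
    return (
        f"Based on the following documents and the query '{query}', "
        "provide a concise summary.\n\n"
        f"Documents:\n{numbered}"
    )


demo_prompt = build_prompt(
    "Battery measuring",
    ["passage: first retrieved chunk", "passage: second retrieved chunk"],
)
print(demo_prompt)
```

The resulting string can be passed as `contents` to `client.models.generate_content` in the same way as the prompt above.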
