# Exercise 9: Using the Google Gemini API

In this exercise we will learn how to use the Google Gemini API.

## Basic usage

In [20]:
!pip install python-dotenv google-genai



In [21]:
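import os

# Do not hard-code a real API key in the notebook. Store it in a .env file
# (a line of the form GOOGLE_API_KEY=...) or export it in the shell instead;
# the next cell loads it from there.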

In [22]:
from dotenv import load_dotenv

load_dotenv()
api_key = os.environ["GOOGLE_API_KEY"]
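
Reading the key with `os.environ[...]` raises a bare `KeyError` when the variable is missing. The sketch below shows one way to fail with a clearer message. `require_env` is a hypothetical helper introduced here for illustration; it is not part of `python-dotenv` or of the Gemini SDK.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or fail with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or export it in the shell."
        )
    return value


# Hypothetical usage, after load_dotenv() has run:
# api_key = require_env("GOOGLE_API_KEY")
```

The call is left commented out so that the block can be run without a key configured.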

In [23]:
from google import genai

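# Pass the key loaded above explicitly instead of relying on implicit environment lookup
client = genai.Client(api_key=api_key)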

response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents="Explícame qué es machine learning en una frase"
)

print(response.text)


El **machine learning** es una rama de la inteligencia artificial que permite a las computadoras aprender de los datos y mejorar su rendimiento en tareas específicas sin ser programadas explícitamente para ello.


## Retrieval


### Loading the corpus

In [24]:
from sklearn.datasets import fetch_20newsgroups

# Load the full dataset (train + test)
newsgroups = fetch_20newsgroups(subset='all', remove=('headers', 'footers', 'quotes'))
newsgroupsdocs = newsgroups.data

In [25]:
# Convert the corpus to a DataFrame
import pandas as pd

df = pd.DataFrame({
    "texto": newsgroupsdocs
})

df.head()

| | texto |
|---|---|
| 0 | \n\nI am sure some bashers of Pens fans are pr... |
| 1 | My brother is in the market for a high-perform... |
| 2 | \n\n\n\n\tFinally you said what you dream abou... |
| 3 | \nThink!\n\nIt's the SCSI card doing the DMA t... |
| 4 | 1)    I have an old Jasmine drive which I cann... |


In [27]:
# Clean and normalize the corpus
import re

df = df.dropna(subset=["texto"]).reset_index(drop=True)

def normalize_text(s: str) -> str:
    s = re.sub(r"\s+", " ", s).strip()
    return s

df["text_norm"] = df["texto"].astype(str).map(normalize_text)

df.head()

| | texto | text_norm |
|---|---|---|
| 0 | \n\nI am sure some bashers of Pens fans are pr... | I am sure some bashers of Pens fans are pretty... |
| 1 | My brother is in the market for a high-perform... | My brother is in the market for a high-perform... |
| 2 | \n\n\n\n\tFinally you said what you dream abou... | Finally you said what you dream about. Mediter... |
| 3 | \nThink!\n\nIt's the SCSI card doing the DMA t... | Think! It's the SCSI card doing the DMA transf... |
| 4 | 1)    I have an old Jasmine drive which I cann... | 1) I have an old Jasmine drive which I cannot ... |
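
To make the effect of the normalization step concrete, here is a minimal standalone sketch of the same `normalize_text` function applied to a single string. The sample string is made up for illustration (loosely modeled on one of the rows above) and is not taken from the dataset.

```python
import re


def normalize_text(s: str) -> str:
    # Collapse every run of whitespace (spaces, tabs, newlines) into one space,
    # then trim leading and trailing whitespace
    return re.sub(r"\s+", " ", s).strip()


# Made-up sample with leading newlines, a tab, and repeated spaces
raw = "\nThink!\n\nIt's the SCSI card\tdoing   the DMA"
clean = normalize_text(raw)
print(repr(clean))
```

Because `\s+` also matches newlines and tabs, the cleaned text never contains line breaks.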


### Converting to embeddings

In [28]:
from sentence_transformers import SentenceTransformer

MODEL_NAME = "intfloat/e5-base-v2"   # recommended for retrieval
model = SentenceTransformer(MODEL_NAME)

# Texts to index (passages); e5 models expect the "passage: " prefix on indexed texts
passages = ["passage: " + t for t in df["text_norm"].tolist()]

In [29]:
# Embeddings (N x D)
# normalize_embeddings=True is needed so that the inner product equals cosine similarity
embeddings = model.encode(
    passages,
    batch_size=16,
    show_progress_bar=True,
    convert_to_numpy=True,
    normalize_embeddings=True
).astype("float32")


### Creating a query and running the search

In [48]:
# Search query
query = "market in the world"

# Query embedding; e5 models expect the "query: " prefix on queries
query_embedding = model.encode(
    ["query: " + query],
    normalize_embeddings=True,
    convert_to_numpy=True
).astype("float32")
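
Both the passages and the query are encoded with `normalize_embeddings=True`. The reason is that, for unit-length vectors, the plain inner product equals the cosine similarity. The sketch below checks this on two small made-up vectors using only the standard library; the helper names (`l2_normalize`, `dot`, `cosine`) are illustrative and do not come from `sentence_transformers`.

```python
import math


def l2_normalize(v):
    # Scale the vector so that its Euclidean length is 1
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    # Plain inner product
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    # Cosine similarity computed directly from its definition
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a = [3.0, 4.0, 0.0]
b = [1.0, 2.0, 2.0]

cos_ab = cosine(a, b)
ip_normalized = dot(l2_normalize(a), l2_normalize(b))
print(cos_ab, ip_normalized)
```

The two printed values agree up to floating-point rounding, which is why an inner-product index can be used for cosine search once the embeddings are normalized.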

In [31]:
!pip install faiss-cpu



In [49]:
# Cosine-similarity search with FAISS

import faiss

# Embedding dimension
dim = embeddings.shape[1]

# Inner-product index; with L2-normalized embeddings the inner product equals cosine similarity
index = faiss.IndexFlatIP(dim)
index.add(embeddings)

# Number of results to retrieve
k = 5

# Search
scores, indices = index.search(query_embedding, k)
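
`IndexFlatIP` performs an exact, exhaustive search: it scores the query against every stored vector and returns the `k` highest inner products together with their positions. The following pure-Python sketch mimics that behavior on a made-up toy set. `top_k_inner_product` and the `toy_*` names are hypothetical and only illustrate the idea; FAISS itself is far more efficient and returns NumPy arrays.

```python
def top_k_inner_product(query, vectors, k):
    """Exhaustive search: score every vector and return the k best (scores, indices)."""
    scored = [
        (sum(q * x for q, x in zip(query, vec)), i)
        for i, vec in enumerate(vectors)
    ]
    # Highest inner product first
    scored.sort(key=lambda pair: pair[0], reverse=True)
    best = scored[:k]
    return [s for s, _ in best], [i for _, i in best]


# Made-up unit-length vectors in 2D
toy_vectors = [
    [1.0, 0.0],
    [0.0, 1.0],
    [0.6, 0.8],
    [-1.0, 0.0],
]
toy_query = [0.8, 0.6]

toy_scores, toy_indices = top_k_inner_product(toy_query, toy_vectors, k=2)
print(toy_indices, toy_scores)
```

As in the FAISS call above, the results come back ordered from most to least similar, so position 0 is the best match.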


### Top 5 documents most similar to the query

In [56]:
# Relevant documents

print(f"Query usada: {query}")

# Each result pairs a document index with its similarity score
for i, (idx, score) in enumerate(zip(indices[0], scores[0]), start=1):
    texto = df.iloc[idx]["text_norm"]

    # Show only the first 1000 characters of each document
    snippet = texto[:1000].replace("\n", " ")

    print("=" * 80)
    print(f"Resultado {i}")
    print(f"Similitud (coseno): {score:.4f}")
    print("Doc:")
    print(f"{snippet}...")


Query usada: market in the world
Resultado 1
Similitud (coseno): 0.8210
Doc:
Original to: szabo@techbook.com G'day szabo@techbook.com 29 Mar 93 07:28, szabo@techbook.com wrote to All: sc> szabo@techbook.com (Nick Szabo), via Kralizec 3:713/602 sc> Here are some longer-term markets to consider: Here are some more: * Terrestrial illumination from orbiting mirrors. * World enviroment and disaster monitering system. (the Japanese have already developed a plan for this, called WEDOS) Although this may be more of a "public good". * Space tourism. * Energy relay satellites ta Ralph...
Resultado 2
Similitud (coseno): 0.7930
Doc:
Tell that to the Japanese, their local market is neatly protected by the Japanese government. Its one very tough nut to crack. In fact the only current way to break into it, is to do it with a Japanese company as a partner in the venture. Gary --...
Resultado 3
Similitud (coseno): 0.7879
Doc:
No! Distribution keywords are case sensitive. What you want is Distribution: 