# Exercise 10: Re-ranking

**Objective:** Implement and evaluate a two-stage Information Retrieval pipeline, and analyze the impact of re-ranking on ranking quality.

## Part 1. Corpus preparation

* Load the corpus (documents/passages).
* Load the queries.
* Load the qrels (relevance judgments).
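
For orientation, the three objects loaded in this part are plain nested dictionaries. The sketch below uses hand-made toy data (the IDs and texts are invented placeholders, not taken from SciFact) to show the shapes assumed in the rest of the exercise, plus a hypothetical helper `relevant_counts` that counts relevant documents per query.

```python
# Toy data mimicking the shapes of corpus, queries and qrels.
# All IDs and texts are invented placeholders.
corpus = {
    "d1": {"title": "Title one", "text": "Body of document one."},
    "d2": {"title": "Title two", "text": "Body of document two."},
}
queries = {"q1": "an example claim"}
qrels = {"q1": {"d1": 1, "d2": 0}}


def relevant_counts(qrels):
    """Return {query_id: number of documents with a relevance label > 0}."""
    return {
        qid: sum(1 for rel in docs.values() if rel > 0)
        for qid, docs in qrels.items()
    }


counts = relevant_counts(qrels)
print(counts)
```

Queries with several relevant documents are the most informative ones for inspecting how a re-ranker reorders results.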

In [None]:
# Install dependencies on Kaggle
!pip install beir rank_bm25 sentence-transformers

In [1]:
from beir import util
from beir.datasets.data_loader import GenericDataLoader
import pandas as pd


In [2]:
DATASET_NAME = "scifact"
DATA_DIR = "../data/beir_datasets"
url = f"https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/{DATASET_NAME}.zip"
util.download_and_unzip(url, DATA_DIR)

'../data/beir_datasets/scifact'

In [3]:
dataset_path = DATA_DIR + "/" + DATASET_NAME
corpus, queries, qrels = GenericDataLoader(dataset_path).load(split="test")

In [4]:
df_corpus = (
    pd.DataFrame.from_dict(corpus, orient="index")
      .reset_index()
      .rename(columns={"index": "doc_id"})
)

df_corpus

| | doc_id | text | title |
|---|---|---|---|
| 0 | 4983 | Alterations of the architecture of cerebral wh... | Microstructural development of human newborn c... |
| 1 | 5836 | Myelodysplastic syndromes (MDS) are age-depend... | Induction of myelodysplasia by myeloid-derived... |
| 2 | 7912 | ID elements are short interspersed elements (S... | BC1 RNA, the transcript from a master gene for... |
| 3 | 18670 | DNA methylation plays an important role in bio... | The DNA Methylome of Human Peripheral Blood Mo... |
| 4 | 19238 | Two human Golli (for gene expressed in the oli... | The human myelin basic protein gene is include... |
| ... | ... | ... | ... |
| 5178 | 195689316 | BACKGROUND The main associations of body-mass ... | Body-mass index and cause-specific mortality i... |
| 5179 | 195689757 | A key aberrant biological difference between t... | Targeting metabolic remodeling in glioblastoma... |
| 5180 | 196664003 | A signaling pathway transmits information from... | Signaling architectures that transmit unidirec... |
| 5181 | 198133135 | AIMS Trabecular bone score (TBS) is a surrogat... | Association between pre-diabetes, type 2 diabe... |


In [7]:
df_queries = (
    pd.DataFrame.from_dict(queries, orient="index", columns=["query"])
      .reset_index()
      .rename(columns={"index": "query_id"})
)

df_queries

| | query_id | query |
|---|---|---|
| 0 | 1 | 0-dimensional biomaterials show inductive prop... |
| 1 | 3 | 1,000 genomes project enables mapping of genet... |
| 2 | 5 | 1/2000 in UK have abnormal PrP positivity. |
| 3 | 13 | 5% of perinatal mortality is due to low birth ... |
| 4 | 36 | A deficiency of vitamin B12 increases blood le... |
| ... | ... | ... |
| 295 | 1379 | Women with a higher birth weight are more like... |
| 296 | 1382 | aPKCz causes tumour enhancement by affecting g... |
| 297 | 1385 | cSMAC formation enhances weak ligand signalling. |
| 298 | 1389 | mTORC2 regulates intracellular cysteine levels... |


In [5]:
rows = []
for qid, docs in qrels.items():
    for doc_id, rel in docs.items():
        rows.append({
            "query_id": qid,
            "doc_id": doc_id,
            "relevance": rel
        })

df_qrels = pd.DataFrame(rows)
df_qrels

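| | query_id | doc_id | relevance |
|---|---|---|---|
| 0 | 1 | 31715818 | 1 |
| 1 | 3 | 14717500 | 1 |
| 2 | 5 | 13734012 | 1 |
| 3 | 13 | 1606628 | 1 |
| 4 | 36 | 5152028 | 1 |
| ... | ... | ... | ... |
| 334 | 1379 | 17450673 | 1 |
| 335 | 1382 | 17755060 | 1 |
| 336 | 1385 | 306006 | 1 |
| 337 | 1389 | 23895668 | 1 |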


In [8]:
# Pick an arbitrary query that has several relevant documents
qid = "133"

print("Query:")
print(df_queries.loc[df_queries["query_id"] == qid, "query"].values[0])

print("\nRelevant documents for this query:")
df_qrels[(df_qrels["query_id"] == qid) & (df_qrels["relevance"] > 0)]

Query:
Assembly of invadopodia is triggered by focal generation of phosphatidylinositol-3,4-biphosphate and the activation of the nonreceptor tyrosine kinase Src.

Relevant documents for this query:


| | query_id | doc_id | relevance |
|---|---|---|---|
| 31 | 133 | 38485364 | 1 |
| 32 | 133 | 6969753 | 1 |
| 33 | 133 | 17934082 | 1 |
| 34 | 133 | 16280642 | 1 |
| 35 | 133 | 12640810 | 1 |


## Part 2. Initial retrieval (baseline)

* Implement the initial retrieval with BM25.
* Compute the metrics Recall@10 and nDCG@10.
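
To make the two metrics concrete, here is a simplified, dependency-free sketch of what Recall@k and nDCG@k measure for a single query. It assumes linear gain and a log2(rank + 1) discount; the function names and toy IDs are hypothetical, and the values computed by the evaluation library used in this notebook may differ in implementation details.

```python
import math


def recall_at_k(ranked_ids, relevant, k):
    """Fraction of the relevant documents that appear in the top-k."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant)
    return hits / len(relevant)


def ndcg_at_k(ranked_ids, gains, k):
    """nDCG@k with linear gain and a log2(rank + 1) discount."""
    dcg = sum(
        gains.get(doc_id, 0) / math.log2(rank + 1)
        for rank, doc_id in enumerate(ranked_ids[:k], start=1)
    )
    # Ideal DCG: the best possible ordering of the judged documents.
    ideal = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(rank + 1) for rank, g in enumerate(ideal, start=1))
    return dcg / idcg if idcg > 0 else 0.0


# Toy query: two relevant documents, only one retrieved (at rank 2).
ranking = ["d3", "d1", "d4"]
gains = {"d1": 1, "d2": 1}
relevant = {doc_id for doc_id, g in gains.items() if g > 0}

recall = recall_at_k(ranking, relevant, k=3)
ndcg = ndcg_at_k(ranking, gains, k=3)
print(recall, ndcg)
```

Recall@k ignores where in the top-k a relevant document lands, while nDCG@k rewards placing it near the top. This is why a re-ranker can improve nDCG@10 considerably even when Recall@10 changes only slightly.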

In [11]:
from rank_bm25 import BM25Okapi
from beir.retrieval.evaluation import EvaluateRetrieval
import string
from tqdm import tqdm

In [13]:
# Simple preprocessing for BM25 (lowercasing and whitespace tokenization)
print("Tokenizing corpus for BM25...")
corpus_ids = list(corpus.keys())
tokenized_corpus = [
    (corpus[doc_id]['title'] + " " + corpus[doc_id]['text']).lower().split()
    for doc_id in corpus_ids
]

Tokenizing corpus for BM25...
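
The whitespace tokenizer above keeps punctuation attached to tokens, so "levels." and "levels" are indexed as different terms. One possible variant, shown only as a sketch (the helper name `simple_tokenize` is hypothetical and was not used to produce the results reported in this notebook), replaces ASCII punctuation with spaces before splitting:

```python
import string

# Translation table mapping every ASCII punctuation character to a space.
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def simple_tokenize(text):
    """Lowercase, replace ASCII punctuation with spaces, split on whitespace."""
    return text.lower().translate(_PUNCT_TO_SPACE).split()


tokens = simple_tokenize("Vitamin B12 increases blood levels.")
print(tokens)  # ['vitamin', 'b12', 'increases', 'blood', 'levels']
```

The trade-off is that hyphenated terms are split as well ("0-dimensional" becomes "0" and "dimensional"), which may or may not help on scientific text. If this variant is used, the same function has to be applied to both the corpus and the queries.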


In [14]:
# Build the BM25 index
bm25 = BM25Okapi(tokenized_corpus)

In [17]:
# Retrieve the top-100 documents so they can be re-ranked later
TOP_K_CANDIDATES = 100
bm25_results = {}
print("Running initial BM25 search...")
for qid, query_text in tqdm(queries.items()):
    tokenized_query = query_text.lower().split()
    # Score every document in the corpus against the query
    doc_scores = bm25.get_scores(tokenized_query)
    # Map doc_id -> score
    scores_dict = {corpus_ids[i]: float(doc_scores[i]) for i in range(len(corpus_ids))}

    # Sort and keep only the top-k to save memory and prepare for re-ranking
    top_k = sorted(scores_dict.items(), key=lambda item: item[1], reverse=True)[:TOP_K_CANDIDATES]
    bm25_results[qid] = dict(top_k)

Running initial BM25 search...


100%|██████████| 300/300 [00:05<00:00, 52.44it/s]
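
Sorting every score for every query is perfectly adequate at this corpus size. For larger collections, a possible alternative is to select the top-k without a full sort. The sketch below uses `heapq.nlargest` from the standard library; the function name `top_k_scores` and the toy scores are hypothetical.

```python
import heapq


def top_k_scores(doc_ids, scores, k):
    """Return {doc_id: score} for the k highest-scoring documents, best first."""
    best = heapq.nlargest(k, zip(doc_ids, scores), key=lambda pair: pair[1])
    return {doc_id: float(score) for doc_id, score in best}


# Toy example with invented scores.
top2 = top_k_scores(["a", "b", "c", "d"], [0.1, 2.5, 1.7, 0.0], k=2)
print(top2)  # {'b': 2.5, 'c': 1.7}
```

In the loop above this would take `corpus_ids` and `doc_scores` as inputs and produce the same kind of dictionary that is stored in `bm25_results[qid]`.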


In [18]:
# Preliminary evaluation
ndcg, _map, recall, _ = EvaluateRetrieval.evaluate(qrels, bm25_results, [1, 10, 100])
print(f"BM25 Baseline -> nDCG@10: {ndcg['NDCG@10']:.4f}, Recall@10: {recall['Recall@10']:.4f}")

BM25 Baseline -> nDCG@10: 0.5597, Recall@10: 0.6862


## Part 3. Re-ranking with a _cross-encoder_

* Re-rank the top-k candidates for each query.
* Identify which documents change position in the top 10.

In [19]:
from sentence_transformers import CrossEncoder


In [20]:
# Load the Cross-Encoder model
cross_encoder = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')

ce_results = {}

print("Re-ranking with Cross-Encoder...")
for qid in tqdm(bm25_results):
    query_text = queries[qid]
    # Take the candidates that BM25 retrieved
    candidate_doc_ids = list(bm25_results[qid].keys())

    # Build [query, doc title + doc text] pairs
    pairs = [[query_text, corpus[did]['title'] + " " + corpus[did]['text']] for did in candidate_doc_ids]

    # Predict a relevance score for each pair
    scores = cross_encoder.predict(pairs)

    # Store the results
    ce_results[qid] = {did: float(scr) for did, scr in zip(candidate_doc_ids, scores)}

Re-ranking with Cross-Encoder...


100%|██████████| 300/300 [02:40<00:00,  1.87it/s]


In [21]:
# Inspect how positions change for one query
qid_ejemplo = list(queries.keys())[0]  # Take the first query
print(f"\nQuery: {queries[qid_ejemplo]}")
print("Top 3 BM25:", list(bm25_results[qid_ejemplo].keys())[:3])


Query: 0-dimensional biomaterials show inductive properties.
Top 3 BM25: ['825728', '10931595', '43385013']


In [22]:
# Sort the CE results to show the actual top 3
sorted_ce = sorted(ce_results[qid_ejemplo].items(), key=lambda x: x[1], reverse=True)
print("Top 3 Cross-Encoder:", [x[0] for x in sorted_ce[:3]])

Top 3 Cross-Encoder: ['43385013', '37437064', '10906636']
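
The cells above compare only the top 3 for one query. To identify which documents change position in the top 10, a helper along the following lines could be used. This is a minimal sketch: the names `top_k_ids` and `rank_changes` are hypothetical, and the toy scores are invented; in the notebook the inputs would be `bm25_results[qid]` and `ce_results[qid]`.

```python
def top_k_ids(results, k=10):
    """Return the k highest-scoring doc IDs from a {doc_id: score} dict."""
    ranked = sorted(results.items(), key=lambda item: item[1], reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]


def rank_changes(before, after, k=10):
    """List (doc_id, rank_before, rank_after) for documents whose top-k rank differs.

    A rank of None means the document is outside that ranking's top-k.
    """
    top_before = top_k_ids(before, k)
    top_after = top_k_ids(after, k)
    pos_before = {doc_id: i + 1 for i, doc_id in enumerate(top_before)}
    pos_after = {doc_id: i + 1 for i, doc_id in enumerate(top_after)}

    # Visit the new top-k first, then documents that dropped out of it.
    dropped = [doc_id for doc_id in top_before if doc_id not in pos_after]
    changes = []
    for doc_id in top_after + dropped:
        if pos_before.get(doc_id) != pos_after.get(doc_id):
            changes.append((doc_id, pos_before.get(doc_id), pos_after.get(doc_id)))
    return changes


# Toy scores (invented) for a single query.
bm25_toy = {"d1": 9.0, "d2": 8.0, "d3": 7.0}
ce_toy = {"d1": 0.2, "d2": 0.9, "d3": 0.1}
changes = rank_changes(bm25_toy, ce_toy, k=2)
print(changes)  # [('d2', 2, 1), ('d1', 1, 2)]
```

Documents that enter the top 10 from lower BM25 positions show up with `rank_before` equal to `None`, which makes promotions by the re-ranker easy to spot.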


## Part 4. Re-ranking with _LTR_

* Re-rank the top-k candidates for each query.
* Identify which documents change position in the top 10.

In [23]:
# Note: this `util` shadows the `util` imported from beir in the first cell
from sentence_transformers import SentenceTransformer, util
import torch

In [24]:
# The bi-encoder embeds queries and documents separately
bi_encoder = SentenceTransformer('all-MiniLM-L6-v2')

# Scores per query: {query_id: {doc_id: cosine similarity}}
bi_encoder_results = {}

print("Re-ranking with Bi-Encoder (Dense Retrieval)...")

for qid in tqdm(bm25_results):
    query_text = queries[qid]
    candidate_doc_ids = list(bm25_results[qid].keys())
    candidate_texts = [corpus[did]['title'] + " " + corpus[did]['text'] for did in candidate_doc_ids]

    # 1. Query embedding
    query_emb = bi_encoder.encode(query_text, convert_to_tensor=True)

    # 2. Embeddings of the candidate documents
    doc_embs = bi_encoder.encode(candidate_texts, convert_to_tensor=True)

    # 3. Cosine similarity
    # util.cos_sim returns a matrix; take the first (and only) row
    scores = util.cos_sim(query_emb, doc_embs)[0]

    # Store the results
    bi_encoder_results[qid] = {did: float(scores[i]) for i, did in enumerate(candidate_doc_ids)}

Re-ranking with Bi-Encoder (Dense Retrieval)...


100%|██████████| 300/300 [01:21<00:00,  3.69it/s]
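
For reference, the cosine similarity used to score each query-document pair is the dot product of the two vectors divided by the product of their norms. The sketch below reproduces it without any dependencies, using invented 3-dimensional vectors in place of real embeddings; the function name `cosine_similarity` is hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy "embeddings" (invented values).
query_vec = [1.0, 0.0, 1.0]
doc_vecs = {
    "d1": [1.0, 0.0, 1.0],  # same direction as the query
    "d2": [0.0, 1.0, 0.0],  # orthogonal to the query
    "d3": [1.0, 1.0, 0.0],  # partially aligned
}

scores = {doc_id: cosine_similarity(query_vec, vec) for doc_id, vec in doc_vecs.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # ['d1', 'd3', 'd2']
```

Because query and document are embedded independently, document embeddings can be computed once and reused across queries. The cross-encoder cannot do this, since it reads each query-document pair jointly.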


## Part 5. Post re-ranking evaluation

Compute the following metrics:
* nDCG@10
* MAP
* Recall@10
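
As a reminder of what MAP measures, the sketch below computes average precision at a cutoff for one query; MAP is the mean of this value over all queries. It assumes one common definition, in which the sum of precisions at each relevant hit is divided by the total number of relevant documents. The function name and toy IDs are hypothetical, and the library used for evaluation in this notebook may differ in details.

```python
def average_precision_at_k(ranked_ids, relevant, k):
    """AP@k: precision summed at each relevant hit in the top-k,
    divided by the total number of relevant documents."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / len(relevant)


# Toy query: relevant documents found at ranks 1 and 3.
ap = average_precision_at_k(["d1", "d9", "d2"], {"d1", "d2"}, k=10)
print(round(ap, 4))  # 0.8333
```

Here the precisions at the two hits are 1/1 and 2/3, so AP is (1 + 2/3) / 2.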

In [25]:
print("Evaluating final results...")

# Evaluate the Cross-Encoder
print("Evaluating Cross-Encoder...")
ndcg_ce, map_ce, recall_ce, _ = EvaluateRetrieval.evaluate(qrels, ce_results, [1, 10])

# Evaluate the Bi-Encoder
print("Evaluating Bi-Encoder...")
ndcg_bi, map_bi, recall_bi, _ = EvaluateRetrieval.evaluate(qrels, bi_encoder_results, [1, 10])

# Build the comparison table
results_data = {
    "Model": ["BM25 (Baseline)", "Bi-Encoder (Dense)", "Cross-Encoder (Re-ranker)"],
    "nDCG@10": [ndcg['NDCG@10'], ndcg_bi['NDCG@10'], ndcg_ce['NDCG@10']],
    "Recall@10": [recall['Recall@10'], recall_bi['Recall@10'], recall_ce['Recall@10']],
    "MAP@10": [_map['MAP@10'], map_bi['MAP@10'], map_ce['MAP@10']]
}

Evaluating final results...
Evaluating Cross-Encoder...
Evaluating Bi-Encoder...


In [26]:
df_results = pd.DataFrame(results_data)
df_results

| | Model | nDCG@10 | Recall@10 | MAP@10 |
|---|---|---|---|---|
| 0 | BM25 (Baseline) | 0.5597 | 0.68617 | 0.51473 |
| 1 | Bi-Encoder (Dense) | 0.60903 | 0.71961 | 0.56872 |
| 2 | Cross-Encoder (Re-ranker) | 0.65092 | 0.74961 | 0.61339 |
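
To quantify the impact of re-ranking, the absolute scores can be turned into relative gains over the BM25 baseline. The sketch below does this for nDCG@10, using the values copied from the results table above; the helper name `relative_gain` is hypothetical.

```python
# nDCG@10 values copied from the results table.
ndcg_at_10 = {
    "BM25 (Baseline)": 0.5597,
    "Bi-Encoder (Dense)": 0.60903,
    "Cross-Encoder (Re-ranker)": 0.65092,
}


def relative_gain(value, baseline):
    """Percentage improvement of value over baseline."""
    return 100.0 * (value - baseline) / baseline


baseline = ndcg_at_10["BM25 (Baseline)"]
gains = {name: relative_gain(value, baseline) for name, value in ndcg_at_10.items()}

for name, gain in gains.items():
    print(f"{name}: {gain:+.1f}%")
```

On these numbers the bi-encoder improves nDCG@10 by roughly 8.8% over BM25 and the cross-encoder by roughly 16.3%. According to the progress bars in Parts 3 and 4, the cross-encoder also took about twice as long to run (2:40 versus 1:21 for the 300 queries).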
