# Exercise 10: Re-ranking

**Objective:** Implement and evaluate a two-stage Information Retrieval pipeline, and analyze the impact of re-ranking on ranking quality.

## Part 1. Corpus preparation

* Load the corpus (documents/passages).
* Load the queries.
* Load the qrels (relevance judgments).

In [None]:
!pip install beir

In [3]:
from beir import util
from beir.datasets.data_loader import GenericDataLoader
import pandas as pd


In [4]:
DATASET_NAME = "scifact"
DATA_DIR = "../data/beir_datasets"
url = f"https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/{DATASET_NAME}.zip"
util.download_and_unzip(url, DATA_DIR)

'../data/beir_datasets/scifact'

In [5]:
dataset_path = DATA_DIR + "/" + DATASET_NAME
corpus, queries, qrels = GenericDataLoader(dataset_path).load(split="test")
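As a rough orientation, the three objects returned by the loader are plain nested dictionaries. The toy values below are invented for illustration and are not taken from SciFact; only the shapes are meant to match, and `relevant_docs` is a hypothetical helper.

```python
# Toy stand-ins for the loader output; IDs and texts are made up.
toy_corpus = {
    "d1": {"title": "Title one", "text": "First passage."},
    "d2": {"title": "Title two", "text": "Second passage."},
}
toy_queries = {"q1": "an example claim"}
toy_qrels = {"q1": {"d2": 1}}  # query_id -> {doc_id: relevance grade}

def relevant_docs(qrels, qid):
    """Return the doc_ids judged relevant (grade > 0) for one query."""
    return [doc_id for doc_id, rel in qrels.get(qid, {}).items() if rel > 0]

print(relevant_docs(toy_qrels, "q1"))  # ['d2']
```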


In [6]:
df_corpus = (
    pd.DataFrame.from_dict(corpus, orient="index")
      .reset_index()
      .rename(columns={"index": "doc_id"})
)

df_corpus

| | doc_id | text | title |
|---|---|---|---|
| 0 | 4983 | Alterations of the architecture of cerebral wh... | Microstructural development of human newborn c... |
| 1 | 5836 | Myelodysplastic syndromes (MDS) are age-depend... | Induction of myelodysplasia by myeloid-derived... |
| 2 | 7912 | ID elements are short interspersed elements (S... | BC1 RNA, the transcript from a master gene for... |
| 3 | 18670 | DNA methylation plays an important role in bio... | The DNA Methylome of Human Peripheral Blood Mo... |
| 4 | 19238 | Two human Golli (for gene expressed in the oli... | The human myelin basic protein gene is include... |
| ... | ... | ... | ... |
| 5178 | 195689316 | BACKGROUND The main associations of body-mass ... | Body-mass index and cause-specific mortality i... |
| 5179 | 195689757 | A key aberrant biological difference between t... | Targeting metabolic remodeling in glioblastoma... |
| 5180 | 196664003 | A signaling pathway transmits information from... | Signaling architectures that transmit unidirec... |
| 5181 | 198133135 | AIMS Trabecular bone score (TBS) is a surrogat... | Association between pre-diabetes, type 2 diabe... |


In [7]:
df_queries = (
    pd.DataFrame.from_dict(queries, orient="index", columns=["query"])
      .reset_index()
      .rename(columns={"index": "query_id"})
)

df_queries

| | query_id | query |
|---|---|---|
| 0 | 1 | 0-dimensional biomaterials show inductive prop... |
| 1 | 3 | 1,000 genomes project enables mapping of genet... |
| 2 | 5 | 1/2000 in UK have abnormal PrP positivity. |
| 3 | 13 | 5% of perinatal mortality is due to low birth ... |
| 4 | 36 | A deficiency of vitamin B12 increases blood le... |
| ... | ... | ... |
| 295 | 1379 | Women with a higher birth weight are more like... |
| 296 | 1382 | aPKCz causes tumour enhancement by affecting g... |
| 297 | 1385 | cSMAC formation enhances weak ligand signalling. |
| 298 | 1389 | mTORC2 regulates intracellular cysteine levels... |


In [8]:
rows = []
for qid, docs in qrels.items():
    for doc_id, rel in docs.items():
        rows.append({
            "query_id": qid,
            "doc_id": doc_id,
            "relevance": rel
        })

df_qrels = pd.DataFrame(rows)
df_qrels

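| | query_id | doc_id | relevance |
|---|---|---|---|
| 0 | 1 | 31715818 | 1 |
| 1 | 3 | 14717500 | 1 |
| 2 | 5 | 13734012 | 1 |
| 3 | 13 | 1606628 | 1 |
| 4 | 36 | 5152028 | 1 |
| ... | ... | ... | ... |
| 334 | 1379 | 17450673 | 1 |
| 335 | 1382 | 17755060 | 1 |
| 336 | 1385 | 306006 | 1 |
| 337 | 1389 | 23895668 | 1 |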


In [9]:
# Pick a query that has several relevant documents
qid = "133"

print("Query:")
print(df_queries.loc[df_queries["query_id"] == qid, "query"].values[0])

print("\nRelevant documents for this query:")
df_qrels[(df_qrels["query_id"] == qid) & (df_qrels["relevance"] > 0)]

Query:
Assembly of invadopodia is triggered by focal generation of phosphatidylinositol-3,4-biphosphate and the activation of the nonreceptor tyrosine kinase Src.

Relevant documents for this query:

| | query_id | doc_id | relevance |
|---|---|---|---|
| 31 | 133 | 38485364 | 1 |
| 32 | 133 | 6969753 | 1 |
| 33 | 133 | 17934082 | 1 |
| 34 | 133 | 16280642 | 1 |
| 35 | 133 | 12640810 | 1 |


In [None]:
!pip install rank_bm25 sentence-transformers xgboost beir

## Part 2. Initial retrieval (baseline)

* Implement initial retrieval with BM25.
* Compute metrics: Recall@10 and nDCG@10.
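To make the two metrics concrete, here is a minimal sketch that computes them by hand for a single query. It assumes linear gain and a log2(rank + 1) discount for nDCG, which is one common definition; evaluation libraries may differ in details. The function names and the toy ranking are illustrative only.

```python
import math

def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant documents that appear in the top k."""
    if not relevant_ids:
        return 0.0
    found = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return found / len(relevant_ids)

def ndcg_at_k(ranked_ids, relevance, k):
    """nDCG@k with linear gain and a log2(rank + 1) discount."""
    dcg = sum(
        relevance.get(doc_id, 0) / math.log2(rank + 1)
        for rank, doc_id in enumerate(ranked_ids[:k], start=1)
    )
    # Ideal ordering: all judged documents sorted by grade, best first.
    ideal_gains = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(rank + 1) for rank, g in enumerate(ideal_gains, start=1))
    return dcg / idcg if idcg > 0 else 0.0

# Toy example: two relevant documents, only one of them retrieved (at rank 2).
ranking = ["a", "b", "c"]
grades = {"b": 1, "z": 1}
print(recall_at_k(ranking, set(grades), k=3))  # 0.5
print(ndcg_at_k(ranking, grades, k=3))
```

Recall only asks whether relevant documents made it into the top k, while nDCG also rewards placing them near the top, which is why re-ranking can improve nDCG even when the candidate set is unchanged.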

In [None]:
import numpy as np
from rank_bm25 import BM25Okapi
from beir.retrieval.evaluation import EvaluateRetrieval
import string

In [None]:
# 1. Simple preprocessing (tokenization)
# Convert the corpus into a list of tokenized texts. Only the `text` field is
# indexed (titles are not used), and the split is case-sensitive.
tokenized_corpus = [doc.split(" ") for doc in df_corpus['text'].tolist()]

# 2. Indexing with BM25
bm25 = BM25Okapi(tokenized_corpus)

# 3. Search function
def search_bm25(query, top_k=100):
    tokenized_query = query.split(" ")
    # Score every document in the corpus against the query
    doc_scores = bm25.get_scores(tokenized_query)
    # Indices of the top-k documents, highest score first
    top_n_indices = np.argsort(doc_scores)[::-1][:top_k]

    hits = {}
    for idx in top_n_indices:
        doc_id = df_corpus.iloc[idx]['doc_id']
        score = doc_scores[idx]
        hits[doc_id] = float(score)  # BEIR expects plain Python floats
    return hits
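The whitespace split used above is case-sensitive and keeps punctuation attached to tokens, so "levels." and "levels" count as different terms. A minimal sketch of a normalizing tokenizer follows. `normalize_tokens` is a hypothetical helper that is not part of this notebook, and swapping it in for `split(" ")` would change the BM25 figures reported here.

```python
import string

# Translation table that deletes ASCII punctuation characters.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)

def normalize_tokens(text):
    """Lowercase, strip ASCII punctuation, and split on any whitespace."""
    return text.lower().translate(_PUNCT_TABLE).split()

print(normalize_tokens("Vitamin B12 increases blood levels."))
# ['vitamin', 'b12', 'increases', 'blood', 'levels']
```

Stripping punctuation is a trade-off: it also fuses tokens such as "3,4-biphosphate" into a single string, which may or may not be desirable for scientific text. The same function would need to be applied to both documents and queries.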

In [11]:
# 4. Generate results for all queries (initial retrieval)
results_bm25 = {}
for index, row in df_queries.iterrows():
    qid = row['query_id']
    query_text = row['query']
    results_bm25[qid] = search_bm25(query_text, top_k=100)  # Retrieve 100 candidates to re-rank later

# 5. Baseline evaluation
print("--- BM25 evaluation (baseline) ---")
ndcg, _map, recall, _ = EvaluateRetrieval.evaluate(qrels, results_bm25, k_values=[10])
print(f"nDCG@10: {ndcg['NDCG@10']:.4f}")
print(f"Recall@10: {recall['Recall@10']:.4f}")

--- BM25 evaluation (baseline) ---
nDCG@10: 0.5056
Recall@10: 0.6247


## Part 3. Implementing _cross-encoder_ re-ranking

* Re-rank the top-k candidates for each query.
* Identify which documents change position in the top 10.
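The second stage can be sketched independently of any particular model: take the first-stage candidates, re-score each (query, document) pair, and sort by the new score. In the sketch below, `rerank` and `overlap_score` are hypothetical names, and the term-overlap scorer is only a stand-in for a real re-ranker such as a cross-encoder.

```python
def rerank(query, hits, doc_texts, score_fn, top_k=None):
    """Re-score first-stage candidates and return them sorted by new score.

    hits: {doc_id: first_stage_score}; only its keys are used here.
    score_fn: callable (query, doc_text) -> float, a stand-in for a re-ranker.
    """
    rescored = {doc_id: float(score_fn(query, doc_texts[doc_id])) for doc_id in hits}
    ordered = sorted(rescored.items(), key=lambda item: item[1], reverse=True)
    return dict(ordered[:top_k])  # top_k=None keeps every candidate

def overlap_score(query, doc_text):
    """Toy scorer: number of distinct query terms present in the document."""
    return len(set(query.lower().split()) & set(doc_text.lower().split()))

toy_texts = {"d1": "src kinase activation", "d2": "bone density", "d3": "src signalling"}
toy_hits = {"d2": 3.1, "d3": 2.0, "d1": 1.5}  # first-stage order: d2, d3, d1
print(rerank("src kinase", toy_hits, toy_texts, overlap_score))
# {'d1': 2.0, 'd3': 1.0, 'd2': 0.0}
```

Because the re-ranker only sees the candidates it is given, it can reorder documents but cannot recover a relevant document that the first stage missed.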

In [12]:
from sentence_transformers import CrossEncoder

# 1. Load the cross-encoder model (a lightweight one for this lab)
cross_encoder = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')

# 2. Re-ranking
results_cross_encoder = {}

# Iterate over the BM25 results
for qid, hits in results_bm25.items():
    query_text = queries[qid]

    # Build the [query, document] pairs.
    # hits is a dict {doc_id: score}; the document text comes from the corpus.
    pairs = []
    doc_ids = []

    for doc_id in hits:
        doc_text = corpus[doc_id]["text"]
        pairs.append([query_text, doc_text])
        doc_ids.append(doc_id)

    # Score the pairs with the model (re-ranking)
    if pairs:
        scores = cross_encoder.predict(pairs)

        # Store the new scores
        new_hits = {}
        for i, doc_id in enumerate(doc_ids):
            new_hits[doc_id] = float(scores[i])

        # Sort by descending score (optional for BEIR, but convenient for inspection)
        results_cross_encoder[qid] = dict(sorted(new_hits.items(), key=lambda item: item[1], reverse=True))

# 3. Position-change analysis (example query 133)
qid_test = "133"
print(f"--- Analysis for query {qid_test} ---")
top10_bm25 = list(results_bm25[qid_test].keys())[:10]
top10_ce = list(results_cross_encoder[qid_test].keys())[:10]

print("Top 10 BM25:", top10_bm25)
print("Top 10 Cross-Encoder:", top10_ce)

# Which documents entered the top 10 thanks to re-ranking
nuevos_docs = set(top10_ce) - set(top10_bm25)
print("Documents that entered the top 10 after re-ranking:", nuevos_docs)

--- Analysis for query 133 ---
Top 10 BM25: ['26688294', '37964706', '5270265', '12785130', '45764440', '9507605', '5821617', '23076291', '29073751', '4399311']
Top 10 Cross-Encoder: ['16280642', '12640810', '6969753', '21295300', '9507605', '17934082', '86694016', '19752008', '9063688', '1410197']
Documents that entered the top 10 after re-ranking: {'16280642', '21295300', '17934082', '12640810', '9063688', '86694016', '1410197', '19752008', '6969753'}
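The set difference above lists only the documents that entered the top 10. In this output, the single document present in both lists, '9507605', moves from position 6 to position 5. A small sketch that reports entries, exits, and position changes together is shown below; `rank_changes` is a hypothetical helper, demonstrated on toy rankings.

```python
def rank_changes(before, after, k=10):
    """Compare two rankings (lists of doc_ids, best first) at cutoff k.

    Returns (entered, dropped, moved): moved maps a doc_id present in both
    top-k lists to (old_rank, new_rank), with ranks starting at 1.
    """
    old = {doc_id: r for r, doc_id in enumerate(before[:k], start=1)}
    new = {doc_id: r for r, doc_id in enumerate(after[:k], start=1)}
    entered = [d for d in after[:k] if d not in old]
    dropped = [d for d in before[:k] if d not in new]
    moved = {d: (old[d], new[d]) for d in new if d in old and old[d] != new[d]}
    return entered, dropped, moved

# Toy rankings; in the notebook these would be top10_bm25 and top10_ce.
entered, dropped, moved = rank_changes(["a", "b", "c"], ["c", "x", "a"], k=3)
print(entered, dropped, moved)
# ['x'] ['b'] {'c': (3, 1), 'a': (1, 3)}
```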


## Part 4. Implementing _LTR_ re-ranking

* Re-rank the top-k candidates for each query.
* Identify which documents change position in the top 10.

In [13]:
import xgboost as xgb

# 1. Build the LTR dataset (features: BM25 score, document length)
X = []
y = []
groups = []  # XGBRanker needs the number of docs that belong to each query
ids_mapping = []  # (qid, doc_id) per row, used to rebuild the results

for qid in results_bm25:
    if qid in qrels:  # Only queries with known relevance can be used for training
        query_hits = results_bm25[qid]
        relevant_docs = qrels[qid]

        current_group_count = 0
        for doc_id, bm25_score in query_hits.items():
            # Feature 1: BM25 score
            # Feature 2: document length in whitespace-separated tokens
            doc_len = len(corpus[doc_id]["text"].split())

            X.append([bm25_score, doc_len])

            # Target: relevance grade from qrels (0 if the doc is not judged)
            label = relevant_docs.get(doc_id, 0)
            y.append(label)

            ids_mapping.append((qid, doc_id))
            current_group_count += 1

        groups.append(current_group_count)

X = np.array(X)
y = np.array(y)

# 2. Train the model (XGBRanker)
# LambdaMART-style ranking via objective="rank:ndcg"
model = xgb.XGBRanker(
    tree_method="hist",
    objective="rank:ndcg",
    eval_metric="ndcg@10"
)

# Caveat: a real setup would train on a separate train split. Here the model is
# fitted on the same test queries it later re-ranks (demo only), so the LTR
# scores it produces are optimistic.
model.fit(X, y, group=groups)

# 3. Predict / re-rank
scores_ltr = model.predict(X)

# 4. Rebuild the result format for evaluation
results_ltr = {}
for i, (qid, doc_id) in enumerate(ids_mapping):
    if qid not in results_ltr:
        results_ltr[qid] = {}
    results_ltr[qid][doc_id] = float(scores_ltr[i])

print("LTR model trained and predictions generated.")

# Quick comparison for the example query
top10_ltr = sorted(results_ltr["133"].items(), key=lambda x: x[1], reverse=True)[:10]
print("\nTop 5 docs for query 133 with LTR:", [x[0] for x in top10_ltr[:5]])

LTR model trained and predictions generated.

Top 5 docs for query 133 with LTR: ['16280642', '17829012', '21295300', '37964706', '30379039']
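Since the LTR model in this part is trained on the same queries it re-ranks, one way to get a fairer estimate is to split at the query level, so that every candidate of a given query falls entirely in train or entirely in test. The sketch below uses only the standard library; `split_query_ids` and `build_groups` are hypothetical helpers, and the toy rows are invented.

```python
import random

def split_query_ids(query_ids, test_fraction=0.2, seed=42):
    """Shuffle query IDs and split them into disjoint train and test lists."""
    ids = sorted(query_ids)  # sort first so the split is reproducible
    random.Random(seed).shuffle(ids)
    n_test = max(1, int(len(ids) * test_fraction))
    return ids[n_test:], ids[:n_test]

def build_groups(rows, query_ids):
    """Select feature rows for the given queries, keeping each query contiguous.

    rows: list of (qid, features, label) tuples.
    Returns (X, y, groups), where groups holds the row count per query.
    """
    X, y, groups = [], [], []
    for qid in query_ids:
        q_rows = [(feats, label) for q, feats, label in rows if q == qid]
        if not q_rows:
            continue
        X.extend(feats for feats, _ in q_rows)
        y.extend(label for _, label in q_rows)
        groups.append(len(q_rows))
    return X, y, groups

# Toy rows: (query_id, [bm25_score, doc_len], label)
toy_rows = [("q1", [9.0, 120], 1), ("q1", [4.0, 80], 0),
            ("q2", [7.5, 200], 0), ("q3", [6.1, 150], 1)]
train_ids, test_ids = split_query_ids({q for q, _, _ in toy_rows}, test_fraction=0.34)
X_train, y_train, g_train = build_groups(toy_rows, train_ids)
print(train_ids, test_ids, g_train)
```

Under this scheme the model would be fitted with `model.fit(X_train, y_train, group=g_train)` and metrics reported only for the held-out queries.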


## Part 5. Post-re-ranking evaluation

Compute the metrics:
* nDCG@10
* MAP@10
* Recall@10
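Of the three metrics, MAP is the least obvious to compute by hand. The sketch below shows one common definition: average precision truncated at rank k and normalized by the total number of relevant documents, then averaged over queries. This normalization is an assumption and may not match every evaluation library exactly; the function names and toy data are illustrative.

```python
def average_precision_at_k(ranked_ids, relevant_ids, k):
    """Average precision truncated at rank k, normalized by the number of relevant docs."""
    if not relevant_ids:
        return 0.0
    hits, precision_sum = 0, 0.0
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant_ids:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant hit
    return precision_sum / len(relevant_ids)

def mean_average_precision(rankings, judgments, k):
    """Mean of AP@k over the queries that have relevance judgments."""
    scores = [
        average_precision_at_k(
            rankings.get(qid, []), {d for d, r in rels.items() if r > 0}, k
        )
        for qid, rels in judgments.items()
    ]
    return sum(scores) / len(scores) if scores else 0.0

# Toy data: q1 has relevant docs at ranks 1 and 3, q2 has one at rank 2.
toy_rankings = {"q1": ["a", "b", "c"], "q2": ["x", "y"]}
toy_judgments = {"q1": {"a": 1, "c": 1}, "q2": {"y": 1}}
print(mean_average_precision(toy_rankings, toy_judgments, k=10))
```

For q1 the average precision is (1/1 + 2/3) / 2 and for q2 it is (1/2) / 1, so the mean over both queries is 2/3.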

In [14]:
# Methods to evaluate. Note: the LTR model was trained on these same test
# queries (demo only), so its row is not a fair comparison with the other two.
methods = {
    "Baseline (BM25)": results_bm25,
    "Re-ranking (Cross-Encoder)": results_cross_encoder,
    "Re-ranking (LTR - XGBoost)": results_ltr
}

metrics_data = []

print("--- Final results ---")
for name, results in methods.items():
    ndcg, _map, recall, _ = EvaluateRetrieval.evaluate(qrels, results, k_values=[10])
    metrics_data.append({
        "Method": name,
        "nDCG@10": ndcg['NDCG@10'],
        "MAP@10": _map['MAP@10'],
        "Recall@10": recall['Recall@10']
    })

# Show the comparison table
df_metrics = pd.DataFrame(metrics_data)
df_metrics

--- Final results ---

| | Method | nDCG@10 | MAP@10 | Recall@10 |
|---|---|---|---|---|
| 0 | Baseline (BM25) | 0.50562 | 0.46273 | 0.62472 |
| 1 | Re-ranking (Cross-Encoder) | 0.62616 | 0.5889 | 0.7235 |
| 2 | Re-ranking (LTR - XGBoost) | 0.74118 | 0.72927 | 0.75733 |
