# Exercise 10: Re-ranking

**Objective:** Implement and evaluate a two-stage Information Retrieval pipeline, and analyze the impact of re-ranking on ranking quality.

## Part 1. Preparing the corpus

* Load the corpus (documents/passages).
* Load the queries.
* Load the qrels (relevance judgments).
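
A minimal sketch of the three structures, assuming the dictionary shapes that the loader used in the cells below returns. All IDs and texts here are invented for illustration, and `check_qrels` is a hypothetical helper, not part of BEIR.

```python
# Toy stand-ins for the corpus, queries and qrels (all IDs and texts are invented).
corpus = {
    "d1": {"title": "Myelin development", "text": "White matter matures after birth."},
    "d2": {"title": "DNA methylation", "text": "Methylation regulates gene expression."},
}
queries = {"q1": "White matter matures postnatally."}
qrels = {"q1": {"d1": 1}}  # query_id -> {doc_id: relevance grade}


def check_qrels(corpus, queries, qrels):
    """Return the (query_id, doc_id) judgments that point at unknown IDs."""
    missing = []
    for qid, judged in qrels.items():
        for doc_id in judged:
            if qid not in queries or doc_id not in corpus:
                missing.append((qid, doc_id))
    return missing


print(check_qrels(corpus, queries, qrels))
```

An empty list means every judgment refers to a query and a document that were actually loaded, which is a cheap sanity check before running retrieval.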

In [2]:
!pip install beir

Collecting beir
  Downloading beir-2.2.0-py3-none-any.whl.metadata (28 kB)
Collecting pytrec-eval-terrier (from beir)
  Downloading pytrec_eval_terrier-0.5.10-cp312-cp312-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl.metadata (1.1 kB)
Downloading beir-2.2.0-py3-none-any.whl (77 kB)
Downloading pytrec_eval_terrier-0.5.10-cp312-cp312-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl (304 kB)
Installing collected packages: pytrec-eval-terrier, beir
Successfully installed beir-2.2.0 pytrec-eval-terrier-0.5.10


In [3]:
from beir import util
from beir.datasets.data_loader import GenericDataLoader
import pandas as pd



In [4]:
DATASET_NAME = "scifact"
DATA_DIR = "../data/beir_datasets"
url = f"https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/{DATASET_NAME}.zip"
util.download_and_unzip(url, DATA_DIR)


'../data/beir_datasets/scifact'

In [5]:
dataset_path = DATA_DIR + "/" + DATASET_NAME
corpus, queries, qrels = GenericDataLoader(dataset_path).load(split="test")


In [6]:
df_corpus = (
    pd.DataFrame.from_dict(corpus, orient="index")
      .reset_index()
      .rename(columns={"index": "doc_id"})
)

df_corpus

| | doc_id | text | title |
|---|---|---|---|
| 0 | 4983 | Alterations of the architecture of cerebral wh... | Microstructural development of human newborn c... |
| 1 | 5836 | Myelodysplastic syndromes (MDS) are age-depend... | Induction of myelodysplasia by myeloid-derived... |
| 2 | 7912 | ID elements are short interspersed elements (S... | BC1 RNA, the transcript from a master gene for... |
| 3 | 18670 | DNA methylation plays an important role in bio... | The DNA Methylome of Human Peripheral Blood Mo... |
| 4 | 19238 | Two human Golli (for gene expressed in the oli... | The human myelin basic protein gene is include... |
| ... | ... | ... | ... |
| 5178 | 195689316 | BACKGROUND The main associations of body-mass ... | Body-mass index and cause-specific mortality i... |
| 5179 | 195689757 | A key aberrant biological difference between t... | Targeting metabolic remodeling in glioblastoma... |
| 5180 | 196664003 | A signaling pathway transmits information from... | Signaling architectures that transmit unidirec... |
| 5181 | 198133135 | AIMS Trabecular bone score (TBS) is a surrogat... | Association between pre-diabetes, type 2 diabe... |


In [7]:
df_queries = (
    pd.DataFrame.from_dict(queries, orient="index", columns=["query"])
      .reset_index()
      .rename(columns={"index": "query_id"})
)

df_queries

| | query_id | query |
|---|---|---|
| 0 | 1 | 0-dimensional biomaterials show inductive prop... |
| 1 | 3 | 1,000 genomes project enables mapping of genet... |
| 2 | 5 | 1/2000 in UK have abnormal PrP positivity. |
| 3 | 13 | 5% of perinatal mortality is due to low birth ... |
| 4 | 36 | A deficiency of vitamin B12 increases blood le... |
| ... | ... | ... |
| 295 | 1379 | Women with a higher birth weight are more like... |
| 296 | 1382 | aPKCz causes tumour enhancement by affecting g... |
| 297 | 1385 | cSMAC formation enhances weak ligand signalling. |
| 298 | 1389 | mTORC2 regulates intracellular cysteine levels... |


In [8]:
rows = []
for qid, docs in qrels.items():
    for doc_id, rel in docs.items():
        rows.append({
            "query_id": qid,
            "doc_id": doc_id,
            "relevance": rel
        })

df_qrels = pd.DataFrame(rows)
df_qrels

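| | query_id | doc_id | relevance |
|---|---|---|---|
| 0 | 1 | 31715818 | 1 |
| 1 | 3 | 14717500 | 1 |
| 2 | 5 | 13734012 | 1 |
| 3 | 13 | 1606628 | 1 |
| 4 | 36 | 5152028 | 1 |
| ... | ... | ... | ... |
| 334 | 1379 | 17450673 | 1 |
| 335 | 1382 | 17755060 | 1 |
| 336 | 1385 | 306006 | 1 |
| 337 | 1389 | 23895668 | 1 |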


In [9]:
# Count qrels entries per query (sorted in descending order) and show the queries with more than one
repeticiones = df_qrels['query_id'].value_counts()
print(repeticiones[repeticiones > 1])

query_id
873     5
133     5
1379    4
971     4
179     4
275     3
1274    3
1029    3
597     3
70      2
261     2
619     2
967     2
1132    2
452     2
540     2
641     2
1041    2
385     2
141     2
728     2
36      2
343     2
Name: count, dtype: int64


In [10]:
# Pick a query that has several relevant documents
qid = "873"

print("Query:")
print(df_queries.loc[df_queries["query_id"] == qid, "query"].values[0])

print("\nRelevant documents for this query:")
df_qrels[(df_qrels["query_id"] == qid) & (df_qrels["relevance"] > 0)]

Query:
Obesity is determined solely by environmental factors.

Relevant documents for this query:


| | query_id | doc_id | relevance |
|---|---|---|---|
| 213 | 873 | 1180972 | 1 |
| 214 | 873 | 19307912 | 1 |
| 215 | 873 | 27393799 | 1 |
| 216 | 873 | 29025270 | 1 |
| 217 | 873 | 3315558 | 1 |


## Part 2. Initial retrieval (baseline)

* Implement the initial retrieval with BM25.
* Compute the metrics Recall@10 and nDCG@10.
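
To make the first stage concrete, here is a minimal pure-Python sketch of BM25 scoring. It assumes a textbook formulation with a smoothed, non-negative IDF; the `rank_bm25` library used in the cells below may differ in its IDF details, and `bm25_scores` is a hypothetical helper written only for illustration.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, tokenized_docs, k1=1.5, b=0.75):
    """Score every document against the query with a textbook BM25 formula."""
    n_docs = len(tokenized_docs)
    avg_len = sum(len(doc) for doc in tokenized_docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in tokenized_docs for term in set(doc))
    scores = []
    for doc in tokenized_docs:
        tf = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            # Smoothed IDF that stays non-negative (an assumption of this sketch).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Invented toy documents, tokenized by lowercase whitespace splitting.
docs = [
    "obesity is linked to genetic factors".split(),
    "dna methylation in blood cells".split(),
    "environmental factors and obesity risk".split(),
]
scores = bm25_scores("obesity genetic".split(), docs)
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking)
```

The document matching both query terms ranks first, the one matching only the more common term second, and the document with no overlap scores zero.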

In [11]:
!pip install elasticsearch beir

Collecting elasticsearch
  Downloading elasticsearch-9.2.1-py3-none-any.whl.metadata (8.9 kB)
Collecting elastic-transport<10,>=9.2.0 (from elasticsearch)
  Downloading elastic_transport-9.2.1-py3-none-any.whl.metadata (3.9 kB)
Downloading elasticsearch-9.2.1-py3-none-any.whl (963 kB)
Downloading elastic_transport-9.2.1-py3-none-any.whl (65 kB)
Installing collected packages: elastic-transport, elasticsearch
Successfully installed elastic-transport-9.2.1 elasticsearch-9.2.1


In [12]:
!pip install rank-bm25

Collecting rank-bm25
  Downloading rank_bm25-0.2.2-py3-none-any.whl.metadata (3.2 kB)
Downloading rank_bm25-0.2.2-py3-none-any.whl (8.6 kB)
Installing collected packages: rank-bm25
Successfully installed rank-bm25-0.2.2


In [13]:
import numpy as np
from rank_bm25 import BM25Okapi
from beir.retrieval.evaluation import EvaluateRetrieval

# 1. Prepare the corpus for BM25 (basic whitespace tokenization)
# Take the IDs from the same dict iterated below so IDs and token lists stay aligned
corpus_ids = list(corpus.keys())
# Combine title and text for better lexical matching
tokenized_corpus = [
    (doc["title"] + " " + doc["text"]).lower().split()
    for doc in corpus.values()
]

# 2. Initialize the rank-bm25 model
bm25_model = BM25Okapi(tokenized_corpus)

# 3. Manual search loop that produces BEIR-compatible results
results = {}
top_k = 100  # Retrieve 100 candidates to leave room for the later re-ranking

print("Starting initial retrieval with BM25...")
for qid, query_text in queries.items():
    query_tokens = query_text.lower().split()
    scores = bm25_model.get_scores(query_tokens)

    # Indices of the top-k documents, best first
    top_n_indices = np.argsort(scores)[::-1][:top_k]

    # Store the results in the format BEIR expects: {qid: {doc_id: score}}
    results[qid] = {
        corpus_ids[idx]: float(scores[idx])
        for idx in top_n_indices
    }

# 4. Metric evaluation with BEIR
# EvaluateRetrieval is created without a retriever (None) because only its evaluate method is needed
retriever = EvaluateRetrieval(None, score_function="dot")
ndcg, _map, recall, precision = retriever.evaluate(qrels, results, [1, 5, 10])

# 5. Show the results
print("\n--- BASELINE METRICS (BM25) ---")
print(f"nDCG@10: {ndcg['NDCG@10']:.4f}")
print(f"Recall@10: {recall['Recall@10']:.4f}")

Starting initial retrieval with BM25...

--- BASELINE METRICS (BM25) ---
nDCG@10: 0.5597
Recall@10: 0.6862
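
For reference, the two reported metrics can be sketched for a single query in pure Python. This assumes one common formulation (recall over all relevant documents, nDCG with linear gains and a log2 discount); the evaluator used above may differ in details, and both function names are hypothetical.

```python
import math


def recall_at_k(ranked_ids, relevant, k):
    """Fraction of the relevant documents that appear in the top k."""
    n_rel = sum(1 for grade in relevant.values() if grade > 0)
    if n_rel == 0:
        return 0.0
    hits = sum(1 for doc_id in ranked_ids[:k] if relevant.get(doc_id, 0) > 0)
    return hits / n_rel


def ndcg_at_k(ranked_ids, relevant, k):
    """DCG of the ranking divided by the DCG of an ideal ranking."""
    dcg = sum(
        relevant.get(doc_id, 0) / math.log2(rank + 2)
        for rank, doc_id in enumerate(ranked_ids[:k])
    )
    ideal = sorted(relevant.values(), reverse=True)[:k]
    idcg = sum(grade / math.log2(rank + 2) for rank, grade in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


# Invented example: two relevant documents, found at ranks 2 and 4.
relevant = {"a": 1, "b": 1}
ranked = ["x", "a", "y", "b"]
print(recall_at_k(ranked, relevant, 2), recall_at_k(ranked, relevant, 4))
print(round(ndcg_at_k(ranked, relevant, 4), 4))
```

Recall only checks whether relevant documents made it into the top k, while nDCG also rewards placing them early, which is why re-ranking a fixed candidate list can raise nDCG@10 more than Recall@10.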


## Part 3. Implementing _cross-encoder_ re-ranking

* Re-rank the top-k candidates for each query.
* Identify which documents change position in the top 10.
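
The two-stage pattern can be sketched independently of any model: take the first-stage candidates, re-score each (query, document) pair, and sort by the new score. In this sketch `overlap_score` is a hypothetical stand-in for a cross-encoder (it only measures token overlap), and `rerank` and the toy data are invented for illustration.

```python
def rerank(query, candidates, corpus, score_fn, top_k=100):
    """Re-score the top_k first-stage candidates and return {doc_id: new_score}."""
    # candidates maps doc_id -> first-stage score; keep only the best top_k.
    shortlist = sorted(candidates, key=candidates.get, reverse=True)[:top_k]
    return {doc_id: score_fn(query, corpus[doc_id]) for doc_id in shortlist}


def overlap_score(query, doc_text):
    """Hypothetical stand-in for a cross-encoder: Jaccard overlap of tokens."""
    q = set(query.lower().split())
    d = set(doc_text.lower().split())
    union = q | d
    return len(q & d) / len(union) if union else 0.0


# Invented toy corpus and first-stage scores.
corpus = {
    "d1": "obesity and genetics",
    "d2": "obesity is determined by environmental factors and diet",
    "d3": "protein folding",
}
first_stage = {"d1": 3.2, "d2": 2.9, "d3": 0.1}

reranked = rerank("obesity environmental factors", first_stage, corpus, overlap_score, top_k=2)
order = sorted(reranked, key=reranked.get, reverse=True)
print(order)
```

Documents outside the shortlist are never re-scored, so the second stage can reorder candidates but cannot recover a relevant document the first stage missed.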

### Installing the required library

In [14]:
!pip install sentence-transformers



### Cross-encoder re-ranking implementation

In [15]:
from sentence_transformers import CrossEncoder
import numpy as np

# 1. Load the cross-encoder model
model_ce = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')

# 2. Configure the re-ranking
rerank_results = {}
k_para_rerank = 100  # Re-rank the candidates obtained in Part 2

print("Starting re-ranking with Cross-Encoder...")

for qid in results:
    query_text = queries[qid]
    # Candidate document IDs from Part 2 (results[qid] was built in descending BM25 order)
    candidate_doc_ids = list(results[qid].keys())[:k_para_rerank]

    # Build the (query, doc_text) pairs for the model,
    # combining the document title and text
    pairs = []
    for doc_id in candidate_doc_ids:
        doc_data = corpus[doc_id]
        doc_text = doc_data.get("title", "") + " " + doc_data.get("text", "")
        pairs.append([query_text, doc_text])

    # 3. Compute scores with the cross-encoder
    ce_scores = model_ce.predict(pairs)

    # 4. Build the new dictionary of re-ranked results
    rerank_results[qid] = {
        doc_id: float(score)
        for doc_id, score in zip(candidate_doc_ids, ce_scores)
    }

print("Re-ranking completed.")

Starting re-ranking with Cross-Encoder...
Re-ranking completed.


In [16]:
def get_top_n(results_dict, qid, n=10):
    # Sort the results by score in descending order
    sorted_docs = sorted(results_dict[qid].items(), key=lambda item: item[1], reverse=True)
    return [doc[0] for doc in sorted_docs[:n]]

# Use query 873 for the analysis
target_qid = "873"

top10_bm25 = get_top_n(results, target_qid, 10)
top10_ce = get_top_n(rerank_results, target_qid, 10)

print(f"--- Top 10 comparison for Query ID: {target_qid} ---")
print(f"{'Position':<10} | {'BM25 (Original)':<20} | {'Cross-Encoder (Re-ranked)':<20}")
print("-" * 60)

for i in range(10):
    bm25_doc = top10_bm25[i]
    ce_doc = top10_ce[i]
    status = "Same" if bm25_doc == ce_doc else "CHANGED"
    print(f"{i+1:<10} | {bm25_doc:<20} | {ce_doc:<20} | {status}")

# Documents that were not in the BM25 top 10 but are in the re-ranked top 10
nuevos = set(top10_ce) - set(top10_bm25)
print(f"\nDocuments that entered the top 10 thanks to re-ranking: {nuevos}")

--- Top 10 comparison for Query ID: 873 ---
Position   | BM25 (Original)      | Cross-Encoder (Re-ranked)
------------------------------------------------------------
1          | 13441037             | 41310252             | CHANGED
2          | 12658073             | 12658073             | Same
3          | 38794814             | 4345757              | CHANGED
4          | 25973484             | 52805891             | CHANGED
5          | 21874312             | 21641088             | CHANGED
6          | 4345757              | 5268462              | CHANGED
7          | 52805891             | 29947146             | CHANGED
8          | 10365749             | 13944805             | CHANGED
9          | 41310252             | 25973484             | CHANGED
10         | 14367469             | 27686445             | CHANGED

Documents that entered the top 10 thanks to re-ranking: {'27686445', '21641088', '29947146', '5268462', '13944805'}
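
Beyond a position-by-position comparison, it can help to track how far each document moved. The sketch below uses a hypothetical helper, `rank_changes`, on invented rankings; it is not part of the notebook's pipeline.

```python
def rank_changes(before, after):
    """Map each doc_id in `after` to (old_rank, new_rank); old_rank is None if it is new."""
    old_pos = {doc_id: i + 1 for i, doc_id in enumerate(before)}
    return {doc_id: (old_pos.get(doc_id), i + 1) for i, doc_id in enumerate(after)}


# Invented top-3 lists before and after re-ranking.
before = ["a", "b", "c"]
after = ["c", "b", "d"]

changes = rank_changes(before, after)
entered = [d for d, (old, _) in changes.items() if old is None]
dropped = [d for d in before if d not in changes]
print(changes)
print(entered, dropped)
```

Applied to two top-10 lists, this separates documents that were merely reordered from those that entered or left the top 10.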


## Part 4. Implementing _LTR_ re-ranking

* Re-rank the top-k candidates for each query.
* Identify which documents change position in the top 10.

### LTR re-ranking implementation

In [17]:
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
import numpy as np

# 1. Data preparation
# Build a dataset where each row is a (query, document) pair
# with two features: the BM25 score and the cross-encoder score.

features_list = []
query_ids = list(rerank_results.keys())

print("Extracting features for LTR...")

for qid in query_ids:
    for doc_id in rerank_results[qid]:
        # Feature 1: raw (unnormalized) BM25 score
        bm25_score = results[qid].get(doc_id, 0)
        # Feature 2: cross-encoder score
        ce_score = rerank_results[qid].get(doc_id, 0)

        # Label (target): relevance from qrels (0 if the pair has no judgment)
        label = qrels.get(qid, {}).get(doc_id, 0)

        features_list.append({
            "query_id": qid,
            "doc_id": doc_id,
            "bm25_score": bm25_score,
            "ce_score": ce_score,
            "label": label
        })

df_ltr = pd.DataFrame(features_list)

# 2. Training the LTR model
# Train on the first 70% of the queries
unique_queries = df_ltr["query_id"].unique()
train_queries = unique_queries[:int(len(unique_queries)*0.7)]

train_data = df_ltr[df_ltr["query_id"].isin(train_queries)]
X_train = train_data[["bm25_score", "ce_score"]]
y_train = train_data["label"]

# Train a regressor whose prediction is used as the relevance score
model_ltr = RandomForestRegressor(n_estimators=100, random_state=42)
model_ltr.fit(X_train, y_train)

# 3. Generating the LTR results
# Note: scores are predicted for all queries, including the 70% used for training,
# so any metric computed on ltr_results is optimistic for those queries.
ltr_results = {}
X_all = df_ltr[["bm25_score", "ce_score"]]
df_ltr["ltr_score"] = model_ltr.predict(X_all)

for qid in unique_queries:
    query_group = df_ltr[df_ltr["query_id"] == qid]
    ltr_results[qid] = dict(zip(query_group["doc_id"], query_group["ltr_score"]))

print("LTR re-ranking completed.")

Extracting features for LTR...
LTR re-ranking completed.
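
The cell above fits the model on 70% of the queries but then scores all of them, so the training queries also enter any later evaluation. A minimal sketch of a held-out protocol follows, assuming a random split by query ID; `split_queries` and `restrict` are hypothetical helpers and the toy rankings are invented.

```python
import random


def split_queries(query_ids, train_fraction=0.7, seed=42):
    """Shuffle query IDs and split them into disjoint train and test lists."""
    ids = sorted(query_ids)
    random.Random(seed).shuffle(ids)
    cut = int(len(ids) * train_fraction)
    return ids[:cut], ids[cut:]


def restrict(results, query_ids):
    """Keep only the rankings that belong to the given queries."""
    keep = set(query_ids)
    return {qid: docs for qid, docs in results.items() if qid in keep}


# Invented rankings for ten queries.
toy_results = {str(q): {"d1": 0.5} for q in range(10)}

train_q, test_q = split_queries(toy_results.keys())
held_out = restrict(toy_results, test_q)
print(len(train_q), len(held_out))
```

Evaluating `restrict(ltr_results, test_q)` against the matching subset of `qrels`, and the BM25 and cross-encoder rankings on that same subset, would compare the three systems on unseen queries only. The numbers reported in this notebook were not computed that way.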


### Identifying changes in the top 10

In [18]:
# Analyze a specific query
target_qid = "873"

def get_top_ids(res_dict, qid, n=10):
    return [item[0] for item in sorted(res_dict[qid].items(), key=lambda x: x[1], reverse=True)[:n]]

top10_bm25 = get_top_ids(results, target_qid)
top10_ce = get_top_ids(rerank_results, target_qid)
top10_ltr = get_top_ids(ltr_results, target_qid)

print(f"--- Ranking comparison (Top 10) - Query {target_qid} ---")
print(f"{'Pos':<4} | {'BM25':<12} | {'Cross-Encoder':<15} | {'LTR (Final)':<12}")
print("-" * 55)
for i in range(10):
    print(f"{i+1:<4} | {top10_bm25[i]:<12} | {top10_ce[i]:<15} | {top10_ltr[i]:<12}")

# Documents in the LTR top 10 that were not in the cross-encoder top 10
cambios = set(top10_ltr) - set(top10_ce)
print(f"\nNew documents in the top 10 after LTR: {cambios if cambios else 'None'}")

--- Ranking comparison (Top 10) - Query 873 ---
Pos  | BM25         | Cross-Encoder   | LTR (Final) 
-------------------------------------------------------
1    | 13441037     | 41310252        | 12658073    
2    | 12658073     | 12658073        | 13441037    
3    | 38794814     | 4345757         | 24625323    
4    | 25973484     | 52805891        | 21641088    
5    | 21874312     | 21641088        | 38794814    
6    | 4345757      | 5268462         | 25973484    
7    | 52805891     | 29947146        | 21874312    
8    | 10365749     | 13944805        | 4345757     
9    | 41310252     | 25973484        | 52805891    
10   | 14367469     | 27686445        | 10365749    

New documents in the top 10 after LTR: {'21874312', '24625323', '13441037', '10365749', '38794814'}


## Part 5. Evaluation after re-ranking

Compute the metrics:
* nDCG@10
* MAP
* Recall@10
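
Of the three metrics, MAP is the only one not sketched so far. A minimal pure-Python version follows, assuming one common convention (precision accumulated at each relevant hit within the top k, divided by the total number of relevant documents); the evaluator used in the next cell may follow a different convention, and both function names are hypothetical.

```python
def average_precision_at_k(ranked_ids, relevant, k):
    """Average of precision values at the ranks where relevant documents appear."""
    n_rel = sum(1 for grade in relevant.values() if grade > 0)
    if n_rel == 0:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if relevant.get(doc_id, 0) > 0:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / n_rel


def mean_average_precision(results, qrels, k):
    """Mean of AP@k over all queries that have relevance judgments."""
    aps = []
    for qid, relevant in qrels.items():
        scored = results.get(qid, {})
        ranked = sorted(scored, key=scored.get, reverse=True)
        aps.append(average_precision_at_k(ranked, relevant, k))
    return sum(aps) / len(aps) if aps else 0.0


# Invented judgments and rankings in the {qid: {doc_id: value}} shape.
toy_qrels = {"q1": {"a": 1, "b": 1}, "q2": {"c": 1}}
toy_results = {"q1": {"x": 0.9, "a": 0.8, "b": 0.7}, "q2": {"c": 0.4, "y": 0.2}}
map_at_10 = mean_average_precision(toy_results, toy_qrels, 10)
print(round(map_at_10, 4))
```

Because each relevant hit contributes the precision at its rank, moving a relevant document from rank 3 to rank 1 raises MAP even when Recall@10 stays the same.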

In [19]:
# 1. Evaluate the cross-encoder results (Part 3)
retriever_ce = EvaluateRetrieval(None, score_function="dot")
ndcg_ce, _map_ce, recall_ce, _ = retriever_ce.evaluate(qrels, rerank_results, [1, 5, 10])

# 2. Evaluate the LTR results (Part 4)
# These scores include the queries the LTR model was trained on
retriever_ltr = EvaluateRetrieval(None, score_function="dot")
ndcg_ltr, _map_ltr, recall_ltr, _ = retriever_ltr.evaluate(qrels, ltr_results, [1, 5, 10])

# 3. Final comparison table
print(f"{'Metric':<15} | {'BM25 (Base)':<12} | {'Cross-Encoder':<15} | {'LTR (Final)':<12}")
print("-" * 65)

# nDCG@10
print(f"{'nDCG@10':<15} | {ndcg['NDCG@10']:<12.4f} | {ndcg_ce['NDCG@10']:<15.4f} | {ndcg_ltr['NDCG@10']:<12.4f}")

# MAP@10
print(f"{'MAP@10':<15} | {_map['MAP@10']:<12.4f} | {_map_ce['MAP@10']:<15.4f} | {_map_ltr['MAP@10']:<12.4f}")

# Recall@10
print(f"{'Recall@10':<15} | {recall['Recall@10']:<12.4f} | {recall_ce['Recall@10']:<15.4f} | {recall_ltr['Recall@10']:<12.4f}")

Metric          | BM25 (Base)  | Cross-Encoder   | LTR (Final) 
-----------------------------------------------------------------
nDCG@10         | 0.5597       | 0.6509          | 0.7523      
MAP@10          | 0.5147       | 0.6134          | 0.7394      
Recall@10       | 0.6862       | 0.7496          | 0.7796      
