# Exercise 10: Re-ranking

**Objective:** Implement and evaluate a two-stage Information Retrieval pipeline, and analyze the impact of re-ranking on ranking quality.
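A two-stage pipeline first scores the whole corpus with a cheap function and keeps only the top candidates, then re-scores that short list with a more expensive model. The sketch below shows only the control flow, using the standard library. `two_stage_search`, `overlap` and `overlap_density` are hypothetical toy stand-ins for BM25 and a cross-encoder; they are not part of the notebook.

```python
from typing import Callable, Dict, List, Tuple


def two_stage_search(
    query: str,
    corpus: Dict[str, str],
    first_stage: Callable[[str, str], float],
    second_stage: Callable[[str, str], float],
    k_candidates: int = 50,
    k_final: int = 10,
) -> List[Tuple[str, float]]:
    """Keep the k_candidates best docs under a cheap scorer, then re-score them with an expensive one."""
    # Stage 1: score every document with the cheap function and keep the top k_candidates
    candidates = sorted(corpus, key=lambda d: first_stage(query, corpus[d]), reverse=True)[:k_candidates]
    # Stage 2: re-score only the candidates and return the best k_final
    rescored = [(d, second_stage(query, corpus[d])) for d in candidates]
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored[:k_final]


def overlap(q: str, d: str) -> float:
    # Toy "cheap" scorer: number of shared distinct words
    return float(len(set(q.lower().split()) & set(d.lower().split())))


def overlap_density(q: str, d: str) -> float:
    # Toy "expensive" scorer: shared words normalized by document length
    return overlap(q, d) / len(d.split())


docs = {
    "d1": "a cross encoder model is a ranking model that reads the query and the document together for ranking",
    "d2": "cross encoder ranking",
    "d3": "cooking pasta at home",
}
result = two_stage_search("cross encoder ranking model", docs, overlap, overlap_density, k_candidates=2, k_final=2)
print(result)
```

Stage 1 ranks `d1` first (four shared words), while stage 2 promotes `d2`. `k_candidates` also bounds what stage 2 can achieve: a document dropped in stage 1 can never reach the final list.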

## Part 1. Corpus preparation

* Load the corpus (documents/passages).
* Load the queries.
* Load the qrels (relevance judgments).
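For orientation, BEIR's `GenericDataLoader(...).load(...)` returns three dictionaries: the corpus maps `doc_id -> {"title", "text"}`, the queries map `query_id -> text`, and the qrels map `query_id -> {doc_id: relevance}`. The toy values below are made up and only mimic those shapes. The comprehension flattens the qrels into one row per judged pair, the same way the loop in cell 7 does.

```python
from typing import Dict, List

# Toy data with the same shapes as BEIR's loader output (values are made up)
corpus: Dict[str, Dict[str, str]] = {
    "d1": {"title": "Invadopodia assembly", "text": "Src activation precedes invadopodia formation."},
    "d2": {"title": "Myelin genes", "text": "Golli transcripts overlap the myelin basic protein gene."},
    "d3": {"title": "Src signalling", "text": "Src kinase phosphorylates cortactin."},
}
queries: Dict[str, str] = {
    "q1": "Src triggers invadopodia assembly.",
    "q2": "Golli genes encode myelin proteins.",
}
qrels: Dict[str, Dict[str, int]] = {"q1": {"d1": 1, "d3": 1}, "q2": {"d2": 1}}

# Flatten qrels into one row per judged (query, document) pair
rows: List[Dict[str, object]] = [
    {"query_id": qid, "doc_id": doc_id, "relevance": rel}
    for qid, docs in qrels.items()
    for doc_id, rel in docs.items()
]

# Number of relevant documents per query (the denominator of Recall@k)
n_relevant = {qid: sum(1 for rel in docs.values() if rel > 0) for qid, docs in qrels.items()}
print(rows)
print(n_relevant)
```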

In [1]:
!pip install beir

Collecting beir
  Downloading beir-2.2.0-py3-none-any.whl.metadata (28 kB)
Collecting pytrec-eval-terrier (from beir)
  Downloading pytrec_eval_terrier-0.5.10-cp312-cp312-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl.metadata (1.1 kB)
Downloading beir-2.2.0-py3-none-any.whl (77 kB)
Downloading pytrec_eval_terrier-0.5.10-cp312-cp312-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl (304 kB)
Installing collected packages: pytrec-eval-terrier, beir
Successfully installed beir-2.2.0 pytrec-eval-terrier-0.5.10


In [2]:
from beir import util
from beir.datasets.data_loader import GenericDataLoader
import pandas as pd



In [3]:
DATASET_NAME = "scifact"
DATA_DIR = "../data/beir_datasets"
url = f"https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/{DATASET_NAME}.zip"
util.download_and_unzip(url, DATA_DIR)


'../data/beir_datasets/scifact'

In [4]:
dataset_path = DATA_DIR + "/" + DATASET_NAME
corpus, queries, qrels = GenericDataLoader(dataset_path).load(split="test")


In [5]:
df_corpus = (
    pd.DataFrame.from_dict(corpus, orient="index")
      .reset_index()
      .rename(columns={"index": "doc_id"})
)

df_corpus

|      | doc_id    | text | title |
|------|-----------|------|-------|
| 0    | 4983      | Alterations of the architecture of cerebral wh... | Microstructural development of human newborn c... |
| 1    | 5836      | Myelodysplastic syndromes (MDS) are age-depend... | Induction of myelodysplasia by myeloid-derived... |
| 2    | 7912      | ID elements are short interspersed elements (S... | BC1 RNA, the transcript from a master gene for... |
| 3    | 18670     | DNA methylation plays an important role in bio... | The DNA Methylome of Human Peripheral Blood Mo... |
| 4    | 19238     | Two human Golli (for gene expressed in the oli... | The human myelin basic protein gene is include... |
| ...  | ...       | ... | ... |
| 5178 | 195689316 | BACKGROUND The main associations of body-mass ... | Body-mass index and cause-specific mortality i... |
| 5179 | 195689757 | A key aberrant biological difference between t... | Targeting metabolic remodeling in glioblastoma... |
| 5180 | 196664003 | A signaling pathway transmits information from... | Signaling architectures that transmit unidirec... |
| 5181 | 198133135 | AIMS Trabecular bone score (TBS) is a surrogat... | Association between pre-diabetes, type 2 diabe... |


In [6]:
df_queries = (
    pd.DataFrame.from_dict(queries, orient="index", columns=["query"])
      .reset_index()
      .rename(columns={"index": "query_id"})
)

df_queries

|     | query_id | query |
|-----|----------|-------|
| 0   | 1        | 0-dimensional biomaterials show inductive prop... |
| 1   | 3        | 1,000 genomes project enables mapping of genet... |
| 2   | 5        | 1/2000 in UK have abnormal PrP positivity. |
| 3   | 13       | 5% of perinatal mortality is due to low birth ... |
| 4   | 36       | A deficiency of vitamin B12 increases blood le... |
| ... | ...      | ... |
| 295 | 1379     | Women with a higher birth weight are more like... |
| 296 | 1382     | aPKCz causes tumour enhancement by affecting g... |
| 297 | 1385     | cSMAC formation enhances weak ligand signalling. |
| 298 | 1389     | mTORC2 regulates intracellular cysteine levels... |


In [7]:
rows = []
for qid, docs in qrels.items():
    for doc_id, rel in docs.items():
        rows.append({
            "query_id": qid,
            "doc_id": doc_id,
            "relevance": rel
        })

df_qrels = pd.DataFrame(rows)
df_qrels

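|     | query_id | doc_id   | relevance |
|-----|----------|----------|-----------|
| 0   | 1        | 31715818 | 1 |
| 1   | 3        | 14717500 | 1 |
| 2   | 5        | 13734012 | 1 |
| 3   | 13       | 1606628  | 1 |
| 4   | 36       | 5152028  | 1 |
| ... | ...      | ...      | ... |
| 334 | 1379     | 17450673 | 1 |
| 335 | 1382     | 17755060 | 1 |
| 336 | 1385     | 306006   | 1 |
| 337 | 1389     | 23895668 | 1 |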


In [8]:
# Pick any query that has several relevant documents
qid = "133"

print("Query:")
print(df_queries.loc[df_queries["query_id"] == qid, "query"].values[0])

print("\nRelevant documents for this query:")
df_qrels[(df_qrels["query_id"] == qid) & (df_qrels["relevance"] > 0)]

Query:
Assembly of invadopodia is triggered by focal generation of phosphatidylinositol-3,4-biphosphate and the activation of the nonreceptor tyrosine kinase Src.

Relevant documents for this query:


|    | query_id | doc_id   | relevance |
|----|----------|----------|-----------|
| 31 | 133      | 38485364 | 1 |
| 32 | 133      | 6969753  | 1 |
| 33 | 133      | 17934082 | 1 |
| 34 | 133      | 16280642 | 1 |
| 35 | 133      | 12640810 | 1 |


## Part 2. Initial retrieval (baseline)

* Implement the initial retrieval with BM25.
* Compute the metrics Recall@10 and nDCG@10.
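To make the baseline less of a black box, here is a from-scratch Okapi BM25 scorer using only the standard library. It is a sketch, not `rank_bm25`'s code. It uses the same defaults as `BM25Okapi` (k1 = 1.5, b = 0.75) but the common non-negative idf variant log(1 + (N - n + 0.5) / (n + 0.5)). `BM25Okapi` instead uses log((N - n + 0.5) / (n + 0.5)) and replaces negative values with a small floor derived from the average idf, so absolute scores will differ. `bm25_scores` and the toy documents are hypothetical.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: List[str], docs: List[List[str]], k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Okapi BM25 score of every tokenized document for a tokenized query."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation (k1) and length normalization (b)
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            s += idf * tf[term] * (k1 + 1) / norm
        scores.append(s)
    return scores


docs = [["src", "kinas", "invadopodia"], ["myelin", "gene"], ["src", "gene", "express", "gene"]]
scores = bm25_scores(["src", "invadopodia"], docs)
print(scores)
```

The first document matches both terms, including the rarer `invadopodia`, so it ranks first. The third matches only `src` and is slightly penalized for being longer than average.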

In [9]:
!pip install rank_bm25 nltk

Collecting rank_bm25
  Downloading rank_bm25-0.2.2-py3-none-any.whl.metadata (3.2 kB)
Downloading rank_bm25-0.2.2-py3-none-any.whl (8.6 kB)
Installing collected packages: rank_bm25
Successfully installed rank_bm25-0.2.2


In [10]:
import pandas as pd
from rank_bm25 import BM25Okapi
import nltk
from nltk.corpus import stopwords
from nltk.stem import SnowballStemmer
import re

nltk.download('punkt_tab')  # not strictly needed: clean_tokenize splits on whitespace
nltk.download('stopwords')

stemmer = SnowballStemmer('english')
stop_words = set(stopwords.words('english'))

def clean_tokenize(text):
    # Basic cleanup: lowercase and drop punctuation (anything that is not a word character or whitespace).
    # Note that this also glues hyphenated terms, e.g. "3,4-biphosphate" -> "34biphosphate"
    text = re.sub(r'[^\w\s]', '', str(text).lower())
    # Whitespace tokenization, stopword removal and stemming
    tokens = [stemmer.stem(w) for w in text.split() if w not in stop_words]
    return tokens

# Index the corpus (only the 'text' field; titles are not indexed)
df_corpus['tokenized_text'] = df_corpus['text'].apply(clean_tokenize)
bm25 = BM25Okapi(df_corpus['tokenized_text'].tolist())

[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt_tab.zip.
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.


In [11]:
results_list = []
top_n = 10  # number of BM25 candidates kept per query

for _, q_row in df_queries.iterrows():
    q_id = q_row['query_id']
    q_text = q_row['query']

    tokenized_query = clean_tokenize(q_text)
    scores = bm25.get_scores(tokenized_query)

    # Indices of the top_n highest-scoring documents, best first
    top_indices = scores.argsort()[-top_n:][::-1]

    for rank, idx in enumerate(top_indices):
        results_list.append({
            'query_id': q_id,
            'doc_id': df_corpus.iloc[idx]['doc_id'],
            'score': scores[idx],
            'rank': rank + 1
        })

df_results = pd.DataFrame(results_list)

Next, we evaluate Recall@10 and nDCG@10.
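`ir_measures` computes these metrics per query and averages them. The following stdlib sketch spells out the per-query definitions: Recall@k is the fraction of relevant documents found in the first k positions, and nDCG@k divides the discounted gain of the ranking by that of an ideal ranking. The sketch uses a linear gain with a log2(rank + 1) discount. Some nDCG variants use an exponential gain instead, but with binary labels, as in SciFact's qrels, both give the same value. `recall_at_k` and `ndcg_at_k` are illustrative helpers, not part of `ir_measures`.

```python
import math
from typing import Dict, List


def recall_at_k(ranked: List[str], qrels: Dict[str, int], k: int = 10) -> float:
    """Fraction of the relevant documents that appear in the first k positions."""
    relevant = {doc for doc, rel in qrels.items() if rel > 0}
    if not relevant:
        return 0.0
    return len(relevant.intersection(ranked[:k])) / len(relevant)


def ndcg_at_k(ranked: List[str], qrels: Dict[str, int], k: int = 10) -> float:
    """nDCG@k with linear gain (rel) and a log2(rank + 1) discount."""
    dcg = sum(qrels.get(doc, 0) / math.log2(i + 2) for i, doc in enumerate(ranked[:k]))
    ideal = sorted((rel for rel in qrels.values() if rel > 0), reverse=True)[:k]
    idcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


# One query with two relevant documents; only d1 is retrieved (at rank 2)
ranked = ["d3", "d1", "d7"]
judged = {"d1": 1, "d2": 1}
r10 = recall_at_k(ranked, judged)
n10 = ndcg_at_k(ranked, judged)
print(r10, round(n10, 4))
```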

In [12]:
!pip install ir_measures

Collecting ir_measures
  Downloading ir_measures-0.4.3-py3-none-any.whl.metadata (7.0 kB)
Downloading ir_measures-0.4.3-py3-none-any.whl (61 kB)
Installing collected packages: ir_measures
Successfully installed ir_measures-0.4.3


In [13]:
import ir_measures
from ir_measures import nDCG, Recall

# 1. Normalize formats: IDs must be strings in both the qrels and the run
df_qrels = df_qrels.astype({'query_id': str, 'doc_id': str})
df_results = df_results.astype({'query_id': str, 'doc_id': str})

# 2. Compute the metrics (averaged over queries)
metrics = ir_measures.calc_aggregate([Recall@10, nDCG@10], df_qrels, df_results)

print("--- Evaluation results ---")
for metric, value in metrics.items():
    print(f"{metric}: {value:.4f}")

--- Evaluation results ---
nDCG@10: 0.6720
R@10: 0.8030


## Part 3. _Cross-encoder_ re-ranking

* Re-rank the top-k candidates for each query.
* Identify which documents change position in the top 10.
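Comparing two rankings involves three things: how far each document in the new top k moved, which documents entered the top k, and which ones left it. The sketch below computes all three from two ordered lists of doc ids. It uses the same sign convention as the notebook's `bm25_rank - reranked_rank`, where a positive shift means the document moved up. `rank_shifts` is a hypothetical helper. The second call shows a point that matters for this exercise: when the re-ranker only sees the same k candidates, nothing can enter or leave the top k, and only the order changes.

```python
from typing import Dict, List


def rank_shifts(before: List[str], after: List[str], k: int = 10) -> Dict[str, object]:
    """Per-document shift inside the new top k, plus documents entering/leaving the top k."""
    pos_before = {doc: i + 1 for i, doc in enumerate(before)}  # 1-based ranks before re-ranking
    top_after = after[:k]
    top_after_set = set(top_after)
    top_before_set = set(before[:k])
    # Positive shift = moved up
    shifts = {doc: pos_before[doc] - (i + 1) for i, doc in enumerate(top_after) if doc in pos_before}
    entered = [doc for doc in top_after if doc not in top_before_set]
    left = [doc for doc in before[:k] if doc not in top_after_set]
    return {"shifts": shifts, "entered": entered, "left": left}


# A deeper candidate list: "d" enters the top 3 from rank 4
result = rank_shifts(["a", "b", "c", "d"], ["c", "a", "d", "b"], k=3)
# Same 3 candidates, only permuted: nothing can enter or leave
permuted = rank_shifts(["a", "b", "c"], ["c", "a", "b"], k=3)
print(result)
print(permuted)
```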

In [14]:
!pip install sentence-transformers



In [15]:
from sentence_transformers import CrossEncoder
import numpy as np

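# Load the model (you may pick one suited to your language)
model_re_ranker = CrossEncoder('BAAI/bge-reranker-base')

# df_results only holds the BM25 top 10 per query, so the re-ranker reorders
# those 10 candidates: it can improve nDCG@10, but it cannot change Recall@10.
# Re-ranking a deeper BM25 pool (e.g. the top 50) would be needed for that.
top_k_bm25 = df_results.copy()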


Re-ranking process

In [16]:
reranked_data = []

# Group the BM25 results by query and process each group
for query_id, group in top_k_bm25.groupby('query_id'):
    query_text = df_queries.loc[df_queries['query_id'] == query_id, 'query'].values[0]

    # Fetch the candidate texts from df_corpus, in the same order as doc_ids
    doc_ids = group['doc_id'].tolist()
    doc_texts = df_corpus[df_corpus['doc_id'].isin(doc_ids)].set_index('doc_id').loc[doc_ids, 'text'].tolist()

    # Build (query, document) pairs for the cross-encoder
    sentence_pairs = [[query_text, doc] for doc in doc_texts]

    # Predict relevance scores
    cross_scores = model_re_ranker.predict(sentence_pairs)

    # Store the re-scored candidates
    for i in range(len(doc_ids)):
        reranked_data.append({
            'query_id': query_id,
            'doc_id': doc_ids[i],
            'bm25_rank': group.iloc[i]['rank'],  # keep the previous (BM25) rank
            'cross_score': cross_scores[i]
        })

df_reranked = pd.DataFrame(reranked_data)

# Sort by the new score and assign the new rank
df_reranked = df_reranked.sort_values(by=['query_id', 'cross_score'], ascending=[True, False])
df_reranked['reranked_rank'] = df_reranked.groupby('query_id').cumcount() + 1

Identifying position changes in the top 10

In [17]:
# Keep only the new top 10
top10_changes = df_reranked[df_reranked['reranked_rank'] <= 10].copy()

# Compute the shift (positive = the document moved up after re-ranking)
top10_changes['shift'] = top10_changes['bm25_rank'] - top10_changes['reranked_rank']

def describe_change(row):
    n = int(abs(row['shift']))
    unit = "position" if n == 1 else "positions"
    if row['shift'] > 0: return f"Moved up {n} {unit}"
    if row['shift'] < 0: return f"Moved down {n} {unit}"
    return "No change"

top10_changes['status'] = top10_changes.apply(describe_change, axis=1)

# Show some examples where documents moved
print(top10_changes[top10_changes['shift'] != 0][['query_id', 'doc_id', 'bm25_rank', 'reranked_rank', 'status']].head())

  query_id    doc_id  bm25_rank  reranked_rank                  status
6        1  13231899          7              1    Moved up 6 positions
0        1  21257564          1              2   Moved down 1 position
4        1  36480032          5              3    Moved up 2 positions
1        1  18953920          2              4  Moved down 2 positions
7        1  26071782          8              5    Moved up 3 positions


## Part 4. _LTR_ (learning-to-rank) re-ranking

* Re-rank the top-k candidates for each query.
* Identify which documents change position in the top 10.

Feature preparation

In [20]:
import lightgbm as lgb
import pandas as pd
import numpy as np

# 1. Recover the original BM25 scores
# Build a clean dataframe with the BM25 scores
df_bm25_scores = df_results[['query_id', 'doc_id', 'score']].rename(columns={'score': 'score_bm25'})

# 2. Join with the cross-encoder results (df_reranked)
# 'inner' guarantees that every query-doc pair has both scores
df_ltr = pd.merge(
    df_reranked,
    df_bm25_scores,
    on=['query_id', 'doc_id'],
    how='inner'
)

# 3. Join with the ground-truth labels (qrels) for training
df_ltr = pd.merge(df_ltr, df_qrels, on=['query_id', 'doc_id'], how='left')
df_ltr['relevance'] = df_ltr['relevance'].fillna(0)  # pairs missing from qrels get relevance 0

# 4. Sorting by query_id is CRITICAL for LTR: each query's rows must be contiguous
df_ltr = df_ltr.sort_values('query_id')

# 5. Define the features: 'score_bm25' comes from BM25 and 'cross_score' from the cross-encoder
features = ['score_bm25', 'cross_score']
X = df_ltr[features]
y = df_ltr['relevance']
# Group sizes, in the same (sorted) query order as the rows above
groups = df_ltr.groupby('query_id').size().to_list()
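As written, the ranker below is fitted on the same 300 test queries (with their qrels as labels) that Part 5 later evaluates it on. Its metrics are therefore optimistic and not directly comparable to BM25 or the cross-encoder. A fairer protocol fits on one set of queries and evaluates on disjoint ones. BEIR's SciFact is believed to also ship a `train` split, which `GenericDataLoader(...).load(split="train")` should load; otherwise the test queries can be split. The stdlib sketch below (hypothetical helpers `split_queries` and `group_sizes`) splits by query id, so no query is on both sides. It then rebuilds the `group` array that `LGBMRanker.fit` expects: the sizes of consecutive runs of rows with the same query, summing to the number of rows.

```python
import random
from typing import List, Set, Tuple


def split_queries(query_ids: List[str], test_fraction: float = 0.3, seed: int = 42) -> Tuple[Set[str], Set[str]]:
    """Split unique query ids into disjoint (train, test) sets."""
    unique = sorted(set(query_ids))  # sort first so the seeded shuffle is reproducible
    rng = random.Random(seed)
    rng.shuffle(unique)
    n_test = max(1, int(len(unique) * test_fraction))
    return set(unique[n_test:]), set(unique[:n_test])


def group_sizes(sorted_query_ids: List[str]) -> List[int]:
    """Lengths of runs of consecutive identical query ids (a ranker's `group` array)."""
    sizes: List[int] = []
    previous = None
    for qid in sorted_query_ids:
        if sizes and qid == previous:
            sizes[-1] += 1
        else:
            sizes.append(1)
            previous = qid
    return sizes


# Toy rows: one query id per (query, candidate) pair, already sorted by query id
row_qids = ["1", "1", "3", "3", "3", "5", "5"]
train_q, test_q = split_queries(row_qids, test_fraction=0.34)
train_rows = [q for q in row_qids if q in train_q]
train_groups = group_sizes(train_rows)
print(sorted(train_q), sorted(test_q), train_groups)
```

With pandas, the same idea amounts to filtering `df_ltr` with `df_ltr['query_id'].isin(train_q)` before computing `X`, `y` and `groups`, and scoring only the held-out queries.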

Training and identifying changes

In [21]:
# Train the ranker
# Note: it is trained on the same test queries it is evaluated on in Part 5
# (with the qrels as labels), so the LTR metrics reported there are optimistic.
ranker = lgb.LGBMRanker(
    objective="lambdarank",
    metric="ndcg",
    n_estimators=100,
    learning_rate=0.1,
    random_state=42
)

ranker.fit(X, y, group=groups)

# Final LTR score
df_ltr['ltr_score'] = ranker.predict(X)

# Final ranking
df_ltr = df_ltr.sort_values(by=['query_id', 'ltr_score'], ascending=[True, False])
df_ltr['ltr_rank'] = df_ltr.groupby('query_id').cumcount() + 1

# --- IDENTIFY CHANGES IN THE TOP 10 ---
# Compare the initial rank (BM25) with the final one (LTR)
top10_final = df_ltr[df_ltr['ltr_rank'] <= 10].copy()
top10_final['change'] = top10_final['bm25_rank'] - top10_final['ltr_rank']

print("Movement summary (documents that LTR moved up the most):")
print(top10_final[['query_id', 'doc_id', 'bm25_rank', 'ltr_rank', 'change']].sort_values('change', ascending=False).head(10))

[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.000177 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 510
[LightGBM] [Info] Number of data points in the train set: 3000, number of used features: 2
Movement summary (documents that LTR moved up the most):
     query_id    doc_id  bm25_rank  ltr_rank  change
1344      294  27134931         10         1       9
233      1132  33499189         10         1       9
1241      238   6820680         10         1       9
2004      554   1049501         10         1       9
32       1014   1227277         10         2       8
2573      805   7343711         10         2       8
2682      837  32721137         10         2       8
751       130   6477536         10         2       8
981      1389  13002003         10         2       8
2114      597  12779444         10         2       8


## Part 5. Post-re-ranking evaluation

Compute the metrics:
* nDCG@10
* MAP
* Recall@10
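MAP is the mean, over queries, of average precision (AP). `ir_measures` reports it under the label `AP`. For one query, AP sums the precision at each rank where a relevant document appears and divides by the total number of relevant documents. Because every run here contains only 10 documents per query, relevant documents outside that list contribute zero, so the MAP values below are effectively truncated at depth 10. `average_precision` is an illustrative stdlib helper.

```python
from typing import Dict, List


def average_precision(ranked: List[str], qrels: Dict[str, int]) -> float:
    """AP of one ranked list; relevant docs that were never retrieved count as zero precision."""
    relevant = {doc for doc, rel in qrels.items() if rel > 0}
    if not relevant:
        return 0.0
    hits, precision_sum = 0, 0.0
    for i, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / i  # precision at this relevant position
    return precision_sum / len(relevant)


# Relevant: d1, d2, d5. Hits at ranks 2 and 4; d5 is never retrieved.
ap = average_precision(["d3", "d1", "d7", "d2"], {"d1": 1, "d2": 1, "d5": 1})
print(round(ap, 4))  # (1/2 + 2/4) / 3
```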

Preparing the data for evaluation

In [22]:
import ir_measures
from ir_measures import nDCG, MAP, Recall

# 1. Make sure IDs are strings so that comparisons do not fail
df_qrels_eval = df_qrels.astype({'query_id': str, 'doc_id': str})

# 2. Prepare the LTR dataframe (the final ranking)
# Rename 'ltr_score' to 'score', the column name ir_measures expects in a run
df_ltr_eval = df_ltr[['query_id', 'doc_id', 'ltr_score']].copy()
df_ltr_eval.columns = ['query_id', 'doc_id', 'score']
df_ltr_eval = df_ltr_eval.astype({'query_id': str, 'doc_id': str})

# 3. Also prepare the cross-encoder and BM25 rankings for comparison
df_ce_eval = df_reranked[['query_id', 'doc_id', 'cross_score']].copy()
df_ce_eval.columns = ['query_id', 'doc_id', 'score']
df_ce_eval = df_ce_eval.astype({'query_id': str, 'doc_id': str})

df_bm25_eval = df_results[['query_id', 'doc_id', 'score']].copy()
df_bm25_eval = df_bm25_eval.astype({'query_id': str, 'doc_id': str})

Computing the comparative metrics

In [23]:
# Metrics to compute
metrics_to_run = [nDCG@10, MAP, Recall@10]

# Compute them for each stage
results_bm25 = ir_measures.calc_aggregate(metrics_to_run, df_qrels_eval, df_bm25_eval)
results_ce   = ir_measures.calc_aggregate(metrics_to_run, df_qrels_eval, df_ce_eval)
results_ltr  = ir_measures.calc_aggregate(metrics_to_run, df_qrels_eval, df_ltr_eval)

# Format the results as a comparison table
eval_df = pd.DataFrame([results_bm25, results_ce, results_ltr],
                       index=['BM25 (Base)', 'Cross-Encoder (Re-rank)', 'LTR (Final)'])

print("--- FINAL METRICS REPORT ---")
print(eval_df.round(4))

--- FINAL METRICS REPORT ---
                         nDCG@10      AP   R@10
BM25 (Base)               0.6720  0.6257  0.803
Cross-Encoder (Re-rank)   0.6891  0.6468  0.803
LTR (Final)               0.8061  0.8016  0.803
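Aggregate means hide how many queries actually improve. R@10 is identical in all three rows because every system reorders the same 10 BM25 candidates per query, so the differences lie entirely in the ordering. A per-query comparison shows how those gains are distributed. The sketch below assumes per-query scores for two systems as dictionaries `query_id -> value`. These could be collected, for example, from `ir_measures.iter_calc(...)`, which yields one result per query and measure. It counts wins, losses and ties and reports the mean difference. `compare_per_query` and the toy numbers are hypothetical.

```python
from statistics import mean
from typing import Dict


def compare_per_query(base: Dict[str, float], new: Dict[str, float], eps: float = 1e-9) -> Dict[str, float]:
    """Wins/losses/ties of `new` over `base` on the queries both systems have, plus the mean delta."""
    shared = base.keys() & new.keys()
    deltas = [new[q] - base[q] for q in shared]
    return {
        "wins": sum(d > eps for d in deltas),
        "losses": sum(d < -eps for d in deltas),
        "ties": sum(abs(d) <= eps for d in deltas),
        "mean_delta": mean(deltas) if deltas else 0.0,
    }


# Toy per-query nDCG@10 values for two systems (made up)
bm25_ndcg = {"1": 0.5, "3": 1.0, "5": 0.0}
reranked_ndcg = {"1": 0.8, "3": 1.0, "5": 0.2}
stats = compare_per_query(bm25_ndcg, reranked_ndcg)
print(stats)
```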
