# Exercise 10: Re-ranking

**Name:** Nelson Casa

**Objective:** Implement and evaluate a two-stage Information Retrieval pipeline, and analyze the impact of re-ranking on ranking quality.

## Part 1. Corpus preparation

* Load the corpus (documents/passages).
* Load the queries.
* Load the qrels (relevance judgments).

In [None]:
import pandas as pd
!pip install beir
from beir import util
from beir.datasets.data_loader import GenericDataLoader

In [9]:
DATASET_NAME = "scifact"
DATA_DIR = "../data/beir_datasets"
url = f"https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/{DATASET_NAME}.zip"
util.download_and_unzip(url, DATA_DIR)

'../data/beir_datasets/scifact'

In [None]:
dataset_path = DATA_DIR + "/" + DATASET_NAME
corpus, queries, qrels = GenericDataLoader(dataset_path).load(split="test")

In [11]:
df_corpus = (
    pd.DataFrame.from_dict(corpus, orient="index")
      .reset_index()
      .rename(columns={"index": "doc_id"})
)

df_corpus

Unnamed: 0,doc_id,text,title
0,4983,Alterations of the architecture of cerebral wh...,Microstructural development of human newborn c...
1,5836,Myelodysplastic syndromes (MDS) are age-depend...,Induction of myelodysplasia by myeloid-derived...
2,7912,ID elements are short interspersed elements (S...,"BC1 RNA, the transcript from a master gene for..."
3,18670,DNA methylation plays an important role in bio...,The DNA Methylome of Human Peripheral Blood Mo...
4,19238,Two human Golli (for gene expressed in the oli...,The human myelin basic protein gene is include...
...,...,...,...
5178,195689316,BACKGROUND The main associations of body-mass ...,Body-mass index and cause-specific mortality i...
5179,195689757,A key aberrant biological difference between t...,Targeting metabolic remodeling in glioblastoma...
5180,196664003,A signaling pathway transmits information from...,Signaling architectures that transmit unidirec...
5181,198133135,AIMS Trabecular bone score (TBS) is a surrogat...,"Association between pre-diabetes, type 2 diabe..."


In [12]:
df_queries = (
    pd.DataFrame.from_dict(queries, orient="index", columns=["query"])
      .reset_index()
      .rename(columns={"index": "query_id"})
)

df_queries

Unnamed: 0,query_id,query
0,1,0-dimensional biomaterials show inductive prop...
1,3,"1,000 genomes project enables mapping of genet..."
2,5,1/2000 in UK have abnormal PrP positivity.
3,13,5% of perinatal mortality is due to low birth ...
4,36,A deficiency of vitamin B12 increases blood le...
...,...,...
295,1379,Women with a higher birth weight are more like...
296,1382,aPKCz causes tumour enhancement by affecting g...
297,1385,cSMAC formation enhances weak ligand signalling.
298,1389,mTORC2 regulates intracellular cysteine levels...


In [13]:
rows = []
for qid, docs in qrels.items():
    for doc_id, rel in docs.items():
        rows.append({
            "query_id": qid,
            "doc_id": doc_id,
            "relevance": rel
        })

df_qrels = pd.DataFrame(rows)
df_qrels

Unnamed: 0,query_id,doc_id,relevance
0,1,31715818,1
1,3,14717500,1
2,5,13734012,1
3,13,1606628,1
4,36,5152028,1
...,...,...,...
334,1379,17450673,1
335,1382,17755060,1
336,1385,306006,1
337,1389,23895668,1


In [14]:
# Pick an arbitrary query that has several relevant documents
qid = "133"

print("Query:")
print(df_queries.loc[df_queries["query_id"] == qid, "query"].values[0])

print("\nRelevant documents for this query:")
df_qrels[(df_qrels["query_id"] == qid) & (df_qrels["relevance"] > 0)]

Query:
Assembly of invadopodia is triggered by focal generation of phosphatidylinositol-3,4-biphosphate and the activation of the nonreceptor tyrosine kinase Src.

Relevant documents for this query:


Unnamed: 0,query_id,doc_id,relevance
31,133,38485364,1
32,133,6969753,1
33,133,17934082,1
34,133,16280642,1
35,133,12640810,1


## Part 2. Initial retrieval (baseline)

* Implement initial retrieval with BM25.
* Obtain metrics: Recall@10 and nDCG@10.

BM25 is a classic information retrieval model that scores documents by term frequency, weighted by inverse document frequency and normalized by document length.
In this stage:

* We use BM25 to retrieve the top 10 documents per query.
* We evaluate how well it retrieves relevant documents using standard IR metrics.
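
To make the scoring idea concrete, here is a minimal sketch of a textbook BM25 formula in plain Python. It is an illustration, not the implementation inside `rank_bm25`: the function name `bm25_scores`, the toy documents, the parameter values `k1=1.5` and `b=0.75`, and the non-negative idf variant are all assumptions chosen for this example.

```python
import math
from collections import Counter

def bm25_scores(query_tokens, tokenized_docs, k1=1.5, b=0.75):
    """Score every document against the query with a textbook BM25 formula."""
    n_docs = len(tokenized_docs)
    avg_len = sum(len(doc) for doc in tokenized_docs) / n_docs
    # Document frequency: number of documents in which each term appears.
    df = Counter(term for doc in tokenized_docs for term in set(doc))

    scores = []
    for doc in tokenized_docs:
        tf = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            # Rare terms get a higher weight (assumed non-negative idf variant).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates with k1; b controls length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Hypothetical toy corpus, tokenized the same way as in this notebook.
toy_docs = [
    "src kinase activates invadopodia assembly".lower().split(),
    "dna methylation in blood cells".lower().split(),
    "src kinase and nmda receptors".lower().split(),
]
toy_scores = bm25_scores("invadopodia src".split(), toy_docs)
best = max(range(len(toy_docs)), key=toy_scores.__getitem__)
print(toy_scores, best)
```

The document that matches both query terms scores highest, the one that matches only the more common term scores lower, and the document with no matching term scores zero.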

Install dependencies

In [19]:
!pip install rank-bm25

Collecting rank-bm25
  Downloading rank_bm25-0.2.2-py3-none-any.whl.metadata (3.2 kB)
Downloading rank_bm25-0.2.2-py3-none-any.whl (8.6 kB)
Installing collected packages: rank-bm25
Successfully installed rank-bm25-0.2.2


Prepare the corpus for BM25

In [20]:
from rank_bm25 import BM25Okapi
import numpy as np

# Tokenize the corpus (lowercased title + text, split on whitespace)
tokenized_corpus = [
    (doc["title"] + " " + doc["text"]).lower().split()
    for doc in corpus.values()
]

doc_ids = list(corpus.keys())

bm25 = BM25Okapi(tokenized_corpus)

BM25 retrieval (top 10)

In [21]:
def bm25_retrieve(query, top_k=10):
    tokenized_query = query.lower().split()
    scores = bm25.get_scores(tokenized_query)
    top_indices = np.argsort(scores)[::-1][:top_k]

    return {
        doc_ids[i]: float(scores[i])
        for i in top_indices
    }

# Run retrieval for all queries
results_bm25 = {
    qid: bm25_retrieve(query, top_k=10)
    for qid, query in queries.items()
}

Evaluation with BEIR (Recall@10, nDCG@10)

In [23]:
from beir.retrieval.evaluation import EvaluateRetrieval

evaluator = EvaluateRetrieval()

k_values = [10]

ndcg, _map, recall, precision = evaluator.evaluate(
    qrels,
    results_bm25,
    k_values
)

print("=== BM25 metrics (baseline) ===")
print(f"Recall@10: {recall['Recall@10']:.4f}")
print(f"nDCG@10: {ndcg['NDCG@10']:.4f}")

=== BM25 metrics (baseline) ===
Recall@10: 0.6862
nDCG@10: 0.5597
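
To make the two reported metrics concrete, the following is a minimal sketch of their textbook definitions in plain Python, applied to the BM25 top 10 and the qrels of query 133 listed in this notebook. The names `recall_at_k` and `ndcg_at_k` are illustrative and are not part of BEIR's API. BEIR's evaluator may differ in details, so small numeric differences are possible.

```python
import math

def recall_at_k(ranked_ids, relevant, k=10):
    """Fraction of all relevant documents that appear in the top k."""
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant)
    return hits / len(relevant) if relevant else 0.0

def ndcg_at_k(ranked_ids, relevance, k=10):
    """relevance maps doc_id -> graded gain; unjudged documents count as 0."""
    dcg = sum(
        relevance.get(doc_id, 0) / math.log2(position + 2)
        for position, doc_id in enumerate(ranked_ids[:k])
    )
    # Ideal DCG: the same gains placed in the best possible order.
    ideal_gains = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum(gain / math.log2(position + 2) for position, gain in enumerate(ideal_gains))
    return dcg / idcg if idcg > 0 else 0.0

# BM25 top 10 and qrels for query 133, copied from this notebook's outputs.
bm25_top10 = ["26688294", "9507605", "37964706", "5270265", "12785130",
              "12640810", "30861948", "86694016", "17934082", "6969753"]
qrels_133 = {"38485364": 1, "6969753": 1, "17934082": 1, "16280642": 1, "12640810": 1}

recall_133 = recall_at_k(bm25_top10, set(qrels_133))
ndcg_133 = ndcg_at_k(bm25_top10, qrels_133)
print(f"Recall@10: {recall_133:.4f} | nDCG@10: {ndcg_133:.4f}")
```

Three of the five relevant documents are in the top 10, so Recall@10 for this query is 3/5. nDCG@10 is lower because those three documents sit at ranks 6, 9, and 10, where the logarithmic discount is large.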


Inspection for query 133

In [24]:
qid = "133"

print("Query:")
print(df_queries.loc[df_queries["query_id"] == qid, "query"].values[0])

print("\nTop 10 documents retrieved by BM25:")

for rank, (doc_id, score) in enumerate(results_bm25[qid].items(), start=1):
    title = df_corpus.loc[df_corpus["doc_id"] == doc_id, "title"].values[0]
    print(f"{rank}. DocID: {doc_id} | Score: {score:.4f}")
    print(f"   Title: {title}\n")

Query:
Assembly of invadopodia is triggered by focal generation of phosphatidylinositol-3,4-biphosphate and the activation of the nonreceptor tyrosine kinase Src.

Top 10 documents retrieved by BM25:
1. DocID: 26688294 | Score: 55.1214
   Title: Schizophrenia susceptibility pathway neuregulin 1–ErbB4 suppresses Src upregulation of NMDA receptors

2. DocID: 9507605 | Score: 50.4710
   Title: Focal contacts as mechanosensors: externally applied local mechanical force induces growth of focal contacts by an mDia1-dependent and ROCKindependent mechanism

3. DocID: 37964706 | Score: 49.9029
   Title: Local Ca2+ influx through Ca2+ release-activated Ca2+ (CRAC) channels stimulates production of an intracellular messenger and an intercellular pro-inflammatory signal.

4. DocID: 5270265 | Score: 46.2733
   Title: Combating trastuzumab resistance by targeting SRC, a common node downstream of multiple resistance pathways

5. DocID: 12785130 | Score: 46.2202
   Title: The regulation of N-methy

## Part 3. Implementing _cross-encoder_ re-ranking

* Re-rank the top-k candidates for each query.
* Identify which documents change position in the top 10.

Install and import the Cross-Encoder

In [None]:
!pip install sentence-transformers
from sentence_transformers import CrossEncoder

Load the Cross-Encoder model

In [None]:
cross_encoder = CrossEncoder(
    "cross-encoder/ms-marco-MiniLM-L-6-v2"
)

Re-ranking function with the Cross-Encoder

This function:
* Takes the top-k documents from BM25
* Computes scores with the cross-encoder
* Returns the documents reordered by score

In [28]:
def cross_encoder_rerank(query, doc_ids, corpus):
    pairs = []
    for doc_id in doc_ids:
        doc_text = corpus[doc_id]["title"] + " " + corpus[doc_id]["text"]
        pairs.append((query, doc_text))

    scores = cross_encoder.predict(pairs)

    reranked = sorted(
        zip(doc_ids, scores),
        key=lambda x: x[1],
        reverse=True
    )

    return reranked

Re-ranking the top 10 for a specific query

We use the same query 133 so the two rankings can be compared directly.

In [29]:
qid = "133"
query_text = queries[qid]

# Top 10 documents from BM25
bm25_top_docs = list(results_bm25[qid].keys())

reranked_docs = cross_encoder_rerank(
    query_text,
    bm25_top_docs,
    corpus
)

Ranking comparison (before vs. after)

In [30]:
print("=== Ranking BM25 (Baseline) ===")
for i, doc_id in enumerate(bm25_top_docs, start=1):
    print(f"{i}. DocID: {doc_id}")

print("\n=== Ranking Cross-Encoder (Re-ranking) ===")
for i, (doc_id, score) in enumerate(reranked_docs, start=1):
    print(f"{i}. DocID: {doc_id} | Score: {score:.4f}")

=== Ranking BM25 (Baseline) ===
1. DocID: 26688294
2. DocID: 9507605
3. DocID: 37964706
4. DocID: 5270265
5. DocID: 12785130
6. DocID: 12640810
7. DocID: 30861948
8. DocID: 86694016
9. DocID: 17934082
10. DocID: 6969753

=== Ranking Cross-Encoder (Re-ranking) ===
1. DocID: 12640810 | Score: 0.3597
2. DocID: 6969753 | Score: -1.7439
3. DocID: 9507605 | Score: -2.6066
4. DocID: 86694016 | Score: -2.6977
5. DocID: 17934082 | Score: -3.7946
6. DocID: 37964706 | Score: -6.6683
7. DocID: 12785130 | Score: -9.5063
8. DocID: 26688294 | Score: -9.8987
9. DocID: 30861948 | Score: -10.0028
10. DocID: 5270265 | Score: -10.4464


Analysis of position changes

In [31]:
print("=== Position changes in the top 10 ===")

# Walk the re-ranked list once; enumerate gives the new rank directly
for new_rank, (doc_id, _) in enumerate(reranked_docs, start=1):
    old_rank = bm25_top_docs.index(doc_id) + 1

    if old_rank != new_rank:
        print(
            f"DocID {doc_id}: "
            f"BM25 rank = {old_rank} → Cross-Encoder rank = {new_rank}"
        )

=== Position changes in the top 10 ===
DocID 12640810: BM25 rank = 6 → Cross-Encoder rank = 1
DocID 6969753: BM25 rank = 10 → Cross-Encoder rank = 2
DocID 9507605: BM25 rank = 2 → Cross-Encoder rank = 3
DocID 86694016: BM25 rank = 8 → Cross-Encoder rank = 4
DocID 17934082: BM25 rank = 9 → Cross-Encoder rank = 5
DocID 37964706: BM25 rank = 3 → Cross-Encoder rank = 6
DocID 12785130: BM25 rank = 5 → Cross-Encoder rank = 7
DocID 26688294: BM25 rank = 1 → Cross-Encoder rank = 8
DocID 30861948: BM25 rank = 7 → Cross-Encoder rank = 9
DocID 5270265: BM25 rank = 4 → Cross-Encoder rank = 10


## **Conclusion of Part 3**

A Cross-Encoder model was applied to re-rank the candidates retrieved by BM25.

For query 133 every document in the top 10 changed position. The three relevant documents that BM25 had retrieved moved up (12640810 from rank 6 to 1, 6969753 from 10 to 2, and 17934082 from 9 to 5), which illustrates how semantic re-ranking can improve the quality of the initial ranking.
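
Position changes alone do not say whether a ranking improved; what matters is where the relevant documents end up. The sketch below, using only the rankings and qrels printed in this notebook for query 133, looks up the rank of each relevant document before and after re-ranking. The helper name `relevant_ranks` is hypothetical.

```python
def relevant_ranks(order, relevant):
    """Map each relevant document found in `order` to its 1-based rank."""
    return {doc_id: rank for rank, doc_id in enumerate(order, start=1) if doc_id in relevant}

# Rankings and qrels for query 133, copied from the outputs above.
bm25_order = ["26688294", "9507605", "37964706", "5270265", "12785130",
              "12640810", "30861948", "86694016", "17934082", "6969753"]
ce_order = ["12640810", "6969753", "9507605", "86694016", "17934082",
            "37964706", "12785130", "26688294", "30861948", "5270265"]
relevant = {"38485364", "6969753", "17934082", "16280642", "12640810"}

before = relevant_ranks(bm25_order, relevant)
after = relevant_ranks(ce_order, relevant)

for doc_id in before:
    print(f"DocID {doc_id}: rank {before[doc_id]} -> {after[doc_id]}")

mean_before = sum(before.values()) / len(before)
mean_after = sum(after.values()) / len(after)
print(f"Mean rank of retrieved relevant documents: {mean_before:.2f} -> {mean_after:.2f}")
```

The two relevant documents missing from both lists (38485364 and 16280642) were never retrieved by BM25, so no re-ranker operating on this candidate set can recover them.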

## Part 4. Implementing _LTR_ re-ranking

* Re-rank the top-k candidates for each query.
* Identify which documents change position in the top 10.

Import libraries for LTR

In [32]:
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
import numpy as np

Building LTR features

We will use simple but valid features, listed in the order they appear in the feature vector:

* BM25 score
* Term overlap between query and document
* Document length

In [33]:
def extract_features(query, doc_id, bm25_score, corpus):
    doc_text = corpus[doc_id]["title"] + " " + corpus[doc_id]["text"]

    query_terms = set(query.lower().split())
    doc_terms = set(doc_text.lower().split())

    term_overlap = len(query_terms & doc_terms)
    doc_length = len(doc_text.split())

    return [
        bm25_score,
        term_overlap,
        doc_length
    ]

Prepare the LTR training dataset

We create (query, document) pairs from the BM25 top 10 of each query, using qrels as labels; pairs absent from qrels get relevance 0.

In [34]:
X = []
y = []

for qid, docs in results_bm25.items():
    query_text = queries[qid]

    for doc_id, bm25_score in docs.items():
        features = extract_features(
            query_text,
            doc_id,
            bm25_score,
            corpus
        )

        relevance = qrels.get(qid, {}).get(doc_id, 0)

        X.append(features)
        y.append(relevance)

X = np.array(X)
y = np.array(y)

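Train the LTR model

In [35]:
# Note: this is a row-level split, so pairs from the same query can fall on both sides;
# X_test and y_test are held out here but not used later in the notebook
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)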

ltr_model = RandomForestRegressor(
    n_estimators=100,
    random_state=42
)

ltr_model.fit(X_train, y_train)
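
The split above is made row by row, so candidates of the same query can end up in both the training and the held-out portion. One possible alternative, sketched here in plain Python under the assumption that each row carries its query id, is to split by query so that all pairs of a query stay on the same side. The function `split_by_query` and the toy rows are hypothetical and are not part of the notebook.

```python
import random

def split_by_query(rows, test_fraction=0.2, seed=42):
    """rows: list of (query_id, features, label). Returns (train_rows, test_rows)."""
    query_ids = sorted({query_id for query_id, _, _ in rows})
    rng = random.Random(seed)
    rng.shuffle(query_ids)

    # Hold out whole queries, never individual (query, document) pairs.
    n_test = max(1, round(len(query_ids) * test_fraction))
    test_query_ids = set(query_ids[:n_test])

    train_rows = [row for row in rows if row[0] not in test_query_ids]
    test_rows = [row for row in rows if row[0] in test_query_ids]
    return train_rows, test_rows

# Toy data: 10 queries with 3 candidate documents each (made-up features and labels).
rows = [
    (f"q{q}", [float(q), float(d)], int(d == 0))
    for q in range(10)
    for d in range(3)
]
train_rows, test_rows = split_by_query(rows)
train_queries = {row[0] for row in train_rows}
test_queries = {row[0] for row in test_rows}
print(len(train_queries), len(test_queries), train_queries & test_queries)
```

scikit-learn's `GroupShuffleSplit` offers the same kind of grouped split when the query ids are passed as `groups`.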

LTR re-ranking for a specific query

We use query 133 again for consistency.

In [36]:
qid = "133"
query_text = queries[qid]

bm25_docs = results_bm25[qid]

ltr_scores = []

for doc_id, bm25_score in bm25_docs.items():
    features = extract_features(
        query_text,
        doc_id,
        bm25_score,
        corpus
    )

    score = ltr_model.predict([features])[0]
    ltr_scores.append((doc_id, score))

ltr_reranked = sorted(
    ltr_scores,
    key=lambda x: x[1],
    reverse=True
)

BM25 vs. LTR comparison (top 10)

In [37]:
print("=== Ranking BM25 (Baseline) ===")
for i, doc_id in enumerate(bm25_docs.keys(), start=1):
    print(f"{i}. DocID: {doc_id}")

print("\n=== Ranking LTR (Re-ranking) ===")
for i, (doc_id, score) in enumerate(ltr_reranked, start=1):
    print(f"{i}. DocID: {doc_id} | Score: {score:.4f}")

=== Ranking BM25 (Baseline) ===
1. DocID: 26688294
2. DocID: 9507605
3. DocID: 37964706
4. DocID: 5270265
5. DocID: 12785130
6. DocID: 12640810
7. DocID: 30861948
8. DocID: 86694016
9. DocID: 17934082
10. DocID: 6969753

=== Ranking LTR (Re-ranking) ===
1. DocID: 12640810 | Score: 0.7700
2. DocID: 6969753 | Score: 0.6100
3. DocID: 30861948 | Score: 0.3600
4. DocID: 86694016 | Score: 0.2700
5. DocID: 5270265 | Score: 0.2600
6. DocID: 12785130 | Score: 0.1600
7. DocID: 17934082 | Score: 0.1100
8. DocID: 37964706 | Score: 0.0500
9. DocID: 26688294 | Score: 0.0200
10. DocID: 9507605 | Score: 0.0100


Identify position changes

In [38]:
print("=== Position changes in the top 10 (LTR) ===")

bm25_rank = list(bm25_docs.keys())
ltr_rank = [doc_id for doc_id, _ in ltr_reranked]

# enumerate gives the new rank directly, so only the old rank needs a lookup
for new_rank, doc_id in enumerate(ltr_rank, start=1):
    old_rank = bm25_rank.index(doc_id) + 1

    if old_rank != new_rank:
        print(
            f"DocID {doc_id}: "
            f"BM25 rank = {old_rank} → LTR rank = {new_rank}"
        )

=== Position changes in the top 10 (LTR) ===
DocID 12640810: BM25 rank = 6 → LTR rank = 1
DocID 6969753: BM25 rank = 10 → LTR rank = 2
DocID 30861948: BM25 rank = 7 → LTR rank = 3
DocID 86694016: BM25 rank = 8 → LTR rank = 4
DocID 5270265: BM25 rank = 4 → LTR rank = 5
DocID 12785130: BM25 rank = 5 → LTR rank = 6
DocID 17934082: BM25 rank = 9 → LTR rank = 7
DocID 37964706: BM25 rank = 3 → LTR rank = 8
DocID 26688294: BM25 rank = 1 → LTR rank = 9
DocID 9507605: BM25 rank = 2 → LTR rank = 10


## **Conclusion of Part 4**
A Learning to Rank model based on Random Forest was implemented, using lexical and structural features (BM25 score, term overlap, and document length). For query 133 the re-ranking changed the position of every document in the top 10 and moved the relevant documents 12640810 and 6969753 to ranks 1 and 2, which shows the model's ability to refine the initial ranking.

## Part 5. Post-re-ranking evaluation

Compute metrics:
* nDCG@10
* MAP
* Recall@10
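
Of the three metrics, MAP is the only one not used in Part 2. As a rough sketch of what it measures, the code below computes average precision at a cutoff for each query and then the mean over queries. It assumes the common convention of dividing by the total number of relevant documents for the query; the function names and the toy run are illustrative, and BEIR's evaluator may differ in details.

```python
def average_precision_at_k(ranked_ids, relevant, k=10):
    """Sum of precision values at each relevant hit in the top k, over all relevant docs."""
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0

def mean_average_precision(run, qrels, k=10):
    """run: query_id -> ranked doc ids; qrels: query_id -> {doc_id: relevance}."""
    values = []
    for query_id, ranked_ids in run.items():
        relevant = {d for d, rel in qrels.get(query_id, {}).items() if rel > 0}
        values.append(average_precision_at_k(ranked_ids, relevant, k))
    return sum(values) / len(values) if values else 0.0

# Hypothetical toy run with two queries.
toy_run = {"q1": ["a", "b", "c", "d"], "q2": ["x", "y"]}
toy_qrels = {"q1": {"a": 1, "c": 1}, "q2": {"y": 1}}

toy_ap_q1 = average_precision_at_k(toy_run["q1"], {"a", "c"})
toy_map = mean_average_precision(toy_run, toy_qrels)
print(f"AP(q1) = {toy_ap_q1:.4f} | MAP = {toy_map:.4f}")
```

Because each hit contributes the precision at its own rank, moving a relevant document toward the top raises MAP even when the set of retrieved documents is unchanged.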

Prepare Cross-Encoder results

In [40]:
# Limit the number of queries for evaluation
MAX_QUERIES = 30

results_ce = {}

for qid in list(results_bm25.keys())[:MAX_QUERIES]:
    query_text = queries[qid]
    docs = results_bm25[qid]
    # Use a local name so the global doc_ids used by bm25_retrieve is not overwritten
    candidate_ids = list(docs.keys())

    reranked = cross_encoder_rerank(
        query_text,
        candidate_ids,
        corpus
    )

    results_ce[qid] = {
        doc_id: float(score)
        for doc_id, score in reranked[:10]
    }

In [41]:
qrels_ce = {
    qid: qrels[qid]
    for qid in results_ce.keys()
}

Cross-Encoder evaluation

In [42]:
ndcg_ce, map_ce, recall_ce, _ = evaluator.evaluate(
    qrels_ce,
    results_ce,
    k_values
)

Prepare LTR results for evaluation

We use the same queries as for the Cross-Encoder (results_ce.keys()).

In [43]:
results_ltr_eval = {}

for qid in results_ce.keys():
    query_text = queries[qid]
    docs = results_bm25[qid]

    scores = []
    for doc_id, bm25_score in docs.items():
        features = extract_features(
            query_text,
            doc_id,
            bm25_score,
            corpus
        )
        score = ltr_model.predict([features])[0]
        scores.append((doc_id, score))

    reranked = sorted(scores, key=lambda x: x[1], reverse=True)

    results_ltr_eval[qid] = {
        doc_id: float(score)
        for doc_id, score in reranked[:10]
    }

LTR evaluation

We reuse the same qrels (qrels_ce).

In [44]:
ndcg_ltr, map_ltr, recall_ltr, _ = evaluator.evaluate(
    qrels_ce,
    results_ltr_eval,
    k_values
)

BM25 evaluation

BM25 must be evaluated on the same subset so that the three methods are comparable.

In [45]:
results_bm25_eval = {
    qid: results_bm25[qid]
    for qid in results_ce.keys()
}

ndcg_bm25, map_bm25, recall_bm25, _ = evaluator.evaluate(
    qrels_ce,
    results_bm25_eval,
    k_values
)

Comparison table

In [46]:
import pandas as pd

df_final = pd.DataFrame({
    "Modelo": ["BM25", "Cross-Encoder", "LTR"],
    "nDCG@10": [
        ndcg_bm25["NDCG@10"],
        ndcg_ce["NDCG@10"],
        ndcg_ltr["NDCG@10"]
    ],
    "MAP@10": [
        map_bm25["MAP@10"],
        map_ce["MAP@10"],
        map_ltr["MAP@10"]
    ],
    "Recall@10": [
        recall_bm25["Recall@10"],
        recall_ce["Recall@10"],
        recall_ltr["Recall@10"]
    ]
})

df_final

Unnamed: 0,Modelo,nDCG@10,MAP@10,Recall@10
0,BM25,0.62816,0.59348,0.72
1,Cross-Encoder,0.67747,0.65645,0.72
2,LTR,0.68914,0.66897,0.72


Conclusion

On the 30-query subset used for evaluation, both re-ranking methods improve ranking quality over the BM25 baseline. The Cross-Encoder raises nDCG@10 from 0.628 to 0.677 and MAP@10 from 0.593 to 0.656 by capturing semantic relationships between the query and the document, while the Learning to Rank (LTR) approach obtains the highest values in this run (nDCG@10 of 0.689 and MAP@10 of 0.669). The LTR figures should be read with some caution: the model was trained on a random 80% of the (query, document) pairs drawn from these same test queries, so part of its advantage may come from having seen evaluated pairs during training.

Recall@10 stays at 0.72 for all three methods. This is expected, because re-ranking only reorders the ten candidates that BM25 retrieved: it cannot add relevant documents, it can only improve their position within the ranking. Overall, the results support the positive impact of re-ranking, with LTR scoring highest in this experiment.
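
The constant Recall@10 can be checked directly: any reordering of the same ten candidates yields the same recall. The short sketch below shuffles the BM25 top 10 of query 133 (copied from this notebook) many times and collects the recall values observed; the helper `recall_at_k` is an illustrative definition, not BEIR's function.

```python
import random

def recall_at_k(ranked_ids, relevant, k=10):
    """Fraction of all relevant documents that appear in the top k."""
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant)
    return hits / len(relevant) if relevant else 0.0

candidates = ["26688294", "9507605", "37964706", "5270265", "12785130",
              "12640810", "30861948", "86694016", "17934082", "6969753"]
relevant = {"38485364", "6969753", "17934082", "16280642", "12640810"}

baseline = recall_at_k(candidates, relevant)

rng = random.Random(0)
observed = set()
for _ in range(100):
    shuffled = candidates[:]   # a re-ranker can only permute this list
    rng.shuffle(shuffled)
    observed.add(recall_at_k(shuffled, relevant))

print(baseline, observed)
```

Every permutation gives the same value, so Recall@10 could only change if the re-ranker received more than ten candidates per query.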
