<a href="https://colab.research.google.com/github/RikiGL/Deber-Recuperacion-Informacion/blob/main/ejercicio_10_reranking_rikiguallichico.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Ejercicio 10: Re-ranking

**Objetivo:** Implementar y evaluar un pipeline de Recuperaci√≥n de Informaci√≥n en dos etapas, y analizar el impacto del re-ranking en la calidad del ranking.

## Parte 1. Preparaci√≥n del corpus

* Cargar el corpus (documentos/pasajes).
* Cargar las consultas (queries).
* Cargar qrels (relevancia).

In [3]:
from beir import util
from beir.datasets.data_loader import GenericDataLoader
import pandas as pd

  from tqdm.autonotebook import tqdm


In [None]:
DATASET_NAME = "scifact"
DATA_DIR = "../data/beir_datasets"
url = f"https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/{DATASET_NAME}.zip"
util.download_and_unzip(url, DATA_DIR)

In [None]:
dataset_path = DATA_DIR + "/" + DATASET_NAME
corpus, queries, qrels = GenericDataLoader(dataset_path).load(split="test")

In [6]:
df_corpus = (
    pd.DataFrame.from_dict(corpus, orient="index")
      .reset_index()
      .rename(columns={"index": "doc_id"})
)

df_corpus

Unnamed: 0,doc_id,text,title
0,4983,Alterations of the architecture of cerebral wh...,Microstructural development of human newborn c...
1,5836,Myelodysplastic syndromes (MDS) are age-depend...,Induction of myelodysplasia by myeloid-derived...
2,7912,ID elements are short interspersed elements (S...,"BC1 RNA, the transcript from a master gene for..."
3,18670,DNA methylation plays an important role in bio...,The DNA Methylome of Human Peripheral Blood Mo...
4,19238,Two human Golli (for gene expressed in the oli...,The human myelin basic protein gene is include...
...,...,...,...
5178,195689316,BACKGROUND The main associations of body-mass ...,Body-mass index and cause-specific mortality i...
5179,195689757,A key aberrant biological difference between t...,Targeting metabolic remodeling in glioblastoma...
5180,196664003,A signaling pathway transmits information from...,Signaling architectures that transmit unidirec...
5181,198133135,AIMS Trabecular bone score (TBS) is a surrogat...,"Association between pre-diabetes, type 2 diabe..."


In [7]:
df_queries = (
    pd.DataFrame.from_dict(queries, orient="index", columns=["query"])
      .reset_index()
      .rename(columns={"index": "query_id"})
)

df_queries

Unnamed: 0,query_id,query
0,1,0-dimensional biomaterials show inductive prop...
1,3,"1,000 genomes project enables mapping of genet..."
2,5,1/2000 in UK have abnormal PrP positivity.
3,13,5% of perinatal mortality is due to low birth ...
4,36,A deficiency of vitamin B12 increases blood le...
...,...,...
295,1379,Women with a higher birth weight are more like...
296,1382,aPKCz causes tumour enhancement by affecting g...
297,1385,cSMAC formation enhances weak ligand signalling.
298,1389,mTORC2 regulates intracellular cysteine levels...


In [8]:
rows = []
for qid, docs in qrels.items():
    for doc_id, rel in docs.items():
        rows.append({
            "query_id": qid,
            "doc_id": doc_id,
            "relevance": rel
        })

df_qrels = pd.DataFrame(rows)
df_qrels

Unnamed: 0,query_id,doc_id,relevance
0,1,31715818,1
1,3,14717500,1
2,5,13734012,1
3,13,1606628,1
4,36,5152028,1
...,...,...,...
334,1379,17450673,1
335,1382,17755060,1
336,1385,306006,1
337,1389,23895668,1


In [9]:
# Elegimos una query cualquiera que tenga varios documentos relevantes
qid = "133"

print("Query:")
print(df_queries.loc[df_queries["query_id"] == qid, "query"].values[0])

print("\nDocumentos relevantes para esta query:")
df_qrels[(df_qrels["query_id"] == qid) & (df_qrels["relevance"] > 0)]

Query:
Assembly of invadopodia is triggered by focal generation of phosphatidylinositol-3,4-biphosphate and the activation of the nonreceptor tyrosine kinase Src.

Documentos relevantes para esta query:


Unnamed: 0,query_id,doc_id,relevance
31,133,38485364,1
32,133,6969753,1
33,133,17934082,1
34,133,16280642,1
35,133,12640810,1


## Parte 2. Retrieval inicial (baseline)

* Implementar retrieval inicial con BM25
* Obtener m√©tricas: Recall@10 nDCG@10

In [None]:
!pip install rank-bm25
!pip install nltk

In [11]:
import numpy as np
from rank_bm25 import BM25Okapi
from nltk.tokenize import word_tokenize
import nltk

# Descargar TODOS los recursos necesarios de NLTK
nltk.download('punkt')
nltk.download('punkt_tab')  # Este es el recurso que falta
nltk.download('averaged_perceptron_tagger')  # √ötil para algunos tokenizadores

# Preparar documentos tokenizados
corpus_docs = []
corpus_ids = []
for doc_id, doc_info in corpus.items():
    # Usamos el texto del documento (puedes usar "title" + " " + "text")
    text = doc_info.get("text", "")
    tokens = word_tokenize(text.lower())
    corpus_docs.append(tokens)
    corpus_ids.append(doc_id)

# Crear √≠ndice BM25
bm25 = BM25Okapi(corpus_docs)
print(f"√çndice BM25 creado con {len(corpus_docs)} documentos")

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt_tab.zip.
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.


√çndice BM25 creado con 5183 documentos


In [12]:
def retrieve_bm25(query, bm25_index, doc_ids, k=100):
    """Recupera top-k documentos para una query usando BM25"""
    query_tokens = word_tokenize(query.lower())
    scores = bm25_index.get_scores(query_tokens)

    # Obtener √≠ndices de los top-k documentos
    top_indices = np.argsort(scores)[::-1][:k]

    # Devolver resultados: [(doc_id, score), ...]
    results = [(doc_ids[idx], scores[idx]) for idx in top_indices]
    return results

In [13]:
from sklearn.metrics import ndcg_score

# Diccionario para guardar resultados
bm25_results = {}

# Evaluar para todas las queries
recalls_at_10 = []
ndcgs_at_10 = []

for qid, query_text in queries.items():
    # Recuperar top-100 documentos con BM25
    results = retrieve_bm25(query_text, bm25, corpus_ids, k=100)
    bm25_results[qid] = results

    # Obtener relevancias verdaderas para esta query
    true_relevance = {}
    if qid in qrels:
        true_relevance = qrels[qid]

    # Calcular Recall@10
    top_10_docs = [doc_id for doc_id, _ in results[:10]]
    relevant_docs = [doc_id for doc_id, rel in true_relevance.items() if rel > 0]

    if relevant_docs:  # Solo calcular si hay documentos relevantes
        # Recall@10
        retrieved_relevant = set(top_10_docs) & set(relevant_docs)
        recall_at_10 = len(retrieved_relevant) / len(relevant_docs)
        recalls_at_10.append(recall_at_10)

        # nDCG@10
        # Crear vector de relevancias ideales y obtenidas
        ideal_relevance = []
        actual_relevance = []

        # Para nDCG necesitamos los scores de relevancia
        for doc_id in top_10_docs:
            actual_rel = true_relevance.get(doc_id, 0)
            actual_relevance.append(actual_rel)

        # Orden ideal (por relevancia descendente)
        ideal_sorted = sorted([true_relevance.get(doc, 0) for doc in relevant_docs], reverse=True)
        ideal_relevance = ideal_sorted[:10]  # Tomar top-10 ideal

        # Si ideal_relevance es m√°s corto que 10, rellenar con ceros
        while len(ideal_relevance) < 10:
            ideal_relevance.append(0)

        while len(actual_relevance) < 10:
            actual_relevance.append(0)

        # Calcular nDCG
        try:
            ndcg = ndcg_score([ideal_relevance], [actual_relevance], k=10)
            ndcgs_at_10.append(ndcg)
        except:
            ndcgs_at_10.append(0.0)

# Calcular promedios
avg_recall_at_10 = np.mean(recalls_at_10) if recalls_at_10 else 0
avg_ndcg_at_10 = np.mean(ndcgs_at_10) if ndcgs_at_10 else 0

print("=== RESULTADOS BM25 (Retrieval Inicial) ===")
print(f"N√∫mero de queries evaluadas: {len(recalls_at_10)}")
print(f"Recall@10 promedio: {avg_recall_at_10:.4f}")
print(f"nDCG@10 promedio: {avg_ndcg_at_10:.4f}")

=== RESULTADOS BM25 (Retrieval Inicial) ===
N√∫mero de queries evaluadas: 300
Recall@10 promedio: 0.7342
nDCG@10 promedio: 0.7247


In [14]:
from sklearn.metrics import ndcg_score

# Diccionario para guardar resultados
bm25_results = {}

# Evaluar para todas las queries
recalls_at_10 = []
ndcgs_at_10 = []

for qid, query_text in queries.items():
    # Recuperar top-100 documentos con BM25
    results = retrieve_bm25(query_text, bm25, corpus_ids, k=100)
    bm25_results[qid] = results

    # Obtener relevancias verdaderas para esta query
    true_relevance = {}
    if qid in qrels:
        true_relevance = qrels[qid]

    # Calcular Recall@10
    top_10_docs = [doc_id for doc_id, _ in results[:10]]
    relevant_docs = [doc_id for doc_id, rel in true_relevance.items() if rel > 0]

    if relevant_docs:  # Solo calcular si hay documentos relevantes
        # Recall@10
        retrieved_relevant = set(top_10_docs) & set(relevant_docs)
        recall_at_10 = len(retrieved_relevant) / len(relevant_docs)
        recalls_at_10.append(recall_at_10)

        # nDCG@10
        # Crear vector de relevancias ideales y obtenidas
        ideal_relevance = []
        actual_relevance = []

        # Para nDCG necesitamos los scores de relevancia
        for doc_id in top_10_docs:
            actual_rel = true_relevance.get(doc_id, 0)
            actual_relevance.append(actual_rel)

        # Orden ideal (por relevancia descendente)
        ideal_sorted = sorted([true_relevance.get(doc, 0) for doc in relevant_docs], reverse=True)
        ideal_relevance = ideal_sorted[:10]  # Tomar top-10 ideal

        # Si ideal_relevance es m√°s corto que 10, rellenar con ceros
        while len(ideal_relevance) < 10:
            ideal_relevance.append(0)

        while len(actual_relevance) < 10:
            actual_relevance.append(0)

        # Calcular nDCG
        try:
            ndcg = ndcg_score([ideal_relevance], [actual_relevance], k=10)
            ndcgs_at_10.append(ndcg)
        except:
            ndcgs_at_10.append(0.0)

# Calcular promedios
avg_recall_at_10 = np.mean(recalls_at_10) if recalls_at_10 else 0
avg_ndcg_at_10 = np.mean(ndcgs_at_10) if ndcgs_at_10 else 0

print("=== RESULTADOS BM25 (Retrieval Inicial) ===")
print(f"N√∫mero de queries evaluadas: {len(recalls_at_10)}")
print(f"Recall@10 promedio: {avg_recall_at_10:.4f}")
print(f"nDCG@10 promedio: {avg_ndcg_at_10:.4f}")

=== RESULTADOS BM25 (Retrieval Inicial) ===
N√∫mero de queries evaluadas: 300
Recall@10 promedio: 0.7342
nDCG@10 promedio: 0.7247


In [15]:
# An√°lisis detallado para la query de ejemplo
qid = "133"
query_text = queries[qid]

print(f"Query {qid}: {query_text}")
print("\nTop-10 documentos recuperados por BM25:")

results = bm25_results[qid][:10]
for i, (doc_id, score) in enumerate(results, 1):
    relevance = qrels[qid].get(doc_id, 0)
    rel_str = f"Relevancia: {relevance}" if relevance > 0 else "No relevante"
    print(f"{i}. Doc {doc_id} - Score: {score:.4f} - {rel_str}")

# Calcular m√©tricas espec√≠ficas para esta query
top_10_docs = [doc_id for doc_id, _ in results]
relevant_docs = [doc_id for doc_id, rel in qrels[qid].items() if rel > 0]

retrieved_relevant = set(top_10_docs) & set(relevant_docs)
recall = len(retrieved_relevant) / len(relevant_docs)
print(f"\nRecall@10 para query {qid}: {recall:.2f} ({len(retrieved_relevant)}/{len(relevant_docs)})")

Query 133: Assembly of invadopodia is triggered by focal generation of phosphatidylinositol-3,4-biphosphate and the activation of the nonreceptor tyrosine kinase Src.

Top-10 documentos recuperados por BM25:
1. Doc 5270265 - Score: 59.8531 - No relevante
2. Doc 26688294 - Score: 58.6757 - No relevante
3. Doc 45764440 - Score: 58.1495 - No relevante
4. Doc 12785130 - Score: 55.5299 - No relevante
5. Doc 37964706 - Score: 55.0283 - No relevante
6. Doc 9507605 - Score: 53.1311 - No relevante
7. Doc 35884026 - Score: 52.3605 - No relevante
8. Doc 5914739 - Score: 51.9492 - No relevante
9. Doc 10991183 - Score: 50.8288 - No relevante
10. Doc 86694016 - Score: 49.1914 - No relevante

Recall@10 para query 133: 0.00 (0/5)


## Parte 3. Implementaci√≥n del re-ranking _cross-encoder_

* Re-rankear los top-k candidatos para cada query.
* Identificar qu√© documentos cambian de posici√≥n en el top 10

In [None]:
!pip install sentence-transformers

In [None]:
from sentence_transformers import CrossEncoder
import torch

# Cargar modelo cross-encoder (optimizado para re-ranking)
# Usamos un modelo peque√±o pero efectivo para BEIR
model_name = "cross-encoder/ms-marco-MiniLM-L-6-v2"
cross_encoder = CrossEncoder(model_name, max_length=512)

print(f"Modelo cross-encoder cargado: {model_name}")
print(f"Dispositivo: {cross_encoder.device}")

In [18]:
def rerank_with_cross_encoder(query, candidates, corpus_dict, cross_encoder_model, k=10):
    """
    Re-rankear candidatos usando cross-encoder

    Args:
        query: texto de la consulta
        candidates: lista de (doc_id, score_bm25) de la primera etapa
        corpus_dict: diccionario con documentos originales
        cross_encoder_model: modelo cross-encoder
        k: cu√°ntos documentos devolver despu√©s del re-ranking

    Returns:
        Lista de (doc_id, score_cross_encoder) ordenada descendente
    """
    # Preparar pares (query, documento) para el cross-encoder
    pairs = []
    candidate_docs = []

    for doc_id, _ in candidates:
        # Obtener texto del documento (puedes ajustar seg√∫n tu corpus)
        doc_text = corpus_dict[doc_id].get("text", "")
        # Crear par (query, documento)
        pairs.append((query, doc_text))
        candidate_docs.append(doc_id)

    # Predecir scores con cross-encoder
    # El cross-encoder califica directamente la relevancia del par
    scores = cross_encoder_model.predict(pairs, show_progress_bar=False)

    # Combinar doc_ids con nuevos scores
    reranked_results = list(zip(candidate_docs, scores))

    # Ordenar por score descendente (mayor score = m√°s relevante)
    reranked_results.sort(key=lambda x: x[1], reverse=True)

    return reranked_results[:k]

In [19]:
# Almacenar resultados re-rankeados
reranked_results = {}

print("Aplicando re-ranking con cross-encoder...")
for qid, query_text in queries.items():
    # Obtener candidatos de BM25 (top-100 para re-ranking)
    if qid in bm25_results:
        candidates = bm25_results[qid]

        # Aplicar re-ranking (top-10 despu√©s de re-ranking)
        reranked = rerank_with_cross_encoder(
            query_text,
            candidates,
            corpus,
            cross_encoder,
            k=10
        )

        reranked_results[qid] = reranked

        # Mostrar progreso cada 50 queries
        if len(reranked_results) % 50 == 0:
            print(f"  Procesadas {len(reranked_results)} queries")

print(f"Re-ranking completado para {len(reranked_results)} queries")

Aplicando re-ranking con cross-encoder...
  Procesadas 50 queries
  Procesadas 100 queries
  Procesadas 150 queries
  Procesadas 200 queries
  Procesadas 250 queries
  Procesadas 300 queries
Re-ranking completado para 300 queries


In [20]:
qid = "133"
query_text = queries[qid]

print(f"=== AN√ÅLISIS DE CAMBIOS DE POSICI√ìN - Query {qid} ===")
print(f"Query: {query_text}\n")

# Resultados BM25 (antes del re-ranking)
bm25_top10 = [(doc_id, score) for doc_id, score in bm25_results[qid][:10]]

# Resultados despu√©s del re-ranking
reranked_top10 = reranked_results[qid]

# Crear diccionarios para comparar posiciones
bm25_positions = {doc_id: i+1 for i, (doc_id, _) in enumerate(bm25_top10)}
reranked_positions = {doc_id: i+1 for i, (doc_id, _) in enumerate(reranked_top10)}

print("Top-10 ANTES del re-ranking (BM25):")
for i, (doc_id, score) in enumerate(bm25_top10, 1):
    relevance = qrels[qid].get(doc_id, 0)
    rel_str = f" [Relevancia: {relevance}]" if relevance > 0 else ""
    print(f"{i:2d}. Doc {doc_id} - Score BM25: {score:.4f}{rel_str}")

print("\n" + "="*60)

print("\nTop-10 DESPU√âS del re-ranking (Cross-Encoder):")
for i, (doc_id, score) in enumerate(reranked_top10, 1):
    relevance = qrels[qid].get(doc_id, 0)
    rel_str = f" [Relevancia: {relevance}]" if relevance > 0 else ""

    # Verificar cambio de posici√≥n
    old_pos = bm25_positions.get(doc_id, ">10")
    change = ""
    if doc_id in bm25_positions:
        pos_change = bm25_positions[doc_id] - i
        if pos_change > 0:
            change = f" ‚Üë Subi√≥ {pos_change} posiciones"
        elif pos_change < 0:
            change = f" ‚Üì Baj√≥ {abs(pos_change)} posiciones"
        else:
            change = " ‚Üí Sin cambio"
    else:
        change = " üÜï Nuevo en top-10"

    print(f"{i:2d}. Doc {doc_id} - Score CE: {score:.4f}{rel_str}{change}")

print("\n" + "="*60)

# An√°lisis de cambios
print("\n=== RESUMEN DE CAMBIOS ===")

# Documentos que entraron al top-10 despu√©s del re-ranking
new_in_top10 = set([doc_id for doc_id, _ in reranked_top10]) - set([doc_id for doc_id, _ in bm25_top10])
print(f"Documentos NUEVOS en top-10: {len(new_in_top10)}")
for doc_id in new_in_top10:
    old_pos = ">10"
    new_pos = reranked_positions[doc_id]
    print(f"  - Doc {doc_id}: Posici√≥n {old_pos} ‚Üí {new_pos}")

# Documentos que salieron del top-10
dropped_from_top10 = set([doc_id for doc_id, _ in bm25_top10]) - set([doc_id for doc_id, _ in reranked_top10])
print(f"\nDocumentos que SALIERON del top-10: {len(dropped_from_top10)}")
for doc_id in dropped_from_top10:
    old_pos = bm25_positions[doc_id]
    new_pos = ">10"
    print(f"  - Doc {doc_id}: Posici√≥n {old_pos} ‚Üí {new_pos}")

# Cambios de posici√≥n para documentos que permanecen en top-10
common_docs = set([doc_id for doc_id, _ in bm25_top10]) & set([doc_id for doc_id, _ in reranked_top10])
significant_changes = []
for doc_id in common_docs:
    old_pos = bm25_positions[doc_id]
    new_pos = reranked_positions[doc_id]
    change = old_pos - new_pos
    if change != 0:  # Solo mostrar si hubo cambio
        significant_changes.append((doc_id, old_pos, new_pos, change))

print(f"\nCambios de posici√≥n (documentos que permanecen en top-10): {len(significant_changes)}")
for doc_id, old_pos, new_pos, change in sorted(significant_changes, key=lambda x: abs(x[3]), reverse=True):
    direction = "‚Üë Subi√≥" if change > 0 else "‚Üì Baj√≥"
    print(f"  - Doc {doc_id}: Pos {old_pos} ‚Üí {new_pos} ({direction} {abs(change)} posiciones)")

=== AN√ÅLISIS DE CAMBIOS DE POSICI√ìN - Query 133 ===
Query: Assembly of invadopodia is triggered by focal generation of phosphatidylinositol-3,4-biphosphate and the activation of the nonreceptor tyrosine kinase Src.

Top-10 ANTES del re-ranking (BM25):
 1. Doc 5270265 - Score BM25: 59.8531
 2. Doc 26688294 - Score BM25: 58.6757
 3. Doc 45764440 - Score BM25: 58.1495
 4. Doc 12785130 - Score BM25: 55.5299
 5. Doc 37964706 - Score BM25: 55.0283
 6. Doc 9507605 - Score BM25: 53.1311
 7. Doc 35884026 - Score BM25: 52.3605
 8. Doc 5914739 - Score BM25: 51.9492
 9. Doc 10991183 - Score BM25: 50.8288
10. Doc 86694016 - Score BM25: 49.1914


Top-10 DESPU√âS del re-ranking (Cross-Encoder):
 1. Doc 16280642 - Score CE: 0.7501 [Relevancia: 1] üÜï Nuevo en top-10
 2. Doc 12640810 - Score CE: 0.5765 [Relevancia: 1] üÜï Nuevo en top-10
 3. Doc 35660758 - Score CE: 0.2116 üÜï Nuevo en top-10
 4. Doc 36345185 - Score CE: -1.4978 üÜï Nuevo en top-10
 5. Doc 6969753 - Score CE: -1.7565 [Relevancia:

In [21]:
print("=== AN√ÅLISIS GLOBAL DE CAMBIOS DE POSICI√ìN ===")

total_changes = 0
docs_entered_top10 = 0
docs_left_top10 = 0
position_changes = []

for qid in queries.keys():
    if qid in bm25_results and qid in reranked_results:
        bm25_docs = [doc_id for doc_id, _ in bm25_results[qid][:10]]
        reranked_docs = [doc_id for doc_id, _ in reranked_results[qid]]

        # Contar documentos nuevos en top-10
        new_docs = set(reranked_docs) - set(bm25_docs)
        docs_entered_top10 += len(new_docs)

        # Contar documentos que salieron
        left_docs = set(bm25_docs) - set(reranked_docs)
        docs_left_top10 += len(left_docs)

        # Para documentos comunes, calcular cambio de posici√≥n
        common = set(bm25_docs) & set(reranked_docs)
        for doc_id in common:
            old_pos = bm25_docs.index(doc_id) + 1
            new_pos = reranked_docs.index(doc_id) + 1
            change = old_pos - new_pos
            if change != 0:
                total_changes += 1
                position_changes.append(abs(change))

print(f"Total de queries analizadas: {len(queries)}")
print(f"Documentos que ENTRARON al top-10 despu√©s del re-ranking: {docs_entered_top10}")
print(f"Documentos que SALIERON del top-10 despu√©s del re-ranking: {docs_left_top10}")
print(f"Cambios de posici√≥n dentro del top-10: {total_changes}")
if position_changes:
    print(f"Cambio promedio de posici√≥n: {sum(position_changes)/len(position_changes):.2f} posiciones")
    print(f"M√°ximo cambio de posici√≥n: {max(position_changes)} posiciones")

=== AN√ÅLISIS GLOBAL DE CAMBIOS DE POSICI√ìN ===
Total de queries analizadas: 300
Documentos que ENTRARON al top-10 despu√©s del re-ranking: 1626
Documentos que SALIERON del top-10 despu√©s del re-ranking: 1626
Cambios de posici√≥n dentro del top-10: 1046
Cambio promedio de posici√≥n: 2.89 posiciones
M√°ximo cambio de posici√≥n: 9 posiciones


## Parte 4. Implementaci√≥n del re-ranking _LTR_

* Re-rankear los top-k candidatos para cada query.
* Identificar qu√© documentos cambian de posici√≥n en el top 10

In [22]:
!pip install lightgbm



In [33]:
import lightgbm as lgb
from sklearn.model_selection import train_test_split
import numpy as np

def extract_features(query, doc_text, bm25_index=None, query_tokens=None):
    """
    Extraer caracter√≠sticas para LTR.
    Aqu√≠ definimos features simples como ejemplo.
    """
    features = []

    # 1. Longitud del documento (en palabras)
    doc_words = doc_text.split()
    features.append(len(doc_words))

    # 2. Longitud de la query (en palabras)
    query_words = query.split()
    features.append(len(query_words))

    # 3. Overlap de t√©rminos (proporci√≥n de t√©rminos de query en documento)
    if query_tokens:
        overlap = sum(1 for token in query_tokens if token in doc_text.lower())
        features.append(overlap)
        features.append(overlap / len(query_tokens) if query_tokens else 0)
    else:
        features.extend([0, 0])

    # 4. BM25 score (si se proporciona)
    if bm25_index and query_tokens:
        bm25_score = bm25_index.get_scores(query_tokens)
        # Tomar el score promedio como feature
        features.append(np.mean(bm25_score) if len(bm25_score) > 0 else 0)
    else:
        features.append(0)

    # 5. Longitud promedio de palabras en documento
    if doc_words:
        avg_word_len = sum(len(word) for word in doc_words) / len(doc_words)
        features.append(avg_word_len)
    else:
        features.append(0)

    return features

In [34]:
print("Preparando datos de entrenamiento para LTR...")

# Usaremos las queries de entrenamiento para el modelo LTR
# Para este ejemplo, usaremos todas las queries disponibles
X = []
y = []
groups = []  # Para agrupar por query (necesario para LambdaRank)

feature_names = [
    "doc_length",
    "query_length",
    "term_overlap",
    "overlap_ratio",
    "bm25_score",
    "avg_word_length"
]

for qid, query_text in queries.items():
    query_tokens = word_tokenize(query_text.lower())

    # Obtener documentos relevantes y no relevantes para esta query
    relevant_docs = []
    non_relevant_docs = []

    # Tomar algunos documentos como ejemplo
    # En un caso real, necesitar√≠as m√°s datos de entrenamiento
    if qid in qrels:
        # Documentos relevantes
        for doc_id, rel in qrels[qid].items():
            if rel > 0:
                relevant_docs.append(doc_id)

        # Tomar algunos documentos no relevantes (muestreo negativo)
        # Por simplicidad, tomamos algunos documentos aleatorios
        all_docs = list(corpus.keys())
        non_relevant_samples = min(20, len(all_docs) - len(relevant_docs))
        non_relevant_docs = np.random.choice(
            [doc for doc in all_docs if doc not in relevant_docs],
            size=non_relevant_samples,
            replace=False
        ).tolist()

        # Extraer caracter√≠sticas para documentos relevantes
        for doc_id in relevant_docs:
            doc_text = corpus[doc_id].get("text", "")
            features = extract_features(query_text, doc_text, bm25, query_tokens)
            X.append(features)
            y.append(1)  # Relevante
            groups.append(qid)

        # Extraer caracter√≠sticas para documentos no relevantes
        for doc_id in non_relevant_docs:
            doc_text = corpus[doc_id].get("text", "")
            features = extract_features(query_text, doc_text, bm25, query_tokens)
            X.append(features)
            y.append(0)  # No relevante
            groups.append(qid)

X = np.array(X)
y = np.array(y)
groups = np.array(groups)

print(f"Datos preparados: {len(X)} ejemplos, {len(feature_names)} caracter√≠sticas")
print(f"Distribuci√≥n de clases: {sum(y)} relevantes, {len(y)-sum(y)} no relevantes")

Preparando datos de entrenamiento para LTR...
Datos preparados: 6339 ejemplos, 6 caracter√≠sticas
Distribuci√≥n de clases: 339 relevantes, 6000 no relevantes


In [35]:
print("Entrenando modelo LTR (LambdaRank)...")
ltr_model = lgb.train(
    params,
    lgb_train,
    num_boost_round=100,
    valid_sets=[lgb_test],
    callbacks=[lgb.early_stopping(10)]
    # ‚Üê ELIMINA 'verbose_eval' completamente
)

print("Modelo LTR entrenado")

Entrenando modelo LTR (LambdaRank)...
Training until validation scores don't improve for 10 rounds
Early stopping, best iteration is:
[6]	valid_0's ndcg@10: 0.987275
Modelo LTR entrenado


In [36]:
def rerank_with_ltr(query, candidates, corpus_dict, ltr_model, bm25_index, k=10):
    """
    Re-rankear candidatos usando modelo LTR

    Args:
        query: texto de la consulta
        candidates: lista de (doc_id, score_bm25) de la primera etapa
        corpus_dict: diccionario con documentos originales
        ltr_model: modelo LTR entrenado
        bm25_index: √≠ndice BM25 para calcular caracter√≠sticas
        k: cu√°ntos documentos devolver despu√©s del re-ranking

    Returns:
        Lista de (doc_id, score_ltr) ordenada descendente
    """
    query_tokens = word_tokenize(query.lower())

    # Extraer caracter√≠sticas para todos los candidatos
    features_list = []
    candidate_docs = []

    for doc_id, bm25_score in candidates:
        doc_text = corpus_dict[doc_id].get("text", "")
        features = extract_features(query, doc_text, bm25_index, query_tokens)
        features_list.append(features)
        candidate_docs.append(doc_id)

    # Predecir scores con el modelo LTR
    features_array = np.array(features_list)
    ltr_scores = ltr_model.predict(features_array)

    # Combinar doc_ids con scores LTR
    reranked_results = list(zip(candidate_docs, ltr_scores))

    # Ordenar por score descendente
    reranked_results.sort(key=lambda x: x[1], reverse=True)

    return reranked_results[:k]

In [37]:
# Almacenar resultados re-rankeados con LTR
ltr_reranked_results = {}

print("Aplicando re-ranking con LTR...")
for qid, query_text in queries.items():
    if qid in bm25_results:
        candidates = bm25_results[qid]

        # Aplicar re-ranking LTR
        reranked = rerank_with_ltr(
            query_text,
            candidates,
            corpus,
            ltr_model,
            bm25,
            k=10
        )

        ltr_reranked_results[qid] = reranked

        # Mostrar progreso
        if len(ltr_reranked_results) % 50 == 0:
            print(f"  Procesadas {len(ltr_reranked_results)} queries")

print(f"Re-ranking LTR completado para {len(ltr_reranked_results)} queries")

Aplicando re-ranking con LTR...
  Procesadas 50 queries
  Procesadas 100 queries
  Procesadas 150 queries
  Procesadas 200 queries
  Procesadas 250 queries
  Procesadas 300 queries
Re-ranking LTR completado para 300 queries


In [38]:
qid = "133"
query_text = queries[qid]

print(f"=== AN√ÅLISIS DE CAMBIOS DE POSICI√ìN LTR - Query {qid} ===")
print(f"Query: {query_text}\n")

# Resultados BM25 (antes del re-ranking)
bm25_top10 = [(doc_id, score) for doc_id, score in bm25_results[qid][:10]]

# Resultados despu√©s del re-ranking LTR
ltr_top10 = ltr_reranked_results[qid]

# Crear diccionarios para comparar posiciones
bm25_positions = {doc_id: i+1 for i, (doc_id, _) in enumerate(bm25_top10)}
ltr_positions = {doc_id: i+1 for i, (doc_id, _) in enumerate(ltr_top10)}

print("Top-10 ANTES del re-ranking (BM25):")
for i, (doc_id, score) in enumerate(bm25_top10, 1):
    relevance = qrels[qid].get(doc_id, 0)
    rel_str = f" [Relevancia: {relevance}]" if relevance > 0 else ""
    print(f"{i:2d}. Doc {doc_id} - Score BM25: {score:.4f}{rel_str}")

print("\n" + "="*60)

print("\nTop-10 DESPU√âS del re-ranking LTR:")
for i, (doc_id, score) in enumerate(ltr_top10, 1):
    relevance = qrels[qid].get(doc_id, 0)
    rel_str = f" [Relevancia: {relevance}]" if relevance > 0 else ""

    # Verificar cambio de posici√≥n
    old_pos = bm25_positions.get(doc_id, ">10")
    change = ""
    if doc_id in bm25_positions:
        pos_change = bm25_positions[doc_id] - i
        if pos_change > 0:
            change = f" ‚Üë Subi√≥ {pos_change} posiciones"
        elif pos_change < 0:
            change = f" ‚Üì Baj√≥ {abs(pos_change)} posiciones"
        else:
            change = " ‚Üí Sin cambio"
    else:
        change = " üÜï Nuevo en top-10"

    print(f"{i:2d}. Doc {doc_id} - Score LTR: {score:.4f}{rel_str}{change}")

print("\n" + "="*60)

# An√°lisis de cambios
print("\n=== RESUMEN DE CAMBIOS LTR ===")

# Documentos que entraron al top-10 despu√©s del re-ranking LTR
new_in_top10 = set([doc_id for doc_id, _ in ltr_top10]) - set([doc_id for doc_id, _ in bm25_top10])
print(f"Documentos NUEVOS en top-10: {len(new_in_top10)}")
for doc_id in new_in_top10:
    old_pos = ">10"
    new_pos = ltr_positions[doc_id]
    print(f"  - Doc {doc_id}: Posici√≥n {old_pos} ‚Üí {new_pos}")

# Documentos que salieron del top-10
dropped_from_top10 = set([doc_id for doc_id, _ in bm25_top10]) - set([doc_id for doc_id, _ in ltr_top10])
print(f"\nDocumentos que SALIERON del top-10: {len(dropped_from_top10)}")
for doc_id in dropped_from_top10:
    old_pos = bm25_positions[doc_id]
    new_pos = ">10"
    print(f"  - Doc {doc_id}: Posici√≥n {old_pos} ‚Üí {new_pos}")

# Cambios de posici√≥n para documentos que permanecen en top-10
common_docs = set([doc_id for doc_id, _ in bm25_top10]) & set([doc_id for doc_id, _ in ltr_top10])
significant_changes = []
for doc_id in common_docs:
    old_pos = bm25_positions[doc_id]
    new_pos = ltr_positions[doc_id]
    change = old_pos - new_pos
    if change != 0:  # Solo mostrar si hubo cambio
        significant_changes.append((doc_id, old_pos, new_pos, change))

print(f"\nCambios de posici√≥n (documentos que permanecen en top-10): {len(significant_changes)}")
for doc_id, old_pos, new_pos, change in sorted(significant_changes, key=lambda x: abs(x[3]), reverse=True):
    direction = "‚Üë Subi√≥" if change > 0 else "‚Üì Baj√≥"
    print(f"  - Doc {doc_id}: Pos {old_pos} ‚Üí {new_pos} ({direction} {abs(change)} posiciones)")

=== AN√ÅLISIS DE CAMBIOS DE POSICI√ìN LTR - Query 133 ===
Query: Assembly of invadopodia is triggered by focal generation of phosphatidylinositol-3,4-biphosphate and the activation of the nonreceptor tyrosine kinase Src.

Top-10 ANTES del re-ranking (BM25):
 1. Doc 5270265 - Score BM25: 59.8531
 2. Doc 26688294 - Score BM25: 58.6757
 3. Doc 45764440 - Score BM25: 58.1495
 4. Doc 12785130 - Score BM25: 55.5299
 5. Doc 37964706 - Score BM25: 55.0283
 6. Doc 9507605 - Score BM25: 53.1311
 7. Doc 35884026 - Score BM25: 52.3605
 8. Doc 5914739 - Score BM25: 51.9492
 9. Doc 10991183 - Score BM25: 50.8288
10. Doc 86694016 - Score BM25: 49.1914


Top-10 DESPU√âS del re-ranking LTR:
 1. Doc 5270265 - Score LTR: 0.4598 ‚Üí Sin cambio
 2. Doc 45764440 - Score LTR: 0.4598 ‚Üë Subi√≥ 1 posiciones
 3. Doc 37964706 - Score LTR: 0.4598 ‚Üë Subi√≥ 2 posiciones
 4. Doc 5914739 - Score LTR: 0.4598 ‚Üë Subi√≥ 4 posiciones
 5. Doc 86694016 - Score LTR: 0.4412 ‚Üë Subi√≥ 5 posiciones
 6. Doc 17123316 - Scor

## Parte 5. Evaluaci√≥n post re-ranking

Calcular m√©tricas:
* nDCG@10
* MAP
* Recall@10

In [39]:
from sklearn.metrics import ndcg_score
import numpy as np

def calculate_metrics(results_dict, qrels_dict, corpus_queries, k=10):
    """
    Calcular m√©tricas para un m√©todo de ranking

    Returns:
        ndcg_scores: lista de nDCG@10 por query
        map_scores: lista de MAP por query
        recall_scores: lista de Recall@10 por query
    """
    ndcg_scores = []
    map_scores = []
    recall_scores = []

    for qid in corpus_queries.keys():
        if qid not in results_dict or qid not in qrels_dict:
            continue

        # Obtener ranking del m√©todo
        ranked_results = results_dict[qid]
        top_k_docs = [doc_id for doc_id, _ in ranked_results[:k]]

        # Obtener relevancias verdaderas
        true_relevance = qrels_dict[qid]

        # Preparar arrays para nDCG
        actual_relevance = []
        for doc_id in top_k_docs:
            actual_relevance.append(true_relevance.get(doc_id, 0))

        # Obtener lista de documentos relevantes
        relevant_docs = [doc_id for doc_id, rel in true_relevance.items() if rel > 0]

        if not relevant_docs:  # Si no hay documentos relevantes, saltar
            continue

        # 1. Calcular nDCG@10
        # Orden ideal (por relevancia descendente)
        ideal_relevance = sorted([true_relevance.get(doc, 0) for doc in relevant_docs], reverse=True)
        ideal_relevance = ideal_relevance[:k]

        # Rellenar con ceros si es necesario
        while len(ideal_relevance) < k:
            ideal_relevance.append(0)
        while len(actual_relevance) < k:
            actual_relevance.append(0)

        try:
            ndcg = ndcg_score([ideal_relevance], [actual_relevance], k=k)
            ndcg_scores.append(ndcg)
        except:
            ndcg_scores.append(0.0)

        # 2. Calcular Recall@10
        retrieved_relevant = set(top_k_docs) & set(relevant_docs)
        recall = len(retrieved_relevant) / len(relevant_docs)
        recall_scores.append(recall)

        # 3. Calcular MAP (Mean Average Precision)
        # Precision en cada punto de recuperaci√≥n de documentos relevantes
        precision_values = []
        relevant_found = 0

        for i, doc_id in enumerate(top_k_docs, 1):
            if doc_id in relevant_docs:
                relevant_found += 1
                precision_at_i = relevant_found / i
                precision_values.append(precision_at_i)

        if precision_values:
            ap = sum(precision_values) / len(relevant_docs)  # Average Precision
        else:
            ap = 0.0
        map_scores.append(ap)

    return ndcg_scores, map_scores, recall_scores

print("Calculando m√©tricas para BM25...")
bm25_ndcg, bm25_map, bm25_recall = calculate_metrics(bm25_results, qrels, queries, k=10)

Calculando m√©tricas para BM25...


In [40]:
print("Calculando m√©tricas para Cross-Encoder...")
ce_ndcg, ce_map, ce_recall = calculate_metrics(reranked_results, qrels, queries, k=10)

Calculando m√©tricas para Cross-Encoder...


In [41]:
print("Calculando m√©tricas para LTR...")
ltr_ndcg, ltr_map, ltr_recall = calculate_metrics(ltr_reranked_results, qrels, queries, k=10)

Calculando m√©tricas para LTR...


In [42]:
import pandas as pd

# Crear tabla comparativa
metrics_data = {
    'M√©trica': ['nDCG@10', 'MAP', 'Recall@10'],
    'BM25': [
        np.mean(bm25_ndcg) if bm25_ndcg else 0,
        np.mean(bm25_map) if bm25_map else 0,
        np.mean(bm25_recall) if bm25_recall else 0
    ],
    'Cross-Encoder': [
        np.mean(ce_ndcg) if ce_ndcg else 0,
        np.mean(ce_map) if ce_map else 0,
        np.mean(ce_recall) if ce_recall else 0
    ],
    'LTR': [
        np.mean(ltr_ndcg) if ltr_ndcg else 0,
        np.mean(ltr_map) if ltr_map else 0,
        np.mean(ltr_recall) if ltr_recall else 0
    ]
}

metrics_df = pd.DataFrame(metrics_data)
metrics_df = metrics_df.round(4)

print("=== COMPARACI√ìN DE M√âTRICAS ===")
print(f"N√∫mero de queries evaluadas: {len(bm25_ndcg)}")
print("\nTabla comparativa:")
print(metrics_df.to_string(index=False))

=== COMPARACI√ìN DE M√âTRICAS ===
N√∫mero de queries evaluadas: 300

Tabla comparativa:
  M√©trica   BM25  Cross-Encoder    LTR
  nDCG@10 0.7247         0.7312 0.6808
      MAP 0.5790         0.6099 0.4979
Recall@10 0.7342         0.7713 0.6525


In [43]:
# Calcular mejoras relativas respecto a BM25
print("\n=== MEJORAS PORCENTUALES (vs BM25) ===")

for i, metrica in enumerate(['nDCG@10', 'MAP', 'Recall@10']):
    bm25_val = metrics_data['BM25'][i]
    ce_val = metrics_data['Cross-Encoder'][i]
    ltr_val = metrics_data['LTR'][i]

    if bm25_val > 0:  # Evitar divisi√≥n por cero
        ce_improvement = ((ce_val - bm25_val) / bm25_val) * 100
        ltr_improvement = ((ltr_val - bm25_val) / bm25_val) * 100

        print(f"\n{metrica}:")
        print(f"  Cross-Encoder: {ce_improvement:+.2f}% ({bm25_val:.4f} ‚Üí {ce_val:.4f})")
        print(f"  LTR: {ltr_improvement:+.2f}% ({bm25_val:.4f} ‚Üí {ltr_val:.4f})")


=== MEJORAS PORCENTUALES (vs BM25) ===

nDCG@10:
  Cross-Encoder: +0.89% (0.7247 ‚Üí 0.7312)
  LTR: -6.05% (0.7247 ‚Üí 0.6808)

MAP:
  Cross-Encoder: +5.34% (0.5790 ‚Üí 0.6099)
  LTR: -14.01% (0.5790 ‚Üí 0.4979)

Recall@10:
  Cross-Encoder: +5.05% (0.7342 ‚Üí 0.7713)
  LTR: -11.12% (0.7342 ‚Üí 0.6525)
