# Getting started

### CLEF 2025 - CheckThat! Lab - Task 4 Scientific Web Discourse - Subtask 4b (Scientific Claim Source Retrieval)

This notebook helps participants of Subtask 4b get started quickly. It includes the following:
- Code to load the data, including:
    - code to load the collection set (CORD-19 academic papers' metadata)
    - code to load the query set (tweets with implicit references to CORD-19 papers)
- Code to run a baseline retrieval model (BM25)
- Code to evaluate the baseline model

Participants are free to use this notebook and add their own models for the competition.

# 1) Importing data

In [None]:
import numpy as np
import pandas as pd

## 1.a) Import the collection set
The collection set contains metadata of CORD-19 academic papers.

The preprocessed and filtered CORD-19 dataset is available in the GitLab repository: https://gitlab.com/checkthat_lab/clef2025-checkthat-lab/-/tree/main/task4?ref_type=heads

Participants should first download the file and then upload it to the Google Colab session using the following steps.


In [None]:
# 1) Download the collection set from the Gitlab repository: https://gitlab.com/checkthat_lab/clef2025-checkthat-lab/-/tree/main/task4?ref_type=heads
# 2) Drag and drop the downloaded file to the "Files" section (left vertical menu on Colab)
# 3) Modify the path to your local file path
PATH_COLLECTION_DATA = 'subtask4b_collection_data.pkl' #MODIFY PATH

In [None]:
df_collection = pd.read_pickle(PATH_COLLECTION_DATA)

In [None]:
df_collection.info()

<class 'pandas.core.frame.DataFrame'>
Index: 7764 entries, 162 to 1056448
Data columns (total 17 columns):
 #   Column            Non-Null Count  Dtype         
---  ------            --------------  -----         
 0   cord_uid          7764 non-null   object        
 1   source_x          7764 non-null   object        
 2   title             7764 non-null   object        
 3   doi               7706 non-null   object        
 4   pmcid             4963 non-null   object        
 5   pubmed_id         6257 non-null   object        
 6   license           7764 non-null   object        
 7   abstract          7764 non-null   object        
 8   publish_time      7761 non-null   object        
 9   authors           7720 non-null   object        
 10  journal           6704 non-null   object        
 11  mag_id            0 non-null      float64       
 12  who_covidence_id  549 non-null    object        
 13  arxiv_id          20 non-null     object        
 14  label             7764 n

In [None]:
df_collection.head()

Unnamed: 0,cord_uid,source_x,title,doi,pmcid,pubmed_id,license,abstract,publish_time,authors,journal,mag_id,who_covidence_id,arxiv_id,label,time,timet
162,umvrwgaw,PMC,Professional and Home-Made Face Masks Reduce E...,10.1371/journal.pone.0002618,PMC2440799,18612429,cc-by,BACKGROUND: Governments are preparing for a po...,2008-07-09,"van der Sande, Marianne; Teunis, Peter; Sabel,...",PLoS One,,,,umvrwgaw,2008-07-09,1215561600
611,spiud6ok,PMC,The Failure of R (0),10.1155/2011/527610,PMC3157160,21860658,cc-by,"The basic reproductive ratio, R (0), is one of...",2011-08-16,"Li, Jing; Blakeley, Daniel; Smith?, Robert J.",Comput Math Methods Med,,,,spiud6ok,2011-08-16,1313452800
918,aclzp3iy,PMC,Pulmonary sequelae in a patient recovered from...,10.4103/0970-2113.99118,PMC3424870,22919170,cc-by-nc-sa,The pandemic of swine flu (H1N1) influenza spr...,2012,"Singh, Virendra; Sharma, Bharat Bhushan; Patel...",Lung India,,,,aclzp3iy,2012-01-01,1325376000
993,ycxyn2a2,PMC,What was the primary mode of smallpox transmis...,10.3389/fcimb.2012.00150,PMC3509329,23226686,cc-by,The mode of infection transmission has profoun...,2012-11-29,"Milton, Donald K.",Front Cell Infect Microbiol,,,,ycxyn2a2,2012-11-29,1354147200
1053,zxe95qy9,PMC,"Lessons from the History of Quarantine, from P...",10.3201/eid1902.120312,PMC3559034,23343512,no-cc,"In the new millennium, the centuries-old strat...",2013-02-03,"Tognotti, Eugenia",Emerg Infect Dis,,,,zxe95qy9,2013-02-03,1359849600


## 1.b) Import the query set

The query set contains tweets with implicit references to academic papers from the collection set.

The preprocessed query set is available in the GitLab repository: https://gitlab.com/checkthat_lab/clef2025-checkthat-lab/-/tree/main/task4?ref_type=heads

Participants should first download the file and then upload it to the Google Colab session using the following steps.

In [None]:
# 1) Download the query tweets from the Gitlab repository: https://gitlab.com/checkthat_lab/clef2025-checkthat-lab/-/tree/main/task4?ref_type=heads
# 2) Drag and drop the downloaded file to the "Files" section (left vertical menu on Colab)
# 3) Modify the path to your local file path
PATH_QUERY_DATA = 'subtask4b_query_tweets.tsv' #MODIFY PATH

In [None]:
df_query = pd.read_csv(PATH_QUERY_DATA, sep = '\t')

In [None]:
df_query.head()

Unnamed: 0,tweet_text,cord_uid
0,Oral care in rehabilitation medicine: oral vul...,htlvpvz5
1,this study isn't receiving sufficient attentio...,4kfl29ul
2,"thanks, xi jinping. a reminder that this study...",jtwb17u8
3,Taiwan - a population of 23 million has had ju...,0w9k8iy1
4,Obtaining a diagnosis of autism in lower incom...,tiqksd69


In [None]:
df_query.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14253 entries, 0 to 14252
Data columns (total 2 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   tweet_text  14253 non-null  object
 1   cord_uid    14253 non-null  object
dtypes: object(2)
memory usage: 222.8+ KB


In [None]:
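# Keep a fixed random sample of 20 queries so the rest of the notebook runs quickly.
# All scores reported below are computed on this sample; skip this cell to use the full query set.
np.random.seed(88)
df_query = df_query.sample(20)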

# 2) Running the baseline
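The following code runs a BM25 baseline. It indexes the concatenated title and abstract of each paper, tokenizes on single spaces, and returns the 1,000 highest-scoring papers for each tweet.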
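
To make the scoring step less of a black box, here is a minimal pure-Python sketch of a textbook Okapi BM25 formula. It is not the `rank_bm25` implementation: the function name `bm25_scores`, the smoothed IDF variant, the `k1` and `b` values, and the toy corpus are all assumptions chosen for illustration, so absolute scores may differ from what `BM25Okapi` returns.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, tokenized_docs, k1=1.5, b=0.75):
    """Score every document against the query with a textbook Okapi BM25 formula."""
    n_docs = len(tokenized_docs)
    avg_len = sum(len(doc) for doc in tokenized_docs) / n_docs

    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter()
    for doc in tokenized_docs:
        doc_freq.update(set(doc))

    scores = []
    for doc in tokenized_docs:
        term_freq = Counter(doc)
        score = 0.0
        for term in query_tokens:
            tf = term_freq[term]
            if tf == 0:
                continue
            # Smoothed IDF; the "+1" keeps it non-negative (an assumption of this sketch).
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Term-frequency saturation with document-length normalisation.
            norm = tf + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical toy corpus, tokenized on single spaces like the baseline.
docs = [
    "face masks reduce exposure to respiratory infections",
    "history of quarantine from plague to influenza",
    "smallpox transmission and airborne infection",
]
tokenized_docs = [doc.split(' ') for doc in docs]
query = "do face masks reduce infections".split(' ')

scores = bm25_scores(query, tokenized_docs)
# Document indices ordered from best to worst score.
ranking = sorted(range(len(docs)), key=lambda i: -scores[i])
print(ranking[0], scores)
```

Only the first toy document shares tokens with the query, so it ranks first. The third document contains "infection" but not "infections", which shows why plain whitespace tokenization without stemming or lowercasing can miss near matches.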


In [None]:
!pip install rank_bm25
from rank_bm25 import BM25Okapi



In [None]:
# Create the BM25 corpus: one "title abstract" string per paper
corpus = df_collection[['title', 'abstract']].apply(lambda x: f"{x['title']} {x['abstract']}", axis=1).tolist()
cord_uids = df_collection['cord_uid'].tolist()
# Whitespace tokenization (case-sensitive, punctuation is kept)
tokenized_corpus = [doc.split(' ') for doc in corpus]
bm25 = BM25Okapi(tokenized_corpus)

In [None]:
# Cache of query text -> ranked cord_uids, so a repeated tweet is scored only once
text2bm25top = {}

def get_top_cord_uids(query):
    if query in text2bm25top:
        return text2bm25top[query]

    tokenized_query = query.split(' ')
    doc_scores = bm25.get_scores(tokenized_query)
    # Indices of the 1000 highest-scoring documents, best first
    indices = np.argsort(-doc_scores)[:1000]
    bm25_topk = [cord_uids[x] for x in indices]

    text2bm25top[query] = bm25_topk
    return bm25_topk


In [None]:
# Retrieve topk candidates using the BM25 model
df_query['bm25_topk'] = df_query['tweet_text'].apply(lambda x: get_top_cord_uids(x))

# 3) Evaluating the baseline
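The following code evaluates the BM25 retrieval baseline on the query set using Mean Reciprocal Rank at cutoff k (MRR@k), reported for k = 1, 5, and 10. For each tweet, the score is 1/rank of the gold paper if it appears in the top k results and 0 otherwise; MRR@k is the mean of these scores over all tweets.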
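
As a worked illustration of the metric, the sketch below computes MRR@k by hand on three made-up queries. The helper names `reciprocal_rank` and `mrr_at_k` and the paper IDs are hypothetical and only serve the example; the notebook's own evaluation function follows in the next cell.

```python
def reciprocal_rank(gold_id, ranked_ids, k):
    """Return 1/rank of gold_id within the first k ranked_ids, or 0.0 if absent."""
    top_k = ranked_ids[:k]
    if gold_id in top_k:
        return 1.0 / (top_k.index(gold_id) + 1)
    return 0.0


def mrr_at_k(gold_ids, rankings, k):
    """Average the reciprocal ranks over all queries."""
    scores = [reciprocal_rank(g, r, k) for g, r in zip(gold_ids, rankings)]
    return sum(scores) / len(scores)


# Hypothetical toy data: three queries with made-up paper IDs.
gold_ids = ["p1", "p2", "p3"]
rankings = [
    ["p1", "p9", "p8"],  # gold paper at rank 1 -> 1
    ["p7", "p6", "p2"],  # gold paper at rank 3 -> 1/3 (0 when k < 3)
    ["p5", "p4", "p0"],  # gold paper missing   -> 0
]

mrr_at_1 = mrr_at_k(gold_ids, rankings, k=1)  # (1 + 0 + 0) / 3
mrr_at_5 = mrr_at_k(gold_ids, rankings, k=5)  # (1 + 1/3 + 0) / 3
print(mrr_at_1, mrr_at_5)
```

Raising k can only keep or increase the score, because a gold paper found at a lower rank adds a small positive term instead of 0.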

In [None]:
# Evaluate retrieved candidates using MRR@k
def get_performance_mrr(data, col_gold, col_pred, list_k=(1, 5, 10)):
    def reciprocal_rank(row, k):
        # 1 / rank of the gold ID within the top-k predictions, or 0 if it is missing
        top_k = list(row[col_pred][:k])
        return 1 / (top_k.index(row[col_gold]) + 1) if row[col_gold] in top_k else 0

    d_performance = {}
    for k in list_k:
        # Note: this overwrites the helper column "in_topx" on `data` for each k
        data["in_topx"] = data.apply(lambda row: reciprocal_rank(row, k), axis=1)
        d_performance[k] = data["in_topx"].mean()
    return d_performance


In [None]:
results = get_performance_mrr(df_query, 'cord_uid', 'bm25_topk')
# Printed MRR@k results in the following format: {k: MRR@k}
print(results)

{1: 0.45, 5: 0.475, 10: 0.4916666666666667}


# 4) Possible solutions

## 4.1) BM25 for Retrieval and SBERT Cross-Encoder for Reranking

In [None]:
!pip install sentence-transformers
from sentence_transformers import SentenceTransformer


In [None]:
from sentence_transformers import CrossEncoder
cross_encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")


In [None]:
# Look up the title and abstract of every candidate paper.
# The result is keyed by cord_uid, so duplicate IDs in the candidate list collapse to one entry.
def retrieve_paper(paper_ids):
    paper_dict = {}
    for paper_id in paper_ids:
        paper_data = df_collection[df_collection['cord_uid'] == paper_id]
        title = paper_data.iloc[0]['title']
        abstract = paper_data.iloc[0]['abstract']
        paper_dict[paper_id] = {'title': title, 'abstract': abstract}
    return paper_dict

df_query['title_abstract'] = df_query['bm25_topk'].apply(retrieve_paper)

In [None]:
import pandas as pd

# Set display options to show full DataFrame without truncation
pd.set_option('display.max_rows', None)  # Show all rows
pd.set_option('display.max_columns', None)  # Show all columns
pd.set_option('display.width', None)  # Adjust width for full view
pd.set_option('display.max_colwidth', None)

In [None]:
def rerank_with_crossencoder(row):
    tweet = row['tweet_text']
    title_abstracts = row['title_abstract']

    # Candidate IDs and their "title abstract" texts, in the same order
    paper_ids = list(title_abstracts.keys())
    documents = [f"{paper_data['title']} {paper_data['abstract']}"
                 for paper_data in title_abstracts.values()]

    # Score every (tweet, document) pair and keep the 10 best
    results = cross_encoder.rank(query=tweet, documents=documents,
                                 top_k=10, show_progress_bar=True)

    # 'corpus_id' is the position of the document in `documents`
    return [paper_ids[result['corpus_id']] for result in results]

df_query['bm25_cross_encoder_topk'] = df_query.apply(rerank_with_crossencoder, axis=1)


KeyError: "['bm25_cross_encoder_topk'] not in index"

In [None]:
# Check the result (this will contain the tweet and paper pairs)
df_query[['tweet_text', 'cord_uid', 'bm25_topk', 'bm25_cross_encoder_topk']].head()

Unnamed: 0,tweet_text,cord_uid,bm25_topk,cross_encoder_topk
8073,Sinovac. new safety/immunogenicity results in a participant's aged >18 yrs in phase 3 trial in Chile. Interim report: Safety/immunogenicity of an inactivated vaccine against SARS-CoV-2 in healthy Chilean adults in a phase 3 clinical trial | medRxiv,rcb943w2,"[rcb943w2, ssdqobqb, amw2c96l, dkfohub3, so6irh9b, rda2kmv1, kff7ho04, qa1tbu6t, 1uabldiq, 0pptqw7o, i1icueuw, r7aqsyfl, 6akx6xpt, awfgfmx1, ehmy9al1, tl39u0h9, 4vuucmfd, wdtrpnkn, hmkqnply, y77rbrnd, k2jzadwq, ukbhlaaa, 86qczmbt, nk8m2dxt, 3ncm8iwb, vxf2jexb, kdxoyvsv, tq8jpmin, hay91kuq, ztwemu74, uj5deryi, r4fd5vp6, t90pvtad, 9uxazk87, zbi0llig, nvtoxzka, 6274gicp, 4zcjjoc7, g2wpw330, 0n07fow0, 5rkxs2w3, z8758x5u, wabd3b9z, gh6bizzh, 5gshj480, zql4zhyk, 3g8asqkb, ydh4ve24, v7szuzfa, o71p89nb, y40s45iv, pvw9t7zd, z9jqbliw, ifxm3j4y, tz8gc3cz, 3qefts96, 7wsumadw, 3jfok14h, 8hbll61z, 5ojyly05, s86zb5up, 8x3lqokw, vo4csaah, 8ljnxihr, lyhdeks1, 8pz6131o, z218oii3, wt6azxc1, xj1nw76b, mck3rgcm, ix48o00j, ro29tdt8, 1hlqov6m, rpa2v44v, 7z6e5jhh, 4ttda53p, s4xintwp, u7tqolyi, bi9fid5f, 0igh53pl, h4sbz3md, mldwnz43, lq580iwg, 7twi1z5b, a45f2qtl, 1gv3t5t0, olv2kuwx, fdkwklzj, m1bvurwi, e0pz0z4j, mqyu72nc, rng65ofx, c1db9jlv, uexahhdr, 7iuls8x1, 8t2tic9n, 9jlnker4, wdfzrzkt, 5vp2r2bd, fvcyqzgb, ...]","[rcb943w2, 1uabldiq, dkfohub3, 0pptqw7o, ssdqobqb, amw2c96l, kdxoyvsv, rda2kmv1, hmkqnply, nvtoxzka]"
4754,"before discussing #vaccine impacts, is there a link between the ongoing #covid-19 pandemic &amp; rates of #erectiledysfunction❓ 🎉here is our #systematic review that examines this topic, taking into account both exposure and #lockdown effects.",lcqxrcji,"[8rv94jeu, l69kwh48, zmkvs7e7, j0uojajt, e2rtzymt, sxx3yid9, mkl7rvr6, wu2dyleo, 44hifagu, f0m3wuyj, o5zkv42f, cpbu3fv3, a0q61mpi, uppeztta, dbxufi3m, 4ohgr8j7, 6gcgzx36, g9ao6ruo, 2i7zd1t2, sijfzd2o, kt47i60w, l6zsf85e, l375v2my, 57qk34fx, fer4tlhz, ws7lp560, 3a7utmsd, 0t5n87u6, bk3ss63e, b0g8y96j, 4bbapd1t, 0lk8eujq, aq8ckfv7, cymppa7g, styavbvi, umyojne2, hsm75cww, v7udqoi0, o9xdv4x0, 42wcyjuk, p9qeaedh, 8hvve871, r5552ql7, b9b6ww9q, ekal5251, nemkary2, 9gnqfmbq, azr73w2w, z65wm00n, xdbtwiqw, otjnwb4s, wfa5fb3r, ib17jyva, lcqxrcji, ouno4jpl, n119a7fx, 5wlbalzu, oud5ioks, xxy10tcl, gg5c8v7d, 60e9lhfi, sh9w4ipq, c80hle1p, sajmj1jw, r4j1rcm5, 6sy80720, 2rx84imv, ierqfgo5, yslf5er3, 0ho4f5wg, 3lzm42wq, yq6jhupe, ueh60bz2, gatxuwz7, 8w2t6u96, scw4p9wq, mfop9fne, 3ees74tt, 1m2mp7o7, fbsrggog, y14lw5b2, y047cuxp, 2arx86a6, ncbjncfx, 2q6qmex3, a3qf6i28, aahsvh09, 0c03fjk5, 3r8jbhhq, v1egyqt4, 2yka2luy, q55l1chd, p4ejdth1, rvygtpvb, tmp6yxlv, oijl3pjt, xpqfpizp, rwh56zhg, lt7qsxxh, 4zelg0m1, ...]","[uppeztta, rcb943w2, aahsvh09, 8rv94jeu, 0rrhgz26, 9sv03cqa, v1egyqt4, o29hjkca, s7z1l3a9, to2wxs8m]"
11094,i just had a terrifying realization: hiv attaches with cd4 &gt; the body absorbs and suppresses cd4 &gt; long term - no cd4 t cells sars-cov-2 attaches with ace2 &gt; the body absorbs and suppresses ace2 &gt; long term - out of control raas (ang ii),wku1sd9k,"[s86zb5up, f5g3mcee, geo7ac5i, 25aj8rj5, xwax7o13, b52pn8t9, tov6uq27, 65fwicjz, 2u9eenwu, k6cumncp, t2gxkxxd, jzosdlu7, tffpan0f, 2jdlavwj, wku3qrtg, qv31t2vh, fxwszm22, est5jx7g, far6giyb, 0r46eacc, 0r46eacc, izlg8zu6, pspko62f, xcs9podj, 4unn3fmu, 1pbo1qlb, vmmwtdia, yr6z0eki, 8r6hln9b, 0ojzssli, jyxrk9bz, ltet5qu6, rzahax88, zmxmo1q8, n1q1wr9s, yrmeweat, utc0qrax, mcw3ir3a, h82s5xst, nc2sh98g, 6sy80720, bn4rpjv0, 50dmclk4, n3upa0xn, llo64qbp, n2wqalcu, tqx25jad, lehzj4d8, a64w0a1y, thaje9nj, hdjlnot1, aaxhrecp, 446p4tbc, 1iago9rr, ky5env7t, 4dsq1dds, a5av0dq8, g9ao6ruo, 7jwqm0b2, k1w2wxnu, k4ttbbix, 6vqf2n5j, 8xf8u6bg, 5tpkcd5z, 0hqe4jb5, ndhhy1xt, ruw45n81, hyezouy8, 8rvhqd9b, jb455t9p, m4u4ulml, l3k9bb5m, 89gzlrd1, xi17qo6z, ikoyaj3b, f6sm0w9y, 2199ydle, c187k4yc, irk1gxeq, b22cioi2, 6vkab51a, 4pyx0xps, 9y9s7tn3, kde51mn0, itx70h27, tv1a1hwj, o4vvlmr4, waerqfzu, wpuujmjx, 2i7zd1t2, hgg33kwz, lz8b60ew, 58r5wcwd, erygg5u2, cfkeqgj3, p4ejdth1, fa05sovj, 5dwkunqw, ajidoq7c, ioey6zgd, ...]","[hw2s28hs, x996e0cn, 1peg3502, 5hpbjkft, if40hlvg, g4250q6l, 8qml9rrb, 8xf8u6bg, d5w0ecuy, 4uqgpb0w]"
2290,the sars-cov-2 receptor-binding domain elicits a strong neutralizing response without antibody-dependent enhancement [🚨preprint],fnguelau,"[fnguelau, cd81i918, 3qefts96, f5s0ntps, 7xnga86x, t3dzso36, 8llmriik, wgfdd3lm, q7vv128t, m1bvurwi, 1m2mp7o7, awfgfmx1, mpeugvt9, moxjaxuy, gvaslwh9, pkxc2219, m89wmmxd, q3db7v16, v4pkwd44, q4zuslmp, 1qwq3jme, 8gub6wmy, 9f7k0q1h, ta3zlz4z, eczga3ur, 9k4pwc2h, x73moqog, kgmugkmw, d7c6l97j, h8ykn6ut, b36vq2pj, kq6rups8, 5jlohhmv, syy2r7jr, ag5coipu, g2wpw330, ksveiiid, 8bi9owk4, wt5qxe0j, lnnsxwk2, rw5bci0y, 53t1mhnb, spicb5h2, maxbdi99, eys1k8gb, f296patc, h8c8lc0a, d6mt6nze, toy8i1lm, hay91kuq, nt1e70dv, nagj4wh8, nrhk8ctf, w1jjy29i, vblfew3o, nj1p4ehx, rb20ge7e, 1l0vrbi0, gnhvvpn6, 1oatq6x0, k2qertx2, edsrpgjo, ksjkypzx, wv2gzahy, entkqcn3, jvo0nk3w, ahpkmffy, tmp6yxlv, 18swtk61, c18arb6s, z5jbjx59, l9bswegi, 6kbdxhxv, 4uqgpb0w, ngbnnpni, tq41ioyt, j7ltjn4k, hz5vigp0, jg7ycmf9, ratbgibg, 9p2pzsx0, 721ofhsv, u5rbvc3i, ybcr7clp, wku1sd9k, et3c99w0, 5jjoko32, tsyl7crt, park7t3g, 7yocj24n, loq68wfv, bs5hcx6l, onhhjbbt, 69wuny3p, 48ay8yl3, tkxzhf7c, t0zphgfl, 4vvuye8u, 2cq5vpyd, c80hle1p, ...]","[fnguelau, m13ndhaz, h8ykn6ut, wgfdd3lm, kq6rups8, m89wmmxd, 8llmriik, q3db7v16, gvaslwh9, kgmugkmw]"
4288,"Unfortunately, COVID during pregnancy poses several risks, including premature birth as shown in your photo.",5rpd8d0t,"[ipblyab0, rg5sd5ya, mzekhyu0, 0v55vvfd, hp62t734, ebgu29uh, ebgu29uh, w3gt0w41, 24t0lunz, vpoqfm7d, wa5rxe76, jw713e2n, sgezc9kx, nb5ayz0h, y2bsx8p2, y9kkl2lf, q5ie1v0v, 68x50rni, ltet5qu6, d81rq6b9, 4u3v4vyu, 5rpd8d0t, tyhwtt7j, 6j45mmjh, 3xzivp7d, 1tlduxz2, rm6av6sj, ptnjbhtw, 0r580il2, uv5jctnd, 5j1pce5e, nvbt5gxl, t13pehgc, bjvg2ivr, maj8r6ti, ilnudz0g, jyxrk9bz, v9iq01dc, ccjzc1x0, c8sthkc6, eakfj0wv, 0bk9jmdu, l9lj98b1, dwgxrbag, 06fwhyac, pmcvhg2i, 97msfh4l, aaxhrecp, ttytoz3v, ue09khtb, tmgmqtjq, 37gpov43, a68y6qsf, bgsbbszf, 8rvhqd9b, 250glj07, q3h8afld, ks0zkp57, 93y40vnp, pxpghfhq, ia8rou81, fm73ly1x, z22g03v4, afytjnny, yvdaljz9, byvsuvn0, n85adjg5, styavbvi, 8pemlz7a, a5av0dq8, vmmwtdia, s6u194yd, uoj3sclr, g4nceu64, wrl7buxr, jbimjcx4, c3a85ld0, porptfqn, ocl5qf9o, tonemvd7, 18b6ikq3, 6cttilm8, lstwxv4k, w7smahni, qv1xdqau, bxjmeqgp, 7jwqm0b2, 9183740p, xii14gan, b4tqd8wx, 3cl3q80m, 4mlz5w0j, kl8kg0yv, xn2zi1uq, vvxz7pei, vq9m9m94, 5wlbalzu, r76tqrwz, h0spcvjl, 407va85q, ...]","[yvdaljz9, c3a85ld0, wa5rxe76, vpoqfm7d, 68x50rni, l9lj98b1, 18b6ikq3, fyy42zaf, 37gpov43, xn2zi1uq]"


In [None]:
results = get_performance_mrr(df_query, 'cord_uid', 'bm25_cross_encoder_topk')
# Printed MRR@k results in the following format: {k: MRR@k}
print(results)

{1: 0.55, 5: 0.5916666666666666, 10: 0.5916666666666666}
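
The retrieve-then-rerank pattern used in this section can be sketched without any model dependencies. In the toy example below, `first_stage` stands in for BM25 and `toy_pair_score` stands in for the cross-encoder; both functions, the Jaccard scoring, and the three-document corpus are assumptions made for illustration only, not the models used above.

```python
def first_stage(query, corpus, top_n):
    """Cheap candidate retrieval: count query tokens shared with each document."""
    query_tokens = set(query.lower().split())
    overlap = {
        doc_id: len(query_tokens & set(text.lower().split()))
        for doc_id, text in corpus.items()
    }
    return sorted(overlap, key=lambda doc_id: -overlap[doc_id])[:top_n]


def toy_pair_score(query, text):
    """Stand-in for a cross-encoder: Jaccard similarity of the two token sets."""
    q, d = set(query.lower().split()), set(text.lower().split())
    return len(q & d) / len(q | d)


def rerank(query, candidate_ids, corpus, top_k):
    """Rescore only the first-stage candidates and keep the best top_k."""
    scored = [(toy_pair_score(query, corpus[doc_id]), doc_id) for doc_id in candidate_ids]
    scored.sort(key=lambda pair: -pair[0])
    return [doc_id for _, doc_id in scored[:top_k]]


# Hypothetical corpus with made-up IDs.
corpus = {
    "a": "vaccine safety results in adults and children across many long clinical trials worldwide",
    "b": "vaccine safety in adults",
    "c": "history of quarantine",
}
query = "vaccine safety in adults"

candidates = first_stage(query, corpus, top_n=2)
reranked = rerank(query, candidates, corpus, top_k=2)
print(candidates, reranked)
```

The first stage cannot separate documents "a" and "b" because both contain all four query tokens, while the second-stage scorer promotes "b". The reranker only sees what the first stage returns, so a gold paper missing from the candidate list cannot be recovered later.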
