# WOWS-Eval Retrieval Baseline

This is a retrieval baseline for WOWS-Eval that uses a retrieval model to assign the probability that an unknown document is relevant. For each query, the model ranks all unknown documents that are to be judged for that query. The probability that a document is relevant is then the min-max normalized rank of that document in the ranking of all unknown documents for the query.
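
As a minimal sketch of the normalization described above, the following maps each rank `r` to `(r - min) / (max - min)`. The helper name `min_max_normalize` and the example ranks are illustrative assumptions; they are not part of `wows-eval` or PyTerrier.

```python
def min_max_normalize(ranks):
    """Map each rank r to (r - min) / (max - min), a value in [0, 1].

    `ranks` is a hypothetical dict from document ID to retrieval rank.
    """
    if not ranks:
        return {}
    lo, hi = min(ranks.values()), max(ranks.values())
    if hi == lo:
        # A single rank value cannot be scaled; 0.0 is an arbitrary choice here.
        return {doc: 0.0 for doc in ranks}
    return {doc: (r - lo) / (hi - lo) for doc, r in ranks.items()}


# Illustrative ranks for three documents (0 is the top-ranked document).
example = min_max_normalize({"0-unkn": 0, "1-unkn": 1, "2-unkn": 2})
print(example)  # {'0-unkn': 0.0, '1-unkn': 0.5, '2-unkn': 1.0}
```

With this formula, the top-ranked document maps to 0.0 and the last-ranked document maps to 1.0, which is also how the normalization in Step 4 behaves.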

## Step 1: Install Dependencies



In [None]:
!pip3 install 'wows-eval>=0.0.6' python-terrier==0.10.0

## Step 2: Load the Data

Pointwise models take a query and a document as input and write the probability that the document is relevant to a field `probability_relevant`. For this retrieval baseline, we take a PyTerrier retrieval model and use the min-max normalized retrieval rank as the probability.
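
To make the pointwise interface concrete, here is a minimal sketch that needs neither PyTerrier nor TIRA. The function `toy_overlap_score` is a hypothetical stand-in for a real model (it is not BM25), and the two rows are made up; only the field names `id`, `query`, `unknown`, and `probability_relevant` come from this notebook.

```python
def toy_overlap_score(query, document):
    """Hypothetical scorer: fraction of query terms that occur in the document."""
    query_terms = set(query.lower().split())
    doc_terms = set(document.lower().split())
    if not query_terms:
        return 0.0
    return len(query_terms & doc_terms) / len(query_terms)


def predict_pointwise(rows, score):
    """Produce one prediction per input row, keyed by the row's id."""
    return [
        {"id": row["id"], "probability_relevant": score(row["query"], row["unknown"])}
        for row in rows
    ]


# Made-up rows that follow the column layout of the input data.
toy_rows = [
    {"id": "a", "query": "monk theme song", "unknown": "the theme song of monk"},
    {"id": "b", "query": "monk theme song", "unknown": "a recipe for bread"},
]
toy_predictions = predict_pointwise(toy_rows, toy_overlap_score)
print(toy_predictions)
```

Each input row yields exactly one output record, so any scoring function with the signature `score(query, document)` could be swapped in for the toy scorer.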

In the following, we process the pointwise smoke test dataset. Modify the variable `DATASET_ID` to submit to other datasets. See [tira.io/datasets?query=wows-eval](https://archive.tira.io/datasets?query=wows-eval) for a complete overview of dataset identifiers.


In [1]:
import pyterrier as pt
from tqdm import tqdm
if not pt.started():
    pt.init()

from tira.rest_api_client import Client
from wows_eval import evaluate as wows_evaluate
import pandas as pd
from jnius import autoclass
import numpy as np

# For measuring consumed resources (e.g., GPU, CPU, RAM, etc.)
from tirex_tracker import tracking, ExportFormat

pd.set_option('display.max_colwidth', None)

# Dataset IDs visible at https://archive.tira.io/datasets?query=wows-eval
DATASET_ID = 'wows-eval/pointwise-smoke-test-20250128-training'
#DATASET_ID = 'wows-eval/pairwise-20250309-test'

tira = Client()
input_data = tira.pd.inputs(DATASET_ID)

PyTerrier 0.10.0 has loaded Terrier 5.11 (built by craig.macdonald on 2025-01-13 21:29) and terrier-helper 0.0.8



## Step 3: Look at the Data

In [2]:
input_data.head(2)

| | id | query | unknown |
|---|---|---|---|
| 0 | 32d23068-7440-4891-9958-42325f98a604 | who sings monk theme song | This is a reference to the minor controversy that brewed among Monk fans over the introduction of the new theme song It's A Jungle Out There written and performed by Randy Newman in the second season of Monk. |
| 1 | cde83146-ac3e-4bc5-a959-f2006ac7b8de | who sings monk theme song | Walker, Texas Ranger. Chuck Norris thought “Eyes of a Ranger” would be the perfect theme song for his new show Walker, Texas Ranger. He wanted his friend Randy Travis should sing it, but CBS had a different idea: The network suggested Norris sing the theme himself. |


## Step 4: Implement the Approach

We wrap all computations in a [tirex_tracker.tracking](https://github.com/tira-io/tirex-tracker/) environment to measure the resources consumed by our computations and to record a snapshot of our code in the [ir-metadata format](https://www.ir-metadata.org/).

In [3]:
WMODEL = "BM25"
system_name = f'pointwise-retrieval-{WMODEL}'.lower()

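def retrieve_and_normalize(query, retrieval_system, unk):
    """Rank the unknown documents for `query` and min-max normalize their ranks.

    `unk` maps docno -> document text; the result maps document text -> value in [0, 1].
    """
    # Tokenise the query with Terrier's tokeniser before searching.
    tokeniser = autoclass("org.terrier.indexing.tokenisation.Tokeniser").getTokeniser()
    query_text = " ".join(tokeniser.getTokens(query))
    run = retrieval_system.search(query_text)
    last_rank = -1
    scores = {}
    for _, i in run.iterrows():
        # Sanity check: the run must be ordered by strictly increasing rank.
        assert last_rank < i['rank']
        last_rank = i['rank']
        if i['docno'] in unk:
            scores[unk[i['docno']]] = i['rank']

    if not scores:
        # No document was retrieved; callers fall back to their default value.
        return {}

    max_score = max(scores.values())
    min_score = min(scores.values())
    if max_score == min_score:
        # Only one document was retrieved; avoid dividing by zero.
        return {k: 0.0 for k in scores}

    # Note: the top-ranked document maps to 0.0 and the last-ranked one to 1.0.
    return {k: ((v-min_score)/(max_score-min_score)) for k, v in scores.items()}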

!rm -Rf run
with tracking(export_file_path='run/.metadata.yml', export_format=ExportFormat.IR_METADATA) as tracked:
    queries = set(input_data['query'].unique())

    def unknown_documents(query):
        docs = set(input_data[input_data['query'] == query]['unknown'].unique())
        return {f'{i[0]}-unkn': i[1] for i in zip(range(len(docs)), docs)}

    results = {}
    for query in tqdm(queries):
        unk = unknown_documents(query)

        docs = [{'docno': k, 'text': v} for k, v in unk.items()]
        indexer = pt.IterDictIndexer("/tmp/index", overwrite=True, meta={'docno': 100, 'text': 20480})
        index_ref = indexer.index(docs)
        retriever = pt.BatchRetrieve(index_ref, wmodel=WMODEL)
        results[query] = retrieve_and_normalize(query, retriever, unk)

    predictions = []
    for _, i in input_data.iterrows():
        res = results[i['query']]
        predictions.append({
            'id': i['id'],
            'probability_relevant': res.get(i['unknown'], -1)
        })
    predictions = pd.DataFrame(predictions)


PCM Info: setrlimit for file limit 1000000 failed with error Operation not permitted

=====  Processor information  =====
Linux arch_perfmon flag  : yes
Hybrid processor         : yes
IBRS and IBPB supported  : yes
STIBP supported          : yes
Spec arch caps supported : yes
Max CPUID level          : 32
CPU model number         : 154
ERROR: Can not open /sys/module/msr/parameters/allow_writes file.
PCM Error: can't open MSR handle for core 0 (No such file or directory)
Try no-MSR mode by setting env variable PCM_NO_MSR=1
Can not access CPUs Model Specific Registers (MSRs).
execute 'modprobe msr' as root user, then execute pcm as root user.
100%|██████████| 2/2 [00:00<00:00,  2.51it/s]
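
The cell above deduplicates the unknown documents of each query, gives them IDs of the form `0-unkn`, and assigns -1 to documents that the retriever did not return (for BM25, typically documents that share no term with the query). The following sketch reproduces that bookkeeping in plain Python. The helper names and toy data are hypothetical, and sorting the documents is an assumption added here to make the IDs reproducible.

```python
def assign_doc_ids(documents):
    """Deduplicate documents and give each an ID of the form '<n>-unkn'.

    Sorting makes the IDs reproducible; the notebook cell iterates over a set instead.
    """
    return {f"{n}-unkn": doc for n, doc in enumerate(sorted(set(documents)))}


def lookup_with_fallback(normalized, document, fallback=-1):
    """Return the normalized rank of `document`, or `fallback` if it was not retrieved."""
    return normalized.get(document, fallback)


toy_docs = ["b text", "a text", "b text"]
doc_ids = assign_doc_ids(toy_docs)
print(doc_ids)  # {'0-unkn': 'a text', '1-unkn': 'b text'}

# Pretend the retriever returned only "a text".
toy_normalized = {"a text": 0.0}
print(lookup_with_fallback(toy_normalized, "a text"))  # 0.0
print(lookup_with_fallback(toy_normalized, "b text"))  # -1
```

Because -1 lies outside the range [0, 1], documents that were not retrieved stay distinguishable from documents that were ranked.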


## Step 5: Evaluate and Submit Your Run

We use the `wows_evaluate` method imported above to evaluate our predictions and upload them to TIRA.

The `wows_evaluate` method has optional parameters that describe your system and add the resource measurements recorded during your computations, in the ir-metadata format, to your submission. You can remove or modify those parameters for your submission as needed. Call `help(wows_evaluate)` to see a full description.

In [4]:
wows_evaluate(
    predictions,
    DATASET_ID,
    tracking_results=tracked,
    upload=True,
    system_name=system_name,
    system_description=f'We use the PyTerrier retrieval model {WMODEL} to assign the probability that an unknown document is relevant. We rank all unknown documents that are to-be judged against the query. The probability that a document is relevant is then the min-max normalized rank of an unknown document in the ranking of all unknown documents to the query.'
)

Run uploaded to TIRA. Claim ownership via: https://www.tira.io/claim-submission/18f7f8aa-192f-4d08-a03b-0866d540ccae


| | system | tau_ap | kendall | spearman | pearson |
|---|---|---|---|---|---|
| 0 | pointwise-retrieval-bm25 | 0.040556 | 0.085714 | 0.1 | 0.1 |
