# WOWS-Eval Autoqrels Pointwise Baseline

This is a pointwise autoqrels baseline to WOWS-EVAL that uses autoqrels to predict the probability that a document is relevant to a query.

In [None]:
!pip3 install wows-eval 'git+https://github.com/mam10eks/autoqrels.git'

## Step 2: Load the Data

Pointwise models have a query and a document as input and output the probability that the document is relevant into a field `probability_relevant`. For this naive baseline, we always predict a probability of 0.5.

In the following, we will process the pointwise smoke test dataset. Please modify the variable `DATASET_ID` to submit for other datasets. See [tira.io/datasets?query=wows-eval](https://archive.tira.io/datasets?query=wows-eval) for an complete overview of dataset identifiers.


In [5]:
from tira.rest_api_client import Client
from wows_eval import evaluate as wows_evaluate
from autoqrels.zeroshot import GradedMonoPrompt
import pandas as pd

# For measuring consumed resources (e.g., GPU, CPU, RAM, etc.)
from tira_measure import Environment

pd.set_option('display.max_colwidth', None)

DATASET_ID = 'wows-eval/pointwise-smoke-test-20250128-training'

tira = Client()
input_data = tira.pd.inputs(DATASET_ID)

## Step 3: Look at the data

In [3]:
input_data.head(2)

Unnamed: 0,id,query,unknown
0,32d23068-7440-4891-9958-42325f98a604,who sings monk theme song,This is a reference to the minor controversy that brewed among Monk fans over the introduction of the new theme song It's A Jungle Out There written and performed by Randy Newman in the second season of Monk.
1,cde83146-ac3e-4bc5-a959-f2006ac7b8de,who sings monk theme song,"Walker, Texas Ranger. Chuck Norris thought “Eyes of a Ranger” would be the perfect theme song for his new show Walker, Texas Ranger. He wanted his friend Randy Travis should sing it, but CBS had a different idea: The network suggested Norris sing the theme himself."


## Step 4: Implement the AutoQrels approach

Here, we use a prompted language model (you can modify the prompt and the backbone model) to predict the relevance of the document to the query and store the probability in the field `probability_relevant`. We wrap all computations into a `tira_measure.Environment` to measure the resources consumed for our computations to later include the used resources into the ir-metadata of our run.

In [6]:
BACKBONE_MODEL = "flan-t5-small"

PROMPT = """Instruction: Indicate if the passage answers the question.
###
Example 1:
Question: At about what age do adults normally begin to lose bone mass?
Passage: For most people, bone mass peaks during the third decade of life. By this age, men typically have accumulated more bone mass than women. After this point, the amount of bone in the skeleton typically begins to decline slowly as removal of old bone exceeds formation of new bone.
Answer: Perfectly relevant
###
Example 2:
Question: when and where did the battle of manassas take place
Passage: Summary of the Battle of Bull Run. The conflict took place close to Manassas Junction, Virginia. Around 35,000 Union soldiers marched from Washing D.C. towards Bull Run (a small river) where a 20,000 troop Confederate force was stationed.
Answer: Irrelevant
###
Example 3:
Question: which kind of continental boundary is formed where two plates move horizontally past one another?
Passage: One plate slides horizontally past another. The best-known example is the earthquake-prone San Andreas Fault Zone of California, which marks the boundary between the Pacific and North America Plates. See: Teaching About Transform Plate Boundaries.
Answer: Highly relevant
###
Example 4:
Question: what foods should you stay away from if you have asthma
Passage: Get early and regular prenatal care. The first 8 weeks of your pregnancy are important to your baby's development. Early and regular prenatal care can boost your chances of having a safe pregnancy and a healthy baby. Prenatal care includes screenings, regular exams, pregnancy and childbirth education, and counseling and support.
Answer: Irrelevant
###
Example 5:
Question: what is lbm in body composition
Passage: They also measured the participantsâ€™ body fat and â€œlean body massâ€ â€“ a measure. Trusted Source. of muscle mass, obtained by subtracting the body fat weight from the total body weight.
Answer: Relevant
###
Example 6:
Question: {{ query_text }}
Passage: {{ unk_doc_text }}
Answer:"""

In [7]:
autoqrels_assessor = GradedMonoPrompt(
    backbone=f'google/{BACKBONE_MODEL}',
    prompt=PROMPT,
    dataset=None
)

## Step 5: Run the predictions and look at the outputs

In [8]:
with Environment().measure() as tracked_experiment:
    predictions = autoqrels_assessor.predict(input_data)

  0%|          | 0/2 [00:00<?, ?it/s]Passing a tuple of `past_key_values` is deprecated and will be removed in Transformers v4.48.0. You should pass an instance of `EncoderDecoderCache` instead, e.g. `past_key_values=EncoderDecoderCache.from_legacy_cache(past_key_values)`.
100%|██████████| 2/2 [00:03<00:00,  1.84s/it]


In [9]:
predictions.head(2)

Unnamed: 0,id,query,unknown,probability_relevant
0,32d23068-7440-4891-9958-42325f98a604,who sings monk theme song,This is a reference to the minor controversy that brewed among Monk fans over the introduction of the new theme song It's A Jungle Out There written and performed by Randy Newman in the second season of Monk.,0.288885
1,cde83146-ac3e-4bc5-a959-f2006ac7b8de,who sings monk theme song,"Walker, Texas Ranger. Chuck Norris thought “Eyes of a Ranger” would be the perfect theme song for his new show Walker, Texas Ranger. He wanted his friend Randy Travis should sing it, but CBS had a different idea: The network suggested Norris sing the theme himself.",0.226886


## Step 6: Evaluate and Submit Your Run

We use the `wows_evaluate` method imported above to evaluate our predictions and to upload them, to TIRA.

The `wows_evaluate` method has optional parameters that you can pass to describe your system and to include the resource measurements used during your computations in the ir-metadata format into your submission. You can remove those attributes or modify them for your submission accordingly. Call `help(wows_evaluate)` to see a full description.

In [10]:
wows_evaluate(
    predictions,
    DATASET_ID,
    environment=tracked_experiment,
    upload=True,
    system_name=f'auto-qrels-pointwise-{BACKBONE_MODEL}',
    system_description="We use autoqrels [1] with a custom in-context learning prompt for pointwise relevance judgments.\n\n[1] - https://github.com/seanmacavaney/autoqrels",
)

Run uploaded to TIRA. Claim ownership via: https://www.tira.io/claim-submission/23ff9bbe-1b4a-4371-80d7-3035045b1f0c


Unnamed: 0,system,tau_ap,kendall,spearman,pearson
0,auto-qrels-pointwise-flan-t5-small,0.323889,0.380952,0.564286,0.564286


## Step 7: Register to TIRA and to the WOWS-EVAL task

To finalize your submission to WOWS-EVAL, you must now claim your submission via the URL printed above. For this, please register at [tira.io](https://www.tira.io) and navigate to the [WOWS-EVAL](https://www.tira.io/task-overview/wows-eval/) task and click on "Register". You can choose your team name from a list of [fictional](https://en.wikipedia.org/wiki/Category:Fictional_librarians) and [real](https://en.wikipedia.org/wiki/List_of_librarians) librarians ([please drop a message](#contact) if your favourite team name is not in the list).

For instance, after clicking on the ownership link above, you can claim ownership via a form in TIRA (where you can potentially overwrite the name and description of your system):


![example of claim ownership form](../figures/claim-run-in-tira.png)