# CheckThat Task 2

The goal here is to find a method to extract a claim from a passage of text. For example:


- **Passage**: Hydrate YOURSELF W After Waking Up Water 30 min Before a Meal DRINK Before Taking a Shower →→ Before Going to Bed at the correct time T A YE Helps activate internal organs Helps digestion Helps lower blood pressure Helps to avoid heart attack Health+ by Punjab Kesari

- **Claim**: Drinking water at specific times can have different health benefits


To evaluate our method, we will use the **METEOR** metric on the English data of the **CLEF 2025** CheckThat! Lab Task 2 dataset.




## Data Acquisition

In [19]:
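import os
import requests

def download(url, path=None):
    """Download `url` into `path` (or the current directory), skipping files that already exist."""
    # Derive the local filename from the URL, dropping any query string such as "?inline=false"
    filename = url.split("/")[-1].split("?")[0]
    if path is not None:
        filename = os.path.join(path, filename)

    # Check for an existing copy before making the request, so reruns don't hit the network
    if os.path.exists(filename):
        return filename

    with requests.get(url, stream=True, timeout=60) as response:
        response.raise_for_status()  # fail loudly on 404/500 instead of saving an error page
        with open(filename, "wb") as f:
            for chunk in response.iter_content(chunk_size=1024):
                if chunk:  # filter out keep-alive chunks
                    f.write(chunk)
    return filename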
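`download` derives the local filename with two string splits, which works for these GitLab raw URLs. A minimal alternative sketch using the standard library's URL parser is below; `filename_from_url` is a hypothetical helper, not part of the notebook. Unlike the split-based version, it also drops URL fragments such as `#top`.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

def filename_from_url(url: str) -> str:
    """Return the last path segment of `url`, ignoring any query string or fragment."""
    # urlsplit separates the path ("/.../train-eng.csv") from the query ("inline=false")
    return PurePosixPath(urlsplit(url).path).name

url = "https://gitlab.com/checkthat_lab/clef2025-checkthat-lab/-/raw/main/task2/data/train/train-eng.csv?inline=false"
print(filename_from_url(url))  # train-eng.csv
```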

## METEOR Metric

In [20]:
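import nltk

# WordNet supplies the synonym matches used by METEOR.
# word_tokenize additionally needs NLTK's Punkt tokenizer models; if it raises a LookupError,
# download them with nltk.download('punkt_tab') (recent NLTK releases) or nltk.download('punkt') (older ones).
nltk.download('wordnet')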

[nltk_data] Downloading package wordnet to
[nltk_data]     C:\Users\jwilder\AppData\Roaming\nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


True

In [21]:
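from nltk.translate import meteor
from nltk import word_tokenize

def evaluate_claim_extraction(reference: str, hypothesis: str, precision: int = 4) -> float:
    # METEOR scores a tokenized hypothesis against a list of tokenized references
    # (here a single reference); the result is rounded to `precision` decimals.
    return round(meteor([word_tokenize(reference)], word_tokenize(hypothesis)), precision)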

text_passage = "Hydrate YOURSELF W After Waking Up Water 30 min Before a Meal DRINK Before Taking a Shower →→ Before Going to Bed at the correct time T A YE Helps activate internal organs Helps digestion Helps lower blood pressure Helps to avoid heart attack Health+ by Punjab Kesari"
claim = "Drinking water at specific times can have different health benefits"
score = evaluate_claim_extraction(text_passage, claim)
print(score)

text_passage = "I enjoy eating soup"
claim = "I consume soup sometimes"
score = evaluate_claim_extraction(text_passage, claim)
print(score)

0.0566
0.25
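The `meteor` call hides the arithmetic. NLTK first aligns hypothesis and reference words (lowercased exact matches, then stem matches, then WordNet synonyms) and then combines precision, recall, and a fragmentation penalty, with defaults `alpha=0.9`, `beta=3`, `gamma=0.5`. The sketch below covers only that final scoring step; `meteor_from_counts` is a hypothetical helper that takes the alignment counts as inputs instead of computing the alignment.

```python
def meteor_from_counts(matches: int, hyp_len: int, ref_len: int, chunks: int,
                       alpha: float = 0.9, beta: float = 3.0, gamma: float = 0.5) -> float:
    """METEOR score from alignment counts, using NLTK's default parameters."""
    if matches == 0:
        return 0.0
    precision = matches / hyp_len
    recall = matches / ref_len
    # Recall-weighted harmonic mean: with alpha=0.9 this equals 10PR / (R + 9P)
    fmean = (precision * recall) / (alpha * precision + (1 - alpha) * recall)
    # Fragmentation penalty: fewer, longer contiguous chunks give a smaller penalty
    penalty = gamma * (chunks / matches) ** beta
    return (1 - penalty) * fmean

# "I enjoy eating soup" vs "I consume soup sometimes": assuming only "I" and "soup"
# align, there are 2 matches forming 2 separate chunks, over 4 tokens on each side.
print(meteor_from_counts(matches=2, hyp_len=4, ref_len=4, chunks=2))  # 0.25
```

Under that assumed alignment, P = R = 0.5 gives F = 0.5, and two chunks for two matches gives the maximum penalty of 0.5, so the score is 0.25, matching the second value printed above.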


In [22]:

# The dev split serves as our held-out test set.
train_url = "https://gitlab.com/checkthat_lab/clef2025-checkthat-lab/-/raw/main/task2/data/train/train-eng.csv?inline=false"
test_url = "https://gitlab.com/checkthat_lab/clef2025-checkthat-lab/-/raw/main/task2/data/dev/dev-eng.csv?inline=false"

os.makedirs("data", exist_ok=True)
download(train_url, "data")
download(test_url, "data")


## Dataset

In [23]:
import csv
from torch.utils.data import Dataset
from typing import Dict, List

class ClaimVerificationDataset(Dataset):
    """Passage/claim pairs read from a Task 2 CSV file (column 0 = text, column 1 = claim)."""

    def __init__(self, csv_path: str):
        self.csv_path = csv_path
        self.data = self.parse_csv(self.csv_path)

    def parse_csv(self, csv_path: str) -> List[Dict[str, str]]:
        csv_data = []
        # newline="" lets the csv module handle line breaks inside quoted fields.
        # A missing or malformed file raises here instead of silently leaving self.data as None.
        with open(csv_path, "r", encoding="utf8", newline="") as file:
            csv_reader = csv.reader(file)
            next(csv_reader)  # skip header
            for row in csv_reader:
                csv_data.append({"text": row[0], "claim": row[1]})
        return csv_data

    def __len__(self) -> int:
        return len(self.data)

    def __getitem__(self, index: int) -> Dict[str, str]:
        return self.data[index]


In [24]:
train_dataset = ClaimVerificationDataset("data/train-eng.csv")
test_dataset = ClaimVerificationDataset("data/dev-eng.csv")

In [25]:
print(f"Train dataset length: {len(train_dataset)}")
print(f"Test dataset length: {len(test_dataset)}")

Train dataset length: 11374
Test dataset length: 1171
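`parse_csv` reads the files with the `csv` module rather than splitting lines on commas. That matters whenever a passage contains commas or line breaks inside a quoted field (the Python docs also recommend opening CSV files with `newline=''` for this reason). The sample below is hypothetical, built only to show the difference; `parse_rows` mirrors the parsing loop of `parse_csv`.

```python
import csv
import io

# Hypothetical two-column sample: the passage contains a comma and a line break,
# so it is wrapped in quotes.
SAMPLE = 'text,claim\n"Helps digestion, helps sleep\nDrink water",Drinking water has health benefits\n'

def parse_rows(handle) -> list:
    """Read (text, claim) rows the same way parse_csv does, skipping the header."""
    reader = csv.reader(handle)
    next(reader)  # skip header
    return [{"text": row[0], "claim": row[1]} for row in reader]

rows = parse_rows(io.StringIO(SAMPLE, newline=""))
print(len(rows))        # 1
print(rows[0]["text"])  # the comma and line break survive inside the passage

# Splitting on newlines and commas instead cuts the passage apart:
print(SAMPLE.splitlines()[1].split(","))  # ['"Helps digestion', ' helps sleep']
```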


## Method

First, we try some basic prompt engineering to see whether we can condition an LLM to extract claims.

In [26]:
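from typing import Dict, List, Optional
from together import Together

def ask_llm(client: Together,
            model: str = "meta-llama/Llama-3.3-70B-Instruct-Turbo-Free",
            messages: Optional[List[Dict[str, str]]] = None) -> str:
    # Avoid a mutable default argument; fall back to a simple smoke-test message
    if messages is None:
        messages = [{"role": "user", "content": "How are you?"}]
    response = client.chat.completions.create(
        model=model,
        messages=messages
    )
    # Return only the text of the first completion
    return response.choices[0].message.content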
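`ask_llm` does not catch exceptions, so a single failed request (for example a transient network or rate-limit error) would abort a long evaluation loop. A minimal retry-with-exponential-backoff sketch follows; `call_with_retries` is a hypothetical helper, and it retries on any `Exception` by default, which you may want to narrow to the SDK's own error classes.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")

def call_with_retries(fn: Callable[[], T],
                      attempts: int = 5,
                      base_delay: float = 1.0,
                      retry_on: Tuple[Type[BaseException], ...] = (Exception,),
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn(), retrying with delays of base_delay, 2*base_delay, 4*base_delay, ..."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: re-raise the last error
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")

# In the notebook this would wrap a single request, e.g.:
# output = call_with_retries(lambda: ask_llm(client, messages=messages))
```

The `sleep` parameter exists so the backoff schedule can be checked without actually waiting.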

In [27]:
client = Together()  # reads the API key from the TOGETHER_API_KEY environment variable
prompt = "Take a look at the following article. It is your job to look at it and extract what is being claimed. Return to the user only the claim: Hydrate YOURSELF W After Waking Up Water 30 min Before a Meal DRINK Before Taking a Shower →→ Before Going to Bed at the correct time T A YE Helps activate internal organs Helps digestion Helps lower blood pressure Helps to avoid heart attack Health+ by Punjab Kesari"
messages = [{"role": "user", "content": prompt}]
output = ask_llm(client, messages=messages)
print(output)

Drinking water at specific times (after waking up, 30 minutes before a meal, before taking a shower, and before going to bed) helps to: 
1. Activate internal organs
2. Aid digestion
3. Lower blood pressure
4. Avoid heart attack.


In [28]:
prompt = "Hydrate YOURSELF W After Waking Up Water 30 min Before a Meal DRINK Before Taking a Shower →→ Before Going to Bed at the correct time T A YE Helps activate internal organs Helps digestion Helps lower blood pressure Helps to avoid heart attack Health+ by Punjab Kesari"

messages = [{"role": "system", "content": "You are an AI assistant designed to extract claims from a given passage of text. Keep it short and return the claim in the text."},
            {"role": "user", "content": train_dataset[0]["text"]},
            {"role": "assistant", "content": train_dataset[0]["claim"]},
            {"role": "user", "content": train_dataset[1]["text"]},
            {"role": "assistant", "content": train_dataset[1]["claim"]},
            {"role": "user", "content": train_dataset[2]["text"]},
            {"role": "assistant", "content": train_dataset[2]["claim"]},
            {"role": "user", "content": prompt}]

print(messages)

output = ask_llm(client, messages=messages)
print(output)

[{'role': 'system', 'content': 'You are an AI assistant designed to extract claims from a given passage of text. Keep it short and return the claim in the text.'}, {'role': 'user', 'content': 'Lieutenant Retired General Asif Mumtaz appointed as Chairman Pakistan Medical Commission PMC Lieutenant Retired General Asif Mumtaz appointed as Chairman Pakistan Medical Commission PMC Lieutenant Retired General Asif Mumtaz appointed as Chairman Pakistan Medical Commission PMC None'}, {'role': 'assistant', 'content': 'Pakistani government appoints former army general to head medical regulatory body'}, {'role': 'user', 'content': 'A priceless clip of 1970 of Bruce Lee playing Table Tennis with his Nan-chak !! His focus on speed A priceless clip of 1970 of Bruce Lee playing Table Tennis with his Nan-chak !! His focus on speed A priceless clip of 1970 of Bruce Lee playing Table Tennis with his Nan-chak !! His focus on speed None'}, {'role': 'assistant', 'content': 'Late actor and martial artist Bru
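The message list above follows a fixed few-shot pattern: the system prompt, then each training example as a user turn (the passage) followed by an assistant turn (its claim), and finally the new passage as the last user turn. A hypothetical helper, `build_few_shot_messages`, makes the number of shots a parameter; calling it with `[train_dataset[i] for i in range(3)]` would reproduce the three-shot list used here.

```python
from typing import Dict, List, Sequence

def build_few_shot_messages(system_prompt: str,
                            examples: Sequence[Dict[str, str]],
                            passage: str) -> List[Dict[str, str]]:
    """System prompt, then one user/assistant turn pair per example, then the new passage."""
    messages = [{"role": "system", "content": system_prompt}]
    for example in examples:
        messages.append({"role": "user", "content": example["text"]})
        messages.append({"role": "assistant", "content": example["claim"]})
    messages.append({"role": "user", "content": passage})
    return messages

examples = [{"text": "I enjoy eating soup", "claim": "I consume soup sometimes"}]
msgs = build_few_shot_messages("Extract the claim.", examples, "Water helps digestion")
print([m["role"] for m in msgs])  # ['system', 'user', 'assistant', 'user']
```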

The following function runs the few-shot prompt over the whole test (dev) split and scores each generated claim against its ground-truth claim with METEOR.

In [29]:
from typing import Optional
from tqdm import tqdm

SYSTEM_PROMPT = ("You are an AI assistant designed to extract claims from a given passage of text. "
                 "Keep it short and return the claim in the text. "
                 "Only return the big idea and exclude unneeded details.")

def evaluate_llm_method(client: Together, limit: Optional[int] = None):
    data = []
    scores = []

    # The system prompt and the three few-shot examples are identical for every passage,
    # so build that prefix once
    few_shot = [{"role": "system", "content": SYSTEM_PROMPT}]
    for example in (train_dataset[0], train_dataset[1], train_dataset[2]):
        few_shot.append({"role": "user", "content": example["text"]})
        few_shot.append({"role": "assistant", "content": example["claim"]})

    for counter, entry in enumerate(tqdm(test_dataset, desc="Evaluating LLM claim extraction"), start=1):
        messages = few_shot + [{"role": "user", "content": entry["text"]}]

        output = ask_llm(client, messages=messages)
        # The ground-truth claim is the reference; the generated claim is the hypothesis
        meteor_score = evaluate_claim_extraction(entry["claim"], output)

        data.append({
            "ground_truth_claim": entry["claim"],
            "generated_claim": output,
            "meteor_score": meteor_score
        })
        scores.append(meteor_score)

        # Optionally stop early, e.g. for a quick smoke test
        if limit and counter >= limit:
            break

    avg_score = sum(scores) / len(scores) if scores else 0
    return data, avg_score


In [31]:
data, avg_score = evaluate_llm_method(client)

Evaluating LLM claim extraction: 100%|██████████| 1171/1171 [2:07:51<00:00,  6.55s/it] 
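Each iteration waits on one network call, so the run is I/O-bound: 1171 passages at about 6.55 s each is roughly 7,670 s, consistent with the 2:07:51 reported. One option is to issue several requests at once on a thread pool. The sketch below is hypothetical (`map_concurrently` and `fake_llm` are not part of the notebook), and whether it actually helps depends on the provider's rate limits for the model, which may throttle or reject concurrent requests.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

R = TypeVar("R")

def map_concurrently(fn: Callable[[str], R], passages: Sequence[str], max_workers: int = 4) -> List[R]:
    """Apply fn to each passage on a thread pool; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Executor.map yields results in the order of the inputs, not in completion order
        return list(executor.map(fn, passages))

def fake_llm(passage: str) -> str:
    # Stand-in for an API call; the first passage deliberately finishes last
    time.sleep(0.05 if passage == "first" else 0.0)
    return passage.upper()

print(map_concurrently(fake_llm, ["first", "second", "third"]))  # ['FIRST', 'SECOND', 'THIRD']
```

In the notebook, `fn` would wrap `ask_llm` with the few-shot messages for a single passage; because order is preserved, the results can be zipped back with the ground-truth claims for scoring.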


Let's compare some of the generated claims with the ground-truth claims.

In [18]:
# Single quotes inside the f-strings: reusing double quotes there is a SyntaxError before Python 3.12
for entry in data:
    print(f"Ground truth claim: {entry['ground_truth_claim']}")
    print(f"Generated claim: {entry['generated_claim']}")
    print(f"Score: {entry['meteor_score']}")
    print("\n")

Ground truth claim: Photo shows Louis Armstrong as a child
Generated claim: A young Louis Armstrong was taken in by a Jewish family who supported his early musical talent.
Score: 0.3155


Ground truth claim: This leopard cub's mother was killed by a trophy hunter
Generated claim: Trophy hunting is cruel and should be banned.
Score: 0.0463


Ground truth claim: Videos show current situation of Hyderabad amid heavy rain
Generated claim: Crocodile alert issued in Hyderabad due to heavy rain.
Score: 0.2808


Ground truth claim: Joe Biden lives in a large estate bought on a senator's salary
Generated claim: Joe Biden's wealth is questioned given his reported senator salary.
Score: 0.3305


Ground truth claim: Photo shows August 26, 2021 explosion near Kabul airport
Generated claim: Explosion outside Kabul airport kills 40, injures 120, and US administration is being criticized for blaming the victims.
Score: 0.2843


Ground truth claim: White people own only 22 percent of South Africa’s lan

In [34]:
print(avg_score)

0.2516289496157131
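The average of about 0.25 summarizes the whole split but hides the spread: among the five examples printed above, scores range from 0.0463 (the trophy-hunting passage, where the generated claim is an opinion rather than the factual claim) to 0.3305. A small error-analysis sketch, using a hypothetical helper `summarize_scores` that would be applied to the `data` list returned by `evaluate_llm_method`, reports the mean, the median, and the lowest-scoring entries. Here it runs on the five printed scores only.

```python
import statistics
from typing import Dict, List

def summarize_scores(data: List[Dict], worst_k: int = 3) -> Dict:
    """Mean and median METEOR score plus the lowest-scoring entries for error analysis."""
    scores = [entry["meteor_score"] for entry in data]
    worst = sorted(data, key=lambda entry: entry["meteor_score"])[:worst_k]
    return {
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        "worst": worst,
    }

# The five examples printed above (ground-truth claims and their scores)
sample = [
    {"ground_truth_claim": "Photo shows Louis Armstrong as a child", "meteor_score": 0.3155},
    {"ground_truth_claim": "This leopard cub's mother was killed by a trophy hunter", "meteor_score": 0.0463},
    {"ground_truth_claim": "Videos show current situation of Hyderabad amid heavy rain", "meteor_score": 0.2808},
    {"ground_truth_claim": "Joe Biden lives in a large estate bought on a senator's salary", "meteor_score": 0.3305},
    {"ground_truth_claim": "Photo shows August 26, 2021 explosion near Kabul airport", "meteor_score": 0.2843},
]
summary = summarize_scores(sample, worst_k=1)
print(summary["median"])                          # 0.2843
print(summary["worst"][0]["ground_truth_claim"])  # the trophy-hunting example
```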
