# CheckThat Lab Task 2: Claims Extraction & Normalization (English)

In this task, you are given a noisy, unstructured social media post. Your goal is to simplify it into a concise form and normalize it into a structured format.

The aim is to decompose social media posts into simpler, more comprehensible forms, which are referred to as normalized claims.

The final evaluation uses the METEOR score.

For more information, please visit [CHECKTHAT! LAB TASK 2](https://checkthat.gitlab.io/clef2025/task2/)

## Installing together.ai 

To interact with the `together.ai` platform, you need to install the required libraries: `together` for the API client, `torch` (PyTorch) for fine-tuning, and `nltk` for evaluation.

You can install them using pip.

Run the cell below only once, when you set up the environment for the first time.

In [6]:
#!pip install together --upgrade
#!pip install torch 
#!pip install nltk

In [7]:
from together import Together
import os
from typing import Dict, Any
import torch
import torch.nn as nn
import pandas as pd

## Extract the claims

Your task is to write a Python function `extract_claims` that takes a noisy, unstructured social media post as input and returns a dictionary containing the extracted claims. The function should use the `together.ai` platform to simplify the post and extract the relevant information, with any model of your choice (preferably Meta Llama 3.3).
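A minimal sketch of such a function follows. It assumes the client exposes `chat.completions.create` and that the reply text is found at `choices[0].message.content`, as in the calls used later in this notebook. The `DEFAULT_MODEL` constant, the wording of `SYSTEM_PROMPT`, and the keys of the returned dictionary are illustrative assumptions, not part of the task definition.

```python
from typing import Any, Dict

# Assumed default; any chat model available on together.ai could be substituted.
DEFAULT_MODEL = "meta-llama/Llama-3.3-70B-Instruct-Turbo"

# Illustrative system prompt (an assumption, not the official task prompt).
SYSTEM_PROMPT = (
    "You extract the central claim made in a social media post and "
    "restate it as one short, self-contained sentence."
)


def extract_claims(post: str, client: Any, model: str = DEFAULT_MODEL) -> Dict[str, str]:
    """Return the original post together with the model's normalized claim."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": post},
        ],
        temperature=0.3,
    )
    # Strip surrounding whitespace so the claim is a clean single string.
    claim = response.choices[0].message.content.strip()
    return {"post": post, "normalized_claim": claim}
```

With a real client this could be called as `extract_claims(post, Together())`. Passing the client in as an argument keeps the function easy to exercise with a stub.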


Copy your API key from together.ai and set it as an environment variable. In PowerShell:

$env:TOGETHER_API_KEY = "YOUR_API_KEY"

In bash or zsh, the equivalent is `export TOGETHER_API_KEY="YOUR_API_KEY"`.

Alternatively, you can pass the key directly to the client with `Together(api_key="YOUR_API_KEY")`.

Hard-coding a personal secret API key is not advisable, however; do so at your own risk.
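As a small illustration of the environment-variable approach, the sketch below reads the key and fails early with a clear message when it is missing. `load_api_key` is a hypothetical helper introduced here; it is not part of the `together` library.

```python
import os


def load_api_key(env_var: str = "TOGETHER_API_KEY") -> str:
    """Read the API key from the environment and fail early if it is missing."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set; export it in your shell before starting the notebook."
        )
    return key
```

The returned value could then be passed on as `Together(api_key=load_api_key())`, which keeps the secret out of the notebook itself.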

# Data Pre-Processing

Load the data from the dataset files and print it to inspect the format.

Then convert the data into the format required for fine-tuning.

In [8]:
TRAIN_DATA = "./data/train/train-eng.csv"
DEV_DATA = "./data/dev/dev-eng.csv"
TEST_DATA = "./data/test/test-eng.csv"

train_df = pd.read_csv(TRAIN_DATA)
dev_df = pd.read_csv(DEV_DATA)
test_df = pd.read_csv(TEST_DATA)

train_data = train_df['post']
train_labels = train_df['normalized claim']

dev_data = dev_df['post']
dev_labels = dev_df['normalized claim']

test_data = test_df['post']

In [35]:
dev_df['post'].values

array(['The Karnofsky Jewish family, who immigrated to the United States from Lithuania, employed a 7-year-old boy and adopted (so to speak) him into their home.  He was originally given homework to get food because he was a starving kid.  He remained under the Jewish families employ, until he was 12  Karnofsky gave him money to buy his first instrument, which was a common instrument in Jewish families.  They really admired his musical talent.Later, when he became a professional … See More The Karnofsky Jewish family, who immigrated to the United States from Lithuania, employed a 7-year-old boy and adopted (so to speak) him into their home.  He was originally given homework to get food because he was a starving kid.  He remained under the Jewish families employ, until he was 12  Karnofsky gave him money to buy his first instrument, which was a common instrument in Jewish families.  They really admired his musical talent.Later, when he became a professional … See More The Karnofsky Jewi

In [None]:
import json

instruction = """You are a helpful AI assistant that extracts a claim made in the input text provided to you. 
Take a look at the below example and follow the similar style and format for the response.
Original post: ▪️Fake snow on texas..
Haarp used chemtrail snow.
#OperationDarkWinter ❄️
#WeatherModification 
▪️Fake snow on texas..
Haarp used chemtrail snow.
#OperationDarkWinter ❄️
#WeatherModification 
▪️Fake snow on texas..
Haarp used chemtrail snow.
#OperationDarkWinter ❄️
#WeatherModification None
Normalized claim: Video shows snow in Texas is fake"""
                
formatted_data = []

# Iterate through the DataFrame and create the desired format
for index, row in train_df.iterrows():
    entry = {
        'instruction': instruction,
        'input': row['post'],
        'output': row['normalized claim']
    }
    formatted_data.append(entry)

# Save the formatted data to a JSON file
with open('instruction_data_0.json', 'w', encoding='utf-8') as json_file:
    json.dump(formatted_data, json_file, indent=4, ensure_ascii=False)
    
from typing import List, Dict

def create_training_jsonl(input_data: List[Dict], output_file: str) -> None:
    """
    Create a properly formatted JSONL file for Together AI fine-tuning
    """
    train_examples = []
    with open(output_file, 'w', encoding='utf-8') as f:
        for item in input_data:
            json_data = {
                "messages": [
                                {
                                    "content":item.get("instruction",""),
                                    "role": "system"
                                },
                                {
                                    "content": str(item.get("input", "")),
                                    "role": "user"
                                },
                                {
                                    "content": str(item.get("output", "")),
                                    "role": "assistant"
                                }
                            ]
            }
            train_examples.append(json_data)
            json_line = json.dumps(json_data, ensure_ascii=False)
            f.write(json_line + '\n')
        #json.dump(train_examples, f, ensure_ascii=False)

with open('instruction_data_0.json', 'r', encoding='utf-8') as json_file:
    data = json.load(json_file)

create_training_jsonl(data, 'instruction_data_0.jsonl')

import json

def validate_and_fix_jsonl(input_file, output_file):
    with open(input_file, 'r', encoding='utf-8') as infile, open(output_file, 'w', encoding='utf-8') as outfile:
        for line_number, line in enumerate(infile, 1):
            try:
                # Try to parse the JSON
                json_object = json.loads(line.strip())

                # Write the valid JSON line to the output file
                json.dump(json_object, outfile)
                outfile.write('\n')
            except json.JSONDecodeError as e:
                print(f"Error in line {line_number}: {e}")

# Use the function
validate_and_fix_jsonl('instruction_data_0.jsonl', 'fixed_instruction_data_0.jsonl')

In [29]:
import json

def validate_and_fix_jsonl(input_file, output_file):
    with open(input_file, 'r', encoding='utf-8') as infile, open(output_file, 'w', encoding='utf-8') as outfile:
        for line_number, line in enumerate(infile, 1):
            try:
                # Try to parse the JSON
                json_object = json.loads(line.strip())

                # Write the valid JSON line to the output file
                json.dump(json_object, outfile)
                outfile.write('\n')
            except json.JSONDecodeError as e:
                print(f"Error in line {line_number}: {e}")

# Use the function
validate_and_fix_jsonl('./data/finetune_data.jsonl', 'fixed_instruction_data_0.jsonl')
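
`validate_and_fix_jsonl` only checks that each line parses as JSON. A complementary check, sketched below, flags lines whose structure is off. It assumes every record should follow the system/user/assistant layout produced by `create_training_jsonl`; `find_malformed_records` and `EXPECTED_ROLES` are hypothetical names introduced for this example.

```python
import json
from typing import List

# Role order written by create_training_jsonl.
EXPECTED_ROLES = ["system", "user", "assistant"]


def find_malformed_records(lines: List[str]) -> List[int]:
    """Return 1-based line numbers whose record does not match the expected chat layout."""
    bad = []
    for number, line in enumerate(lines, 1):
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            bad.append(number)
            continue
        messages = record.get("messages") if isinstance(record, dict) else None
        if not isinstance(messages, list) or not all(isinstance(m, dict) for m in messages):
            bad.append(number)
            continue
        roles = [m.get("role") for m in messages]
        # Every message needs non-empty string content.
        has_text = all(
            isinstance(m.get("content"), str) and m["content"].strip() for m in messages
        )
        if roles != EXPECTED_ROLES or not has_text:
            bad.append(number)
    return bad
```

Reading the file with `open(path, encoding="utf-8").read().splitlines()` and passing the result to this function would list the offending lines before the upload step.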


In [12]:
TOGETHER_API_KEY = os.getenv("TOGETHER_API_KEY")
WANDB_API_KEY = os.getenv("WANDB_API_KEY")
client = Together(api_key = TOGETHER_API_KEY)

## Upload and check the files for fine-tuning

In [34]:
file_resp = client.files.upload(file="./data/finetune_data.jsonl", check=True)
#print(file_resp.id)

Uploading file finetune_data.jsonl: 100%|██████████| 11.3M/11.3M [00:05<00:00, 1.98MB/s]


In [None]:
# List available models and their pricing
models = client.models.list()
for model in models:
    print(f"Model: {model.display_name}, {model.pricing}")

In [15]:
ft_resp = client.fine_tuning.create(
    training_file = "file-4a0981a6-2313-4453-84aa-0a1288235bfc",
    model = 'meta-llama/Llama-3.3-70B-Instruct-Reference',
    train_on_inputs= "auto",
    n_epochs = 3,
    wandb_api_key = os.getenv('WANDB_API_KEY'),
    lora = True,
    learning_rate = 1e-5,
    suffix = 'ZeroShot-FineTuned',
)

print(ft_resp.id)

message='Starting from together>=1.3.0, the default batch size is set to the maximum allowed value for each model.'


ft-84a7748b


In [16]:
# The output model name
ft_resp.output_name

'Nikhil_Kadapala/Llama-3.3-70B-Instruct-Reference-ZeroShot-FineTuned-bf15554a'

In [19]:
from together import Together

client = Together(api_key=os.environ["TOGETHER_API_KEY"])

def cancel_finetune_job(job_id: str) -> dict:
    try:
        response = client.fine_tuning.cancel(id=job_id)
        return {
            "status": "success",
            "job_id": job_id,
            # The SDK returns a response object (cf. ft_resp.id), so read the attribute
            "final_state": getattr(response, "status", None)
        }
    except Exception as e:
        return {
            "status": "error",
            "message": str(e)
        }

stats = cancel_finetune_job("ft-84a7748b")

print(stats)

{'status': 'error', 'message': 'Error code: 404 - {"message": "Invalid Cannot cancel job while in state cancelled param: status", "type_": "invalid_request_error", "param": "status", "code": ""}'}


In [21]:
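job_id = "ft-84a7748b"
job_status = client.fine_tuning.retrieve(job_id).status
# The status is an enum (e.g. FinetuneJobStatus.STATUS_CANCELLED), so compare on
# its upper-cased text rather than on bare strings such as "QUEUED".
status_text = str(job_status).upper()
if status_text.endswith(("QUEUED", "RUNNING")):
    print(f"Job {job_id} is currently {job_status}")
else:
    print(f"Job {job_id} is in terminal state {job_status}")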


Job ft-84a7748b is in terminal state FinetuneJobStatus.STATUS_CANCELLED
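
Instead of checking the status once, a job can be polled until it leaves the active states. The sketch below shows one way to do this. `wait_for_job` is a hypothetical helper, the default state names mirror the `QUEUED` and `RUNNING` strings used in the cell above, and the polling interval and limit are arbitrary assumptions.

```python
import time
from typing import Callable, Iterable


def wait_for_job(
    fetch_status: Callable[[], object],
    active_states: Iterable[str] = ("QUEUED", "RUNNING"),
    poll_seconds: float = 30.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the job leaves the active states, then return the last status seen."""
    active = tuple(state.upper() for state in active_states)
    status = ""
    for _ in range(max_polls):
        status = str(fetch_status()).upper()
        # Accept plain names ("RUNNING") and prefixed ones ("...STATUS_RUNNING").
        if not status.endswith(active):
            return status
        sleep(poll_seconds)
    raise TimeoutError(f"job still {status} after {max_polls} polls")
```

In this notebook it could be called as `wait_for_job(lambda: client.fine_tuning.retrieve(job_id).status)`. The `sleep` argument is injectable only so the helper can be tried out without real waiting.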


In [23]:
def get_claims(model_name, input_data):
    response = client.chat.completions.create(
        model=model_name,
        messages=[{"role": "user", "content": input_data}],
        max_tokens=None,
        temperature=0.3,
        top_p=0.7,
        top_k=50,
        repetition_penalty=1,
        stop=["<|eot_id|>"],
        stream=False
    )
    return response


In [24]:
model = "Nikhil_Kadapala/Llama-3.2-1B-Instruct-ZeroShot-FineTuned-9836a5f1"
prompt = "Donald Trump is the greatest president of the modern era. There is no doubt about it."
# Call the model once and print the full response object
response = get_claims(model, prompt)
print(response)

id='914872eed81742cf' object=<ObjectType.ChatCompletion: 'chat.completion'> created=1739991093 model='Nikhil_Kadapala/Llama-3.2-1B-Instruct-ZeroShot-FineTuned-9836a5f1' choices=[ChatCompletionChoicesData(index=0, logprobs=None, seed=13488802899849945000, finish_reason=<FinishReason.EOS: 'eos'>, message=ChatCompletionMessage(role=<MessageRole.ASSISTANT: 'assistant'>, content=" \nDonald Trump's presidency was marked by significant events, policies, and controversies. While opinions about his presidency vary widely, it is undeniable that he was a polarizing figure who left a lasting impact on the United States.\n\nSome of the key events and policies of his presidency include:\n\n1. The 2016 presidential election: Trump's victory marked a significant shift in American politics, with his campaign focusing on issues such as immigration, trade, and national security.\n2. The travel ban: Trump issued a travel ban targeting predominantly Muslim countries, which sparked widespread protests and c

# Evaluation Metrics

Predictions are scored with NLTK's implementation of METEOR.
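To make the metric more concrete, here is a simplified, exact-match-only sketch of the METEOR computation. NLTK's `meteor_score` additionally matches stems and WordNet synonyms and uses its own alignment procedure, so its scores can differ from this sketch. The function name `simple_meteor` and the greedy alignment are assumptions made for illustration; the default `alpha`, `beta`, and `gamma` values mirror NLTK's defaults.

```python
def simple_meteor(reference: str, hypothesis: str,
                  alpha: float = 0.9, beta: float = 3.0, gamma: float = 0.5) -> float:
    """Exact-match METEOR sketch: weighted F-mean times a fragmentation penalty."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()

    # Greedy alignment: each hypothesis token takes the first unused identical reference token.
    used = [False] * len(ref)
    alignment = []  # (hypothesis index, reference index) pairs
    for i, token in enumerate(hyp):
        for j, ref_token in enumerate(ref):
            if not used[j] and token == ref_token:
                used[j] = True
                alignment.append((i, j))
                break

    matches = len(alignment)
    if matches == 0:
        return 0.0

    precision = matches / len(hyp)
    recall = matches / len(ref)
    f_mean = precision * recall / (alpha * precision + (1 - alpha) * recall)

    # A chunk is a run of matches that are adjacent in both sentences.
    chunks = 1
    for (h1, r1), (h2, r2) in zip(alignment, alignment[1:]):
        if h2 != h1 + 1 or r2 != r1 + 1:
            chunks += 1

    penalty = gamma * (chunks / matches) ** beta
    return f_mean * (1 - penalty)
```

Because of the fragmentation penalty, even an identical pair scores slightly below 1, and a hypothesis with the right words in a scrambled order is penalized heavily.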


In [1]:
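import nltk
#nltk.download('wordnet')  # needed once; METEOR uses WordNet for synonym matching
from nltk.translate.meteor_score import meteor_score
import numpy as np

def meteor_loss(prediction, ground_truth):
    # meteor_score expects (references, hypothesis); recent NLTK versions
    # require both to be pre-tokenized, so split on whitespace first
    return 1 - meteor_score([str(ground_truth).split()], str(prediction).split())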

In [2]:
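from tqdm import tqdm

def evaluate_model(model_name, input_data, labels):
    scores = []
    for post, label in tqdm(zip(input_data, labels), total=len(input_data)):
        response = get_claims(model_name, post)
        prediction = response.choices[0].message.content
        # Score each prediction against its own reference (tokenized), then average
        scores.append(meteor_score([str(label).split()], prediction.split()))
    return float(np.mean(scores))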

In [None]:
model = "Nikhil_Kadapala/Llama-3.2-1B-Instruct-ZeroShot-FineTuned-9836a5f1"
avg_score = evaluate_model(model, dev_data[0:10], dev_labels[0:10])
print(f"Average METEOR score on development set: {avg_score}")

## Training loop

The training loop should follow these steps:

1. Load the dataset of social media posts and their corresponding claims.

2. Extract the claims from the input using your model of choice.

3. Calculate the loss from the chosen metric (here, 1 - METEOR).

4. Update the model parameters using the optimizer.

The training loop should continue for a specified number of epochs or until a certain stopping criterion is met.
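One common stopping criterion is patience-based early stopping on the validation loss. The sketch below illustrates the idea; the `EarlyStopping` class and its parameter names are hypothetical and are not taken from this notebook or from PyTorch.

```python
class EarlyStopping:
    """Signal a stop once the monitored loss has not improved for `patience` epochs."""

    def __init__(self, patience: int = 2, min_delta: float = 0.0) -> None:
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.stale_epochs = 0

    def should_stop(self, loss: float) -> bool:
        # An epoch counts as an improvement only if it beats the best loss by min_delta.
        if loss < self.best - self.min_delta:
            self.best = loss
            self.stale_epochs = 0
        else:
            self.stale_epochs += 1
        return self.stale_epochs >= self.patience
```

Inside a training loop, calling `should_stop(dev_loss)` after each epoch and breaking when it returns `True` would end training as soon as the development loss stops improving.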

In [None]:
from tqdm import tqdm

def train_model(model, optimizer, train_data, train_labels, criterion, epochs):
    """Generic PyTorch loop. Assumes `model` is a local torch.nn.Module and
    `criterion` returns a differentiable scalar tensor."""
    for epoch in range(epochs):
        model.train()
        train_loss = 0.0
        for i in tqdm(range(len(train_data))):
            optimizer.zero_grad()
            output = model(train_data[i])
            loss = criterion(output, train_labels[i])
            loss.backward()
            optimizer.step()
            train_loss += loss.item()
        train_loss /= len(train_data)
        print(f"Epoch {epoch + 1}: Train Loss = {train_loss}")

In [None]:
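model = "meta-llama/Llama-3.3-70B-Instruct-Turbo"
epochs = 3
lr = 0.001
# A model served through the together.ai API exposes no local parameters, so
# `fine_tuned_1` must be a torch.nn.Module loaded on this machine (not shown here).
optimizer = torch.optim.Adam(fine_tuned_1.parameters(), lr=lr)
# Pass the loss function itself rather than calling it. meteor_loss returns a plain
# float, so gradient updates need a differentiable loss in its place.
loss_fn = meteor_loss
train_model(fine_tuned_1, optimizer, train_data, train_labels, loss_fn, epochs)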

In [None]:
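# Average the METEOR loss (1 - METEOR) over the development set
dev_loss = 0.0
for post, label in zip(dev_data, dev_labels):
    response = get_claims(model, post)
    prediction = response.choices[0].message.content
    dev_loss += 1 - meteor_score([str(label).split()], prediction.split())
# Divide once, after the loop, to get the mean
dev_loss /= len(dev_data)
print(f"Dev Loss = {dev_loss}")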

## Pipeline flow of Task 2

The pipeline for Task 2 is as follows:
1. Data Preprocessing: Clean the data and reformat it into the fine-tuning format of the model of your choice.
2. Fine-tuning: Upload the dataset file and create a new fine-tuning job.
3. Evaluation: Use the model to extract claims from the social media posts and calculate the average METEOR score on the development set.
4. Training: Train the fine-tuned model further using the training loop if the performance is not satisfactory.
5. Inference: Use the model to extract claims from the social media posts in the test set.
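
The inference step can be sketched as below. The helper names `run_inference` and `write_predictions` are hypothetical, and reusing the `post` and `normalized claim` column names from the training file for the output is an assumption; the official submission format may differ.

```python
import csv
from typing import Callable, Iterable, List, TextIO


def run_inference(posts: Iterable[str], predict: Callable[[str], str]) -> List[dict]:
    """Apply `predict` to every post, keeping the input order."""
    rows = []
    for post in posts:
        claim = predict(post).strip()
        rows.append({"post": post, "normalized claim": claim})
    return rows


def write_predictions(rows: List[dict], handle: TextIO) -> None:
    """Write the rows as CSV using the same column names as the training file."""
    writer = csv.DictWriter(handle, fieldnames=["post", "normalized claim"])
    writer.writeheader()
    writer.writerows(rows)
```

Here `predict` could wrap the notebook's own call, for example `lambda post: get_claims(model, post).choices[0].message.content`, with `test_data` passed as `posts`.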