# 2024 COMP90042 Project
*Make sure you change the file name with your group id.*

# Readme
This part needs several JSON files that we prepared in the previous evidence retrieval part. Please make sure they are in the working directory of this notebook before running it.
They are:
1. "dev_concatenated_claim_evidences.json"
2. "test_concatenated_claim_evidences.json"
3. "pred_train_wrongly_pred_evidences.json"
4. "pred_dev_claims_retrieval.json"
5. "pred_test_claims_retrieval.json"

**We use keras, tensorflow, nltk, scikit-learn in this project. The classification code in this part additionally relies on PyTorch and tqdm.**

# 1.DataSet Processing


Download the dataset from GitHub.

In [1]:
import os

# the repository link:
repository_url = 'https://github.com/drcarenhan/COMP90042_2024.git'

# clone the repository
os.system(f'git clone {repository_url}')


0

In [2]:
save_path = '/content/COMP90042_2024/data'
os.makedirs(save_path, exist_ok=True)

output_file_path = os.path.join(save_path, 'evidence.json')

!gdown --id '1JlUzRufknsHzKzvrEjgw8D3n_IRpjzo6' -O {output_file_path}

Downloading...
From (original): https://drive.google.com/uc?id=1JlUzRufknsHzKzvrEjgw8D3n_IRpjzo6
From (redirected): https://drive.google.com/uc?id=1JlUzRufknsHzKzvrEjgw8D3n_IRpjzo6&confirm=t&uuid=8669b27c-c7e5-45fc-93d4-ea64c695be49
To: /content/COMP90042_2024/data/evidence.json
100% 174M/174M [00:05<00:00, 32.6MB/s]


In [3]:
cd /content/COMP90042_2024/

/content/COMP90042_2024


## 1.1 PreProcess for evidence and claims

The lemmatizing and stopword-removal code is adapted from the tutorial.

In [1]:
import string
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer

# Download necessary NLTK data files
nltk.download('wordnet')
nltk.download('stopwords')
nltk.download('punkt')

# Initialize the lemmatizer and stopwords
lemmatizer = WordNetLemmatizer()
stopwords_set = set(stopwords.words('english'))

# Lemmatizer function
def lemmatize(word):
    lemma = lemmatizer.lemmatize(word, 'v')
    if lemma == word:
        lemma = lemmatizer.lemmatize(word, 'n')
    return lemma

# Text preprocessing function
def text_preprocessing(text):
    # Lowercasing
    text = text.lower()

    # Tokenizing
    words = word_tokenize(text)

    # Lemmatizing and removing stopwords
    new_words = [lemmatize(w) for w in words if w not in stopwords_set]

    return " ".join(new_words)


[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


### 1.1.1 read files

Auxiliary functions for reading and pre-processing the data.

In [5]:
import json

def process_claims(claims, evidences_id_dict):
    """
    Process claims data to extract relevant information and map evidence IDs.

    Args:
    claims (dict): A dictionary of claims where each key is a claim ID and each value is a dictionary containing claim details.
    evidences_id_dict (dict): A dictionary mapping evidence IDs to indices for quick access.

    Returns:
    tuple: Contains lists of claim IDs, claim texts, preprocessed claim texts, associated evidence indices, and claim labels.
    """
    ids = []
    texts = []
    processed_texts = []
    evidences = []
    labels = []

    for claim_id, data in claims.items():
        ids.append(claim_id)
        texts.append(data["claim_text"])
        processed_texts.append(text_preprocessing(data["claim_text"]))
        labels.append(data.get("claim_label", None))  # Test data may not have labels.
        evidences.append([evidences_id_dict[i] for i in data.get("evidences", [])])

    return ids, texts, processed_texts, evidences, labels

def process_evidences(evidences):
    """
    Process evidences data to extract relevant information and create a mapping from evidence IDs to indices.

    Args:
    evidences (dict): A dictionary of evidences where each key is an evidence ID and the value is the evidence text.

    Returns:
    tuple: Contains lists of evidence IDs, evidence texts, preprocessed evidence texts, and a dictionary mapping IDs to indices.
    """
    ids = []
    texts = []
    processed_texts = []
    id_dict = {}

    for idx, (evidence_id, evidence_text) in enumerate(evidences.items()):
        ids.append(evidence_id)
        texts.append(evidence_text)
        processed_texts.append(text_preprocessing(evidence_text))
        id_dict[evidence_id] = idx

    return ids, texts, processed_texts, id_dict

Use the functions to read the data.

In [6]:
# Load data from files
with open(save_path+'/train-claims.json', 'r') as file:
    train_claims = json.load(file)

with open(save_path+'/evidence.json', 'r') as file:
    evidences = json.load(file)

with open(save_path+'/dev-claims.json', 'r') as file:
    dev_claims = json.load(file)

with open(save_path+'/test-claims-unlabelled.json', 'r') as file:
    test_claims = json.load(file)

# Process evidence data to prepare for linkage with claims
evidences_ids, evidences_texts, evidences_p_texts, evidences_id_dict = process_evidences(evidences)

# Process claims data for training, development, and test sets using the evidence dictionary
train_ids, train_claim_texts, train_p_claim_texts, train_evidences, train_labels = process_claims(train_claims, evidences_id_dict)
dev_ids, dev_claim_texts, dev_p_claim_texts, dev_evidences, dev_labels = process_claims(dev_claims, evidences_id_dict)
test_ids, test_claim_texts, test_p_claim_texts, _, _ = process_claims(test_claims, evidences_id_dict)
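
The ID-to-index mapping built by `process_evidences` and consumed by `process_claims` can be illustrated on toy data. This is a minimal sketch: the IDs and texts below are made up for illustration, and text preprocessing is skipped.

```python
# Toy data (hypothetical IDs and texts, for illustration only)
toy_evidences = {
    "evidence-10": "Sea levels are rising.",
    "evidence-20": "Glaciers are retreating.",
    "evidence-30": "CO2 is a greenhouse gas.",
}
toy_claims = {
    "claim-1": {"claim_text": "Oceans are rising.", "evidences": ["evidence-10", "evidence-30"]},
    "claim-2": {"claim_text": "An unlabelled test claim."},  # no evidences or label
}

# Same idea as process_evidences: the position in the dict becomes the index
evidence_id_to_idx = {evidence_id: idx for idx, evidence_id in enumerate(toy_evidences)}

# Same idea as process_claims: evidence IDs are replaced by indices,
# and missing fields fall back to defaults via dict.get
claim_evidence_idx = [
    [evidence_id_to_idx[e] for e in data.get("evidences", [])]
    for data in toy_claims.values()
]
claim_labels = [data.get("claim_label") for data in toy_claims.values()]

print(evidence_id_to_idx)   # {'evidence-10': 0, 'evidence-20': 1, 'evidence-30': 2}
print(claim_evidence_idx)   # [[0, 2], []]
print(claim_labels)         # [None, None]
```

Because the index is simply the position in `evidence.json`, the same file must be used in the retrieval part and in this part for the indices to line up.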

Read the data that we prepared in the evidence retrieval part.

In [7]:
import json

dev_concatenated_claim_evidences = json.load(open("dev_concatenated_claim_evidences.json", "r"))
test_concatenated_claim_evidences = json.load(open("test_concatenated_claim_evidences.json", "r"))

train_wrongly_pred_evidences = json.load(open("pred_train_wrongly_pred_evidences.json", "r"))

Parse the files from the previous evidence retrieval part to get the concatenated inputs for the dev and test datasets.

In [8]:
dev_inputs = [i['text'] for i in dev_concatenated_claim_evidences]
test_inputs = [i['text'] for i in test_concatenated_claim_evidences]

### 1.1.2 Construct vocab and indexing

In [9]:
def build_vocabulary(texts, min_count=3):
    """
    Build a vocabulary from a list of texts, filtering words by a minimum count threshold.

    Args:
    texts (list of str): A list of sentences from which to build the vocabulary.
    min_count (int): Occurrence threshold; only words that occur more than min_count times are included in the vocabulary.

    Returns:
    tuple: idx2word (list mapping index to word) and word2idx (dict mapping word to index).
    """
    # Initialize word count dictionary and predefined special tokens.
    wordcount = {}
    idx2word = ["<pad>", "<cls>", "<sep>", "<unk>"]
    word2idx = {"<pad>": 0, "<cls>": 1, "<sep>": 2, "<unk>": 3}

    # Count occurrences of each word in the texts.
    for text in texts:
        for word in text.split():
            wordcount[word] = wordcount.get(word, 0) + 1

    # Start indexing for new words from 4 since 0-3 are reserved for special tokens.
    idx = len(idx2word)

    # Include words in the vocabulary only if they occur more than min_count times.
    for word, count in wordcount.items():
        if count > min_count:
            idx2word.append(word)
            word2idx[word] = idx
            idx += 1

    return idx2word, word2idx

# Use the function to build the vocabulary from training and evidence texts.
idx2word, word2idx = build_vocabulary(train_claim_texts + evidences_texts, min_count=3)

In [10]:
def convert_to_indices(text_data, word2idx):
    """
    Convert a list of sentences into lists of indices based on a given word-to-index mapping.

    Args:
    text_data (list of str): A list of sentences to be converted.
    word2idx (dict): A dictionary mapping words to their corresponding indices.

    Returns:
    list of list of int: A list where each sentence is represented as a list of indices.
    """
    # Initialize the list that will store the converted sentences.
    idx_data = []

    # Iterate over each sentence in the input list.
    for text in text_data:
        # Convert each word in the sentence to its corresponding index.
        # If the word is not found in the dictionary, use the index for "<unk>".
        indices = [word2idx.get(word, word2idx["<unk>"]) for word in text.split()]

        # Append the list of indices to the main list.
        idx_data.append(indices)

    return idx_data

In [11]:
train_claim_text_idx = convert_to_indices(train_claim_texts, word2idx)
dev_claim_text_idx = convert_to_indices(dev_claim_texts, word2idx)
test_claim_text_idx = convert_to_indices(test_claim_texts, word2idx)
evidences_text_idx = convert_to_indices(evidences_texts, word2idx)
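
A small worked example may help to see how the count threshold and the `<unk>` fallback interact. The sketch below re-implements the two steps compactly on toy sentences; `build_toy_vocab` and the toy texts are illustrative names and data, not part of the notebook.

```python
from collections import Counter

def build_toy_vocab(texts, min_count):
    """Keep words whose count is strictly greater than min_count."""
    specials = ["<pad>", "<cls>", "<sep>", "<unk>"]
    counts = Counter(word for text in texts for word in text.split())
    idx2word = specials + [w for w, c in counts.items() if c > min_count]
    word2idx = {w: i for i, w in enumerate(idx2word)}
    return idx2word, word2idx

toy_texts = ["ice melts fast", "ice melts", "ice sheets"]
toy_idx2word, toy_word2idx = build_toy_vocab(toy_texts, min_count=1)

# "ice" (3 times) and "melts" (2 times) exceed the threshold;
# "fast" and "sheets" (once each) do not and will map to <unk>
encoded = [toy_word2idx.get(w, toy_word2idx["<unk>"]) for w in "ice sheets melts".split()]

print(toy_idx2word)  # ['<pad>', '<cls>', '<sep>', '<unk>', 'ice', 'melts']
print(encoded)       # [4, 3, 5]
```

Note that the comparison is strict, so with `min_count=3` a word has to appear at least four times to receive its own index.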

Set the lengths for padding and truncation, and prepare the labels for classification. The lengths are chosen based on the token-length statistics computed in the previous evidence retrieval part:

Train - Average: 20.10, Median: 19.0, Max: 67

Dev - Average: 21.08, Median: 18.0, Max: 65

Test - Average: 20.04, Median: 19.0, Max: 53

Evidence - Average: 19.69, Median: 18.0, Max: 479
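
Statistics of this kind can be computed with the standard library. The snippet below is only a sketch of one way to do it: the actual computation was done in the evidence retrieval part and is not shown here, and `length_stats` and the toy data are hypothetical.

```python
from statistics import mean, median

def length_stats(token_lists):
    """Return (average, median, max) of the sequence lengths."""
    lengths = [len(tokens) for tokens in token_lists]
    return mean(lengths), median(lengths), max(lengths)

# Toy tokenised texts (hypothetical, for illustration only)
toy_tokens = [["a", "b"], ["a", "b", "c", "d"], ["a", "b", "c"]]
avg_len, med_len, max_len = length_stats(toy_tokens)
print(f"Average: {avg_len}, Median: {med_len}, Max: {max_len}")
```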

In [12]:
# set the length for our model input
# limit the claim to be at most 60
claim_max_len = 60
# limit the evidence to be at most 100
evidence_max_len = 100
# limit the max length for the concatenated claim and evidences
# 1 <cls> + claim length + 5 * (1 <sep> + evidence length) + 1 trailing <sep>
# 1 + 60 + 5 * (1 + 100) + 1 = 567
concatenated_max_len = 570
retrieval_num = 5

# prepare the label and corresponding index
# Transforming the string labels into indices makes the training more efficient
id2labels = ["SUPPORTS", "NOT_ENOUGH_INFO", "REFUTES", "DISPUTED"]
labels2id = {"SUPPORTS": 0, "NOT_ENOUGH_INFO": 1, "REFUTES": 2, "DISPUTED": 3}

Transform the string label into index number for convenient training.

In [13]:
dev_labels = [labels2id[i["label"]] for i in dev_concatenated_claim_evidences]
train_labels = [labels2id[i] for i in train_labels]

## 1.2 Construct the dataloader

In [17]:
import torch
from torch.utils.data import Dataset
import random

class TrainDataset(Dataset):
    """
    A PyTorch Dataset class that handles data involving claims and their associated evidences.

    Args:
    claim_data (list): List of claim text indices.
    evidence_data (list): List of evidence text indices.
    true_evidences (list): List of indices pointing to true evidence for each claim.
    wrongly_retrieved_evidences (list): List of indices of incorrectly retrieved evidences.
    labels (list): List of labels corresponding to the claim data.
    cls_token (int): Index used for <cls> token.
    sep_token (int): Index used for <sep> token.
    pad_token (int): Index used for <pad> token.
    evidence_num (int): Number of evidences to use per claim.
    """
    def __init__(self, claim_data, evidence_data, true_evidences, wrongly_retrieved_evidences, labels, cls_token, sep_token, pad_token, evidence_num=5):
        self.claim_data = claim_data
        self.evidence_data = evidence_data
        self.true_evidences = true_evidences
        self.wrongly_retrieved_evidences = wrongly_retrieved_evidences
        self.labels = labels
        self.cls_token = cls_token
        self.sep_token = sep_token
        self.pad_token = pad_token
        self.evidence_num = evidence_num

    def __len__(self):
        """Returns the number of items in the dataset."""
        return len(self.claim_data)

    def __getitem__(self, idx):
        """
        Returns:
        list: Data for a single training example including claim, evidences, and label.
        """
        return [self.claim_data[idx][:claim_max_len], self.true_evidences[idx], self.wrongly_retrieved_evidences[idx], self.labels[idx]]

    def collate_fn(self, batch):
        """
        Custom collate function to process the batch, used by DataLoader to prepare batches.

        Args:
        batch (list): List of elements returned by __getitem__.

        Returns:
        dict: Dictionary containing tensors of queries, positions, and labels.
        """
        queries, queries_pos, batch_labels = [], [], []

        for claim, true_evid, wrong_evid, label in batch:
            concatenated_text = self.construct_query_text(claim, true_evid, wrong_evid)
            queries.append(concatenated_text)
            queries_pos.append(list(range(len(concatenated_text))))
            batch_labels.append(label)

        return {
            "queries": torch.LongTensor(queries),
            "queries_pos": torch.LongTensor(queries_pos),
            "labels": torch.LongTensor(batch_labels)
        }

    def construct_query_text(self, claim, true_evid, wrong_evid):
        """
        Construct the full text for a query by concatenating claim and evidence texts with special tokens.

        Args:
        claim (list): List of indices for claim text.
        true_evid (list): Indices of true evidence.
        wrong_evid (list): Indices of wrong evidence.

        Returns:
        list: Combined list of indices including special tokens and padded to maximum length.
        """
        # The whole query starts with the <cls> token followed by the claim
        concatenated_text = [self.cls_token] + claim
        # Keep at most evidence_num true evidences, then fill the remaining slots
        # with randomly sampled wrongly retrieved evidences
        true_evid = true_evid[:self.evidence_num]
        evidences_to_include = self.evidence_num - len(true_evid)

        all_evidences = true_evid + random.sample(wrong_evid, min(evidences_to_include, len(wrong_evid)))
        # concatenate the evidences to the claim, separated by the <sep> token
        for evid_idx in all_evidences:
            concatenated_text += [self.sep_token] + self.evidence_data[evid_idx][:evidence_max_len]

        concatenated_text.append(self.sep_token)

        # if the length of concatenated text is still shorter than the given max length, pad it with padding token
        if len(concatenated_text) < concatenated_max_len:
            concatenated_text.extend([self.pad_token] * (concatenated_max_len - len(concatenated_text)))

        return concatenated_text

In [18]:
from torch.utils.data import DataLoader
train_set = TrainDataset(train_claim_text_idx, evidences_text_idx, train_evidences, train_wrongly_pred_evidences, train_labels,
                         word2idx["<cls>"], word2idx["<sep>"], word2idx["<pad>"], evidence_num=retrieval_num)
dataloader = DataLoader(train_set, batch_size=10, shuffle=True, num_workers=1, collate_fn=train_set.collate_fn)
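
The layout produced by `construct_query_text` can be seen more easily on a toy example. The sketch below mirrors the same steps with plain lists; `build_query` and the toy token indices are illustrative only, and the special-token indices are the ones assigned in `build_vocabulary`.

```python
import random

CLS, SEP, PAD = 1, 2, 0  # indices of <cls>, <sep>, <pad> in the vocabulary above

def build_query(claim, evidences, max_len):
    """Concatenate <cls> claim <sep> ev1 <sep> ev2 ... <sep>, then pad to max_len."""
    query = [CLS] + claim
    for evidence in evidences:
        query += [SEP] + evidence
    query.append(SEP)
    query += [PAD] * (max_len - len(query))
    return query

# Toy token indices (hypothetical values)
toy_claim = [10, 11]
toy_true = [[20, 21]]
toy_wrong_pool = [[30], [31], [32]]

random.seed(0)
# One true evidence plus one sampled negative gives two evidences in total
toy_evidences = toy_true + random.sample(toy_wrong_pool, 1)
toy_query = build_query(toy_claim, toy_evidences, max_len=12)
print(toy_query)

# Worst case with the notebook's limits: 1 <cls> + 60 claim tokens
# + 5 * (1 <sep> + 100 evidence tokens) + 1 trailing <sep>
worst_case_len = 1 + 60 + 5 * (1 + 100) + 1
print(worst_case_len)  # 567, which fits within concatenated_max_len = 570
```

Since the negatives are re-sampled every time a batch is collated, the same claim is paired with different wrongly retrieved evidences across epochs.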

# 2.Model Implementation


Define our transformer-based classifier. The code in this part is adapted from the workshops.

In [19]:
import torch
import torch.nn as nn
import torch.nn.functional as F

class CLS(nn.Module):
    """
    A neural network module for classification tasks using a transformer encoder.

    Args:
    vocab_size (int): Size of the vocabulary.
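    embed_dim (int): Dimensionality of the embeddings; must equal hidden_size because the embeddings are fed directly into the encoder.
    hidden_size (int): Model dimension (d_model) of the transformer encoder.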
    output_size (int): The size of the output layer, which corresponds to the number of classes.
    nhead (int): Number of heads in the multi-head attention models of the transformer.
    num_layers (int): Number of transformer layers to stack.
    max_position (int): Maximum sequence length that can be processed by this model.
    """
    def __init__(self, vocab_size, embed_dim, hidden_size, output_size, nhead, num_layers, max_position):
        super(CLS, self).__init__()

        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.pos_embedding = nn.Embedding(max_position, embed_dim)

        # Initialize transformer encoder layer and the encoder itself
        encoder_layer = nn.TransformerEncoderLayer(d_model=hidden_size, nhead=nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=num_layers, norm=nn.LayerNorm(hidden_size))

        # Hidden layer that reduces dimensionality from hidden_size to hidden_size // 2
        self.hidden_layer = nn.Linear(hidden_size, hidden_size // 2)
        # Output layer for classification
        self.pred_layer = nn.Linear(hidden_size // 2, output_size)
        # Dropout layer for regularization
        self.dropout = nn.Dropout(0.1)

    def forward(self, text_data, position_text):
        """
        Forward pass of the model.

        Args:
        text_data (Tensor): Indices of words in the batch, shape (batch_size, seq_length)
        position_text (Tensor): Positional indices corresponding to `text_data`.

        Returns:
        Tensor: Output from the final classification layer.
        """
        # Create mask for padding tokens
        mask_ = text_data == 0

        # Combine word embeddings with position embeddings scaled down by 0.01
        text_embeddings = self.embedding(text_data)
        position_embeddings = self.pos_embedding(position_text) * 0.01
        encoder_text_input = text_embeddings + position_embeddings

        # Apply the transformer encoder
        text_encoder_output = self.encoder(encoder_text_input, src_key_padding_mask=mask_)

        # Apply the first linear transformation and non-linearity
        encoder_output_cls = text_encoder_output[:, 0, :]  # we use the first token of encoder output for classification purposes
        hidden_output = F.relu(self.hidden_layer(encoder_output_cls))
        hidden_output = self.dropout(hidden_output)

        # Final classification layer
        prediction = self.pred_layer(hidden_output)
        return prediction

In [22]:
cls_model = CLS(vocab_size=len(idx2word), embed_dim=512, hidden_size=512, output_size=4, nhead=8, num_layers=5, max_position=700)
cls_model.cuda()

CLS(
  (embedding): Embedding(197728, 512)
  (pos_embedding): Embedding(700, 512)
  (encoder): TransformerEncoder(
    (layers): ModuleList(
      (0-4): 5 x TransformerEncoderLayer(
        (self_attn): MultiheadAttention(
          (out_proj): NonDynamicallyQuantizableLinear(in_features=512, out_features=512, bias=True)
        )
        (linear1): Linear(in_features=512, out_features=2048, bias=True)
        (dropout): Dropout(p=0.1, inplace=False)
        (linear2): Linear(in_features=2048, out_features=512, bias=True)
        (norm1): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
        (norm2): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
        (dropout1): Dropout(p=0.1, inplace=False)
        (dropout2): Dropout(p=0.1, inplace=False)
      )
    )
    (norm): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
  )
  (hidden_layer): Linear(in_features=512, out_features=256, bias=True)
  (pred_layer): Linear(in_features=256, out_features=4, bias=True)
  (drop
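
To get a feeling for the size of this model, the parameter count can be worked out from the layer shapes printed above. This is a back-of-the-envelope sketch that assumes PyTorch's default parameter layout (a packed query/key/value input projection with bias in `MultiheadAttention`, and biases on all linear layers); it is not output from the notebook.

```python
vocab_size, d_model, ff_dim = 197728, 512, 2048  # values from the printed model
max_position, num_layers, num_classes = 700, 5, 4

def linear_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

embedding = vocab_size * d_model
pos_embedding = max_position * d_model
layer_norm = 2 * d_model  # scale and shift

encoder_layer = (
    linear_params(d_model, 3 * d_model)  # query/key/value input projection
    + linear_params(d_model, d_model)    # attention output projection
    + linear_params(d_model, ff_dim)     # linear1
    + linear_params(ff_dim, d_model)     # linear2
    + 2 * layer_norm                     # norm1 and norm2
)
encoder = num_layers * encoder_layer + layer_norm  # plus the final LayerNorm
head = linear_params(d_model, d_model // 2) + linear_params(d_model // 2, num_classes)

total = embedding + pos_embedding + encoder + head
print(f"total parameters: {total:,}")
print(f"share in word embeddings: {embedding / total:.1%}")
```

Under these assumptions the word embedding table accounts for the large majority of the parameters, which is worth keeping in mind given the size of the training set.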

### Training

## 2.1 Set the related parameters before training

In [23]:
import torch
import torch.optim as optim
import random
torch.manual_seed(90042)
torch.cuda.manual_seed_all(90042)
random.seed(90042)

encoder_optimizer = optim.Adam(cls_model.parameters())
max_lr = 1e-3
for param_group in encoder_optimizer.param_groups:
    param_group['lr'] = max_lr

save_dir = "model_ckpts"
if not os.path.exists(save_dir):
    os.makedirs(save_dir)

## 2.2 Define auxiliary functions used in training

In [24]:
import torch

@torch.no_grad()  # gradients are not needed during validation
def validate(dev_input, dev_labels, cls_model):
    """
    Validate the classification model's performance on the development set.

    Args:
    dev_input (list of list of int): The input data for validation, where each item is a sequence of token indices.
    dev_labels (list of int): The ground truth labels for the validation data.
    cls_model (torch.nn.Module): The classification model to validate.

    Returns:
    float: The accuracy of the model on the development data.
    """
    cls_model.eval()  # Switch the model to evaluation mode
    batch_size = 50
    total_correct = 0
    total_count = len(dev_labels)

    # Process the dataset in batches
    for start_idx in range(0, total_count, batch_size):
        end_idx = min(start_idx + batch_size, total_count)
        batch_input = torch.LongTensor(dev_input[start_idx:end_idx]).cuda()
        batch_pos = torch.LongTensor([list(range(len(dev_input[0]))) for _ in range(end_idx - start_idx)]).cuda()

        # Perform prediction
        batch_output = cls_model(batch_input, batch_pos)
        # Pick the predicted label that has the highest probability among 4 labels
        predicted_labels = torch.argmax(batch_output, axis=1).cpu().tolist()

        # Compute accuracy for the batch
        total_correct += sum(1 for predicted, true in zip(predicted_labels, dev_labels[start_idx:end_idx]) if predicted == true)

        # Free CUDA memory
        del batch_input, batch_pos

    # Calculate total accuracy
    accuracy = total_correct / total_count
    print(f"\nClassification Accuracy: {accuracy:.3f}\n")

    cls_model.train()  # Switch back to training mode
    return accuracy

In [26]:
import os
import torch
import torch.nn as nn
from tqdm import tqdm

def train_model(cls_model, dataloader, epochs, encoder_optimizer, loss_function, grad_norm, accumulate_step, warmup_steps, max_lr, eval_interval, save_dir):
    """
    Train a classification model.

    Args:
    cls_model (torch.nn.Module): The classification model to be trained.
    dataloader (DataLoader): DataLoader for training data.
    epochs (int): Number of passes over the training data.
    encoder_optimizer (Optimizer): Optimizer for the model.
    loss_function: Loss function used to compute the training loss.
    grad_norm (float): Maximum norm for gradient clipping.
    accumulate_step (int): Number of batches over which gradients are accumulated before each optimizer step.
    warmup_steps (int): Number of steps for linear learning rate warmup.
    max_lr (float): Maximum learning rate after warmup.
    eval_interval (int): Number of optimizer steps between evaluations on the development set.
    save_dir (str): Directory to save the best model checkpoint.

    Note: this function also reads the globals report_freq, dev_inputs and dev_labels.
    """
    step_cnt = 0
    all_step_cnt = 0
    avg_loss = 0
    maximum_f_score = 0

    for epoch in range(epochs):  # Iterate over epochs
        epoch_step = 0

        for i, batch in enumerate(tqdm(dataloader, desc="Training Epoch {}".format(epoch+1))):
            step_cnt += 1

            # Forward pass
            cur_res = cls_model(batch["queries"].cuda(), batch["queries_pos"].cuda())
            loss = loss_function(cur_res, batch["labels"].cuda()) / accumulate_step
            loss.backward()

            # Accumulate loss for reporting
            avg_loss += loss.item()

            # Parameter update
            if step_cnt == accumulate_step:
                if grad_norm > 0:
                    torch.nn.utils.clip_grad_norm_(cls_model.parameters(), grad_norm)

                encoder_optimizer.step()
                encoder_optimizer.zero_grad()
                step_cnt = 0
                epoch_step += 1
                all_step_cnt += 1

                # Adjust learning rate
                lr = adjust_learning_rate(encoder_optimizer, all_step_cnt, warmup_steps, max_lr)

                # Report training status
                if all_step_cnt % report_freq == 0:
                    print(f"Epoch: {epoch + 1}, Step: {epoch_step}, Avg Loss: {avg_loss / report_freq:.6f}, Learning Rate: {lr:.6f}")
                    avg_loss = 0  # Reset average loss

            # Free up memory
            del loss, cur_res

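            # Periodic evaluation and checkpointing.
            # Checking step_cnt == 0 makes this run once, right after a parameter update,
            # rather than on every batch of the accumulation window.
            if step_cnt == 0 and all_step_cnt % eval_interval == 0 and all_step_cnt != 0:
                # Note: validate() returns accuracy, which is stored under the name f_score
                f_score = validate(dev_inputs, dev_labels, cls_model)  # Evaluate on development set
                # save the best checkpoint to avoid overfitting
                if f_score > maximum_f_score:
                    maximum_f_score = f_score
                    torch.save(cls_model.state_dict(), os.path.join(save_dir, "best_cls_ckpt.bin"))
                    print(f"New best F-score: {f_score:.4f} at Epoch: {epoch + 1}, Step: {epoch_step}")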

def adjust_learning_rate(optimizer, step_cnt, warmup_steps, max_lr):
    """
    Adjusts learning rate based on step count.

    Args:
    optimizer (Optimizer): Optimizer whose learning rate needs adjustment.
    step_cnt (int): Current step count.
    warmup_steps (int): Steps to linearly increase learning rate.
    max_lr (float): Target learning rate post-warmup.

    Returns:
    float: Adjusted learning rate.
    """
    if step_cnt <= warmup_steps:
        lr = step_cnt * (max_lr - 2e-8) / warmup_steps + 2e-8
    else:
        lr = max_lr - (step_cnt - warmup_steps) * 1e-6

    for param_group in optimizer.param_groups:
        param_group['lr'] = lr

    return lr
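
The shape of this schedule is easier to see with a few concrete values. The sketch below repeats the same formula as a pure function, without an optimizer; `lr_at` is a hypothetical helper, and the defaults are the `warmup_steps` and `max_lr` values used in this notebook.

```python
def lr_at(step, warmup_steps=250, max_lr=1e-3):
    """Learning rate after `step` optimizer updates (same formula as adjust_learning_rate)."""
    if step <= warmup_steps:
        # linear warmup from 2e-8 up to max_lr
        return step * (max_lr - 2e-8) / warmup_steps + 2e-8
    # linear decay of 1e-6 per update after warmup
    return max_lr - (step - warmup_steps) * 1e-6

for step in (15, 250, 255):
    print(step, f"{lr_at(step):.6f}")
# 15 0.000060
# 250 0.001000
# 255 0.000995
```

With `max_lr = 1e-3` the decay reaches zero 1000 updates after the warmup ends, so a much longer run would need a lower bound on the learning rate.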

Set some hyperparameters for training.

In [27]:
accumulate_step = 2
grad_norm = 5
warmup_steps = 250
report_freq = 15
eval_interval = 40
epochs = 5
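
With gradient accumulation, an optimizer update happens only every `accumulate_step` batches, and the batch counter in `train_model` carries over from one epoch to the next. The counters can be simulated without a model, as in the sketch below; `simulate_updates` is a hypothetical helper, and the figure of 123 batches per epoch is taken from the progress bars in the training log.

```python
def simulate_updates(batches_per_epoch, epochs, accumulate_step):
    """Return (epoch, batch_number, epoch_step, all_step_cnt) for every optimizer update."""
    updates = []
    step_cnt = 0       # batches since the last update; carried across epochs
    all_step_cnt = 0   # total optimizer updates so far
    for epoch in range(1, epochs + 1):
        epoch_step = 0
        for batch_number in range(1, batches_per_epoch + 1):
            step_cnt += 1
            if step_cnt == accumulate_step:
                step_cnt = 0
                epoch_step += 1
                all_step_cnt += 1
                updates.append((epoch, batch_number, epoch_step, all_step_cnt))
    return updates

# 123 batches per epoch is an assumption taken from the progress bars in the training log
updates = simulate_updates(batches_per_epoch=123, epochs=5, accumulate_step=2)
print(len(updates))                         # 307 optimizer updates in total
print([u for u in updates if u[3] == 75])   # [(2, 27, 14, 75)]
```

The 75th update falls on batch 27 of epoch 2 and is the 14th update of that epoch, which matches the `Epoch: 2, Step: 14` report line in the log.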

Here we count how often each label occurs in the training dataset.

In [28]:
from collections import Counter
print(Counter(train_labels))

Counter({0: 519, 1: 386, 2: 199, 3: 124})


The labels are imbalanced in the training dataset. In terms of label names, the counts are:

{'SUPPORTS': 519, 'NOT_ENOUGH_INFO': 386, 'REFUTES': 199, 'DISPUTED': 124}

We therefore assign different weights to different labels when computing the cross-entropy loss.

In [36]:
# For the most common label, 'SUPPORTS', we use a smaller weight of 0.3.
# For minority labels like 'DISPUTED' and 'REFUTES', we use larger weights.
# These weights are tunable hyperparameters, too
labels_weights = torch.FloatTensor([0.3, 0.4, 0.7, 1.]).cuda()
loss_function = nn.CrossEntropyLoss(weight=labels_weights)
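
One common starting point for such weights is the inverse label frequency. The sketch below applies it to the counts above; it is an illustrative heuristic for comparison, not the procedure that was used to choose the weights in this notebook.

```python
label_counts = {"SUPPORTS": 519, "NOT_ENOUGH_INFO": 386, "REFUTES": 199, "DISPUTED": 124}

# Inverse-frequency weights, scaled so that the rarest label gets weight 1.0.
# This is an illustrative heuristic, not how the weights above were picked.
rarest = min(label_counts.values())
inverse_freq_weights = {label: round(rarest / count, 2) for label, count in label_counts.items()}
print(inverse_freq_weights)
# {'SUPPORTS': 0.24, 'NOT_ENOUGH_INFO': 0.32, 'REFUTES': 0.62, 'DISPUTED': 1.0}
```

These values are close to the hand-set weights `[0.3, 0.4, 0.7, 1.]`, which are slightly less aggressive towards the majority labels.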

## 2.3 Training starts here!

In [37]:
train_model(cls_model, dataloader,epochs, encoder_optimizer, loss_function, grad_norm, accumulate_step, warmup_steps, max_lr, eval_interval, save_dir)

Training Epoch 1:  24%|██▍       | 30/123 [00:07<00:23,  3.90it/s]

Epoch: 1, Step: 15, Avg Loss: 1.408939, Learning Rate: 0.000060


Training Epoch 1:  49%|████▉     | 60/123 [00:15<00:16,  3.85it/s]

Epoch: 1, Step: 30, Avg Loss: 1.384404, Learning Rate: 0.000120


Training Epoch 1:  64%|██████▍   | 79/123 [00:20<00:11,  3.83it/s]


Classification Accuracy: 0.266



Training Epoch 1:  65%|██████▌   | 80/123 [00:28<01:52,  2.62s/it]

New best F-score: 0.2662 at Epoch: 1, Step: 40


Training Epoch 1:  66%|██████▌   | 81/123 [00:30<01:34,  2.25s/it]


Classification Accuracy: 0.266



Training Epoch 1:  73%|███████▎  | 90/123 [00:32<00:11,  2.91it/s]

Epoch: 1, Step: 45, Avg Loss: 1.413497, Learning Rate: 0.000180


Training Epoch 1:  98%|█████████▊| 120/123 [00:40<00:00,  3.76it/s]

Epoch: 1, Step: 60, Avg Loss: 1.389134, Learning Rate: 0.000240


Training Epoch 1: 100%|██████████| 123/123 [00:41<00:00,  2.98it/s]
Training Epoch 2:  22%|██▏       | 27/123 [00:07<00:25,  3.80it/s]

Epoch: 2, Step: 14, Avg Loss: 1.437560, Learning Rate: 0.000300


Training Epoch 2:  30%|███       | 37/123 [00:10<00:51,  1.67it/s]


Classification Accuracy: 0.175



Training Epoch 2:  31%|███       | 38/123 [00:12<01:09,  1.22it/s]


Classification Accuracy: 0.175



Training Epoch 2:  46%|████▋     | 57/123 [00:17<00:17,  3.79it/s]

Epoch: 2, Step: 29, Avg Loss: 1.389128, Learning Rate: 0.000360


Training Epoch 2:  71%|███████   | 87/123 [00:25<00:09,  3.78it/s]

Epoch: 2, Step: 44, Avg Loss: 1.335564, Learning Rate: 0.000420


Training Epoch 2:  94%|█████████▍| 116/123 [00:32<00:01,  3.76it/s]

Epoch: 2, Step: 59, Avg Loss: 1.472062, Learning Rate: 0.000480

Classification Accuracy: 0.442



Training Epoch 2:  95%|█████████▌| 117/123 [00:40<00:14,  2.50s/it]

New best F-score: 0.4416 at Epoch: 2, Step: 59


Training Epoch 2:  96%|█████████▌| 118/123 [00:41<00:10,  2.16s/it]


Classification Accuracy: 0.442



Training Epoch 2: 100%|██████████| 123/123 [00:43<00:00,  2.84it/s]
Training Epoch 3:  20%|█▉        | 24/123 [00:06<00:26,  3.78it/s]

Epoch: 3, Step: 12, Avg Loss: 1.405230, Learning Rate: 0.000540


Training Epoch 3:  44%|████▍     | 54/123 [00:14<00:18,  3.77it/s]

Epoch: 3, Step: 27, Avg Loss: 1.420614, Learning Rate: 0.000600


Training Epoch 3:  60%|██████    | 74/123 [00:20<00:29,  1.65it/s]


Classification Accuracy: 0.266



Training Epoch 3:  61%|██████    | 75/123 [00:22<00:39,  1.21it/s]


Classification Accuracy: 0.266



Training Epoch 3:  68%|██████▊   | 84/123 [00:24<00:11,  3.46it/s]

Epoch: 3, Step: 42, Avg Loss: 1.377736, Learning Rate: 0.000660


Training Epoch 3:  93%|█████████▎| 114/123 [00:32<00:02,  3.76it/s]

Epoch: 3, Step: 57, Avg Loss: 1.396883, Learning Rate: 0.000720


Training Epoch 3: 100%|██████████| 123/123 [00:35<00:00,  3.51it/s]
Training Epoch 4:  17%|█▋        | 21/123 [00:05<00:26,  3.79it/s]

Epoch: 4, Step: 11, Avg Loss: 1.377941, Learning Rate: 0.000780


Training Epoch 4:  25%|██▌       | 31/123 [00:09<00:54,  1.67it/s]


Classification Accuracy: 0.442



Training Epoch 4:  26%|██▌       | 32/123 [00:10<01:14,  1.23it/s]


Classification Accuracy: 0.442



Training Epoch 4:  41%|████▏     | 51/123 [00:15<00:19,  3.78it/s]

Epoch: 4, Step: 26, Avg Loss: 1.363693, Learning Rate: 0.000840


Training Epoch 4:  66%|██████▌   | 81/123 [00:23<00:11,  3.80it/s]

Epoch: 4, Step: 41, Avg Loss: 1.409509, Learning Rate: 0.000900


Training Epoch 4:  89%|████████▉ | 110/123 [00:31<00:03,  3.75it/s]

Epoch: 4, Step: 56, Avg Loss: 1.388116, Learning Rate: 0.000960


Training Epoch 4:  90%|█████████ | 111/123 [00:32<00:07,  1.68it/s]


Classification Accuracy: 0.442



Training Epoch 4:  91%|█████████ | 112/123 [00:34<00:08,  1.24it/s]


Classification Accuracy: 0.442



Training Epoch 4: 100%|██████████| 123/123 [00:36<00:00,  3.33it/s]
Training Epoch 5:  15%|█▍        | 18/123 [00:04<00:27,  3.81it/s]

Epoch: 5, Step: 9, Avg Loss: 1.381740, Learning Rate: 0.000995


Training Epoch 5:  39%|███▉      | 48/123 [00:12<00:19,  3.80it/s]

Epoch: 5, Step: 24, Avg Loss: 1.383381, Learning Rate: 0.000980


Training Epoch 5:  55%|█████▌    | 68/123 [00:19<00:32,  1.67it/s]


Classification Accuracy: 0.266



Training Epoch 5:  56%|█████▌    | 69/123 [00:20<00:43,  1.23it/s]


Classification Accuracy: 0.266



Training Epoch 5:  63%|██████▎   | 78/123 [00:22<00:12,  3.50it/s]

Epoch: 5, Step: 39, Avg Loss: 1.380678, Learning Rate: 0.000965


Training Epoch 5:  88%|████████▊ | 108/123 [00:30<00:03,  3.76it/s]

Epoch: 5, Step: 54, Avg Loss: 1.378711, Learning Rate: 0.000950


Training Epoch 5: 100%|██████████| 123/123 [00:34<00:00,  3.53it/s]


# 3.Testing and Evaluation

Define the function for prediction on the given input

In [44]:
import torch

@torch.no_grad()  # inference only, so gradients are not tracked
def predict(input, cls_model):
    """
    Predict claim labels for a given set of concatenated inputs using the trained classification model.

    Args:
    input (list of list of int): The data inputs, where each input is a list of token indices.
    cls_model (torch.nn.Module): The classification model to use for prediction.

    Returns:
    list: A list of predicted class labels.
    """
    cls_model.eval()  # Set the model to evaluation mode to disable dropout, etc.
    batch_size = 75
    pos_len = len(input[0])  # Assume all inputs are the same length
    predictions = []

    # Iterate through the input in batches
    for start_idx in range(0, len(input), batch_size):
        end_idx = min(start_idx + batch_size, len(input))

        # Prepare batch data for model input
        batch_input = torch.LongTensor(input[start_idx:end_idx]).view(-1, pos_len).cuda()
        batch_pos = torch.LongTensor([list(range(pos_len)) for _ in range(end_idx - start_idx)]).cuda()

        # Get the model predictions for the current batch
        batch_res = cls_model(batch_input, batch_pos)
        predicted_labels = torch.argmax(batch_res, 1).tolist()  # pick out the label that has the highest probability

        # Collect all predictions
        predictions.extend(predicted_labels)

        # Clean up memory
        del batch_input, batch_pos

    return predictions

Load the best checkpoint and run the prediction on the dev and test inputs.

In [45]:
import os
cls_model.load_state_dict(torch.load(os.path.join(save_dir, "best_cls_ckpt.bin")))

dev_predicted_labels = predict(dev_inputs, cls_model)
test_predicted_labels = predict(test_inputs, cls_model)

Store the predicted labels in the dictionaries that already hold the retrieved evidences.

In [46]:
pred_dev_claims = json.load(open("pred_dev_claims_retrieval.json", "r"))
pred_test_claims = json.load(open("pred_test_claims_retrieval.json", "r"))

def update_claim_labels(claim_ids, predicted_labels, claims_dict, id_to_label):
    """
    Update claim dictionaries with predicted labels translated from label IDs.

    Args:
    claim_ids (list): List of claim identifiers.
    predicted_labels (list): List of predicted label IDs corresponding to the claim IDs.
    claims_dict (dict): Dictionary of claims where each key is a claim ID and the value is the claim data.
    id_to_label (list): List mapping label IDs (indices) to human-readable labels.

    Effect:
    Modifies the claims_dict by adding a 'claim_label' field with the translated label.
    """
    for claim_id, label_id in zip(claim_ids, predicted_labels):
        if claim_id in claims_dict:
            claims_dict[claim_id]['claim_label'] = id_to_label[label_id]
        else:
            print(f"Warning: Claim ID {claim_id} not found in claims dictionary.")

update_claim_labels(dev_ids, dev_predicted_labels, pred_dev_claims, id2labels)
update_claim_labels(test_ids, test_predicted_labels, pred_test_claims, id2labels)
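
The label update step can be checked on toy data. The sketch below performs the same pairing of claim IDs and predicted label indices; all IDs, texts and predictions are hypothetical.

```python
id2labels_toy = ["SUPPORTS", "NOT_ENOUGH_INFO", "REFUTES", "DISPUTED"]

# Toy retrieval output (hypothetical claim IDs and evidences)
toy_pred_claims = {
    "claim-a": {"claim_text": "Toy claim A.", "evidences": ["evidence-1"]},
    "claim-b": {"claim_text": "Toy claim B.", "evidences": ["evidence-2"]},
}
toy_claim_ids = ["claim-a", "claim-b"]
toy_predicted = [2, 0]  # label indices, in the same order as toy_claim_ids

# Attach the decoded label to each claim, in the same order as the predictions
for claim_id, label_id in zip(toy_claim_ids, toy_predicted):
    toy_pred_claims[claim_id]["claim_label"] = id2labels_toy[label_id]

print(toy_pred_claims["claim-a"]["claim_label"])  # REFUTES
print(toy_pred_claims["claim-b"]["claim_label"])  # SUPPORTS
```

The pairing is purely positional, so the claim IDs must be in the same order as the concatenated inputs that were passed to `predict`.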

In [47]:
## save the final predicted test data for leaderboard submission
with open("test-output.json", "w") as f:
    json.dump(pred_test_claims, f)

In [48]:
pred_test_claims

{'claim-2967': {'claim_text': 'The contribution of waste heat to the global climate is 0.028 W/m2.',
  'evidences': ['evidence-0',
   'evidence-3',
   'evidence-1',
   'evidence-2',
   'evidence-4'],
  'claim_label': 'SUPPORTS'},
 'claim-979': {'claim_text': '“Warm weather worsened the most recent five-year drought, which included the driest four-year period on record in terms of statewide precipitation.',
  'evidences': ['evidence-1',
   'evidence-2',
   'evidence-3',
   'evidence-4',
   'evidence-5'],
  'claim_label': 'SUPPORTS'},
 'claim-1609': {'claim_text': 'Greenland has only lost a tiny fraction of its ice mass.',
  'evidences': ['evidence-0',
   'evidence-1',
   'evidence-2',
   'evidence-3',
   'evidence-4'],
  'claim_label': 'SUPPORTS'},
 'claim-1020': {'claim_text': '“The global reef crisis does not necessarily mean extinction for coral species.',
  'evidences': ['evidence-0',
   'evidence-1',
   'evidence-2',
   'evidence-3',
   'evidence-4'],
  'claim_label': 'SUPPORTS'},

