# 2025 COMP90042 Project
*Make sure you rename the file to include your group ID.*

# Readme
*If there is something to be noted for the marker, please mention it here.*

*If you are planning to implement a program in an object-oriented programming style, please put that code at the bottom of this ipynb file.*

# 1.DataSet Processing
(You can add as many code blocks and text blocks as you need. However, YOU SHOULD NOT MODIFY the section title)

In [None]:
import json
import os
import pandas as pd
import torch
from IPython.display import display  # explicit import so the cells also run outside a notebook

In [4]:

# Paths to the dataset files
DATA_DIR = "./data"
TRAIN_FILE = os.path.join(DATA_DIR, "train-claims.json")
DEV_FILE   = os.path.join(DATA_DIR, "dev-claims.json")
TEST_FILE  = os.path.join(DATA_DIR, "test-claims-unlabelled.json")
EVID_FILE  = os.path.join(DATA_DIR, "evidence.json")


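def load_claims(path, labelled=True):
    """Load a claims JSON file into a DataFrame with one row per claim; labelled files also carry the label and gold evidence IDs."""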
    with open(path, 'r', encoding='utf-8') as f:
        raw = json.load(f)
    records = []
    for cid, info in raw.items():
        rec = {
            "claim_id": cid,
            "claim_text": info.get("claim_text", ""),
        }
        if labelled:
            rec["label"] = info["claim_label"]
            rec["evid_ids"] = info["evidences"]
        records.append(rec)
    return pd.DataFrame(records)

def load_evidence(path):
    """Load the evidence JSON file (evidence ID -> passage text) into a DataFrame."""
    with open(path, 'r', encoding='utf-8') as f:
        raw = json.load(f)
    return pd.DataFrame([{"evid_id": k, "evid_text": v} for k, v in raw.items()])

# Load each split and the evidence corpus into DataFrames
df_train = load_claims(TRAIN_FILE, labelled=True)
df_dev   = load_claims(DEV_FILE,   labelled=True)
df_test  = load_claims(TEST_FILE,  labelled=False)
df_evid  = load_evidence(EVID_FILE)

# Inspect dataset sizes and the first few rows
print("Train size：", len(df_train))
print("Dev size：", len(df_dev))
print("Test  size：", len(df_test))
print("Evidence size：", len(df_evid))


display(df_train.head())
display(df_evid.head())

# Label distribution per split
print("Train label distribution：")
display(df_train["label"].value_counts())

print("Dev label distribution：")
display(df_dev["label"].value_counts())




Train size： 1228
Dev size： 154
Test  size： 153
Evidence size： 1208827


| | claim_id | claim_text | label | evid_ids |
|---|---|---|---|---|
| 0 | claim-1937 | Not only is there no scientific evidence that ... | DISPUTED | [evidence-442946, evidence-1194317, evidence-1... |
| 1 | claim-126 | El Niño drove record highs in global temperatu... | REFUTES | [evidence-338219, evidence-1127398] |
| 2 | claim-2510 | In 1946, PDO switched to a cool phase. | SUPPORTS | [evidence-530063, evidence-984887] |
| 3 | claim-2021 | Weather Channel co-founder John Coleman provid... | DISPUTED | [evidence-1177431, evidence-782448, evidence-5... |
| 4 | claim-2449 | "January 2008 capped a 12 month period of glob... | NOT_ENOUGH_INFO | [evidence-1010750, evidence-91661, evidence-72... |


| | evid_id | evid_text |
|---|---|---|
| 0 | evidence-0 | John Bennet Lawes, English entrepreneur and ag... |
| 1 | evidence-1 | Lindberg began his professional career at the ... |
| 2 | evidence-2 | ``Boston (Ladies of Cambridge)'' by Vampire We... |
| 3 | evidence-3 | Gerald Francis Goyer (born October 20, 1936) w... |
| 4 | evidence-4 | He detected abnormalities of oxytocinergic fun... |


Train label distribution：


label
SUPPORTS           519
NOT_ENOUGH_INFO    386
REFUTES            199
DISPUTED           124
Name: count, dtype: int64

Dev label distribution：


label
SUPPORTS           68
NOT_ENOUGH_INFO    41
REFUTES            27
DISPUTED           18
Name: count, dtype: int64
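
The label counts above are imbalanced: SUPPORTS is roughly four times as frequent as DISPUTED in the training set. The sketch below runs two quick checks on those counts, which are copied by hand from the output above. The helper names `majority_baseline` and `inverse_frequency_weights` are illustrative and not part of the project code, and the weighting formula (total / (number of classes * class count)) is one common choice, assumed here rather than required by the task.

```python
from collections import Counter

# Label counts copied from the train/dev distributions printed above.
train_counts = Counter({"SUPPORTS": 519, "NOT_ENOUGH_INFO": 386,
                        "REFUTES": 199, "DISPUTED": 124})
dev_counts = Counter({"SUPPORTS": 68, "NOT_ENOUGH_INFO": 41,
                      "REFUTES": 27, "DISPUTED": 18})


def majority_baseline(train, evaluation):
    """Accuracy obtained by always predicting the most frequent training label."""
    majority_label, _ = train.most_common(1)[0]
    accuracy = evaluation[majority_label] / sum(evaluation.values())
    return majority_label, accuracy


def inverse_frequency_weights(counts):
    """Per-class weights proportional to 1 / class frequency (hypothetical helper)."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * c) for label, c in counts.items()}


baseline_label, baseline_acc = majority_baseline(train_counts, dev_counts)
class_weights = inverse_frequency_weights(train_counts)

print(f"Majority label: {baseline_label}, dev accuracy: {baseline_acc:.3f}")
for label, weight in class_weights.items():
    print(f"{label:<16} weight = {weight:.3f}")
```

Always predicting SUPPORTS gives 68 / 154, about 0.44 accuracy on the dev set, which is a useful floor for any classifier trained later. If a weighted loss is used, the weights could be passed to the loss function in the same label order as the model's output classes.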

# 2. Model Implementation
(You can add as many code blocks and text blocks as you need. However, YOU SHOULD NOT MODIFY the section title)

In [1]:
if torch.cuda.is_available():
    print("CUDA is available.")

    num_gpus = torch.cuda.device_count()
    print(f"Number of available GPUs: {num_gpus}")

    for i in range(num_gpus):
        print(f"GPU {i}: {torch.cuda.get_device_name(i)}")

    print(f"Current GPU device: {torch.cuda.current_device()}")
else:
    print("CUDA is not available. Running on CPU.")

CUDA is available.
Number of available GPUs: 1
GPU 0: NVIDIA GeForce RTX 4070 SUPER
Current GPU device: 0


# 3.Testing and Evaluation
(You can add as many code blocks and text blocks as you need. However, YOU SHOULD NOT MODIFY the section title)
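
A minimal evaluation sketch, assuming predictions are stored in the same shape as the labelled claims files (a dict keyed by claim ID with `claim_label` and `evidences` fields). The choice of metrics here, label accuracy and mean per-claim evidence F1, is an assumption, not something this notebook specifies; the function names and the toy `gold`/`pred` data are hypothetical.

```python
def label_accuracy(gold, pred):
    """Fraction of gold claims whose predicted label matches the gold label."""
    if not gold:
        return 0.0
    correct = sum(
        1 for cid, g in gold.items()
        if pred.get(cid, {}).get("claim_label") == g["claim_label"]
    )
    return correct / len(gold)


def evidence_f1(gold, pred):
    """Mean per-claim F1 between predicted and gold evidence ID sets."""
    scores = []
    for cid, g in gold.items():
        gold_ids = set(g["evidences"])
        pred_ids = set(pred.get(cid, {}).get("evidences", []))
        hits = len(gold_ids & pred_ids)
        if hits == 0:
            # No overlap (or no prediction): this claim scores zero.
            scores.append(0.0)
            continue
        precision = hits / len(pred_ids)
        recall = hits / len(gold_ids)
        scores.append(2 * precision * recall / (precision + recall))
    return sum(scores) / len(scores) if scores else 0.0


# Toy data for illustration only; not taken from the dataset.
gold = {
    "claim-a": {"claim_label": "SUPPORTS", "evidences": ["evidence-1", "evidence-2"]},
    "claim-b": {"claim_label": "REFUTES", "evidences": ["evidence-3"]},
}
pred = {
    "claim-a": {"claim_label": "SUPPORTS", "evidences": ["evidence-1", "evidence-9"]},
    "claim-b": {"claim_label": "DISPUTED", "evidences": ["evidence-4"]},
}

acc = label_accuracy(gold, pred)
f1 = evidence_f1(gold, pred)
print(f"Label accuracy: {acc:.2f}, evidence F1: {f1:.2f}")
```

In the toy example one of two labels is correct, and only the first claim has any evidence overlap (precision and recall both 0.5), so the accuracy is 0.5 and the mean evidence F1 is 0.25.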

## Object Oriented Programming codes here

*You can use multiple code snippets. Just add more if needed*