# BERT Model Training
## Explainable Fake News Detection using Transformer-based Language Models

**Student Name:** Abdullah  
**Reg Number:** M24F0044DS009  
**Course:** Natural Language Processing  

### Objective
The objective of this notebook is to fine-tune a pre-trained BERT model for binary fake news classification using labeled news text data.


In [14]:
# !pip install transformers torch scikit-learn tqdm pandas

In [15]:
import pandas as pd
import numpy as np
import torch
from torch.utils.data import Dataset

from transformers import (
    BertTokenizer,
    BertForSequenceClassification,
    Trainer,
    TrainingArguments
)

from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

import warnings
warnings.filterwarnings("ignore")



In [16]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Using device:", device)


Using device: cpu


In [17]:
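# Load the preprocessed short-text splits.
# test_df is loaded here but not used in this notebook; it is left for later evaluation.
train_df = pd.read_csv("../data/shorttextpreprocessedtrain.csv")
test_df = pd.read_csv("../data/shorttextpreprocessedtest.csv")

train_df.head()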


Unnamed: 0,text,label
0,phil robertson of duck dynasty has endorsed do...,0
1,san francisco ap — apple penalized ceo tim coo...,1
2,if we use tax increment financing funding that...,0
3,roughly georgians or about percent of the stat...,1
4,stunned that did not concede if pulled that pe...,0


In [18]:
train_texts, val_texts, train_labels, val_labels = train_test_split(
    train_df['text'],
    train_df['label'],
    test_size=0.2,
    random_state=42,
    stratify=train_df['label']
)
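
Because the split passes `stratify=train_df['label']`, the training and validation subsets should keep roughly the same class ratio as the full frame. The sketch below illustrates this on toy labels; the 80/20 class mix and the `toy_*` names are made up for illustration and say nothing about the real dataset.

```python
from collections import Counter

from sklearn.model_selection import train_test_split

# Toy stand-in for the real data: 100 rows, 80 of class 0 and 20 of class 1.
toy_texts = [f"headline {i}" for i in range(100)]
toy_labels = [0] * 80 + [1] * 20

tr_x, va_x, tr_y, va_y = train_test_split(
    toy_texts,
    toy_labels,
    test_size=0.2,
    random_state=42,
    stratify=toy_labels,  # keep the 80/20 class ratio in both subsets
)

train_counts = Counter(tr_y)
val_counts = Counter(va_y)
print("train:", dict(train_counts), "val:", dict(val_counts))
```

Running the same `Counter` check on `train_labels` and `val_labels` would confirm the ratio for the actual split.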


In [19]:
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

In [20]:
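# truncation=True with max_length=128 cuts longer texts down to 128 tokens.
# padding=True pads every sequence to the longest one in the same call.
train_encodings = tokenizer(
    list(train_texts),
    truncation=True,
    padding=True,
    max_length=128
)

val_encodings = tokenizer(
    list(val_texts),
    truncation=True,
    padding=True,
    max_length=128
)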
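
To make the effect of `truncation=True`, `padding=True`, and `max_length` concrete, here is a toy sketch that works on plain lists of token ids. `pad_and_truncate` is a hypothetical helper, not the tokenizer's implementation: it ignores special-token handling (the real tokenizer keeps the closing `[SEP]` token when it truncates) and assumes a padding id of 0, which matches the `padding_idx=0` shown in the model printout further down.

```python
def pad_and_truncate(sequences, max_length, pad_id=0):
    """Truncate each id list to max_length, then pad to the longest one left."""
    truncated = [seq[:max_length] for seq in sequences]
    target_len = max(len(seq) for seq in truncated)

    input_ids, attention_mask = [], []
    for seq in truncated:
        n_pad = target_len - len(seq)
        input_ids.append(seq + [pad_id] * n_pad)
        # 1 marks a real token, 0 marks padding the model should ignore.
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Two toy sequences of arbitrary ids, with max_length=5 for readability.
batch = pad_and_truncate(
    [[101, 7, 8, 102], [101, 7, 8, 9, 10, 11, 102]],
    max_length=5,
)
print(batch["input_ids"])
print(batch["attention_mask"])
```

The shorter sequence gains one padding position and the longer one loses its last two ids, so both end up with length 5.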


In [21]:
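class FakeNewsDataset(Dataset):
    """Wraps tokenizer encodings and labels so the Trainer can index them."""

    def __init__(self, encodings, labels):
        self.encodings = encodings
        # labels arrives as a pandas Series; .values gives positional indexing
        self.labels = labels.values

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        # One example: input_ids, token_type_ids, attention_mask, plus its label
        item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
        item['labels'] = torch.tensor(self.labels[idx])
        return item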


In [22]:
train_dataset = FakeNewsDataset(train_encodings, train_labels)
val_dataset = FakeNewsDataset(val_encodings, val_labels)


In [23]:
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=2
)

model.to(device)


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


BertForSequenceClassification(
  (bert): BertModel(
    (embeddings): BertEmbeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (token_type_embeddings): Embedding(2, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): BertEncoder(
      (layer): ModuleList(
        (0-11): 12 x BertLayer(
          (attention): BertAttention(
            (self): BertSdpaSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e

In [24]:
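def compute_metrics(pred):
    """Compute accuracy plus precision, recall, and F1 for the class labeled 1."""
    labels = pred.label_ids
    # pred.predictions holds raw logits; take the higher-scoring class per row
    preds = np.argmax(pred.predictions, axis=1)

    # average='binary' reports scores for the positive class (label 1) only
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average='binary'
    )
    acc = accuracy_score(labels, preds)

    return {
        'accuracy': acc,
        'precision': precision,
        'recall': recall,
        'f1': f1
    }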
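
As a cross-check on what these four numbers mean, the sketch below recomputes them by hand from confusion counts on a toy example. `binary_metrics` is a hypothetical helper written for this illustration, and the toy labels are invented; it assumes, like `average='binary'`, that label 1 is the positive class.

```python
def binary_metrics(labels, preds, positive=1):
    """Accuracy, precision, recall, and F1 computed from raw counts."""
    pairs = list(zip(labels, preds))
    tp = sum(1 for y, p in pairs if y == positive and p == positive)
    fp = sum(1 for y, p in pairs if y != positive and p == positive)
    fn = sum(1 for y, p in pairs if y == positive and p != positive)
    correct = sum(1 for y, p in pairs if y == p)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": correct / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy data: 3 positives among 8 examples, with one miss and one false alarm.
toy_labels = [1, 0, 1, 1, 0, 0, 0, 0]
toy_preds = [1, 0, 0, 1, 1, 0, 0, 0]
toy_scores = binary_metrics(toy_labels, toy_preds)
print(toy_scores)
```

On this toy data accuracy is 0.75 while precision, recall, and F1 are each 2/3. Accuracy counts the easy negatives, whereas the other three only look at the positive class, so accuracy can be noticeably higher than F1.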


In [25]:
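training_args = TrainingArguments(
    output_dir="../results",
    evaluation_strategy="epoch",   # evaluate after every epoch (called eval_strategy in newer transformers releases)
    save_strategy="epoch",         # checkpoint on the same schedule, as load_best_model_at_end requires
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    weight_decay=0.01,
    logging_dir="../results/logs",
    load_best_model_at_end=True    # metric_for_best_model is not set, so the default ranking is by validation loss
)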
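
No scheduler or warmup arguments are set here, so the `Trainer` should fall back to its default of linear decay with zero warmup steps. The sketch below assumes that default and the 3,801 total steps reported by the training log; `linear_lr` is a hypothetical helper written for illustration, not a library function.

```python
def linear_lr(step, base_lr=2e-5, total_steps=3801, warmup_steps=0):
    """Learning rate after `step` optimizer steps under linear warmup and decay."""
    if step < warmup_steps:
        # Ramp up linearly from 0 to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Then decay linearly so the rate reaches 0 at the final step.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for step in (500, 1000, 3500, 3801):
    print(step, linear_lr(step))
```

Under these assumptions the values at steps 500, 1000, and 3500 agree with the `learning_rate` entries in the training log (about 1.737e-05, 1.474e-05, and 1.584e-06).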


In [26]:
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    compute_metrics=compute_metrics,
    tokenizer=tokenizer
)


In [27]:
trainer.train()

 13%|█▎        | 500/3801 [58:53<5:32:11,  6.04s/it] 

{'loss': 0.2531, 'grad_norm': 5.093451499938965, 'learning_rate': 1.736911339121284e-05, 'epoch': 0.39}


 26%|██▋       | 1000/3801 [1:55:03<6:23:55,  8.22s/it]

{'loss': 0.2168, 'grad_norm': 2.3349204063415527, 'learning_rate': 1.473822678242568e-05, 'epoch': 0.79}


                                                       
 33%|███▎      | 1267/3801 [2:35:17<4:14:43,  6.03s/it]

{'eval_loss': 0.22040127217769623, 'eval_accuracy': 0.9001578531965272, 'eval_precision': 0.6874172185430464, 'eval_recall': 0.6577946768060836, 'eval_f1': 0.672279792746114, 'eval_runtime': 535.9446, 'eval_samples_per_second': 9.456, 'eval_steps_per_second': 0.591, 'epoch': 1.0}


 39%|███▉      | 1500/3801 [3:02:14<4:18:36,  6.74s/it]   

{'loss': 0.1925, 'grad_norm': 12.228032112121582, 'learning_rate': 1.2107340173638518e-05, 'epoch': 1.18}


 53%|█████▎    | 2000/3801 [4:08:09<4:56:11,  9.87s/it]

{'loss': 0.1576, 'grad_norm': 11.552817344665527, 'learning_rate': 9.476453564851356e-06, 'epoch': 1.58}


 66%|██████▌   | 2500/3801 [5:10:12<2:43:50,  7.56s/it]

{'loss': 0.1453, 'grad_norm': 11.414710998535156, 'learning_rate': 6.845566956064194e-06, 'epoch': 1.97}


                                                       
 67%|██████▋   | 2534/3801 [5:21:55<2:05:43,  5.95s/it]

{'eval_loss': 0.2715437710285187, 'eval_accuracy': 0.904301499605367, 'eval_precision': 0.6711711711711712, 'eval_recall': 0.7553865652724968, 'eval_f1': 0.7107930828861061, 'eval_runtime': 485.8606, 'eval_samples_per_second': 10.431, 'eval_steps_per_second': 0.652, 'epoch': 2.0}


 79%|███████▉  | 3000/3801 [6:23:39<1:16:46,  5.75s/it]  

{'loss': 0.1065, 'grad_norm': 0.38320186734199524, 'learning_rate': 4.214680347277033e-06, 'epoch': 2.37}


 92%|█████████▏| 3500/3801 [7:16:09<32:01,  6.38s/it]  

{'loss': 0.0912, 'grad_norm': 8.831109046936035, 'learning_rate': 1.5837937384898713e-06, 'epoch': 2.76}


                                                     
100%|██████████| 3801/3801 [7:54:13<00:00,  5.35s/it]

{'eval_loss': 0.3671300709247589, 'eval_accuracy': 0.8983820047355959, 'eval_precision': 0.6473118279569893, 'eval_recall': 0.7629911280101395, 'eval_f1': 0.7004072134962188, 'eval_runtime': 457.4326, 'eval_samples_per_second': 11.079, 'eval_steps_per_second': 0.693, 'epoch': 3.0}


100%|██████████| 3801/3801 [7:54:26<00:00,  7.49s/it]

{'train_runtime': 28466.4007, 'train_samples_per_second': 2.136, 'train_steps_per_second': 0.134, 'train_loss': 0.1594156592935363, 'epoch': 3.0}





TrainOutput(global_step=3801, training_loss=0.1594156592935363, metrics={'train_runtime': 28466.4007, 'train_samples_per_second': 2.136, 'train_steps_per_second': 0.134, 'total_flos': 3218388818049360.0, 'train_loss': 0.1594156592935363, 'epoch': 3.0})
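
The three evaluation entries in the log disagree about which epoch is best: validation loss is lowest after epoch 1, while F1 peaks after epoch 2. The sketch below uses the logged values (rounded to four decimals) to show how the choice of ranking metric changes the selected checkpoint. `best_epoch` is a hypothetical helper that only mimics the idea behind `load_best_model_at_end`; in `TrainingArguments` the ranking metric is set through `metric_for_best_model` and `greater_is_better`.

```python
# Per-epoch validation metrics copied (rounded) from the training log.
eval_log = [
    {"epoch": 1.0, "eval_loss": 0.2204, "eval_f1": 0.6723},
    {"epoch": 2.0, "eval_loss": 0.2715, "eval_f1": 0.7108},
    {"epoch": 3.0, "eval_loss": 0.3671, "eval_f1": 0.7004},
]


def best_epoch(log, metric, greater_is_better):
    """Return the epoch whose entry is best under the given metric."""
    pick = max if greater_is_better else min
    return pick(log, key=lambda row: row[metric])["epoch"]


by_loss = best_epoch(eval_log, "eval_loss", greater_is_better=False)
by_f1 = best_epoch(eval_log, "eval_f1", greater_is_better=True)
print("best by loss:", by_loss, "| best by F1:", by_f1)
```

Ranking by loss selects epoch 1 and ranking by F1 selects epoch 2, so which checkpoint counts as "best" in this run depends on the metric chosen.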

In [28]:
trainer.save_model("../models/bert_fake_news_model")
tokenizer.save_pretrained("../models/bert_fake_news_model")


('../models/bert_fake_news_model\\tokenizer_config.json',
 '../models/bert_fake_news_model\\special_tokens_map.json',
 '../models/bert_fake_news_model\\vocab.txt',
 '../models/bert_fake_news_model\\added_tokens.json')

## Training Summary

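- A pre-trained `bert-base-uncased` model was fine-tuned for three epochs (3,801 steps, about 7.9 hours on CPU) for binary fake news classification.
- Training loss fell from 0.25 to 0.09, while validation loss rose from 0.220 after epoch 1 to 0.367 after epoch 3, which suggests overfitting beyond the first epoch.
- Validation accuracy stayed near 0.90 in every epoch. F1 for the class labeled 1 peaked at 0.711 in epoch 2 (precision 0.671, recall 0.755); the gap between accuracy and F1 suggests the classes may be imbalanced.
- The trained model and tokenizer were saved to `../models/bert_fake_news_model` for further evaluation and explainability analysis. Because `load_best_model_at_end=True` is set without `metric_for_best_model`, the saved weights should be those of the checkpoint with the lowest validation loss (epoch 1) rather than the final epoch.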
