## **Step-by-Step Plan for Fine-Tuned BERT Fake News Detector**

✅ Phase 1: Data Preparation  
We'll merge the two CSVs (Fake.csv and True.csv) and label each article as fake (0) or real (1).

✅ Phase 2: Tokenization  
Use Hugging Face's BertTokenizer to convert each article into token IDs and an attention mask for the model.

✅ Phase 3: Dataset & Dataloader  
Use torch.utils.data.Dataset and DataLoader to prepare data for training.

✅ Phase 4: Model  
Load BertForSequenceClassification with two output labels (fake vs. real) and fine-tune it.

✅ Phase 5: Training Loop  
Set up the training process with AdamW, CrossEntropyLoss, and evaluation metrics.

✅ Phase 6: Save & Predict  
Save the fine-tuned model and write a predict.py script that classifies any input text.
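Phases 2 and 3 can be sketched together. This is a minimal sketch under stated assumptions and not necessarily how the later phases implement it. The class name `NewsDataset` is hypothetical, and the sketch assumes the tokenizer has already been run over the full list of texts with padding, so every row of a given field has the same length.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class NewsDataset(Dataset):
    """Hypothetical Dataset that pairs pre-tokenized encodings with integer labels."""

    def __init__(self, encodings, labels):
        # encodings: mapping of field name -> list of equal-length token lists,
        #            e.g. {"input_ids": [...], "attention_mask": [...]}
        # labels:    list of ints (0 = fake, 1 = real)
        self.encodings = encodings
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        # Turn the idx-th row of every encoding field into a tensor
        item = {key: torch.tensor(values[idx]) for key, values in self.encodings.items()}
        # BertForSequenceClassification looks for the targets under the "labels" key
        item["labels"] = torch.tensor(self.labels[idx])
        return item


# Toy, already-padded encodings standing in for real tokenizer output
toy_encodings = {
    "input_ids": [[101, 2023, 102, 0], [101, 2003, 2995, 102]],
    "attention_mask": [[1, 1, 1, 0], [1, 1, 1, 1]],
}
loader = DataLoader(NewsDataset(toy_encodings, [0, 1]), batch_size=2, shuffle=False)
batch = next(iter(loader))
# The default collate function stacks each field into a (batch_size, seq_len) tensor
print({key: tuple(tensor.shape) for key, tensor in batch.items()})
```

With the real data, the encodings would come from the tokenizer. For example, `tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")` followed by `tokenizer(train_texts, truncation=True, padding=True, max_length=512)`. The `bert-base-uncased` checkpoint and the 512-token limit are assumptions, not choices stated in this notebook. Tokenizing once up front keeps `__getitem__` cheap. The trade-off is that every example is padded to a common length.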

**Phase 1: Data Preparation**

```python
import pandas as pd
from sklearn.model_selection import train_test_split
```

```python
df_fake = pd.read_csv('../data/Fake.csv')
df_true = pd.read_csv('../data/True.csv')
```

```python
df_fake.head()
```

|   | title | text | subject | date |
|---|-------|------|---------|------|
| 0 | Donald Trump Sends Out Embarrassing New Year’... | Donald Trump just couldn t wish all Americans ... | News | December 31, 2017 |
| 1 | Drunk Bragging Trump Staffer Started Russian ... | House Intelligence Committee Chairman Devin Nu... | News | December 31, 2017 |
| 2 | Sheriff David Clarke Becomes An Internet Joke... | On Friday, it was revealed that former Milwauk... | News | December 30, 2017 |
| 3 | Trump Is So Obsessed He Even Has Obama’s Name... | On Christmas day, Donald Trump announced that ... | News | December 29, 2017 |
| 4 | Pope Francis Just Called Out Donald Trump Dur... | Pope Francis used his annual Christmas Day mes... | News | December 25, 2017 |


```python
# Label convention: 0 = fake, 1 = real
df_fake['label'] = 0
df_true['label'] = 1
```

```python
# Stack both sources, shuffle every row (frac=1), and rebuild a 0..n-1 index
df = pd.concat([df_fake, df_true]).sample(frac=1).reset_index(drop=True)
```
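The cell above shuffles without a seed, so the row order in the preview below will differ between runs. Here is a minimal sketch of a reproducible variant that also drops empty and duplicate texts. The helper name `build_labeled_frame`, the seed value, and both cleaning rules are assumptions, not steps this notebook performs.

```python
import pandas as pd


def build_labeled_frame(df_fake: pd.DataFrame, df_true: pd.DataFrame, seed: int = 42) -> pd.DataFrame:
    """Hypothetical helper: label, merge, lightly clean, and shuffle the two sources."""
    # Assign labels on copies so the caller's frames are left untouched (0 = fake, 1 = real)
    fake = df_fake.assign(label=0)
    true = df_true.assign(label=1)
    df = pd.concat([fake, true], ignore_index=True)[["text", "label"]]

    # Drop rows whose text is missing or only whitespace
    df = df[df["text"].fillna("").str.strip() != ""]
    # Drop exact duplicate texts (keeping the first) so one article cannot land in both splits
    df = df.drop_duplicates(subset="text")

    # A fixed random_state makes the shuffle, and everything downstream of it, reproducible
    return df.sample(frac=1, random_state=seed).reset_index(drop=True)
```

Because the duplicate check keeps the first occurrence and the fake rows are concatenated first, a text that appears in both files would keep label 0. That is an arbitrary choice worth revisiting if such overlaps exist.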

```python
df.head(10)
```

|   | title | text | subject | date | label |
|---|-------|------|---------|------|-------|
| 0 | Germany's Steinmeier warns of EU disruptions i... | BERLIN (Reuters) - The European Union would no... | politicsNews | May 12, 2016 | 1 |
| 1 | FEDERAL JUDGE STEPS IN To Review Legroom On Co... | Firebrand conservative Ann Coulter exposed Del... | left-news | Jul 31, 2017 | 0 |
| 2 | CNN ANCHOR Loses It During Debate on NFL: ‘I’m... | CNN host Don Lemon and conservative commentato... | politics | Oct 12, 2017 | 0 |
| 3 | South Africa's Ramaphosa gets most nominations... | JOHANNESBURG (Reuters) - South African Deputy ... | worldnews | December 4, 2017 | 1 |
| 4 | REPORTER ASKS: Will Obama Golf Instead of Atte... | BRAVE GUY! | politics | Feb 18, 2016 | 0 |
| 5 | Trump's energy pick Perry softens stance on cl... | WASHINGTON (Reuters) - Rick Perry, President-e... | politicsNews | January 19, 2017 | 1 |
| 6 | Pennsylvania glitches did not cause ballots to... | WASHINGTON (Reuters) - Election officials in t... | politicsNews | November 8, 2016 | 1 |
| 7 | FBI director nominee Wray earned $9.2 million ... | WASHINGTON (Reuters) - President Donald Trump’... | politicsNews | July 10, 2017 | 1 |
| 8 | WHY ANTI-TRUMP BILLIONAIRE MARK CUBAN Couldn’t... | During the 2016 presidential election, Mark Cu... | left-news | Oct 4, 2017 | 0 |
| 9 | COMEDY GOLD ON DETROIT NEWS: “Willy” Dumps His... | Charlie LeDuff is legend in Detroit but this i... | Government News | Jan 23, 2016 | 0 |


```python
# Keep only the article body and its label; title, subject and date are dropped
df = df[['text', 'label']]
```

```python
df.head(10)
```

|   | text | label |
|---|------|-------|
| 0 | BERLIN (Reuters) - The European Union would no... | 1 |
| 1 | Firebrand conservative Ann Coulter exposed Del... | 0 |
| 2 | CNN host Don Lemon and conservative commentato... | 0 |
| 3 | JOHANNESBURG (Reuters) - South African Deputy ... | 1 |
| 4 | BRAVE GUY! | 0 |
| 5 | WASHINGTON (Reuters) - Rick Perry, President-e... | 1 |
| 6 | WASHINGTON (Reuters) - Election officials in t... | 1 |
| 7 | WASHINGTON (Reuters) - President Donald Trump’... | 1 |
| 8 | During the 2016 presidential election, Mark Cu... | 0 |
| 9 | Charlie LeDuff is legend in Detroit but this i... | 0 |
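In the preview above, every real (label 1) article begins with a Reuters dateline such as `WASHINGTON (Reuters) -`, while none of the fake rows shown do. A fine-tuned model could learn that prefix as a shortcut instead of learning from the article's content. One optional mitigation is to strip a leading dateline before splitting. The sketch below is an assumption rather than part of this notebook: the helper name and the regular expression are hypothetical and only cover the `LOCATION (Reuters) -` form seen here.

```python
import re

import pandas as pd

# Hypothetical pattern: an optional upper-case location, then "(Reuters)", then a dash.
# It is anchored at the start of the text so mid-article mentions of Reuters are left alone.
DATELINE = re.compile(r"^(?:[A-Z][A-Z .,/'-]*)?\(Reuters\)\s*-\s*")


def strip_reuters_dateline(texts: pd.Series) -> pd.Series:
    """Remove a leading 'LOCATION (Reuters) - ' prefix from each text, if present."""
    return texts.str.replace(DATELINE, "", regex=True)
```

It could be applied as `df = df.assign(text=strip_reuters_dateline(df['text']))` before `train_test_split`. Keeping the prefix may inflate validation accuracy without improving detection of articles from other sources.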


```python
# 80/20 split; stratify keeps the fake/real ratio the same in both splits
train_texts, val_texts, train_labels, val_labels = train_test_split(
    df['text'].tolist(), df['label'].tolist(),
    test_size=0.2, stratify=df['label'], random_state=42
)
```
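To confirm the stratification, compare the class counts in each split. Here is a self-contained sketch on made-up labels (60 fake, 40 real, chosen only for illustration). The toy variable names are hypothetical and deliberately differ from the notebook's, so running it does not overwrite the real splits. The same `Counter` comparison can be applied to `train_labels` and `val_labels`.

```python
from collections import Counter

from sklearn.model_selection import train_test_split

# Toy data: 60 "fake" (0) and 40 "real" (1) labels; the texts are placeholders
toy_texts = [f"article {i}" for i in range(100)]
toy_labels = [0] * 60 + [1] * 40

x_train, x_val, y_train, y_val = train_test_split(
    toy_texts, toy_labels, test_size=0.2, stratify=toy_labels, random_state=42
)

# With stratification, each split mirrors the overall 60/40 class ratio
print("train:", Counter(y_train))
print("val:  ", Counter(y_val))
```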