In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


## `Importing Libraries`

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score, precision_recall_fscore_support

import torch
from torch.utils.data import Dataset, DataLoader
from transformers import DistilBertTokenizer, DistilBertForSequenceClassification, Trainer, TrainingArguments

### **Workflow for Fine-Tuning DistilBERT**


**Step 1: Load Dataset:**
  * Split into train and test set.

**Step 2: Tokenization using DistilBERT Tokenizer:**
  * Convert raw text into tokenized format using DistilBertTokenizer.
  * Tokenized text is stored as input_ids and attention_mask.

**Step 3: Create PyTorch Dataset:**
  * Convert tokenized text and labels into a custom PyTorch Dataset.
  * Use this dataset to create Dataloaders for training/testing.

**Step 4: Load Pretrained DistilBERT Model:**
  * Use DistilBertForSequenceClassification with the `distilbert-base-uncased` weights (pretrained on general text).
  * Attach a newly initialized two-label classification head (`num_labels=2`) for binary classification (real vs. fake news).

**Step 5: Defining Training Configuration & Hyperparameters:**
  * Setting LR, batch size, epochs, etc.

**Step 6: Train the Model:**
Using Hugging Face Trainer to:
  * Feed tokenized input into DistilBERT.
  * Compute loss & update weights using backpropagation.
  * Evaluate model on test set after each epoch.

**Step 7: Evaluate the Model:**
  * Compute Accuracy & F1 Score on test data.

**Step 8: Save Model for Future Use:**
  * Save trained DistilBERT model & tokenizer for inference.


## **`Step 1: Loading Data`**



In [None]:
df = pd.read_csv('/content/drive/MyDrive/Fake_News_Classification/Data/titles_text_combined.csv')
df.head()

Unnamed: 0.1,Unnamed: 0,title,text,subject,date,label,title_length,text_length,full_text
0,2619,Ex-CIA head says Trump remarks on Russia inter...,Former CIA director John Brennan on Friday cri...,politicsNews,"July 22, 2017",1,67,2733,Ex-CIA head says Trump remarks on Russia inter...
1,16043,YOU WON’T BELIEVE HIS PUNISHMENT! HISPANIC STO...,How did this man come to OWN this store? There...,Government News,"Jun 19, 2017",0,121,2630,YOU WON’T BELIEVE HIS PUNISHMENT! HISPANIC STO...
2,876,Federal Reserve governor Powell's policy views...,President Donald Trump on Thursday tapped Fede...,politicsNews,"November 2, 2017",1,64,4052,Federal Reserve governor Powell's policy views...
3,19963,SCOUNDREL HILLARY SUPPORTER STARTS “TrumpLeaks...,Hillary Clinton ally David Brock is offering t...,left-news,"Sep 17, 2016",0,72,1131,SCOUNDREL HILLARY SUPPORTER STARTS “TrumpLeaks...
4,10783,NANCY PELOSI ARROGANTLY DISMISSES Questions on...,Pleading ignorance is a perfect ploy for Nancy...,politics,"May 26, 2017",0,104,1061,NANCY PELOSI ARROGANTLY DISMISSES Questions on...


In [None]:
# Checking dimensions of data (no. of rows, no. of cols)
print(f"Dataset shape: {df.shape}")

Dataset shape: (27209, 9)


In [None]:
#Checking Distribution of labels
df['label'].value_counts()   # 0: Fake, 1: Real

label,count
1,14422
0,12787


In [None]:
#Splitting Data into Training and Test Set
X = df['full_text'].tolist()
y = df['label'].tolist()

# Train-Test Split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)

In [None]:
type(X_train)

list

In [None]:
print(f"Training Data Size: {len(X_train)}")
print(f"Testing Data Size: {len(X_test)}")
# Counting real and fake news
print(f"Real news articles in Training data: {y_train.count(1)}")
print(f"Fake news articles in Training data: {(y_train.count(0))}")

Training Data Size: 21767
Testing Data Size: 5442
Real news articles in Training data: 11537
Fake news articles in Training data: 10230
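
Because the split uses `stratify=y`, the share of each class in the training set should closely match the full dataset. The sketch below is only an arithmetic check on the counts printed above; the variable names are illustrative and no new data is assumed.

```python
# Counts copied from the notebook outputs above.
total_real, total_fake = 14422, 12787
train_real, train_fake = 11537, 10230

# Share of real news in the full dataset and in the training split.
full_ratio = total_real / (total_real + total_fake)
train_ratio = train_real / (train_real + train_fake)
print(f"Real share (full dataset): {full_ratio:.4f}")
print(f"Real share (training set): {train_ratio:.4f}")

# Whatever is not in the training split ends up in the test split.
test_real = total_real - train_real
test_fake = total_fake - train_fake
print(f"Test split: {test_real} real, {test_fake} fake")
```

If the two shares differed noticeably, that would suggest the split was not stratified as intended.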


## **`Step 2: Tokenization using DistilBert Tokenizer`**

In [None]:
tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased')


In [None]:
train_encodings = tokenizer(X_train, truncation=True, padding=True, max_length=512)
test_encodings = tokenizer(X_test, truncation=True, padding=True, max_length=512)
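
To make `input_ids` and `attention_mask` more concrete, here is a toy sketch of truncation and padding. It is not the tokenizer's actual implementation: `pad_and_truncate` is a hypothetical helper, the content token ids are arbitrary, and it pads to a fixed length for simplicity (with `padding=True` the real tokenizer pads to the longest sequence in the call). Only the special-token ids (0, 101, 102) come from the `bert-base-uncased` vocabulary.

```python
PAD_ID, CLS_ID, SEP_ID = 0, 101, 102  # [PAD], [CLS], [SEP]

def pad_and_truncate(token_ids, max_length):
    """Hypothetical helper: mimic truncation and padding to a fixed length."""
    body = token_ids[: max_length - 2]       # leave room for [CLS] and [SEP]
    ids = [CLS_ID] + body + [SEP_ID]
    mask = [1] * len(ids)                    # 1 marks a real token
    n_pad = max_length - len(ids)
    return ids + [PAD_ID] * n_pad, mask + [0] * n_pad  # 0 marks padding

# A short sequence gets padded; a long one gets truncated.
short_ids, short_mask = pad_and_truncate([7592, 2088], max_length=6)
long_ids, long_mask = pad_and_truncate([11, 12, 13, 14, 15, 16], max_length=6)
print(short_ids, short_mask)
print(long_ids, long_mask)
```

The attention mask tells the model which positions to ignore, so padded positions do not influence the prediction.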

## **`Step 3: Create PyTorch Dataset Class`**

#### Understanding Dataset and DataLoader in PyTorch:
In PyTorch, handling data efficiently is essential for training deep learning models. PyTorch provides two key components for this:

1. torch.utils.data.Dataset:
  * A class used to represent your dataset.
  * Allows custom indexing (`__getitem__`) and length calculation (`__len__`).
  * Used for structured access to input data and labels.

2. torch.utils.data.DataLoader:
  * Helps batch and shuffle data.
  * Efficiently loads data in parallel using multiprocessing (via num_workers).
  * Converts a dataset into iterable batches for training.
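
As a minimal sketch of how the two pieces fit together, the example below wraps a toy dataset in a `DataLoader`. `ToyDataset` and its contents are hypothetical and unrelated to the news data. In this notebook the Hugging Face `Trainer` builds its DataLoaders internally, so `DataLoader` is never called directly.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class ToyDataset(Dataset):
    """Hypothetical dataset: 10 samples of 4 token ids each, with binary labels."""
    def __init__(self):
        self.input_ids = torch.arange(40).reshape(10, 4)
        self.labels = torch.tensor([0, 1] * 5)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        return {"input_ids": self.input_ids[idx], "labels": self.labels[idx]}

# batch_size=4 over 10 samples yields batches of 4, 4, and 2 samples.
loader = DataLoader(ToyDataset(), batch_size=4, shuffle=True)
batch_shapes = [tuple(batch["input_ids"].shape) for batch in loader]
print(batch_shapes)
```

Shuffling changes which samples land in each batch, but not the batch shapes.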

In [None]:
class FakeNewsDataset(Dataset):
  def __init__(self, encodings, labels):
    self.encodings = {key: val for key, val in encodings.items() if key in ['input_ids', 'attention_mask']}
    self.labels = labels

  def __len__(self):
    """Returns the number of samples in the dataset."""
    return len(self.labels)

  def __getitem__(self, idx):
    """Returns tokenized inputs and labels for a given index."""
    item = {key: torch.tensor(val[idx], dtype=torch.long) for key, val in self.encodings.items()}
    item['labels'] = torch.tensor(self.labels[idx], dtype=torch.long)
    return item

In [None]:
train_dataset = FakeNewsDataset(train_encodings, y_train)
test_dataset =  FakeNewsDataset(test_encodings, y_test)

## **`Step 4: Load Pretrained DistilBERT Model`**

In [None]:
distilbert_model = DistilBertForSequenceClassification.from_pretrained('distilbert-base-uncased', num_labels=2)


Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


## **`Step 5: Defining Training Configuration & Hyperparameters`**

In [None]:
training_args = TrainingArguments(
    output_dir = "/content/drive/MyDrive/Fake_News_Classification/Saved_Models/distilbertResults",
    eval_strategy="epoch",
    save_strategy="epoch",
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=3,
    learning_rate=2e-5,
    weight_decay=0.01,
    logging_dir="/content/drive/MyDrive/Fake_News_Classification/Saved_Models/distilbertResultslogs",
    logging_steps=10,
    save_total_limit=2
)

## **`Step 6: Train the Model`**


In [None]:
def compute_metrics(pred):
  labels = pred.label_ids
  preds = pred.predictions.argmax(-1)

  accuracy = accuracy_score(labels, preds)
  precision, recall, f1, _ = precision_recall_fscore_support(labels, preds, average=None, zero_division=1)

  return {
      "accuracy": accuracy,
      "f1_fake": f1[0],    # F1-score for Fake News (Class 0)
      "f1_real": f1[1],    # F1-score for Real News (Class 1)
      "macro_f1": f1_score(labels, preds, average="macro"),    # Macro-average F1
  }

In [None]:
trainer = Trainer(
    model = distilbert_model,
    args = training_args,
    train_dataset = train_dataset,
    eval_dataset = test_dataset,
    compute_metrics = compute_metrics    # Custom function created above
)

trainer.train()

[34m[1mwandb[0m: Using wandb-core as the SDK backend.  Please refer to https://wandb.me/wandb-core for more information.


<IPython.core.display.Javascript object>

[34m[1mwandb[0m: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
[34m[1mwandb[0m: You can find your API key in your browser here: https://wandb.ai/authorize


Epoch,Training Loss,Validation Loss,Accuracy,F1 Fake,F1 Real,Macro F1
1,0.0626,0.005827,0.99853,0.998435,0.998614,0.998525
2,0.0,0.010314,0.998162,0.998048,0.998264,0.998156
3,0.0,0.00285,0.999449,0.999413,0.99948,0.999447


TrainOutput(global_step=8163, training_loss=0.01292714067262412, metrics={'train_runtime': 813.2556, 'train_samples_per_second': 80.296, 'train_steps_per_second': 10.037, 'total_flos': 8650253599635456.0, 'train_loss': 0.01292714067262412, 'epoch': 3.0})
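
The `global_step=8163` value in the training output can be cross-checked from the configuration. This sketch assumes a single device and no gradient accumulation, which matches the arguments shown in Step 5 as far as they are specified.

```python
import math

n_train = 21767    # training samples (printed after the split)
batch_size = 8     # per_device_train_batch_size
epochs = 3         # num_train_epochs

# The last batch of each epoch is smaller, so round up.
steps_per_epoch = math.ceil(n_train / batch_size)
total_steps = steps_per_epoch * epochs
print(steps_per_epoch, total_steps)
```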

### Observations:

1. Training loss at Epochs 2 and 3 is reported as 0.000000
* The logged value has rounded down to zero, meaning the model fits the training data almost exactly. This can be a sign of overfitting (memorizing rather than generalizing), but the validation metrics do not show a clear drop in generalization.

2. Validation Loss Fluctuations:
* Loss rises from 0.0058 to 0.0103 in Epoch 2, which might be an early sign of overfitting. Since it falls to 0.0029 in Epoch 3, training still appears stable.

3. Accuracy & F1 Scores (~99.9%):
* F1 is above 0.998 for both "Fake" and "Real" news in every epoch, which implies high precision and recall for both classes.
* Macro F1 (the unweighted average of both class F1 scores) reaches 0.9994, so performance is near-perfect and balanced across the two classes.
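
For two classes, macro F1 is the unweighted mean of the per-class F1 scores. The sketch below checks this against the rounded values in the results table above. It uses only those reported numbers, so small rounding gaps are expected.

```python
# (f1_fake, f1_real, macro_f1) per epoch, copied from the results table.
rows = [
    (0.998435, 0.998614, 0.998525),
    (0.998048, 0.998264, 0.998156),
    (0.999413, 0.999480, 0.999447),
]

gaps = []
for epoch, (f1_fake, f1_real, macro_f1) in enumerate(rows, start=1):
    mean_f1 = (f1_fake + f1_real) / 2    # unweighted mean of class F1 scores
    gaps.append(abs(mean_f1 - macro_f1))
    print(f"Epoch {epoch}: mean of class F1 = {mean_f1:.6f}, reported macro F1 = {macro_f1:.6f}")

max_gap = max(gaps)    # expected to be within rounding error
```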

## **`Step 7: Evaluate the Model`**

In [None]:
eval_results = trainer.evaluate()
print(f"Test Accuracy: {eval_results['eval_accuracy']:.4f}")
print(f"\nFake News F1 Score: {eval_results['eval_f1_fake']:.4f}")
print(f"\nReal News F1 Score: {eval_results['eval_f1_real']:.4f}")
print(f"\nMacro F1 Score: {eval_results['eval_macro_f1']:.4f}")

Test Accuracy: 0.9994

Fake News F1 Score: 0.9994

Real News F1 Score: 0.9995

Macro F1 Score: 0.9994


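These scores indicate that the model classifies both fake and real news almost perfectly on the test set. Because the same test set was passed to the Trainer as `eval_dataset`, these numbers match the Epoch 3 row of the training table.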

## **`Step 8: Save Model`**

In [None]:
distilbert_model.save_pretrained("/content/drive/MyDrive/Fake_News_Classification/Saved_Models/distilbert_model")

tokenizer.save_pretrained("/content/drive/MyDrive/Fake_News_Classification/Saved_Models/distilbert_tokenizer")

('/content/drive/MyDrive/Fake_News_Classification/Saved_Models/distilbert_tokenizer/tokenizer_config.json',
 '/content/drive/MyDrive/Fake_News_Classification/Saved_Models/distilbert_tokenizer/special_tokens_map.json',
 '/content/drive/MyDrive/Fake_News_Classification/Saved_Models/distilbert_tokenizer/vocab.txt',
 '/content/drive/MyDrive/Fake_News_Classification/Saved_Models/distilbert_tokenizer/added_tokens.json')

#### **Loading the Saved Model and Using It for Prediction**

In [None]:
model_path = "/content/drive/MyDrive/Fake_News_Classification/Saved_Models/distilbert_model"

tokenizer_path = "/content/drive/MyDrive/Fake_News_Classification/Saved_Models/distilbert_tokenizer"

In [None]:
# Loading the tokenizer
tokenizer = DistilBertTokenizer.from_pretrained(tokenizer_path)

In [None]:
# Load the model
model = DistilBertForSequenceClassification.from_pretrained(model_path)
model.eval()

DistilBertForSequenceClassification(
  (distilbert): DistilBertModel(
    (embeddings): Embeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (transformer): Transformer(
      (layer): ModuleList(
        (0-5): 6 x TransformerBlock(
          (attention): DistilBertSdpaAttention(
            (dropout): Dropout(p=0.1, inplace=False)
            (q_lin): Linear(in_features=768, out_features=768, bias=True)
            (k_lin): Linear(in_features=768, out_features=768, bias=True)
            (v_lin): Linear(in_features=768, out_features=768, bias=True)
            (out_lin): Linear(in_features=768, out_features=768, bias=True)
          )
          (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (ffn): FFN(
            (dropout): Dropout(p=0.1, inplace=False)


In [None]:
def predict_news_article(article_text):
    # Tokenize the input text
    inputs = tokenizer(article_text, truncation=True, padding=True, max_length=512, return_tensors="pt")

    # Forward pass through the model
    with torch.no_grad():
        outputs = model(**inputs)

    # Get predicted class
    logits = outputs.logits
    predicted_class = torch.argmax(logits, dim=1).item()

    # Convert prediction to label
    label_map = {0: "Fake News", 1: "Real News"}
    return label_map[predicted_class]
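
The function above returns only the label. If a confidence score is also wanted, the logits can be passed through a softmax. The sketch below shows the idea in plain Python with hypothetical logits (not real model output). With PyTorch tensors, `torch.softmax(logits, dim=1)` would play the same role.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

label_map = {0: "Fake News", 1: "Real News"}
example_logits = [-2.0, 3.0]    # hypothetical values for illustration only

probs = softmax(example_logits)
predicted = max(range(len(probs)), key=probs.__getitem__)    # argmax
print(label_map[predicted], round(probs[predicted], 4))
```

Softmax does not change which class has the highest score, so the predicted label stays the same as with `argmax` on the raw logits.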

In [None]:
sample_news = '''All England: How with long rallies and tall tosses, Malvika Bansod ousted World No 12 Yeo Jia Min
Malvika Bansod politely declined the offer of having legendary coach Irwansyah sit for her All England opening match against Singaporean Yeo Jia Min. The Indonesian perfectly understood, as Malvika and her regular coach for the last two years at Thane’s Shrikant Vad Academy, Vignesh Devlekar, had a plan to take down the World No 12.

The 23-year-old from Nagpur had lost to Jia Min previously and was coming off a first-round loss from Orleans. But pushing herself beyond limits of exhaustion, with both players utterly knackered by the end, Malvika recorded a stunning 21-13, 10-21, 21-17 victory to advance to the second round. Things were tricky at 11-9 in the decider, but Malvika did well to conserve energy, and mix her usually well-executed high tosses and lifts with attacking openings on cross shots as Jia Min tired out, fading off at the baseline.

Vignesh had won five national ranking titles in 2019 in doubles, and played Maldives Open, the only international trip he could afford on his parents’ salaries that year – mother a BMC teacher, and father a clerk in a PSU. “The lockdown ended my playing dreams and I had no funds anyway. But when coaching Malvika, I lean on my weaknesses – I never had a big attack or great physical strength. I was good at finding solutions,” Devlekar says. The long rallies and tall tosses that would strain Jia Min, the coach and athlete had analysed thoroughly.
In addition, Malvika dumped low serves in the large Birmingham hall for high serves -again pressuring Jia Min’s neck – and injected pace into her first stroke. The Indian was prepared for the long rallies, but had no intention of defending endlessly. She created openings, pouncing on the short lengths.
Malvika has compiled nearly a dozen thick diaries where she jots down details of fitness workouts she observes when at tournaments and national camps. “She must be the only player at this level (World No 28) who designs own fitness. We are desperately looking for an experienced trainer but we don’t have the funds for it yet. So, she plans it herself,” Vignesh notes.
Same shots, different paths
Malvika, a cerebral player, has spent last few months devising two strokes from the same position, guided by Vad and Vignesh, tweaking angles with wrist work, because a giant smash isn’t suddenly going to materialise. “We work within our limitations, but we are looking to get her stronger if we get a trainer,” he adds.

The engineering graduate who had beaten Olympic bronze medallist Gregoria Tunjung at China, has immersed herself even deeper into badminton, and shut out all noise that questions if she has a future when 23 already. “Qualifying for LA Olympics is the plan. We are working towards it,” Vignesh says, adding they hit it off, geeking out on the sport because he was in constant analysing/plotting mode as coach. A BWF Level 1 certified coach, he’s pursuing his MBA alongside, but the natural aptitude for coaching and the bond he struck with Malvika has brought good results.
All evenings see her work on fitness by herself. “She’s working hard like crazy. We don’t have funds for beyond the coach. Hopefully if we get results and few Top 10 wins, they will consider funding her,” Vignesh says.

'''

In [None]:
prediction = predict_news_article(sample_news)
print("Prediction:", prediction)

Prediction: Real News


In [None]:
sample_news2 = '''
Pune: Tailor tag helps police unravel murder after body found in gunny bag on Nira river bank
WITHIN HOURS after the body of a man was found in a gunny bag on the banks of Nira river in Bhor taluka
of Pune, a crucial lead has helped Pune Rural Police solve his murder. The tailor’s tag on his shirt
not just helped cops establish his identity, but also led them to the victim’s wife and her paramour,
who had allegedly killed him after he had uncovered their illicit affair.
On the morning of March 9 around 11 am, the police had found the body of an unidentified person
in a gunny bag on the banks of Nira river in Sarola village in Bhor taluka. The hands and legs
of the deceased were tied with pieces of cloth.“While we immediately launched a murder
 probe after the body was found, there were no apparent leads or evidence in the initial phase
 of the investigation. Important breakthrough in the case came when the tailor tag on the shirt
 of the deceased led our investigation team to a tailor in Dharashiv (earlier known as Osmanabad)
 district,” said Pankaj Deshmukh, Superintendent of Police, Pune rural.“The tag was of a tailor
 from Lohara taluka of Dharashiv district. The deceased had a tattoo on the back of his palm.
 With the help of these two clues, we established the identity of the deceased as Siddheshwar
  Bandu Bhise, 40, who hailed from Vadgaonwadi village of Lohara taluka of Dharashiv. We found
  out that Bhise worked as a tanker driver. He and his wife Yogita along with their seven-year-old
  had moved to Pune from Dharashiv two months ago for work. They were living in the Sasane Nagar
  area near Hadapsar,” said Deputy Superintendent of Police Tanaji Barde.Police investigation revealed that Yogita (30) had been having an affair with a man identified as Shivaji Baswant Sutar (32), a resident of Tuljapur taluka in Dharashiv, whom she knew from her childhood. On March 3, Sutar had come to meet Yogita. Bhise got to know about their affair when he got up from his sleep. The duo murdered Bhise by strangling him some time in the early hours of March 4. They tied his hands and legs with pieces of a sari. They stuffed the body in a gunny bag, probe has revealed.

“Sutar and Yogita carried the body over 50 kilometres to Sarola in Bhor from Bhise’s home in Sasane Nagar on a bike. When they carried the body, they also took along Yogita’s seven-year-old daughter. They dumped the body in the Nira river at Sarola. We have arrested Yogita and Sutar. They have been remanded to police custody till March 17,” said police inspector Rajesh Gawali, in-charge of Rajgad police station. Probe revealed that on March 9, Yogita had filed a missing complaint about her husband.

Along with charges of murder, police have also invoked charges of destruction of evidence against Yogita and Sutar. Police have also invoked Scheduled Caste and Scheduled Tribes (Prevention of Atrocities) Act since Bhide belonged to Matang community which is a Scheduled Caste and Sutar belongs to Lingayat community.
Deputy Superintendent of Police Tanaji Barde is probing the case further.

'''

In [None]:
prediction = predict_news_article(sample_news2)
print("Prediction:", prediction)

Prediction: Real News


In [None]:
sample_news3 = '''
As RSS sparks fresh Bharat vs India row, why it has always been Bharat for Sangh Parivar
On RSS general secretary Dattatreya Hosabale's pitch for Bharat, J&K CM Omar Abdullah says the
country is known by its three names — Bharat, India and Hindustan — and that citizens may call
it by any name that resonates with them.
The RSS has reignited the debate over the country’s name, insisting that it should be called Bharat.

“In English, it is India, but in the Indian language, it is ‘Bharat’… It is the ‘Constitution of India’, ‘Reserve Bank of India’. Why is it like this? Such a question should be raised. It should be rectified. If the country’s name is Bharat, it should only be called that way,” RSS general secretary Dattatreya Hosabale said while speaking at a function in Delhi.

Hosabale also referred to the G20 dinner invitation that termed the President as “President of Bharat” in September 2023.
Responding to a question on Hosabale’s remarks, Jammu and Kashmir Chief Minister and National Conference leader Omar Abdullah said Tuesday that this country is known by its three names — Bharat, India and Hindustan — and that citizens may call it by any name that resonates with them.
“We call it Bharat. We call it India. We call it Hindustan. We have three names. Whichever name resonates with you, you can call it that,” Abdullah told reporters outside J&K Assembly in Jammu.

“It is the ‘Constitution of India’ and the ‘Reserve Bank of India’. Why is that? That question should be asked. If the country’s name is Bharat, shouldn’t it have been called only that?” Abdullah asked. He pointed out that both ‘Bharat’ and ‘India’ are written on the Prime Minister’s plane, adding that “It is called the Indian Air Force and the Indian Army. But we also speak from the perspective of Bharat.”

In September 2023, when RSS chief Mohan Bhagwat urged people to refer to the country as “Bharat” and not “India”, he was drawing on a longstanding tradition of the Sangh Parivar that has used Bharat since before Independence. For the Sangh and the BJP, only Bharat has a linguistic and cultural context for Indians, not India.
As a political row had then erupted over a G20 dinner invitation from the government sent out in the name of the “President of Bharat”, BJP National Executive Committee member Anirban Ganguly termed it a “needless controversy” raked up the Congress. “Bharat is the natural name of India. It is not a question of the BJP’s ideology. All Indian languages call the country Bharat. Read Bangla literature and find out what it calls India. Both India and Bharat are part of the Constitution. We are giving primacy to Bharat because most people call it Bharat,” said Ganguly, the chairperson of the Dr Syama Prasad Mookerjee Research Foundation.

Ganguly said those protesting had been at the forefront of “distorting history and our civilisational identity”. He added, “The British gave many names. But subsequently, Ceylon became Sri Lanka and Burma became Myanmar. Has it caused any problems? The argument of our founding fathers is good, but should it then also not apply to the words secular and socialist as they were not sanctioned by them?”

An RSS leader, who is now with a government institution, also argued there should be no Bharat versus India debate. “There is an issue between Hindustan and Bharat. But there is no issue between India and Bharat. Everyone calls India Bharat in Hindi. In Marathi, Bengali, Gujarati and several other Indian languages it is called Bharat. Culturally, India has always been Bharat. India was a geographical term given by outsiders. Bharat also has such a beautiful meaning. Even in the Constitution, it should actually have been ‘Bharat that is India’,” he said.
The Sangh, one of whose larger political projects is ensuring an Indian culture shorn of British and Islamic influence, has used the term Bharat since its inception in 1925. Professor Rakesh Sinha writes in his book Builders of Modern India: KB Hedgewar (published by the Ministry of Information and Broadcasting) that Hedgewar, the founder of the RSS, said in Maharashtra’s Wardha in 1929, “The British government has promised to give independence to India on many occasions, but these have turned out to be false promises. It has now become amply clear that Bharat shall attain independence on her own strength.”
In his last speech before his death, Hedgewar said in Nagpur in 1940, “The golden day when all of Bharat will be Sanghified will certainly dawn. There will then be no power on earth that can dare cast malicious eyes on the Hindus.”

A senior RSS leader underlined that since all Sangh leaders speak in Hindi, they refer to India as Bharat. “The PM also speaks largely in Hindi, unless addressing an international audience. So, it is not surprising that they speak of Bharat and not India. Also, India is an odd term culturally. Have you heard anyone say India Mata ki Jai?”

The RSS has consistently used Bharat to denote India even in its English texts. The first-ever RSS resolution is instructive in this context. The 1950 resolution on the plight of Hindus who suffered during Partition referred to the “State of Bharat”, “Govt of Bharat”, and “citizens of Bharat”.

Two resolutions in 1953 — one of the Akhil Bharatiya Pratinidhi Sabha and the other of the Karyakarai Mandal — were titled “Movement for Complete Integration of Kashmir with Bharat” and “Bharat’s Pak policy vis-a-vis Kashmir”, respectively.
The RSS, in fact, has never used the word India in the heading of its resolutions. The first time India appears even in the text of these resolutions is in 1962 when the government is referred to as the Government of India. Subsequently, in the Sangh’s English texts India and Bharat were used interchangeably. But, there has never been any RSS resolution calling for India to be replaced with Bharat.

In that context, Bhagwat’s speech in Guwahati on September 1, 2023 has significance, especially since it was followed by the controversy over the invitations mentioning the “President of Bharat”.

A deep cultural meaning
Bharat has a deep cultural meaning for the RSS. In his book Bunch of Thoughts, the second sarsanghchalak of the RSS, M S Golwalkar, underlined this point while expounding the concept of the motherland.

“In fact, the very name ‘Bharat’ denotes that this is our mother. In our cultural tradition, the respectful way of calling a woman is by her child’s name. To call a lady as the wife of Mr. so-and-so or as Mrs. so-and-so is the Western way. We say, ‘She is Ramu’s mother.’ So also is the case with the name ‘Bharat’ for our motherland. Bharata is an elder brother of ours, born long long before us. He was a noble, virtuous and victorious king and a shining model of Hindu manhood. When a woman has more than one child, we call her by the name of her eldest or the most well-known among her children. Bharata was well known and this land was called as his mother, Bharat, the mother of all Hindus,” Golwalkar wrote.
In another context, he emphasised how Bharat appears even in the Vedas and the Puranas. He also spoke of Bharat being the land of Hindus. It is a running theme in his writings.

Talking about the “appeasement of Muslims” by independent India’s leadership, Golwalkar said in the book, “The first thing they preached was that our nationality could not be called Hindu, that even our land could not be called by its traditional name Hindusthan, as that would offend the Muslim. The name ‘India’ given by the British was accepted. Taking that name, the ‘new nation’ was called the ‘Indian Nation’.”

Notably, most associate organisations of the RSS have the word Bharatiya in their name, including the Akhil Bharatiya Vidyarthi Parishad, Bharatiya Mazdoor Sangh, and Bharatiya Kisan Sangh. In its press releases and statements, the RSS uses Bharat to denote India and translates terms such as “Akhil Bharatiya” to ‘All Bharat” and not “All India”.
Bhagwat emphasised in his Guwahati speech that Bharat had been an integral part of the nation’s identity since ancient times and should be embraced and popularised. “The name of our country has been Bharat for ages. Whatever may be the language, the name remains the same …Our country is Bharat, and we will have to stop using the word India and start using Bharat in all practical fields. Only then will change happen. We will have to call our country Bharat and explain it to others as well.”

– With PTI inputs

'''

In [None]:
prediction = predict_news_article(sample_news3)
print("Prediction:", prediction)

Prediction: Real News


Conclusion:

1. All three sample articles are classified as Real News, which matches their expected labels. No fake-news sample was tested here.
2. Next step: try RoBERTa.
3. RoBERTa may generalize better, as it was pre-trained on a larger corpus than BERT and DistilBERT.