About the Dataset:

1. title: the title of the news article
2. text: the body text of the news article
3. date: the publication date of the news article
4. subject: the subject of the news article
5. label_new: a label, added in this notebook, that marks whether the news article is real or fake:
           1: Real news
           0: Fake news





Importing the Dependencies

In [1]:
import numpy as np
import pandas as pd
import re
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

In [4]:
import nltk
nltk.download('stopwords')

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


True

In [3]:
# printing the stopwords in English
print(stopwords.words('english'))

['a', 'about', 'above', 'after', 'again', 'against', 'ain', 'all', 'am', 'an', 'and', 'any', 'are', 'aren', "aren't", 'as', 'at', 'be', 'because', 'been', 'before', 'being', 'below', 'between', 'both', 'but', 'by', 'can', 'couldn', "couldn't", 'd', 'did', 'didn', "didn't", 'do', 'does', 'doesn', "doesn't", 'doing', 'don', "don't", 'down', 'during', 'each', 'few', 'for', 'from', 'further', 'had', 'hadn', "hadn't", 'has', 'hasn', "hasn't", 'have', 'haven', "haven't", 'having', 'he', "he'd", "he'll", 'her', 'here', 'hers', 'herself', "he's", 'him', 'himself', 'his', 'how', 'i', "i'd", 'if', "i'll", "i'm", 'in', 'into', 'is', 'isn', "isn't", 'it', "it'd", "it'll", "it's", 'its', 'itself', "i've", 'just', 'll', 'm', 'ma', 'me', 'mightn', "mightn't", 'more', 'most', 'mustn', "mustn't", 'my', 'myself', 'needn', "needn't", 'no', 'nor', 'not', 'now', 'o', 'of', 'off', 'on', 'once', 'only', 'or', 'other', 'our', 'ours', 'ourselves', 'out', 'over', 'own', 're', 's', 'same', 'shan', "shan't", 'she

Data Pre-processing

In [6]:
# loading the dataset to a pandas DataFrame
fake_df = pd.read_csv("Fake.csv")
real_df = pd.read_csv("True.csv")

# Add labels
fake_df["label_new"] = 0  # Fake
real_df["label_new"] = 1  # Real

# Combine datasets
news_dataset = pd.concat([fake_df, real_df], ignore_index=True)
news_dataset = news_dataset.sample(frac=1, random_state=42).reset_index(drop=True)


In [7]:
news_dataset.shape


(44898, 5)

In [8]:
# print the first 5 rows of the dataframe
news_dataset.head()

Unnamed: 0,title,text,subject,date,label_new
0,Ben Stein Calls Out 9th Circuit Court: Committ...,"21st Century Wire says Ben Stein, reputable pr...",US_News,"February 13, 2017",0
1,Trump drops Steve Bannon from National Securit...,WASHINGTON (Reuters) - U.S. President Donald T...,politicsNews,"April 5, 2017",1
2,Puerto Rico expects U.S. to lift Jones Act shi...,(Reuters) - Puerto Rico Governor Ricardo Rosse...,politicsNews,"September 27, 2017",1
3,OOPS: Trump Just Accidentally Confirmed He Le...,"On Monday, Donald Trump once again embarrassed...",News,"May 22, 2017",0
4,Donald Trump heads for Scotland to reopen a go...,"GLASGOW, Scotland (Reuters) - Most U.S. presid...",politicsNews,"June 24, 2016",1


In [9]:
news_dataset.head(20)

Unnamed: 0,title,text,subject,date,label_new
0,Ben Stein Calls Out 9th Circuit Court: Committ...,"21st Century Wire says Ben Stein, reputable pr...",US_News,"February 13, 2017",0
1,Trump drops Steve Bannon from National Securit...,WASHINGTON (Reuters) - U.S. President Donald T...,politicsNews,"April 5, 2017",1
2,Puerto Rico expects U.S. to lift Jones Act shi...,(Reuters) - Puerto Rico Governor Ricardo Rosse...,politicsNews,"September 27, 2017",1
3,OOPS: Trump Just Accidentally Confirmed He Le...,"On Monday, Donald Trump once again embarrassed...",News,"May 22, 2017",0
4,Donald Trump heads for Scotland to reopen a go...,"GLASGOW, Scotland (Reuters) - Most U.S. presid...",politicsNews,"June 24, 2016",1
5,Paul Ryan Responds To Dem’s Sit-In On Gun Con...,"On Wednesday, Democrats took a powerful stance...",News,"June 22, 2016",0
6,AWESOME! DIAMOND AND SILK Rip Into The Press: ...,President Trump s rally in FL on Saturday was ...,Government News,"Feb 19, 2017",0
7,STAND UP AND CHEER! UKIP Party Leader SLAMS Ge...,He s been Europe s version of the outspoken Te...,left-news,"Mar 8, 2016",0
8,North Korea shows no sign it is serious about ...,WASHINGTON (Reuters) - The State Department sa...,worldnews,"December 13, 2017",1
9,Trump signals willingness to raise U.S. minimu...,(This version of the story corrects the figur...,politicsNews,"May 4, 2016",1


In [10]:
# counting the number of missing values in the dataset
news_dataset.isnull().sum()

title,0
text,0
subject,0
date,0
label_new,0


In [11]:
# replacing the null values with empty string
news_dataset = news_dataset.fillna('')
news_dataset['title'] = news_dataset['title'].fillna('').astype(str)
news_dataset['text'] = news_dataset['text'].fillna('').astype(str)

In [12]:
# Use only first 50 words of text
news_dataset['short_text'] = news_dataset['text'].apply(lambda x: ' '.join(x.split()[:50]))


# Combine title + short text
news_dataset['content'] = news_dataset['title'] + ' ' + news_dataset['short_text']
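
The truncate-and-combine step above can be checked in isolation on a plain string. This is a minimal sketch: `build_content` and `max_words` are hypothetical names introduced here for illustration and do not appear in the notebook.

```python
def build_content(title, text, max_words=50):
    """Join the title with the first `max_words` whitespace-separated words of the text."""
    short_text = " ".join(text.split()[:max_words])
    return title + " " + short_text


# A synthetic 120-word body, used only to show the truncation.
sample_text = " ".join(f"w{i}" for i in range(120))
content = build_content("Some headline", sample_text)

# 2 title words + 50 body words
print(len(content.split()))
```

Keeping only the first 50 words makes stemming and vectorization cheaper, at the cost of discarding whatever appears later in the article.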

In [13]:
print(news_dataset['content'])

0        Ben Stein Calls Out 9th Circuit Court: Committ...
1        Trump drops Steve Bannon from National Securit...
2        Puerto Rico expects U.S. to lift Jones Act shi...
3         OOPS: Trump Just Accidentally Confirmed He Le...
4        Donald Trump heads for Scotland to reopen a go...
                               ...                        
44893    UNREAL! CBS’S TED KOPPEL Tells Sean Hannity He...
44894    PM May seeks to ease Japan's Brexit fears duri...
44895    Merkel: Difficult German coalition talks can r...
44896     Trump Stole An Idea From North Korean Propaga...
44897    BREAKING: HILLARY CLINTON’S STATE DEPARTMENT G...
Name: content, Length: 44898, dtype: object


In [14]:
# separating the data & label
X = news_dataset.drop(columns='label_new')
Y = news_dataset['label_new']

In [15]:
print(X)
print(Y)

                                                   title  \
0      Ben Stein Calls Out 9th Circuit Court: Committ...   
1      Trump drops Steve Bannon from National Securit...   
2      Puerto Rico expects U.S. to lift Jones Act shi...   
3       OOPS: Trump Just Accidentally Confirmed He Le...   
4      Donald Trump heads for Scotland to reopen a go...   
...                                                  ...   
44893  UNREAL! CBS’S TED KOPPEL Tells Sean Hannity He...   
44894  PM May seeks to ease Japan's Brexit fears duri...   
44895  Merkel: Difficult German coalition talks can r...   
44896   Trump Stole An Idea From North Korean Propaga...   
44897  BREAKING: HILLARY CLINTON’S STATE DEPARTMENT G...   

                                                    text       subject  \
0      21st Century Wire says Ben Stein, reputable pr...       US_News   
1      WASHINGTON (Reuters) - U.S. President Donald T...  politicsNews   
2      (Reuters) - Puerto Rico Governor Ricardo Rosse... 

Stemming:

Stemming is the process of reducing a word to its root form (its stem) by stripping suffixes.

example:
acting, acted, acts --> act

In [16]:
port_stem = PorterStemmer()

In [17]:
# Build the stop-word set once; calling stopwords.words() for every word is very slow
stop_words = set(stopwords.words('english'))

def stemming(content):
    # keep letters only, lowercase, and split into words
    stemmed_content = re.sub('[^a-zA-Z]', ' ', content)
    stemmed_content = stemmed_content.lower().split()
    # drop stop words and stem the remaining words
    stemmed_content = [port_stem.stem(word) for word in stemmed_content if word not in stop_words]
    return ' '.join(stemmed_content)
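
The cleaning that happens before stemming (replace non-letters with spaces, lowercase, split, drop stop words) can be sketched with the standard library alone. The stop-word set below is a tiny illustrative subset rather than NLTK's full English list, and `clean_tokens` is a hypothetical helper name. The sketch also shows why digits and punctuation leave fragments behind: "9th" becomes "th" and "U.S." becomes "u".

```python
import re

# Tiny illustrative stop-word set; the notebook uses NLTK's full English list.
STOP_WORDS = {"the", "a", "of", "to", "out", "s"}


def clean_tokens(text):
    """Replace non-letters with spaces, lowercase, split, and drop stop words."""
    letters_only = re.sub(r"[^a-zA-Z]", " ", text)
    return [word for word in letters_only.lower().split() if word not in STOP_WORDS]


tokens_a = clean_tokens("Ben Stein Calls Out 9th Circuit Court")
tokens_b = clean_tokens("U.S. to lift Jones Act")
print(tokens_a)  # ['ben', 'stein', 'calls', 'th', 'circuit', 'court']
print(tokens_b)  # ['u', 'lift', 'jones', 'act']
```

Looking words up in a set is a constant-time operation, which matters when the same check runs for every word of every article.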

In [18]:
news_dataset['content'] = news_dataset['content'].apply(stemming)

In [19]:
print(news_dataset['content'])

0        ben stein call th circuit court commit coup ta...
1        trump drop steve bannon nation secur council w...
2        puerto rico expect u lift jone act ship restri...
3        oop trump accident confirm leak isra intellig ...
4        donald trump head scotland reopen golf resort ...
                               ...                        
44893    unreal cb ted koppel tell sean hanniti bad ame...
44894    pm may seek eas japan brexit fear trade visit ...
44895    merkel difficult german coalit talk reach deal...
44896    trump stole idea north korean propaganda parod...
44897    break hillari clinton state depart gave russia...
Name: content, Length: 44898, dtype: object
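
In the rows printed in this notebook, the real articles (label 1) carry the token `reuter`, left over from the "(Reuters)" dateline. If a token like this is largely confined to one class, the classifier may be learning the source of an article rather than its content. One hedged way to check is sketched below; `token_rate_by_label` is a hypothetical helper, and the toy documents are made up for illustration, not taken from the dataset.

```python
from collections import defaultdict


def token_rate_by_label(texts, labels, token):
    """Return, per label, the fraction of documents whose tokens include `token`."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for text, label in zip(texts, labels):
        totals[label] += 1
        if token in text.split():
            hits[label] += 1
    return {label: hits[label] / totals[label] for label in totals}


# Toy, made-up documents purely for illustration (not rows from the dataset).
toy_texts = [
    "washington reuter presid said",
    "reuter governor said wednesday",
    "berlin chancellor said parti",
    "oop trump accident confirm",
    "unreal cb ted koppel",
]
toy_labels = [1, 1, 1, 0, 0]

rates = token_rate_by_label(toy_texts, toy_labels, "reuter")
print(rates)
```

On the notebook's data, the equivalent call would be `token_rate_by_label(news_dataset['content'], news_dataset['label_new'], 'reuter')`.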


In [20]:
# Remove rows with empty labels (a no-op here, since the labels were assigned as 0/1 above)
news_dataset = news_dataset[news_dataset['label_new'] != ''].copy()

# Make sure the labels are stored as integers
news_dataset['label_new'] = news_dataset['label_new'].astype(int)



#separating the data and label
X = news_dataset['content'].values
Y = news_dataset['label_new'].values

In [21]:
print(X)

['ben stein call th circuit court commit coup tat constitut st centuri wire say ben stein reput professor pepperdin univers also hollywood fame appear tv show film ferri bueller day made provoc statement judg jeanin pirro show recent discuss halt impos presid trump'
 'trump drop steve bannon nation secur council washington reuter u presid donald trump remov chief strategist steve bannon nation secur council wednesday revers controversi decis earli year give polit advis unpreced role secur discuss trump overhaul nsc confirm white hous offici also'
 'puerto rico expect u lift jone act ship restrict reuter puerto rico governor ricardo rossello said wednesday expect feder govern waiv jone act would lift restrict ship provid aid island devast hurrican maria said speak member congress parti'
 ...
 'merkel difficult german coalit talk reach deal berlin reuter chancellor angela merkel said german parti face difficult task bridg differ crunch coalit talk thursday believ reach agreement work tog

In [22]:
print(Y)

[0 1 1 ... 1 0 0]


In [23]:
Y.shape

(44898,)

In [24]:
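# converting the textual data to numerical data (TF-IDF features)
# note: the vectorizer is fitted on all rows, before the train/test split,
# so the test documents influence the vocabulary and the IDF weights
vectorizer = TfidfVectorizer(max_features=5000, ngram_range=(1, 2), stop_words='english', max_df=0.8, min_df=5)
X = vectorizer.fit_transform(X)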
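
To make the numbers in the sparse matrix less opaque, here is a small worked TF-IDF computation in plain Python. It is a sketch that follows the formula scikit-learn documents for its default settings (smoothed IDF, `ln((1 + n) / (1 + df)) + 1`, followed by L2 normalisation of each row). It deliberately ignores the `ngram_range`, `max_df`, `min_df`, `max_features` and `stop_words` options used in the notebook, and `tfidf_rows` is a hypothetical helper name.

```python
import math
from collections import Counter


def tfidf_rows(docs):
    """Compute L2-normalised TF-IDF rows with smoothed IDF: ln((1 + n) / (1 + df)) + 1."""
    tokenized = [doc.split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log((1 + n_docs) / (1 + count)) + 1 for term, count in df.items()}
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights = {term: count * idf[term] for term, count in counts.items()}
        # Guard against empty documents, whose norm would be zero.
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        rows.append({term: w / norm for term, w in weights.items()})
    return rows


docs = ["trump drop steve bannon", "trump head scotland", "puerto rico expect"]
rows = tfidf_rows(docs)
print(rows[0])
```

In the first row, `trump` appears in two of the three documents and so receives a lower weight than `bannon`, which appears in only one.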

In [25]:
print(X)

<Compressed Sparse Row sparse matrix of dtype 'float64'
	with 1388679 stored elements and shape (44898, 5000)>
  Coords	Values
  (0, 227)	0.11683770656043571
  (0, 402)	0.32091460056449556
  (0, 665)	0.12148320626589014
  (0, 666)	0.12925943267553275
  (0, 739)	0.18162344388202523
  (0, 740)	0.19409972395140732
  (0, 843)	0.13924773078575892
  (0, 907)	0.15000179109311226
  (0, 957)	0.17669234724791602
  (0, 961)	0.10582934954114702
  (0, 1045)	0.09716811028234813
  (0, 1200)	0.12255206055620033
  (0, 1502)	0.19894602338603748
  (0, 1564)	0.16332188687221136
  (0, 1813)	0.16344338772523725
  (0, 1917)	0.15258127341723907
  (0, 2020)	0.1518728780455529
  (0, 2175)	0.18449619040663495
  (0, 2231)	0.1250782916927333
  (0, 2232)	0.18620998416939408
  (0, 3240)	0.057514836424099554
  (0, 3261)	0.1151283974507787
  (0, 3321)	0.16839791591248102
  (0, 3351)	0.1903689263061843
  (0, 3448)	0.11877571679018496
  :	:
  (44896, 4830)	0.11354360996017186
  (44896, 4847)	0.1928998791549065
  (44897,

In [26]:
print(news_dataset['label_new'].value_counts())


label_new
0    23481
1    21417
Name: count, dtype: int64


In [27]:
print(set(Y))  # Show all unique label values

{np.int64(0), np.int64(1)}


Splitting the dataset into training & test data

In [28]:
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, stratify=Y, random_state=42)
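
In the cells above, the vectorizer is fitted on every row before the split, so the vocabulary and IDF weights are computed with the test documents included. One common way to avoid this is to split the raw text first and fit the vectorizer on the training portion only, for example through a scikit-learn `Pipeline`. The sketch below uses made-up toy strings and default vectorizer settings, so it illustrates the pattern rather than reproducing the notebook's results.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Toy, made-up stemmed strings for illustration only.
train_texts = [
    "reuter presid said wednesday",
    "reuter govern said offici",
    "reuter chancellor said parti",
    "oop trump accident confirm",
    "unreal cb ted koppel",
    "awesom diamond silk rip press",
]
train_labels = [1, 1, 1, 0, 0, 0]
test_texts = ["reuter minist said", "break hillari clinton gave"]

# The pipeline fits the vectorizer and the classifier on the training texts only.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LogisticRegression(class_weight="balanced", max_iter=300)),
])
pipeline.fit(train_texts, train_labels)

# At prediction time the fitted vocabulary is reused; unseen words are ignored.
predictions = pipeline.predict(test_texts)
print(predictions)
```

With this arrangement, words that occur only in the test texts never enter the vocabulary, so the test score is not influenced by test-set statistics.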

Training the Model: Logistic Regression

In [29]:
model = LogisticRegression(class_weight='balanced', max_iter=300)
model.fit(X_train, Y_train)


Evaluation

Accuracy Score

In [31]:
# accuracy score on the training data
X_train_prediction = model.predict(X_train)
# accuracy_score expects (y_true, y_pred)
training_data_accuracy = accuracy_score(Y_train, X_train_prediction)

In [32]:
print('Accuracy score of the training data : ', training_data_accuracy)

Accuracy score of the training data :  0.9975221337490952


In [33]:
# accuracy score on the test data
X_test_prediction = model.predict(X_test)
# accuracy_score expects (y_true, y_pred)
test_data_accuracy = accuracy_score(Y_test, X_test_prediction)

In [34]:
print('Accuracy score of the test data : ', test_data_accuracy)

Accuracy score of the test data :  0.9961024498886414
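
Accuracy is a single number and does not show which class the errors fall on. A hedged complement is to count the four outcomes and derive precision and recall from them. The sketch below does this in plain Python on made-up labels; `confusion_counts` is a hypothetical helper, and the values are not results from the notebook's model.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives, treating `positive` as the positive class."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


# Made-up labels for illustration only (1 = real, 0 = fake).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)  # of the articles predicted real, how many are real
recall = tp / (tp + fn)     # of the real articles, how many were found
print(tp, fp, tn, fn, accuracy, precision, recall)
```

On the notebook's data, the same counts could be obtained by passing `Y_test` and `X_test_prediction` to this helper.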


Making a Predictive System

In [38]:
X_new = X_test[76]

prediction = model.predict(X_new)
print(prediction)

if prediction[0] == 1:
    print('The news is Real')
else:
    print('The news is Fake')


[0]
The news is Fake


In [37]:
print(Y_test[76])

0


In [39]:
!pip install gradio




In [43]:
import gradio as gr
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer
import re

# Same stemmer and stop-word list used in training
port_stem = PorterStemmer()
# Build the stop-word set once instead of reloading the list for every word
stop_words = set(stopwords.words('english'))

# Preprocessing function (same as training)
def preprocess_input(title, text):
    # Use first 50 words of the text
    short_text = ' '.join(text.split()[:50])
    content = title + ' ' + short_text

    # Apply stemming & stopword removal
    content = re.sub('[^a-zA-Z]', ' ', content)
    content = content.lower().split()
    content = [port_stem.stem(word) for word in content if word not in stop_words]
    return ' '.join(content)

# Prediction function
def fake_news_predict(title, text):
    processed_text = preprocess_input(title, text)
    vectorized_input = vectorizer.transform([processed_text])
    prediction = model.predict(vectorized_input)

    if prediction[0] == 1:
        return "📰 Real News"
    else:
        return "🚨 Fake News"

# Gradio UI: separate inputs for title and text
interface = gr.Interface(
    fn=fake_news_predict,
    inputs=[
        gr.Textbox(label="News Title", placeholder="Enter the headline..."),
        gr.Textbox(label="News Content", lines=10, placeholder="Paste the news article here...")
    ],
    outputs="text",
    title="🧠 Fake News Detector",
    description="Paste a news headline and article. This model predicts whether it's real or fake."
)

interface.launch()


It looks like you are running Gradio on a hosted Jupyter notebook, which requires `share=True`. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://1fa1f3468765d3cd81.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)


