The dataset I am using for this fake news detection task comes from Kaggle.
It contains news articles labeled as either fake or real.

In [2]:
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

data = pd.read_csv("fake_or_real_news.csv")
print(data.head())

   Unnamed: 0                                              title  \
0        8476                       You Can Smell Hillary’s Fear   
1       10294  Watch The Exact Moment Paul Ryan Committed Pol...   
2        3608        Kerry to go to Paris in gesture of sympathy   
3       10142  Bernie supporters on Twitter erupt in anger ag...   
4         875   The Battle of New York: Why This Primary Matters   

                                                text label  
0  Daniel Greenfield, a Shillman Journalism Fello...  FAKE  
1  Google Pinterest Digg Linkedin Reddit Stumbleu...  FAKE  
2  U.S. Secretary of State John F. Kerry said Mon...  REAL  
3  — Kaydee King (@KaydeeKing) November 9, 2016 T...  FAKE  
4  It's primary day in New York and front-runners...  REAL  


This dataset has no missing values, so let's use the title column as the feature for training the machine learning model and the label column as the values we want to predict.
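Checking for missing values before training takes only a line of pandas. The sketch below is an illustration. It uses a small hypothetical DataFrame with the same `title`, `text` and `label` columns, and it contains one missing title on purpose so the check has something to find. In the notebook you would run the same calls on `data`.

```python
import pandas as pd

# Hypothetical stand-in for the Kaggle DataFrame; in the notebook, use `data` instead.
sample = pd.DataFrame({
    "title": ["You Can Smell Hillary's Fear", None],
    "text": ["Daniel Greenfield, a Shillman Journalism Fellow...", "Some article text"],
    "label": ["FAKE", "REAL"],
})

# Count missing values per column; all zeros means nothing needs to be dropped or filled.
missing_per_column = sample.isnull().sum()
print(missing_per_column)

# Rows with a missing title or label cannot be used for training, so drop them if any exist.
clean = sample.dropna(subset=["title", "label"])
print(len(clean))
```

If every count is zero on the real data, the `dropna` step changes nothing and can be skipped.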

In [3]:
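# Use the headline text as the input feature and the FAKE/REAL label as the target
x = np.array(data["title"])
y = np.array(data["label"])

# Turn each title into a bag-of-words vector of token counts
cv = CountVectorizer()
x = cv.fit_transform(x)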

Now let's split the dataset into training and test sets. Since this is a text classification problem, we will use the Multinomial Naive Bayes algorithm to train the fake news detection model.
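Multinomial Naive Bayes fits word-count features like the ones `CountVectorizer` produces. It estimates a prior P(c) for each class and a smoothed probability P(w | c) for each word. It then predicts the class with the largest log P(c) + Σ_w count(w) · log P(w | c). The sketch below is a simplified from-scratch version written for illustration. The function names and the toy count matrix are hypothetical. It assumes additive (Laplace) smoothing with `alpha=1.0`, which is scikit-learn's default. Its predictions are compared with `MultinomialNB` on the same data.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Toy word-count matrix: 4 documents x 3 vocabulary words (hypothetical data).
X = np.array([
    [2, 0, 1],
    [3, 1, 0],
    [0, 2, 3],
    [1, 3, 2],
])
y = np.array(["FAKE", "FAKE", "REAL", "REAL"])

def fit_multinomial_nb(X, y, alpha=1.0):
    """Estimate log class priors and smoothed log word probabilities per class."""
    classes = np.unique(y)  # sorted class labels
    log_prior = np.log(np.array([np.mean(y == c) for c in classes]))
    # Sum word counts per class, then apply additive (Laplace) smoothing.
    counts = np.array([X[y == c].sum(axis=0) for c in classes]) + alpha
    log_likelihood = np.log(counts / counts.sum(axis=1, keepdims=True))
    return classes, log_prior, log_likelihood

def predict_multinomial_nb(model, X):
    classes, log_prior, log_likelihood = model
    # Joint log-probability: log P(c) + sum_w count(w) * log P(w | c).
    joint = X @ log_likelihood.T + log_prior
    return classes[np.argmax(joint, axis=1)]

X_new = np.array([[4, 0, 1], [0, 1, 4]])
manual = predict_multinomial_nb(fit_multinomial_nb(X, y), X_new)
library = MultinomialNB(alpha=1.0).fit(X, y).predict(X_new)
print(manual, library)
```

The model ignores word order and treats the words in a title as independent given the class. These simplifications make it fast, and it often works well enough on short texts such as headlines.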


In [4]:
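# Hold out 20% of the titles for evaluation; random_state makes the split reproducible
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.2, random_state=42)
model = MultinomialNB()
model.fit(xtrain, ytrain)
# score() returns the accuracy on the held-out test set
print(model.score(xtest, ytest))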

0.8074191002367798
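The cells above call `cv.fit_transform` on all titles before the split, so the vocabulary already includes words that appear only in the test set. The effect on the reported accuracy is probably small. A cleaner setup splits the raw titles first and fits the vectorizer only on the training portion. Below is a minimal sketch of that setup using scikit-learn's `make_pipeline`. A few parts of it are my own assumptions. The helper `train_title_classifier` and the toy titles are hypothetical, and the `stratify` argument keeps the FAKE/REAL ratio equal in both splits. In the notebook you would pass `data["title"]` and `data["label"]` instead of the toy lists.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

def train_title_classifier(titles, labels, test_size=0.2, random_state=42):
    """Split the raw titles first, then fit the vectorizer on the training split only."""
    x_train, x_test, y_train, y_test = train_test_split(
        titles, labels, test_size=test_size, random_state=random_state, stratify=labels
    )
    pipeline = make_pipeline(CountVectorizer(), MultinomialNB())
    pipeline.fit(x_train, y_train)  # vocabulary is learned from training titles only
    return pipeline, pipeline.score(x_test, y_test)

# Hypothetical toy titles so the sketch runs on its own.
toy_titles = [
    "miracle cure doctors hate", "shocking secret they hide", "celebrity clone exposed",
    "aliens built the pyramids", "secret miracle diet revealed",
    "senate passes budget bill", "court rules on election case", "governor signs new law",
    "central bank holds rates", "senate debates election law",
]
toy_labels = ["FAKE"] * 5 + ["REAL"] * 5

pipeline, accuracy = train_title_classifier(toy_titles, toy_labels)
print(accuracy)
print(pipeline.predict(["senate passes new election law"]))
```

The fitted pipeline accepts raw strings. It can therefore classify new headlines without a separate `cv.transform` call. The accuracy from this setup is not guaranteed to match the 0.807 reported above.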


Now let's test the model. We will take the title of a news item from Google News and check whether the model predicts it as real or fake.

In [5]:

news_headline = "CA Exams 2021: Supreme Court asks ICAI to extend opt-out option for July exams, final order tomorrow"
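# Vectorize the headline with the vocabulary learned during training.
# A new name avoids overwriting the `data` DataFrame; predict() accepts the sparse matrix directly.
features = cv.transform([news_headline])
print(model.predict(features))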

['REAL']
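`predict` returns only the winning label. `predict_proba` also shows how confident the model is, which helps with borderline headlines. The sketch below uses a tiny hypothetical training set in place of the Kaggle titles, so the numbers it prints are illustrative only. The columns of `predict_proba` follow the order of `clf.classes_`, which scikit-learn sorts (here `FAKE`, then `REAL`).

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy training data standing in for the Kaggle titles.
titles = ["senate passes budget bill", "court rules on election case",
          "shocking miracle cure revealed", "secret plot they hide from you"]
labels = ["REAL", "REAL", "FAKE", "FAKE"]

cv_toy = CountVectorizer()
clf = MultinomialNB().fit(cv_toy.fit_transform(titles), labels)

headline = "court passes election bill"
features = cv_toy.transform([headline])  # sparse input is fine; .toarray() is not required
probs = clf.predict_proba(features)[0]

# One probability per class, in the order given by clf.classes_.
for label, p in zip(clf.classes_, probs):
    print(f"{label}: {p:.3f}")
print(clf.predict(features)[0])
```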


Now let's try a made-up fake news headline to see whether the model predicts it as fake.

In [6]:
news_headline = "Cow dung can cure Corona Virus"
# Reuse the fitted vectorizer; words not seen during training are simply ignored
features = cv.transform([news_headline])
print(model.predict(features))

['FAKE']
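Only words that appear in the training vocabulary affect these predictions. `CountVectorizer.transform` drops every other token. If a headline contains no known words at all, its feature vector is all zeros, and Multinomial Naive Bayes falls back on the class priors. The sketch below shows both effects with a small hypothetical training set. `build_analyzer()` returns the same tokenizer the vectorizer used during fitting.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy training titles standing in for the Kaggle data.
titles = [
    "senate passes budget bill",
    "court rules on election case",
    "shocking miracle cure revealed",
    "secret plot they hide from you",
    "miracle diet secret exposed",
]
labels = ["REAL", "REAL", "FAKE", "FAKE", "FAKE"]

cv_toy = CountVectorizer()
clf = MultinomialNB().fit(cv_toy.fit_transform(titles), labels)

# Which tokens of a new headline are actually in the learned vocabulary?
analyzer = cv_toy.build_analyzer()  # same lowercasing/tokenizing used during fit
headline = "Cow dung can cure Corona Virus"
known = [tok for tok in analyzer(headline) if tok in cv_toy.vocabulary_]
print(known)  # only these tokens contribute to the prediction

# A headline made entirely of unseen words becomes an all-zero vector,
# so predict_proba just returns the class priors (3/5 FAKE, 2/5 REAL here).
unseen = cv_toy.transform(["zebras juggle pineapples"])
prior_probs = clf.predict_proba(unseen)[0]
print(unseen.nnz, prior_probs)
```

With the real model, check how many words of a test headline are in `cv.vocabulary_` before trusting its prediction.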


So this is how we can train a machine learning model for fake news detection.
Fake news is a serious problem because it can spread a lot of misinformation within a particular region.