# Fake News Detection with Machine Learning

Fake news is one of the biggest problems on social media, and it shows up on some news sites as well. Much of it is about politics. Detecting it with machine learning is a challenging task, and if you want to learn how to approach it, this article is for you. In this article, I will walk you through the task of Fake News Detection with Machine Learning using Python.

# Fake News Detection


Fake news is a serious problem because it spreads misinformation across a region. False news about a community’s political and religious beliefs can lead to riots and violence, as you may have seen in the country where you live. To detect fake news, we can look for patterns in fake news headlines and train a machine learning model that predicts whether a news item is fake or real from its headline alone. In the section below, I’m going to introduce you to a machine learning project on fake news detection using the Python programming language.

The dataset I am using for the fake news detection task contains the news title, the news content, and a column named label that shows whether the news is fake or real. We can use this dataset to find patterns that separate fake headlines from real ones. So let’s import the necessary Python libraries and the dataset that we need for this task.

# Importing Necessary Libraries

In [177]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.feature_extraction.text import CountVectorizer

%matplotlib inline

# Loading Data

In [178]:
data = pd.read_csv(r'data/fake_or_real_news.csv')
data.head()

| | Unnamed: 0 | title | text | label |
|---|---|---|---|---|
| 0 | 8476 | You Can Smell Hillary’s Fear | Daniel Greenfield, a Shillman Journalism Fello... | FAKE |
| 1 | 10294 | Watch The Exact Moment Paul Ryan Committed Pol... | Google Pinterest Digg Linkedin Reddit Stumbleu... | FAKE |
| 2 | 3608 | Kerry to go to Paris in gesture of sympathy | U.S. Secretary of State John F. Kerry said Mon... | REAL |
| 3 | 10142 | Bernie supporters on Twitter erupt in anger ag... | — Kaydee King (@KaydeeKing) November 9, 2016 T... | FAKE |
| 4 | 875 | The Battle of New York: Why This Primary Matters | It's primary day in New York and front-runners... | REAL |


In [179]:
print(data.head())

   Unnamed: 0                                              title  \
0        8476                       You Can Smell Hillary’s Fear   
1       10294  Watch The Exact Moment Paul Ryan Committed Pol...   
2        3608        Kerry to go to Paris in gesture of sympathy   
3       10142  Bernie supporters on Twitter erupt in anger ag...   
4         875   The Battle of New York: Why This Primary Matters   

                                                text label  
0  Daniel Greenfield, a Shillman Journalism Fello...  FAKE  
1  Google Pinterest Digg Linkedin Reddit Stumbleu...  FAKE  
2  U.S. Secretary of State John F. Kerry said Mon...  REAL  
3  — Kaydee King (@KaydeeKing) November 9, 2016 T...  FAKE  
4  It's primary day in New York and front-runners...  REAL  


# Data Preprocessing

In [180]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6335 entries, 0 to 6334
Data columns (total 4 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   Unnamed: 0  6335 non-null   int64 
 1   title       6335 non-null   object
 2   text        6335 non-null   object
 3   label       6335 non-null   object
dtypes: int64(1), object(3)
memory usage: 198.1+ KB


In [181]:
data.describe()

| | Unnamed: 0 |
|---|---|
| count | 6335.0 |
| mean | 5280.415627 |
| std | 3038.503953 |
| min | 2.0 |
| 25% | 2674.5 |
| 50% | 5271.0 |
| 75% | 7901.0 |
| max | 10557.0 |


In [182]:
print(data.describe())

         Unnamed: 0
count   6335.000000
mean    5280.415627
std     3038.503953
min        2.000000
25%     2674.500000
50%     5271.000000
75%     7901.000000
max    10557.000000


In [183]:
data.columns

Index(['Unnamed: 0', 'title', 'text', 'label'], dtype='object')

# Checking for Null or Missing Values

In [184]:
data.isnull()
# print(data.isnull())

| | Unnamed: 0 | title | text | label |
|---|---|---|---|---|
| 0 | False | False | False | False |
| 1 | False | False | False | False |
| 2 | False | False | False | False |
| 3 | False | False | False | False |
| 4 | False | False | False | False |
| ... | ... | ... | ... | ... |
| 6330 | False | False | False | False |
| 6331 | False | False | False | False |
| 6332 | False | False | False | False |
| 6333 | False | False | False | False |


In [185]:
data.isnull().sum()

Unnamed: 0    0
title         0
text          0
label         0
dtype: int64

The dataset has 6,335 rows and no missing values, so let’s use the title column as the feature to train a machine learning model and the label column as the value we want to predict.
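
Before training, it can also help to check how balanced the two classes are, because accuracy is easier to interpret when FAKE and REAL appear in similar proportions. Here is a minimal sketch on a small made-up DataFrame (the `toy_data` name and its rows are assumptions for illustration, not rows from the dataset). The same `value_counts` call could be applied to `data["label"]`.

```python
import pandas as pd

# Hypothetical toy rows for illustration only, not taken from the real dataset
toy_data = pd.DataFrame({
    "title": [
        "Miracle pill melts fat overnight",
        "Senate passes annual budget bill",
        "City council approves new transit plan",
        "Aliens secretly control the election",
        "Court hears arguments in budget case",
    ],
    "label": ["FAKE", "REAL", "REAL", "FAKE", "REAL"],
})

# Absolute number of rows per class
label_counts = toy_data["label"].value_counts()

# Share of each class, as a fraction of all rows
label_share = toy_data["label"].value_counts(normalize=True)

print(label_counts)
print(label_share)
```

If one class dominated heavily, a model could reach a high accuracy just by always predicting that class, so this check is worth doing before reading too much into a score.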

# Feature Selection

In [186]:
x = np.array(data["title"])
y = np.array(data["label"])

cv = CountVectorizer()
x = cv.fit_transform(x)
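
To see what `CountVectorizer` produces, here is a small sketch on two made-up headlines (the headlines and the `toy_headlines` and `toy_cv` names are assumptions for illustration). With its default settings it lowercases the text, builds a vocabulary of words, and counts how often each word appears in each headline.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy headlines, not taken from the dataset
toy_headlines = [
    "Senate passes budget bill",
    "Shocking secret the senate hides",
]

toy_cv = CountVectorizer()
toy_counts = toy_cv.fit_transform(toy_headlines)  # sparse matrix: rows = headlines

# Vocabulary words in column order (columns are sorted alphabetically)
vocabulary = sorted(toy_cv.vocabulary_)
dense_counts = toy_counts.toarray()

print(vocabulary)
# ['bill', 'budget', 'hides', 'passes', 'secret', 'senate', 'shocking', 'the']
print(dense_counts)
# [[1 1 0 1 0 1 0 0]
#  [0 0 1 0 1 1 1 1]]
```

Each headline becomes one row of word counts, which is the numeric input the classifier needs. Word order is discarded, so the model only sees which words occur and how often.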

# Splitting The Data

In [187]:
from sklearn.model_selection import train_test_split
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.2, random_state=42)

In [188]:
xtrain.shape, ytrain.shape

((5068, 10071), (5068,))

In [189]:
xtest.shape, ytest.shape

((1267, 10071), (1267,))
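
One thing to be aware of: in the code above, `cv.fit_transform(x)` runs on all titles before the split, so the vocabulary also includes words that only appear in the test set. A possible alternative, which is not the approach used in this article, is to split the raw titles first and wrap the vectorizer and the classifier in a scikit-learn pipeline so the vocabulary is learned from the training titles only. The sketch below uses made-up titles and labels (all `toy_*` names and data are assumptions for illustration).

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy data for illustration only
toy_titles = [
    "Shocking secret cure doctors hide",
    "You will not believe this miracle trick",
    "Aliens secretly control the election",
    "Miracle pill melts fat overnight",
    "Senate passes annual budget bill",
    "Central bank holds interest rates steady",
    "City council approves new transit plan",
    "Court hears arguments in budget case",
]
toy_labels = ["FAKE"] * 4 + ["REAL"] * 4

# Split the raw text first, keeping the class ratio in both parts
train_titles, test_titles, train_labels, test_labels = train_test_split(
    toy_titles, toy_labels, test_size=0.25, random_state=42, stratify=toy_labels
)

# The pipeline fits CountVectorizer on the training titles only,
# then reuses that vocabulary when transforming the test titles
pipeline = make_pipeline(CountVectorizer(), MultinomialNB())
pipeline.fit(train_titles, train_labels)

toy_predictions = pipeline.predict(test_titles)
print(list(zip(test_titles, toy_predictions)))
```

With this layout the test set plays no part in building the features, which keeps the evaluation closer to how the model would behave on headlines it has never seen.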

# Model Selection

In [190]:
from sklearn.naive_bayes import MultinomialNB

# Training a Machine Learning Model

In [191]:
model = MultinomialNB()
model.fit(xtrain, ytrain)
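
For the curious, Multinomial Naive Bayes learns, for each class, how likely each vocabulary word is, using smoothed word counts. The sketch below uses a made-up count matrix (the `toy_counts` and `toy_labels` values are assumptions for illustration) and checks a hand-computed estimate for the FAKE class against the `feature_log_prob_` attribute that scikit-learn stores after fitting.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Hypothetical word-count matrix: 4 headlines (rows), 3 vocabulary words (columns)
toy_counts = np.array([
    [2, 0, 1],
    [1, 0, 0],
    [0, 3, 1],
    [0, 1, 2],
])
toy_labels = np.array(["FAKE", "FAKE", "REAL", "REAL"])

alpha = 1.0  # additive (Laplace) smoothing, the scikit-learn default
toy_model = MultinomialNB(alpha=alpha)
toy_model.fit(toy_counts, toy_labels)

# Manual estimate for the FAKE class:
# (count of word in FAKE headlines + alpha) / (all FAKE word counts + alpha * vocabulary size)
fake_word_totals = toy_counts[toy_labels == "FAKE"].sum(axis=0)
vocabulary_size = toy_counts.shape[1]
manual_log_prob = np.log(
    (fake_word_totals + alpha) / (fake_word_totals.sum() + alpha * vocabulary_size)
)

fake_index = list(toy_model.classes_).index("FAKE")
sklearn_log_prob = toy_model.feature_log_prob_[fake_index]

print(np.exp(manual_log_prob))   # smoothed word probabilities for FAKE
print(np.allclose(manual_log_prob, sklearn_log_prob))
```

The smoothing term is what keeps a word that never appeared in a class from getting a probability of exactly zero.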

# Model Score

In [192]:
print(model.score(xtest, ytest))

0.8074191002367798


In [193]:
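predict = model.predict(xtest)
# Sanity check only: scoring the model against its own predictions always
# returns 1.0, so this says nothing about accuracy on the true labels (ytest).
model.score(xtest, predict)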

1.0
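
Accuracy is a single number, so it does not show which class the model gets wrong. A confusion matrix breaks the result down per class. Here is a minimal sketch with made-up true and predicted labels (the `true_labels` and `predicted_labels` values are assumptions for illustration). In the notebook, the same functions could be called with `ytest` and the model's predictions on `xtest`.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical labels for illustration only
true_labels = ["FAKE", "FAKE", "REAL", "REAL", "REAL", "FAKE"]
predicted_labels = ["FAKE", "REAL", "REAL", "REAL", "FAKE", "FAKE"]

# Fraction of predictions that match the true labels (4 of 6 here)
accuracy = accuracy_score(true_labels, predicted_labels)

# Rows are true classes, columns are predicted classes, in the order given by labels
matrix = confusion_matrix(true_labels, predicted_labels, labels=["FAKE", "REAL"])

print(accuracy)
print(matrix)
# [[2 1]
#  [1 2]]
```

Reading the first row: of the three truly FAKE items, two were predicted FAKE and one was mislabeled REAL.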

Now let’s test this model. I’ll first take the title of a news item found on Google News to see whether the model predicts that it is real.

In [198]:

news_headline = "CA Exams 2023: Supreme Court asks BEB to extend opt-out option for July exams, final order tomorrow"
# Use a new variable name so the data DataFrame is not overwritten
headline_features = cv.transform([news_headline]).toarray()
print(model.predict(headline_features))

['REAL']


Now I’m going to write a made-up fake news headline to see whether the model predicts that it is fake.

In [199]:
news_headline = "Cow dung can cure Corona Virus"
# Use a new variable name so the data DataFrame is not overwritten
headline_features = cv.transform([news_headline]).toarray()
print(model.predict(headline_features))

['FAKE']
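
The two tests above repeat the same steps, so they could be wrapped in a small helper that also reports how confident the model is. This is only a sketch: `classify_headline` is a hypothetical helper name, and the tiny training set below is made up so the block runs on its own. In the notebook, the fitted `cv` and `model` could be passed in instead.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy training data for illustration only
toy_titles = [
    "Shocking miracle cure doctors hide",
    "Aliens secretly control the election",
    "Senate passes annual budget bill",
    "Court hears arguments in budget case",
]
toy_labels = ["FAKE", "FAKE", "REAL", "REAL"]

toy_cv = CountVectorizer()
toy_model = MultinomialNB()
toy_model.fit(toy_cv.fit_transform(toy_titles), toy_labels)


def classify_headline(headline, vectorizer, classifier):
    """Return the predicted label and per-class probabilities for one headline."""
    # Words that were not in the training vocabulary are ignored by transform
    features = vectorizer.transform([headline])
    label = classifier.predict(features)[0]
    class_probabilities = classifier.predict_proba(features)[0]
    probabilities = dict(zip(classifier.classes_, class_probabilities))
    return label, probabilities


label, probabilities = classify_headline("Miracle cure shocks doctors", toy_cv, toy_model)
print(label, probabilities)
```

Looking at the probabilities is useful because a headline made mostly of words the model has never seen falls back on the class priors, and a prediction like that deserves less trust than the bare label suggests.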


# Summary

So this is how we can train a machine learning model for the task of fake news detection using the Python programming language. Using only the headlines, a CountVectorizer and a Multinomial Naive Bayes classifier reached an accuracy of about 0.81 on the test set. I hope you liked this article on the task of Fake News detection with machine learning using Python. Feel free to ask your questions in the comments section below.

# Sheikh Rasel Ahmed

#### Data Science || Machine Learning || Deep Learning || Artificial Intelligence Enthusiast

##### LinkedIn -  https://www.linkedin.com/in/shekhnirob1

##### GitHub - https://github.com/Rasel1435

##### YouTube - https://www.youtube.com/@codewithsheikhrasel

##### Facebook - https://www.facebook.com/rasel1435

##### Instagram - https://www.instagram.com/shekh_nirob