# Fake News Detection using Python


> The dataset I am using for this fake news detection task contains each article's title, its text, and a `label` column that marks the article as FAKE or REAL. I use it to train a classifier on the titles, so the model learns which headline wording tends to appear in fake news and which in real news.

In [1]:
# Import libraries

import pandas as pd
import numpy as np

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

In [2]:
# Get the dataset 

data = pd.read_csv('fake_or_real_news.csv')

In [3]:
data.head()

| | Unnamed: 0 | title | text | label |
|---|---|---|---|---|
| 0 | 8476 | You Can Smell Hillary’s Fear | Daniel Greenfield, a Shillman Journalism Fello... | FAKE |
| 1 | 10294 | Watch The Exact Moment Paul Ryan Committed Pol... | Google Pinterest Digg Linkedin Reddit Stumbleu... | FAKE |
| 2 | 3608 | Kerry to go to Paris in gesture of sympathy | U.S. Secretary of State John F. Kerry said Mon... | REAL |
| 3 | 10142 | Bernie supporters on Twitter erupt in anger ag... | — Kaydee King (@KaydeeKing) November 9, 2016 T... | FAKE |
| 4 | 875 | The Battle of New York: Why This Primary Matters | It's primary day in New York and front-runners... | REAL |


In [6]:
data.shape

(6335, 4)

In [5]:
# check missing values
data.isna().sum()

Unnamed: 0    0
title         0
text          0
label         0
dtype: int64

### Let's use the title column as the feature and the label column as the value we want to predict

In [7]:
X = np.array(data['title'])
Y = np.array(data['label'])

cv = CountVectorizer()

X = cv.fit_transform(X)

## Train Test split

In [8]:
x_train,x_test,y_train,y_test = train_test_split(X,Y,test_size=0.25,random_state=42)
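
Note that the vectorizer above is fitted on every title before the split, so words that occur only in the test titles still shape the feature space. One common alternative, sketched here under assumptions, is to split the raw titles first and let a `Pipeline` fit the vectorizer on the training part only. The toy `titles` and `labels` below are hypothetical stand-ins for `data['title']` and `data['label']`, and `stratify` is an optional addition that keeps the FAKE/REAL ratio equal in both parts.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy data standing in for data['title'] and data['label']
titles = [
    "aliens endorse candidate in secret meeting",
    "shocking cure doctors hate revealed",
    "celebrity clone spotted at rally",
    "miracle pill melts debt overnight",
    "senate passes annual budget bill",
    "central bank holds interest rates",
    "city council approves new bridge",
    "court hears appeal on trade rules",
]
labels = ["FAKE", "FAKE", "FAKE", "FAKE", "REAL", "REAL", "REAL", "REAL"]

# Split the raw text first, so the test titles stay unseen
t_train, t_test, l_train, l_test = train_test_split(
    titles, labels, test_size=0.25, random_state=42, stratify=labels
)

# The pipeline fits CountVectorizer on the training titles only
pipe = make_pipeline(CountVectorizer(), MultinomialNB())
pipe.fit(t_train, l_train)

score = pipe.score(t_test, l_test)
print(score)
```

With the pipeline, a raw headline string can be passed directly to `pipe.predict([...])` without calling the vectorizer by hand.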

## Multinomial Naive Bayes Algorithm

In [9]:
# Train the model

model = MultinomialNB()
model.fit(x_train,y_train)

MultinomialNB()

## Test the model


In [10]:
model_predict = model.predict(x_test)

In [11]:
model_score = model.score(x_test,y_test)

In [14]:
print(model_score*100)



80.68181818181817


## The model's accuracy on the test set is about 80.7%
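
Accuracy alone does not show which class the errors fall on. A confusion matrix breaks the score down per class. The sketch below uses small hypothetical label lists so it runs on its own; in this notebook the equivalent call would take `y_test` and the so far unused `model_predict`.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical true and predicted labels, invented for illustration
y_true = ["FAKE", "FAKE", "FAKE", "REAL", "REAL", "REAL", "REAL", "FAKE"]
y_pred = ["FAKE", "REAL", "FAKE", "REAL", "REAL", "FAKE", "REAL", "FAKE"]

# Rows are true classes, columns are predicted classes, in this order
label_order = ["FAKE", "REAL"]
cm = confusion_matrix(y_true, y_pred, labels=label_order)
acc = accuracy_score(y_true, y_pred)

print(cm)
# [[3 1]
#  [1 3]]
print(acc)
# 0.75
```

Here one fake headline was labelled REAL and one real headline was labelled FAKE, which is detail that a single accuracy number hides.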

## Let's test the trained model

To test the trained model, I will take the title of a news item found on Google News and see whether the model predicts it to be real or fake.

In [19]:

news_headline  = 'G7 and EU announce price cap on Russian diesel'

pred = cv.transform([news_headline]).toarray()

print(model.predict(pred))

['FAKE']


In [16]:
news_headline_1 = 'Over 120,000 Russian soldiers have died in Ukraine since the start of the war'

pred = cv.transform([news_headline_1]).toarray()

model.predict(pred)

array(['FAKE'], dtype='<U4')

In [20]:
# This headline is row 2 of data.head(), so the model may already have seen it during training
breaking_news = 'Kerry to go to Paris in gesture of sympathy'

pred = cv.transform([breaking_news]).toarray()
print(model.predict(pred))

['REAL']
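
The predictions above return only a label. When a new headline is classified, it can help to look at the class probabilities as well, and to remember that words missing from the training vocabulary are silently ignored. The sketch below illustrates both points with a toy model. The training headlines and the helper `headline_probabilities` are hypothetical and not part of the notebook.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy training data, invented for illustration
toy_titles = [
    "aliens endorse candidate in secret meeting",
    "miracle pill melts debt overnight",
    "senate passes annual budget bill",
    "central bank holds interest rates",
]
toy_labels = ["FAKE", "FAKE", "REAL", "REAL"]

toy_cv = CountVectorizer()
toy_model = MultinomialNB().fit(toy_cv.fit_transform(toy_titles), toy_labels)


def headline_probabilities(headline):
    """Return {label: probability} for a single headline (hypothetical helper)."""
    # The sparse matrix can be passed as is; .toarray() is not required
    features = toy_cv.transform([headline])
    probs = toy_model.predict_proba(features)[0]
    return {str(c): float(p) for c, p in zip(toy_model.classes_, probs)}


seen = headline_probabilities("senate passes budget bill")
print(seen)

# A headline made only of unseen words becomes an all-zero vector,
# so the model falls back to the class priors (0.5 each for this toy data)
unseen = headline_probabilities("zzz qqq")
print(unseen)
```

If a recent headline shares few words with the training titles, the prediction rests mostly on the priors and a handful of common words, so it is worth checking the probabilities before trusting the label.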
