In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
import re
import string

import warnings
warnings.filterwarnings('ignore')

**Load Data**

In [2]:
fake_df = pd.read_csv('Fake.csv')
true_df = pd.read_csv('True.csv')

In [3]:
fake_df.head()

Unnamed: 0,title,text,subject,date
0,Donald Trump Sends Out Embarrassing New Year’...,Donald Trump just couldn t wish all Americans ...,News,"December 31, 2017"
1,Drunk Bragging Trump Staffer Started Russian ...,House Intelligence Committee Chairman Devin Nu...,News,"December 31, 2017"
2,Sheriff David Clarke Becomes An Internet Joke...,"On Friday, it was revealed that former Milwauk...",News,"December 30, 2017"
3,Trump Is So Obsessed He Even Has Obama’s Name...,"On Christmas day, Donald Trump announced that ...",News,"December 29, 2017"
4,Pope Francis Just Called Out Donald Trump Dur...,Pope Francis used his annual Christmas Day mes...,News,"December 25, 2017"


In [4]:
true_df.head()

Unnamed: 0,title,text,subject,date
0,"As U.S. budget fight looms, Republicans flip t...",WASHINGTON (Reuters) - The head of a conservat...,politicsNews,"December 31, 2017"
1,U.S. military to accept transgender recruits o...,WASHINGTON (Reuters) - Transgender people will...,politicsNews,"December 29, 2017"
2,Senior U.S. Republican senator: 'Let Mr. Muell...,WASHINGTON (Reuters) - The special counsel inv...,politicsNews,"December 31, 2017"
3,FBI Russia probe helped by Australian diplomat...,WASHINGTON (Reuters) - Trump campaign adviser ...,politicsNews,"December 30, 2017"
4,Trump wants Postal Service to charge 'much mor...,SEATTLE/WASHINGTON (Reuters) - President Donal...,politicsNews,"December 29, 2017"


**Adding a "class" column as the target label**

In [5]:
fake_df['class'] = 0
true_df['class'] = 1

In [6]:
fake_df.columns

Index(['title', 'text', 'subject', 'date', 'class'], dtype='object')

In [7]:
fake_df.shape, true_df.shape

((23481, 5), (21417, 5))

In [8]:
# Hold out the last 10 rows of each frame for manual testing.
# .copy() gives an independent frame, so later column assignments are safe.
fake_manual_testing_df = fake_df.tail(10).copy()
fake_df = fake_df.iloc[:-10]

true_manual_testing_df = true_df.tail(10).copy()
true_df = true_df.iloc[:-10]

In [9]:
fake_df.shape, true_df.shape

((23471, 5), (21407, 5))

In [10]:
fake_manual_testing_df['class'] = 0
true_manual_testing_df['class'] = 1

In [11]:
fake_manual_testing_df

Unnamed: 0,title,text,subject,date,class
23471,Seven Iranians freed in the prisoner swap have...,"21st Century Wire says This week, the historic...",Middle-east,"January 20, 2016",0
23472,#Hashtag Hell & The Fake Left,By Dady Chery and Gilbert MercierAll writers ...,Middle-east,"January 19, 2016",0
23473,Astroturfing: Journalist Reveals Brainwashing ...,Vic Bishop Waking TimesOur reality is carefull...,Middle-east,"January 19, 2016",0
23474,The New American Century: An Era of Fraud,Paul Craig RobertsIn the last years of the 20t...,Middle-east,"January 19, 2016",0
23475,Hillary Clinton: ‘Israel First’ (and no peace ...,Robert Fantina CounterpunchAlthough the United...,Middle-east,"January 18, 2016",0
23476,McPain: John McCain Furious That Iran Treated ...,21st Century Wire says As 21WIRE reported earl...,Middle-east,"January 16, 2016",0
23477,JUSTICE? Yahoo Settles E-mail Privacy Class-ac...,21st Century Wire says It s a familiar theme. ...,Middle-east,"January 16, 2016",0
23478,Sunnistan: US and Allied ‘Safe Zone’ Plan to T...,Patrick Henningsen 21st Century WireRemember ...,Middle-east,"January 15, 2016",0
23479,How to Blow $700 Million: Al Jazeera America F...,21st Century Wire says Al Jazeera America will...,Middle-east,"January 14, 2016",0
23480,10 U.S. Navy Sailors Held by Iranian Military ...,21st Century Wire says As 21WIRE predicted in ...,Middle-east,"January 12, 2016",0


In [12]:
true_manual_testing_df

Unnamed: 0,title,text,subject,date,class
21407,"Mata Pires, owner of embattled Brazil builder ...","SAO PAULO (Reuters) - Cesar Mata Pires, the ow...",worldnews,"August 22, 2017",1
21408,"U.S., North Korea clash at U.N. forum over nuc...",GENEVA (Reuters) - North Korea and the United ...,worldnews,"August 22, 2017",1
21409,"U.S., North Korea clash at U.N. arms forum on ...",GENEVA (Reuters) - North Korea and the United ...,worldnews,"August 22, 2017",1
21410,Headless torso could belong to submarine journ...,COPENHAGEN (Reuters) - Danish police said on T...,worldnews,"August 22, 2017",1
21411,North Korea shipments to Syria chemical arms a...,UNITED NATIONS (Reuters) - Two North Korean sh...,worldnews,"August 21, 2017",1
21412,'Fully committed' NATO backs new U.S. approach...,BRUSSELS (Reuters) - NATO allies on Tuesday we...,worldnews,"August 22, 2017",1
21413,LexisNexis withdrew two products from Chinese ...,"LONDON (Reuters) - LexisNexis, a provider of l...",worldnews,"August 22, 2017",1
21414,Minsk cultural hub becomes haven from authorities,MINSK (Reuters) - In the shadow of disused Sov...,worldnews,"August 22, 2017",1
21415,Vatican upbeat on possibility of Pope Francis ...,MOSCOW (Reuters) - Vatican Secretary of State ...,worldnews,"August 22, 2017",1
21416,Indonesia to buy $1.14 billion worth of Russia...,JAKARTA (Reuters) - Indonesia will buy 11 Sukh...,worldnews,"August 22, 2017",1


In [13]:
manual_testing_df = pd.concat([fake_manual_testing_df,true_manual_testing_df],axis=0)
manual_testing_df.to_csv('manual_testing.csv')

**Merging True and Fake Dataframes**

In [14]:
merge_df = pd.concat([fake_df,true_df],axis=0)
merge_df.head()

Unnamed: 0,title,text,subject,date,class
0,Donald Trump Sends Out Embarrassing New Year’...,Donald Trump just couldn t wish all Americans ...,News,"December 31, 2017",0
1,Drunk Bragging Trump Staffer Started Russian ...,House Intelligence Committee Chairman Devin Nu...,News,"December 31, 2017",0
2,Sheriff David Clarke Becomes An Internet Joke...,"On Friday, it was revealed that former Milwauk...",News,"December 30, 2017",0
3,Trump Is So Obsessed He Even Has Obama’s Name...,"On Christmas day, Donald Trump announced that ...",News,"December 29, 2017",0
4,Pope Francis Just Called Out Donald Trump Dur...,Pope Francis used his annual Christmas Day mes...,News,"December 25, 2017",0


In [15]:
merge_df.columns

Index(['title', 'text', 'subject', 'date', 'class'], dtype='object')

**Removing columns which are not required**

In [16]:
merge_new_df=merge_df.drop(['title','subject', 'date'],axis=1)

In [17]:
merge_new_df.shape

(44878, 2)

In [18]:
merge_new_df.isnull().sum()

text     0
class    0
dtype: int64

**Randomly shuffling the dataframe**

In [19]:
merge_new_df = merge_new_df.sample(frac=1)

In [20]:
merge_new_df.head()

Unnamed: 0,text,class
10120,"SACRAMENTO, Calif. (Reuters) - A plan to raise...",1
9450,SINGAPORE (Reuters) - Donald Trump’s “isolatio...,1
17086,BRASILIA (Reuters) - A congressional committee...,1
19498,SEOUL (Reuters) - South Korea approved a plan ...,1
22976,21st Century Wire says Trump supporters gather...,0


In [21]:
merge_new_df.reset_index(inplace=True)

In [22]:
merge_new_df.head()

Unnamed: 0,index,text,class
0,10120,"SACRAMENTO, Calif. (Reuters) - A plan to raise...",1
1,9450,SINGAPORE (Reuters) - Donald Trump’s “isolatio...,1
2,17086,BRASILIA (Reuters) - A congressional committee...,1
3,19498,SEOUL (Reuters) - South Korea approved a plan ...,1
4,22976,21st Century Wire says Trump supporters gather...,0


In [23]:
merge_new_df.drop(['index'],axis=1,inplace=True)

In [24]:
merge_new_df.columns

Index(['text', 'class'], dtype='object')

In [25]:
merge_new_df.head()

Unnamed: 0,text,class
0,"SACRAMENTO, Calif. (Reuters) - A plan to raise...",1
1,SINGAPORE (Reuters) - Donald Trump’s “isolatio...,1
2,BRASILIA (Reuters) - A congressional committee...,1
3,SEOUL (Reuters) - South Korea approved a plan ...,1
4,21st Century Wire says Trump supporters gather...,0
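The shuffle, index reset, and column drop in cells 19 to 23 can usually be written as one chained call. The sketch below uses a small made-up frame (`toy_df`) rather than the real data, and the `random_state` value is an arbitrary assumption added only to make the shuffle repeatable.

```python
import pandas as pd

# Toy stand-in for merge_new_df (hypothetical data, same two columns)
toy_df = pd.DataFrame({
    "text": ["a", "b", "c", "d", "e"],
    "class": [0, 1, 0, 1, 0],
})

# Shuffle all rows, then renumber the index from 0 without keeping the old one.
# random_state=42 is an arbitrary seed chosen for this sketch.
shuffled = toy_df.sample(frac=1, random_state=42).reset_index(drop=True)
print(shuffled)
```

With `drop=True` the old index is discarded instead of being added as an `index` column, so no separate `drop` call is needed.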


**Creating a function to process the texts**

In [26]:
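def wordopt(text):
    # Normalize raw article text before vectorization
    text = text.lower()
    text = re.sub(r'\[.*?\]', '', text)   # drop bracketed spans
    text = re.sub(r'\W', ' ', text)       # replace every non-word character with a space
    # Note: the step above already removes ':', '/', '<', '>' and newlines, so the
    # URL, HTML-tag and newline patterns below no longer match; the punctuation
    # pattern can only still match underscores.
    text = re.sub(r'https?://\S+|www\.\S+', '', text)
    text = re.sub(r'<.*?>+', '', text)
    text = re.sub('[%s]' % re.escape(string.punctuation), '', text)
    text = re.sub(r'\n', '', text)
    text = re.sub(r'\w*\d\w*', '', text)  # drop tokens that contain a digit
    return text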
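Because `wordopt` replaces every non-word character before it looks for URLs and HTML tags, those later patterns have nothing left to match. The following is a sketch of one possible reordering, not the function used to produce the results in this notebook; `clean_text` is a hypothetical name, and the patterns substitute a space instead of an empty string so neighbouring words are not glued together.

```python
import re

def clean_text(text):
    """Hypothetical variant of wordopt with the patterns reordered."""
    text = text.lower()
    # Remove URLs first, while ':' and '/' are still present
    text = re.sub(r'https?://\S+|www\.\S+', ' ', text)
    # Remove HTML tags and bracketed spans while their delimiters still exist
    text = re.sub(r'<.*?>', ' ', text)
    text = re.sub(r'\[.*?\]', ' ', text)
    # Remove tokens that contain a digit
    text = re.sub(r'\w*\d\w*', ' ', text)
    # Collapse everything that is not a word character into single spaces
    text = re.sub(r'\W+', ' ', text)
    return text.strip()

sample = "Read <b>more</b> at https://example.com/a1 [video] in 2017!"
cleaned = clean_text(sample)
print(cleaned)  # read more at in
```

Switching to a function like this would change the features the models see, so the scores reported below would need to be regenerated.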

In [27]:
merge_new_df['text'] = merge_new_df['text'].apply(wordopt)

In [28]:
#Defining dependent and independent variables
X = merge_new_df['text']
y = merge_new_df['class']

In [29]:
#Splitting Training and Testing
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.25)

In [30]:
X_train.shape, X_test.shape

((33658,), (11220,))

In [31]:
#Convert text to vectors
vectorization = TfidfVectorizer() 
Xv_train = vectorization.fit_transform(X_train)
Xv_test = vectorization.transform(X_test)
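To make the vectorization step less opaque, here is a small pure-Python sketch of the weighting that `TfidfVectorizer` applies with its default settings as I understand them: raw term counts, a smoothed inverse document frequency `ln((1 + n) / (1 + df)) + 1`, and L2 normalization of each row. The helper `tfidf_rows` and the toy documents are hypothetical, and whitespace splitting is a simplification of scikit-learn's tokenizer.

```python
import math
from collections import Counter

def tfidf_rows(docs):
    """Return one {term: weight} dict per document (illustrative only)."""
    tokenized = [doc.lower().split() for doc in docs]  # simplified tokenizer
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF, as with smooth_idf=True
    idf = {term: math.log((1 + n_docs) / (1 + d)) + 1.0 for term, d in df.items()}
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights = {term: c * idf[term] for term, c in counts.items()}
        # L2-normalize the row; fall back to 1.0 for an empty document
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        rows.append({term: w / norm for term, w in weights.items()})
    return rows

toy_docs = ["reuters reports budget vote", "shocking budget secret revealed"]
toy_rows = tfidf_rows(toy_docs)
for row in toy_rows:
    print({term: round(weight, 3) for term, weight in row.items()})
```

A term that appears in both documents ("budget") gets a lower weight than terms unique to one document, which is the property the classifiers rely on.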

In [32]:
#Logistic Regression
lr = LogisticRegression()
lr.fit(Xv_train,y_train)

LogisticRegression()

In [33]:
lr_pred = lr.predict(Xv_test)
lr.score(Xv_test,y_test)

0.9871657754010695

In [34]:
print(classification_report(y_test, lr_pred))

              precision    recall  f1-score   support

           0       0.99      0.99      0.99      5888
           1       0.98      0.99      0.99      5332

    accuracy                           0.99     11220
   macro avg       0.99      0.99      0.99     11220
weighted avg       0.99      0.99      0.99     11220
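The precision, recall, and f1-score columns in the report are derived from per-class counts of true positives, false positives, and false negatives. The sketch below recomputes them by hand on a small made-up label list; `precision_recall_f1` is a hypothetical helper, not part of scikit-learn.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Compute precision, recall and F1 for one class from raw labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Made-up labels for illustration: 0 = fake, 1 = true
toy_true = [0, 0, 0, 1, 1, 1, 1, 0]
toy_pred = [0, 0, 1, 1, 1, 0, 1, 0]

for label in (0, 1):
    p, r, f = precision_recall_f1(toy_true, toy_pred, positive=label)
    print(f"class {label}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

The "support" column in the report is simply the number of true samples of each class in `y_test`.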



In [35]:
#Decision Tree Classification
dt = DecisionTreeClassifier()
dt.fit(Xv_train,y_train)
dt_pred = dt.predict(Xv_test)
dt.score(Xv_test,y_test)


0.9963458110516934

In [36]:
print(classification_report(y_test, dt_pred))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00      5888
           1       1.00      1.00      1.00      5332

    accuracy                           1.00     11220
   macro avg       1.00      1.00      1.00     11220
weighted avg       1.00      1.00      1.00     11220



In [37]:
#Gradient Boosting Classifier
gbc = GradientBoostingClassifier(random_state=0)
gbc.fit(Xv_train,y_train)
gbc_pred = gbc.predict(Xv_test)
gbc.score(Xv_test,y_test)

0.9957219251336898

In [38]:
print(classification_report(y_test,gbc_pred))

              precision    recall  f1-score   support

           0       1.00      0.99      1.00      5888
           1       0.99      1.00      1.00      5332

    accuracy                           1.00     11220
   macro avg       1.00      1.00      1.00     11220
weighted avg       1.00      1.00      1.00     11220



In [39]:
#Random Forest Classifier
rfc = RandomForestClassifier()
rfc.fit(Xv_train,y_train)
rfc_pred = rfc.predict(Xv_test)
rfc.score(Xv_test,y_test)

0.991711229946524

In [40]:
print(classification_report(y_test,rfc_pred))

              precision    recall  f1-score   support

           0       0.99      0.99      0.99      5888
           1       0.99      0.99      0.99      5332

    accuracy                           0.99     11220
   macro avg       0.99      0.99      0.99     11220
weighted avg       0.99      0.99      0.99     11220
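Since all four reports round to 0.99 or 1.00, it can be easier to compare the models by the number of misclassified test articles. The sketch below converts the accuracies printed above back into error counts, using the test-set size of 11220 from cell 30.

```python
# Test-set accuracies copied from the outputs of cells 33, 35, 37 and 39
reported_accuracy = {
    "LR": 0.9871657754010695,
    "DT": 0.9963458110516934,
    "GBC": 0.9957219251336898,
    "RFC": 0.991711229946524,
}
n_test = 11220  # number of rows in X_test

# errors = (1 - accuracy) * n_test, rounded to the nearest whole article
misclassified = {
    name: round((1 - acc) * n_test) for name, acc in reported_accuracy.items()
}

for name, errors in sorted(misclassified.items(), key=lambda kv: kv[1]):
    print(f"{name}: {errors} of {n_test} test articles misclassified")
# DT: 41, GBC: 48, RFC: 93, LR: 144
```

These counts come from a single split made without a fixed `random_state`, so small gaps such as 41 versus 48 errors may not hold on a different split.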



**Model Testing**

In [41]:
def output_label(n):
    if n == 0:
        return "Fake News"
    elif n == 1:
        return "Not Fake News"

def manual_testing(news):
    # Clean the input like the training text, then reuse the fitted vectorizer
    new_def_test = pd.DataFrame({"text": [news]})
    new_def_test['text'] = new_def_test['text'].apply(wordopt)
    new_xv_test = vectorization.transform(new_def_test['text'])
    lr_pred = lr.predict(new_xv_test)
    dt_pred = dt.predict(new_xv_test)
    gbc_pred = gbc.predict(new_xv_test)
    rfc_pred = rfc.predict(new_xv_test)

    # Print the label predicted by each model (print returns None, so nothing is returned)
    print("\n\nLR Prediction: {}\nDT Prediction: {}\nGBC Prediction: {}\nRFC Prediction: {}"
          .format(output_label(lr_pred[0]), output_label(dt_pred[0]),
                  output_label(gbc_pred[0]), output_label(rfc_pred[0])))
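`manual_testing` depends on the globally fitted `vectorization` object and on each model being kept in sync with it. One alternative, sketched here under the assumption that a single model is enough, is to bundle the vectorizer and classifier into a scikit-learn pipeline so that raw text goes in and a label comes out. The toy texts, labels, and the name `text_clf` are made up for illustration and are far too small to say anything about accuracy.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny made-up corpus: 1 = not fake, 0 = fake
toy_texts = [
    "washington reuters senate passes budget bill",
    "london reuters bank holds interest rates",
    "you wont believe this shocking secret video",
    "watch this outrageous viral hoax exposed",
]
toy_labels = [1, 1, 0, 0]

# The pipeline fits the vectorizer and the classifier together
text_clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
text_clf.fit(toy_texts, toy_labels)

# predict() vectorizes the new text with the fitted vocabulary automatically
toy_pred = text_clf.predict(["reuters reports senate vote", "shocking viral video"])
print(toy_pred)
```

In the notebook's setting, the cleaning function could probably be passed as `TfidfVectorizer(preprocessor=wordopt)` so that the same cleaning is applied at training and prediction time.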

In [42]:
news = input()  # input() already returns a str
manual_testing(news)

BRUSSELS (Reuters) - NATO allies on Tuesday welcomed President Donald Trump s decision to commit more forces to Afghanistan, as part of a new U.S. strategy he said would require more troops and funding from America s partners. Having run for the White House last year on a pledge to withdraw swiftly from Afghanistan, Trump reversed course on Monday and promised a stepped-up military campaign against  Taliban insurgents, saying:  Our troops will fight to win .  U.S. officials said he had signed off on plans to send about 4,000 more U.S. troops to add to the roughly 8,400 now deployed in Afghanistan. But his speech did not define benchmarks for successfully ending the war that began with the U.S.-led invasion of Afghanistan in 2001, and which he acknowledged had required an   extraordinary sacrifice of blood and treasure .  We will ask our NATO allies and global partners to support our new strategy, with additional troops and funding increases in line with our own. We are confident they w

In [43]:
news = input()  # input() already returns a str
manual_testing(news)

Vic Bishop Waking TimesOur reality is carefully constructed by powerful corporate, political and special interest sources in order to covertly sway public opinion. Blatant lies are often televised regarding terrorism, food, war, health, etc. They are fashioned to sway public opinion and condition viewers to accept what have become destructive societal norms.The practice of manipulating and controlling public opinion with distorted media messages has become so common that there is a whole industry formed around this. The entire role of this brainwashing industry is to figure out how to spin information to journalists, similar to the lobbying of government. It is never really clear just how much truth the journalists receive because the news industry has become complacent. The messages that it presents are shaped by corporate powers who often spend millions on advertising with the six conglomerates that own 90% of the media:General Electric (GE), News-Corp, Disney, Viacom, Time Warner, a