Importing Libraries

In [283]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.ensemble import RandomForestClassifier

import re
import string

Importing Datasets

In [284]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [285]:
fake = pd.read_csv("/content/drive/MyDrive/Brainwave_Matrix_Intern/datasets/Fake.csv")
true = pd.read_csv("/content/drive/MyDrive/Brainwave_Matrix_Intern/datasets/True.csv")

In [286]:
true.head()

Unnamed: 0,title,text,subject,date
0,"As U.S. budget fight looms, Republicans flip t...",WASHINGTON (Reuters) - The head of a conservat...,politicsNews,"December 31, 2017"
1,U.S. military to accept transgender recruits o...,WASHINGTON (Reuters) - Transgender people will...,politicsNews,"December 29, 2017"
2,Senior U.S. Republican senator: 'Let Mr. Muell...,WASHINGTON (Reuters) - The special counsel inv...,politicsNews,"December 31, 2017"
3,FBI Russia probe helped by Australian diplomat...,WASHINGTON (Reuters) - Trump campaign adviser ...,politicsNews,"December 30, 2017"
4,Trump wants Postal Service to charge 'much mor...,SEATTLE/WASHINGTON (Reuters) - President Donal...,politicsNews,"December 29, 2017"


In [287]:
fake.head()

Unnamed: 0,title,text,subject,date
0,Donald Trump Sends Out Embarrassing New Year’...,Donald Trump just couldn t wish all Americans ...,News,"December 31, 2017"
1,Drunk Bragging Trump Staffer Started Russian ...,House Intelligence Committee Chairman Devin Nu...,News,"December 31, 2017"
2,Sheriff David Clarke Becomes An Internet Joke...,"On Friday, it was revealed that former Milwauk...",News,"December 30, 2017"
3,Trump Is So Obsessed He Even Has Obama’s Name...,"On Christmas day, Donald Trump announced that ...",News,"December 29, 2017"
4,Pope Francis Just Called Out Donald Trump Dur...,Pope Francis used his annual Christmas Day mes...,News,"December 25, 2017"


In [288]:
true["label"]=1
fake["label"]=0

In [289]:
true.head()

Unnamed: 0,title,text,subject,date,label
0,"As U.S. budget fight looms, Republicans flip t...",WASHINGTON (Reuters) - The head of a conservat...,politicsNews,"December 31, 2017",1
1,U.S. military to accept transgender recruits o...,WASHINGTON (Reuters) - Transgender people will...,politicsNews,"December 29, 2017",1
2,Senior U.S. Republican senator: 'Let Mr. Muell...,WASHINGTON (Reuters) - The special counsel inv...,politicsNews,"December 31, 2017",1
3,FBI Russia probe helped by Australian diplomat...,WASHINGTON (Reuters) - Trump campaign adviser ...,politicsNews,"December 30, 2017",1
4,Trump wants Postal Service to charge 'much mor...,SEATTLE/WASHINGTON (Reuters) - President Donal...,politicsNews,"December 29, 2017",1


Combining Datasets

In [290]:
fake.head()

Unnamed: 0,title,text,subject,date,label
0,Donald Trump Sends Out Embarrassing New Year’...,Donald Trump just couldn t wish all Americans ...,News,"December 31, 2017",0
1,Drunk Bragging Trump Staffer Started Russian ...,House Intelligence Committee Chairman Devin Nu...,News,"December 31, 2017",0
2,Sheriff David Clarke Becomes An Internet Joke...,"On Friday, it was revealed that former Milwauk...",News,"December 30, 2017",0
3,Trump Is So Obsessed He Even Has Obama’s Name...,"On Christmas day, Donald Trump announced that ...",News,"December 29, 2017",0
4,Pope Francis Just Called Out Donald Trump Dur...,Pope Francis used his annual Christmas Day mes...,News,"December 25, 2017",0


In [291]:
news = pd.concat([fake, true], axis=0)

In [292]:
news.head()

Unnamed: 0,title,text,subject,date,label
0,Donald Trump Sends Out Embarrassing New Year’...,Donald Trump just couldn t wish all Americans ...,News,"December 31, 2017",0
1,Drunk Bragging Trump Staffer Started Russian ...,House Intelligence Committee Chairman Devin Nu...,News,"December 31, 2017",0
2,Sheriff David Clarke Becomes An Internet Joke...,"On Friday, it was revealed that former Milwauk...",News,"December 30, 2017",0
3,Trump Is So Obsessed He Even Has Obama’s Name...,"On Christmas day, Donald Trump announced that ...",News,"December 29, 2017",0
4,Pope Francis Just Called Out Donald Trump Dur...,Pope Francis used his annual Christmas Day mes...,News,"December 25, 2017",0


In [293]:
news.tail()

Unnamed: 0,title,text,subject,date,label
21412,'Fully committed' NATO backs new U.S. approach...,BRUSSELS (Reuters) - NATO allies on Tuesday we...,worldnews,"August 22, 2017",1
21413,LexisNexis withdrew two products from Chinese ...,"LONDON (Reuters) - LexisNexis, a provider of l...",worldnews,"August 22, 2017",1
21414,Minsk cultural hub becomes haven from authorities,MINSK (Reuters) - In the shadow of disused Sov...,worldnews,"August 22, 2017",1
21415,Vatican upbeat on possibility of Pope Francis ...,MOSCOW (Reuters) - Vatican Secretary of State ...,worldnews,"August 22, 2017",1
21416,Indonesia to buy $1.14 billion worth of Russia...,JAKARTA (Reuters) - Indonesia will buy 11 Sukh...,worldnews,"August 22, 2017",1
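
The tail above ends at index label 21416, well below the 44898 rows reported later, because `pd.concat` keeps each input frame's original index by default, so labels repeat across the two halves. A minimal sketch with two toy frames (their contents are made up for illustration) compares the default behaviour with `ignore_index=True`, which assigns a fresh 0..n-1 index:

```python
import pandas as pd

# Toy stand-ins for the fake and true frames (contents are illustrative only).
fake_demo = pd.DataFrame({"text": ["a", "b"], "label": [0, 0]})
true_demo = pd.DataFrame({"text": ["c", "d", "e"], "label": [1, 1, 1]})

# Default: each frame keeps its own index labels, so labels repeat.
kept = pd.concat([fake_demo, true_demo], axis=0)

# ignore_index=True discards the old labels and renumbers the rows.
fresh = pd.concat([fake_demo, true_demo], axis=0, ignore_index=True)

print(list(kept.index))   # [0, 1, 0, 1, 2]
print(list(fresh.index))  # [0, 1, 2, 3, 4]
```

The notebook reaches the same end state later by resetting the index after shuffling, so this is an alternative rather than a required change.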


In [294]:
news.isnull().sum()

title      0
text       0
subject    0
date       0
label      0
dtype: int64

Dropping Unnecessary Columns

In [295]:
news = news.drop(['title','subject','date'], axis=1)

In [296]:
news.head()

Unnamed: 0,text,label
0,Donald Trump just couldn t wish all Americans ...,0
1,House Intelligence Committee Chairman Devin Nu...,0
2,"On Friday, it was revealed that former Milwauk...",0
3,"On Christmas day, Donald Trump announced that ...",0
4,Pope Francis used his annual Christmas Day mes...,0


Shuffling the rows


In [297]:
news = news.sample(frac=1)

In [298]:
news.head()

Unnamed: 0,text,label
22479,Tune in to the Alternate Current Radio Network...,0
21918,"In case you missed it Sen. Harry Reid (R-NV), ...",0
4218,TOKYO (Reuters) - Japanese Deputy Prime Minist...,1
17595,LUXEMBOURG (Reuters) - European Commission Pre...,1
10782,NEW YORK (Reuters) - Puerto Rico will be the s...,1


Resetting the Index

In [299]:
news.reset_index(inplace=True)

In [300]:
news.head()

Unnamed: 0,index,text,label
0,22479,Tune in to the Alternate Current Radio Network...,0
1,21918,"In case you missed it Sen. Harry Reid (R-NV), ...",0
2,4218,TOKYO (Reuters) - Japanese Deputy Prime Minist...,1
3,17595,LUXEMBOURG (Reuters) - European Commission Pre...,1
4,10782,NEW YORK (Reuters) - Puerto Rico will be the s...,1


Dropping the 'index' column

In [301]:
news.drop(['index'], axis=1, inplace=True)

In [302]:
news.head()

Unnamed: 0,text,label
0,Tune in to the Alternate Current Radio Network...,0
1,"In case you missed it Sen. Harry Reid (R-NV), ...",0
2,TOKYO (Reuters) - Japanese Deputy Prime Minist...,1
3,LUXEMBOURG (Reuters) - European Commission Pre...,1
4,NEW YORK (Reuters) - Puerto Rico will be the s...,1
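
The three steps above (shuffle, reset the index, drop the old `index` column) can usually be collapsed into one chained call. The sketch below uses a toy frame, and the `random_state=42` seed is an assumption added here so the shuffle is repeatable; the notebook itself does not set a seed.

```python
import pandas as pd

# Toy frame standing in for the combined news data.
demo = pd.DataFrame({"text": list("abcdef"), "label": [0, 0, 0, 1, 1, 1]})

# sample(frac=1) returns every row in random order;
# reset_index(drop=True) renumbers the rows without keeping the old labels as a column.
shuffled = demo.sample(frac=1, random_state=42).reset_index(drop=True)

print(shuffled)
```

With `drop=True` no extra `index` column is created, so the separate drop step is not needed.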


Removing URLs, HTML tags, punctuation, digits and newline characters from the text

In [303]:
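def wordopt(text):
    text = text.lower()                                 # normalise case
    text = re.sub(r'https?://\S+|www\.\S+', '', text)   # strip URLs
    text = re.sub(r'<.*?>', '', text)                   # strip HTML tags
    text = re.sub(r'[^\w\s]', '', text)                 # strip punctuation (\w keeps underscores)
    text = re.sub(r'\d', '', text)                      # strip digits
    text = re.sub(r'\n', '', text)                      # delete newlines (no space is inserted)

    return text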

In [304]:
news['text'] = news['text'].apply(wordopt)

In [305]:
news['text']

0        tune in to the alternate current radio network...
1        in case you missed it sen harry reid rnv who a...
2        tokyo reuters  japanese deputy prime minister ...
3        luxembourg reuters  european commission presid...
4        new york reuters  puerto rico will be the subj...
                               ...                        
44893    washington reuters  presumptive republican us ...
44894    new york reuters  the probability that republi...
44895    on saturday green party candidate jill stein w...
44896    washington reuters  defeated democratic presid...
44897    apparently being the wife of a former democrat...
Name: text, Length: 44898, dtype: object
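
To make the effect of each substitution concrete, the sketch below runs `wordopt` on a short made-up sentence. It also shows a hypothetical variant, `wordopt_spaced` (not part of the notebook), that collapses whitespace runs into a single space instead of deleting newlines, which avoids gluing together words that sit on either side of a line break.

```python
import re

def wordopt(text):
    """Cleaning function as defined in the notebook."""
    text = text.lower()
    text = re.sub(r'https?://\S+|www\.\S+', '', text)
    text = re.sub(r'<.*?>', '', text)
    text = re.sub(r'[^\w\s]', '', text)
    text = re.sub(r'\d', '', text)
    text = re.sub(r'\n', '', text)
    return text

def wordopt_spaced(text):
    """Hypothetical variant: same steps, but whitespace runs become one space."""
    text = text.lower()
    text = re.sub(r'https?://\S+|www\.\S+', '', text)
    text = re.sub(r'<.*?>', '', text)
    text = re.sub(r'[^\w\s]', '', text)
    text = re.sub(r'\d', '', text)
    text = re.sub(r'\s+', ' ', text).strip()
    return text

sample = "Visit https://example.com NOW! <b>Top 10</b> stories\nhere."
print(repr(wordopt(sample)))         # 'visit  now top  storieshere'
print(repr(wordopt_spaced(sample)))  # 'visit now top stories here'
```

Note how the original function joins "stories" and "here" into one token; whether that matters for the classifiers here has not been measured.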

In [306]:
x = news['text']
y = news['label']

In [307]:
x

0        tune in to the alternate current radio network...
1        in case you missed it sen harry reid rnv who a...
2        tokyo reuters  japanese deputy prime minister ...
3        luxembourg reuters  european commission presid...
4        new york reuters  puerto rico will be the subj...
                               ...                        
44893    washington reuters  presumptive republican us ...
44894    new york reuters  the probability that republi...
44895    on saturday green party candidate jill stein w...
44896    washington reuters  defeated democratic presid...
44897    apparently being the wife of a former democrat...
Name: text, Length: 44898, dtype: object

In [308]:
y

0        0
1        0
2        1
3        1
4        1
        ..
44893    1
44894    1
44895    0
44896    1
44897    0
Name: label, Length: 44898, dtype: int64

Splitting the Data into Training and Test Sets

In [309]:
x_train, x_test, y_train, y_test = train_test_split(x,y,test_size = .3)

In [310]:
x_train.shape

(31428,)

In [311]:
x_test.shape

(13470,)
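
The split above is unseeded, so the exact rows in each set (and the scores that follow) change from run to run. A minimal sketch on toy data shows two optional arguments of `train_test_split`: `random_state` for a repeatable split and `stratify` to keep the class ratio the same in both parts. The toy texts, labels, and the seed value are assumptions for illustration.

```python
from sklearn.model_selection import train_test_split

# Toy data: 20 documents, 10 per class.
texts = [f"doc {i}" for i in range(20)]
labels = [0] * 10 + [1] * 10

x_tr, x_te, y_tr, y_te = train_test_split(
    texts, labels,
    test_size=0.3,      # same 70/30 ratio as the notebook
    random_state=0,     # fixed seed so the split is repeatable
    stratify=labels,    # preserve the 50/50 class balance in both parts
)

print(len(x_tr), len(x_te))              # 14 6
print(sum(y_te), len(y_te) - sum(y_te))  # 3 3
```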

In [312]:
vectorization = TfidfVectorizer()

In [313]:
xv_train = vectorization.fit_transform(x_train)

In [314]:
xv_test = vectorization.transform(x_test)

In [315]:
xv_train

<31428x175170 sparse matrix of type '<class 'numpy.float64'>'
	with 6476266 stored elements in Compressed Sparse Row format>

In [316]:
xv_test

<13470x175170 sparse matrix of type '<class 'numpy.float64'>'
	with 2704408 stored elements in Compressed Sparse Row format>
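
Both matrices above have 175170 columns because the vocabulary is learned once, from the training texts only, and then reused for the test texts. The toy sketch below (documents invented for illustration) shows that `transform` keeps the training vocabulary and silently ignores words it has not seen:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

train_docs = ["senate passes budget bill", "budget talks stall in senate"]
test_docs = ["senate debates unseen topic"]

vec = TfidfVectorizer()
xv_tr = vec.fit_transform(train_docs)  # learns the vocabulary and IDF weights
xv_te = vec.transform(test_docs)       # reuses them; unknown words are dropped

print(sorted(vec.vocabulary_))
print(xv_tr.shape, xv_te.shape)  # (2, 7) (1, 7)
print(xv_te.nnz)                 # 1 (only "senate" is in the vocabulary)
```

Fitting on the training split alone keeps information about the test texts out of the features, which is why the notebook calls `fit_transform` on `x_train` and plain `transform` on `x_test`.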

Logistic Regression

In [317]:
LR = LogisticRegression()

In [318]:
LR.fit(xv_train, y_train)

In [319]:
pred_lr = LR.predict(xv_test)

In [320]:
LR.score(xv_test, y_test)

0.9872308834446919

In [321]:
print(classification_report(y_test,pred_lr))

              precision    recall  f1-score   support

           0       0.99      0.99      0.99      6926
           1       0.99      0.99      0.99      6544

    accuracy                           0.99     13470
   macro avg       0.99      0.99      0.99     13470
weighted avg       0.99      0.99      0.99     13470
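
For readers unfamiliar with the report columns, the sketch below computes precision, recall, F1 and support for one class from raw predictions in plain Python. The helper name `binary_report` and the toy labels are illustrative and not part of the notebook; `classification_report` does the same arithmetic per class.

```python
def binary_report(y_true, y_pred, positive=1):
    """Return (precision, recall, f1, support) for the class `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # correct hits
    fp = sum(t != positive and p == positive for t, p in pairs)  # false alarms
    fn = sum(t == positive and p != positive for t, p in pairs)  # misses
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    support = tp + fn  # number of true instances of the class
    return precision, recall, f1, support

y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
print(binary_report(y_true, y_pred, positive=1))  # (0.75, 0.75, 0.75, 4)
```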



Decision Tree Classifier

In [322]:
DT = DecisionTreeClassifier()
DT.fit(xv_train, y_train)

In [323]:
pred_dt = DT.predict(xv_test)

In [324]:
DT.score(xv_test, y_test)

0.9971046770601336

In [325]:
print(classification_report(y_test, pred_dt))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00      6926
           1       1.00      1.00      1.00      6544

    accuracy                           1.00     13470
   macro avg       1.00      1.00      1.00     13470
weighted avg       1.00      1.00      1.00     13470



Gradient Boosting Classifier

In [326]:
GB = GradientBoostingClassifier(random_state = 0)
GB.fit(xv_train, y_train)

In [327]:
pred_gb = GB.predict(xv_test)

In [328]:
GB.score(xv_test, y_test)

0.9968819599109131

In [329]:
print(classification_report(y_test, pred_gb))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00      6926
           1       1.00      1.00      1.00      6544

    accuracy                           1.00     13470
   macro avg       1.00      1.00      1.00     13470
weighted avg       1.00      1.00      1.00     13470



Random Forest Classifier

In [330]:
RF = RandomForestClassifier(random_state = 0)
RF.fit(xv_train, y_train)

In [331]:
pred_rf = RF.predict(xv_test)

In [332]:
RF.score(xv_test, y_test)

0.9873793615441723

In [333]:
print(classification_report(y_test, pred_rf))

              precision    recall  f1-score   support

           0       0.99      0.99      0.99      6926
           1       0.99      0.99      0.99      6544

    accuracy                           0.99     13470
   macro avg       0.99      0.99      0.99     13470
weighted avg       0.99      0.99      0.99     13470
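
One thing worth checking with scores this high: every real-news row shown in the previews opens with a dateline such as "WASHINGTON (Reuters) -", while the fake-news rows do not. This notebook does not establish whether the models lean on that marker. The sketch below is one possible way to strip it from the raw text before cleaning, so the models could be retrained and compared; the regular expression and the helper name `strip_dateline` are assumptions, not part of the original code.

```python
import re

# Assumed pattern: up to 60 non-parenthesis characters (the location, if any),
# then "(Reuters)" and a dash, anchored at the start of the article.
DATELINE = re.compile(r'^[^()]{0,60}\(Reuters\)\s*-\s*')

def strip_dateline(text):
    """Remove a leading 'CITY (Reuters) - ' marker if present."""
    return DATELINE.sub('', text, count=1)

print(strip_dateline("WASHINGTON (Reuters) - The head of a conservative"))
# The head of a conservative
print(strip_dateline("Donald Trump just couldn t wish all Americans"))
# Donald Trump just couldn t wish all Americans
```

In the notebook this would be applied to `news['text']` before `wordopt`, since `wordopt` removes the parentheses the pattern relies on.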



Function for Testing the input news

In [337]:
def output_label(n):
    # Map the numeric class back to a readable label (0 = fake, 1 = real).
    if n == 0:
        return "Fake News"
    elif n == 1:
        return "Real News"

def manual_testing(article):
    # Wrap the raw article in a one-row DataFrame and apply the same cleaning used for training.
    testing_news = {"text": [article]}
    new_def_test = pd.DataFrame(testing_news)
    new_def_test["text"] = new_def_test["text"].apply(wordopt)
    new_x_test = new_def_test["text"]
    # Reuse the fitted vectorizer (transform only, no refit), then query each model.
    new_xv_test = vectorization.transform(new_x_test)
    pred_lr = LR.predict(new_xv_test)
    pred_dt = DT.predict(new_xv_test)
    pred_gb = GB.predict(new_xv_test)
    pred_rf = RF.predict(new_xv_test)

    return "LR Prediction: {}   DT Prediction: {}   RF Prediction: {}    GB Prediction: {}".format(
        output_label(pred_lr[0]), output_label(pred_dt[0]), output_label(pred_rf[0]), output_label(pred_gb[0]))
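
The function above reads the vectorizer and the four models from global variables. A hedged alternative is to pass them in explicitly, which makes it easier to add or remove models. In the sketch below, `predict_all`, its `clean` parameter, the toy documents and the two toy models are all assumptions for illustration; in the notebook one would pass `vectorization`, the four fitted models and `wordopt`.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

LABELS = {0: "Fake News", 1: "Real News"}

def predict_all(article, vectorizer, models, clean=str.lower):
    """Return {model name: label string} for one raw article."""
    features = vectorizer.transform([clean(article)])
    return {name: LABELS[int(model.predict(features)[0])]
            for name, model in models.items()}

# Tiny made-up corpus, only to have fitted objects to call.
docs = ["officials said on tuesday", "you will not believe this",
        "the senate voted on the bill", "shocking secret they hide"]
y = [1, 0, 1, 0]

vec = TfidfVectorizer()
X = vec.fit_transform(docs)
models = {
    "LR": LogisticRegression().fit(X, y),
    "DT": DecisionTreeClassifier(random_state=0).fit(X, y),
}

print(predict_all("Officials said the senate voted", vec, models))
```

Returning a dictionary instead of a formatted string leaves the presentation to the caller.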

Input News Article

In [338]:
news_article = str(input())

'''
WASHINGTON (Reuters) - The head of a conservative Republican faction in the U.S. Congress, who voted this month for a huge expansion of the national debt to pay for tax cuts, called himself a “fiscal conservative” on Sunday and urged budget restraint in 2018. In keeping with a sharp pivot under way among Republicans, U.S. Representative Mark Meadows, speaking on CBS’ “Face the Nation,” drew a hard line on federal spending, which lawmakers are bracing to do battle over in January. When they return from the holidays on Wednesday, lawmakers will begin trying to pass a federal budget in a fight likely to be linked to other issues, such as immigration policy, even as the November congressional election campaigns approach in which Republicans will seek to keep control of Congress. President Donald Trump and his Republicans want a big budget increase in military spending, while Democrats also want proportional increases for non-defense “discretionary” spending on programs that support education, scientific research, infrastructure, public health and environmental protection. “The (Trump) administration has already been willing to say: ‘We’re going to increase non-defense discretionary spending ... by about 7 percent,’” Meadows, chairman of the small but influential House Freedom Caucus, said on the program. “Now, Democrats are saying that’s not enough, we need to give the government a pay raise of 10 to 11 percent. For a fiscal conservative, I don’t see where the rationale is. ... Eventually you run out of other people’s money,” he said. Meadows was among Republicans who voted in late December for their party’s debt-financed tax overhaul, which is expected to balloon the federal budget deficit and add about $1.5 trillion over 10 years to the $20 trillion national debt. “It’s interesting to hear Mark talk about fiscal responsibility,” Democratic U.S. Representative Joseph Crowley said on CBS.
Crowley said the Republican tax bill would require the United States to borrow $1.5 trillion, to be paid off by future generations, to finance tax cuts for corporations and the rich. “This is one of the least ... fiscally responsible bills we’ve ever seen passed in the history of the House of Representatives. I think we’re going to be paying for this for many, many years to come,” Crowley said. Republicans insist the tax package, the biggest U.S. tax overhaul in more than 30 years, will boost the economy and job growth. House Speaker Paul Ryan, who also supported the tax bill, recently went further than Meadows, making clear in a radio interview that welfare or “entitlement reform,” as the party often calls it, would be a top Republican priority in 2018. In Republican parlance, “entitlement” programs mean food stamps, housing assistance, Medicare and Medicaid health insurance for the elderly, poor and disabled, as well as other programs created by Washington to assist the needy. Democrats seized on Ryan’s early December remarks, saying they showed Republicans would try to pay for their tax overhaul by seeking spending cuts for social programs. But the goals of House Republicans may have to take a back seat to the Senate, where the votes of some Democrats will be needed to approve a budget and prevent a government shutdown. Democrats will use their leverage in the Senate, which Republicans narrowly control, to defend both discretionary non-defense programs and social spending, while tackling the issue of the “Dreamers,” people brought illegally to the country as children. Trump in September put a March 2018 expiration date on the Deferred Action for Childhood Arrivals, or DACA, program, which protects the young immigrants from deportation and provides them with work permits.
The president has said in recent Twitter messages he wants funding for his proposed Mexican border wall and other immigration law changes in exchange for agreeing to help the Dreamers. Representative Debbie Dingell told CBS she did not favor linking that issue to other policy objectives, such as wall funding. “We need to do DACA clean,” she said. On Wednesday, Trump aides will meet with congressional leaders to discuss those issues. That will be followed by a weekend of strategy sessions for Trump and Republican leaders on Jan. 6 and 7, the White House said. Trump was also scheduled to meet on Sunday with Florida Republican Governor Rick Scott, who wants more emergency aid. The House has passed an $81 billion aid package after hurricanes in Florida, Texas and Puerto Rico, and wildfires in California. The package far exceeded the $44 billion requested by the Trump administration. The Senate has not yet voted on the aid.
'''

(Reuters) - Alabama officials on Thursday certified Democrat Doug Jones the winner of the state’s U.S. Senate race, after a state judge denied a challenge by Republican Roy Moore, whose campaign was derailed by accusations of sexual misconduct with teenage girls. Jones won the vacant seat by about 22,000 votes, or 1.6 percentage points, election officials said. That made him the first Democrat in a quarter of a century to win a Senate seat in Alabama. The seat was previously held by Republican Jeff Sessions, who was tapped by U.S. President Donald Trump as attorney general. A state canvassing board composed of Alabama Secretary of State John Merrill, Governor Kay Ivey and Attorney General Steve Marshall certified the election results. Seating Jones will narrow the Republican majority in the Senate to 51 of 100 seats. In a statement, Jones called his victory “a new chapter” and pledged to work with both parties. Moore declined to concede defeat even after Trump urged him to do so

'\nWASHINGTON (Reuters) - The head of a conservative Republican faction in the U.S. Congress, who voted this month for a huge expansion of the national debt to pay for tax cuts, called himself a “fiscal conservative” on Sunday and urged budget restraint in 2018. In keeping with a sharp pivot under way among Republicans, U.S. Representative Mark Meadows, speaking on CBS’ “Face the Nation,” drew a hard line on federal spending, which lawmakers are bracing to do battle over in January. When they return from the holidays on Wednesday, lawmakers will begin trying to pass a federal budget in a fight likely to be linked to other issues, such as immigration policy, even as the November congressional election campaigns approach in which Republicans will seek to keep control of Congress. President Donald Trump and his Republicans want a big budget increase in military spending, while Democrats also want proportional increases for non-defense “discretionary” spending on pro

Checking Whether the Article Is Fake or Real

In [339]:
manual_testing(news_article)

'LR Prediction: Real News   DT Prediction: Real News   RF Prediction: Real News    GB Prediction: Real News'