# Information Retrieval Mini Project

## Develop a **Fake News Detection System**

#### Import Essential Libraries

In [2]:
import re
import string
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt


#### Import Datasets

In [3]:
df_fake = pd.read_csv("Fake.csv")
df_true = pd.read_csv("True.csv")


#### Preprocessing

In [4]:
df_true.head()


,title,text,subject,date
0,"As U.S. budget fight looms, Republicans flip t...",WASHINGTON (Reuters) - The head of a conservat...,politicsNews,"December 31, 2017"
1,U.S. military to accept transgender recruits o...,WASHINGTON (Reuters) - Transgender people will...,politicsNews,"December 29, 2017"
2,Senior U.S. Republican senator: 'Let Mr. Muell...,WASHINGTON (Reuters) - The special counsel inv...,politicsNews,"December 31, 2017"
3,FBI Russia probe helped by Australian diplomat...,WASHINGTON (Reuters) - Trump campaign adviser ...,politicsNews,"December 30, 2017"
4,Trump wants Postal Service to charge 'much mor...,SEATTLE/WASHINGTON (Reuters) - President Donal...,politicsNews,"December 29, 2017"


In [5]:
df_fake.head()


,title,text,subject,date
0,Donald Trump Sends Out Embarrassing New Year’...,Donald Trump just couldn t wish all Americans ...,News,"December 31, 2017"
1,Drunk Bragging Trump Staffer Started Russian ...,House Intelligence Committee Chairman Devin Nu...,News,"December 31, 2017"
2,Sheriff David Clarke Becomes An Internet Joke...,"On Friday, it was revealed that former Milwauk...",News,"December 30, 2017"
3,Trump Is So Obsessed He Even Has Obama’s Name...,"On Christmas day, Donald Trump announced that ...",News,"December 29, 2017"
4,Pope Francis Just Called Out Donald Trump Dur...,Pope Francis used his annual Christmas Day mes...,News,"December 25, 2017"


In [6]:
df_true.describe()


,title,text,subject,date
count,21417,21417,21417,21417
unique,20826,21192,2,716
top,Factbox: Trump fills top jobs for his administ...,(Reuters) - Highlights for U.S. President Dona...,politicsNews,"December 20, 2017"
freq,14,8,11272,182


In [7]:
df_fake.describe()


,title,text,subject,date
count,23481,23481,23481,23481
unique,17903,17455,6,1681
top,MEDIA IGNORES Time That Bill Clinton FIRED His...,,News,"May 10, 2017"
freq,6,626,9050,46


In [8]:
df_true.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 21417 entries, 0 to 21416
Data columns (total 4 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   title    21417 non-null  object
 1   text     21417 non-null  object
 2   subject  21417 non-null  object
 3   date     21417 non-null  object
dtypes: object(4)
memory usage: 669.4+ KB


In [9]:
df_fake.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 23481 entries, 0 to 23480
Data columns (total 4 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   title    23481 non-null  object
 1   text     23481 non-null  object
 2   subject  23481 non-null  object
 3   date     23481 non-null  object
dtypes: object(4)
memory usage: 733.9+ KB


In [10]:
df_true.shape


(21417, 4)

In [11]:
df_fake.shape


(23481, 4)

In [12]:
df_true['class'] = 1
df_fake['class'] = 0


In [13]:
# Hold out the last 10 rows for manual testing and remove them from the training data.
df_true_manual_testing = df_true.tail(10).copy()
df_true = df_true.iloc[:-10]


In [14]:
# Hold out the last 10 rows for manual testing and remove them from the training data.
df_fake_manual_testing = df_fake.tail(10).copy()
df_fake = df_fake.iloc[:-10]


In [15]:
df_true.shape, df_fake.shape


((21407, 5), (23471, 5))

In [16]:
df_true_manual_testing.head()


,title,text,subject,date,class
21407,"Mata Pires, owner of embattled Brazil builder ...","SAO PAULO (Reuters) - Cesar Mata Pires, the ow...",worldnews,"August 22, 2017",1
21408,"U.S., North Korea clash at U.N. forum over nuc...",GENEVA (Reuters) - North Korea and the United ...,worldnews,"August 22, 2017",1
21409,"U.S., North Korea clash at U.N. arms forum on ...",GENEVA (Reuters) - North Korea and the United ...,worldnews,"August 22, 2017",1
21410,Headless torso could belong to submarine journ...,COPENHAGEN (Reuters) - Danish police said on T...,worldnews,"August 22, 2017",1
21411,North Korea shipments to Syria chemical arms a...,UNITED NATIONS (Reuters) - Two North Korean sh...,worldnews,"August 21, 2017",1


In [17]:
df_fake_manual_testing.head()


,title,text,subject,date,class
23471,Seven Iranians freed in the prisoner swap have...,"21st Century Wire says This week, the historic...",Middle-east,"January 20, 2016",0
23472,#Hashtag Hell & The Fake Left,By Dady Chery and Gilbert MercierAll writers ...,Middle-east,"January 19, 2016",0
23473,Astroturfing: Journalist Reveals Brainwashing ...,Vic Bishop Waking TimesOur reality is carefull...,Middle-east,"January 19, 2016",0
23474,The New American Century: An Era of Fraud,Paul Craig RobertsIn the last years of the 20t...,Middle-east,"January 19, 2016",0
23475,Hillary Clinton: ‘Israel First’ (and no peace ...,Robert Fantina CounterpunchAlthough the United...,Middle-east,"January 18, 2016",0


In [18]:
df_merge = pd.concat([df_true, df_fake], axis=0)


In [19]:
df_merge


,title,text,subject,date,class
0,"As U.S. budget fight looms, Republicans flip t...",WASHINGTON (Reuters) - The head of a conservat...,politicsNews,"December 31, 2017",1
1,U.S. military to accept transgender recruits o...,WASHINGTON (Reuters) - Transgender people will...,politicsNews,"December 29, 2017",1
2,Senior U.S. Republican senator: 'Let Mr. Muell...,WASHINGTON (Reuters) - The special counsel inv...,politicsNews,"December 31, 2017",1
3,FBI Russia probe helped by Australian diplomat...,WASHINGTON (Reuters) - Trump campaign adviser ...,politicsNews,"December 30, 2017",1
4,Trump wants Postal Service to charge 'much mor...,SEATTLE/WASHINGTON (Reuters) - President Donal...,politicsNews,"December 29, 2017",1
...,...,...,...,...,...
23466,Boston Brakes? How to Hack a New Car With Your...,21st Century Wire says For those who still ref...,Middle-east,"January 22, 2016",0
23467,Oregon Governor Says Feds ‘Must Act’ Against P...,"21st Century Wire says So far, after nearly 20...",Middle-east,"January 21, 2016",0
23468,Ron Paul on Burns Oregon Standoff and Jury Nul...,21st Century Wire says If you ve been followin...,Middle-east,"January 21, 2016",0
23469,BOILER ROOM: As the Frogs Slowly Boil – EP #40,Tune in to the Alternate Current Radio Network...,Middle-east,"January 20, 2016",0


In [20]:
df_merge.columns


Index(['title', 'text', 'subject', 'date', 'class'], dtype='object')

In [21]:
df = df_merge.drop(['title', 'subject', 'date'], axis=1)


In [22]:
df.isna().sum()


text     0
class    0
dtype: int64

In [23]:
df = df.sample(frac=1)


In [24]:
df.head()


,text,class
7280,The game plan for Republicans isn t exactly di...,0
6106,As much as he and his supporters want to say t...,0
13096,WASHINGTON (Reuters) - President Donald Trump ...,1
15948,Officials allegedly affiliated with the United...,0
4195,"KENOSHA, Wis. (Reuters) - President Donald Tru...",1


In [25]:
df.reset_index(inplace=True)


In [26]:
df.head()


,index,text,class
0,7280,The game plan for Republicans isn t exactly di...,0
1,6106,As much as he and his supporters want to say t...,0
2,13096,WASHINGTON (Reuters) - President Donald Trump ...,1
3,15948,Officials allegedly affiliated with the United...,0
4,4195,"KENOSHA, Wis. (Reuters) - President Donald Tru...",1


In [27]:
df.drop(['index'], axis=1, inplace=True)
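
The shuffle, `reset_index`, and `drop` steps above can be folded into one expression. The following is a minimal sketch on a toy frame; the `toy` data and the seed value 42 are assumptions made for illustration, not part of the notebook.

```python
import pandas as pd

# Toy frame standing in for the merged news data (hypothetical values).
toy = pd.DataFrame({"text": ["a", "b", "c", "d"], "class": [1, 0, 1, 0]})

# Shuffle every row, then discard the old index in the same step.
# random_state fixes the shuffle order so reruns give the same result.
shuffled = toy.sample(frac=1, random_state=42).reset_index(drop=True)

print(shuffled)
```

Passing `drop=True` avoids creating the temporary `index` column, so no follow-up `drop` call is needed.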


In [28]:
df.head()


,text,class
0,The game plan for Republicans isn t exactly di...,0
1,As much as he and his supporters want to say t...,0
2,WASHINGTON (Reuters) - President Donald Trump ...,1
3,Officials allegedly affiliated with the United...,0
4,"KENOSHA, Wis. (Reuters) - President Donald Tru...",1


In [29]:
df.columns


Index(['text', 'class'], dtype='object')

#### Text Processing

In [30]:
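def wordopt(text):
    """Lowercase the text and strip bracketed spans, non-word characters, and digit-bearing tokens."""
    text = text.lower()
    text = re.sub(r'\[.*?\]', '', text)  # bracketed spans such as [video]
    text = re.sub(r'\W', ' ', text)      # every non-word character becomes a space
    # The URL, tag, and punctuation patterns below run after the \W step,
    # so most of what they target has already been replaced by spaces.
    text = re.sub(r'https?://\S+|www\.\S+', '', text)
    text = re.sub(r'<.*?>+', '', text)
    text = re.sub('[%s]' % re.escape(string.punctuation), '', text)
    text = re.sub(r'\n', '', text)
    text = re.sub(r'\w*\d\w*', '', text)  # tokens that contain a digit
    return text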
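
In `wordopt`, the `\W` substitution runs before the URL, HTML-tag, and punctuation patterns, so those later patterns find little left to match. A possible alternative ordering is sketched below. The name `clean_text` and the final whitespace collapse are assumptions added for illustration; the accuracy figures reported later in this notebook come from `wordopt`, not from this variant.

```python
import re
import string

# Patterns are compiled once; they mirror the ones used in wordopt.
BRACKET_RE = re.compile(r"\[.*?\]")
URL_RE = re.compile(r"https?://\S+|www\.\S+")
TAG_RE = re.compile(r"<.*?>")
PUNCT_RE = re.compile(r"[%s]" % re.escape(string.punctuation))
DIGIT_WORD_RE = re.compile(r"\w*\d\w*")


def clean_text(text):
    """Hypothetical variant of wordopt that removes URLs and tags first."""
    text = text.lower()
    text = BRACKET_RE.sub("", text)      # bracketed spans such as [video]
    text = URL_RE.sub("", text)          # URLs, while '://' and '.' are still intact
    text = TAG_RE.sub("", text)          # HTML tags, while '<' and '>' are still intact
    text = PUNCT_RE.sub(" ", text)       # remaining ASCII punctuation becomes a space
    text = DIGIT_WORD_RE.sub("", text)   # tokens that contain a digit
    return re.sub(r"\s+", " ", text).strip()  # collapse runs of whitespace


print(clean_text("Read <b>THIS</b> at https://example.com/a1 [VIDEO] in 2017!"))
```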


In [31]:
df["text"] = df["text"].apply(wordopt)


In [32]:
X = df['text']
y = df['class']


In [33]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
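
The split above is unseeded, so the exact train and test rows change on every run. A minimal sketch of a reproducible, class-balanced split follows, assuming a toy label list; the seed value 42 and the `toy_*` names are illustrative only.

```python
from collections import Counter

from sklearn.model_selection import train_test_split

# Toy data: 60 items of class 0 and 40 of class 1 (hypothetical).
toy_texts = [f"doc {i}" for i in range(100)]
toy_labels = [0] * 60 + [1] * 40

# random_state makes the split repeatable; stratify keeps the
# 60/40 class ratio in both the training and the test portion.
Xt_train, Xt_test, yt_train, yt_test = train_test_split(
    toy_texts, toy_labels, test_size=0.2, random_state=42, stratify=toy_labels
)

print(Counter(yt_train), Counter(yt_test))
```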


In [34]:
from sklearn.feature_extraction.text import TfidfVectorizer

vectorizer = TfidfVectorizer()
Xv_train = vectorizer.fit_transform(X_train)
Xv_test = vectorizer.transform(X_test)
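
To make the vectorization step less opaque, here is a small pure-Python sketch of the weighting that `TfidfVectorizer` applies with its default settings (smoothed IDF followed by L2 normalization). The function name `tfidf` is hypothetical, and tokenization is simplified to whitespace splitting, which differs from scikit-learn's default token pattern.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return (vocabulary, rows) with one L2-normalized TF-IDF row per document."""
    tokenised = [doc.split() for doc in docs]
    vocab = sorted({tok for toks in tokenised for tok in toks})
    n_docs = len(docs)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(tok for toks in tokenised for tok in set(toks))

    # Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1.
    idf = {tok: math.log((1 + n_docs) / (1 + doc_freq[tok])) + 1 for tok in vocab}

    rows = []
    for toks in tokenised:
        counts = Counter(toks)
        raw = [counts[tok] * idf[tok] for tok in vocab]
        norm = math.sqrt(sum(v * v for v in raw)) or 1.0  # guard against empty docs
        rows.append([v / norm for v in raw])
    return vocab, rows


vocab, rows = tfidf(["trump said", "trump tweeted"])
print(vocab)
print(rows)
```

Terms that occur in every document ("trump" here) receive a lower weight than terms unique to one document, which is the property the classifiers rely on.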


In [35]:
from sklearn.linear_model import LogisticRegression

lrm = LogisticRegression()
lrm.fit(Xv_train, y_train)
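
The vectorizer and the classifier can also be bundled so that raw text goes in and a prediction comes out. Below is a minimal sketch using scikit-learn's `Pipeline` on a toy corpus; the corpus, the labels, and the names `pipe` and `toy_*` are assumptions for illustration, and the notebook itself keeps the two objects separate.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Tiny hypothetical corpus: 1 = real, 0 = fake.
toy_texts = [
    "officials said the bill passed on tuesday",
    "you will not believe this shocking secret",
    "the senate voted on the budget on friday",
    "shocking video exposes the secret plot",
]
toy_labels = [1, 0, 1, 0]

# fit() runs fit_transform on the vectorizer, then fits the classifier;
# predict() applies transform, then the classifier's predict.
pipe = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LogisticRegression()),
])
pipe.fit(toy_texts, toy_labels)

preds = pipe.predict(["officials said the vote is on tuesday"])
print(preds)
```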


In [36]:
y_preds_lrm = lrm.predict(Xv_test)


In [37]:
from sklearn.metrics import accuracy_score
print("Accuracy Score for Logistic Regression Model: ",
      accuracy_score(y_test, y_preds_lrm))


Accuracy Score for Logistic Regression Model:  0.9849598930481284


In [38]:
from sklearn.metrics import classification_report
print("Classification Report for Logistic Regression Model: ")
print(classification_report(y_test, y_preds_lrm))


Classification Report for Logistic Regression Model: 
              precision    recall  f1-score   support

           0       0.99      0.98      0.99      4696
           1       0.98      0.99      0.98      4280

    accuracy                           0.98      8976
   macro avg       0.98      0.99      0.98      8976
weighted avg       0.98      0.98      0.98      8976
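
The columns in the report above follow from four counts: true positives, false positives, false negatives, and true negatives. As a rough illustration, the sketch below recomputes them by hand for one class; `binary_report` is a hypothetical helper, not a scikit-learn function.

```python
def binary_report(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    precision = tp / (tp + fp) if tp + fp else 0.0  # of predicted positives, how many were right
    recall = tp / (tp + fn) if tp + fn else 0.0     # of actual positives, how many were found
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)           # harmonic mean of the two
    accuracy = (tp + tn) / len(pairs)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


report = binary_report([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
print(report)
```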



In [39]:
from sklearn.tree import DecisionTreeClassifier

dtcm = DecisionTreeClassifier()
dtcm.fit(Xv_train, y_train)


In [40]:
y_preds_dtcm = dtcm.predict(Xv_test)


In [41]:
print("Accuracy Score for Decision Tree Classification Model: ",
      accuracy_score(y_test, y_preds_dtcm))


Accuracy Score for Decision Tree Classification Model:  0.9957664884135472


In [42]:
print("Classification Report for Decision Tree Classifier: ")
print(classification_report(y_test, y_preds_dtcm))


Classification Report for Decision Tree Classifier: 
              precision    recall  f1-score   support

           0       1.00      1.00      1.00      4696
           1       1.00      1.00      1.00      4280

    accuracy                           1.00      8976
   macro avg       1.00      1.00      1.00      8976
weighted avg       1.00      1.00      1.00      8976



In [43]:
from sklearn.ensemble import GradientBoostingClassifier

gbcm = GradientBoostingClassifier(random_state=42)
gbcm.fit(Xv_train, y_train)


In [44]:
y_preds_gbcm = gbcm.predict(Xv_test)


In [45]:
print("Accuracy Score for Gradient Boosting Classification Model: ",
      accuracy_score(y_test, y_preds_gbcm))


Accuracy Score for Gradient Boosting Classification Model:  0.9943181818181818


In [46]:
print("Classification Report for Gradient Boosting Classifier: ")
print(classification_report(y_test, y_preds_gbcm))


Classification Report for Gradient Boosting Classifier: 
              precision    recall  f1-score   support

           0       1.00      0.99      0.99      4696
           1       0.99      1.00      0.99      4280

    accuracy                           0.99      8976
   macro avg       0.99      0.99      0.99      8976
weighted avg       0.99      0.99      0.99      8976



In [47]:
from sklearn.ensemble import RandomForestClassifier

rfcm = RandomForestClassifier()
rfcm.fit(Xv_train, y_train)


In [48]:
y_preds_rfcm = rfcm.predict(Xv_test)


In [49]:
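# The score recorded below was computed from y_preds_gbcm, so it repeats the
# Gradient Boosting value; rerun this cell for the Random Forest accuracy.
print("Accuracy Score for Random Forest Classification Model: ",
      accuracy_score(y_test, y_preds_rfcm))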


Accuracy Score for Random Forest Classification Model:  0.9943181818181818


In [50]:
print("Classification Report for Random Forest Classifier: ")
print(classification_report(y_test, y_preds_rfcm))


Classification Report for Random Forest Classifier: 
              precision    recall  f1-score   support

           0       0.99      0.99      0.99      4696
           1       0.99      0.99      0.99      4280

    accuracy                           0.99      8976
   macro avg       0.99      0.99      0.99      8976
weighted avg       0.99      0.99      0.99      8976



In [51]:
def output_label(n):
    """Map a predicted class (0 or 1) to a readable label."""
    if n == 0:
        return "Fake News"
    elif n == 1:
        return "Not A Fake News"


def manual_testing(news):
    """Clean one article, vectorize it, and print each model's prediction."""
    testing_news = {"text": [news]}
    new_def_test = pd.DataFrame(testing_news)
    new_def_test["text"] = new_def_test["text"].apply(wordopt)
    new_x_test = new_def_test["text"]
    new_xv_test = vectorizer.transform(new_x_test)
    pred_lrm = lrm.predict(new_xv_test)
    pred_dtcm = dtcm.predict(new_xv_test)
    pred_gbcm = gbcm.predict(new_xv_test)
    pred_rfcm = rfcm.predict(new_xv_test)

    print("\n\nLogistic Regression Prediction: {} \nDecision Tree Classifier Prediction: {} \nGradient Boosting Classifier Prediction: {} \nRandom Forest Classifier Prediction: {}".format(output_label(pred_lrm[0]), output_label(pred_dtcm[0]), output_label(pred_gbcm[0]), output_label(pred_rfcm[0])))
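
`manual_testing` prints four separate predictions and leaves the final call to the reader. One possible way to reduce them to a single verdict is a majority vote, sketched below; `majority_label` is a hypothetical helper, and returning `None` on a tie is an assumption of this sketch.

```python
from collections import Counter


def majority_label(predictions):
    """Return the most common prediction, or None when the top two counts tie."""
    ranked = Counter(predictions).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # no clear majority
    return ranked[0][0]


print(majority_label([0, 0, 1, 0]))
print(majority_label([0, 1, 0, 1]))
```

With four models a 2-2 split is possible, which is why the tie case is handled explicitly.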


In [55]:
news = """A week after Supreme Court-appointed panel of experts examining India’s regulatory mechanism in an investigation
linked to the Hindenburg allegations has given a clean chit to the Adani Group, the short-seller Hindenburg has announced 
that it will now expose Supreme Court who has given clean chit to Adani.Speaking to The Fauxy, Hindenburg said that 
Hindenburg is the biggest authority and SC of India has done a contempt by not aligning itself with Hindenburg report. 
“We will soon release a report on Supreme Court of India” said the Hindenburg Research CEO Nathan Anderson. 
Hindenburg is currently finding SC shares to take short position and make profits when the share prices drop after its 
report. Reportedly, Hindenburg is likely to raise questions on Supreme Court’s collegium system."""
manual_testing(news)




Logistic Regression Prediction: Fake News 
Decision Tree Classifier Prediction: Fake News 
Gradient Boosting Classifier Prediction: Fake News 
Random Forest Classifier Prediction: Fake News


In [57]:
news2 = """BRUSSELS (Reuters) - NATO allies on Tuesday welcomed President Donald Trump s decision to commit more forces 
to Afghanistan, as part of a new U.S. strategy he said would require more troops and funding from America s partners. 
Having run for the White House last year on a pledge to withdraw swiftly from Afghanistan, Trump reversed course on 
Monday and promised a stepped-up military campaign against  Taliban insurgents, saying:  Our troops will fight to win.  
U.S. officials said he had signed off on plans to send about 4,000 more U.S. troops to add to the roughly 8,400 now 
in Afghanistan. But his speech did not define benchmarks for successfully ending the war that began with the U.S.-led 
invasion of Afghanistan in 2001, and which he acknowledged had required an   extraordinary sacrifice of blood and treasure.  
We will ask our NATO allies and global partners to support our new strategy, with additional troops and funding increases 
in line with our own. We are confident they will,  Trump said. That comment signaled he would further increase pressure 
on U.S. partners who have already been jolted by his repeated demands to step up their contributions to NATO and his 
description of the alliance as  obsolete  - even though, since taking office, he has said this is no longer the case. 
NATO Secretary General Jens Stoltenberg said in a statement:  NATO remains fully committed to Afghanistan and I am 
looking forward to discussing the way ahead with (Defense) Secretary (James) Mattis and our Allies and international 
partners. NATO has 12,000 troops in Afghanistan, and 15 countries have pledged more, Stoltenberg said. Britain, a 
leading NATO member, called the U.S. commitment  very welcome ."""
manual_testing(news2)




Logistic Regression Prediction: Not A Fake News 
Decision Tree Classifier Prediction: Not A Fake News 
Gradient Boosting Classifier Prediction: Not A Fake News 
Random Forest Classifier Prediction: Not A Fake News
