# Fake News Detector

[![Twitter Follow](https://img.shields.io/twitter/follow/dialhaseeb?style=social)](https://www.twitter.com/dialhaseeb)

![Logo](https://github.com/zenyc/zenyc/blob/master/logo-small.png)

## 🕯 About
**fake-news-detector** is a *machine learning model* that predicts whether a given news article is fake. It uses deep learning: two stacked LSTM layers on top of pretrained GloVe word embeddings.


## Before we begin, let's configure a few things so that the notebook runs both on your local machine and on *Google's Colaboratory*

1- If you are running locally, run the following cell:

In [1]:
proj_dir = "proj-dir/"  # placeholder path; keep the trailing slash, later cells use proj_dir + "True.csv"

2- If you are running on *Colab*, 
- Make sure you have uploaded all the project files to your *Google Drive*. Then, mount your drive by running the following cell:

In [6]:
from google.colab import drive
drive.mount("/content/drive")

- Then write out the path to the project files relative to your drive's root directory after `/content/drive/My Drive/` in the following cell:

In [7]:
proj_dir = "/content/drive/My Drive/Projects/fake-news-detector/" + "proj-dir/"  # keep the trailing slash
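
The two setups above can also be folded into a single cell. The following is a minimal sketch, not part of the original notebook: the helper name `resolve_proj_dir` is hypothetical, both default paths are the placeholders used above, and the Colab check only tests whether the `google.colab` package can be found.

```python
import importlib.util
import os


def resolve_proj_dir(local_dir="proj-dir",
                     colab_dir="/content/drive/My Drive/Projects/fake-news-detector/proj-dir"):
    """Return the project directory for this environment, ending in a path separator."""
    # Hypothetical helper: treat the session as Colab if google.colab is importable.
    try:
        on_colab = importlib.util.find_spec("google.colab") is not None
    except (ImportError, ValueError):
        # find_spec raises if the parent package `google` is missing or broken.
        on_colab = False
    base = colab_dir if on_colab else local_dir
    # Joining with "" appends a separator only when one is missing,
    # so later cells can write proj_dir + "True.csv" safely.
    return os.path.join(base, "")


proj_dir = resolve_proj_dir()
print(proj_dir)
```

On Colab the drive still has to be mounted with `drive.mount("/content/drive")` before any file under `proj_dir` is read.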

## Next up, let's import everything we need. Run the following:

In [1]:
import pandas as pd
import numpy as np
from tensorflow.keras import layers
from tensorflow.keras import Input
from tensorflow.keras.models import Model
from tensorflow.keras.preprocessing import sequence
from tensorflow.keras.preprocessing.text import Tokenizer
import matplotlib.pyplot as plt
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.models import load_model

## Let's import the *True.csv* and *Fake.csv* files

In [11]:
true = pd.read_csv(proj_dir+"True.csv")

In [12]:
true = true.drop(["subject","date"], axis=1)

In [13]:
true["label"] = 1

In [14]:
true

| | title | text | label |
|---|---|---|---|
| 0 | As U.S. budget fight looms, Republicans flip t... | WASHINGTON (Reuters) - The head of a conservat... | 1 |
| 1 | U.S. military to accept transgender recruits o... | WASHINGTON (Reuters) - Transgender people will... | 1 |
| 2 | Senior U.S. Republican senator: 'Let Mr. Muell... | WASHINGTON (Reuters) - The special counsel inv... | 1 |
| 3 | FBI Russia probe helped by Australian diplomat... | WASHINGTON (Reuters) - Trump campaign adviser ... | 1 |
| 4 | Trump wants Postal Service to charge 'much mor... | SEATTLE/WASHINGTON (Reuters) - President Donal... | 1 |
| ... | ... | ... | ... |
| 21412 | 'Fully committed' NATO backs new U.S. approach... | BRUSSELS (Reuters) - NATO allies on Tuesday we... | 1 |
| 21413 | LexisNexis withdrew two products from Chinese ... | LONDON (Reuters) - LexisNexis, a provider of l... | 1 |
| 21414 | Minsk cultural hub becomes haven from authorities | MINSK (Reuters) - In the shadow of disused Sov... | 1 |
| 21415 | Vatican upbeat on possibility of Pope Francis ... | MOSCOW (Reuters) - Vatican Secretary of State ... | 1 |


In [15]:
fake = pd.read_csv(proj_dir+"Fake.csv")

In [16]:
fake = fake.drop(["subject","date"], axis=1)

In [17]:
fake["label"] = 0

In [18]:
fake

| | title | text | label |
|---|---|---|---|
| 0 | Donald Trump Sends Out Embarrassing New Year’... | Donald Trump just couldn t wish all Americans ... | 0 |
| 1 | Drunk Bragging Trump Staffer Started Russian ... | House Intelligence Committee Chairman Devin Nu... | 0 |
| 2 | Sheriff David Clarke Becomes An Internet Joke... | On Friday, it was revealed that former Milwauk... | 0 |
| 3 | Trump Is So Obsessed He Even Has Obama’s Name... | On Christmas day, Donald Trump announced that ... | 0 |
| 4 | Pope Francis Just Called Out Donald Trump Dur... | Pope Francis used his annual Christmas Day mes... | 0 |
| ... | ... | ... | ... |
| 23476 | McPain: John McCain Furious That Iran Treated ... | 21st Century Wire says As 21WIRE reported earl... | 0 |
| 23477 | JUSTICE? Yahoo Settles E-mail Privacy Class-ac... | 21st Century Wire says It s a familiar theme. ... | 0 |
| 23478 | Sunnistan: US and Allied ‘Safe Zone’ Plan to T... | Patrick Henningsen 21st Century WireRemember ... | 0 |
| 23479 | How to Blow $700 Million: Al Jazeera America F... | 21st Century Wire says Al Jazeera America will... | 0 |


## Concatenating both DataFrames and shuffling:

In [19]:
news = pd.concat([fake, true])

In [20]:
news = news.sample(frac=1)

In [13]:
news

| | title | text | label |
|---|---|---|---|
| 8708 | Supreme Court nominee out in cold as election ... | WASHINGTON (Reuters) - Merrick Garland hit an... | 1 |
| 22670 | Plastic Persona: Behind the Scenes of the Ted ... | 21st Century Wire says Most people accept that... | 0 |
| 16734 | Kenya to charge opposition leader's sister wit... | NAIROBI (Reuters) - Kenyan authorities will ar... | 1 |
| 10243 | KID ROCK BAND MEMBER PREDICTS: ‘That F*cker Co... | Kid Rock Keyboardist Jimmie Bones is shocked b... | 0 |
| 23457 | Activist: ‘This is where you can make the most... | 21st Century Wire says If you ve been followin... | 0 |
| ... | ... | ... | ... |
| 10253 | SARAH HUCKABEE SANDERS MOCKS WH PRESS: Lists A... | Sarah Huckabee Sanders let the White House pre... | 0 |
| 14398 | DID BEYONCE AND JAY Z’s “Vacation” To Communis... | Notorious radical Black Panther and NJ cop kil... | 0 |
| 7007 | Republicans Alarmed As Wisconsin’s Racist Vot... | Ever since the 2008 election, Republicans have... | 0 |
| 6046 | Trump Responds To Devastating Clinton Attack ... | After former Secretary of State Hillary Clinto... | 0 |


## Resetting the index gives the DataFrame a new sequential index:

In [21]:
news = news.reset_index(drop=True)

In [22]:
news

| | title | text | label |
|---|---|---|---|
| 0 | Iran warns U.S. against imposing further sanct... | BEIRUT/DUBAI (Reuters) - Iran warned the Unite... | 1 |
| 1 | U.S. Senate confirms Haley as Trump's U.N. amb... | WASHINGTON (Reuters) - The U.S. Senate voted a... | 1 |
| 2 | JUDGE JEANINE PIRRO’S TRUTH BOMB On Fired US A... | https://www.youtube.com/watch?v=yRXmFmgoPTk | 0 |
| 3 | Finally Asked About Bribery Scandal, Trump Im... | For a long time now, the mainstream media has ... | 0 |
| 4 | Poland's refusal to accept Muslim migrants may... | WARSAW (Reuters) - The European Commission s d... | 1 |
| ... | ... | ... | ... |
| 44893 | WATCH: Trump Minion Makes HUGE Slip About Who... | The American people are NOT going to like this... | 0 |
| 44894 | Syria's militant ex-Qaeda group denies leader ... | AMMAN (Reuters) - Syria s Tahrir al-Sham milit... | 1 |
| 44895 | Japan defense minister backs all U.S. options ... | SINGAPORE (Reuters) - Japan’s defense minister... | 1 |
| 44896 | Police union: Open carry of guns should be sus... | CLEVELAND (Reuters) - The head of the Clevelan... | 1 |
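
The concatenate, shuffle, and reset-index steps are easier to follow on a tiny example. This is a sketch with made-up rows: the names `fake_demo`, `true_demo`, and `news_demo` are illustrative, and `random_state=0` is an assumption added here only so the shuffle is repeatable.

```python
import pandas as pd

# Toy stand-ins for the Fake.csv and True.csv frames (made-up rows).
fake_demo = pd.DataFrame({"text": ["f0", "f1", "f2"], "label": 0})
true_demo = pd.DataFrame({"text": ["t0", "t1"], "label": 1})

# Concatenating keeps each frame's own index, so 0 and 1 appear twice.
news_demo = pd.concat([fake_demo, true_demo])
print(news_demo.index.tolist())   # [0, 1, 2, 0, 1]

# sample(frac=1) returns every row exactly once, in random order.
news_demo = news_demo.sample(frac=1, random_state=0)

# drop=True discards the old shuffled index instead of keeping it as a column.
news_demo = news_demo.reset_index(drop=True)
print(news_demo.index.tolist())   # [0, 1, 2, 3, 4]
```

Shuffling before training matters here because `validation_split` in Keras takes the validation samples from the end of the arrays it is given, without shuffling them first.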


## Loading *GloVe Word Embeddings*:

In [16]:
embeddings_index = {}
# Each line of the GloVe file holds a word followed by its 100 vector components.
with open(proj_dir+'glove.6B.100d.txt', encoding="utf-8") as f:
    for line in f:
        values = line.split()
        word = values[0]
        coefs = np.asarray(values[1:], dtype='float32')
        embeddings_index[word] = coefs
print('Found %s word vectors.' % len(embeddings_index))

Found 400000 word vectors.
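
To see what the loading loop does without downloading the GloVe file, here is a small sketch that parses two made-up lines in the same layout (a token followed by its vector components). The values and the name `embeddings_demo` are illustrative only.

```python
import io

import numpy as np

# Two made-up lines in the GloVe text layout: token, then vector components.
sample = io.StringIO("the 0.1 0.2 0.3\ncat -0.4 0.5 0.6\n")

embeddings_demo = {}
for line in sample:
    values = line.split()
    # The first field is the word; the remaining fields are parsed as floats.
    embeddings_demo[values[0]] = np.asarray(values[1:], dtype="float32")

print(embeddings_demo["cat"].shape)   # (3,)
print(embeddings_demo.get("dog"))     # None: words outside the file have no vector
```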


In [23]:
texts = news["text"]

In [24]:
labels = news.label

In [None]:
texts.shape

## Tokenizing the Text Data:

In [26]:
text_tokenizer = Tokenizer()
text_tokenizer.fit_on_texts(texts.values)
text_sequences = text_tokenizer.texts_to_sequences(texts.values)
text_sequences = sequence.pad_sequences(text_sequences, maxlen=500)

In [35]:
text_sequences.shape

(44898, 500)

In [None]:
len(text_tokenizer.word_index)
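
The following pure-Python sketch imitates, in simplified form, what the tokenizing cell does: words are ranked by frequency starting at 1, and each sequence is padded or truncated at the front, which is the default behaviour of `pad_sequences`. It is not the Keras implementation: the helper names are hypothetical, and unlike `Tokenizer` it only lowercases and splits on whitespace, without stripping punctuation.

```python
from collections import Counter


def build_word_index(texts):
    """Map each word to an integer rank, 1 = most frequent (0 is kept for padding)."""
    counts = Counter(word for text in texts for word in text.lower().split())
    # most_common() sorts by descending count; ties keep first-seen order.
    return {word: rank for rank, (word, _) in enumerate(counts.most_common(), start=1)}


def encode_and_pad(texts, word_index, maxlen):
    """Replace words by their index, skip unknown words, then pad or truncate at the front."""
    padded = []
    for text in texts:
        seq = [word_index[w] for w in text.lower().split() if w in word_index]
        seq = seq[-maxlen:]                              # keep the last maxlen tokens
        padded.append([0] * (maxlen - len(seq)) + seq)   # zeros go in front
    return padded


demo_texts = ["the cat sat", "the dog sat on the mat"]
demo_index = build_word_index(demo_texts)
print(encode_and_pad(demo_texts, demo_index, maxlen=4))
# [[0, 1, 3, 2], [2, 5, 1, 6]]
```

With `maxlen=500`, any article longer than 500 tokens therefore keeps only its final 500 tokens, and shorter ones are filled with leading zeros.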

## Creating Embedding Matrix:

In [8]:
max_words = 138022
embedding_dim = 100
texts_embedding_matrix = np.zeros((max_words, embedding_dim))

for word, i in text_tokenizer.word_index.items():
    if i < max_words:
        embedding_vector = embeddings_index.get(word)
        if embedding_vector is not None:
            texts_embedding_matrix[i] = embedding_vector


In [None]:
text_sequences.shape
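
The hard-coded `max_words = 138022` equals the vocabulary size plus one, which matches the 13,802,200 embedding parameters (138022 × 100) reported in the model summary. A sketch of the same embedding-matrix construction on toy data, deriving the row count from the word index instead of hard-coding it (all names and vectors here are made up for illustration):

```python
import numpy as np

# Made-up 2-dimensional vectors and a tiny word index, for illustration only.
embeddings_demo = {"the": np.array([0.1, 0.2], dtype="float32"),
                   "cat": np.array([0.3, 0.4], dtype="float32")}
word_index_demo = {"the": 1, "cat": 2, "zyx": 3}   # "zyx" has no pretrained vector

embedding_dim = 2
# One row per word index, plus row 0, which is used by padding.
num_rows = len(word_index_demo) + 1
matrix_demo = np.zeros((num_rows, embedding_dim))

for word, i in word_index_demo.items():
    vector = embeddings_demo.get(word)
    if vector is not None:
        matrix_demo[i] = vector   # words without a vector keep an all-zero row

print(matrix_demo.shape)   # (4, 2)
```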

## Defining the Model Using Keras' `Functional API`:

In [None]:
texts_input = Input(shape=(500,), dtype='int32', name='texts_input')
y = layers.Embedding(len(text_tokenizer.word_index)+1, 100, input_length=500)(texts_input)
y = layers.LSTM(32,
                dropout=0.1,
                recurrent_dropout=0.5,
                return_sequences=True)(y)
y = layers.LSTM(32,
                dropout=0.1,
                recurrent_dropout=0.5,
                return_sequences=False)(y)


In [None]:
y = layers.Dense(100, activation='relu')(y)
output_combined = layers.Dense(1, activation='sigmoid')(y)

In [123]:
model_combined = Model(texts_input, output_combined)

## Summarizing:

In [124]:
model_combined.summary()

Model: "model_4"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
texts_input (InputLayer)     [(None, 500)]             0         
_________________________________________________________________
embedding_17 (Embedding)     (None, 500, 100)          13802200  
_________________________________________________________________
lstm_34 (LSTM)               (None, 500, 32)           17024     
_________________________________________________________________
lstm_35 (LSTM)               (None, 32)                8320      
_________________________________________________________________
dense_15 (Dense)             (None, 100)               3300      
_________________________________________________________________
dense_16 (Dense)             (None, 1)                 101       
Total params: 13,830,945
Trainable params: 13,830,945
Non-trainable params: 0
_______________________________________________
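
The parameter counts in the summary can be checked by hand with the standard formulas for embedding, LSTM, and dense layers. The helper functions below are a sketch written for this check and are not Keras APIs.

```python
def embedding_params(vocab_rows, dim):
    # One vector of length `dim` per row of the embedding table.
    return vocab_rows * dim


def lstm_params(input_dim, units):
    # Four gates, each with input weights, recurrent weights, and a bias.
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, units):
    # Weight matrix plus one bias per unit.
    return input_dim * units + units


counts = [
    embedding_params(138022, 100),  # 13,802,200
    lstm_params(100, 32),           # 17,024
    lstm_params(32, 32),            # 8,320
    dense_params(32, 100),          # 3,300
    dense_params(100, 1),           # 101
]
print(sum(counts))  # 13830945
```

Almost all of the parameters sit in the embedding table, which is why freezing that layer leaves only a small number of weights to train.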

## Setting the Weights of the Embedding Layer:

In [125]:
model_combined.layers[1].set_weights([texts_embedding_matrix])
model_combined.layers[1].trainable = False

In [126]:
model_combined.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['acc'])
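
For reference, the loss and metric named in `compile` can be written out by hand. This is a plain-Python sketch of binary cross-entropy and thresholded accuracy on made-up numbers. The clipping constant `eps` and the 0.5 threshold are assumptions chosen for this sketch, not values taken from the notebook.

```python
import math


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)], with p clipped away from 0 and 1."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded probability equals the label."""
    hits = sum(int(p > threshold) == y for y, p in zip(y_true, y_prob))
    return hits / len(y_true)


labels_demo = [1, 0, 1, 0]
probs_demo = [0.9, 0.2, 0.4, 0.1]
print(binary_crossentropy(labels_demo, probs_demo))
print(accuracy(labels_demo, probs_demo))  # 0.75
```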

## Training for up to 20 Epochs with Early Stopping

In [128]:
es = EarlyStopping(monitor='val_loss', mode='min')

In [130]:
history = model_combined.fit(text_sequences, labels.values,
                             epochs=20, validation_split=0.2,
                             callbacks=[es])

Train on 35918 samples, validate on 8980 samples
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
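
Training ended after epoch 8 rather than 20 because of the early-stopping callback. Since `es` is created without a `patience` argument, it uses the default of 0, so training halts at the first epoch whose validation loss fails to improve on the best value so far. The function below is a simplified sketch of that rule, not the Keras callback itself, and the loss values are made up.

```python
def epochs_run(val_losses, patience=0):
    """Return how many epochs run before a simple early-stopping rule halts training."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, wait = loss, 0      # improvement: remember it and reset the counter
        else:
            wait += 1                 # no improvement this epoch
            if wait >= patience:
                return epoch          # training stops after this epoch
    return len(val_losses)            # the rule never fired: every epoch ran


# Made-up validation losses: epoch 4 is the first that fails to improve.
print(epochs_run([0.30, 0.12, 0.05, 0.06, 0.04]))              # 4
print(epochs_run([0.30, 0.12, 0.05, 0.06, 0.04], patience=2))  # 5
```

A larger `patience` tolerates a temporary bump in validation loss, at the cost of a few extra epochs.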


## We got a 99.72% Validation Accuracy! Let's save the model:

In [131]:
model_combined.save(proj_dir + "trained.h5")

## Finally, let's evaluate the model:

In [None]:
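# x_test and y_test are a held-out set that is not created in the cells above.
# Encode it the same way as the training data: same tokenizer, same maxlen as the model input.
sequences = text_tokenizer.texts_to_sequences(x_test.values)
sequences = sequence.pad_sequences(sequences, maxlen=500)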

In [None]:
x_test = sequences

In [73]:
score = model.evaluate(x_test, y_test.values, batch_size=200, verbose=2)

3000/1 - 5s - loss: 0.2057 - acc: 0.9233


In [74]:
score

[0.1826317250728607, 0.92333335]

## We have a 92.3% testing accuracy. Amazing! Now let's see the model in action:

In [36]:
def encoder(text):
    text = text_tokenizer.texts_to_sequences([text])
    text = sequence.pad_sequences(text, maxlen=500)
    return text

In [28]:
model = load_model(proj_dir+"trained.h5")

In [40]:
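def predict(text):
    encoded_text = encoder(text)
    # The model returns an array of shape (1, 1): the probability of label 1 (real news).
    prediction = model.predict(encoded_text)
    print(prediction)
    # Rounding the sigmoid output gives the class: 1 = "Not Fake", 0 = "Fake".
    if np.round(prediction[0][0]) == 1:
        return "Not Fake"
    return "Fake"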

In [59]:
predict("Donald Trump becomes Muslim")

[[3.3198838e-05]]


'Fake'

In [102]:
predict("This will make your life easier")

[[0.98155046]]


'Not Fake'
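
The last step of `predict` turns a probability into a label by rounding. Because `np.round` rounds 0.5 down to 0, this is equivalent to checking whether the probability is strictly greater than 0.5. A minimal sketch with an explicit threshold, where the function name `label_from_probability` is hypothetical:

```python
def label_from_probability(p, threshold=0.5):
    """Map the sigmoid output to the two class names used by predict()."""
    # Label 1 was assigned to rows from True.csv and 0 to rows from Fake.csv.
    return "Not Fake" if p > threshold else "Fake"


print(label_from_probability(3.3198838e-05))  # Fake
print(label_from_probability(0.98155046))     # Not Fake
```

Making the threshold a parameter allows a stricter cut-off, for example 0.9, when wrongly labelling a fake article as real is the more costly mistake.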

# The End?

## 👀 Contact

If you want to contact me you can reach me at <zenyc@live.com>.