### Importing the Necessary Libraries

In [1]:
import numpy as np
import pandas as pd
import keras
import tensorflow as tf
from keras.layers import Dense
from keras.models import Sequential
import matplotlib.pyplot as plt
%matplotlib inline

Using TensorFlow backend.


### Reading the Dataset

In [2]:
df_true = pd.read_csv("True.csv")
df_fake = pd.read_csv("Fake.csv")
df_true.head()

| | title | text | subject | date |
|---|---|---|---|---|
| 0 | As U.S. budget fight looms, Republicans flip t... | WASHINGTON (Reuters) - The head of a conservat... | politicsNews | December 31, 2017 |
| 1 | U.S. military to accept transgender recruits o... | WASHINGTON (Reuters) - Transgender people will... | politicsNews | December 29, 2017 |
| 2 | Senior U.S. Republican senator: 'Let Mr. Muell... | WASHINGTON (Reuters) - The special counsel inv... | politicsNews | December 31, 2017 |
| 3 | FBI Russia probe helped by Australian diplomat... | WASHINGTON (Reuters) - Trump campaign adviser ... | politicsNews | December 30, 2017 |
| 4 | Trump wants Postal Service to charge 'much mor... | SEATTLE/WASHINGTON (Reuters) - President Donal... | politicsNews | December 29, 2017 |


### Concatenating the True and Fake Datasets

In [3]:
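# Label the two sources: 1 = true news, 0 = fake news
df_true['category'] = 1
df_fake['category'] = 0
# ignore_index=True gives the combined frame a fresh, unique row index
df = pd.concat([df_true, df_fake], ignore_index=True)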

### Importing Libraries for Deep Learning

In [4]:
# Import from the tf.keras namespace throughout; mixing it with standalone
# keras (e.g. keras.layers.embeddings) breaks on recent releases
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, Input, Dropout, LSTM, Activation, Embedding
from tensorflow.keras.callbacks import Callback
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from sklearn.model_selection import train_test_split

In [15]:
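# Initializing the hyperparameters
vocab_size = 100000        # upper bound on the word indices kept by the tokenizer
embedding_dim_title = 128
max_length_title = 40      # sequences are truncated or padded to this many tokens
embedding_dim_text = 500
max_length_text = 500
trunc_type = 'post'        # truncate at the end of a sequence
padding_type = 'post'      # pad with zeros at the end of a sequence
test_ratio = .2            # fraction of the data held out for testing
embedding_dim = 500        # embedding size, also used as the LSTM and Dense width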

### Tokenizing the Words (Mapping Words to Integer Indices)

In [28]:
# Join title, body, and subject with spaces so adjacent words do not run together
df['text'] = df['title'] + ' ' + df['text'] + ' ' + df['subject']
X_train, X_test, y_train, y_test = train_test_split(df.text, df.category, test_size=test_ratio)

# Fit the tokenizer on the training texts only. Refitting it on the test set
# would change the word indices and leak test vocabulary into preprocessing.
t = Tokenizer(num_words=vocab_size)
t.fit_on_texts(X_train)
train_sequences = t.texts_to_sequences(X_train)
train_padded = pad_sequences(train_sequences, maxlen=max_length_title,
                             padding=padding_type,
                             truncating=trunc_type)
test_sequences = t.texts_to_sequences(X_test)
test_padded = pad_sequences(test_sequences, maxlen=max_length_title,
                            padding=padding_type,
                            truncating=trunc_type)

# pad_sequences already returns NumPy arrays; only the labels need converting
y_train = np.array(y_train)
y_test = np.array(y_test)
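
To make the preprocessing step concrete, here is a minimal plain-Python sketch of what a frequency-ranked word index and `'post'` truncation and padding do. It is an illustration only, not the Keras implementation: the helper names `build_word_index` and `to_padded` and the toy sentences are hypothetical, and the real `Tokenizer` also strips punctuation, which this sketch skips.

```python
from collections import Counter

def build_word_index(texts, num_words=None):
    """Rank words by frequency; index 1 is the most frequent word, 0 is reserved for padding."""
    counts = Counter(word for text in texts for word in text.lower().split())
    ranked = [word for word, _ in counts.most_common()]
    if num_words is not None:
        # Keep only indices strictly below num_words
        ranked = ranked[:num_words - 1]
    return {word: i + 1 for i, word in enumerate(ranked)}

def to_padded(text, word_index, maxlen):
    """Map words to indices, skip unknown words, then truncate and zero-pad at the end."""
    seq = [word_index[w] for w in text.lower().split() if w in word_index]
    seq = seq[:maxlen]
    return seq + [0] * (maxlen - len(seq))

# Hypothetical toy corpus standing in for the training texts
toy_train = ["the bill passed the senate", "the bill passed", "the bill"]
toy_index = build_word_index(toy_train)
print(toy_index)  # {'the': 1, 'bill': 2, 'passed': 3, 'senate': 4}

# "house" and "today" were never seen during fitting, so they are dropped
print(to_padded("the house passed the bill today", toy_index, maxlen=6))  # [1, 3, 1, 2, 0, 0]
```

The index is built once from the training texts and then reused for every new text; words that were not seen during fitting simply disappear from the sequence.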

### Custom Callback Function for Early Stopping

In [26]:
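class AccuracyHistory(tf.keras.callbacks.Callback):
    """Record training accuracy per epoch and stop once it exceeds 95%."""

    def on_train_begin(self, logs=None):
        self.acc = []

    def on_epoch_end(self, epoch, logs=None):
        # 'acc' is the metric name passed to model.compile(metrics=['acc'])
        acc = (logs or {}).get('acc')
        if acc is None:
            return
        self.acc.append(acc)
        if acc > 0.95:
            print(f'Accuracy reached {acc*100:0.2f}%. Stopping the training')
            self.model.stop_training = True

history = AccuracyHistory()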


### Creating the Model

In [16]:
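model = tf.keras.Sequential()
# Map each word index to a trainable vector of size embedding_dim
model.add(tf.keras.layers.Embedding(vocab_size, embedding_dim))
# Read the sequence in both directions; the two LSTM outputs are concatenated
model.add(tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(embedding_dim)))
model.add(tf.keras.layers.Dense(embedding_dim, activation='relu'))
# Single sigmoid unit: estimated probability that the article is true (category 1)
model.add(tf.keras.layers.Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['acc'])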

### Training the Model

In [29]:
model.fit(train_padded, y_train, epochs=5, batch_size=32, callbacks=[history])

Train on 35918 samples
Epoch 1/5
Epoch 2/5


<tensorflow.python.keras.callbacks.History at 0x10cd8499ba8>
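
The notebook builds `test_padded` and `y_test` but never scores them. As a rough sketch of how held-out accuracy could be computed from sigmoid outputs, the hypothetical helper `binary_accuracy` below thresholds each probability at 0.5 and compares it with the label. The example values are made up; in the notebook they would presumably come from `model.predict(test_padded)` and `y_test`.

```python
def binary_accuracy(probabilities, labels, threshold=0.5):
    """Fraction of examples whose thresholded probability matches the 0/1 label."""
    if len(probabilities) != len(labels):
        raise ValueError("probabilities and labels must have the same length")
    correct = sum(int(p >= threshold) == int(y) for p, y in zip(probabilities, labels))
    return correct / len(labels)

# Hypothetical sigmoid outputs and labels (1 = true news, 0 = fake news)
example_probs = [0.91, 0.08, 0.65, 0.40]
example_labels = [1, 0, 0, 1]
print(binary_accuracy(example_probs, example_labels))  # 0.5
```

Keras offers the same measurement through `model.evaluate`, which reports the loss and the compiled metrics for a given input and label set.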

### Saving Model

In [20]:
model.save('modelLSTM09892.h5')
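
The saved `.h5` file contains the network, not the word-to-index mapping, so a model loaded later needs the same vocabulary to interpret new text. One possible way to keep the two together is to write the mapping to a JSON file next to the model. The sketch below uses only the standard library; the helper names, the file name, and the small `example_index` dictionary are assumptions for illustration (in the notebook the mapping would be `t.word_index`).

```python
import json
import os
import tempfile

def save_word_index(word_index, path):
    """Write a word -> integer index mapping to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(word_index, f)

def load_word_index(path):
    """Read a word -> integer index mapping back from a JSON file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)

# Hypothetical mapping standing in for the fitted tokenizer's word_index
example_index = {"the": 1, "bill": 2, "passed": 3}
example_path = os.path.join(tempfile.gettempdir(), "word_index_example.json")

save_word_index(example_index, example_path)
restored_index = load_word_index(example_path)
print(restored_index == example_index)  # True
```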

### Testing the Model 

In [33]:
model = tf.keras.models.load_model('modelLSTM09892.h5')

### Testing on a Known Fake News Article
Source: PolitiFact, https://www.politifact.com/factchecks/2020/apr/20/facebook-posts/news-photo-stay-home-protest-was-not-doctored/

In [34]:
fakeee = ["Conspiracies about mainstream news media are flourishing amid the government response to the COVID-19 pandemic. What a bunch of BS, screamed a Facebook post about a news photo from a Wisconsin rally against stay-at-home orders. Sharing two images from the demonstration, the post essentially claims the Milwaukee Journal Sentinel doctored a photo to put a Confederate Battle Flag in the hands of one protester. But the Milwaukee Journal Sentinel did not alter its photo.   The post was flagged as part of Facebook’s efforts to combat false news and misinformation on its News Feed. (Read more about our partnership with Facebook.) Here’s what happened. The Journal Sentinel, which publishes PolitiFact Wisconsin, posted a news story about the April 18 rally in Brookfield, a Milwaukee suburb. Nearly 1,000 people packed the sidewalk adjacent to a busy thoroughfare, most shoulder to shoulder, to protest Gov. Tony Evers’ decision to extend Wisconsin’s safer-at-home order until May 26. The Facebook post shows two photographs from the rally side by side — one from the Journal Sentinel and one said to be taken by the poster’s daughter.  The Journal Sentinel photo shows a man wearing a plaid shirt and jeans among a group of people and holding two flags — a Confederate flag and just above it, a yellow flag that is harder to make out.  The other photo with the post shows a man, also in a plaid shirt and jeans, who is not so close to other people. He is clearly holding only a yellow flag. The implication is that in its photo, the Journal Sentinel added the Confederate flag into the man’s hands."]
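# Build the vocabulary from the training texts: a tokenizer fitted on this single
# article would assign word indices unrelated to the ones the model was trained on
t = Tokenizer(num_words=vocab_size)
t.fit_on_texts(X_train)
fake_sequence = t.texts_to_sequences(fakeee)
fake_padded = pad_sequences(fake_sequence, maxlen=max_length_title,
                            padding=padding_type,
                            truncating=trunc_type)

# The sigmoid output is the estimated probability of the 'true' class (category 1)
prob = float(model.predict(fake_padded)[0][0])
if prob >= 0.5:
    print("True with a Confidence of: ", prob * 100, "%")
else:
    print("False with a Confidence of: ", (1 - prob) * 100, "%")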

False with a Confidence of:  99.77364437654614 %


### Testing on a Known True News Article
Source: PolitiFact, https://www.politifact.com/factchecks/2020/apr/20/andrew-cuomo/cuomo-accurately-says-other-countries-reopened-saw/

In [35]:
trueee = ["During a recent press briefing regarding Covid-19 in New York state, Gov. Andrew Cuomo said the economy must be reopened and that people need to get back to work. But, he said the rate of infection is currently being kept down because people are staying in their homes.  And if you start acting differently, you will see a corresponding increase in that rate of infection. And the worst scenario would be if we did all of this, we got that number down, everybody went to extraordinary means, and then we go to reopen and we reopen too fast or we reopen and there’s unanticipated consequences, and we see that number go up again, he said.  He warned that people who think he is being hyper-cautious should look at what is happening in other countries and their responses to the new coronavirus.  Go look at other countries that went through exactly this, started to reopen, and then they saw the infection rate go back up again, he said.   Since talk of reopening, and the possible dangers of doing so, dominates headlines across the United States and around the world, we wondered about whether other countries have already experienced a second surge in infections after relaxing lockdown orders for people, schools, businesses, and borders.  Experience elsewher In Asia, authorities have been dealing with Covid-19 for more months than the United States has, and they have more experience in responding to the virus.  In some cases, the number of infections in a city or country has increased after welcoming inbound travelers, or after a relaxing of social distancing measures.  In Hong Kong, an early response seemed to contain the virus, and then life resumed as people returned to work and restaurants.  With most everyone’s guard down, the predator lashed back last week. Cases of Covid-19 surged, STAT News reported on March 26. The government ordered people back home, and closed facilities that had been reopened. In Hong Kong, it soon became clear that while the majority were coming from overseas, quarantine measures in place were not sufficient, and local transmission had resumed, CNN reported on March 23. "]
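# Fit the vocabulary on the training texts so that the word indices match
# the ones the model saw during training
t = Tokenizer(num_words=vocab_size)
t.fit_on_texts(X_train)
true_sequence = t.texts_to_sequences(trueee)
true_padded = pad_sequences(true_sequence, maxlen=max_length_title,
                            padding=padding_type,
                            truncating=trunc_type)

# The sigmoid output is the estimated probability of the 'true' class (category 1)
prob = float(model.predict(true_padded)[0][0])
if prob >= 0.5:
    print("True with a Confidence of: ", prob * 100, "%")
else:
    print("False with a Confidence of: ", (1 - prob) * 100, "%")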

True with a Confidence of:  95.37554383277893 %
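
Both test cells repeat the same thresholding logic. A small helper could hold it in one place; the sketch below is one way to write it, assuming (as in the labelling step) that an output near 1 means true news. The name `describe_prediction` is hypothetical and the input value is made up.

```python
def describe_prediction(probability, threshold=0.5):
    """Return (label, confidence in percent) for a sigmoid output where 1 means true news."""
    if probability >= threshold:
        return "True", probability * 100
    # For the 'False' label, confidence is the probability of the other class
    return "False", (1 - probability) * 100

# Hypothetical model output for a single article
label, confidence = describe_prediction(0.25)
print(f"{label} with a confidence of {confidence:.2f}%")  # False with a confidence of 75.00%
```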
