# LSTM model for fake news detection

In this notebook, we implement an LSTM model for the fake news detection task and present its results.

In [1]:
# Import packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import re



## Dataset

We use the fake news dataset published by the Information Security and Object Technology (ISOT) lab. It consists predominantly of political news from the US and around the world, split into two CSV files: `Fake.csv` and `True.csv`.

In [None]:
# Loading dataset
df_true = pd.read_csv('data/True.csv')
df_fake = pd.read_csv('data/Fake.csv')

df_true.head()

|  | title | text | subject | date |
|---|---|---|---|---|
| 0 | As U.S. budget fight looms, Republicans flip t... | WASHINGTON (Reuters) - The head of a conservat... | politicsNews | December 31, 2017 |
| 1 | U.S. military to accept transgender recruits o... | WASHINGTON (Reuters) - Transgender people will... | politicsNews | December 29, 2017 |
| 2 | Senior U.S. Republican senator: 'Let Mr. Muell... | WASHINGTON (Reuters) - The special counsel inv... | politicsNews | December 31, 2017 |
| 3 | FBI Russia probe helped by Australian diplomat... | WASHINGTON (Reuters) - Trump campaign adviser ... | politicsNews | December 30, 2017 |
| 4 | Trump wants Postal Service to charge 'much mor... | SEATTLE/WASHINGTON (Reuters) - President Donal... | politicsNews | December 29, 2017 |


In [25]:
df_true.shape, df_fake.shape

((21417, 4), (23481, 4))
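
The shapes give the class balance directly. Worked out with the row counts copied from the output above, roughly 47.7% of the articles are true and 52.3% are fake, so the two classes are close to balanced:

```python
# Row counts copied from the df_true.shape / df_fake.shape output above.
n_true, n_fake = 21417, 23481
n_total = n_true + n_fake

share_true = n_true / n_total
share_fake = n_fake / n_total

print(f"total={n_total}, true={share_true:.3f}, fake={share_fake:.3f}")
# total=44898, true=0.477, fake=0.523
```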

In [14]:
df_fake.sample(5)

|  | title | text | subject | date |
|---|---|---|---|---|
| 2876 | Trump HUMILIATED By Presidential Historian’s ... | Trump s first few days in the White House have... | News | January 24, 2017 |
| 15344 | FLASHBACK: SOMALI CAB DRIVER Threatens To Kill... | Just so we have this straight a Somali cab dri... | politics | Aug 10, 2015 |
| 8452 | Bernie Sanders Just Got One Of The Most Impor... | Bernie Sanders just picked up one of the most ... | News | January 29, 2016 |
| 18867 | FLASHBACK VIDEO Shows Leftist Media Members Pr... | With Senate Republicans poised to use the nuc... | left-news | Apr 4, 2017 |
| 18114 | KATIE COURIC HITS NEW CAREER LOW: Asks Perfect... | Katie Couric s career has been pretty much in ... | left-news | Aug 18, 2017 |


In [20]:
df_true.sample(5)

|  | title | text | subject | date |
|---|---|---|---|---|
| 18322 | Italian government gets economy bill through S... | ROME (Reuters) - The Italian parliament approv... | worldnews | October 4, 2017 |
| 21276 | U.N. nuclear watchdog opens uranium bank in Ka... | ASTANA (Reuters) - The International Atomic En... | worldnews | August 29, 2017 |
| 11187 | Obama, Australia's Turnbull pledge more cooper... | WASHINGTON (Reuters) - U.S. President Barack O... | politicsNews | January 19, 2016 |
| 19906 | Guatemala Congress withdraws bill that cut ant... | GUATEMALA CITY (Reuters) - Guatemala s Congres... | worldnews | September 15, 2017 |
| 7234 | Trump adviser calls for tax reform as bipartis... | WASHINGTON (Reuters) - A co-author of U.S. Pre... | politicsNews | November 16, 2016 |


We observe that the bodies of the true articles begin with an easily identifiable prefix, the dateline city followed by the source (e.g. `WASHINGTON (Reuters) - `), which the fake articles lack. A model could learn to separate the two classes from this marker alone instead of from the content of the articles, so we remove it from the true articles before training the LSTM model (or any other classifier).

In [22]:
def remove_intro(text):
    # Remove everything from the beginning up to and including the first dash followed by a space
    return re.sub(r'^.*?-\s+', '', text)
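
The `remove_intro` pattern deletes everything up to the first dash followed by whitespace, so if a true article lacks the dateline but contains a ` - ` later in its first line, the opening words of the actual text are removed instead. A stricter pattern that only matches a leading `<CITY> (Reuters) - ` prefix avoids this. The sketch below is an assumption on our part, not the pattern that produced the outputs in this notebook; `LOOSE`, `STRICT`, and `remove_dateline` are hypothetical names.

```python
import re

# Pattern used by remove_intro in this notebook: drop everything up to the
# first "-" that is followed by whitespace.
LOOSE = re.compile(r'^.*?-\s+')

# Hypothetical stricter pattern (assumption, not from the original notebook):
# only drop a leading "<CITY> (Reuters) - " dateline, where the text before
# "(Reuters)" contains no parentheses or newlines.
STRICT = re.compile(r'^[^()\n]*\(Reuters\)\s*-\s*')


def remove_dateline(text: str) -> str:
    """Strip a leading Reuters dateline; return any other text unchanged."""
    return STRICT.sub('', text, count=1)


with_dateline = "SEATTLE/WASHINGTON (Reuters) - President Donald Trump called on"
without_dateline = "President Trump - who spoke on Friday - said"

print(remove_dateline(with_dateline))      # President Donald Trump called on
print(LOOSE.sub('', without_dateline))     # who spoke on Friday - said
print(remove_dateline(without_dateline))   # President Trump - who spoke on Friday - said
```

On articles that do carry the dateline, both patterns give the same result; they differ only on text without one.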

In [23]:
df_true['text'] = df_true['text'].apply(remove_intro)

df_true.sample(5)

|  | title | text | subject | date |
|---|---|---|---|---|
| 9059 | Man arrested at Trump rally said he wanted to ... | A man arrested over the weekend trying to wres... | politicsNews | June 20, 2016 |
| 4238 | Trump administration says no U.S. trading part... | U.S. President Donald Trump’s administration d... | politicsNews | April 14, 2017 |
| 11412 | Bus drives into pedestrian underpass in Moscow... | A passenger bus swerved off course and drove i... | worldnews | December 25, 2017 |
| 7587 | Kerry says confident on Philippines ties, hope... | U.S. Secretary of State John Kerry said on Thu... | politicsNews | November 3, 2016 |
| 3134 | Trump lawyer says president not informed he is... | One of President Donald Trump’s personal lawye... | politicsNews | June 18, 2017 |


We now add a `label` column (1 for true news, 0 for fake news) and concatenate both datasets into a single DataFrame.

In [26]:
df_true['label'] = 1
df_fake['label'] = 0

df = pd.concat([df_true, df_fake], ignore_index=True)
df.sample(5)
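
Passing `ignore_index=True` matters here: both CSVs are read with their own index starting at 0, so without it the merged frame would contain duplicate index labels and `df.loc[0]` would return two rows. A toy illustration (hypothetical rows, not ISOT data):

```python
import pandas as pd

# Hypothetical toy stand-ins for df_true and df_fake.
toy_true = pd.DataFrame({'title': ['a', 'b'], 'text': ['x', 'y']})
toy_fake = pd.DataFrame({'title': ['c'], 'text': ['z']})

toy_true['label'] = 1
toy_fake['label'] = 0

# ignore_index=True renumbers the rows 0..n-1.
merged = pd.concat([toy_true, toy_fake], ignore_index=True)
print(merged.index.tolist())     # [0, 1, 2]
print(merged['label'].tolist())  # [1, 1, 0]

# Without it, each frame keeps its own index, so label 0 appears twice.
print(pd.concat([toy_true, toy_fake]).index.tolist())  # [0, 1, 0]
```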

|  | title | text | subject | date | label |
|---|---|---|---|---|---|
| 30455 | Cenk Uygur Rages At White Apathy After CPD Mu... | On December 26, the father of 19-year-old Quin... | News | January 1, 2016 | 0 |
| 5437 | U.S. secretaries Tillerson, Kelly to visit Mex... | U.S. Secretary of State Rex Tillerson and Home... | politicsNews | February 15, 2017 | 1 |
| 16286 | Islamic State shores up last stronghold on Syr... | Islamic State is building up its defenses in a... | worldnews | October 27, 2017 | 1 |
| 9289 | U.S. lawmakers probe Fed cyber breaches, cite ... | A U.S. congressional committee has launched an... | politicsNews | June 3, 2016 | 1 |
| 44770 | EXPOSED: Facebook Blacklists Conservative News... | 21st Century Wire says Censorship is running r... | Middle-east | May 10, 2016 | 0 |


In [28]:
# Check for null values
df.isna().sum()

title      0
text       0
subject    0
date       0
label      0
dtype: int64
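
`isna()` only counts missing (`NaN`/`None`) values, so an article whose body is an empty or whitespace-only string passes the check above. A complementary check, sketched here on a hypothetical toy frame, strips whitespace and counts the bodies that end up empty; the same expression can be applied to `df['text']`.

```python
import pandas as pd

# Hypothetical toy frame: one normal body, one empty, one whitespace-only.
toy = pd.DataFrame({
    'title': ['t1', 't2', 't3'],
    'text': ['Some body text', '', '   '],
})

# isna() finds nothing, because empty strings are not NaN.
print(int(toy.isna().sum().sum()))  # 0

# Count bodies that are empty once surrounding whitespace is removed.
blank_text = toy['text'].str.strip().eq('')
print(int(blank_text.sum()))        # 2
```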

Next, we merge the title and body of each article into a single column, `full_text`, which we lowercase and strip of every character other than letters, digits, and whitespace. This column is the input to our model.

In [29]:
def clean_and_merge(row):
    # Merge title and text
    full_text = f"{row['title']} {row['text']}"
    # Lowercase
    full_text = full_text.lower()
    # Remove special characters (keep letters, numbers, and spaces)
    full_text = re.sub(r'[^a-z0-9\s]', '', full_text)
    return full_text
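
`df.apply(..., axis=1)` calls the Python function once per row, which is typically slow on tens of thousands of articles. A minimal sketch of the same cleaning with pandas vectorized string methods (`Series.str.lower` and `Series.str.replace` with `regex=True`) follows; `clean_and_merge_vectorized` is a hypothetical helper name, and the toy rows are made up for illustration.

```python
import re

import pandas as pd


def clean_and_merge(row):
    # Row-wise version copied from this notebook: merge, lowercase,
    # then remove every character outside [a-z0-9] and whitespace.
    full_text = f"{row['title']} {row['text']}"
    full_text = full_text.lower()
    return re.sub(r'[^a-z0-9\s]', '', full_text)


def clean_and_merge_vectorized(frame: pd.DataFrame) -> pd.Series:
    # Same steps applied to whole columns with pandas string methods.
    merged = frame['title'] + ' ' + frame['text']
    return merged.str.lower().str.replace(r'[^a-z0-9\s]', '', regex=True)


toy = pd.DataFrame({
    'title': ["As U.S. budget fight looms, Republicans flip", "Trump’s tweet"],
    'text': ["The head of a conservative Republican faction", "It said: 'No!'"],
})

fast = clean_and_merge_vectorized(toy)
slow = toy.apply(clean_and_merge, axis=1)
print(fast.tolist())
# ['as us budget fight looms republicans flip the head of a conservative republican faction', 'trumps tweet it said no']
```

The second toy row also shows a side effect of the character filter: possessives such as `Trump’s` collapse to `trumps`.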

In [30]:
df['full_text'] = df.apply(clean_and_merge, axis=1)

df.head()

|  | title | text | subject | date | label | full_text |
|---|---|---|---|---|---|---|
| 0 | As U.S. budget fight looms, Republicans flip t... | The head of a conservative Republican faction ... | politicsNews | December 31, 2017 | 1 | as us budget fight looms republicans flip thei... |
| 1 | U.S. military to accept transgender recruits o... | Transgender people will be allowed for the fir... | politicsNews | December 29, 2017 | 1 | us military to accept transgender recruits on ... |
| 2 | Senior U.S. Republican senator: 'Let Mr. Muell... | The special counsel investigation of links bet... | politicsNews | December 31, 2017 | 1 | senior us republican senator let mr mueller do... |
| 3 | FBI Russia probe helped by Australian diplomat... | Trump campaign adviser George Papadopoulos tol... | politicsNews | December 30, 2017 | 1 | fbi russia probe helped by australian diplomat... |
| 4 | Trump wants Postal Service to charge 'much mor... | President Donald Trump called on the U.S. Post... | politicsNews | December 29, 2017 | 1 | trump wants postal service to charge much more... |
