# Fake News Prediction

In [50]:
import numpy as np
import pandas as pd
import re
from IPython.display import display, HTML

# Words that don't add any meaning to an article (stopwords)
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

import nltk
nltk.download('stopwords')

[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\grans\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


True

In [51]:
news_df = pd.read_csv('train.csv')
news_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20800 entries, 0 to 20799
Data columns (total 5 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   id      20800 non-null  int64 
 1   title   20242 non-null  object
 2   author  18843 non-null  object
 3   text    20761 non-null  object
 4   label   20800 non-null  int64 
dtypes: int64(2), object(3)
memory usage: 812.6+ KB


In [52]:
news_df.isnull().sum()

id           0
title      558
author    1957
text        39
label        0
dtype: int64

In [53]:
# replacing null values with empty string
news_df = news_df.fillna("")

In [54]:
news_df['content'] = news_df['author'] + ' ' + news_df['title']

In [55]:
X = news_df.drop(columns='label')
y = news_df['label']

Stemming reduces a word to its root form by stripping suffixes:

e.g. acting, acted, acts --> act

In [56]:
port_stem = PorterStemmer()
# Build the stopword set once; reloading the list for every word is slow
stop_words = set(stopwords.words('english'))

def stemming(content):
    # Replace every non-alphabetic character with a space
    stemmed_content = re.sub('[^a-zA-Z]', ' ', content)
    # Convert to lowercase and split into words
    stemmed_content = stemmed_content.lower()
    stemmed_content = stemmed_content.split()
    # Stem the words that are not stopwords
    stemmed_content = [port_stem.stem(word) for word in stemmed_content if word not in stop_words]
    # Join the stemmed words back into a single string
    stemmed_content = ' '.join(stemmed_content)
    return stemmed_content

In [57]:
news_df['content'] = news_df['content'].apply(stemming)
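
To make the cleaning steps concrete without NLTK, the sketch below mimics the same pipeline (keep letters, lowercase, drop stopwords, stem, rejoin) on one sentence. The stopword set and the suffix stripper are hand-picked stand-ins invented for this illustration; they are not NLTK's stopword list or the Porter algorithm, so real output from `stemming` can differ.

```python
import re

# Hypothetical stand-ins: a tiny stopword set and a naive suffix stripper.
TOY_STOPWORDS = {"the", "a", "an", "is", "in", "of", "to", "and"}
TOY_SUFFIXES = ("ing", "ed", "s")

def toy_stem(word):
    """Strip one common suffix if at least three letters remain."""
    for suffix in TOY_SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def toy_clean(text):
    """Keep letters only, lowercase, drop stopwords, stem, and rejoin."""
    words = re.sub(r"[^a-zA-Z]", " ", text).lower().split()
    return " ".join(toy_stem(w) for w in words if w not in TOY_STOPWORDS)

print(toy_clean("The actors acted in 3 plays!"))  # actor act play
```

The digit and the punctuation disappear in the regex step, "the" and "in" are dropped as stopwords, and the remaining words lose their suffixes.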

In [58]:
X = news_df['content'].values
y = news_df['label'].values

In [59]:
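# Note: the vectorizer is fitted on the full dataset before the split below,
# so vocabulary and IDF statistics from the test rows are visible during training
vectorizer = TfidfVectorizer()
vectorizer.fit(X)
X = vectorizer.transform(X)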
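
For intuition about the numbers `TfidfVectorizer` produces, here is a pure-Python sketch that computes TF-IDF weights for a toy corpus. It assumes the smoothed IDF, `ln((1 + n) / (1 + df)) + 1`, and the L2 row normalization that scikit-learn documents as its defaults. The corpus and the function name `toy_tfidf` are made up for illustration, and tokenization is simplified to whitespace splitting.

```python
import math
from collections import Counter

def toy_tfidf(docs):
    """Return one {term: weight} dict per document (smoothed IDF, L2-normalized)."""
    tokenized = [doc.split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1 for t, d in df.items()}
    rows = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights = {t: count * idf[t] for t, count in tf.items()}
        # Guard against empty documents, whose norm would be zero.
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        rows.append({t: w / norm for t, w in weights.items()})
    return rows

corpus = ["trump wins vote", "vote fraud claim", "vote recount"]  # hypothetical, already cleaned
for row in toy_tfidf(corpus):
    print({t: round(w, 3) for t, w in row.items()})
```

Because "vote" appears in every document, it gets the lowest weight in each row, while rarer terms such as "trump" or "recount" score higher.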

### Split into train and test data

In [60]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify = y, random_state=2)

model_lr = LogisticRegression()
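
The `stratify = y` argument keeps the real/fake ratio the same in the training and test partitions. A rough pure-Python sketch of that idea follows; it is not scikit-learn's implementation, and the helper name `toy_stratified_split` and the toy labels are assumptions made for illustration.

```python
import random
from collections import defaultdict

def toy_stratified_split(labels, test_size=0.2, seed=2):
    """Return (train_idx, test_idx) with per-class proportions roughly preserved."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train_idx, test_idx = [], []
    # Shuffle and split each class separately so both parts keep the class ratio.
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_size)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)

labels = [0] * 50 + [1] * 50  # hypothetical balanced labels
train_idx, test_idx = toy_stratified_split(labels)
print(len(train_idx), len(test_idx))                      # 80 20
print(sum(labels[i] for i in test_idx) / len(test_idx))   # 0.5
```

Half of the test rows carry label 1, matching the 50/50 ratio of the full toy label list.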

In [61]:
model_lr.fit(X_train, y_train)

### Evaluation

In [62]:
# accuracy on the training data (accuracy_score expects y_true first, then y_pred)

X_train_prediction = model_lr.predict(X_train)
print(f'Accuracy score of training data: {accuracy_score(y_train, X_train_prediction):.4f}')

Accuracy score of training data: 0.9866


In [63]:
# accuracy on the testing data (accuracy_score expects y_true first, then y_pred)

X_test_prediction = model_lr.predict(X_test)
print(f'Accuracy score of testing data: {accuracy_score(y_test, X_test_prediction):.4f}')

Accuracy score of testing data: 0.9791
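
Note that `accuracy_score` returns a fraction between 0 and 1 (the share of predictions that match the true label), so a score of 0.9791 corresponds to 97.91% accuracy. A minimal sketch of the same calculation, with made-up labels and a hypothetical helper name:

```python
def toy_accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

y_true = [0, 1, 1, 0, 1]  # hypothetical labels: 0 = real, 1 = fake
y_pred = [0, 1, 0, 0, 1]
score = toy_accuracy(y_true, y_pred)
print(f"{score:.4f} ({score:.2%})")  # 0.8000 (80.00%)
```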


### Making a Predictive Model

In [91]:
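# Fill missing values, as was done for the training data, so that the
# string concatenation and stemming below do not fail on NaN entries
test_df = pd.read_csv('test.csv').fillna('')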

# Select a random test sample
random_test = test_df.sample(n=1)

# Preprocess the content
random_test['content'] = random_test['author'] + ' ' + random_test['title']

random_test['content'] = random_test['content'].apply(stemming)
random_test_X = random_test['content'].values

# Transform the content using the vectorizer
random_test_X = vectorizer.transform(random_test_X)

# Make predictions
prediction = model_lr.predict(random_test_X)

# Create a table to display the prediction
prediction_table = pd.DataFrame({'Title': random_test['title'],
                                 'Author': random_test['author'],
                                 'Text': random_test['text'],
                                 'Prediction': ['Real' if prediction[0] == 0 else 'Fake']})

# Display the table
display(HTML(prediction_table.to_html(index=False)))


Title,Author,Text,Prediction
"Girl Asks Her 20 Boyfriends To Each Give Her an iPhone, Uses The Money To Buy a House",Eddy Lavine,"Many Women Suffer From Fungal Infections And They Can Cure Them At Home Ξ [November 1, 2016] BLOG Girl Asks Her 20 Boyfriends To Each Give Her an iPhone, Uses The Money To Buy a House posted by Eddie We have heard the stories about Apple fanatics selling body parts and changing their passport in order to get the latest products from the company , but this latest story coming from China gives a different perspective on how you can use Apple devices as a tool. As a rather strange tool, in fact: an entrepreneurial Chinese woman is said to have convinced 20 men that she has been dating to buy her iPhones, which she would then sell and use the money as a downpayment for a house. The purchase of a house is probably the single biggest one-time spending in the lifetime of a person, and it could take long years of hard work, so it is indeed remarkable that the woman found the quick way to get there (however, not the most ethical one, arguably). Let’s make it clear: the story seems to have started on local Tian Ya Yi Du forum, where someone by the nickname Proud Qiaoba told the story of how her co-worker Xiaoli (not her real name) asked each of her 20 current boyfriends to buy her the new iPhone 7. How in the world is it possible to simulatenously date 20 people remains a mystery that gives this whole story a bit of a fairy tale aura, but knowing that China suffers from a terrible male-to-female ratio let’s assume that is somehow possible. Xiaoli, whose life remains veiled in secrecy, then sold all of the phones to a suspiciously particular place: mobile phone recycling site Hui Shou Bao, all for a total of 120,000 Chinese yuan, the equivalent of around $17,500. The woman then used the cash for a downpayment towards a countryhouse home. Xiaoli broke the news when she invited her co-workers for a house warming party. “Everyone in the office is talking about this now. 
Who knows what her boyfriends think now this news has become public.” From Qiaoba we also learn that Xiaoli “is not from a wealthy family. Her mum is a housewife and her dad is a migrant worker, and she is the oldest daughter. Her parents are getting old and she might be under a lot pressure hoping to buy them a house… But it’s still unbelievable that she could use this method!” \nIt does indeed sound quite fantastical and made-up, and some have theorized that it’s all just a theory to popularize the local Hui Shou Bao phone re-seller (the company has confirmed that it had indeed sold 20 iPhones from a single person recently). Meanwhile, the whole story quickly started becoming popular in China: a ’20 mobiles for a house’ hashtag went trending on local microblog Weibo and has been used more than 13 million times. Reaction sway from admiration for Xiaoli’s boyfriend-getting and management skills to sheer awe and condemnation of her actions. Whatever it is, Xiaoli – if she even exists – is now said to shy away from the public eye and to refuse interviews from media. That’s actually not surprising… those 20 boyfriends surely would not be happy about her actions. Source:",Fake
