### Exploring the World of ML: Fake News Prediction Project by Kartik (Age 13) 🚀📰

In this Jupyter Notebook, join me on a journey into machine learning as I build a model that predicts whether a news article is real or fake. Together, we'll clean the article text, turn it into TF-IDF features, and train Logistic Regression and Random Forest classifiers to distinguish fact from fiction. Stay tuned for updates and insights into my exploration of AI and misinformation detection! 👨‍💻✨

###### The Dataset Used: https://www.kaggle.com/datasets/jainpooja/fake-news-detection

#### Importing the Required Libraries

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings 

warnings.simplefilter("ignore")

### Importing Dataset

In [2]:
df_real = pd.read_csv('./dataset/True.csv')
df_fake = pd.read_csv('./dataset/Fake.csv')

In [3]:
df_real.head()

|   | title | text | subject | date |
|---|---|---|---|---|
| 0 | As U.S. budget fight looms, Republicans flip t... | WASHINGTON (Reuters) - The head of a conservat... | politicsNews | December 31, 2017 |
| 1 | U.S. military to accept transgender recruits o... | WASHINGTON (Reuters) - Transgender people will... | politicsNews | December 29, 2017 |
| 2 | Senior U.S. Republican senator: 'Let Mr. Muell... | WASHINGTON (Reuters) - The special counsel inv... | politicsNews | December 31, 2017 |
| 3 | FBI Russia probe helped by Australian diplomat... | WASHINGTON (Reuters) - Trump campaign adviser ... | politicsNews | December 30, 2017 |
| 4 | Trump wants Postal Service to charge 'much mor... | SEATTLE/WASHINGTON (Reuters) - President Donal... | politicsNews | December 29, 2017 |


In [4]:
df_fake.head()

|   | title | text | subject | date |
|---|---|---|---|---|
| 0 | Donald Trump Sends Out Embarrassing New Year’... | Donald Trump just couldn t wish all Americans ... | News | December 31, 2017 |
| 1 | Drunk Bragging Trump Staffer Started Russian ... | House Intelligence Committee Chairman Devin Nu... | News | December 31, 2017 |
| 2 | Sheriff David Clarke Becomes An Internet Joke... | On Friday, it was revealed that former Milwauk... | News | December 30, 2017 |
| 3 | Trump Is So Obsessed He Even Has Obama’s Name... | On Christmas day, Donald Trump announced that ... | News | December 29, 2017 |
| 4 | Pope Francis Just Called Out Donald Trump Dur... | Pope Francis used his annual Christmas Day mes... | News | December 25, 2017 |


In [5]:
df_fake.shape, df_real.shape

((23481, 4), (21417, 4))

In [6]:
# Add a 'class' label column: 1 = real news, 0 = fake news
df_real["class"] = 1
df_fake["class"] = 0

### Preparing a Portion of the Dataset for Manual Testing


In [7]:
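# Creating manual testing datasets from the last 10 rows of each file
# (.copy() makes them independent DataFrames, not views of df_fake/df_real)
df_fake_manual_testing = df_fake.tail(10).copy()
df_real_manual_testing = df_real.tail(10).copy()

# Removing those same 10 rows from the training datasets (by index label, not hardcoded numbers)
df_fake.drop(df_fake_manual_testing.index, axis=0, inplace=True)
df_real.drop(df_real_manual_testing.index, axis=0, inplace=True)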
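Taking the last 10 rows of each file gives a holdout drawn from one corner of the data: the five rows displayed below for each holdout all come from the Middle-east (fake) and worldnews (real) subjects. A minimal sketch of a random alternative, assuming a hypothetical helper `split_manual_holdout` and a toy DataFrame in place of `df_fake`/`df_real`:

```python
import pandas as pd


def split_manual_holdout(df, n=10, random_state=42):
    """Hypothetical helper: return (remaining_rows, holdout_rows) using a random sample."""
    holdout = df.sample(n=n, random_state=random_state)  # random rows instead of the tail
    remaining = df.drop(holdout.index)                    # drop them by index label
    return remaining, holdout


# Toy frame standing in for df_fake or df_real
toy = pd.DataFrame({"text": [f"article {i}" for i in range(50)], "class": 0})
remaining, holdout = split_manual_holdout(toy, n=10)
print(len(remaining), len(holdout))
```

On the real data this would read something like `df_fake, df_fake_manual_testing = split_manual_holdout(df_fake)`; `random_state` keeps the choice reproducible.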

In [8]:
df_fake_manual_testing.head()

|   | title | text | subject | date | class |
|---|---|---|---|---|---|
| 23471 | Seven Iranians freed in the prisoner swap have... | 21st Century Wire says This week, the historic... | Middle-east | January 20, 2016 | 0 |
| 23472 | #Hashtag Hell & The Fake Left | By Dady Chery and Gilbert MercierAll writers ... | Middle-east | January 19, 2016 | 0 |
| 23473 | Astroturfing: Journalist Reveals Brainwashing ... | Vic Bishop Waking TimesOur reality is carefull... | Middle-east | January 19, 2016 | 0 |
| 23474 | The New American Century: An Era of Fraud | Paul Craig RobertsIn the last years of the 20t... | Middle-east | January 19, 2016 | 0 |
| 23475 | Hillary Clinton: ‘Israel First’ (and no peace ... | Robert Fantina CounterpunchAlthough the United... | Middle-east | January 18, 2016 | 0 |


In [9]:
df_real_manual_testing.head()

|   | title | text | subject | date | class |
|---|---|---|---|---|---|
| 21407 | Mata Pires, owner of embattled Brazil builder ... | SAO PAULO (Reuters) - Cesar Mata Pires, the ow... | worldnews | August 22, 2017 | 1 |
| 21408 | U.S., North Korea clash at U.N. forum over nuc... | GENEVA (Reuters) - North Korea and the United ... | worldnews | August 22, 2017 | 1 |
| 21409 | U.S., North Korea clash at U.N. arms forum on ... | GENEVA (Reuters) - North Korea and the United ... | worldnews | August 22, 2017 | 1 |
| 21410 | Headless torso could belong to submarine journ... | COPENHAGEN (Reuters) - Danish police said on T... | worldnews | August 22, 2017 | 1 |
| 21411 | North Korea shipments to Syria chemical arms a... | UNITED NATIONS (Reuters) - Two North Korean sh... | worldnews | August 21, 2017 | 1 |


In [10]:
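# The labels were already assigned in In [6] and carried over by tail(); this just keeps them explicit
df_fake_manual_testing["class"] = 0
df_real_manual_testing["class"] = 1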

#### Exporting the manual testing dataset for later use

In [11]:
# Combining manual testing datasets
manual_testing_data = pd.concat([df_fake_manual_testing, df_real_manual_testing], axis=0)

# Saving manual testing data to a CSV file
manual_testing_data.to_csv("./dataset/manual_testing.csv")
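By default `to_csv` writes the DataFrame index as an extra, unnamed first column, and `read_csv` brings it back as a column called `Unnamed: 0` unless told otherwise. A small sketch with an in-memory buffer (toy data, not the real file) shows both ways of reading it back:

```python
import io

import pandas as pd

# Toy stand-in for manual_testing_data, keeping non-default index labels
toy = pd.DataFrame({"title": ["a", "b"], "class": [0, 1]}, index=[23471, 21407])

buffer = io.StringIO()
toy.to_csv(buffer)                    # the index is written as an unnamed first column
buffer.seek(0)
naive = pd.read_csv(buffer)           # ...and comes back as a column named "Unnamed: 0"
print(list(naive.columns))

buffer.seek(0)
restored = pd.read_csv(buffer, index_col=0)  # treat the first column as the index again
print(list(restored.columns), list(restored.index))
```

Alternatively, passing `index=False` to `to_csv` skips writing the index at all.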

## Merging df_real and df_fake into a Single Dataset

In [12]:
df = pd.concat([df_fake, df_real], axis=0)
df.head()

|   | title | text | subject | date | class |
|---|---|---|---|---|---|
| 0 | Donald Trump Sends Out Embarrassing New Year’... | Donald Trump just couldn t wish all Americans ... | News | December 31, 2017 | 0 |
| 1 | Drunk Bragging Trump Staffer Started Russian ... | House Intelligence Committee Chairman Devin Nu... | News | December 31, 2017 | 0 |
| 2 | Sheriff David Clarke Becomes An Internet Joke... | On Friday, it was revealed that former Milwauk... | News | December 30, 2017 | 0 |
| 3 | Trump Is So Obsessed He Even Has Obama’s Name... | On Christmas day, Donald Trump announced that ... | News | December 29, 2017 | 0 |
| 4 | Pope Francis Just Called Out Donald Trump Dur... | Pope Francis used his annual Christmas Day mes... | News | December 25, 2017 | 0 |


In [13]:
df.columns

Index(['title', 'text', 'subject', 'date', 'class'], dtype='object')
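The `subject` column is later folded into the text the model sees. If no subject value appears in both classes, the subject words alone could reveal the label and inflate test accuracy, so it is worth checking. A sketch using `pd.crosstab`, with a hypothetical helper `subjects_shared_between_classes` and toy rows (on the real data, pass `df`):

```python
import pandas as pd


def subjects_shared_between_classes(df):
    """Hypothetical helper: list the subject values that occur in both classes."""
    table = pd.crosstab(df["subject"], df["class"])   # rows: subject, columns: class
    return table[(table > 0).all(axis=1)].index.tolist()


# Toy data for illustration only
toy = pd.DataFrame({
    "subject": ["News", "politics", "politicsNews", "worldnews", "politics"],
    "class":   [0, 0, 1, 1, 1],
})
print(pd.crosstab(toy["subject"], toy["class"]))
print(subjects_shared_between_classes(toy))
```

An empty list on the real `df` would mean every subject belongs to exactly one class.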

#### Checking for Null Values (luckily, there are none)

In [14]:
df.isnull().sum()

title      0
text       0
subject    0
date       0
class      0
dtype: int64

In [15]:
# Replace the duplicated indices left by concat with a fresh 0..n-1 index
df = df.reset_index(drop=True)

df.columns

Index(['title', 'text', 'subject', 'date', 'class'], dtype='object')
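Resetting the index matters because `pd.concat` keeps each frame's original labels, so the merged frame has duplicates. A toy sketch (made-up rows) of what goes wrong and how `reset_index(drop=True)` fixes it:

```python
import pandas as pd

fake = pd.DataFrame({"title": ["f0", "f1"], "class": 0})
real = pd.DataFrame({"title": ["r0", "r1"], "class": 1})

merged = pd.concat([fake, real], axis=0)
print(merged.index.tolist())             # labels 0 and 1 appear twice
print(merged.loc[0]["title"].tolist())   # .loc[0] matches two rows, one from each frame

clean = merged.reset_index(drop=True)    # drop=True discards the old labels instead of adding an "index" column
print(clean.index.tolist())
```

`pd.concat([...], ignore_index=True)` achieves the same result in one step.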

In [16]:
df.head()

|   | title | text | subject | date | class |
|---|---|---|---|---|---|
| 0 | Donald Trump Sends Out Embarrassing New Year’... | Donald Trump just couldn t wish all Americans ... | News | December 31, 2017 | 0 |
| 1 | Drunk Bragging Trump Staffer Started Russian ... | House Intelligence Committee Chairman Devin Nu... | News | December 31, 2017 | 0 |
| 2 | Sheriff David Clarke Becomes An Internet Joke... | On Friday, it was revealed that former Milwauk... | News | December 30, 2017 | 0 |
| 3 | Trump Is So Obsessed He Even Has Obama’s Name... | On Christmas day, Donald Trump announced that ... | News | December 29, 2017 | 0 |
| 4 | Pope Francis Just Called Out Donald Trump Dur... | Pope Francis used his annual Christmas Day mes... | News | December 25, 2017 | 0 |


In [17]:
df

|   | title | text | subject | date | class |
|---|---|---|---|---|---|
| 0 | Donald Trump Sends Out Embarrassing New Year’... | Donald Trump just couldn t wish all Americans ... | News | December 31, 2017 | 0 |
| 1 | Drunk Bragging Trump Staffer Started Russian ... | House Intelligence Committee Chairman Devin Nu... | News | December 31, 2017 | 0 |
| 2 | Sheriff David Clarke Becomes An Internet Joke... | On Friday, it was revealed that former Milwauk... | News | December 30, 2017 | 0 |
| 3 | Trump Is So Obsessed He Even Has Obama’s Name... | On Christmas day, Donald Trump announced that ... | News | December 29, 2017 | 0 |
| 4 | Pope Francis Just Called Out Donald Trump Dur... | Pope Francis used his annual Christmas Day mes... | News | December 25, 2017 | 0 |
| ... | ... | ... | ... | ... | ... |
| 44873 | Exclusive: Trump's Afghan decision may increas... | ON BOARD A U.S. MILITARY AIRCRAFT (Reuters) - ... | worldnews | August 22, 2017 | 1 |
| 44874 | U.S. puts more pressure on Pakistan to help wi... | WASHINGTON (Reuters) - The United States sugge... | worldnews | August 21, 2017 | 1 |
| 44875 | Exclusive: U.S. to withhold up to $290 million... | WASHINGTON (Reuters) - The United States has d... | worldnews | August 22, 2017 | 1 |
| 44876 | Trump talks tough on Pakistan's 'terrorist' ha... | ISLAMABAD (Reuters) - Outlining a new strategy... | worldnews | August 22, 2017 | 1 |


## Importing Libraries for Model Creation

In [18]:
import pandas as pd
import re
import string
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report

## Creating a Function for Text Preprocessing

In [19]:
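def textpreprocess(text):
    text = text.lower()
    text = re.sub(r'\[.*?\]', '', text)                 # remove text inside square brackets
    text = re.sub(r'https?://\S+|www\.\S+', '', text)   # remove URLs while ':' '/' '.' are still present
    text = re.sub(r'<.*?>+', '', text)                  # remove HTML tags
    text = re.sub(r'\W', ' ', text)                     # replace non-word characters (incl. newlines) with spaces
    text = re.sub('[%s]' % re.escape(string.punctuation), '', text)  # remove leftover punctuation such as '_'
    text = re.sub(r'\w*\d\w*', '', text)                # remove words that contain digits
    return text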
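Regex cleaning steps are order-sensitive: if non-word characters are turned into spaces before the URL pattern runs, every URL is already split apart and the pattern can no longer match. The sketch below compares both orders on one made-up sentence; the two function names are hypothetical and cover only these two steps:

```python
import re


def urls_after_punctuation(text):
    # Non-word characters become spaces first...
    text = re.sub(r"\W", " ", text.lower())
    # ...so "://" and "www." are gone and this pattern finds nothing
    return re.sub(r"https?://\S+|www\.\S+", "", text)


def urls_before_punctuation(text):
    # Strip URLs while ":", "/" and "." are still present
    text = re.sub(r"https?://\S+|www\.\S+", "", text.lower())
    return re.sub(r"\W", " ", text)


sample = "Read more at https://www.example.com/news today"
print(urls_after_punctuation(sample).split())
# ['read', 'more', 'at', 'https', 'www', 'example', 'com', 'news', 'today']
print(urls_before_punctuation(sample).split())
# ['read', 'more', 'at', 'today']
```

In the first order, URL fragments such as `https`, `www` and `com` survive as ordinary tokens and reach the vectorizer.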


## Applying Text Preprocessing to 'title', 'text', and 'subject'

In [20]:
df['processed_title'] = df['title'].apply(textpreprocess)
df['processed_text'] = df['text'].apply(textpreprocess)
df['processed_subject'] = df['subject'].apply(textpreprocess)

# Combine processed text columns into a single column
df['combined_processed_text'] = (
    df['processed_title'] + ' ' +
    df['processed_text'] + ' ' +
    df['processed_subject']
)

## Vectorize Text Data

In [21]:
tfidf_vectorizer = TfidfVectorizer(stop_words='english', max_features=5000)
text_matrix = tfidf_vectorizer.fit_transform(df['combined_processed_text'])

## Split into Features and Target

In [22]:
X = text_matrix
y = df['class']

# Split into training (80%) and testing (20%) sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
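Here the vectorizer is fitted on all of `df` before the split, so the test rows also shape the IDF weights. A sketch of the alternative, assuming toy texts and labels in place of `df['combined_processed_text']` and `df['class']`: split the raw text first, then let a scikit-learn `Pipeline` fit TF-IDF on the training part only. `stratify` keeps the class ratio the same in both splits.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Toy corpus: 1 = real, 0 = fake
texts = [
    "officials said on monday", "the ministry said in a statement",
    "lawmakers voted on the bill", "the agency reported figures",
    "you won't believe this shocking truth", "shocking video exposes the lie",
    "they don't want you to know this", "this shocking claim went viral",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Split the raw text first, stratified by label
X_train_txt, X_test_txt, y_train_toy, y_test_toy = train_test_split(
    texts, labels, test_size=0.25, random_state=42, stratify=labels
)

# The vectorizer inside the pipeline only ever sees the training texts during fit
pipeline = make_pipeline(TfidfVectorizer(stop_words="english"), LogisticRegression())
pipeline.fit(X_train_txt, y_train_toy)
predictions = pipeline.predict(X_test_txt)
print(predictions, y_test_toy)
```

A fitted pipeline can also be pickled as one object, which keeps the model and its vectorizer together.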

## Train Logistic Regression Model

In [23]:
logreg_model = LogisticRegression()
logreg_model.fit(X_train, y_train)

# Make Predictions and Evaluate

# Make predictions on the test set
y_pred = logreg_model.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
# classification_report_result = classification_report(y_test, y_pred)

# Print the results
print(f"Accuracy: {accuracy:.2f}")
# print("Classification Report:\n", classification_report_result)


Accuracy: 0.99
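Accuracy alone hides which kind of mistake the model makes. The commented-out `classification_report` above, together with `confusion_matrix`, breaks the result down per class. A toy sketch with four made-up labels (on the real data, pass `y_test` and `y_pred`):

```python
from sklearn.metrics import classification_report, confusion_matrix

y_true = [0, 0, 1, 1]
y_hat = [0, 1, 1, 1]

cm = confusion_matrix(y_true, y_hat)
print(cm)  # rows = true class (fake, real), columns = predicted class
# [[1 1]
#  [0 2]]

print(classification_report(y_true, y_hat, target_names=["fake", "real"]))
report = classification_report(y_true, y_hat, target_names=["fake", "real"], output_dict=True)
```

Here one fake article was labelled real, so precision for "real" drops to 2/3 while its recall stays at 1.0.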


## Train Random Forest Model

In [24]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Create Random Forest model
random_forest_model = RandomForestClassifier(n_estimators=100, random_state=42)

# Train the model
random_forest_model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = random_forest_model.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)

# Print the results
print(f"Random Forest Accuracy: {accuracy:.2f}")

Random Forest Accuracy: 1.00


## Manual Testing

In [25]:
def output_label(n):
    if n == 0:
        return "It's a Fake News"
    elif n == 1:
        return "It's a Real News"
    
def manual_testing(news, logreg_model, random_forest_model, tfidf_vectorizer):
    testing_news = {"text": [news]}
    new_def_test = pd.DataFrame(testing_news)
    # Apply the same preprocessing that was used on the training text
    new_x_test = new_def_test["text"].apply(textpreprocess)
    # Transform with the TF-IDF vectorizer already fitted in In [21] (do not refit it)
    new_xv_test = tfidf_vectorizer.transform(new_x_test)

    y_pred_logreg = logreg_model.predict(new_xv_test)
    y_pred_rf = random_forest_model.predict(new_xv_test)

    print("\n\nLogistic Regression Prediction: {}".format(output_label(y_pred_logreg[0])))
    print("Random Forest Prediction: {}".format(output_label(y_pred_rf[0])))

# Example usage:
news = input("Enter news: ")  # input() already returns a str
manual_testing(news, logreg_model, random_forest_model, tfidf_vectorizer)


Enter news: john


Logistic Regression Prediction: It's a Fake News
Random Forest Prediction: It's a Fake News
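A single word like `john` gives the models very little to work with. A sketch of a hypothetical helper `predict_with_confidence` that also reports the probability from `predict_proba`, trained on a tiny made-up corpus: when every word of the input is outside the vocabulary, the TF-IDF row is all zeros and Logistic Regression's output depends only on its intercept.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression


def predict_with_confidence(text, model, vectorizer):
    """Hypothetical helper: return (label, probability of that label)."""
    features = vectorizer.transform([text])    # reuse the fitted vectorizer
    probs = model.predict_proba(features)[0]   # one probability per class, ordered like model.classes_
    best = probs.argmax()
    label = "Real" if model.classes_[best] == 1 else "Fake"
    return label, float(probs[best])


# Toy stand-ins for logreg_model and tfidf_vectorizer (1 = real, 0 = fake)
texts = ["reuters said officials", "reuters reported results",
         "shocking truth exposed", "shocking lie revealed"]
labels = [1, 1, 0, 0]
vec = TfidfVectorizer()
model = LogisticRegression().fit(vec.fit_transform(texts), labels)

print(predict_with_confidence("reuters said", model, vec))
print(predict_with_confidence("john", model, vec))  # unseen word: all-zero features
```

A probability close to 0.5 is a sign that the prediction should not be trusted, whichever label it prints.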


## Exporting the Model for the User Interface

In [26]:
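import os
import pickle

model_save_path = "./model/logisticreg_model.pkl"
# Make sure the output folder exists, then save the model using pickle
os.makedirs(os.path.dirname(model_save_path), exist_ok=True)
with open(model_save_path, 'wb') as file:
    pickle.dump(logreg_model, file)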

## Exporting the Vectorizer for the User Interface

In [27]:

# Save the TF-IDF vectorizer that was fitted in In [21]. The model was trained on
# its features, so that exact object is the one to export; refitting here is redundant.
vectorizer_save_path = "./model/tfidf_vectorizer.pkl"
with open(vectorizer_save_path, 'wb') as vectorizer_file:
    pickle.dump(tfidf_vectorizer, vectorizer_file)
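On the user-interface side, both files have to be loaded back and used together: `transform` (never `fit_transform`) with the loaded vectorizer, then `predict` with the loaded model. The user's text should also go through the same `textpreprocess` function first. A round-trip sketch using a temporary folder and toy stand-ins for `logreg_model` and `tfidf_vectorizer` (the file names mirror the ones above). Only unpickle files you trust, and load them with the same scikit-learn version that saved them.

```python
import os
import pickle
import tempfile

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy stand-ins for logreg_model and tfidf_vectorizer (1 = real, 0 = fake)
texts = ["reuters said officials", "reuters reported results",
         "shocking truth exposed", "shocking lie revealed"]
labels = [1, 1, 0, 0]
vectorizer = TfidfVectorizer()
model = LogisticRegression().fit(vectorizer.fit_transform(texts), labels)

with tempfile.TemporaryDirectory() as model_dir:
    model_path = os.path.join(model_dir, "logisticreg_model.pkl")
    vectorizer_path = os.path.join(model_dir, "tfidf_vectorizer.pkl")
    for obj, path in [(model, model_path), (vectorizer, vectorizer_path)]:
        with open(path, "wb") as f:
            pickle.dump(obj, f)

    # What the UI would do: load both objects back
    with open(model_path, "rb") as f:
        loaded_model = pickle.load(f)
    with open(vectorizer_path, "rb") as f:
        loaded_vectorizer = pickle.load(f)

# Transform with the loaded vectorizer (no refitting), then predict
sample = ["reuters said results"]
print(loaded_model.predict(loaded_vectorizer.transform(sample)))
```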