<h1 style='text-align: center;'>Fake News Detection</h1>

### Introduction:  
<div align="justify">This notebook contains the code to build a fake news detection model. Fake news detection is a challenging problem that affects real-world politics and the propagation of information. The speed and ease with which content spreads suggest that people (and the algorithms behind the platforms) are vulnerable to misinformation, whether accidental or intentional. Despite systematic fact-checking efforts, fake news persists, leading people to see and share misleading information.</div>

### Objective:  
<div align="justify">
The main objective of this project is to build a fake news detection system using various machine learning models. The models included in this study are:</div>

- Logistic Regression<br>
- Decision Tree Classification<br>
- Gradient Boosting Classifier<br>
- Random Forest Classifier<br>

### Dataset:  
<div align="justify">The dataset contains two types of articles: fake and real news. This dataset was collected from real-world sources. The truthful articles were obtained by crawling articles from Reuters.com, a news website. The fake news articles were collected from different sources, including unreliable websites flagged by PolitiFact (a fact-checking organization in the USA) and Wikipedia.
The dataset contains different types of articles on various topics, with the majority focusing on political and world news.</div><br>

**The dataset consists of two CSV files:** 
- True.csv
- Fake.csv

**Each article contains the following information:**<br>
- **Title:** The title of the article.
- **Text:** The full text of the article
- **Subject:** The subject of the article
- **Date:** The date the article was published<br>

<div align="justify">The data was collected primarily from 2016 to 2017. The collected data was cleaned and processed, but the punctuation and mistakes that existed in the fake news were kept in the text.</div>

### Analysis Approach:    
<div align="justify">To tackle this problem effectively, I have established a structured data analysis approach. <br>

- **Data Preprocessing:** Clean and preprocess the data, ensuring that text data is ready for model training
- **Model Training:** Train multiple models (Logistic Regression, Decision Tree Classification, Gradient Boosting Classifier, Random Forest Classifier) on the dataset
- **Evaluation:** Evaluate the performance of each model using appropriate metrics
- **Comparative Analysis:** Provide a comprehensive analysis comparing the performance of each model
 </div>

### To Import Libraries:

In [1]:
# Importing pandas for data manipulation and analysis:
import pandas as pd

# Importing numpy for numerical computations:
import numpy as np

# Importing seaborn for statistical data visualization:
import seaborn as sns

# Importing matplotlib.pyplot for plotting graphs:
import matplotlib.pyplot as plt

# Importing train_test_split from sklearn.model_selection to split data into training and testing sets:
from sklearn.model_selection import train_test_split

# Importing accuracy_score from sklearn.metrics for evaluating model accuracy:
from sklearn.metrics import accuracy_score

# Importing classification_report from sklearn.metrics for generating classification metrics:
from sklearn.metrics import classification_report

# Importing re for regular expressions:
import re

# Importing string for string operations:
import string

### To Import Dataset:

In [4]:
# To define the file paths:
fake_news_csv = r"C:\Users\shish\Desktop\Fake_news_detection_project\Fake.csv"
true_news_csv = r"C:\Users\shish\Desktop\Fake_news_detection_project\True.csv"

# To load the CSV files into pandas DataFrames:
df_fake = pd.read_csv(fake_news_csv)
df_true = pd.read_csv(true_news_csv)

In [5]:
# To display the first few rows of the df_fake DataFrame:
df_fake.head()

Unnamed: 0,title,text,subject,date
0,Donald Trump Sends Out Embarrassing New Year’...,Donald Trump just couldn t wish all Americans ...,News,"December 31, 2017"
1,Drunk Bragging Trump Staffer Started Russian ...,House Intelligence Committee Chairman Devin Nu...,News,"December 31, 2017"
2,Sheriff David Clarke Becomes An Internet Joke...,"On Friday, it was revealed that former Milwauk...",News,"December 30, 2017"
3,Trump Is So Obsessed He Even Has Obama’s Name...,"On Christmas day, Donald Trump announced that ...",News,"December 29, 2017"
4,Pope Francis Just Called Out Donald Trump Dur...,Pope Francis used his annual Christmas Day mes...,News,"December 25, 2017"


In [6]:
# To display the first five rows of the df_true DataFrame:
df_true.head(5)

Unnamed: 0,title,text,subject,date
0,"As U.S. budget fight looms, Republicans flip t...",WASHINGTON (Reuters) - The head of a conservat...,politicsNews,"December 31, 2017"
1,U.S. military to accept transgender recruits o...,WASHINGTON (Reuters) - Transgender people will...,politicsNews,"December 29, 2017"
2,Senior U.S. Republican senator: 'Let Mr. Muell...,WASHINGTON (Reuters) - The special counsel inv...,politicsNews,"December 31, 2017"
3,FBI Russia probe helped by Australian diplomat...,WASHINGTON (Reuters) - Trump campaign adviser ...,politicsNews,"December 30, 2017"
4,Trump wants Postal Service to charge 'much mor...,SEATTLE/WASHINGTON (Reuters) - President Donal...,politicsNews,"December 29, 2017"


### To Insert A Column "class" As Target Feature:

In [7]:
# To assign a class label of 0 to the df_fake DataFrame:
df_fake["class"] = 0

# To assign a class label of 1 to the df_true DataFrame:
df_true["class"] = 1

In [8]:
# To display the shapes of df_fake and df_true
# (a notebook cell shows only its last bare expression, so both are printed):
print(df_fake.shape)
print(df_true.shape)

(23481, 5)
(21417, 5)

In [9]:
# To create a DataFrame for manual testing with the last 10 rows of df_fake:
df_fake_manual_testing = df_fake.tail(10)

# To remove those 10 rows from df_fake for training:
df_fake.drop(df_fake_manual_testing.index, axis=0, inplace=True)

# To create a DataFrame for manual testing with the last 10 rows of df_true:
df_true_manual_testing = df_true.tail(10)

# To remove those 10 rows from df_true for training:
df_true.drop(df_true_manual_testing.index, axis=0, inplace=True)

In [10]:
# To display the updated shapes of df_fake and df_true after removing the last 10 rows
# (both are printed because a cell shows only its last bare expression):
print(df_fake.shape)
print(df_true.shape)

(23471, 5)
(21407, 5)

In [11]:
# To work on independent copies of the held-out rows, which avoids pandas' SettingWithCopyWarning:
df_fake_manual_testing = df_fake_manual_testing.copy()
df_true_manual_testing = df_true_manual_testing.copy()

# To assign a class label of 0 to df_fake_manual_testing for manual testing:
df_fake_manual_testing["class"] = 0

# To assign a class label of 1 to df_true_manual_testing for manual testing:
df_true_manual_testing["class"] = 1


In [12]:
# To display the first ten rows of df_fake_manual_testing for manual testing:
df_fake_manual_testing.head(10)

Unnamed: 0,title,text,subject,date,class
23471,Seven Iranians freed in the prisoner swap have...,"21st Century Wire says This week, the historic...",Middle-east,"January 20, 2016",0
23472,#Hashtag Hell & The Fake Left,By Dady Chery and Gilbert MercierAll writers ...,Middle-east,"January 19, 2016",0
23473,Astroturfing: Journalist Reveals Brainwashing ...,Vic Bishop Waking TimesOur reality is carefull...,Middle-east,"January 19, 2016",0
23474,The New American Century: An Era of Fraud,Paul Craig RobertsIn the last years of the 20t...,Middle-east,"January 19, 2016",0
23475,Hillary Clinton: ‘Israel First’ (and no peace ...,Robert Fantina CounterpunchAlthough the United...,Middle-east,"January 18, 2016",0
23476,McPain: John McCain Furious That Iran Treated ...,21st Century Wire says As 21WIRE reported earl...,Middle-east,"January 16, 2016",0
23477,JUSTICE? Yahoo Settles E-mail Privacy Class-ac...,21st Century Wire says It s a familiar theme. ...,Middle-east,"January 16, 2016",0
23478,Sunnistan: US and Allied ‘Safe Zone’ Plan to T...,Patrick Henningsen 21st Century WireRemember ...,Middle-east,"January 15, 2016",0
23479,How to Blow $700 Million: Al Jazeera America F...,21st Century Wire says Al Jazeera America will...,Middle-east,"January 14, 2016",0
23480,10 U.S. Navy Sailors Held by Iranian Military ...,21st Century Wire says As 21WIRE predicted in ...,Middle-east,"January 12, 2016",0


In [13]:
# To display the first ten rows of df_true_manual_testing for manual testing:
df_true_manual_testing.head(10)

Unnamed: 0,title,text,subject,date,class
21407,"Mata Pires, owner of embattled Brazil builder ...","SAO PAULO (Reuters) - Cesar Mata Pires, the ow...",worldnews,"August 22, 2017",1
21408,"U.S., North Korea clash at U.N. forum over nuc...",GENEVA (Reuters) - North Korea and the United ...,worldnews,"August 22, 2017",1
21409,"U.S., North Korea clash at U.N. arms forum on ...",GENEVA (Reuters) - North Korea and the United ...,worldnews,"August 22, 2017",1
21410,Headless torso could belong to submarine journ...,COPENHAGEN (Reuters) - Danish police said on T...,worldnews,"August 22, 2017",1
21411,North Korea shipments to Syria chemical arms a...,UNITED NATIONS (Reuters) - Two North Korean sh...,worldnews,"August 21, 2017",1
21412,'Fully committed' NATO backs new U.S. approach...,BRUSSELS (Reuters) - NATO allies on Tuesday we...,worldnews,"August 22, 2017",1
21413,LexisNexis withdrew two products from Chinese ...,"LONDON (Reuters) - LexisNexis, a provider of l...",worldnews,"August 22, 2017",1
21414,Minsk cultural hub becomes haven from authorities,MINSK (Reuters) - In the shadow of disused Sov...,worldnews,"August 22, 2017",1
21415,Vatican upbeat on possibility of Pope Francis ...,MOSCOW (Reuters) - Vatican Secretary of State ...,worldnews,"August 22, 2017",1
21416,Indonesia to buy $1.14 billion worth of Russia...,JAKARTA (Reuters) - Indonesia will buy 11 Sukh...,worldnews,"August 22, 2017",1


In [14]:
# To concatenate df_fake_manual_testing and df_true_manual_testing vertically into df_manual_testing:
df_manual_testing = pd.concat([df_fake_manual_testing, df_true_manual_testing], axis=0)

# To save df_manual_testing to a CSV file named "manual_testing.csv":
df_manual_testing.to_csv("manual_testing.csv", index=False)

### To Merge True And Fake Dataframes:

In [15]:
# To concatenate df_fake and df_true vertically into df_merge:
df_merge = pd.concat([df_fake, df_true], axis=0)

# To display the first ten rows of df_merge:
df_merge.head(10)

Unnamed: 0,title,text,subject,date,class
0,Donald Trump Sends Out Embarrassing New Year’...,Donald Trump just couldn t wish all Americans ...,News,"December 31, 2017",0
1,Drunk Bragging Trump Staffer Started Russian ...,House Intelligence Committee Chairman Devin Nu...,News,"December 31, 2017",0
2,Sheriff David Clarke Becomes An Internet Joke...,"On Friday, it was revealed that former Milwauk...",News,"December 30, 2017",0
3,Trump Is So Obsessed He Even Has Obama’s Name...,"On Christmas day, Donald Trump announced that ...",News,"December 29, 2017",0
4,Pope Francis Just Called Out Donald Trump Dur...,Pope Francis used his annual Christmas Day mes...,News,"December 25, 2017",0
5,Racist Alabama Cops Brutalize Black Boy While...,The number of cases of cops brutalizing and ki...,News,"December 25, 2017",0
6,"Fresh Off The Golf Course, Trump Lashes Out A...",Donald Trump spent a good portion of his day a...,News,"December 23, 2017",0
7,Trump Said Some INSANELY Racist Stuff Inside ...,In the wake of yet another court decision that...,News,"December 23, 2017",0
8,Former CIA Director Slams Trump Over UN Bully...,Many people have raised the alarm regarding th...,News,"December 22, 2017",0
9,WATCH: Brand-New Pro-Trump Ad Features So Muc...,Just when you might have thought we d get a br...,News,"December 21, 2017",0


In [16]:
# To display the column names of df_merge DataFrame:
df_merge.columns

Index(['title', 'text', 'subject', 'date', 'class'], dtype='object')

### To Remove Unnecessary Columns:

In [17]:
# To drop columns 'title', 'subject', and 'date' from df_merge DataFrame:
df = df_merge.drop(["title", "subject", "date"], axis=1)

In [18]:
# To check for missing values (null values) in each column of the df DataFrame:
df.isnull().sum()

text     0
class    0
dtype: int64

### Random Shuffling the Dataframe:

In [19]:
# To shuffle the rows of the df DataFrame:
df = df.sample(frac=1)

In [20]:
# To display the first few rows of the shuffled df DataFrame:
df.head()

Unnamed: 0,text,class
9696,ZURICH (Reuters) - Companies should not blame ...,1
3840,America will have a fascist in the White House...,0
16887,VALLETTA (Reuters) - Daphne Caruana Galizia po...,1
19751,,0
2726,The State Department has a special cable desig...,0


In [21]:
# To reset the index of df DataFrame and discard the old (shuffled) index in one step:
df.reset_index(drop=True, inplace=True)

In [22]:
# To display the column names of the df DataFrame:
df.columns

Index(['text', 'class'], dtype='object')

In [23]:
# To display the first few rows of the df DataFrame:
df.head()

Unnamed: 0,text,class
0,ZURICH (Reuters) - Companies should not blame ...,1
1,America will have a fascist in the White House...,0
2,VALLETTA (Reuters) - Daphne Caruana Galizia po...,1
3,,0
4,The State Department has a special cable desig...,0


### To Create A Function To Process The Texts:

In [24]:
import re
import string

def wordopt(text):
    # To convert text to lowercase:
    text = text.lower()
    
    # To remove text between square brackets:
    text = re.sub(r'\[.*?\]', '', text)
    
    # To replace every non-word character with a space:
    text = re.sub(r"\W", " ", text)
    
    # Note: from here on the text contains only word characters and spaces, so the
    # URL, tag, and newline patterns below no longer find anything to remove; only
    # underscores and digit-bearing tokens are still stripped.
    
    # To remove URLs:
    text = re.sub(r'https?://\S+|www\.\S+', '', text)
    
    # To remove tags:
    text = re.sub(r'<.*?>+', '', text)
    
    # To remove punctuation:
    text = re.sub('[%s]' % re.escape(string.punctuation), '', text)
    
    # To remove newline characters:
    text = re.sub(r'\n', '', text)
    
    # To remove words containing digits:
    text = re.sub(r'\w*\d\w*', '', text)
    
    return text


In [25]:
# To apply the wordopt function to clean the "text" column in the df DataFrame:
df["text"] = df["text"].apply(wordopt)
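
The order of the substitutions in a cleaning function matters: once punctuation has been replaced by spaces, a URL or tag pattern has nothing left to match. The following is a minimal sketch of a reordered variant, not the function used for the results in this notebook. The name `clean_text` and the sample string are hypothetical, and the final whitespace-collapsing step is an addition that `wordopt` does not have.

```python
import re

def clean_text(text: str) -> str:
    """Hypothetical variant of wordopt that strips URLs and tags before punctuation."""
    text = text.lower()
    # Remove bracketed spans such as "[video]".
    text = re.sub(r"\[.*?\]", "", text)
    # Remove URLs while ':', '/' and '.' are still present in the text.
    text = re.sub(r"https?://\S+|www\.\S+", "", text)
    # Remove HTML-like tags while '<' and '>' are still present.
    text = re.sub(r"<.*?>", "", text)
    # Remove tokens that contain at least one digit.
    text = re.sub(r"\w*\d\w*", "", text)
    # Replace every remaining non-word character with a space.
    text = re.sub(r"\W", " ", text)
    # Collapse runs of whitespace (an extra step, not part of wordopt).
    return re.sub(r"\s+", " ", text).strip()

sample = "Breaking [VIDEO]: visit https://example.com/a1 <b>NOW</b>, 100% true!"
print(clean_text(sample))  # breaking visit now true
```

Swapping this in for `wordopt` would change the vocabulary slightly (for example, URL fragments such as "https" or "com" would no longer become tokens), so the scores reported below would not be expected to match exactly.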

### To Define Dependent And Independent Variables:

In [26]:
# To assign the 'text' column from df to variable x:
x = df["text"]

# To assign the 'class' column from df to variable y:
y = df["class"]

### To Split Training And Testing:

In [27]:
# To split x and y into training and testing sets:
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25)
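
The split above is random, so the exact rows that land in the test set (and therefore the scores) vary from run to run unless a seed is fixed; `train_test_split` accepts a `random_state` argument for this purpose. The idea can be sketched in plain Python as follows. The helper `holdout_split` and the toy data are hypothetical and only illustrate a seeded hold-out split; they are not scikit-learn's implementation.

```python
import random

def holdout_split(items, labels, test_size=0.25, seed=42):
    """Hypothetical helper: shuffle indices with a fixed seed, then cut off a test share."""
    if len(items) != len(labels):
        raise ValueError("items and labels must have the same length")
    indices = list(range(len(items)))
    # A dedicated Random instance keeps the shuffle reproducible and isolated.
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(len(items) * test_size))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    x_tr = [items[i] for i in train_idx]
    x_te = [items[i] for i in test_idx]
    y_tr = [labels[i] for i in train_idx]
    y_te = [labels[i] for i in test_idx]
    return x_tr, x_te, y_tr, y_te

docs = ["a", "b", "c", "d", "e", "f", "g", "h"]
classes = [0, 1, 0, 1, 0, 1, 0, 1]
tr_x, te_x, tr_y, te_y = holdout_split(docs, classes)
print(len(tr_x), len(te_x))  # 6 2
```

Calling the helper twice with the same seed returns the same partition, which is what makes a reported accuracy reproducible.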

### To Convert Text To Vectors:

In [28]:
from sklearn.feature_extraction.text import TfidfVectorizer

# To initialize TfidfVectorizer:
vectorization = TfidfVectorizer()

# To transform x_train into TF-IDF features:
xv_train = vectorization.fit_transform(x_train)

# To transform x_test into TF-IDF features using the vocabulary and IDF weights learned from x_train:
xv_test = vectorization.transform(x_test)
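
To make the fit/transform distinction concrete, here is a small pure-Python sketch of TF-IDF. It assumes the smoothed IDF formula `ln((1 + n) / (1 + df)) + 1` followed by L2 normalization, which to my understanding matches `TfidfVectorizer`'s defaults, and it uses plain whitespace tokenization, which does not. The function names and the toy documents are hypothetical.

```python
import math
from collections import Counter

def fit_idf(docs):
    """Learn one IDF weight per term from the training documents only."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc.split()))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}

def tfidf_transform(doc, idf):
    """Weight term counts by IDF, ignore unseen terms, and L2-normalize."""
    counts = Counter(t for t in doc.split() if t in idf)
    weights = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()} if norm else {}

train_docs = ["trump said tax", "reuters said tax", "reuters said vote"]
idf = fit_idf(train_docs)
vec = tfidf_transform("trump said aliens", idf)
print(sorted(vec))  # ['said', 'trump']
```

The word "aliens" never appeared in the training documents, so it is dropped at transform time, and the rarer term "trump" receives a larger weight than "said", which occurs in every training document.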

### Logistic Regression:

In [29]:
from sklearn.linear_model import LogisticRegression

# To initialize Logistic Regression model:
LR = LogisticRegression()

# To fit the Logistic Regression model on xv_train and y_train:
LR.fit(xv_train, y_train)
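
For intuition, logistic regression passes a weighted sum of the features through a sigmoid to obtain a probability for class 1. The sketch below trains such a model with plain batch gradient descent on a toy two-feature dataset. It is an illustration only: scikit-learn's `LogisticRegression` uses a different solver and applies regularization by default, and every name and number here is hypothetical.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def train_logreg(X, y, lr=0.5, epochs=200):
    """Fit weights and bias by batch gradient descent on the log loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_logreg(X, w, b):
    """Label a row 1 when its predicted probability is at least 0.5."""
    return [int(sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5) for xi in X]

# Toy features: the first column is high for class 0, the second for class 1.
X_toy = [[1.0, 0.0], [0.9, 0.1], [0.1, 0.9], [0.0, 1.0]]
y_toy = [0, 0, 1, 1]
w_toy, b_toy = train_logreg(X_toy, y_toy)
print(predict_logreg(X_toy, w_toy, b_toy))  # [0, 0, 1, 1]
```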

In [30]:
# To generate predictions using the trained Logistic Regression model (LR) on xv_test:
pred_lr = LR.predict(xv_test)

In [31]:
# To calculate the accuracy score of the Logistic Regression model on xv_test and y_test:
LR.score(xv_test, y_test)

0.9877896613190731

In [32]:
from sklearn.metrics import classification_report

# To print the classification report for the Logistic Regression model's predictions:
print(classification_report(y_test, pred_lr))

              precision    recall  f1-score   support

           0       0.99      0.99      0.99      5935
           1       0.99      0.99      0.99      5285

    accuracy                           0.99     11220
   macro avg       0.99      0.99      0.99     11220
weighted avg       0.99      0.99      0.99     11220
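
The columns of the report can be reproduced by hand from the counts of true positives, false positives, and false negatives. A minimal sketch follows, assuming binary labels; the helper `binary_report` and the toy label lists are hypothetical and are not the notebook's data.

```python
def binary_report(y_true, y_pred, positive=1):
    """Compute precision, recall, F1 for one class, plus overall accuracy."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}

toy_true = [0, 0, 0, 1, 1, 1, 1, 0]
toy_pred = [0, 0, 1, 1, 1, 0, 1, 0]
print(binary_report(toy_true, toy_pred, positive=1))
```

Precision answers "of the articles predicted as real, how many were real?", while recall answers "of the real articles, how many were found?". Calling the helper with `positive=0` gives the corresponding row for the fake class.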



### Decision Tree Classification:

In [33]:
from sklearn.tree import DecisionTreeClassifier

# To initialize Decision Tree Classifier:
DT = DecisionTreeClassifier()

# To fit the Decision Tree Classifier on xv_train and y_train:
DT.fit(xv_train, y_train)
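
A decision tree grows by repeatedly choosing the feature and threshold that make the resulting groups as pure as possible; Gini impurity is the default purity measure in `DecisionTreeClassifier`. The sketch below scores candidate thresholds for a single feature. It is a simplification under stated assumptions: the helper names and the toy values (imagined TF-IDF weights of one term) are hypothetical, and candidate thresholds are taken directly from observed values rather than chosen the way scikit-learn chooses them.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 0.0 for a pure group, 0.5 for an even two-class mix."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_threshold(values, labels):
    """Return (threshold, weighted impurity) for the best 'value <= threshold' split."""
    best = (None, float("inf"))
    for threshold in sorted(set(values))[:-1]:
        left = [l for v, l in zip(values, labels) if v <= threshold]
        right = [l for v, l in zip(values, labels) if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (threshold, score)
    return best

# Imagined weights of a single term in six articles, with their class labels.
term_weights = [0.0, 0.0, 0.1, 0.4, 0.5, 0.6]
term_labels = [0, 0, 0, 1, 1, 1]
print(best_threshold(term_weights, term_labels))  # (0.1, 0.0)
```

When a single term separates the classes this cleanly, a tree needs very few splits, which is one possible reason for near-perfect scores on a dataset whose real articles share a distinctive source marker.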

In [34]:
# To generate predictions using the trained Decision Tree Classifier (DT) on xv_test:
pred_dt = DT.predict(xv_test)

In [35]:
# To calculate the accuracy score of the Decision Tree Classifier (DT) on xv_test and y_test:
DT.score(xv_test, y_test)

0.9964349376114082

In [36]:
from sklearn.metrics import classification_report

# To print the classification report for the Decision Tree Classifier's predictions:
print(classification_report(y_test, pred_dt))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00      5935
           1       1.00      1.00      1.00      5285

    accuracy                           1.00     11220
   macro avg       1.00      1.00      1.00     11220
weighted avg       1.00      1.00      1.00     11220



### Gradient Boosting Classifier:

In [37]:
from sklearn.ensemble import GradientBoostingClassifier

# To initialize Gradient Boosting Classifier with random_state set to 0 for reproducibility:
GBC = GradientBoostingClassifier(random_state=0)

# To fit the Gradient Boosting Classifier on xv_train and y_train:
GBC.fit(xv_train, y_train)

In [38]:
# To generate predictions using the trained Gradient Boosting Classifier (GBC) on xv_test:
pred_gbc = GBC.predict(xv_test)

In [39]:
# To calculate the accuracy score of the Gradient Boosting Classifier (GBC) on xv_test and y_test:
GBC.score(xv_test, y_test)

0.9953654188948307

In [40]:
from sklearn.metrics import classification_report

# To print the classification report for the Gradient Boosting Classifier's predictions:
print(classification_report(y_test, pred_gbc))

              precision    recall  f1-score   support

           0       1.00      0.99      1.00      5935
           1       0.99      1.00      1.00      5285

    accuracy                           1.00     11220
   macro avg       1.00      1.00      1.00     11220
weighted avg       1.00      1.00      1.00     11220



### Random Forest Classifier:

In [41]:
from sklearn.ensemble import RandomForestClassifier

# To initialize Random Forest Classifier with random_state set to 0 for reproducibility:
RFC = RandomForestClassifier(random_state=0)

# To fit the Random Forest Classifier on xv_train and y_train:
RFC.fit(xv_train, y_train)

In [42]:
# To generate predictions using the trained Random Forest Classifier (RFC) on xv_test:
pred_rfc = RFC.predict(xv_test)

In [43]:
# To calculate the accuracy score of the Random Forest Classifier (RFC) on xv_test and y_test:
RFC.score(xv_test, y_test)

0.9899286987522281

In [44]:
from sklearn.metrics import classification_report

# To print the classification report for the Random Forest Classifier's predictions:
print(classification_report(y_test, pred_rfc))

              precision    recall  f1-score   support

           0       0.99      0.99      0.99      5935
           1       0.99      0.99      0.99      5285

    accuracy                           0.99     11220
   macro avg       0.99      0.99      0.99     11220
weighted avg       0.99      0.99      0.99     11220



### Model Testing:

In [45]:
def output_label(n):
    # To map a predicted label to a human-readable category:
    if n == 0:
        return "Fake News"
    elif n == 1:
        return "Not Fake News"

def manual_testing(news):
    # To create a dictionary with the input news text:
    testing_news = {"text": [news]}
    
    # To convert the dictionary into a DataFrame:
    new_def_test = pd.DataFrame(testing_news)
    
    # To clean the text using the wordopt function:
    new_def_test["text"] = new_def_test["text"].apply(wordopt)
    
    # To extract the cleaned text for testing:
    new_x_test = new_def_test["text"]
    
    # To transform the cleaned text into TF-IDF features using the fitted vectorization object:
    new_xv_test = vectorization.transform(new_x_test)
    
    # To make predictions using each of the trained models:
    pred_LR = LR.predict(new_xv_test)
    pred_DT = DT.predict(new_xv_test)
    pred_GBC = GBC.predict(new_xv_test)
    pred_RFC = RFC.predict(new_xv_test)

    # To print the predictions for each model:
    print("\n\nLR Prediction: {} \nDT Prediction: {} \nGBC Prediction: {} \nRFC Prediction: {}".format(
        output_label(pred_LR[0]),
        output_label(pred_DT[0]),
        output_label(pred_GBC[0]),
        output_label(pred_RFC[0])))
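
A function that returns its predictions, rather than only printing them, is easier to test and reuse. The following sketch shows that pattern with stand-in models so that it runs on its own. `collect_predictions`, `LABELS`, and `ConstantModel` are hypothetical names; in the notebook the dictionary would instead hold the fitted `LR`, `DT`, `GBC`, and `RFC` objects, and the features would come from `vectorization.transform`.

```python
LABELS = {0: "Fake News", 1: "Not Fake News"}

def collect_predictions(models, features):
    """Return a {model name: readable label} mapping for a single input row."""
    return {name: LABELS[int(model.predict(features)[0])] for name, model in models.items()}

class ConstantModel:
    """Stand-in for a fitted classifier: always predicts the same label."""
    def __init__(self, label):
        self.label = label

    def predict(self, features):
        return [self.label for _ in features]

stub_models = {"LR": ConstantModel(1), "DT": ConstantModel(0)}
results = collect_predictions(stub_models, [[0.0, 1.0]])
for name, label in results.items():
    print(f"{name} Prediction: {label}")
```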


In [47]:
# To prompt the user to enter a news article:
news = str(input())

# To call the manual_testing function with the user-provided news article:
manual_testing(news)

SAO PAULO (Reuters) - Cesar Mata Pires, the owner and co-founder of Brazilian engineering conglomerate OAS SA, one of the largest companies involved in Brazil s corruption scandal, died on Tuesday. He was 68. Mata Pires died of a heart attack while taking a morning walk in an upscale district of S o Paulo, where OAS is based, a person with direct knowledge of the matter said. Efforts to contact his family were unsuccessful. OAS declined to comment. The son of a wealthy cattle rancher in the northeastern state of Bahia, Mata Pires  links to politicians were central to the expansion of OAS, which became Brazil s No. 4 builder earlier this decade, people familiar with his career told Reuters last year. His big break came when he befriended Antonio Carlos Magalh es, a popular politician who was Bahia governor several times, and eventually married his daughter Tereza. Brazilians joked that OAS stood for  Obras Arranjadas pelo Sogro  - or  Work Arranged by the Father-In-Law.   After years of

### Conclusion: 
 <div align="justify"> This project aims to provide insights into the effectiveness of different models for fake news detection. By comparing the performance of various models, I hope to contribute to the ongoing efforts to combat misinformation and improve the reliability of information dissemination.</div>

### Technologies Used:
- Python, version 3 
- NumPy for numerical computations
- Matplotlib and seaborn for data visualization
- Pandas for data manipulation
- Statsmodels for statistical modeling
- scikit-learn (sklearn) for machine learning tasks
- Jupyter Notebook for interactive analysis

### Contact Information:
Created by https://github.com/Erkhanal - feel free to contact!