# **Fake News Prediction System Using Machine Learning**

This notebook builds a Python machine learning model that detects fake news using **Logistic Regression**. It applies **NLP preprocessing with NLTK and TF-IDF feature extraction** to convert the text data into numerical features for classification.

## **Overview of the dataset**

The dataset consists of 2,000 entries with the following features:



*   id: A unique identifier for each news article.
*   title: The headline or title of the news article.
*   author: The name of the author who wrote the article.
*   text: The main content or body of the article.
*   label: The classification label indicating whether the news is real or fake.

The articles come from a range of authors and outlets, and the two label classes are nearly balanced (1,018 vs. 982 articles), so the model sees a comparable number of examples of each class.

In [None]:
# Importing the packages
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import re
import nltk
nltk.download('stopwords')
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score


[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.


## Data Preprocessing

In [None]:
news_dataset = pd.read_csv('/content/Fake_news_dataset.csv')

In [None]:
news_dataset.shape

(2000, 5)

In [None]:
news_dataset.head()

|   | id | title | author | text | label |
|---|----|-------|--------|------|-------|
| 0 | 0 | House Dem Aide: We Didn’t Even See Comey’s Let... | Darrell Lucus | House Dem Aide: We Didn’t Even See Comey’s Let... | 1 |
| 1 | 1 | FLYNN: Hillary Clinton, Big Woman on Campus - ... | Daniel J. Flynn | Ever get the feeling your life circles the rou... | 0 |
| 2 | 2 | Why the Truth Might Get You Fired | Consortiumnews.com | Why the Truth Might Get You Fired October 29, ... | 1 |
| 3 | 3 | 15 Civilians Killed In Single US Airstrike Hav... | Jessica Purkiss | Videos 15 Civilians Killed In Single US Airstr... | 1 |
| 4 | 4 | Iranian woman jailed for fictional unpublished... | Howard Portnoy | Print \nAn Iranian woman has been sentenced to... | 1 |


In [None]:
#counting missing values

news_dataset.isnull().sum()

id          0
title      52
author    211
text        4
label       0


In [None]:
# replacing the null values with a single-space string
news_dataset = news_dataset.fillna(" ")

In [None]:
# label value count
news_dataset['label'].value_counts()

label  count
1       1018
0        982


In [None]:
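# merging author name and title into a single 'content' column
# note: the two columns are joined without a separator, so the end of the author
# name fuses with the first title word (see the output); author + ' ' + title avoids this
news_dataset['content'] = news_dataset['author'] + news_dataset['title']
print(news_dataset['content'])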

0       Darrell LucusHouse Dem Aide: We Didn’t Even Se...
1       Daniel J. FlynnFLYNN: Hillary Clinton, Big Wom...
2       Consortiumnews.comWhy the Truth Might Get You ...
3       Jessica Purkiss15 Civilians Killed In Single U...
4       Howard PortnoyIranian woman jailed for fiction...
                              ...                        
1995    Guest AuthorNo, Hate Crimes Have NOT ‘Intensif...
1996                            Everything gentrification
1997    Nate ChurchUbisoft Surprises with ’Mario’ Cros...
1998    Jeff PoorTurley: Trump Making Same Arguments a...
1999    Ari LiebermanFrom Bad to Worse: Obama’s Ransom...
Name: content, Length: 2000, dtype: object
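
The printed rows show the author and title fused together (for example `Consortiumnews.comWhy`) because the two columns are concatenated without a separator. The effect on the resulting tokens can be sketched with plain Python strings instead of pandas, using one author/title pair from the table above. The `tokens` helper is hypothetical and only mimics the letters-only cleanup that the stemming step applies:

```python
import re

author = "Consortiumnews.com"
title = "Why the Truth Might Get You Fired"

def tokens(text):
    # Keep letters only, lowercase, and split on whitespace.
    return re.sub(r"[^a-zA-Z]", " ", text).lower().split()

fused = author + title           # what the notebook cell does
spaced = author + " " + title    # alternative with a separator

print(tokens(fused))   # 'com' and 'why' end up as the single token 'comwhy'
print(tokens(spaced))  # 'com' and 'why' stay separate tokens
```

Fused tokens such as `comwhy` become their own vocabulary entries, so a separator is usually the safer choice if the notebook is rerun.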


In [None]:
# separating the features from the labels
X = news_dataset.drop('label', axis = 1)
Y = news_dataset['label']
print(X)
print(Y)


        id                                              title  \
0        0  House Dem Aide: We Didn’t Even See Comey’s Let...   
1        1  FLYNN: Hillary Clinton, Big Woman on Campus - ...   
2        2                  Why the Truth Might Get You Fired   
3        3  15 Civilians Killed In Single US Airstrike Hav...   
4        4  Iranian woman jailed for fictional unpublished...   
...    ...                                                ...   
1995  1995  No, Hate Crimes Have NOT ‘Intensified’ Since T...   
1996  1996                          Everything gentrification   
1997  1997  Ubisoft Surprises with ’Mario’ Crossover Title...   
1998  1998  Turley: Trump Making Same Arguments as Obama i...   
1999  1999  From Bad to Worse: Obama’s Ransom Payment to I...   

                  author                                               text  \
0          Darrell Lucus  House Dem Aide: We Didn’t Even See Comey’s Let...   
1        Daniel J. Flynn  Ever get the feeling your life circ

### Stemming: reducing words to their root form

In [None]:
port_stem = PorterStemmer()



In [None]:
# build the stop word set once; a set lookup is much faster than
# calling stopwords.words('english') again for every word
english_stopwords = set(stopwords.words('english'))

def stemming(content):
    # keep letters only, lowercase, and split into words
    words = re.sub('[^a-zA-Z]', ' ', content).lower().split()
    # drop stop words and stem the remaining words
    stemmed_words = [port_stem.stem(word) for word in words if word not in english_stopwords]
    return ' '.join(stemmed_words)

In [None]:
news_dataset['content'] = news_dataset['content'].apply(stemming)
print(news_dataset['content'])

0       darrel lucushous dem aid even see comey letter...
1       daniel j flynnflynn hillari clinton big woman ...
2               consortiumnew comwhi truth might get fire
3       jessica purkiss civilian kill singl us airstri...
4       howard portnoyiranian woman jail fiction unpub...
                              ...                        
1995    guest authorno hate crime intensifi sinc trump...
1996                                      everyth gentrif
1997    nate churchubisoft surpris mario crossov titl ...
1998    jeff poorturley trump make argument obama defe...
1999    ari liebermanfrom bad wors obama ransom paymen...
Name: content, Length: 2000, dtype: object
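
The steps that run before the stemmer can be traced on a single row. The sketch below is a simplified stand-in rather than the notebook's function: `STOP_WORDS` is a tiny hypothetical list (the notebook uses NLTK's full English list), the input is a shortened fragment of row 3, and no stemming is applied.

```python
import re

# Hypothetical, tiny stop word list for illustration only.
STOP_WORDS = {"in", "the", "a", "to", "of"}

def clean(content):
    # Digits and punctuation become spaces, so "Purkiss15" splits into "Purkiss" only.
    letters_only = re.sub(r"[^a-zA-Z]", " ", content)
    words = letters_only.lower().split()
    # Drop stop words; the remaining words would then go to the stemmer.
    return [word for word in words if word not in STOP_WORDS]

raw = "Jessica Purkiss15 Civilians Killed In Single US Airstrike"
print(clean(raw))
```

The Porter stemmer then reduces these words to stems such as `civilian`, `kill`, and `singl`, which is what row 3 of the printed output shows.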


In [None]:
#separating the data and the label
X = news_dataset['content'].values
Y = news_dataset['label'].values

print(X)
print(Y)

['darrel lucushous dem aid even see comey letter jason chaffetz tweet'
 'daniel j flynnflynn hillari clinton big woman campu breitbart'
 'consortiumnew comwhi truth might get fire' ...
 'nate churchubisoft surpris mario crossov titl e press confer breitbart'
 'jeff poorturley trump make argument obama defens immigr execut order breitbart'
 'ari liebermanfrom bad wors obama ransom payment iran tip iceberg']
[1 0 1 ... 0 0 1]


In [None]:
# converting the text data to numerical data (TF-IDF features)
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(X)

print(X)
X.shape

  (0, 5950)	0.3017426468860144
  (0, 5090)	0.26718920587942463
  (0, 3411)	0.35853965054812
  (0, 3302)	0.30873143642309686
  (0, 2948)	0.24982533665516557
  (0, 1875)	0.2541464514769773
  (0, 1460)	0.3017426468860144
  (0, 1389)	0.3401569206868321
  (0, 1135)	0.26426319775691437
  (0, 948)	0.35853965054812
  (0, 132)	0.2855719357407125
  (1, 6284)	0.37678797453550716
  (1, 2613)	0.2500672279654395
  (1, 2090)	0.4683149318779968
  (1, 1378)	0.3498990344519622
  (1, 1072)	0.2500672279654395
  (1, 859)	0.4683149318779968
  (1, 746)	0.19125536227852988
  (1, 601)	0.3678557957818516
  (2, 5934)	0.39443660287710897
  (2, 3677)	0.44438221209958667
  (2, 2261)	0.3223713124505221
  (2, 2035)	0.3290969530232737
  (2, 1199)	0.44438221209958667
  (2, 1165)	0.4870735035592132
  :	:
  (1997, 1315)	0.38191147208808407
  (1997, 1177)	0.38191147208808407
  (1997, 1028)	0.38191147208808407
  (1997, 746)	0.14797230532787073
  (1998, 5929)	0.15045543284523258
  (1998, 4378)	0.4073079025027321
  (1998, 40

(2000, 6391)
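
Each printed entry `(row, column) value` is the TF-IDF weight of one vocabulary term in one document, and the shape `(2000, 6391)` means 2,000 documents by 6,391 distinct terms. The weighting can be sketched in pure Python, assuming scikit-learn's documented defaults (raw term counts, smoothed IDF `ln((1 + n) / (1 + df)) + 1`, and L2 row normalization). The three documents and whitespace tokenization are illustrative assumptions; `TfidfVectorizer` uses its own tokenizer, which by default ignores single-character tokens.

```python
import math
from collections import Counter

# Hypothetical mini-corpus built from stems that appear in the output above.
docs = [
    "truth might get fire",
    "trump make argument",
    "truth trump",
]

def tfidf(docs):
    tokenized = [doc.split() for doc in docs]
    n = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in tokenized for term in set(doc))
    idf = {term: math.log((1 + n) / (1 + df[term])) + 1 for term in df}
    rows = []
    for doc in tokenized:
        counts = Counter(doc)
        weights = {term: count * idf[term] for term, count in counts.items()}
        # L2-normalize so every document vector has length 1.
        norm = math.sqrt(sum(w * w for w in weights.values()))
        rows.append({term: w / norm for term, w in weights.items()})
    return rows

rows = tfidf(docs)
print(rows[0])
```

In the first document, `truth` appears in two of the three documents and therefore gets a lower weight than `fire`, which appears in only one.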

## Training and Testing: Logistic regression model

In [None]:
# Splitting the dataset into training and test data
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, stratify=Y, random_state=2)
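
With `test_size=0.2`, 400 of the 2,000 articles are held out for testing, and `stratify=Y` keeps the ratio of the two labels roughly the same in both parts. The idea can be sketched in pure Python as below. This is an illustration of stratified splitting under simple assumptions, not scikit-learn's implementation, and `stratified_split` is a hypothetical helper.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_size=0.2, seed=2):
    """Return (train_idx, test_idx), splitting each class in the same ratio."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_size)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return train_idx, test_idx

# Class counts taken from the value_counts() output in this notebook.
labels = [1] * 1018 + [0] * 982
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

Without stratification, a random split could by chance give the test set a noticeably different class mix from the training set, which makes the accuracy figures harder to compare.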

In [None]:
model = LogisticRegression()
model.fit(X_train, Y_train)
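
For a binary problem, the fitted model holds one weight per TF-IDF column plus an intercept, and it predicts class 1 when the sigmoid of the weighted sum exceeds 0.5. A minimal sketch of that decision rule follows. The weights, bias, and feature row are hypothetical values for illustration; in the notebook they would come from `model.coef_`, `model.intercept_`, and a row of `X`.

```python
import math

def predict_proba(weights, bias, features):
    """Probability of class 1 for one sparse row given as {column: value}."""
    z = bias + sum(weights.get(col, 0.0) * val for col, val in features.items())
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, features, threshold=0.5):
    # Class 1 only when the probability is strictly above the threshold.
    return 1 if predict_proba(weights, bias, features) > threshold else 0

# Hypothetical parameters and input row.
weights = {0: 2.0, 1: -1.5}
bias = -0.1
row = {0: 0.6, 1: 0.2}

print(predict_proba(weights, bias, row), predict(weights, bias, row))
```

Terms with positive weights push a document toward label 1 and terms with negative weights push it toward label 0, which is why the learned coefficients can be inspected to see which stems drive the predictions.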

### Evaluation: accuracy score

In [None]:
# accuracy score on the training data
X_train_prediction = model.predict(X_train)
print(X_train_prediction)
# accuracy_score expects the true labels first, then the predictions
training_data_accuracy = accuracy_score(Y_train, X_train_prediction)
print('Accuracy score of the training data :', training_data_accuracy)

[0 1 0 ... 1 0 1]
Accuracy score of the training data : 0.954375


In [None]:
# accuracy score on the test data
X_test_prediction = model.predict(X_test)
# accuracy_score expects the true labels first, then the predictions
test_data_accuracy = accuracy_score(Y_test, X_test_prediction)
print('Accuracy score of the test data :', test_data_accuracy)


Accuracy score of the test data : 0.92
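
Accuracy is the fraction of predictions that match the true labels. With a 20% test split of 2,000 rows (400 articles), a score of 0.92 corresponds to 368 correct predictions and 32 errors. A minimal sketch of the computation, together with the per-class counts that a single accuracy figure hides, is shown below. The two label lists are hypothetical toy data, not the notebook's predictions.

```python
def accuracy(y_true, y_pred):
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for t, p in zip(y_true, y_pred):
        if p == positive:
            counts["tp" if t == positive else "fp"] += 1
        else:
            counts["tn" if t != positive else "fn"] += 1
    return counts

# Hypothetical toy labels for illustration.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print(accuracy(y_true, y_pred))
print(confusion_counts(y_true, y_pred))
```

Applying the same counts to `Y_test` and `X_test_prediction` would show whether the 32 test errors are spread evenly across the two labels or concentrated in one of them.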
