<a href="https://colab.research.google.com/github/Rupsha-Chatterjee/my_projects/blob/main/FakeNewsDetection.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## **What is Fake News?**

Fake news is, at its simplest, false or misleading information presented as news. Nowadays it spreads like wildfire because people share it without verifying it. It is often created to promote or reinforce particular beliefs, frequently in service of a political agenda.

Online advertising revenue depends on drawing users to a website, which gives some publishers a financial incentive to post sensational false stories. As a result, it is vital to be able to recognize fake news.

## **Step 1: Importing Libraries.**

In [1]:
import pandas as pd  ## fast, flexible, and expressive data structures (DataFrame, Series)
import numpy as np  ## used to manipulate arrays
import seaborn as sns  ## statistical plotting library built on top of Matplotlib
import matplotlib.pyplot as plt  ## plotting library used for drawing charts
from sklearn.model_selection import train_test_split  ## splits the data into random training and testing subsets
from sklearn.metrics import accuracy_score  ## fraction of predictions that are correct
from sklearn.metrics import classification_report  ## per-class precision, recall, and F1-score of a classifier's predictions
import re  ## regular-expression matching and substitution
import string  ## string constants such as string.punctuation

## **Step 2: Importing the Dataset**

In [2]:
data_fake = pd.read_csv('/content/drive/MyDrive/Fake news Detection Dataset/Fake.csv')
data_true = pd.read_csv('/content/drive/MyDrive/Fake news Detection Dataset/True.csv')

In [3]:
data_fake.head()

Unnamed: 0,title,text,subject,date
0,Donald Trump Sends Out Embarrassing New Year’...,Donald Trump just couldn t wish all Americans ...,News,"December 31, 2017"
1,Drunk Bragging Trump Staffer Started Russian ...,House Intelligence Committee Chairman Devin Nu...,News,"December 31, 2017"
2,Sheriff David Clarke Becomes An Internet Joke...,"On Friday, it was revealed that former Milwauk...",News,"December 30, 2017"
3,Trump Is So Obsessed He Even Has Obama’s Name...,"On Christmas day, Donald Trump announced that ...",News,"December 29, 2017"
4,Pope Francis Just Called Out Donald Trump Dur...,Pope Francis used his annual Christmas Day mes...,News,"December 25, 2017"


In [4]:
data_true.head()

Unnamed: 0,title,text,subject,date
0,"As U.S. budget fight looms, Republicans flip t...",WASHINGTON (Reuters) - The head of a conservat...,politicsNews,"December 31, 2017"
1,U.S. military to accept transgender recruits o...,WASHINGTON (Reuters) - Transgender people will...,politicsNews,"December 29, 2017"
2,Senior U.S. Republican senator: 'Let Mr. Muell...,WASHINGTON (Reuters) - The special counsel inv...,politicsNews,"December 31, 2017"
3,FBI Russia probe helped by Australian diplomat...,WASHINGTON (Reuters) - Trump campaign adviser ...,politicsNews,"December 30, 2017"
4,Trump wants Postal Service to charge 'much mor...,SEATTLE/WASHINGTON (Reuters) - President Donal...,politicsNews,"December 29, 2017"


## **Step 3: Assigning Classes to the Dataset**

In [5]:
data_fake['class'] = 0
data_true['class'] = 1

## **Step 4: Checking Number of Rows and Columns in the Dataset**

In [7]:
data_fake.shape, data_true.shape

((23481, 5), (21417, 5))

## **Step 5: Manual Testing for Both Datasets**

Manual testing here means checking the trained models by hand on articles they have never seen. To make that possible, the last 10 rows of each dataset are set aside and removed from the data that will be used for training.

In [8]:
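## Set aside the last 10 rows of each dataset for manual testing
data_fake_manual_testing = data_fake.tail(10)
## Remove those same rows (index labels 23471 to 23480) from the fake-news data
data_fake.drop(data_fake_manual_testing.index, axis=0, inplace=True)

data_true_manual_testing = data_true.tail(10)
## Remove index labels 21407 to 21416 from the true-news data
data_true.drop(data_true_manual_testing.index, axis=0, inplace=True)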

## **Step 6: Assigning Classes to the Manual Testing Data**

In [10]:
data_fake_manual_testing['class'] = 0
data_true_manual_testing['class'] = 1

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data_fake_manual_testing['class'] = 0

The same warning is emitted for data_true_manual_testing['class'] = 1. This is pandas' SettingWithCopyWarning: each manual-testing frame was taken from its parent with tail(10), so pandas cannot tell whether the assignment was meant to reach the parent. The assignment still applies to the manual-testing frames.
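
The warning above comes from assigning a column to a frame that was sliced out of another frame. A minimal sketch of one common way to avoid it, on a made-up toy frame (`frame` and `held_out` are illustrative names, not part of this notebook): take an explicit copy with `.copy()` before assigning. Whether the warning appears at all depends on the pandas version in use.

```python
import pandas as pd

# Toy stand-in for data_fake; the text values are invented for illustration.
frame = pd.DataFrame({"text": [f"article {i}" for i in range(12)]})

# .copy() makes held_out an independent DataFrame, so assigning a new
# column to it is unambiguous and does not touch `frame`.
held_out = frame.tail(10).copy()
held_out["class"] = 0

print(held_out.shape)
print("class" in frame.columns)
```

The parent frame is left without a `class` column, which shows that the assignment only affected the copy.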


## **Step 7: Merging Both Datasets**

In [11]:
data_merge = pd.concat([data_fake, data_true], axis = 0) ## stack the two datasets row-wise using the concat function
data_merge.head(10)

Unnamed: 0,title,text,subject,date,class
0,Donald Trump Sends Out Embarrassing New Year’...,Donald Trump just couldn t wish all Americans ...,News,"December 31, 2017",0
1,Drunk Bragging Trump Staffer Started Russian ...,House Intelligence Committee Chairman Devin Nu...,News,"December 31, 2017",0
2,Sheriff David Clarke Becomes An Internet Joke...,"On Friday, it was revealed that former Milwauk...",News,"December 30, 2017",0
3,Trump Is So Obsessed He Even Has Obama’s Name...,"On Christmas day, Donald Trump announced that ...",News,"December 29, 2017",0
4,Pope Francis Just Called Out Donald Trump Dur...,Pope Francis used his annual Christmas Day mes...,News,"December 25, 2017",0
5,Racist Alabama Cops Brutalize Black Boy While...,The number of cases of cops brutalizing and ki...,News,"December 25, 2017",0
6,"Fresh Off The Golf Course, Trump Lashes Out A...",Donald Trump spent a good portion of his day a...,News,"December 23, 2017",0
7,Trump Said Some INSANELY Racist Stuff Inside ...,In the wake of yet another court decision that...,News,"December 23, 2017",0
8,Former CIA Director Slams Trump Over UN Bully...,Many people have raised the alarm regarding th...,News,"December 22, 2017",0
9,WATCH: Brand-New Pro-Trump Ad Features So Muc...,Just when you might have thought we d get a br...,News,"December 21, 2017",0
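
One detail of row-wise `pd.concat` that is easy to miss: by default each input keeps its own row labels, so the merged frame can contain repeated index values. The sketch below uses two tiny made-up frames (`fake` and `true` are illustrative names) to show the default behavior next to `ignore_index=True`, which renumbers the rows.

```python
import pandas as pd

# Two tiny stand-ins for the fake and true datasets.
fake = pd.DataFrame({"text": ["a", "b"], "class": 0})
true = pd.DataFrame({"text": ["c", "d"], "class": 1})

# Default: original row labels are kept, so 0 and 1 each appear twice.
merged = pd.concat([fake, true], axis=0)
print(merged.index.tolist())

# ignore_index=True assigns a fresh 0..n-1 index to the result.
merged_reset = pd.concat([fake, true], axis=0, ignore_index=True)
print(merged_reset.index.tolist())
```

Repeated labels do not affect the model training in this notebook, but they matter if rows are later selected by label.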


## **Step 8: Dropping Unwanted Columns**

In [12]:
data = data_merge.drop(['title', 'subject', 'date'], axis = 1) ## drops the columns that are not used as features

## **Step 9: Create a Function to Clean Text**

In [13]:
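def wordopt(text):
  text = text.lower()
  text = re.sub(r'\[.*?\]', '', text)  ## remove text inside square brackets
  text = re.sub(r'https?://\S+|www\.\S+', '', text)  ## remove URLs while their punctuation is still intact
  text = re.sub(r'<.*?>+', '', text)  ## remove HTML tags
  text = re.sub(r'\W', ' ', text)  ## replace every non-word character with a space
  text = re.sub('[%s]' % re.escape(string.punctuation), '', text)  ## remove leftover punctuation (the underscore)
  text = re.sub(r'\n', '', text)  ## remove any remaining newlines
  text = re.sub(r'\w*\d\w*', '', text)  ## remove words that contain digits
  return text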

re.escape() : Escapes every special regular-expression character in a string so that the string is matched literally. Here it makes the punctuation characters safe to place inside a character class.

re.sub() : The sub() function of Python's regular-expression (re) module. It returns a copy of the string in which every match of the given pattern is replaced by the replacement string. The re module must be imported before this function can be used.

string.punctuation : A pre-initialized string constant (not a function) that contains all ASCII punctuation characters.
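
The three pieces described above can be tried on a short string. This is a small sketch with a made-up sample sentence, not the notebook's cleaning function; it only shows what each call returns.

```python
import re
import string

sample = "Breaking [VIDEO] news, read more!"

# re.escape() backslash-escapes regex metacharacters so they match literally.
escaped = re.escape("a.b*c")

# A character class that matches any single ASCII punctuation character.
punct_pattern = "[%s]" % re.escape(string.punctuation)

# re.sub(pattern, replacement, text) replaces every match in the text.
no_brackets = re.sub(r"\[.*?\]", "", sample)
no_punct = re.sub(punct_pattern, "", no_brackets)

print(escaped)
print(no_brackets)
print(no_punct)
```

Removing the bracketed tag leaves a double space behind, which is harmless for the bag-of-words features built later.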

## **Step 10: Applying Function to Text Column and Assigning X and Y**

In [18]:
data['text'] = data['text'].apply(wordopt) ## applies the cleaning function to every value in the 'text' column
x = data['text']
y = data['class']

## **Step 11: Defining Training and Testing Data and Splitting Them Into a 75-25 Percent Ratio.**

In [19]:
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size= 0.25)
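
A small sketch of what a 75-25 split looks like on toy data. The `random_state` and `stratify` arguments are optional `train_test_split` parameters that the cell above does not use; they are shown here as an assumption about what a reader may want, since they make the split repeatable and keep the class balance the same in both parts. The toy lists and variable names are made up.

```python
from sklearn.model_selection import train_test_split

# 20 toy documents, 10 per class.
texts = [f"document {i}" for i in range(20)]
labels = [0] * 10 + [1] * 10

# test_size=0.25 sends a quarter of the rows to the test set.
# random_state fixes the shuffle; stratify preserves the class proportions.
x_tr, x_te, y_tr, y_te = train_test_split(
    texts, labels, test_size=0.25, random_state=42, stratify=labels
)

print(len(x_tr), len(x_te))
```

Without `random_state`, each run produces a different split, so accuracy figures will vary slightly from run to run.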

## **Step 12: Converting Raw Data Into Matrix for Further Process.**

In [20]:
from sklearn.feature_extraction.text import TfidfVectorizer
vectorization = TfidfVectorizer()
xv_train = vectorization.fit_transform(x_train)
xv_test = vectorization.transform(x_test)

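TfidfVectorizer : The TfidfVectorizer turns a set of raw documents into a TF-IDF feature matrix.

fit_transform : Learns the vocabulary and IDF weights from the training data and returns the transformed training matrix. The test data is passed through transform only, so it is encoded with the vocabulary and weights learned from the training data.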
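
The difference between fitting on the training data and only transforming the test data can be seen on a toy corpus. This is a sketch with made-up sentences and variable names; it relies on the fitted vectorizer's `vocabulary_` attribute, which maps each learned term to its column.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

train_docs = ["the senate passed the bill", "aliens passed the senate"]
test_docs = ["the bill mentions aliens and unicorns"]

vec = TfidfVectorizer()
train_matrix = vec.fit_transform(train_docs)  # learns vocabulary and IDF weights
test_matrix = vec.transform(test_docs)        # reuses them, learns nothing new

print(sorted(vec.vocabulary_))
print(train_matrix.shape, test_matrix.shape)
```

Words that appear only in the test document, such as "unicorns", have no column and are ignored, which is why both matrices have the same number of columns.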

## **Step 13: Creating First Model.**

In [21]:
from sklearn.linear_model import LogisticRegression
LR = LogisticRegression()
LR.fit(xv_train, y_train)

LogisticRegression : Based on a set of independent variables, logistic regression estimates the probability of an event occurring, such as voting or not voting. Because the output is a probability, it is limited to values between 0 and 1.

LR.fit : Trains the model. Linear regression fits a line to the data in order to predict a new quantity, whereas logistic regression fits a linear decision boundary that best separates the two classes. The input data is given by X with n examples, and the output by y with one label for each input.
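
To make the "probability between 0 and 1" point concrete, here is a minimal sketch of how a logistic model turns a linear score into a probability using the sigmoid function. The weights, bias, and helper names are hypothetical and chosen for illustration; they are not taken from the fitted LR model.

```python
import math

def sigmoid(z: float) -> float:
    """Map any real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, bias, features):
    """Probability of class 1 for one example under a logistic model."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(score)

# Hypothetical weights for two features.
weights, bias = [2.0, -1.0], 0.0
p = predict_proba(weights, bias, [1.0, 1.0])  # linear score is 2.0 - 1.0 = 1.0
label = 1 if p >= 0.5 else 0

print(round(p, 3), label)
```

A score of 0 maps to exactly 0.5, so the sign of the linear score decides which class is predicted.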

## **Step 14: Checking the Model Accuracy and Classification Report**

In [28]:
pred_lr = LR.predict(xv_test)
LR.score(xv_test, y_test)
print("Logistic Regression Model Accuracy: ", accuracy_score(y_test, pred_lr), "\n")
print("Classification Report: \n")
print(classification_report(y_test, pred_lr))

Logistic Regression Model Accuracy:  0.9871657754010695 

Classification Report: 

              precision    recall  f1-score   support

           0       0.99      0.98      0.99      5870
           1       0.98      0.99      0.99      5350

    accuracy                           0.99     11220
   macro avg       0.99      0.99      0.99     11220
weighted avg       0.99      0.99      0.99     11220



## **Step 15: Creating a Second Model.**

In [29]:
from sklearn.tree import DecisionTreeClassifier
DT = DecisionTreeClassifier()
DT.fit(xv_train, y_train)

DecisionTreeClassifier : The DecisionTreeClassifier class can perform multi-class classification on a dataset. If several classes share the same, highest probability, the classifier predicts the class with the lowest index among them.
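
The tie-breaking rule described above can be checked on a deliberately ambiguous toy dataset. In this sketch (made-up data and variable names) two identical inputs carry different labels, so the tree cannot separate them and its single leaf holds one sample of each class.

```python
from sklearn.tree import DecisionTreeClassifier

# Identical feature values with conflicting labels: no split is possible.
X = [[0.0], [0.0]]
y = [1, 0]

tree = DecisionTreeClassifier(random_state=0).fit(X, y)

proba = tree.predict_proba([[0.0]])[0]  # class probabilities in the leaf
pred = tree.predict([[0.0]])[0]         # ties go to the lowest class index

print(tree.classes_, proba, pred)
```

Both classes get probability 0.5, and the prediction falls on class 0, the first entry of `classes_`.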

## **Step 16: Checking the Model Accuracy and Classification Report**

In [33]:
pred_dt = DT.predict(xv_test)
DT.score (xv_test, y_test)
print("Decision Tree Model Accuracy: ", accuracy_score(y_test, pred_dt), "\n")
print("Classification Report: \n")
print(classification_report (y_test, pred_dt))

Decision Tree Model Accuracy:  0.9959893048128342 

Classification Report: 

              precision    recall  f1-score   support

           0       1.00      1.00      1.00      5870
           1       1.00      1.00      1.00      5350

    accuracy                           1.00     11220
   macro avg       1.00      1.00      1.00     11220
weighted avg       1.00      1.00      1.00     11220



## **Step 17: Checking Fake News**

In [38]:
def output_label(n):
  ## Map the numeric class back to a readable label
  if n == 0:
    return "Fake News"
  elif n == 1:
    return "Not Fake News"

def manual_testing(news):
  ## Clean and vectorize a single article the same way as the training data
  testing_news = {"text":[news]}
  new_def_test = pd.DataFrame(testing_news)
  new_def_test["text"] = new_def_test["text"].apply(wordopt)
  new_x_test = new_def_test["text"]
  new_xv_test = vectorization.transform(new_x_test)
  pred_LR = LR.predict(new_xv_test)
  pred_DT = DT.predict(new_xv_test)

  print("\n\nLR Prediction: {} \nDT Prediction: {}".format(output_label(pred_LR[0]), output_label(pred_DT[0])))

In [39]:
news = input()  ## input() already returns a string
manual_testing(news)

 Pro-Russian users have often repeated the Kremlin's original position that the invasion of Ukraine is a "special military operation" to "denazify" and "demilitarise" a "Neo-Nazi state". Many have downplayed allegations of Russian war crimes or even claimed that the war is a supposed "hoax". In one widely shared video, a news reporter could be seen standing in front of lines of body bags, one of which was moving. However, the footage did not show invented war casualties in Ukraine, but a "Fridays for Future" climate change protest in Vienna in February, three weeks before the invasion began. Days later, another viral video of a mannequin claimed to be proof that Ukrainian authorities had "staged" the mass killing of civilians in the town of Bucha. The misleading clip showed a prosthetic doll being dressed and prepared by two men. Nadezhda, an assistant director for a Russian television programme, confirmed to Euronews that the video showed their film set near St. Petersburg and n