## Fake News Classifier
This project identifies fake news articles. Fake news is false or misleading information presented as news. It often aims to damage the reputation of a person or entity, or to make money through advertising revenue.
<br>The aim of this model is to classify a news article as "FAKE" or "REAL".<br>
<br><code>Dataset link : <link>https://drive.google.com/file/d/1-uryVEgAsSPb8NWnj2c6GXQNhuEPP-8z/view?usp=sharing</link></code>
#### Tools used for classification process:
1. **Tfidf Vectorizer :** It can be separated into two parts: (a) **TF (Term Frequency)** : How many times a word occurs in a document. A higher value means the term appears more often in that document, so the document is a better match when the term is part of the search terms. (b) **IDF (Inverse Document Frequency)** : A measure of whether a term is common or rare across a given document corpus. It is obtained by dividing the total number of documents by the number of documents in the corpus that contain the term, and then usually taking the logarithm of that ratio. The TF-IDF weight of a term is the product of the two.
2. **Passive Aggressive Classifier :** Passive Aggressive algorithms are a family of online learning algorithms, often used for large-scale data. They take in data sequentially, one example at a time, rather than processing the whole dataset in a single chunk. Each update has two possible outcomes: **1. Passive:** If the prediction is correct, keep the model and do not make any changes, i.e., the example does not carry enough information to change the model. **2. Aggressive:** If the prediction is incorrect, change the model just enough to correct it.
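
As a rough illustration of the two TF-IDF parts described above, the sketch below computes both by hand for a made-up three-document corpus. It uses relative term frequency and the textbook form `idf = log(N / df)`; scikit-learn's `TfidfVectorizer` applies a smoothed variant and normalizes each row, so its numbers will differ. The corpus and the function names are illustrative only.

```python
import math
from collections import Counter

# Toy corpus (made up for illustration, not taken from the dataset)
corpus = [
    "garlic cures the virus",
    "the senate passed the bill",
    "the virus spread quickly",
]
docs = [doc.split() for doc in corpus]

def term_frequency(term, doc):
    """Share of the document's words that are `term`."""
    return Counter(doc)[term] / len(doc)

def inverse_document_frequency(term, docs):
    """log(total documents / documents containing the term).

    Assumes the term occurs in at least one document.
    """
    containing = sum(1 for doc in docs if term in doc)
    return math.log(len(docs) / containing)

def tf_idf(term, doc, docs):
    return term_frequency(term, doc) * inverse_document_frequency(term, docs)

# "the" occurs in every document, so its IDF (and TF-IDF) is zero
print(tf_idf("the", docs[0], docs))
# "garlic" occurs in only one document, so it gets a positive weight
print(tf_idf("garlic", docs[0], docs))
```

Words that appear everywhere carry no weight, while words that are specific to one document stand out, which is what makes the representation useful for classification.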
<br>
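
The passive and aggressive behaviour described above can be sketched for a single training example. This is a simplified version of the PA-I update rule (hinge loss, with the step size capped by an aggressiveness parameter `C`), written in plain Python with labels encoded as +1 and -1. It is not scikit-learn's internal implementation, and the function name `pa_update` is made up for this example. Strictly speaking, the model stays passive only when the example is classified correctly with a margin of at least 1.

```python
def pa_update(weights, x, y, C=1.0):
    """One PA-I step. `y` is +1 or -1; `x` and `weights` are lists of floats."""
    score = sum(w * xi for w, xi in zip(weights, x))
    loss = max(0.0, 1.0 - y * score)        # hinge loss
    if loss == 0.0:
        return list(weights)                # passive: leave the model as is
    sq_norm = sum(xi * xi for xi in x)
    if sq_norm == 0.0:
        return list(weights)                # all-zero input, nothing to learn
    tau = min(C, loss / sq_norm)            # aggressive: step size, capped by C
    return [w + tau * y * xi for w, xi in zip(weights, x)]

weights = [0.0, 0.0]
x, y = [1.0, 2.0], 1

updated = pa_update(weights, x, y)          # loss is 1, so the weights move
unchanged = pa_update([1.0, 1.0], x, y)     # score 3 >= 1, so nothing changes
print(updated, unchanged)
```

The step size `tau` is the smallest change that brings the example to a margin of 1, unless `C` caps it first.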

#### Steps to Follow :
1. Import the necessary libraries given in the code.
2. Load the CSV data file into a dataframe.
3. Do the required preprocessing, such as checking for null values and dropping them, since text data cannot be imputed.
4. Create training and test data using the **"text"** column, which contains the raw text, and the **"label"** column.
5. One of the most important steps in this model is to convert the text data into a sparse matrix using **TfidfVectorizer** (which converts a collection of raw documents to a matrix of TF-IDF features).
6. Fit the machine learning model, i.e. **PassiveAggressiveClassifier**, on the TF-IDF training features and the training labels. Then use the test data to measure accuracy and other performance metrics.


### Importing Necessary Libraries

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report,accuracy_score
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.feature_extraction.text import TfidfVectorizer

### Loading the dataset

In [2]:
data=pd.read_csv('news.csv')
data.head()

| | Unnamed: 0 | title | text | label |
|---|---|---|---|---|
| 0 | 8476 | You Can Smell Hillary’s Fear | Daniel Greenfield, a Shillman Journalism Fello... | FAKE |
| 1 | 10294 | Watch The Exact Moment Paul Ryan Committed Pol... | Google Pinterest Digg Linkedin Reddit Stumbleu... | FAKE |
| 2 | 3608 | Kerry to go to Paris in gesture of sympathy | U.S. Secretary of State John F. Kerry said Mon... | REAL |
| 3 | 10142 | Bernie supporters on Twitter erupt in anger ag... | — Kaydee King (@KaydeeKing) November 9, 2016 T... | FAKE |
| 4 | 875 | The Battle of New York: Why This Primary Matters | It's primary day in New York and front-runners... | REAL |


### Checking For null values

In [3]:
data.shape,data.isnull().sum()

((6335, 4),
 Unnamed: 0    0
 title         0
 text          0
 label         0
 dtype: int64)
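
The dataset above has no missing values, so nothing needs to be dropped here. For a file that does contain gaps, a minimal sketch of the dropping step could look like the following. The small frame is invented for illustration, and treating whitespace-only text as missing is an extra assumption rather than something the notebook does.

```python
import pandas as pd

# Invented example rows; None marks a missing value
sample = pd.DataFrame({
    "text": ["Some article body", None, "   ", "Another article body"],
    "label": ["REAL", "FAKE", "FAKE", None],
})

# Drop rows where either the text or the label is missing
cleaned = sample.dropna(subset=["text", "label"])

# Optionally also drop rows whose text is only whitespace
cleaned = cleaned[cleaned["text"].str.strip() != ""]

print(cleaned)
```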

### Performing Train Test Split

In [4]:
x_train,x_test,y_train,y_test=train_test_split(data['text'],data['label'],test_size=0.2,random_state=42)
x_train.shape,x_test.shape,y_train.shape,y_test.shape

((5068,), (1267,), (5068,), (1267,))

### Creating the sparse matrix with tfidf features 

In [5]:
tfidf_vectorizer=TfidfVectorizer(stop_words='english',max_df=0.7)
tfidf_train=tfidf_vectorizer.fit_transform(x_train)
tfidf_test=tfidf_vectorizer.transform(x_test)
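
Both arguments passed to `TfidfVectorizer` above shrink the vocabulary: `stop_words='english'` removes common English words, and `max_df=0.7` ignores terms that appear in more than 70% of the training documents. The sketch below shows the `max_df` effect on a made-up corpus (stop-word removal is left off so the effect is visible), and that `transform` reuses the vocabulary learned during `fit_transform`.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up corpus: "the" and "is" appear in all three documents
train_docs = [
    "the news is fake",
    "the news is real",
    "the report is real",
]

vectorizer = TfidfVectorizer(max_df=0.7)
train_matrix = vectorizer.fit_transform(train_docs)

# Terms found in more than 70% of the documents are not in the vocabulary
print(sorted(vectorizer.vocabulary_))

# transform() reuses the fitted vocabulary; unseen words are ignored
test_matrix = vectorizer.transform(["the rumor is fake"])
print(train_matrix.shape, test_matrix.shape)
```

This is also why the notebook calls `fit_transform` only on the training text and plain `transform` on the test text: the test set must be encoded with the vocabulary and weights learned from the training set.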

### Creating and Fitting the Classifier

In [6]:
PA_classifier=PassiveAggressiveClassifier(max_iter=80)
PA_classifier.fit(tfidf_train,y_train)

PassiveAggressiveClassifier(max_iter=80)

### Prediction and model accuracy

In [7]:
predict=PA_classifier.predict(tfidf_test)
predict

array(['FAKE', 'FAKE', 'FAKE', ..., 'REAL', 'REAL', 'REAL'], dtype='<U4')

In [8]:
print('Accuracy of the model : ',accuracy_score(y_test,predict))
print('-'*50)
print('Classification report :','\n',classification_report(y_test,predict))

Accuracy of the model :  0.9344909234411997
--------------------------------------------------
Classification report : 
               precision    recall  f1-score   support

        FAKE       0.93      0.94      0.93       628
        REAL       0.94      0.93      0.93       639

    accuracy                           0.93      1267
   macro avg       0.93      0.93      0.93      1267
weighted avg       0.93      0.93      0.93      1267
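
For reference, the precision, recall, and F1 figures in a report like the one above can be reproduced by hand from the true and predicted labels. The sketch below does this in plain Python for a made-up set of labels; the helper name `precision_recall_f1` and the data are illustrative and not part of the notebook.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall and F1 for one class, treating it as the positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Made-up labels, not taken from the dataset
y_true = ["FAKE", "FAKE", "FAKE", "REAL", "REAL", "REAL"]
y_pred = ["FAKE", "FAKE", "REAL", "REAL", "REAL", "FAKE"]

for label in ("FAKE", "REAL"):
    print(label, precision_recall_f1(y_true, y_pred, positive=label))
```

Precision answers "of the articles flagged as this class, how many really were", while recall answers "of the articles that really were this class, how many were found".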



### Testing the model on custom data
The sample text below, which this notebook treats as fake news, is taken from : <code><link>https://www.factcheck.org/2020/02/fake-coronavirus-cures-part-2-garlic-isnt-a-cure/</link></code>

In [9]:
# The text below is used as the fake-news sample.
# The literal is split into two adjacent strings so that it can span two lines.
fake_test=['Treatments billed as miracle cures have cropped up across the internet since the new coronavirus began spreading in Wuhan, China, at the end of December.One rumor claims that consuming garlic will treat the illness, which the World Health Organization has now named COVID-19. Another says loading up on vitamin C will do the trick. Yet another would have people, essentially, drink bleach. None of these will treat or cure the virus.We’re addressing each of these claims in separate articles. Here, we examine the claim that boiled garlic will “cure” the virus.A recipe circulating on social media is spreading this false information: “Good news, Wuhan’s corona virus can be cured by one bowl of freshly boiled garlic water. Old Chinese doctor has proven it’s efficacy.  Many patients has also proven this to be effective. Eight (8) cloves of chopped garlics add seven (7)cups of water and bring to boil.,  Eat and drink the boiled garlic water, overnight improvement and healing. Glad to share this.”More important than thetypos in the post is the bogus claim that consuming garlic can treat a virus that has killed 1,018 people, as of Feb. 11.There are no vaccines or antiviral treatments that are recommended to prevent or treat the new coronavirus, according to the Centers for Disease Control and Prevention. Patients, however, can receive supportive care to treat their symptoms.But the claim about garlic has spread widely enough that the WHO knocked it down, saying on a webpage dedicated to rumors about the virus: “Garlic is a healthy food that may have some antimicrobial properties. However, there is no evidence from the current outbreak that eating garlic has protected people from the new coronavirus.”Garlic has a reputation for being antimicrobial and antiviral, according to a systematic review cited by the National Center for Complementary and Integrative Health, which researches alternative medicine. '
           'But that review included only one trial and found that there was insufficient evidence that garlic could prevent or treat a common cold, according to the NCCIH.It is generally safe to eat garlic in the amount usually found in food, according to the NCCIH. But garlic supplements could increase the risk of bleeding for those takin blood thinners, and garlic can interfere with the effectiveness of some drugs. The center recommends consulting a doctor before using alternative medicines.']


In [10]:
tfidf_fake_test=tfidf_vectorizer.transform(fake_test)


In [11]:
PA_classifier.predict(tfidf_fake_test)

array(['FAKE'], dtype='<U4')
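
Classifying a new article takes the same two steps each time: transform the text with the fitted vectorizer, then call `predict`. One possible convenience, sketched here as an assumption rather than part of the original notebook, is to bundle the vectorizer and the classifier in a scikit-learn pipeline so that raw text can be passed straight to `predict`. The tiny training set is invented and far too small for meaningful predictions, and `random_state=0` is added only to make the run repeatable.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.pipeline import make_pipeline

# Invented toy data, only to show the mechanics
texts = [
    "miracle garlic cure heals virus overnight",
    "secret bleach remedy doctors hide",
    "senate passes budget bill after debate",
    "city council approves new transit plan",
]
labels = ["FAKE", "FAKE", "REAL", "REAL"]

# Same settings as the notebook, chained into one estimator
model = make_pipeline(
    TfidfVectorizer(stop_words="english", max_df=0.7),
    PassiveAggressiveClassifier(max_iter=80, random_state=0),
)
model.fit(texts, labels)

# Raw text goes in; the pipeline vectorizes it before predicting
prediction = model.predict(["boiled garlic water is a miracle cure"])
print(prediction)
```

With the real dataset, `model.fit(x_train, y_train)` followed by `model.predict(fake_test)` would replace the separate vectorizing and predicting cells.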