# Importing all Libraries
- numpy
```shell
pip3 install numpy
```
- pandas
```shell
pip3 install pandas
```
- scikit-learn
```shell
pip3 install -U scikit-learn
```

In [1]:
import itertools
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.linear_model import PassiveAggressiveClassifier

# Importing Data
The data set is a **CSV** file, `news.csv`, stored in the local working folder.
<br>
[The file can be downloaded from here](https://drive.google.com/file/d/1er9NJTLUA3qnRuyhfzuN0XUsoIC4a-_q/edit)



In [2]:
final_data = pd.read_csv("news.csv")
final_data.shape

(6335, 4)

# A Glance at the Data Set

In [3]:
print(final_data)

      Unnamed: 0                                              title  \
0           8476                       You Can Smell Hillary’s Fear   
1          10294  Watch The Exact Moment Paul Ryan Committed Pol...   
2           3608        Kerry to go to Paris in gesture of sympathy   
3          10142  Bernie supporters on Twitter erupt in anger ag...   
4            875   The Battle of New York: Why This Primary Matters   
...          ...                                                ...   
6330        4490  State Department says it can't find emails fro...   
6331        8062  The ‘P’ in PBS Should Stand for ‘Plutocratic’ ...   
6332        8622  Anti-Trump Protesters Are Tools of the Oligarc...   
6333        4021  In Ethiopia, Obama seeks progress on peace, se...   
6334        4330  Jeb Bush Is Suddenly Attacking Trump. Here's W...   

                                                   text label  
0     Daniel Greenfield, a Shillman Journalism Fello...  FAKE  
1     Google Pinter

# Creating the labels


In [4]:
labels = final_data.label

# Printing labels

In [5]:
print(labels)

0       FAKE
1       FAKE
2       REAL
3       FAKE
4       REAL
        ... 
6330    REAL
6331    FAKE
6332    FAKE
6333    REAL
6334    REAL
Name: label, Length: 6335, dtype: object


# Splitting the Data Set into Training Data Set and Testing Data Set

In [6]:
x_train, x_test, y_train, y_test = train_test_split(final_data['text'], labels, test_size=0.2, random_state=0)

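# Weighting Words by TF-IDF
Here we use **TfidfVectorizer**, which turns each article into a vector of TF-IDF weights: a word scores higher the more often it appears in an article and lower the more articles it appears in. English stop words are removed, and `max_df=0.7` drops words that occur in more than 70% of the articles. The vectorizer is fitted on the training text only and then reused to transform the testing text.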
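
To make the weighting concrete, here is a minimal pure-Python sketch of the TF-IDF idea. It follows what we understand to be scikit-learn's default formula (raw term counts, smoothed IDF `ln((1 + n) / (1 + df)) + 1`, then L2 normalisation), but it is only an illustration, not the library's implementation. The helper names `tokenize` and `tfidf`, the tiny stop-word list, and the four toy headlines are all assumptions made up for this example.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; scikit-learn ships a much longer one).
STOP_WORDS = {"the", "a", "is", "in", "of", "and", "to"}


def tokenize(text):
    """Lowercase the text and keep tokens of two or more word characters."""
    tokens = re.findall(r"\b\w\w+\b", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def tfidf(docs, max_df=0.7):
    """Return the vocabulary and one dict of L2-normalised TF-IDF weights per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Drop terms that appear in more than max_df of the documents.
    vocab = sorted(t for t, count in df.items() if count <= max_df * n_docs)
    # Smoothed inverse document frequency.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for tokens in tokenized:
        counts = Counter(t for t in tokens if t in idf)
        weights = {t: c * idf[t] for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        rows.append({t: w / norm for t, w in weights.items()})
    return vocab, rows


# Toy headlines, invented for this sketch.
docs = [
    "The senate passed the budget bill",
    "Shocking budget secret the senate hides",
    "Budget talks continue in the senate",
    "Local team wins the cup",
]
vocab, rows = tfidf(docs)
print(vocab)
print(rows[0])
```

In this toy corpus "senate" and "budget" each appear in three of the four headlines (75%), so `max_df=0.7` removes them from the vocabulary, while rarer words such as "bill" keep a high weight.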

In [7]:
tfidf_vector = TfidfVectorizer(stop_words="english",max_df=0.7)
tfidf_train = tfidf_vector.fit_transform(x_train)
tfidf_test = tfidf_vector.transform(x_test)

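# Training the Algorithm
Here we use the **Passive Aggressive Classifier**, an online learning algorithm: it leaves the model unchanged when an example is classified correctly with enough margin, and corrects the weights when it is not. We fit it on the TF-IDF features of the Training Data Set.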
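
The name comes from the update rule, which the following sketch illustrates for a single training example. It shows the PA-I variant with hinge loss as we understand it from the passive-aggressive literature; it is not scikit-learn's code. The function name `pa_update`, the dense list representation, and the encoding of the two classes as `+1` and `-1` are assumptions for illustration only.

```python
def pa_update(weights, x, y, C=1.0):
    """Apply one Passive-Aggressive (PA-I) step for a label y in {-1, +1}."""
    margin = y * sum(w * xi for w, xi in zip(weights, x))
    loss = max(0.0, 1.0 - margin)  # hinge loss
    if loss == 0.0:
        # Passive: the example is already classified with enough margin.
        return list(weights)
    sq_norm = sum(xi * xi for xi in x)
    if sq_norm == 0.0:
        # An all-zero feature vector carries no information to learn from.
        return list(weights)
    # Aggressive: move just far enough to fix the mistake, capped by C.
    tau = min(C, loss / sq_norm)
    return [w + tau * y * xi for w, xi in zip(weights, x)]


# Hypothetical two-feature examples, with +1 and -1 standing in for the two classes.
w = [0.0, 0.0]
w = pa_update(w, [1.0, 0.0], +1)        # wrong side of the margin: weights change
w_same = pa_update(w, [1.0, 0.0], +1)   # now correct: weights stay as they are
w_next = pa_update(w, [0.0, 2.0], -1)   # a new mistake triggers another correction
print(w, w_same, w_next)
```

The cap `C` limits how far a single noisy example can move the weights, which is the role the regularisation parameter plays in this family of algorithms.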

In [8]:
pac = PassiveAggressiveClassifier(max_iter=50)
pac.fit(tfidf_train,y_train)

PassiveAggressiveClassifier(max_iter=50)

# Prediction with Testing Data Set

In [9]:
y_predicted = pac.predict(tfidf_test)

# Comparing our results with Actual Labels

In [10]:
results = pd.DataFrame({"Actual":y_test,"Predicted":y_predicted})
print(results)

     Actual Predicted
3789   REAL      REAL
733    FAKE      FAKE
4783   FAKE      FAKE
3067   FAKE      FAKE
5288   REAL      REAL
...     ...       ...
5121   REAL      REAL
6112   REAL      REAL
2661   FAKE      FAKE
59     REAL      REAL
4573   REAL      FAKE

[1267 rows x 2 columns]


# Getting the Accuracy Score on the Testing Data Set

In [11]:
score = accuracy_score(y_test,y_predicted)
print(f"Accuracy:{round(score*100,2)}%")

Accuracy:93.13%


# Building a Confusion Matrix

In [12]:
confusion_matrix(y_test,y_predicted,labels=['FAKE','REAL'])

array([[570,  45],
       [ 42, 610]], dtype=int64)
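
To read the matrix: rows are the actual labels and columns are the predicted labels, both in the order `FAKE`, `REAL` given by the `labels` argument. So 570 fake articles were correctly flagged, 45 fake articles were mistaken for real, 42 real articles were mistaken for fake, and 610 real articles were correctly recognised. As a quick sanity check, the sketch below recomputes the accuracy from these four counts, along with precision and recall for the `FAKE` class. The variable names are our own, and the numbers are copied by hand from the output above.

```python
# Counts copied from the confusion matrix output.
# Rows are actual labels, columns are predicted labels, in the order FAKE, REAL.
matrix = [[570, 45],
          [42, 610]]

(fake_as_fake, fake_as_real), (real_as_fake, real_as_real) = matrix
total = sum(sum(row) for row in matrix)

# Accuracy: the share of all test articles on the diagonal.
accuracy = (fake_as_fake + real_as_real) / total
# Precision for FAKE: of the articles predicted FAKE, how many really were.
fake_precision = fake_as_fake / (fake_as_fake + real_as_fake)
# Recall for FAKE: of the articles that really were FAKE, how many were caught.
fake_recall = fake_as_fake / (fake_as_fake + fake_as_real)

print(f"Total test articles: {total}")
print(f"Accuracy: {accuracy:.2%}")
print(f"FAKE precision: {fake_precision:.2%}")
print(f"FAKE recall: {fake_recall:.2%}")
```

The total of 1267 matches the number of rows in the results table, and the recomputed accuracy agrees with the 93.13% reported by `accuracy_score`.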