## Step 1 --> Installing required libraries

In [4]:
%pip install numpy pandas scikit-learn

Note: you may need to restart the kernel to use updated packages.
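The library is published on PyPI as `scikit-learn` and imported as `sklearn`. To confirm which versions the notebook's environment actually sees, you can query the installed distributions. This is a minimal sketch using only the standard library's `importlib.metadata`; the helper name `installed_versions` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(dists=("numpy", "pandas", "scikit-learn")):
    """Return {distribution name: version string, or None if not installed}."""
    found = {}
    for name in dists:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # Not installed in the environment this interpreter is using
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'NOT INSTALLED'}")
```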


## Step 2 ---> Importing the necessary libraries

In [5]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

## Step 3 ----> Now, let’s read the data into a DataFrame (using the pandas library), get the shape of the data, and preview its first rows

In [6]:
##Reading the data
data_frame = pd.read_csv('/home/ashish/projects/Fake-NEWS-Detection/news.csv')

## Previewing the first rows of the dataframe (the shape is printed in the next cell)
data_frame.head(n=700)

|     | Unnamed: 0 | title | text | label |
|-----|------------|-------|------|-------|
| 0 | 8476 | You Can Smell Hillary’s Fear | Daniel Greenfield, a Shillman Journalism Fello... | FAKE |
| 1 | 10294 | Watch The Exact Moment Paul Ryan Committed Pol... | Google Pinterest Digg Linkedin Reddit Stumbleu... | FAKE |
| 2 | 3608 | Kerry to go to Paris in gesture of sympathy | U.S. Secretary of State John F. Kerry said Mon... | REAL |
| 3 | 10142 | Bernie supporters on Twitter erupt in anger ag... | — Kaydee King (@KaydeeKing) November 9, 2016 T... | FAKE |
| 4 | 875 | The Battle of New York: Why This Primary Matters | It's primary day in New York and front-runners... | REAL |
| ... | ... | ... | ... | ... |
| 695 | 324 | Manhunt shifts to lone cabin in upstate New Yo... | The hunt for two convicts who escaped from a m... | REAL |
| 696 | 5482 | Armed Dakota Access Contractor Accused Of Tryi... | An armed Dakota Access security contractor con... | FAKE |
| 697 | 10241 | 18 State Swat Team Drill In Prep for Backlash ... | Previous 18 State Swat Team Drill In Prep for ... | FAKE |
| 698 | 9364 | Michael Moore: Joe Blow Will Vote Trump As “Ul... | Tweet Home » Headlines » World News » Michael ... | FAKE |


In [7]:
print("The shape of the dataset is:", data_frame.shape)

The shape of the dataset is: (6335, 4)

## Step 4 ----> Getting the labels from the dataframe 

In [8]:
labels = data_frame.label
labels.head(n=700)

0      FAKE
1      FAKE
2      REAL
3      FAKE
4      REAL
       ... 
695    REAL
696    FAKE
697    FAKE
698    FAKE
699    FAKE
Name: label, Length: 700, dtype: object

## Step 5 ----> Splitting the data into two sets, i.e. a training set and a testing set

In [9]:
## Splitting the dataset (80% train, 20% test)

## Signature: sklearn.model_selection.train_test_split(*arrays, test_size=None, train_size=None, random_state=None, shuffle=True, stratify=None)
## random_state accepts any integer seed (or None); fixing it (here to 7) makes the shuffle, and therefore the split, reproducible
x_train, x_test, y_train, y_test = train_test_split(data_frame['text'], labels, test_size=0.2, random_state=7)

In [19]:
x_train

6237    The head of a leading survivalist group has ma...
3722    ‹ › Arnaldo Rodgers is a trained and educated ...
5774    Patty Sanchez, 51, used to eat 13,000 calories...
336     But Benjamin Netanyahu’s reelection was regard...
3622    John Kasich was killing it with these Iowa vot...
                              ...                        
5699                                                     
2550    It’s not that Americans won’t elect wealthy pr...
537     Anyone writing sentences like ‘nevertheless fu...
1220    More Catholics are in Congress than ever befor...
4271    It was hosted by CNN, and the presentation was...
Name: text, Length: 5068, dtype: object

In [20]:
x_test

3534    A day after the candidates squared off in a fi...
6265    VIDEO : FBI SOURCES SAY INDICTMENT LIKELY FOR ...
3123    It's debate season, where social media has bro...
3940    Mitch McConnell has decided to wager the Repub...
2856    Donald Trump, the actual Republican candidate ...
                              ...                        
4986    Washington (CNN) President Barack Obama announ...
5789    The revival of middle-class jobs has been one ...
4338    "I can guarantee that," Obama answered when as...
5924    Videos 30 Civilians Die In US Airstrike Called...
6030    The retired neurosurgeon lashed out Friday mor...
Name: text, Length: 1267, dtype: object

In [21]:
y_train

6237    FAKE
3722    FAKE
5774    FAKE
336     REAL
3622    REAL
        ... 
5699    FAKE
2550    REAL
537     REAL
1220    REAL
4271    REAL
Name: label, Length: 5068, dtype: object

In [22]:
y_test

3534    REAL
6265    FAKE
3123    REAL
3940    REAL
2856    REAL
        ... 
4986    REAL
5789    REAL
4338    REAL
5924    FAKE
6030    REAL
Name: label, Length: 1267, dtype: object
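The lengths above follow directly from the dataset size: for a float `test_size`, `train_test_split` rounds the test set size up and gives the remaining samples to the training set. The sketch below reproduces that arithmetic and a seeded shuffle-then-slice split in plain Python. `split_sizes` and `simple_split` are hypothetical helpers, and `random.Random(7)` will not select the same rows as scikit-learn's `random_state=7`.

```python
import math
import random


def split_sizes(n_samples, test_size):
    """Round the test set size up; the training set gets the remaining samples."""
    n_test = math.ceil(test_size * n_samples)
    n_train = n_samples - n_test
    return n_train, n_test


def simple_split(items, test_size, seed):
    """Shuffle indices with a fixed seed, then slice off the test set first."""
    n_train, n_test = split_sizes(len(items), test_size)
    indices = list(range(len(items)))
    random.Random(seed).shuffle(indices)  # same seed gives the same order every run
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return [items[i] for i in train_idx], [items[i] for i in test_idx]


print(split_sizes(6335, 0.2))  # (5068, 1267), matching the lengths shown above
```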

# To Be Remembered

## 1. class sklearn.feature_extraction.text.TfidfVectorizer(*, input='content', encoding='utf-8', decode_error='strict', strip_accents=None, lowercase=True, preprocessor=None, tokenizer=None, analyzer='word', stop_words=None, token_pattern='(?u)\b\w\w+\b', ngram_range=(1, 1), max_df=1.0, min_df=1, max_features=None, vocabulary=None, binary=False, dtype=<class 'numpy.float64'>, norm='l2', use_idf=True, smooth_idf=True, sublinear_tf=False)

## 2. max_df can be set to a value in the range [0.7, 1.0) to automatically detect and filter stop words based on the intra-corpus document frequency of terms.

## 3. Stop words are words like “and”, “the”, “him”, which are presumed to be uninformative in representing the content of a text, and which may be removed to avoid them being construed as signal for prediction. Sometimes, however, similar words are useful for prediction, such as in classifying writing style or personality. scikit-learn's built-in ‘english’ stop word list also has several known issues.

## 4. max_df : float or int, default=1.0 -----> When building the vocabulary, ignore terms that have a document frequency strictly higher than the given threshold (corpus-specific stop words). If float in the range [0.0, 1.0], the parameter represents a proportion of documents; if int, absolute counts. This parameter is ignored if vocabulary is not None.
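To make items 2 to 4 concrete: with a float `max_df`, a term is dropped when the number of documents containing it is strictly greater than `max_df * n_documents`, so very common words behave like corpus-specific stop words. A minimal pure-Python sketch of that rule; the helper `filter_by_max_df` and the toy corpus are hypothetical, and splitting on whitespace is a simplification of `TfidfVectorizer`'s `token_pattern`.

```python
def filter_by_max_df(docs, max_df):
    """Keep only terms whose document frequency is <= max_df * len(docs) (float max_df)."""
    tokenized = [set(doc.lower().split()) for doc in docs]  # simplified tokenizer
    threshold = max_df * len(docs)
    df = {}
    for terms in tokenized:
        for term in terms:
            df[term] = df.get(term, 0) + 1
    # Terms above the threshold are treated as corpus-specific stop words
    return sorted(term for term, count in df.items() if count <= threshold)


docs = ["the cat sat", "the dog sat", "the cat ran"]
# 'the' appears in all 3 documents (3 > 0.7 * 3), so it is dropped
print(filter_by_max_df(docs, 0.7))  # ['cat', 'dog', 'ran', 'sat']
```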


## 5. fit_transform(raw_documents, y=None) ----> Learn the vocabulary and idf, and return the document-term matrix.

## This is equivalent to fit followed by transform, but more efficiently implemented.

## Parameters:
## raw_documents : iterable ----> An iterable which generates either str, unicode or file objects.

## y : None
##     This parameter is ignored.

## Returns: X : sparse matrix of shape (n_samples, n_features)
## Tf-idf-weighted document-term matrix.




## 6. TF (Term Frequency): The number of times a word appears in a document is its term frequency. A higher value means the term appears more often in that document, so the document is a better match when the term is part of the search terms.


## 7. IDF (Inverse Document Frequency): Words that occur many times in one document, but also occur in many other documents, may be irrelevant. IDF measures how significant a term is across the entire corpus: the more documents contain a term, the lower its IDF.

# --> The TfidfVectorizer converts a collection of raw documents into a matrix of TF-IDF features
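With its default settings (`smooth_idf=True`, `sublinear_tf=False`, `norm='l2'`), scikit-learn computes idf(t) = ln((1 + n) / (1 + df(t))) + 1, multiplies it by the raw count of t in each document, and then scales every row to unit L2 length. The sketch below reproduces that arithmetic in plain Python; `tfidf_rows` is a hypothetical helper, and whitespace splitting stands in for the real tokenizer and stop-word handling.

```python
import math
from collections import Counter


def tfidf_rows(docs):
    """Return one {term: weight} dict per document, L2-normalized."""
    tokenized = [doc.lower().split() for doc in docs]  # simplified tokenizer
    n = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF: a term found in every document gets the minimum value 1.0
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
    rows = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw term counts (TF)
        weights = {t: c * idf[t] for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        rows.append({t: w / norm for t, w in weights.items()})
    return rows


rows = tfidf_rows(["fake news spreads", "real news reports", "fake fake claims"])
# 'spreads' occurs in 1 document and 'news' in 2, so 'spreads' gets the larger weight
print(rows[0])
```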


## 8. transform(raw_documents) -----> Transform documents to a document-term matrix.
## Uses the vocabulary and document frequencies (df) learned by fit (or fit_transform).
## Returns: X : sparse matrix of shape (n_samples, n_features), the Tf-idf-weighted document-term matrix.
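The split between `fit` and `transform` matters because the test set must be encoded with the vocabulary and idf learned from the training set only; words never seen during fitting are simply ignored. The class below is a hypothetical toy (not scikit-learn's code) that mimics this behaviour with the same default idf formula and L2 normalization, using whitespace tokenization as a simplifying assumption.

```python
import math
from collections import Counter


class TinyTfidf:
    """Hypothetical toy vectorizer mimicking the fit/transform split."""

    def fit(self, docs):
        tokenized = [doc.lower().split() for doc in docs]
        n = len(tokenized)
        df = Counter(t for tokens in tokenized for t in set(tokens))
        # Vocabulary and idf are learned here, from the training documents only
        self.idf_ = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
        return self

    def transform(self, docs):
        rows = []
        for doc in docs:
            # Words never seen during fit have no idf entry and are skipped
            counts = Counter(t for t in doc.lower().split() if t in self.idf_)
            weights = {t: c * self.idf_[t] for t, c in counts.items()}
            norm = math.sqrt(sum(w * w for w in weights.values()))
            rows.append({t: w / norm for t, w in weights.items()})
        return rows

    def fit_transform(self, docs):
        return self.fit(docs).transform(docs)


vec = TinyTfidf()
train_rows = vec.fit_transform(["fake news spreads", "real news reports"])
test_rows = vec.transform(["breaking news", "totally unseen words"])
print(test_rows)  # only 'news' survives in the first row; the second row is empty
```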

## Step 6 ----> Let’s initialize a TfidfVectorizer with English stop words and a maximum document frequency of 0.7 (terms that appear in more than 70% of the documents will be discarded). Stop words are the most common words in a language, which are filtered out before processing natural language data.

## A TfidfVectorizer turns a collection of raw documents into a matrix of TF-IDF features.

In [35]:
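## Initialising the vectorizer: drop English stop words and terms that appear in more than 70% of the training documents
tfidf_vectorizer = TfidfVectorizer(stop_words='english', max_df=0.7)


## Fit and transform the train set, then only transform the test set with the vocabulary and idf learned from the train set
tfidf_train = tfidf_vectorizer.fit_transform(x_train)
tfidf_test = tfidf_vectorizer.transform(x_test)

## Print every row of the training matrix. Iterating yields one 1 x n_features sparse row at a time,
## so each entry is printed as (0, feature_index) followed by its tf-idf weight.
## With 5068 rows this produces very long output.
for row in tfidf_train:
    print(row)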


  (0, 56381)	0.03622223988286098
  (0, 16314)	0.053492157980948106
  (0, 19620)	0.030351855107005405
  (0, 52607)	0.04266045446208797
  (0, 14900)	0.039165339742818085
  (0, 53749)	0.029756205182552464
  (0, 15211)	0.07772572986248194
  (0, 61154)	0.06726619958695557
  (0, 59042)	0.047893261248723944
  (0, 42972)	0.03152542343098286
  (0, 54232)	0.038673616329284524
  (0, 59249)	0.04106143649018827
  (0, 28891)	0.06514397995138038
  (0, 41708)	0.03983513460128018
  (0, 50192)	0.045331181477256094
  (0, 44691)	0.0318676439567658
  (0, 11820)	0.046381950858248124
  (0, 7682)	0.04137048243377956
  (0, 50343)	0.10196965191544219
  (0, 48095)	0.021092647294770877
  (0, 17916)	0.03674587236023286
  (0, 46027)	0.10236534701241509
  (0, 16993)	0.02775494464904786
  (0, 55006)	0.03368300200002207
  (0, 51389)	0.03397042876291898
  :	:
  (0, 15390)	0.1827409457633508
  (0, 8952)	0.06669214616557034
  (0, 58158)	0.039310370431902907
  (0, 19585)	0.06609677424160541
  (0, 4259)	0.12523995742312244

  (0, 41461)	0.08835793744233235
  (0, 50313)	0.06957374805291353
  (0, 44286)	0.06711160068054127
  (0, 35687)	0.10479634748183098
  (0, 21646)	0.09657714246208166
  (0, 358)	0.07691419509194472
  (0, 43832)	0.31438904244549293
  (0, 38162)	0.10479634748183098
  (0, 43404)	0.08013873242258303
  (0, 19194)	0.09393114944858512
  (0, 15993)	0.07268481268679404
  (0, 6279)	0.09176921573978922
  (0, 17590)	0.18353843147957843
  (0, 53288)	0.07350291679448785
  (0, 12927)	0.08260087979495528
  (0, 59178)	0.13976352067960846
  (0, 41062)	0.0794198573987367
  (0, 7600)	0.06407679229867563
  (0, 50939)	0.15917870707536813
  (0, 44290)	0.06662754137584063
  (0, 29594)	0.06426960640586873
  (0, 3113)	0.1470058335889757
  (0, 58802)	0.06466490322087202
  (0, 6302)	0.06214011541586606
  (0, 15126)	0.06593659994870966
  :	:
  (0, 5032)	0.07583546936520945
  (0, 37457)	0.04308824408786284
  (0, 55170)	0.059542981477237106
  (0, 54131)	0.07484674639558994
  (0, 40311)	0.05055031071941857
  (0, 51529)

  (0, 37596)	0.08209200708963739
  (0, 24882)	0.08209200708963739
  (0, 43725)	0.07573620370257649
  (0, 15581)	0.060977536890372545
  (0, 40111)	0.0806613172231101
  (0, 59109)	0.08209200708963739
  (0, 24881)	0.15644761099472326
  (0, 5739)	0.07768385773744053
  (0, 40114)	0.0855864307436437
  (0, 43250)	0.07768385773744053
  (0, 40946)	0.07445640800667083
  (0, 46153)	0.06874735639456389
  (0, 52975)	0.07445640800667083
  (0, 6944)	0.06644866703157215
  (0, 5456)	0.13012921221450272
  (0, 43970)	0.06900308846181222
  (0, 6075)	0.07293607327179938
  (0, 5382)	0.08287594518121073
  (0, 21150)	0.2973645801579862
  (0, 11117)	0.05912417941325474
  (0, 27608)	0.060583551287272316
  (0, 46362)	0.056928665266381
  (0, 52014)	0.07486820420611205
  (0, 33837)	0.06849689753101386
  (0, 52510)	0.06525293274055759
  :	:
  (0, 40815)	0.13108207333365016
  (0, 654)	0.03629989690226998
  (0, 27692)	0.08729589074859569
  (0, 6741)	0.031201006104128508
  (0, 39037)	0.04387439073837297
  (0, 42247)	0

  (0, 11962)	0.05308489838100385
  (0, 24143)	0.06807340941806278
  (0, 35568)	0.052056363351717864
  (0, 44898)	0.05308489838100385
  (0, 49074)	0.048933239578006174
  (0, 23872)	0.05567659966521232
  (0, 54930)	0.06495028564435108
  (0, 56672)	0.06273439406261448
  (0, 25478)	0.06807340941806278
  (0, 45871)	0.06495028564435108
  (0, 39400)	0.06807340941806278
  (0, 20662)	0.06807340941806278
  (0, 59081)	0.04747558964404894
  (0, 7608)	0.05494246624048317
  (0, 15055)	0.05427225493345448
  (0, 58895)	0.06807340941806278
  (0, 27595)	0.05648814651519109
  (0, 23206)	0.05427225493345448
  (0, 26280)	0.05308489838100385
  (0, 60133)	0.05494246624048317
  (0, 34962)	0.05842391373645216
  (0, 47041)	0.052056363351717864
  (0, 23612)	0.12990057128870217
  (0, 46926)	0.05033758430976402
  (0, 11281)	0.05114913115974278
  :	:
  (0, 55006)	0.012916485569948772
  (0, 51389)	0.02605341132722839
  (0, 38909)	0.029005631111021023
  (0, 54706)	0.01638982127576283
  (0, 59170)	0.031007217868633984

  (0, 50048)	0.044430940189534925
  (0, 61106)	0.05027934864722935
  (0, 44078)	0.04856398151223838
  (0, 51071)	0.05027934864722935
  (0, 22786)	0.044430940189534925
  (0, 49770)	0.04310039807042625
  (0, 13657)	0.04310039807042625
  (0, 25127)	0.04068272388271376
  (0, 58173)	0.04856398151223838
  (0, 11614)	0.05027934864722935
  (0, 40696)	0.0437286331368134
  (0, 15032)	0.036751827959874965
  (0, 45291)	0.04153599389039502
  (0, 28543)	0.0437286331368134
  (0, 6077)	0.03362423635692724
  (0, 29308)	0.0437286331368134
  (0, 27326)	0.03959559181410994
  (0, 10740)	0.03896735674772279
  (0, 22757)	0.03959559181410994
  (0, 2260)	0.042532090324750436
  (0, 39142)	0.04109410893458281
  (0, 21088)	0.03696106761187935
  (0, 26566)	0.0437286331368134
  (0, 57554)	0.03813399566745237
  (0, 24345)	0.037636814628614124
  :	:
  (0, 55006)	0.029996749428838468
  (0, 54706)	0.05075078762253471
  (0, 59170)	0.024003323500906584
  (0, 54235)	0.018746902470799107
  (0, 23649)	0.01219284181683396
  

  (0, 19379)	0.03156281973725951
  (0, 37708)	0.032306388518844334
  (0, 15332)	0.03321644315794951
  (0, 2529)	0.03604333106726229
  (0, 2197)	0.03156281973725951
  (0, 56907)	0.03438970764657228
  (0, 35620)	0.03604333106726229
  (0, 25675)	0.14417332426904916
  (0, 22679)	0.03604333106726229
  (0, 35618)	0.09280242387078212
  (0, 52515)	0.03093414129026071
  (0, 22840)	0.14417332426904916
  (0, 52031)	0.03604333106726229
  (0, 16451)	0.03038955524863673
  (0, 29357)	0.03604333106726229
  (0, 46107)	0.03156281973725951
  (0, 12678)	0.03438970764657228
  (0, 54887)	0.032306388518844334
  (0, 39192)	0.03321644315794951
  (0, 8517)	0.03156281973725951
  (0, 2639)	0.05621450676189586
  (0, 8603)	0.03038955524863673
  (0, 21259)	0.058959001219063105
  (0, 39096)	0.03038955524863673
  (0, 31932)	0.025742558061113592
  :	:
  (0, 50192)	0.018408060136264684
  (0, 44691)	0.02588158910667444
  (0, 46027)	0.013856153772207684
  (0, 55006)	0.013677974104809521
  (0, 54706)	0.008678039772440563
 

  (0, 41134)	0.13390857816229787
  (0, 30180)	0.14939804282471145
  (0, 41333)	0.13082651783699048
  (0, 7543)	0.26781715632459574
  (0, 47563)	0.10478601759386659
  (0, 9815)	0.11910919015304153
  (0, 44864)	0.09567453478514364
  (0, 16607)	0.09676552818685594
  (0, 10492)	0.10361972549062795
  (0, 39412)	0.090549313410139
  (0, 45736)	0.09337328428651075
  (0, 12503)	0.09962368417203049
  (0, 36661)	0.07500791466527353
  (0, 22780)	0.08882033748137162
  (0, 8501)	0.08882033748137162
  (0, 40555)	0.07056479422705168
  (0, 7587)	0.08768537908597737
  (0, 14791)	0.08260412270465468
  (0, 24249)	0.11225499284926951
  (0, 46444)	0.07223987941724576
  (0, 37345)	0.0927694868682585
  (0, 44563)	0.06628988322188467
  (0, 27588)	0.09918458076878056
  (0, 24044)	0.1457354120459455
  (0, 12541)	0.08768537908597737
  :	:
  (0, 20769)	0.05375662894104334
  (0, 9346)	0.07683064124348495
  (0, 37525)	0.055457554113131455
  (0, 43751)	0.06288076010841256
  (0, 13332)	0.07225065825786667
  (0, 11101)




  (0, 54946)	0.2152559414134567
  (0, 39501)	0.2152559414134567
  (0, 19187)	0.3262632063441237
  (0, 23871)	0.16042254551557905
  (0, 20283)	0.13498117303367627
  (0, 39458)	0.14697613436853194
  (0, 17547)	0.13888654505889722
  (0, 45999)	0.12831492397727434
  (0, 53221)	0.13009355442457837
  (0, 26771)	0.14697613436853194
  (0, 7989)	0.10565671253026851
  (0, 5995)	0.1279742692723626
  (0, 39500)	0.11635810895305482
  (0, 50952)	0.15798487132812775
  (0, 10485)	0.11248386334020116
  (0, 13261)	0.1194907671015148
  (0, 15507)	0.10260818715756125
  (0, 39636)	0.11787410841895044
  (0, 35688)	0.10345862217848129
  (0, 6107)	0.15185931499882657
  (0, 44033)	0.107482543357204
  (0, 18263)	0.20642519139823615
  (0, 34984)	0.0896699432758738
  (0, 3414)	0.08566598315846505
  (0, 38706)	0.1306396499619947
  :	:
  (0, 60271)	0.06689347391485674
  (0, 56102)	0.14759410488430807
  (0, 17067)	0.10684238770838385
  (0, 54893)	0.07464642112564102
  (0, 31705)	0.06206102274579382
  (0, 57261)	0.07

  (0, 22708)	0.06747777063814393
  (0, 32553)	0.0643819740344684
  (0, 6009)	0.0643819740344684
  (0, 18407)	0.20243331191443178
  (0, 38861)	0.06747777063814393
  (0, 49179)	0.06747777063814393
  (0, 36922)	0.06747777063814393
  (0, 55824)	0.06747777063814393
  (0, 32388)	0.06747777063814393
  (0, 48637)	0.06747777063814393
  (0, 5929)	0.20243331191443178
  (0, 4699)	0.26991108255257573
  (0, 21261)	0.1287639480689368
  (0, 52071)	0.0643819740344684
  (0, 10867)	0.059089674784757615
  (0, 21260)	0.06218547138843314
  (0, 9622)	0.056893172138722345
  (0, 9384)	0.06048173157339993
  (0, 53732)	0.11817934956951523
  (0, 49873)	0.0507015789313713
  (0, 21090)	0.10892344505622953
  (0, 12817)	0.054461722528114764
  (0, 41517)	0.06048173157339993
  (0, 45042)	0.05160087288901155
  (0, 46346)	0.055993878181082084
  :	:
  (0, 37330)	0.033602401200018485
  (0, 26370)	0.02391568495926254
  (0, 32363)	0.01333744116173293
  (0, 13248)	0.028970342907308697
  (0, 20441)	0.04442897278068894
  (0, 35

  (0, 53639)	0.06939536460775013
  (0, 50048)	0.05850997131233508
  (0, 44456)	0.05214242442891579
  (0, 23321)	0.11351562869897754
  (0, 3633)	0.05214242442891579
  (0, 44063)	0.09384185954797633
  (0, 40399)	0.05600942438172703
  (0, 43928)	0.10823156676908399
  (0, 29658)	0.04587242105407373
  (0, 0)	0.13236079227368225
  (0, 27634)	0.09524915603384009
  (0, 26619)	0.03891673775412396
  (0, 18363)	0.04956296073893494
  (0, 5508)	0.09734617347366896
  (0, 47909)	0.04243117695712252
  (0, 17467)	0.03950487417065444
  (0, 40962)	0.04104189023825278
  (0, 43923)	0.03933229246611383
  (0, 15745)	0.03300277060355546
  (0, 13699)	0.04714854430404785
  (0, 21366)	0.05532619787062544
  (0, 39105)	0.14144563291214354
  (0, 36039)	0.10613454932925513
  (0, 58031)	0.042558857029870306
  (0, 12340)	0.09866773063380926
  :	:
  (0, 58834)	0.020184356653420242
  (0, 30281)	0.03213743885555999
  (0, 46216)	0.054087568360686226
  (0, 399)	0.028362010219997832
  (0, 260)	0.05865307581966016
  (0, 654)

  (0, 27667)	0.07670225965902844
  (0, 59976)	0.07318325491567329
  (0, 9495)	0.07670225965902844
  (0, 996)	0.07670225965902844
  (0, 1171)	0.07670225965902844
  (0, 75)	0.07318325491567329
  (0, 1221)	0.06874983325468952
  (0, 8241)	0.06716747657258837
  (0, 1595)	0.07318325491567329
  (0, 54088)	0.06045701288582887
  (0, 301)	0.06190686419133186
  (0, 48141)	0.06716747657258837
  (0, 10611)	0.04876585016418073
  (0, 9461)	0.05589108584824693
  (0, 47714)	0.057163447663744815
  (0, 44606)	0.05589108584824693
  (0, 807)	0.06874983325468952
  (0, 13442)	0.2368602006729978
  (0, 42637)	0.048425456199659
  (0, 52204)	0.047628664577304754
  (0, 47394)	0.0551359198864185
  (0, 8637)	0.056294829974206276
  (0, 45878)	0.048425456199659
  (0, 15789)	0.04662336794360377
  (0, 15031)	0.05865492462977363
  :	:
  (0, 60341)	0.03087174114526889
  (0, 50538)	0.03917339953279822
  (0, 42247)	0.055411601979944436
  (0, 25245)	0.02433110488028086
  (0, 55921)	0.028474917981299863
  (0, 21704)	0.027644

  (0, 25275)	0.09822467660903574
  (0, 24599)	0.09822467660903574
  (0, 26265)	0.09822467660903574
  (0, 21249)	0.09371825001302299
  (0, 28839)	0.07831067571396243
  (0, 45765)	0.08430119898584806
  (0, 46776)	0.17608164787807887
  (0, 46354)	0.08430119898584806
  (0, 51407)	0.08601446286349271
  (0, 36914)	0.08601446286349271
  (0, 47723)	0.07927773891022927
  (0, 49453)	0.07443952004795912
  (0, 2549)	0.0758306101934964
  (0, 16560)	0.08281710230997517
  (0, 12462)	0.07209098524030504
  (0, 15682)	0.06673573289842884
  (0, 29010)	0.07263324963997886
  (0, 6868)	0.07060688856443215
  (0, 39209)	0.07263324963997886
  (0, 25511)	0.07015318411951284
  (0, 12465)	0.08150803626747996
  (0, 52560)	0.05463501969888777
  (0, 15789)	0.05970574086138432
  (0, 59155)	0.06023932567020961
  (0, 60155)	0.05809215671482615
  :	:
  (0, 22392)	0.07015318411951284
  (0, 55827)	0.053787812331697236
  (0, 20441)	0.03233673432142053
  (0, 18720)	0.027658977513411603
  (0, 38921)	0.04220687317638146
  (0,

  (0, 1328)	0.10767314574567885
  (0, 59064)	0.09922831268354834
  (0, 39706)	0.08486838313163056
  (0, 9026)	0.09078347962141783
  (0, 45877)	0.08934849135115522
  (0, 28109)	0.09922831268354834
  (0, 14697)	0.08312493993588779
  (0, 4902)	0.2307041655618611
  (0, 1253)	0.09078347962141783
  (0, 41306)	0.07315520440365808
  (0, 20301)	0.07642355006950005
  (0, 24667)	0.08806485060208435
  (0, 40526)	0.09650968366421485
  (0, 52263)	0.09078347962141783
  (0, 2401)	0.08584356895522127
  (0, 50489)	0.07148363940330349
  (0, 46511)	0.08396550208275198
  (0, 31655)	0.06845655545848985
  (0, 8516)	0.063845943724192
  (0, 1272)	0.07739873589309076
  (0, 27010)	0.08690365629856145
  (0, 47169)	0.06664830722282386
  (0, 57232)	0.06288360272918157
  (0, 38198)	0.06335527397248611
  (0, 28628)	0.06774664677586045
  :	:
  (0, 27424)	0.03266070804332423
  (0, 38807)	0.03321519093574858
  (0, 32530)	0.039878819766169735
  (0, 32804)	0.027273110340179856
  (0, 35300)	0.034510843831496135
  (0, 26519

  (0, 60083)	0.23412807023076773
  (0, 15032)	0.16328502246986937
  (0, 49529)	0.1866611122208658
  (0, 29553)	0.20502383817788047
  (0, 15903)	0.15543640367021766
  (0, 29509)	0.17060347545069937
  (0, 15458)	0.13776181545912575
  (0, 7597)	0.13241401495000776
  (0, 19982)	0.16942586763913023
  (0, 35207)	0.14110086275259737
  (0, 52904)	0.09670481200008146
  (0, 56898)	0.12415949448460023
  (0, 8812)	0.10507655836409487
  (0, 6107)	0.08258663647975586
  (0, 29750)	0.22243480706671145
  (0, 4311)	0.13673619781590135
  (0, 36449)	0.09046972154071253
  (0, 22283)	0.08271724552167398
  (0, 12477)	0.09356808653821755
  (0, 26157)	0.1113456934386682
  (0, 2432)	0.0921599775778023
  (0, 19816)	0.26950235592543215
  (0, 2915)	0.09337160741503465
  (0, 53004)	0.17955417130339596
  (0, 22026)	0.08480500934146624
  :	:
  (0, 18983)	0.08523345771519898
  (0, 48758)	0.2848628225401159
  (0, 47616)	0.10900847348538831
  (0, 35349)	0.08447649725633359
  (0, 13359)	0.08306871752514314
  (0, 47359)	0

In [29]:
print(tfidf_test)

  (0, 60731)	0.05899712902382916
  (0, 60684)	0.033385466151529625
  (0, 60271)	0.04581143542258741
  (0, 60261)	0.07937859313949312
  (0, 59116)	0.10997273171965094
  (0, 59036)	0.08042180974421559
  (0, 58654)	0.07128159375531905
  (0, 58335)	0.0678398429566027
  (0, 57086)	0.12429244186413906
  (0, 55170)	0.20939665348422057
  (0, 54706)	0.035492943055135416
  (0, 54394)	0.10596727423829927
  (0, 54238)	0.06234899619642803
  (0, 53749)	0.04942070163765446
  (0, 53518)	0.117529167732626
  (0, 53144)	0.059982387365669215
  (0, 52555)	0.14065295472127948
  (0, 52483)	0.07360851972393109
  (0, 51960)	0.14065295472127948
  (0, 51955)	0.030414611451489323
  (0, 51663)	0.12057034351821985
  (0, 51527)	0.08134873077710283
  (0, 51159)	0.053278054236854326
  (0, 51005)	0.07360851972393109
  (0, 50068)	0.06481045119580665
  :	:
  (1266, 16835)	0.08080116269909657
  (1266, 16385)	0.0655727278454052
  (1266, 15999)	0.03279926109441314
  (1266, 14890)	0.05544599110567492
  (1266, 13110)	0.066971

# To Be Remembered
## What is a PassiveAggressiveClassifier?

## Passive Aggressive algorithms are online learning algorithms. Such an algorithm remains passive when a prediction is correct (with a sufficient margin) and turns aggressive when it makes a mistake, updating and adjusting its weights. Each update is chosen to correct the loss on the current example while changing the weight vector as little as possible.
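The passive/aggressive behaviour can be seen in a single update step. The sketch below implements the PA-I rule for one example with label y in {-1, +1}: no change when the hinge loss is zero, otherwise a step of size tau = min(C, loss / ||x||^2) toward the correct side. scikit-learn documents its default `loss='hinge'` as equivalent to PA-I, but this is a simplified illustration: the real estimator also handles intercepts, multiple passes, shuffling and sparse input. `pa1_update` is a hypothetical helper name.

```python
def pa1_update(w, x, y, C=1.0):
    """One PA-I step for a label y in {-1, +1}; returns the new weight list."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    loss = max(0.0, 1.0 - margin)  # hinge loss
    sq_norm = sum(xi * xi for xi in x)
    if loss == 0.0 or sq_norm == 0.0:
        return list(w)  # passive: correct with enough margin (or nothing to learn from)
    tau = min(C, loss / sq_norm)  # aggressive, but the step size is capped by C
    return [wi + tau * y * xi for wi, xi in zip(w, x)]


w = [0.0, 0.0]
w = pa1_update(w, [1.0, 2.0], +1)  # mistake: w moves just enough to reach margin 1
w_again = pa1_update(w, [1.0, 2.0], +1)  # now passive: the weights do not change
print(w, w_again)
```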

## Step 7 ---> Next, we’ll initialize a PassiveAggressiveClassifier, fit it on tfidf_train and y_train, predict on the test set, and calculate the accuracy.

In [31]:
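## Initialising a PassiveAggressiveClassifier (max_iter=50 allows at most 50 passes over the training data)
## No random_state is set, so the training order, and therefore the exact accuracy, can vary slightly between runs
pac = PassiveAggressiveClassifier(max_iter=50)
pac.fit(tfidf_train, y_train)

## Making predictions on the test set
y_pred = pac.predict(tfidf_test)


## Calculating accuracy
score = accuracy_score(y_test, y_pred)
print(f'Accuracy is : {round(score*100, 2)}%')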

Accuracy is : 92.74%


# ACCURACY = 92.74%

## Step 8 ---> We got an accuracy of 92.74% with this model. Finally, let’s print a confusion matrix to see the numbers of true and false positives and negatives.

In [32]:
confusion_matrix(y_test,y_pred,labels=['FAKE','REAL'])

array([[589,  49],
       [ 43, 586]])

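# INSIGHT/OUTCOME -------> Treating FAKE as the positive class (rows are the true labels and columns the predicted labels, both in the order ['FAKE', 'REAL']), this model has 589 true positives, 586 true negatives, 43 false positives, and 49 false negatives.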
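The four counts can be read straight off the array printed above, and together they reproduce the reported accuracy. A small worked check in plain Python, assuming FAKE is treated as the positive class as in the insight line:

```python
# Confusion matrix from the cell above: rows = true labels, columns = predicted labels,
# both in the order ['FAKE', 'REAL'] (as passed via labels=...)
cm = [[589, 49],
      [43, 586]]

tp, fn = cm[0]  # true FAKE: predicted FAKE / predicted REAL
fp, tn = cm[1]  # true REAL: predicted FAKE / predicted REAL

total = tp + fn + fp + tn  # size of the test set
accuracy = (tp + tn) / total  # correct predictions lie on the diagonal

print(tp, tn, fp, fn)  # 589 586 43 49
print(total, round(accuracy * 100, 2))  # 1267 92.74, matching the score above
```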