**Importing the dependencies**

In [3]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import nltk


Reading the data

In [4]:
data = pd.read_csv('/content/data.csv') 

In [5]:
data.head()

Unnamed: 0,URLs,Headline,Body,Label
0,http://www.bbc.com/news/world-us-canada-414191...,Four ways Bob Corker skewered Donald Trump,Image copyright Getty Images\nOn Sunday mornin...,1
1,https://www.reuters.com/article/us-filmfestiva...,Linklater's war veteran comedy speaks to moder...,"LONDON (Reuters) - “Last Flag Flying”, a comed...",1
2,https://www.nytimes.com/2017/10/09/us/politics...,Trump’s Fight With Corker Jeopardizes His Legi...,The feud broke into public view last week when...,1
3,https://www.reuters.com/article/us-mexico-oil-...,Egypt's Cheiron wins tie-up with Pemex for Mex...,MEXICO CITY (Reuters) - Egypt’s Cheiron Holdin...,1
4,http://www.cnn.com/videos/cnnmoney/2017/10/08/...,Jason Aldean opens 'SNL' with Vegas tribute,"Country singer Jason Aldean, who was performin...",1


In [6]:
data.shape

(4009, 4)

In [7]:
data.describe()

Unnamed: 0,Label
count,4009.0
mean,0.466949
std,0.498969
min,0.0
25%,0.0
50%,0.0
75%,1.0
max,1.0
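
Because `Label` only takes the values 0 and 1, its mean is the fraction of rows labeled 1 (real news, per the label key given later in the notebook). The sketch below back-calculates approximate class counts from the rounded summary figures above; the variable names are illustrative and the counts are only as precise as the printed mean.

```python
# Figures copied from the data.describe() output above.
n_rows = 4009
mean_label = 0.466949  # rounded by pandas, so the result is an estimate

# The mean of a 0/1 column equals the share of 1s.
n_real = round(n_rows * mean_label)
n_fake = n_rows - n_real

print(f"label 1 (real): ~{n_real}, label 0 (fake): ~{n_fake}")
```

On the real DataFrame, `data['Label'].value_counts()` gives the exact counts directly.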


In [8]:
data.isna().sum()  # count missing values per column


URLs         0
Headline     0
Body        21
Label        0
dtype: int64

In [9]:
data['Body'] = data['Body'].fillna('')  # replace missing bodies with empty strings

In [10]:
data.isna().sum()

URLs        0
Headline    0
Body        0
Label       0
dtype: int64
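
The null handling above can be tried on a small frame first. This is a minimal sketch using a hypothetical toy DataFrame (not the real `data.csv`); it shows that `fillna('')` keeps every row, whereas dropping rows would discard the label as well.

```python
import pandas as pd

# Hypothetical toy frame standing in for the real dataset.
toy = pd.DataFrame({
    "Headline": ["Headline A", "Headline B", "Headline C"],
    "Body": ["body text a", None, "body text c"],
    "Label": [1, 0, 1],
})

missing_before = int(toy["Body"].isna().sum())

# Replace missing bodies with empty strings so string concatenation works later.
toy["Body"] = toy["Body"].fillna("")

missing_after = int(toy["Body"].isna().sum())
print(missing_before, missing_after, len(toy))
```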

Feature Selection

In [11]:
# note: no separator is used, so the headline's last word fuses with the body's first word
data['news'] = data['Headline'] + data['Body']

In [12]:
data.head()

Unnamed: 0,URLs,Headline,Body,Label,news
0,http://www.bbc.com/news/world-us-canada-414191...,Four ways Bob Corker skewered Donald Trump,Image copyright Getty Images\nOn Sunday mornin...,1,Four ways Bob Corker skewered Donald TrumpImag...
1,https://www.reuters.com/article/us-filmfestiva...,Linklater's war veteran comedy speaks to moder...,"LONDON (Reuters) - “Last Flag Flying”, a comed...",1,Linklater's war veteran comedy speaks to moder...
2,https://www.nytimes.com/2017/10/09/us/politics...,Trump’s Fight With Corker Jeopardizes His Legi...,The feud broke into public view last week when...,1,Trump’s Fight With Corker Jeopardizes His Legi...
3,https://www.reuters.com/article/us-mexico-oil-...,Egypt's Cheiron wins tie-up with Pemex for Mex...,MEXICO CITY (Reuters) - Egypt’s Cheiron Holdin...,1,Egypt's Cheiron wins tie-up with Pemex for Mex...
4,http://www.cnn.com/videos/cnnmoney/2017/10/08/...,Jason Aldean opens 'SNL' with Vegas tribute,"Country singer Jason Aldean, who was performin...",1,Jason Aldean opens 'SNL' with Vegas tributeCou...
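
In the `news` column above, the headline runs straight into the body ("...Donald TrumpImag..."), which creates a token that belongs to neither text. A minimal sketch of the difference a separator makes, on a hypothetical one-row frame:

```python
import pandas as pd

# Hypothetical example row, not taken from the dataset.
toy = pd.DataFrame({
    "Headline": ["Storm hits coast"],
    "Body": ["Heavy rain fell overnight."],
})

fused = toy["Headline"] + toy["Body"]          # what the notebook does
spaced = toy["Headline"] + " " + toy["Body"]   # keeps the word boundary

print(fused[0])
print(spaced[0])
```

Switching to the spaced form would change the vocabulary, so the scores reported later would need to be recomputed.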


In [13]:
data.drop(['URLs', 'Headline', 'Body'], axis=1, inplace=True)  # drop columns that are no longer needed

In [14]:
data.head()

Unnamed: 0,Label,news
0,1,Four ways Bob Corker skewered Donald TrumpImag...
1,1,Linklater's war veteran comedy speaks to moder...
2,1,Trump’s Fight With Corker Jeopardizes His Legi...
3,1,Egypt's Cheiron wins tie-up with Pemex for Mex...
4,1,Jason Aldean opens 'SNL' with Vegas tributeCou...


Importing the TF-IDF vectorizer, the classifier, and the evaluation metrics

In [15]:
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.metrics import classification_report, accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

Vectorization

In [16]:
count_vec = CountVectorizer(stop_words='english',
                            ngram_range=(1, 2),
                            lowercase=True)

In [17]:
tfidf = TfidfVectorizer(tokenizer=nltk.word_tokenize,
                        stop_words='english',
                        ngram_range=(1, 2),
                        lowercase=True,
                        max_features=1024,
                        min_df=3)
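
To see what the vectorizer produces, the sketch below runs TF-IDF with unigrams and bigrams on a hypothetical three-document corpus. It deliberately uses scikit-learn's default tokenizer instead of `nltk.word_tokenize`, and omits `max_features` and `min_df`, so that it runs without any NLTK download; it is an illustration, not the notebook's exact configuration.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus.
corpus = [
    "senate passes budget bill",
    "senate rejects budget bill",
    "aliens endorse budget bill",
]

vec = TfidfVectorizer(ngram_range=(1, 2), lowercase=True)
matrix = vec.fit_transform(corpus)  # sparse matrix: one row per document

# Terms that appear in fewer documents receive a larger IDF weight.
idf_rare = vec.idf_[vec.vocabulary_["aliens"]]
idf_common = vec.idf_[vec.vocabulary_["budget"]]

# Each row is L2-normalized by default.
row_norms = np.sqrt(matrix.multiply(matrix).sum(axis=1))

print(sorted(vec.vocabulary_))
print(idf_rare, idf_common)
```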

Splitting the dataset

1 --> real news

0 --> fake news

In [18]:
X = data['news']
y = data['Label']

In [19]:
X[130]  # inspect the article at index 130

'Hungary’s “Wall” Versus the U.S. “Wall”Op-Ed by Catherine J. Frompovich\nCurrently in the European Union, there’s a huge “war of words,” including an action plan being implemented against what’s called the “Soros Plan” [4]. Soros like in George Soros [1, 2] who helped round up fellow Jews in his native Hungary during the Hitler/Nazi occupation and purges.\nThe prime “mover and shaker” against the Soros Plan is none other than the current Prime Minister of Hungary, Viktor Orban, who says “The whole of the European Union is in trouble because its leaders and bureaucrats adopt decisions like this,” e.g., Hungary’s being told to take in one million refugees a year whereas, in reality, Hungary actually built a wall to keep out immigrants. The wall has been 99% effective. That’s something not in the Soros Plan nor acceptable to some Eurocrats.\nSource: Jack Montgomery ن Hungary builds a wall; cuts illegal immigration by over 99 per cent. Lessons for President Trump…? http://www.breitbart.co

In [20]:
y[130]  # label of the same article: 0 (fake) or 1 (real)

0

In [21]:
nltk.download('punkt')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


True

In [22]:
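# note: the vectorizer is fit on the full dataset before the train/test split,
# so term statistics from the test rows also shape the features
X_tfidf = tfidf.fit_transform(X)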



In [23]:
model = PassiveAggressiveClassifier(C=0.5, random_state=5)  # instantiate the classifier; fitting happens below


In [24]:
# note: no random_state is passed, so the 90/10 split differs from run to run
X_tfidf_train, X_tfidf_test, y_train, y_test = train_test_split(X_tfidf, y, test_size=0.1)

In [25]:
model.fit(X_tfidf_train, y_train)
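
The notebook vectorizes all rows and then splits. One common alternative, sketched here as an assumption rather than as the author's method, is to split the raw text first and wrap the vectorizer and classifier in a scikit-learn pipeline, so the vocabulary and IDF weights are learned from the training rows only. The texts and labels are hypothetical, and the default tokenizer replaces `nltk.word_tokenize` to keep the example self-contained.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Hypothetical toy data: 1 = real, 0 = fake.
texts = [
    "Senate passes annual budget bill after long debate",
    "Central bank raises interest rates by a quarter point",
    "City council approves new transit funding plan",
    "Court blocks travel order pending appeal",
    "Aliens secretly control the stock market insiders claim",
    "Miracle fruit cures every disease overnight doctors stunned",
    "Moon landing footage filmed in secret desert studio",
    "Celebrity clone spotted shopping in two cities at once",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Split the raw text first; fixing random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=5, stratify=labels
)

pipe = make_pipeline(
    TfidfVectorizer(stop_words="english", ngram_range=(1, 2)),
    PassiveAggressiveClassifier(C=0.5, random_state=5),
)
pipe.fit(X_train, y_train)   # the vectorizer only sees training text
preds = pipe.predict(X_test)
print(preds)
```

A single pipeline object also means only one artifact needs to be saved and loaded.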


In [26]:
import joblib
joblib.dump(tfidf, 'TF-idf-news vectorizer.pkl')

['TF-idf-news vectorizer.pkl']

In [27]:
joblib.dump(model, 'Fake_news_detection_model.pkl')

['Fake_news_detection_model.pkl']
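
Saved artifacts are only useful if they reload and predict identically. The sketch below round-trips a toy vectorizer and classifier through `joblib` in a temporary directory; the file names and training texts are hypothetical. Note that the notebook's real vectorizer references `nltk.word_tokenize`, so the loading environment would need NLTK and its `punkt` data available as well.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier

# Hypothetical toy training data: 1 = real, 0 = fake.
texts = [
    "senate passes budget bill",
    "court blocks travel order",
    "aliens control stock market",
    "miracle fruit cures disease",
]
labels = [1, 1, 0, 0]

vec = TfidfVectorizer()
clf = PassiveAggressiveClassifier(C=0.5, random_state=5)
clf.fit(vec.fit_transform(texts), labels)

with tempfile.TemporaryDirectory() as tmp:
    vec_path = os.path.join(tmp, "vectorizer.pkl")   # illustrative file names
    model_path = os.path.join(tmp, "model.pkl")
    joblib.dump(vec, vec_path)
    joblib.dump(clf, model_path)
    loaded_vec = joblib.load(vec_path)
    loaded_clf = joblib.load(model_path)

# The reloaded pair must be used together: transform with the saved
# vectorizer, then predict with the saved model.
original = clf.predict(vec.transform(texts))
restored = loaded_clf.predict(loaded_vec.transform(texts))
print(list(original), list(restored))
```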

In [28]:
y_pred = model.predict(X_tfidf_test)

In [29]:
print(classification_report(y_test, y_pred))  # per-class precision, recall, and F1 on the test split

              precision    recall  f1-score   support

           0       0.99      1.00      0.99       202
           1       0.99      0.98      0.99       199

    accuracy                           0.99       401
   macro avg       0.99      0.99      0.99       401
weighted avg       0.99      0.99      0.99       401



In [30]:
news = [
    'A black box that documents all of the activities of an aircraft is a crucial device for air crash investigation.',
    'China will always be with Nepal on the path of stability, development and prosperity,  Ambassador Chen tells Dahal.',
    'Amitbh bacchan is passed away',
]

news_tfidf = tfidf.transform(news) 

In [31]:
model.predict(news_tfidf)  # predicted labels for the three sentences (1 = real, 0 = fake)

array([1, 1, 0])
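
Short sentences like these may share few terms with a vocabulary capped at 1024 features, in which case the prediction rests on very little evidence. A quick sanity check, sketched here with a hypothetical toy vectorizer, is to count the nonzero entries in each transformed row: a row with zero nonzero entries contains no known terms at all.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy vocabulary.
vec = TfidfVectorizer().fit([
    "senate passes budget bill",
    "court blocks travel order",
])

known = vec.transform(["senate budget"])
unknown = vec.transform(["zebra quantum xylophone"])

# .nnz is the number of stored nonzero values in the sparse row.
print(known.nnz, unknown.nnz)
```

With the notebook's objects, the same check would be `news_tfidf.getnnz(axis=1)`, which returns one count per input sentence.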

In [32]:
confusion_matrix(y_test,y_pred,labels=[0, 1])

array([[201,   1],
       [  3, 196]])
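
The figures in the classification report can be rederived from this confusion matrix, where rows are true labels and columns are predicted labels in the order `[0, 1]`. A minimal sketch of the arithmetic, using only the four counts shown above:

```python
# Confusion matrix from the output above: rows = true label, columns = predicted.
cm = [[201, 1],
      [3, 196]]

total = sum(sum(row) for row in cm)
accuracy = (cm[0][0] + cm[1][1]) / total

precision = {}
recall = {}
for k in (0, 1):
    predicted_k = cm[0][k] + cm[1][k]   # column sum: everything predicted as k
    actual_k = cm[k][0] + cm[k][1]      # row sum: everything truly k
    precision[k] = cm[k][k] / predicted_k
    recall[k] = cm[k][k] / actual_k

print(f"accuracy={accuracy:.4f}")
for k in (0, 1):
    print(f"class {k}: precision={precision[k]:.4f} recall={recall[k]:.4f}")
```

Rounded to two decimals, these values match the report: four of the 401 test articles are misclassified, three real articles predicted as fake and one fake article predicted as real.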