<a href="https://colab.research.google.com/github/Aditya-ai0/Fake_News_Detection/blob/main/FAKE_NEWS_DETECTION_PROJECT.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **FAKE NEWS DETECTION USING MACHINE LEARNING**






## **Reading the data set**

In [21]:
import pandas as pd
dataframe = pd.read_csv('/content/data.csv')

## **Data Preprocessing**

### **dataframe.head() gives a quick overview of the dataset by showing only its first 5 rows**

In [22]:
dataframe.head()

Unnamed: 0,URLs,Headline,Body,Label
0,http://www.bbc.com/news/world-us-canada-414191...,Four ways Bob Corker skewered Donald Trump,Image copyright Getty Images\nOn Sunday mornin...,1
1,https://www.reuters.com/article/us-filmfestiva...,Linklater's war veteran comedy speaks to moder...,"LONDON (Reuters) - “Last Flag Flying”, a comed...",1
2,https://www.nytimes.com/2017/10/09/us/politics...,Trump’s Fight With Corker Jeopardizes His Legi...,The feud broke into public view last week when...,1
3,https://www.reuters.com/article/us-mexico-oil-...,Egypt's Cheiron wins tie-up with Pemex for Mex...,MEXICO CITY (Reuters) - Egypt’s Cheiron Holdin...,1
4,http://www.cnn.com/videos/cnnmoney/2017/10/08/...,Jason Aldean opens 'SNL' with Vegas tribute,"Country singer Jason Aldean, who was performin...",1


### **To see which columns the dataset contains, we use dataframe.columns**

In [23]:
dataframe.columns

Index(['URLs', 'Headline', 'Body', 'Label'], dtype='object')

### **To check the shape of the data we use dataframe.shape, which gives the number of rows and columns in the dataset**

In [24]:
dataframe.shape

(4009, 4)

### **dataframe.info() summarizes the data: the total number of entries, the number of columns, each column's data type, and how many non-null values each column has**

In [25]:
dataframe.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4009 entries, 0 to 4008
Data columns (total 4 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   URLs      4009 non-null   object
 1   Headline  4009 non-null   object
 2   Body      3988 non-null   object
 3   Label     4009 non-null   int64 
dtypes: int64(1), object(3)
memory usage: 125.4+ KB
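
The `Non-Null Count` column above reports 3988 non-null values for `Body` out of 4009 entries, so 21 bodies are missing. A minimal sketch of counting missing values per column directly; the toy frame below is a made-up stand-in, not data from `data.csv`:

```python
import pandas as pd

# Hypothetical toy frame standing in for the real dataset.
toy = pd.DataFrame({
    "Headline": ["A", "B", "C"],
    "Body": ["text a", None, "text c"],
    "Label": [1, 0, 1],
})

# isna() marks missing cells; sum() counts them per column.
missing_per_column = toy.isna().sum()
print(missing_per_column)
```

On the real data the same call would be `dataframe.isna().sum()`.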


### **Now we remove the unnecessary columns from the dataset using the drop function**

In [26]:
dataframe = dataframe.drop(["URLs","Body"], axis = 1)

In [27]:
dataframe

Unnamed: 0,Headline,Label
0,Four ways Bob Corker skewered Donald Trump,1
1,Linklater's war veteran comedy speaks to moder...,1
2,Trump’s Fight With Corker Jeopardizes His Legi...,1
3,Egypt's Cheiron wins tie-up with Pemex for Mex...,1
4,Jason Aldean opens 'SNL' with Vegas tribute,1
...,...,...
4004,Trends to Watch,0
4005,Trump Jr. Is Soon To Give A 30-Minute Speech F...,0
4006,"Ron Paul on Trump, Anarchism & the AltRight",0
4007,China to accept overseas trial data in bid to ...,1


### **Randomly shuffling the dataframe to minimize bias**

In [28]:
dataframe = dataframe.sample(frac =1)

In [47]:
dataframe

Unnamed: 0,Headline,Label
3241,Let's paint about sex: racy feminist artists e...,1
2795,"Highlighting Vietnam War's relevance, exhibit ...",1
2426,Brazil studying extradition of Italian ex-left...,1
3390,Exclusive: Alphabet's Waymo demanded $1 billio...,1
2001,"Trump boasts about brain, belittles Tillerson",1
...,...,...
1925,"North Koreans process salmon, snow crab eaten ...",1
2839,Top 75 Asian Innovative Universities Methodology,1
2492,"Fukushima court rules Tepco, government liable...",1
3097,"To ""Help"" Residents Repair Homes After Irma, G...",0
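
`sample(frac=1)` produces a different row order on every run and keeps the original index labels, as the output above shows. A hedged sketch of a reproducible variant follows; the `random_state=42` value and the toy frame are assumptions for illustration and are not part of the original notebook:

```python
import pandas as pd

# Hypothetical toy frame with the same two columns as the notebook.
toy = pd.DataFrame({"Headline": ["a", "b", "c", "d"], "Label": [1, 1, 0, 0]})

# random_state fixes the shuffle order across runs;
# reset_index(drop=True) renumbers the rows 0..n-1 after shuffling.
shuffled = toy.sample(frac=1, random_state=42).reset_index(drop=True)
print(shuffled)
```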


In [36]:
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer


### **We now have two columns: Headline (the text) and Label. We take Headline as x and Label as y: the text is the independent variable and the label is the dependent variable, because we decide from the text whether a news item is fake or real**

In [37]:
x = dataframe["Headline"]
y = dataframe["Label"]

### **Splitting the dataset into training and testing set**

In [38]:
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25)
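
The split above is random, so the exact rows in each set (and therefore the score) change between runs. One possible refinement, sketched here on made-up data, is to pass `random_state` for reproducibility and `stratify` so both sets keep the same share of real and fake labels. The toy lists and the `random_state=0` value are assumptions, not taken from the notebook:

```python
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 8 "real" (1) and 4 "fake" (0) headlines.
x_toy = [f"headline {i}" for i in range(12)]
y_toy = [1] * 8 + [0] * 4

# stratify keeps the 2:1 class ratio in both splits; random_state fixes the split.
x_tr, x_te, y_tr, y_te = train_test_split(
    x_toy, y_toy, test_size=0.25, stratify=y_toy, random_state=0
)
print(len(x_tr), len(x_te))
print(sorted(y_te))
```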

### **Vectorization of texts**

**TF-IDF combines two statistics: TF (Term Frequency) and IDF (Inverse Document Frequency).
Term Frequency is the relative frequency of a term within a document, that is, how often the term occurs divided by the number of terms in the document.**

**Inverse Document Frequency is the logarithm of the inverse of the fraction of documents that contain the term, so terms that appear in few documents get a higher value.**

**The product of the two measures how important a term is to a document. The model is trained on these weights, so it can learn how much each word contributes to deciding whether a news item is real or fake.**
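
To make the two definitions concrete, here is a small pure-Python sketch of the textbook form of TF-IDF. The corpus and the helper names `tf`, `idf` and `tf_idf` are made up for illustration, and the sketch assumes every queried term occurs in at least one document:

```python
import math
from collections import Counter

# Hypothetical toy corpus of three tokenised "headlines".
docs = [
    ["trump", "wins", "vote"],
    ["trump", "loses", "vote"],
    ["aliens", "land"],
]

def tf(term, doc):
    # Term frequency: share of the document's tokens equal to `term`.
    return Counter(doc)[term] / len(doc)

def idf(term, corpus):
    # Inverse document frequency: log(number of documents / documents containing `term`).
    containing = sum(1 for doc in corpus if term in doc)
    return math.log(len(corpus) / containing)

def tf_idf(term, doc, corpus):
    return tf(term, doc) * idf(term, corpus)

print(tf_idf("trump", docs[0], docs))  # term in 2 of 3 documents: lower weight
print(tf_idf("wins", docs[0], docs))   # term in 1 of 3 documents: higher weight
```

scikit-learn's `TfidfVectorizer` uses a variant of this by default (raw counts for TF, a smoothed IDF of ln((1 + n) / (1 + df)) + 1, and L2 normalisation of each row), so its numbers will differ from the ones printed here.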

In [39]:
vectorization = TfidfVectorizer()
xv_train = vectorization.fit_transform(x_train)
xv_test = vectorization.transform(x_test)
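
Note that the vectorizer is fitted on the training text only and then reused on the test text. The sketch below, on made-up sentences, illustrates what that implies: words that never occurred in the training text have no column and are silently ignored at transform time.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical example texts, not taken from the dataset.
train_texts = ["trump wins vote", "aliens land in ohio"]
test_texts = ["trump meets aliens on mars"]  # "meets", "on", "mars" are unseen

vec = TfidfVectorizer()
train_matrix = vec.fit_transform(train_texts)  # learns vocabulary and IDF weights
test_matrix = vec.transform(test_texts)        # reuses them; unseen words are dropped

print(sorted(vec.vocabulary_))
print(train_matrix.shape, test_matrix.shape)
print(test_matrix.nnz)  # number of non-zero entries in the test row
```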

### **Logistic Regression Training and Testing**

In [40]:
from sklearn.linear_model import LogisticRegression

In [41]:
LR = LogisticRegression()
LR.fit(xv_train,y_train)

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=100,
                   multi_class='auto', n_jobs=None, penalty='l2',
                   random_state=None, solver='lbfgs', tol=0.0001, verbose=0,
                   warm_start=False)

In [42]:
LR.score(xv_test, y_test)

0.8265204386839482
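
For a classifier, `score` returns the mean accuracy on the given data. Accuracy alone does not show which class the errors fall on, so a confusion matrix and per-class report can help. The sketch below uses made-up label lists rather than the notebook's actual predictions:

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Hypothetical true and predicted labels (1 = real, 0 = fake).
y_true = [1, 1, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 1, 1, 1]

accuracy = accuracy_score(y_true, y_pred)
# Rows are true classes, columns are predicted classes, in the order [0, 1].
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])

print(accuracy)
print(cm)
print(classification_report(y_true, y_pred, target_names=["fake", "real"]))
```

With the notebook's objects, the equivalent call would compare `y_test` against `LR.predict(xv_test)`.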

### **Testing**

In [48]:
# Read one news text from the user (input() already returns a str)
news = input()

# Vectorize it with the TF-IDF vocabulary fitted on the training headlines
new_xv_test = vectorization.transform([news])
pred_LR = LR.predict(new_xv_test)

# Label convention in this dataset: 0 = fake, 1 = real
if pred_LR[0] == 0:
  print("This news looks fake")
elif pred_LR[0] == 1:
  print("This news looks real")

Kenya's President Uhuru Kenyatta (front) and his Deputy William Ruto deliver a statement to members of the media at the State House in Nairobi, Kenya September 21, 2017. REUTERS/Baz Ratner NAIROBI (Reuters) - Kenyan opposition leader Raila Odinga withdrew on Tuesday from a court-ordered re-run of the presidential election due on Oct. 26, saying the vote would not be free or fair and leaving President Uhuru Kenyatta as the only candidate.
This news looks fake
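
The model was fitted on the `Headline` column only, while the text pasted above is an article body, which may be one reason the label looks surprising. A hard 0/1 prediction also hides how sure the model is. The sketch below shows one way to bundle the vectorizer and classifier and report a probability alongside the label. The training sentences and the `classify` helper are hypothetical, and the printed probability on such a tiny set is not meaningful in itself:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy training set; 1 = real, 0 = fake, as in the notebook.
headlines = [
    "Central bank raises interest rates",
    "Court rules on election dispute",
    "Aliens secretly control the weather",
    "Miracle pill cures every disease overnight",
]
labels = [1, 1, 0, 0]

# The pipeline applies TF-IDF and then logistic regression in one object.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(headlines, labels)

def classify(text):
    # Hypothetical helper: returns the predicted label and its probability.
    proba = model.predict_proba([text])[0]  # ordered like model.classes_
    label = int(model.classes_[proba.argmax()])
    return ("real" if label == 1 else "fake"), float(proba.max())

verdict, confidence = classify("Court rules on interest rates")
print(verdict, round(confidence, 3))
```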


