<img src="https://th.bing.com/th/id/OIP.PEaVjja4BvUe0UVbAkZCWwAAAA?pid=ImgDet&rs=1" />

# Fake News Detection

### Introduction to Fake News Detection
Fake news detection is a crucial task in today's information landscape, where misleading or false information spreads quickly and can have significant consequences. One way to tackle the problem is to train a machine learning classifier such as Multinomial Naive Bayes (MultinomialNB) on word-count features produced by CountVectorizer.

MultinomialNB is a popular classification algorithm for natural language processing tasks. It applies Bayes' theorem under the "naive" assumption that the words in a document are conditionally independent given the class, which makes it well suited to text classification on word counts. For fake news detection, MultinomialNB can be trained to classify news articles or other textual content as either genuine or fake.
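To make the idea concrete, here is a minimal pure-Python sketch of a multinomial Naive Bayes classifier with Laplace smoothing. It is not scikit-learn's implementation; the function names (`train_multinomial_nb`, `predict_multinomial_nb`) and the four-document toy corpus are made up for illustration, and the labels follow the notebook's convention (0 = fake, 1 = true).

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Estimate log priors and Laplace-smoothed log likelihoods from tokenized docs."""
    class_doc_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
    vocab = {word for counts in word_counts.values() for word in counts}

    # log P(class): share of training documents in each class
    log_prior = {c: math.log(n / len(labels)) for c, n in class_doc_counts.items()}

    # log P(word | class): smoothed relative frequency of the word within the class
    log_likelihood = {}
    for c in class_doc_counts:
        total = sum(word_counts[c].values()) + alpha * len(vocab)
        log_likelihood[c] = {
            word: math.log((word_counts[c][word] + alpha) / total) for word in vocab
        }
    return log_prior, log_likelihood, vocab


def predict_multinomial_nb(tokens, log_prior, log_likelihood, vocab):
    """Return the class with the highest log posterior; unknown words are skipped."""
    scores = {
        c: log_prior[c] + sum(log_likelihood[c][w] for w in tokens if w in vocab)
        for c in log_prior
    }
    return max(scores, key=scores.get)


# Hypothetical toy corpus, not taken from the dataset
toy_docs = [
    "shocking secret cure they hide".split(),
    "you will not believe this shocking trick".split(),
    "senate passes budget bill".split(),
    "central bank raises interest rates".split(),
]
toy_labels = [0, 0, 1, 1]  # 0 = fake, 1 = true

model = train_multinomial_nb(toy_docs, toy_labels)
toy_prediction = predict_multinomial_nb("shocking secret budget".split(), *model)
print(toy_prediction)
```

Working in log space avoids numerical underflow when many small probabilities are multiplied, and the `alpha` term keeps a word that never appeared in a class from forcing that class's probability to zero.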

CountVectorizer is a feature extraction technique commonly used in text analysis. It converts a collection of text documents into a matrix in which each row represents a document and each column corresponds to a term in the vocabulary. Each entry holds the number of times that term occurs in that document.
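As a small illustration of that matrix, the sketch below runs CountVectorizer on two made-up sentences (the corpus and the `toy_` variable names are assumptions for this example only). The column order is read from the fitted `vocabulary_` mapping.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical two-document corpus
toy_corpus = [
    "the senate passed the bill",
    "the bill failed",
]

toy_vectorizer = CountVectorizer()
toy_matrix = toy_vectorizer.fit_transform(toy_corpus)  # sparse document-term matrix

# vocabulary_ maps each term to its column index; sort terms by that index
terms = sorted(toy_vectorizer.vocabulary_, key=toy_vectorizer.vocabulary_.get)

print(terms)                 # column labels
print(toy_matrix.toarray())  # one row per document, one column per term
```

With the default settings the text is lowercased and the columns are ordered alphabetically, so the first row has a 2 in the column for "the" because that word appears twice in the first sentence.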

In [1]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
from sklearn.model_selection import train_test_split,GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.utils import shuffle
from scipy.sparse import hstack
from sklearn.model_selection import cross_val_score,learning_curve
import matplotlib.pyplot as plt
# List every file under the current working directory to confirm that
# Fake.csv and True.csv are present

import os
for dirname, _, filenames in os.walk('./'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

./Fake.csv
./True.csv
./Untitled.ipynb
./.ipynb_checkpoints\Untitled-checkpoint.ipynb


In [2]:
true=pd.read_csv("./True.csv")
fake=pd.read_csv("./Fake.csv")

In [3]:
true.head(50)
true["subject"].value_counts()

politicsNews    11272
worldnews       10145
Name: subject, dtype: int64

In [4]:
fake.head()
fake["subject"].value_counts()

News               9050
politics           6841
left-news          4459
Government News    1570
US_News             783
Middle-east         778
Name: subject, dtype: int64

In [5]:
true.isnull().sum()

title      0
text       0
subject    0
date       0
dtype: int64

In [6]:
fake.isnull().sum()

title      0
text       0
subject    0
date       0
dtype: int64

In [7]:
true.shape

(21417, 4)

In [8]:
fake.shape

(23481, 4)

In [14]:
true["label"]=1
fake["label"]=0

In [15]:
true.head()

|   | title | text | subject | date | label |
|---|---|---|---|---|---|
| 0 | As U.S. budget fight looms, Republicans flip t... | WASHINGTON (Reuters) - The head of a conservat... | politicsNews | December 31, 2017 | 1 |
| 1 | U.S. military to accept transgender recruits o... | WASHINGTON (Reuters) - Transgender people will... | politicsNews | December 29, 2017 | 1 |
| 2 | Senior U.S. Republican senator: 'Let Mr. Muell... | WASHINGTON (Reuters) - The special counsel inv... | politicsNews | December 31, 2017 | 1 |
| 3 | FBI Russia probe helped by Australian diplomat... | WASHINGTON (Reuters) - Trump campaign adviser ... | politicsNews | December 30, 2017 | 1 |
| 4 | Trump wants Postal Service to charge 'much mor... | SEATTLE/WASHINGTON (Reuters) - President Donal... | politicsNews | December 29, 2017 | 1 |


In [16]:
fake.head()

|   | title | text | subject | date | label |
|---|---|---|---|---|---|
| 0 | Donald Trump Sends Out Embarrassing New Year’... | Donald Trump just couldn t wish all Americans ... | News | December 31, 2017 | 0 |
| 1 | Drunk Bragging Trump Staffer Started Russian ... | House Intelligence Committee Chairman Devin Nu... | News | December 31, 2017 | 0 |
| 2 | Sheriff David Clarke Becomes An Internet Joke... | On Friday, it was revealed that former Milwauk... | News | December 30, 2017 | 0 |
| 3 | Trump Is So Obsessed He Even Has Obama’s Name... | On Christmas day, Donald Trump announced that ... | News | December 29, 2017 | 0 |
| 4 | Pope Francis Just Called Out Donald Trump Dur... | Pope Francis used his annual Christmas Day mes... | News | December 25, 2017 | 0 |


In [17]:
data=pd.concat([fake,true],ignore_index=True)
data.head()

|   | title | text | subject | date | label |
|---|---|---|---|---|---|
| 0 | Donald Trump Sends Out Embarrassing New Year’... | Donald Trump just couldn t wish all Americans ... | News | December 31, 2017 | 0 |
| 1 | Drunk Bragging Trump Staffer Started Russian ... | House Intelligence Committee Chairman Devin Nu... | News | December 31, 2017 | 0 |
| 2 | Sheriff David Clarke Becomes An Internet Joke... | On Friday, it was revealed that former Milwauk... | News | December 30, 2017 | 0 |
| 3 | Trump Is So Obsessed He Even Has Obama’s Name... | On Christmas day, Donald Trump announced that ... | News | December 29, 2017 | 0 |
| 4 | Pope Francis Just Called Out Donald Trump Dur... | Pope Francis used his annual Christmas Day mes... | News | December 25, 2017 | 0 |


In [18]:
X=data["text"]
y=data["label"]
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.2,random_state=42)

In [19]:
vectorizer=CountVectorizer()
X_train_vectors=vectorizer.fit_transform(X_train)
X_test_vectors=vectorizer.transform(X_test)

In [20]:
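# Caveat: the vectorizer is refit here on the full corpus before the split, so the
# vocabulary also contains words that occur only in test articles. Reusing
# X_train_vectors / X_test_vectors from In [19] (fit on X_train only) avoids that leak.
vectorizer = CountVectorizer()
X_vectors = vectorizer.fit_transform(data['text'])
X_train, X_test, y_train, y_test = train_test_split(X_vectors, data['label'], test_size=0.2, random_state=42)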
classifier = MultinomialNB()
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

Accuracy: 0.9525612472160356
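One way to keep the vocabulary from ever seeing held-out text is to wrap the vectorizer and the classifier in a single scikit-learn pipeline and score it with cross-validation. The sketch below is not part of the original notebook: the eight-sentence corpus and the `toy_` names are invented for illustration, and on the real data `toy_texts` and `toy_labels` would be replaced by `data["text"]` and `data["label"]`.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus: 0 = fake, 1 = true
toy_texts = [
    "shocking secret cure hidden by doctors",
    "you will not believe this miracle trick",
    "celebrity reveals shocking secret truth",
    "miracle diet hidden from the public",
    "senate passes annual budget bill",
    "central bank raises interest rates",
    "government announces new tax rules",
    "parliament votes on budget bill",
]
toy_labels = [0, 0, 0, 0, 1, 1, 1, 1]

# The pipeline refits CountVectorizer inside every fold, on that fold's
# training portion only, before fitting MultinomialNB
pipeline = make_pipeline(CountVectorizer(), MultinomialNB())
scores = cross_val_score(pipeline, toy_texts, toy_labels, cv=2, scoring="accuracy")

print("Fold accuracies:", scores)
print("Mean accuracy:", scores.mean())
```

Because the vectorizer is part of the estimator, each fold builds its vocabulary from training documents alone, and the averaged score is less sensitive to one particular split than a single `train_test_split`.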


In [22]:
new_texts = ["Scientists discover a secret cure for cancer hidden by pharmaceutical companies.",
             "The government announces new tax regulations affecting small businesses."]
new_texts_vectors = vectorizer.transform(new_texts)
predictions = classifier.predict(new_texts_vectors)
for text, label in zip(new_texts, predictions):
    print(f"Text: {text}\nPrediction: {'Fake' if label == 0 else 'True'}\n")

Text: Scientists discover a secret cure for cancer hidden by pharmaceutical companies.
Prediction: Fake

Text: The government announces new tax regulations affecting small businesses.
Prediction: True



In [24]:
# Reload both files so this cell runs on its own
true_df = pd.read_csv('./True.csv')
fake_df = pd.read_csv('./Fake.csv')
fake_df['label'] = 0  # 0 = fake
true_df['label'] = 1  # 1 = true

# Combine the two classes and shuffle the rows
combined_df = pd.concat([fake_df, true_df], ignore_index=True)
combined_df = combined_df.sample(frac=1, random_state=42).reset_index(drop=True)

# Use the title and the body text together as one input document
X = combined_df['title'] + " " + combined_df['text']
y = combined_df['label']

# TF-IDF weights instead of raw counts. The model is fit on every row,
# so this cell has no held-out data to score it on.
vectorizer = TfidfVectorizer()
X_vectors = vectorizer.fit_transform(X)
classifier = MultinomialNB(alpha=1.0)
classifier.fit(X_vectors, y)

def predict_label(input_title, input_text=""):
    """Return 0 (fake) or 1 (true) for a title and optional body text."""
    input_data = input_title + " " + input_text
    input_vector = vectorizer.transform([input_data])
    return classifier.predict(input_vector)[0]

# This input is the opening of an article from True.csv (see true.head() above)
input_title = "WASHINGTON (Reuters) - The special counsel"
predicted_label = predict_label(input_title)
if predicted_label == 0:
    print("Predicted Label: Fake")
else:
    print("Predicted Label: True")

Predicted Label: True
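A hard label hides how sure the model is, which matters for very short inputs like the one above. As a sketch, `predict_proba` on a fitted MultinomialNB returns one probability per class. The tiny corpus and the helper name `predict_with_confidence` are assumptions made for this example, not part of the notebook.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy corpus: 0 = fake, 1 = true
toy_texts = [
    "shocking secret cure hidden by doctors",
    "you will not believe this miracle trick",
    "senate passes annual budget bill",
    "central bank raises interest rates",
]
toy_labels = [0, 0, 1, 1]

toy_vectorizer = TfidfVectorizer()
toy_classifier = MultinomialNB(alpha=1.0)
toy_classifier.fit(toy_vectorizer.fit_transform(toy_texts), toy_labels)


def predict_with_confidence(text):
    """Return a {label: probability} dict for one input string."""
    vector = toy_vectorizer.transform([text])
    probabilities = toy_classifier.predict_proba(vector)[0]
    return dict(zip(toy_classifier.classes_.tolist(), probabilities.tolist()))


known = predict_with_confidence("senate budget vote")
unknown = predict_with_confidence("zzz qqq")  # no word is in the vocabulary

print(known)
print(unknown)
```

When none of the input words are in the vocabulary, the feature vector is all zeros and the model falls back to the class priors, which are equal in this balanced toy set. A probability near 0.5 is therefore a useful signal that the prediction rests on little or no evidence.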


### Conclusion
The combination of MultinomialNB and CountVectorizer provides a solid baseline for fake news detection. CountVectorizer turns each article into word counts, and MultinomialNB learns which words are more typical of genuine or fake articles. On a held-out 20% split of this dataset the classifier reached an accuracy of about 95.3%. The final cell swaps in TfidfVectorizer and adds the title to the input, but that model is trained on all rows and is not scored on held-out data.

However, this approach has limitations. Fake news detection is a complex problem, and a bag-of-words model ignores word order and context, so MultinomialNB with CountVectorizer alone may not capture the nuances and evolving techniques used by malicious actors. The model needs to be updated and refined over time by retraining on new data, trying more advanced algorithms, and exploring additional features.

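Furthermore, the reliability of any fake news detection system depends on the quality and diversity of the training data. A comprehensive and balanced dataset, with representative examples of both genuine and fake news, is essential for building an accurate and robust model. In this dataset, for example, the five genuine articles shown by true.head() all open with a "(Reuters)" dateline, so a model may pick up on source-specific cues rather than on the truthfulness of the content.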
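As a quick check on balance, the row counts reported earlier by `true.shape` and `fake.shape` give the class shares and the accuracy a trivial classifier would get by always predicting the larger class. This is a sketch computed over the full dataset, so the figure for the 20% test split may differ slightly; the variable names are chosen for illustration.

```python
n_true = 21417  # rows reported by true.shape
n_fake = 23481  # rows reported by fake.shape

total = n_true + n_fake
fake_share = n_fake / total

# Accuracy of always predicting the majority class
majority_baseline = max(n_true, n_fake) / total

print(f"Total articles: {total}")
print(f"Fake share: {fake_share:.3f}")
print(f"Majority-class baseline accuracy: {majority_baseline:.3f}")
```

The two classes are close to balanced, with a majority-class baseline of roughly 52%, so the reported accuracy of about 95% is well above what class imbalance alone could explain.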