# Step-1: Business Problem

**Detecting Fake News to Combat Misinformation**

Problem Statement:

In the digital age, misinformation spreads rapidly across social media, news websites, and messaging platforms, influencing public opinion, financial markets, and even political decisions. The goal is to automate fake news detection so that individuals, media organizations, and fact-checking agencies can distinguish reliable content from misleading content. Here the task is treated as binary text classification: each article's text is labeled either Fake or Real.



In [2]:
import pandas as pd
import numpy as np

import warnings
warnings.simplefilter('ignore')

In [3]:
df = pd.read_csv(r"C:\Users\divya\Downloads\fake_and_real_news.csv\fake_and_real_news.csv")
df

Unnamed: 0,Text,label
0,Top Trump Surrogate BRUTALLY Stabs Him In The...,Fake
1,U.S. conservative leader optimistic of common ...,Real
2,"Trump proposes U.S. tax overhaul, stirs concer...",Real
3,Court Forces Ohio To Allow Millions Of Illega...,Fake
4,Democrats say Trump agrees to work on immigrat...,Real
...,...,...
9895,Wikileaks Admits To Screwing Up IMMENSELY Wit...,Fake
9896,Trump consults Republican senators on Fed chie...,Real
9897,Trump lawyers say judge lacks jurisdiction for...,Real
9898,WATCH: Right-Wing Pastor Falsely Credits Trum...,Fake
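
Before cleaning, it can help to check how the two classes are distributed and whether any texts are missing, since accuracy is only a fair summary when the classes are roughly balanced. A minimal sketch on a hypothetical five-row frame (`toy_df` is invented for illustration; on the real data the same calls would run on `df`):

```python
import pandas as pd

# Hypothetical stand-in for df, using the same column names as the dataset.
toy_df = pd.DataFrame({
    "Text": ["a", "b", "c", "d", "e"],
    "label": ["Fake", "Real", "Real", "Fake", "Real"],
})

label_counts = toy_df["label"].value_counts()     # rows per class
missing_text = int(toy_df["Text"].isna().sum())   # rows with no article text

print(label_counts)
print("Missing texts:", missing_text)
```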


**Text Cleaning**

In [5]:
import nltk
import re
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
ps = PorterStemmer()

In [6]:
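# Build the stopword set once; rebuilding it for every word is very slow.
# Requires a one-time nltk.download('stopwords').
stop_words = set(stopwords.words('english'))

corpus = []
for text in df['Text']:
    rp = re.sub('[^a-zA-Z]', " ", text)   # keep letters only
    rp = rp.lower().split()               # lowercase, then tokenize on whitespace
    rp = [ps.stem(word) for word in rp if word not in stop_words]
    corpus.append(" ".join(rp))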
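
The cleaning steps can be traced on a single headline. The sketch below is a simplified illustration, not the notebook's exact pipeline: it uses a tiny hypothetical stopword set (`TOY_STOPWORDS`) instead of NLTK's English list and leaves out stemming, so it runs without any NLTK downloads.

```python
import re

# Tiny hypothetical stopword list, for illustration only;
# the notebook uses NLTK's full English list instead.
TOY_STOPWORDS = {"him", "in", "the", "a", "an", "of", "to", "is"}

def clean_text(text):
    """Keep letters only, lowercase, split into words, drop stopwords."""
    letters_only = re.sub(r"[^a-zA-Z]", " ", text)
    words = letters_only.lower().split()
    kept = [w for w in words if w not in TOY_STOPWORDS]
    return " ".join(kept)

cleaned = clean_text("Top Trump Surrogate BRUTALLY Stabs Him In The Back!")
print(cleaned)
```

In the notebook, each surviving word is additionally passed through `ps.stem` before the words are joined back together.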
                

**Vectorization**

In [8]:
from sklearn.feature_extraction.text import CountVectorizer
cv = CountVectorizer()
X = cv.fit_transform(corpus).toarray()
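
To see what `CountVectorizer` produces, here is a small sketch on two hypothetical cleaned texts (`toy_corpus` is made up for illustration). Each row is a document, each column a vocabulary word, and each cell a word count.

```python
from sklearn.feature_extraction.text import CountVectorizer

toy_corpus = ["trump tax plan", "tax plan fake"]  # hypothetical cleaned texts

toy_cv = CountVectorizer()
toy_X = toy_cv.fit_transform(toy_corpus)  # SciPy sparse matrix

# Recover the column order from the learned word -> column index mapping.
vocab = sorted(toy_cv.vocabulary_, key=toy_cv.vocabulary_.get)
counts = toy_X.toarray().tolist()
print(vocab)
print(counts)
```

`MultinomialNB` also accepts the sparse matrix directly, so on a large corpus the `.toarray()` call can be skipped to save memory.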

In [9]:
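# One 0/1 column remains after drop_first (1 = Real, 0 = Fake); flatten it to the 1-D array scikit-learn expects.
y = pd.get_dummies(df['label'], drop_first=True).values.ravel()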
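
It is easy to lose track of which class ends up encoded as 1. `pd.get_dummies` orders the categories alphabetically and `drop_first=True` drops the first one, so with the labels `Fake` and `Real` only a `Real` indicator column remains. A minimal sketch with hypothetical labels:

```python
import pandas as pd

toy_labels = pd.Series(["Fake", "Real", "Real", "Fake"], name="label")

encoded = pd.get_dummies(toy_labels, drop_first=True)
remaining_columns = list(encoded.columns)
as_ints = encoded["Real"].astype(int).tolist()  # dtype is bool or uint8 depending on pandas version
print(remaining_columns, as_ints)
```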


**Train-Test Split**

In [11]:
from sklearn.model_selection import train_test_split
# Hold out 20% of the rows for testing; an explicit integer seed makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
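
The split here is purely random. If the classes were imbalanced, one option (not used in this notebook) would be the `stratify` argument, which keeps the class proportions the same in both parts. A sketch on hypothetical data with 30% positives:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced data: 100 samples, 30 of them positive.
toy_X = np.arange(100).reshape(-1, 1)
toy_y = np.array([1] * 30 + [0] * 70)

X_tr, X_te, y_tr, y_te = train_test_split(
    toy_X, toy_y, test_size=0.2, random_state=1, stratify=toy_y
)

# The 20-row test set keeps the 30% positive rate.
print(len(y_te), int(y_te.sum()))
```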

**Modelling**

**Naive Bayes Classifier with default parameters**

In [13]:
from sklearn.naive_bayes import MultinomialNB
model = MultinomialNB()
model.fit(X_train,y_train)
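
Note that the vectorizer in this notebook is fitted on the full corpus before the split, so its vocabulary includes words from the test rows. A common alternative, shown here only as a sketch and not as the notebook's method, is to wrap the vectorizer and classifier in a scikit-learn `Pipeline` so that both are fitted on training texts alone. The texts and labels below are hypothetical.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical cleaned training texts and labels.
train_texts = [
    "shocking secret exposed",
    "shocking hoax exposed",
    "senate passes budget bill",
    "senate votes on tax bill",
]
train_labels = ["Fake", "Fake", "Real", "Real"]

pipe = Pipeline([
    ("vectorizer", CountVectorizer()),  # vocabulary learned from training texts only
    ("classifier", MultinomialNB()),    # default parameters, as in the notebook
])
pipe.fit(train_texts, train_labels)

predictions = list(pipe.predict(["shocking hoax", "senate bill"]))
print(predictions)
```

With a pipeline, new text only needs the same cleaning step before `predict`; the vectorization happens inside.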

**Predictions**

In [15]:
ypred_test = model.predict(X_test)
ypred_train = model.predict(X_train)

**Evaluation**

In [17]:
from sklearn.metrics import accuracy_score
print("Train Accuracy:",accuracy_score(y_train,ypred_train))
print("Test Accuracy:",accuracy_score(y_test,ypred_test))

Train Accuracy: 0.979040404040404
Test Accuracy: 0.9702020202020202
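
Accuracy alone does not show which kind of error the model makes (real news flagged as fake, or fake news passed as real). A confusion matrix with precision and recall gives that breakdown. The sketch below uses hypothetical labels, not the notebook's predictions; on the real data the same functions would take `y_test` and `ypred_test`.

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hypothetical labels: 1 = Real, 0 = Fake.
toy_true = [0, 0, 1, 1, 1, 0]
toy_pred = [0, 1, 1, 1, 0, 0]

cm = confusion_matrix(toy_true, toy_pred)        # rows: true class, columns: predicted class
precision = precision_score(toy_true, toy_pred)  # of predicted Real, how many are Real
recall = recall_score(toy_true, toy_pred)        # of actual Real, how many were found
print(cm)
print(precision, recall)
```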


In [18]:
import joblib

# Save the trained classifier to disk so it can be reloaded without retraining
joblib.dump(model, 'text_classification_model.joblib')


['text_classification_model.joblib']
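
The saved classifier cannot score raw text by itself: new articles must be transformed by the same fitted `CountVectorizer`. One way to handle this, sketched below under the assumption that both objects are stored in a single file, is to dump them together and reload them as a pair. The tiny training set, the dictionary keys, and the file name `toy_artifacts.joblib` are all hypothetical.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical tiny training set standing in for the cleaned corpus.
texts = ["shocking hoax exposed", "senate passes budget bill"]
labels = ["Fake", "Real"]

toy_cv = CountVectorizer()
toy_model = MultinomialNB().fit(toy_cv.fit_transform(texts), labels)

# Save vectorizer and classifier together; the file name is illustrative.
path = os.path.join(tempfile.mkdtemp(), "toy_artifacts.joblib")
joblib.dump({"vectorizer": toy_cv, "model": toy_model}, path)

# Later: reload both and score new (already cleaned) text.
artifacts = joblib.load(path)
features = artifacts["vectorizer"].transform(["shocking hoax"])
reloaded_prediction = artifacts["model"].predict(features)[0]
print(reloaded_prediction)
```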