<a href="https://colab.research.google.com/github/Mercymerine/Naive_bayes/blob/main/Fake_news_naive_bayes.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Naive Bayes

## Introduction
Naive Bayes is a probabilistic machine learning model used for classification problems. It is easy to build and scales well to large datasets. The classifier is built on Bayes’ theorem together with an assumption of independence among predictors: given the class, the value of one feature tells you nothing about the value of another.

It works by calculating the probability that an item belongs to each class, based on the item’s features, and choosing the most probable class.

Naive Bayes is a simple but powerful method for assigning items to categories. Imagine sorting emails into spam or inbox. Naive Bayes treats each word as a clue and estimates how likely the email is to be spam based on past emails. It assumes that, within a class, the presence of one word is unrelated to the presence of any other (not always true!), but it is fast and works well, which makes it a popular choice for many tasks.

### Why is it called Naive?
It is called Naive because it assumes that the features are independent of one another when they may not be. In real-world data, features are rarely truly independent.

### Conditional Probability for Naive Bayes
Conditional probability is defined as the **likelihood of an event or outcome occurring, based on the occurrence of a previous event or outcome.** It is calculated by dividing the probability that both events occur by the probability of the conditioning event: P(A | B) = P(A and B) / P(B). Equivalently, the joint probability P(A and B) is the probability of the preceding event multiplied by the conditional probability of the succeeding event.
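As a small worked example of this definition, the sketch below computes the probability that an email is spam given that it contains the word "free". The counts are made up for illustration and are not taken from the fake-news dataset.

```python
# Hypothetical counts from 100 emails (illustrative only).
total_emails = 100
emails_with_free = 25   # emails that contain the word "free"
spam_with_free = 20     # emails that are spam AND contain "free"

# P(B): probability that an email contains "free"
p_free = emails_with_free / total_emails
# P(A and B): probability that an email is spam and contains "free"
p_spam_and_free = spam_with_free / total_emails

# P(A | B) = P(A and B) / P(B)
p_spam_given_free = p_spam_and_free / p_free
print(f"P(spam | 'free') = {p_spam_given_free:.2f}")
```

With these counts, 20 of the 25 emails containing "free" are spam, so the conditional probability is 0.80.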

### Bayes’ Rule
Bayes’ rule provides a means for calculating the probability of an event given some information.
When the features are independent given the class, we can extend Bayes’ rule to what is called Naive Bayes. It assumes that changing the value of one feature doesn’t influence the values of the other features, which is why the algorithm is called “naive”.
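Written out, the rule and its naive extension look roughly as follows. The notation is an assumption introduced here for illustration: $y$ stands for the class and $x_1, \dots, x_n$ for the feature values.

$$
P(y \mid x_1, \dots, x_n) = \frac{P(y)\, P(x_1, \dots, x_n \mid y)}{P(x_1, \dots, x_n)}
$$

Under the independence assumption, the likelihood factorises into one term per feature:

$$
P(x_1, \dots, x_n \mid y) = \prod_{i=1}^{n} P(x_i \mid y)
$$

The denominator is the same for every class, so the predicted class is the one that maximises the numerator:

$$
\hat{y} = \arg\max_{y} \; P(y) \prod_{i=1}^{n} P(x_i \mid y)
$$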

### Types of Naive Bayes Classifiers

**1. Gaussian Naive Bayes:** `GaussianNB` is used in classification tasks and assumes that the feature values within each class follow a Gaussian (normal) distribution.

**2. Multinomial Naive Bayes:** It is used for discrete counts, for example in text classification. It goes one step further than the Bernoulli model: instead of “word occurs in the document”, the feature is “how often the word occurs in the document”. You can think of it as “the number of times outcome x_i is observed over n trials”.

**3. Bernoulli Naive Bayes:** The Bernoulli model is useful if your feature vectors are boolean (i.e. zeros and ones). One application is text classification with a ‘bag of words’ model where the 1s and 0s mean “word occurs in the document” and “word does not occur in the document” respectively.

**4. Complement Naive Bayes:** It is an adaptation of Multinomial NB where the complement of each class is used to calculate the model weights. This makes it suitable for imbalanced datasets, and it often outperforms Multinomial NB on text classification tasks.

**5. Categorical Naive Bayes:** Categorical Naive Bayes is useful if the features are categorically distributed. Before using this algorithm, the categorical variables have to be encoded as numbers, for example with scikit-learn’s `OrdinalEncoder`.
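To make the list above more concrete, here is a minimal sketch that fits each scikit-learn variant on the kind of feature it expects. The toy arrays and the `*_toy` names are invented for illustration; they only show which input shape suits which class and say nothing about real-world accuracy.

```python
import numpy as np
from sklearn.naive_bayes import (
    BernoulliNB,
    CategoricalNB,
    ComplementNB,
    GaussianNB,
    MultinomialNB,
)

# Toy data, made up for illustration: four samples, two classes.
y_toy = np.array([0, 0, 1, 1])

X_continuous = np.array([[1.0, 2.1], [0.9, 1.9], [3.2, 4.0], [3.0, 4.2]])
X_counts = np.array([[3, 0, 1], [2, 0, 0], [0, 4, 1], [0, 3, 2]])
X_binary = (X_counts > 0).astype(int)               # word present / absent
X_categories = np.array([[0, 1], [0, 0], [1, 2], [1, 2]])  # ordinal-encoded

# Pair each variant with the feature type it is designed for.
variants = {
    "GaussianNB": (GaussianNB(), X_continuous),
    "MultinomialNB": (MultinomialNB(), X_counts),
    "BernoulliNB": (BernoulliNB(), X_binary),
    "ComplementNB": (ComplementNB(), X_counts),
    "CategoricalNB": (CategoricalNB(), X_categories),
}

predictions = {}
for name, (variant_model, X_toy) in variants.items():
    variant_model.fit(X_toy, y_toy)
    predictions[name] = variant_model.predict(X_toy)
    print(f"{name:>13}: {predictions[name]}")
```

The fake-news example later in this notebook uses word counts, which is why `MultinomialNB` is the variant chosen there.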

### Pros and Cons of Naive Bayes

**Pros:**

- Requires only a small amount of training data, so training takes little time.
- Handles continuous and discrete data, and is not sensitive to irrelevant features.
- Very simple, fast, and easy to implement.
- Can be used for both binary and multi-class classification problems.
- Highly scalable, as it scales linearly with the number of predictor features and data points.
- When the Naive Bayes conditional independence assumption holds, it converges more quickly than discriminative models such as logistic regression.

**Cons:**

- The assumption of independent predictors/features. Naive Bayes implicitly assumes that all the attributes are mutually independent, which is almost never the case in real-world data.
- If a categorical variable has a value that appears in the test dataset but was not observed in the training dataset, the model assigns it a zero probability and cannot make a prediction. This is known as the “zero frequency problem” and can be solved using smoothing techniques such as Laplace smoothing.
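The zero frequency problem and its fix can be sketched in a few lines. The word counts and the `word_probability` helper below are hypothetical and exist only for this illustration. In scikit-learn, the `alpha` parameter of `MultinomialNB` plays the same role (it defaults to 1.0).

```python
from collections import Counter

# Hypothetical word counts for one class (illustrative only).
word_counts = Counter({"trump": 3, "tax": 2, "court": 1})
# "senate" is in the vocabulary but was never seen in this class.
vocabulary = ["trump", "tax", "court", "senate"]


def word_probability(word, alpha):
    """Estimate P(word | class) with additive smoothing (alpha=0 means none)."""
    total = sum(word_counts.values())
    return (word_counts[word] + alpha) / (total + alpha * len(vocabulary))


unsmoothed = word_probability("senate", alpha=0)  # 0 / 6
smoothed = word_probability("senate", alpha=1)    # (0 + 1) / (6 + 4)
print("Without smoothing:", unsmoothed)
print("With Laplace smoothing:", smoothed)
```

Without smoothing, the unseen word gets probability 0, which would wipe out the whole product of per-word probabilities. With `alpha=1` it gets a small non-zero probability of 0.1 instead.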
### Applications of the Naive Bayes Algorithm

- Real-time prediction.
- Multi-class prediction.
- Text classification, spam filtering, and sentiment analysis.
- Recommendation systems.




## Getting the necessary packages

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

## Getting the datasets

In [None]:
# Install the Kaggle CLI, which is used to download the dataset
!pip install -q kaggle

In [None]:
#Getting the dataset
!kaggle datasets download -d vishakhdapat/fake-news-detection

Dataset URL: https://www.kaggle.com/datasets/vishakhdapat/fake-news-detection
License(s): MIT
Downloading fake-news-detection.zip to /content
 53% 5.00M/9.37M [00:00<00:00, 51.5MB/s]
100% 9.37M/9.37M [00:00<00:00, 78.5MB/s]


In [None]:
import zipfile

# Path to the archive downloaded by the Kaggle CLI
zipfile_path = '/content/fake-news-detection.zip'

# Extract every file in the archive into /content
with zipfile.ZipFile(zipfile_path, 'r') as archive:
  archive.extractall('/content')
print('Done')

Done


In [None]:
news = pd.read_csv('/content/fake_and_real_news.csv')
news

| | Text | label |
|---|---|---|
| 0 | Top Trump Surrogate BRUTALLY Stabs Him In The... | Fake |
| 1 | U.S. conservative leader optimistic of common ... | Real |
| 2 | Trump proposes U.S. tax overhaul, stirs concer... | Real |
| 3 | Court Forces Ohio To Allow Millions Of Illega... | Fake |
| 4 | Democrats say Trump agrees to work on immigrat... | Real |
| ... | ... | ... |
| 9895 | Wikileaks Admits To Screwing Up IMMENSELY Wit... | Fake |
| 9896 | Trump consults Republican senators on Fed chie... | Real |
| 9897 | Trump lawyers say judge lacks jurisdiction for... | Real |
| 9898 | WATCH: Right-Wing Pastor Falsely Credits Trum... | Fake |


In [None]:
news.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9900 entries, 0 to 9899
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Text    9900 non-null   object
 1   label   9900 non-null   object
dtypes: object(2)
memory usage: 154.8+ KB


## Encoding, Training, and Evaluation

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, classification_report

In [None]:
# Convert labels to binary: Fake=0, Real=1
news['label'] = news['label'].map({'Fake':0, 'Real':1})

In [None]:
#Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(news['Text'], news['label'], test_size=0.2, random_state=42)


In [None]:
# Convert text to word-count features with CountVectorizer; the vocabulary is learned from the training set only
vectorizer = CountVectorizer()
X_train_vectors = vectorizer.fit_transform(X_train)
X_test_vectors = vectorizer.transform(X_test)

In [None]:
# Train the Multinomial Naive Bayes classifier
model = MultinomialNB()
model.fit(X_train_vectors, y_train)

In [None]:
# Make predictions on the test set
y_pred = model.predict(X_test_vectors)
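Classifying a brand-new headline needs the same two steps, vectorize and then predict. One possible way to keep them together is scikit-learn's `make_pipeline`, sketched below on a tiny made-up training set. The headlines, labels and `toy_` names are assumptions for illustration and are not part of the notebook's dataset.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Made-up headlines (0 = Fake, 1 = Real), for illustration only.
toy_texts = [
    "SHOCKING video destroys senator",
    "WATCH pastor makes shocking claim",
    "senate passes tax bill",
    "court rules on immigration order",
]
toy_labels = [0, 0, 1, 1]

# The pipeline vectorizes raw text and then applies the classifier.
toy_pipeline = make_pipeline(CountVectorizer(), MultinomialNB())
toy_pipeline.fit(toy_texts, toy_labels)

new_headline = ["senate votes on tax bill"]
predicted_label = toy_pipeline.predict(new_headline)
class_probabilities = toy_pipeline.predict_proba(new_headline)

print("Predicted label:", predicted_label[0])
print("P(Fake), P(Real):", class_probabilities[0])
```

Here the new headline shares several words with the "Real" examples and none with the "Fake" ones, so the pipeline predicts label 1. `predict_proba` returns one probability per class, in the order of the class labels.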

In [None]:

# Evaluate the model
print("Accuracy:", accuracy_score(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred))

Accuracy: 0.9777777777777777

Classification Report:
               precision    recall  f1-score   support

           0       0.98      0.98      0.98       973
           1       0.98      0.98      0.98      1007

    accuracy                           0.98      1980
   macro avg       0.98      0.98      0.98      1980
weighted avg       0.98      0.98      0.98      1980
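A confusion matrix is a common companion to the classification report, because it shows the raw counts behind precision and recall. The sketch below uses short made-up label lists so that it runs on its own. In this notebook the analogous call would be `confusion_matrix(y_test, y_pred)`.

```python
from sklearn.metrics import confusion_matrix

# Made-up labels (0 = Fake, 1 = Real), for illustration only.
toy_true = [0, 0, 0, 1, 1, 1, 1, 0]
toy_pred = [0, 0, 1, 1, 1, 0, 1, 0]

# Rows are true classes, columns are predicted classes.
toy_cm = confusion_matrix(toy_true, toy_pred)
# Treating 1 (Real) as the positive class:
tn, fp, fn, tp = toy_cm.ravel()

print(toy_cm)
print(f"TN={tn}, FP={fp}, FN={fn}, TP={tp}")
```

In this toy example one Fake article is predicted as Real (a false positive) and one Real article is predicted as Fake (a false negative).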

