# Naive Bayes Classifier

This notebook tackles four distinct classification problems using different Naive Bayes models: Bernoulli, Gaussian, Multinomial, and Complement Naive Bayes. We use the scikit-learn implementations documented at https://scikit-learn.org/stable/modules/naive_bayes.html. The library relies on frequentist maximum-likelihood estimators (MLE), but it also adds smoothing and binning techniques so that each model can handle a wider range of data. This makes it possible to treat discrete data as continuous and continuous data as discrete. The details can be found in the linked documentation.
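To make the role of smoothing concrete, here is a minimal hand-written sketch of additive (Laplace) smoothing for word probabilities within one class. The function name, the toy vocabulary, and the toy tokens are made up for illustration, and the sketch assumes every token belongs to the vocabulary. In scikit-learn the corresponding knob is the `alpha` parameter of `MultinomialNB`, `BernoulliNB`, and `ComplementNB`.

```python
from collections import Counter


def smoothed_word_probs(tokens, vocabulary, alpha=1.0):
    """Estimate P(word | class) with additive smoothing.

    alpha=0.0 gives the plain maximum-likelihood estimate;
    alpha=1.0 is classic Laplace smoothing.
    """
    counts = Counter(tokens)
    denominator = len(tokens) + alpha * len(vocabulary)
    return {word: (counts[word] + alpha) / denominator for word in vocabulary}


# Hypothetical toy data: tokens observed in the "spam" class.
vocabulary = ["free", "win", "home"]
spam_tokens = ["free", "win", "free"]

mle = smoothed_word_probs(spam_tokens, vocabulary, alpha=0.0)
smoothed = smoothed_word_probs(spam_tokens, vocabulary, alpha=1.0)

print(mle["home"])       # unseen word gets probability 0.0 under pure MLE
print(smoothed["home"])  # (0 + 1) / (3 + 3), no longer zero
```

Without smoothing, a single unseen word would force the whole class likelihood to zero, which is why the smoothed estimate is usually preferred for text.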

## Naive Bayes Using scikit-learn:

In [1]:
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB
from sklearn.naive_bayes import ComplementNB
from sklearn.naive_bayes import MultinomialNB
from sklearn.naive_bayes import GaussianNB

## Checking mail spam:

Given an e-mail message, determine whether it is spam or ham:

In [15]:
data = pd.read_csv("data/spam.csv", encoding= 'latin-1')
data = data[["class", "message"]]
data

Unnamed: 0,class,message
0,ham,"Go until jurong point, crazy.. Available only ..."
1,ham,Ok lar... Joking wif u oni...
2,spam,Free entry in 2 a wkly comp to win FA Cup fina...
3,ham,U dun say so early hor... U c already then say...
4,ham,"Nah I don't think he goes to usf, he lives aro..."
...,...,...
5567,spam,This is the 2nd time we have tried 2 contact u...
5568,ham,Will Ì_ b going to esplanade fr home?
5569,ham,"Pity, * was in mood for that. So...any other s..."
5570,ham,The guy did some bitching but I acted like i'd...


### Dataset Preparation

In [4]:
x = np.array(data["message"])
y = np.array(data["class"])

cv = CountVectorizer()
x = cv.fit_transform(x)
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.2, random_state=42, shuffle = True)
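The counting step performed by `CountVectorizer` can be sketched by hand as follows. This is a simplified illustration, not the library's implementation: it only mimics the default behaviour of lowercasing and keeping tokens of two or more word characters. The helper name `bag_of_words` and the second toy message are made up; the first message comes from the dataset preview above.

```python
import re
from collections import Counter

# Tokens of two or more word characters, similar to CountVectorizer's default.
TOKEN_RE = re.compile(r"\b\w\w+\b")


def bag_of_words(docs):
    """Return (sorted vocabulary, one row of word counts per document)."""
    tokenized = [TOKEN_RE.findall(doc.lower()) for doc in docs]
    vocabulary = sorted({token for doc in tokenized for token in doc})
    rows = []
    for doc in tokenized:
        counts = Counter(doc)
        rows.append([counts[token] for token in vocabulary])
    return vocabulary, rows


docs = [
    "Ok lar... Joking wif u oni...",  # from the dataset preview
    "Free entry, free prize",         # hypothetical toy message
]
vocabulary, rows = bag_of_words(docs)

print(vocabulary)
for row in rows:
    print(row)
```

Each message becomes a row of word counts over a shared vocabulary. Single-character tokens such as "u" are dropped, and most entries are zero, which is why the real output is stored as a sparse matrix.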


### Using Bernoulli Naive Bayes:

In [72]:
model = BernoulliNB(binarize=0.0)
model.fit(xtrain, ytrain)
print(model.score(xtest, ytest))
print(model.score(xtrain,ytrain))

0.97847533632287
0.9876598608929773


### Using Gaussian Naive Bayes:

In [73]:
model = GaussianNB()
model.fit(xtrain.toarray(), ytrain)
print(model.score(xtest.toarray(), ytest))
print(model.score(xtrain.toarray(),ytrain))

0.9004484304932735
0.9497419789095805


### Using Multinomial Naive Bayes:

In [74]:
model = MultinomialNB()
model.fit(xtrain, ytrain)
print(model.score(xtest, ytest))
print(model.score(xtrain,ytrain))

0.97847533632287
0.9943908458604442


### Using Complement Naive Bayes:

In [75]:
model = ComplementNB()
model.fit(xtrain, ytrain)
print(model.score(xtest, ytest))
print(model.score(xtrain,ytrain))

0.9641255605381166
0.9856405654027373


## IMDB Sentiment Analysis:

Given a film review from IMDB, determine whether the review is positive or negative:

In [76]:
df = pd.read_csv('data/IMDB_Dataset.csv')
#https://www.kaggle.com/datasets/lakshmi25npathi/imdb-dataset-of-50k-movie-reviews

In [77]:
df

Unnamed: 0,review,sentiment
0,One of the other reviewers has mentioned that ...,positive
1,A wonderful little production. <br /><br />The...,positive
2,I thought this was a wonderful way to spend ti...,positive
3,Basically there's a family where a little boy ...,negative
4,"Petter Mattei's ""Love in the Time of Money"" is...",positive
...,...,...
49995,I thought this movie did a down right good job...,positive
49996,"Bad plot, bad dialogue, bad acting, idiotic di...",negative
49997,I am a Catholic taught in parochial elementary...,negative
49998,I'm going to have to disagree with the previou...,negative


### Dataset Preparation:

In [78]:
x = np.array(df["review"])
y = np.array(df["sentiment"])
cv = CountVectorizer()
x = cv.fit_transform(x)
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.2, random_state=42, shuffle = True)

### Using Bernoulli Naive Bayes:

In [79]:
model = BernoulliNB(binarize=0.0)
model.fit(xtrain, ytrain)
print(model.score(xtest, ytest))
print(model.score(xtrain,ytrain))

0.8532
0.89815


### Using Gaussian Naive Bayes:

Gaussian Naive Bayes is not run on this dataset. scikit-learn's `GaussianNB` does not accept sparse matrices, and the dataset is far too large to convert to a dense (non-sparse) matrix for the calculation.
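A rough estimate illustrates why the dense conversion is impractical. The vocabulary size below is an assumed placeholder rather than a measured value (the real figure is `x.shape[1]`), and the 8 bytes per value assume 64-bit counts.

```python
def dense_size_gib(n_rows: int, n_cols: int, bytes_per_value: int = 8) -> float:
    """Memory needed to hold an n_rows x n_cols dense matrix, in GiB."""
    return n_rows * n_cols * bytes_per_value / 2**30


n_train_rows = 40_000         # 80% of the 50,000 reviews
assumed_vocab_size = 100_000  # hypothetical; replace with x.shape[1]

size = dense_size_gib(n_train_rows, assumed_vocab_size)
print(f"{size:.1f} GiB")
```

Under these assumptions the training matrix alone would need roughly 30 GiB, before any intermediate copies made during fitting.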

### Using Multinomial Naive Bayes:

In [80]:
model = MultinomialNB()
model.fit(xtrain, ytrain)
print(model.score(xtest, ytest))
print(model.score(xtrain,ytrain))

0.8487
0.891


### Using Complement Naive Bayes:

In [81]:
model = ComplementNB()
model.fit(xtrain, ytrain)
print(model.score(xtest, ytest))
print(model.score(xtrain,ytrain))

0.8487
0.89105


## Type of Raisin Detection:

In [82]:
df = pd.read_csv('data/Raisin_Dataset.csv')

In [83]:
df

Unnamed: 0,Area,MajorAxisLength,MinorAxisLength,Eccentricity,ConvexArea,Extent,Perimeter,Class
0,87524,442.246011,253.291155,0.819738,90546,0.758651,1184.040,Kecimen
1,75166,406.690687,243.032436,0.801805,78789,0.684130,1121.786,Kecimen
2,90856,442.267048,266.328318,0.798354,93717,0.637613,1208.575,Kecimen
3,45928,286.540559,208.760042,0.684989,47336,0.699599,844.162,Kecimen
4,79408,352.190770,290.827533,0.564011,81463,0.792772,1073.251,Kecimen
...,...,...,...,...,...,...,...,...
895,83248,430.077308,247.838695,0.817263,85839,0.668793,1129.072,Besni
896,87350,440.735698,259.293149,0.808629,90899,0.636476,1214.252,Besni
897,99657,431.706981,298.837323,0.721684,106264,0.741099,1292.828,Besni
898,93523,476.344094,254.176054,0.845739,97653,0.658798,1258.548,Besni


### Dataset Preparation:

In [84]:
x = np.array(df.drop("Class", axis=1))
y = np.array(df["Class"])
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.2, random_state=42, shuffle = True)


### Using Bernoulli Naive Bayes:

In [85]:
model = BernoulliNB(binarize=0.0)
model.fit(xtrain, ytrain)
print(model.score(xtest, ytest))
print(model.score(xtrain,ytrain))

0.4777777777777778
0.5055555555555555
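The near-chance accuracy is plausibly a consequence of `binarize=0.0`: every feature value above the threshold is mapped to 1, and the raisin measurements shown in the preview are all positive. The sketch below applies the same thresholding by hand to the first three rows of the preview; the helper name `binarize_row` is made up for illustration.

```python
# First three feature rows from the dataset preview (Class column removed).
rows = [
    [87524, 442.246011, 253.291155, 0.819738, 90546, 0.758651, 1184.040],
    [75166, 406.690687, 243.032436, 0.801805, 78789, 0.684130, 1121.786],
    [90856, 442.267048, 266.328318, 0.798354, 93717, 0.637613, 1208.575],
]


def binarize_row(row, threshold=0.0):
    """Map each value to 1 if it exceeds the threshold, else 0."""
    return tuple(1 if value > threshold else 0 for value in row)


distinct_rows = {binarize_row(row) for row in rows}
print(distinct_rows)  # a single all-ones row
```

Once every sample collapses to the same all-ones vector, the model has nothing to separate the classes with and can only fall back on the class priors. The same effect would be expected for any dataset whose features are all positive.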


### Using Multinomial Naive Bayes:

In [86]:
model = MultinomialNB()
model.fit(xtrain, ytrain)
print(f"The Testing accuracy is {model.score(xtest, ytest)}")
print(f"The Training accuracy is {model.score(xtrain,ytrain)}")

The Testing accuracy is 0.8333333333333334
The Training accuracy is 0.7805555555555556


### Using Gaussian Naive Bayes:

In [87]:
model = GaussianNB()
model.fit(xtrain, ytrain)
print(f"The Testing accuracy is {model.score(xtest, ytest)}")
print(f"The Training accuracy is {model.score(xtrain,ytrain)}")

The Testing accuracy is 0.8277777777777777
The Training accuracy is 0.8222222222222222


### Using Complement Naive Bayes:

In [88]:
model = ComplementNB()
model.fit(xtrain, ytrain)
print(f"The Testing accuracy is {model.score(xtest, ytest)}")
print(f"The Training accuracy is {model.score(xtrain,ytrain)}")

The Testing accuracy is 0.8333333333333334
The Training accuracy is 0.7805555555555556


## Iris Dataset

In [89]:
df = pd.read_csv('data/Iris.csv')

In [90]:
df

Unnamed: 0,Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
0,1,5.1,3.5,1.4,0.2,Iris-setosa
1,2,4.9,3.0,1.4,0.2,Iris-setosa
2,3,4.7,3.2,1.3,0.2,Iris-setosa
3,4,4.6,3.1,1.5,0.2,Iris-setosa
4,5,5.0,3.6,1.4,0.2,Iris-setosa
...,...,...,...,...,...,...
145,146,6.7,3.0,5.2,2.3,Iris-virginica
146,147,6.3,2.5,5.0,1.9,Iris-virginica
147,148,6.5,3.0,5.2,2.0,Iris-virginica
148,149,6.2,3.4,5.4,2.3,Iris-virginica


In [91]:
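# Note: the "Id" column is kept as a feature here. The rows appear to be ordered
# by species, so Id likely leaks label information; dropping it as well
# (df.drop(["Id", "Species"], axis=1)) would give a fairer evaluation.
x = np.array(df.drop("Species", axis=1))
y = np.array(df["Species"])
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.2, random_state=42, shuffle = True)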


### Using Bernoulli Naive Bayes:

In [92]:
model = BernoulliNB(binarize=0.0)
model.fit(xtrain, ytrain)
print(model.score(xtest, ytest))
print(model.score(xtrain,ytrain))

0.3
0.3416666666666667


### Using Multinomial Naive Bayes:

In [93]:
model = MultinomialNB()
model.fit(xtrain, ytrain)
print(f"The Testing accuracy is {model.score(xtest, ytest)}")
print(f"The Training accuracy is {model.score(xtrain,ytrain)}")

The Testing accuracy is 0.9333333333333333
The Training accuracy is 0.8083333333333333


### Using Gaussian Naive Bayes

In [94]:
model = GaussianNB()
model.fit(xtrain, ytrain)
print(f"The Testing accuracy is {model.score(xtest, ytest)}")
print(f"The Training accuracy is {model.score(xtrain,ytrain)}")

The Testing accuracy is 1.0
The Training accuracy is 0.9916666666666667


### Using Complement Naive Bayes

In [95]:
model = ComplementNB()
model.fit(xtrain, ytrain)
print(f"The Testing accuracy is {model.score(xtest, ytest)}")
print(f"The Training accuracy is {model.score(xtrain,ytrain)}")

The Testing accuracy is 0.7
The Training accuracy is 0.6583333333333333
