
# <center> Naive Bayes

## References

* Hands-on Machine Learning with Scikit-Learn, Keras, and TensorFlow - Aurélien Géron
* Machine learning - Fast reference guide - Matt Harrison
* https://filipezabala.com/
* https://www.youtube.com/@patloeber
* https://www.youtube.com/@Dataquestio

## Overview

Naive Bayes is a probabilistic supervised learning model for classification. It is based on Bayes' theorem, which describes the probability of an event conditioned on prior knowledge related to that event, and it assumes that the attributes are conditionally independent given the class (the "naive" assumption).

Scikit-learn (sklearn) provides several variations of the model, including:

* GaussianNB: For continuous attributes assumed to follow a normal distribution within each class.

* MultinomialNB: For discrete occurrence counts (for example, word counts).

* BernoulliNB: For discrete Boolean attributes.

Some data transformations can help, such as removing collinear attributes (which violate the independence assumption), transforming attributes toward a normal distribution for the GaussianNB model, and discretizing continuous variables.
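Two of these transformations can be sketched as scikit-learn pipelines. This is a minimal illustration rather than a tuned recipe: the use of `QuantileTransformer` to map attributes toward a normal distribution, `KBinsDiscretizer` with 5 quantile bins, and the breast cancer dataset are all assumptions made for the example.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB, MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import KBinsDiscretizer, QuantileTransformer

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1234
)

# Assumption: a quantile transform is used to make each attribute
# approximately normal before fitting GaussianNB.
gaussian_pipe = make_pipeline(
    QuantileTransformer(n_quantiles=100, output_distribution="normal", random_state=0),
    GaussianNB(),
)
gaussian_pipe.fit(X_train, y_train)

# Assumption: 5 quantile bins per attribute, one-hot encoded so that
# MultinomialNB receives non-negative indicator values.
binned_pipe = make_pipeline(
    KBinsDiscretizer(n_bins=5, encode="onehot", strategy="quantile"),
    MultinomialNB(),
)
binned_pipe.fit(X_train, y_train)

gaussian_acc = gaussian_pipe.score(X_test, y_test)
binned_acc = binned_pipe.score(X_test, y_test)
print("GaussianNB + QuantileTransformer: %.3f" % gaussian_acc)
print("MultinomialNB + KBinsDiscretizer: %.3f" % binned_acc)
```

Wrapping the transformation and the classifier in one pipeline means the transformation is fitted on the training split only, so no information from the test split leaks into it.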

## Some math

Here are some perspectives on the mathematical formulation of the model.

## $ P(A|X) = \frac{P(X|A) P(A)}{P(X)} $ 

## $ P(A_{j}|X) \propto P(A_j) \prod_{i=1}^{n} P(x_i|A_j) $ 

## $ P(A|x_1, x_2, \dots, x_n) = \frac{P(A) P(x_1|A) P(x_2|A) \cdots P(x_n|A)}{P(x_1, x_2, \dots, x_n)} $ 
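As a small worked example of Bayes' theorem under the independence assumption, the sketch below combines class priors with the product of per-attribute likelihoods and normalizes by the evidence. All numbers are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical values, not taken from any dataset.
priors = np.array([0.6, 0.4])       # P(A_0), P(A_1)
likelihoods = np.array([
    [0.5, 0.2, 0.9],                # P(x_1|A_0), P(x_2|A_0), P(x_3|A_0)
    [0.1, 0.7, 0.8],                # P(x_1|A_1), P(x_2|A_1), P(x_3|A_1)
])

# Numerator of Bayes' theorem for each class: prior times product of likelihoods.
unnormalized = priors * likelihoods.prod(axis=1)

# Evidence: sum of the numerators over all classes.
evidence = unnormalized.sum()

# Posterior probability of each class given the observed attributes.
posterior = unnormalized / evidence
predicted_class = int(np.argmax(posterior))

print(posterior, predicted_class)
```

Here the numerators are 0.054 and 0.0224, so the posterior for the first class is roughly 0.707 and the first class is predicted. Dividing by the evidence rescales the numerators but does not change which class has the largest value.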



Using a more formal formulation:

## $ \Pi(\theta|X) = \frac{L(X|\theta)  \Pi(\theta)}{\int_{\theta}L(X|\theta)\Pi(\theta)d\theta} $ 


## $ \text{Posterior} = \frac{\text{Likelihood} \times \text{Prior}}{\text{Evidence}} $ 


Where:

* $\Pi$: A probability distribution. It expresses an opinion in the form of a probability distribution, which is the subjectivist aspect of Bayesian probability.


* $ \Pi(\theta)$ - **Prior**: Opinion or probability distribution before observing the data.


* $ \Pi(\theta|X) $ - **Posterior**: Opinion or probability distribution after observing the data.


* $ L(X|\theta) $ - **Likelihood**: The information contributed by the data, that is, the distribution of the observed variable conditioned on the unknown population parameter $\theta$. In the Bayesian view, the same prior opinion combined with the same observed data cannot lead to different decisions.


* $ \int_{\theta}L(X|\theta)\Pi(\theta)d\theta $ - **Evidence**: The marginal probability of the data, obtained by integrating the likelihood over all values of $\theta$ weighted by the prior. It does not depend on $\theta$ and acts as a normalizing constant, so the denominator can be dropped and the *posterior* becomes proportional to the product of the *prior* and the *likelihood*.
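To make the roles of these terms concrete, here is a minimal grid approximation of a posterior distribution. The scenario is an assumption made for illustration: a coin with unknown probability of heads $\theta$, a uniform prior, and 7 heads observed in 10 flips. The integral in the evidence is approximated by a simple Riemann sum.

```python
import numpy as np

# Grid of candidate values for theta.
theta = np.linspace(0.0, 1.0, 1001)
d_theta = theta[1] - theta[0]

# Assumption: uniform prior over theta.
prior = np.ones_like(theta)

# Assumption: 7 heads in 10 flips. The binomial coefficient is omitted
# because it is constant in theta and cancels after normalization.
heads, flips = 7, 10
likelihood = theta**heads * (1.0 - theta) ** (flips - heads)

# Numerator of Bayes' theorem, evaluated on the grid.
unnormalized = likelihood * prior

# Evidence: Riemann-sum approximation of the integral over theta.
evidence = np.sum(unnormalized) * d_theta

# Posterior density on the grid; it integrates to about 1.
posterior = unnormalized / evidence

# Value of theta with the highest posterior density.
theta_map = theta[np.argmax(posterior)]
print("Most probable theta: %.3f" % theta_map)
```

Because the evidence is a single number, the posterior has the same shape as the product of likelihood and prior. Only the scale changes.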

---

## Imports

In [1]:
# Suppress warnings
import warnings
warnings.filterwarnings('ignore')

import numpy as np
from sklearn.model_selection import train_test_split

## Data

In [2]:
# imports
from sklearn import datasets

# Load data
bc = datasets.load_breast_cancer()

# Define X and y
X, y = bc.data, bc.target

# Separate train and test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1234)

---

## Models

## From scratch

In [3]:
# imports
from my_BayesianModels import accuracy, NaiveBayes

In [4]:
# Define model
nb = NaiveBayes()

# Fit
nb.fit(X_train, y_train)

# Predict
predictions = nb.predict(X_test)

# Accuracy
acc = accuracy(y_test, predictions)

# Print accuracy
print("Accuracy: %.3f%%" % (acc * 100.0))

Accuracy: 88.596%
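The `my_BayesianModels` module is not shown in this notebook. To give a rough idea of what a from-scratch classifier of this kind can look like, here is a minimal sketch of a Gaussian Naive Bayes. The names `SketchGaussianNB` and `sketch_accuracy` are hypothetical, the synthetic data is made up for the demonstration, and this is not necessarily how the imported `NaiveBayes` class is implemented.

```python
import numpy as np


class SketchGaussianNB:
    """Hypothetical minimal Gaussian Naive Bayes, for illustration only."""

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        self.classes_ = np.unique(y)
        # Per-class mean and variance of every attribute.
        self.means_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        # Assumption: a small constant keeps variances strictly positive.
        self.vars_ = np.array([X[y == c].var(axis=0) for c in self.classes_]) + 1e-9
        # Log of the class priors, estimated from class frequencies.
        self.log_priors_ = np.log([np.mean(y == c) for c in self.classes_])
        return self

    def _joint_log_likelihood(self, X):
        X = np.asarray(X, dtype=float)
        # diff has shape (n_samples, n_classes, n_features).
        diff = X[:, None, :] - self.means_[None, :, :]
        # Log of the Gaussian density of each attribute under each class.
        log_pdf = -0.5 * (np.log(2.0 * np.pi * self.vars_) + diff**2 / self.vars_)
        # Independence assumption: sum the log densities over attributes.
        return self.log_priors_ + log_pdf.sum(axis=2)

    def predict(self, X):
        # Pick the class with the highest log prior plus log likelihood.
        return self.classes_[np.argmax(self._joint_log_likelihood(X), axis=1)]


def sketch_accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))


# Synthetic two-class data, made up for the demonstration.
rng = np.random.default_rng(0)
X_demo = np.vstack([rng.normal(0.0, 1.0, (100, 2)), rng.normal(4.0, 1.0, (100, 2))])
y_demo = np.array([0] * 100 + [1] * 100)

demo_model = SketchGaussianNB().fit(X_demo, y_demo)
demo_acc = sketch_accuracy(y_demo, demo_model.predict(X_demo))
print("Demo accuracy: %.3f" % demo_acc)
```

Working with log probabilities avoids numerical underflow when many small likelihoods are multiplied, and the evidence is skipped because it is the same for every class.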


## From Sklearn

In [5]:
# Imports
from sklearn.naive_bayes import GaussianNB, MultinomialNB

In [6]:
# Define Multinomial model
nb = MultinomialNB()

# Train the model
nb.fit(X_train, y_train)

# Predict
pred = nb.predict(X_test)

# Accuracy
acc = nb.score(X_test, y_test)

# Print score 
print("Accuracy: %.3f%%" % (acc * 100.0))

Accuracy: 88.596%


In [7]:
# Define Gaussian model
nb = GaussianNB()

# Train the model
nb.fit(X_train, y_train)

# Predict
pred = nb.predict(X_test)

# Accuracy
acc = nb.score(X_test, y_test)

# Print score 
print("Accuracy: %.3f%%" % (acc * 100.0))

Accuracy: 90.351%
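Beyond hard predictions, the fitted model can also report the posterior probability of each class. The sketch below repeats the data loading and split so that it runs on its own, then inspects `predict_proba` for the first five test samples. Looking at only five rows is an arbitrary choice for the example.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1234
)

nb = GaussianNB().fit(X_train, y_train)

# One row per sample, one column per class in nb.classes_.
proba = nb.predict_proba(X_test[:5])

# The predicted label is the class with the highest posterior probability.
labels = nb.classes_[np.argmax(proba, axis=1)]

print(np.round(proba, 3))
print(labels, nb.predict(X_test[:5]))
```

Each row of `proba` sums to 1, which is the effect of dividing by the evidence in Bayes' theorem.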


___