# **Naive Bayes**

## Theory

* Naive Bayes is a probabilistic classifier based on Bayes' Theorem, assuming that features are conditionally independent of each other given the class

**Advantages**
* Based on a straightforward application of probabilities (Bayes’ Theorem) and basic counting.
* The probabilistic framework allows you to explain predictions using prior and likelihood probabilities.
* Naive Bayes is computationally efficient as it reduces the complexity by assuming feature independence.
* Scales well with large datasets due to its simplicity.
* It requires less data to estimate the probabilities
* Due to its simplicity, it generalizes well and is less prone to overfitting.
* Particularly useful for discrete data such as text or categorical features.


**Disadvantages**
* Unrealistic Assumptions: Features are rarely independent in real-world data.
* If a feature value does not appear in the training data, it leads to zero probability during predictions(Use Laplace Smoothing to handle this.)
* For continuous features, it assumes a Gaussian distribution, which may not hold in many datasets.
* Naive Bayes assumes equal weightage for all classes unless explicitly adjusted. Imbalanced datasets can cause biased predictions.(Use proir Probability adjustment)

**Asmuptions**
* Feature Independence
* The training data should be representative of the real-world data distribution. If the dataset is biased or limited, the model may perform poorly.
* All features used for classification must be present during both training and prediction
* Each feature contributes independently to the probability of a class.


**When to Use?**
* Text Classification, Sentimental analysis, Spam Detection.
* High dimensional data: It preform well with high dimensional data.
* Real-Time Predictions: Due to its speed, it’s great for applications requiring real-time results (e.g., fraud detection).
* Works well when data is sparse or when there are fewer samples available
* Can handle multi-class problems effectively.

**When not to Use?**
* When features are highly correlated (e.g., pixels in images), the independence assumption breaks down.
* It performs poorly when decision boundaries are nonlinear and complex.
* Naive Bayes assumes equal class probabilities unless adjusted, which can lead to biased predictions for imbalanced datasets.

**Different Types of Naive Bayes.**
* Multinomial Naive Bayes: Best for discrete features like word counts (Bag-of-Words).
* Gaussian Naive Bayes: Best for continuous features, assuming a Gaussian distribution.
* Bernoulli Naive Bayes: Best for binary features (e.g., presence or absence of words).

Formula : 

C* =
argmax
​
 P(C) 
i=1
∏
n
​
 P(x 
i
​
 ∣C) <br>
 In real world we use <br>
 logP(C∣X)∝logP(C)+ 
i=1
∑
n
​
 logP(x 
i
​
 ∣C)

## Implementation

* Initalize features and target
* Calculate joint probability of each feature with target(p(x/y)): Probability of x happen when y already happened.
* Use bayesian equation calculate the probability for each class with respect to the feature given
* Find the high probability and assign the data to it

In [433]:
import numpy as np

In [434]:
class NaiveBayes:
    '''Implement naive bayes from scratch'''

    def __init__(self):
        self.probabilites = {}  # store the join occurance of each feature value and target
        self.feature = None  # to store features while fitting
        self.target = None  # to store classes while fitting
        self.cls_count = {}  # to store count for each classes in the target
        self.size = 0  # size of the data
        self.classes = 0
        self.alpha = 1

    def __create_counter(self):
        '''Function for counting occurance of feature value with each target'''
        cols = self.feature.shape[1]
        rows = self.feature.shape[0]
        # Iterate through each features in the training data
        for feature in range(cols):
            # Iterate through each values in that feature
            for row_idx in range(rows):
                feature_value = str(self.feature[row_idx, feature])
                cls_value = self.target[row_idx]
                if (feature, feature_value, cls_value) not in self.probabilites:
                    self.probabilites[(feature, feature_value, cls_value)] = 0
                self.probabilites[(feature, feature_value, cls_value)] += 1

    def _cal_prob(self, datas):
        if datas.ndim == 1:
            datas = datas.reshape(1, -1)
        cols = datas.shape[1]
        classes = []
        # Iterate through each data in the datas
        for data in datas:
            p_cls = [] # to store the log probability for each classes
            # iterate through each classes
            for cls in self.cls_count.keys():
                likelihood = 0 # set likelihood to 0
                # iterate through each features
                for feature in range(cols):
                    feature_value = str(data[feature]) # feature
                    # find the intersection of feature value and class
                    count_feature_target = self.probabilites[(feature, feature_value, int(cls))]
                    # count the occurance of that class
                    count_cls = self.cls_count[cls]
                    # cal log likelihood
                    likelihood += np.log((count_feature_target + self.alpha) / count_cls + (self.alpha * len(self.classes)))
                likelihood += np.log(count_cls/self.size)
                # append likelihood of each cls
                p_cls.append(likelihood)
            # find the max probability
            classes.append(self.classes[np.argmax(p_cls)])
        return np.array(classes)


    def __count_class(self):
        for cls in self.target:
            cls = str(cls)
            if cls not in self.cls_count:
                self.cls_count[cls] = 1
            else:
                self.cls_count[cls] += 1

    def fit(self, X, y):
        self.feature = X
        self.target = y
        self.size = X.shape[0]
        self.classes = np.unique(self.target)
        self.__create_counter()
        self.__count_class()

    def predict(self, data):
        data = np.array(data)
        idx = self._cal_prob(data)
        return idx

In [435]:
feature1 = np.random.choice([0, 1, 2], size=100)
feature2 = np.random.choice([2, 3, 4], size=100)
feature3 = np.random.choice([5, 1, 0], size=100)
feature4 = np.random.choice([0, 5, 2], size=100)
X = np.column_stack([feature1, feature2, feature3, feature4])
X.shape

(100, 4)

In [436]:
y = np.random.choice([0, 1], size=100)

In [437]:
nb = NaiveBayes()
nb.fit(X, y)

In [438]:
X[0]

array([0, 3, 1, 0])

In [439]:
a = nb.predict(X)

In [440]:
X.shape

(100, 4)

In [441]:
a.shape

(100,)

In [442]:
y[0]

1

In [443]:
from sklearn.metrics import accuracy_score
accuracy_score(a,y)

0.42

In [444]:
from sklearn.naive_bayes import GaussianNB

In [445]:
gb = GaussianNB()
gb.fit(X,y)
p = gb.predict(X)

In [446]:
from sklearn.metrics import accuracy_score
accuracy_score(p,y)

0.54

In [447]:
y[12]

0