# Types of Naive Bayes
https://scikit-learn.org/stable/modules/naive_bayes.html

    1. Naive Bayes
    2. Gaussian Naive Bayes
    3. Multinomial Naive Bayes
    4. Complement Naive Bayes
    5. Bernoulli Naive Bayes
    6. Categorical Naive Bayes
    
**Out-of-core naive Bayes model fitting :**

Naive Bayes models can tackle large-scale classification problems for which the full training set might not fit in memory. To handle this case, `MultinomialNB`, `BernoulliNB`, and `GaussianNB` expose a `partial_fit` method that can be called incrementally on successive chunks of data, as with other classifiers; the scikit-learn example "Out-of-core classification of text documents" demonstrates this. All naive Bayes classifiers support sample weighting.

Contrary to the fit method, the first call to partial_fit needs to be passed the list of all the expected class labels.

For an overview of available strategies in scikit-learn, see also the [out-of-core learning](https://scikit-learn.org/stable/computing/scaling_strategies.html#scaling-strategies) documentation.
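To illustrate why the first incremental call needs the full list of class labels, here is a minimal pure-Python sketch. `RunningPrior` is a hypothetical helper written for this note, not part of scikit-learn; it only tracks class counts, which is enough to show that a chunk may not contain every class.

```python
from collections import Counter


class RunningPrior:
    """Hypothetical helper: class priors updated one chunk at a time."""

    def __init__(self):
        self.classes = None
        self.counts = Counter()

    def partial_fit(self, y_chunk, classes=None):
        # The first call fixes the label set, because a single chunk
        # is not guaranteed to contain every class.
        if self.classes is None:
            if classes is None:
                raise ValueError("classes must be passed on the first call")
            self.classes = sorted(classes)
        unknown = set(y_chunk) - set(self.classes)
        if unknown:
            raise ValueError(f"labels not declared up front: {sorted(unknown)}")
        self.counts.update(y_chunk)
        return self

    def priors(self):
        total = sum(self.counts.values())
        return {c: self.counts[c] / total for c in self.classes}


model = RunningPrior()
model.partial_fit(["yes", "yes", "yes"], classes=["yes", "no"])  # no "no" in this chunk
model.partial_fit(["no", "yes"])                                 # classes can be omitted now
running_priors = model.priors()
print(running_priors)
```

Without the `classes` argument, the first chunk alone would suggest a one-class problem.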

In [None]:
def accuracy_score(y_true, y_pred):
    """Accuracy in percent: 100 * (number of correct predictions) / len(y_true)."""
    return round(float(sum(y_pred == y_true))/float(len(y_true)) * 100 ,2)

def pre_processing(df):

    """ Partition the data into features (all but the last column) and target (last column) """
    X = df.drop([df.columns[-1]], axis = 1)
    y = df[df.columns[-1]]
    return X, y

def train_test_split(x, y, test_size = 0.25, random_state = None):

    """ Partition the data into train and test sets """

    x_test = x.sample(frac = test_size, random_state = random_state)
    y_test = y[x_test.index]

    x_train = x.drop(x_test.index)
    y_train = y.drop(y_test.index)

    return x_train, x_test, y_train, y_test

### Calculate Prior Probability of Classes P(y)


 #### Frequency table
    
    P(Play=Yes) = 9/14 = 0.64
    P(Play=No) = 5/14 = 0.36
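As a quick check of the arithmetic above, the priors can be recomputed from the class counts. This is only a sketch; the counts (9 "Yes", 5 "No") are the ones stated in the frequency table.

```python
from fractions import Fraction

# Class counts taken from the frequency table above (14 days in total).
class_counts = {"Yes": 9, "No": 5}
n_samples = sum(class_counts.values())

class_priors = {c: Fraction(k, n_samples) for c, k in class_counts.items()}
for c, p in class_priors.items():
    print(f"P(Play={c}) = {p} = {float(p):.2f}")
```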
    

In [None]:
def _calc_class_prior(self):

    """ P(c) - Prior Class Probability"""

    for outcome in np.unique(self.y_train):
        outcome_count = sum(self.y_train == outcome)
        self.class_priors[outcome] = outcome_count / self.train_size

### Calculate the Likelihood Table for all features

The last row of each table is the predictor prior P(x), i.e. the overall frequency of each feature value.

#### Likelihood Table - Outlook

|Play | Overcast| Rainy| Sunny|
|-----|---------|------|------|
|Yes  | 4/9     | 2/9  | 3/9  |
|No   | 0/5     | 3/5  | 2/5  |
|P(x) | 4/14    | 5/14 | 5/14 |

#### Likelihood Table - Temp

|Play | Cool    | Mild | Hot  |
|-----|---------|------|------|
|Yes  | 3/9     | 4/9  | 2/9  |
|No   | 1/5     | 2/5  | 2/5  |
|P(x) | 4/14    | 6/14 | 4/14 |

#### Likelihood Table - Humidity

|Play | High | Normal|
|-----|------|-------|
|Yes  | 3/9  | 6/9   |
|No   | 4/5  | 1/5   |
|P(x) | 7/14 | 7/14  |

#### Likelihood Table - Windy

|Play | f    | t    |
|-----|------|------|
|Yes  | 6/9  | 3/9  |
|No   | 2/5  | 3/5  |
|P(x) | 8/14 | 6/14 |
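The Outlook table contains a zero entry, P(Overcast|No) = 0/5, which forces the whole product for class "No" to zero whenever Outlook is Overcast. The sketch below shows how additive (Laplace) smoothing would change that entry. The `alpha` parameter is an assumption borrowed from the smoothing used by the discrete scikit-learn estimators; the hand-written classifier in this notebook does not smooth.

```python
from fractions import Fraction

# Outlook counts per class, read off the likelihood table above.
outlook_counts = {
    "Yes": {"Overcast": 4, "Rainy": 2, "Sunny": 3},
    "No": {"Overcast": 0, "Rainy": 3, "Sunny": 2},
}


def likelihood(value, outcome, alpha=0):
    """P(value | outcome) with additive smoothing; alpha=0 means no smoothing."""
    counts = outlook_counts[outcome]
    n_values = len(counts)
    return Fraction(counts[value] + alpha, sum(counts.values()) + alpha * n_values)


unsmoothed = likelihood("Overcast", "No")         # 0/5
smoothed = likelihood("Overcast", "No", alpha=1)  # (0 + 1) / (5 + 3)
print(unsmoothed, smoothed)
```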



In [None]:
def _calc_likelihoods(self):

    for feature in self.features:

        for outcome in np.unique(self.y_train):
            outcome_count = sum(self.y_train == outcome)
            feat_likelihood = self.X_train[feature][self.y_train[self.y_train == outcome].index.values.tolist()].value_counts().to_dict()

            for feat_val, count in feat_likelihood.items():
                self.likelihoods[feature][feat_val + '_' + outcome] = count/outcome_count


def _calc_predictor_prior(self):

    for feature in self.features:
        feat_vals = self.X_train[feature].value_counts().to_dict()

        for feat_val, count in feat_vals.items():
            self.pred_priors[feature][feat_val] = count/self.train_size

***Now, Calculate Posterior Probability for each class using the Naive Bayesian equation. The Class with maximum probability is the outcome of the prediction.***

***Query: Whether Players will play or not when the weather conditions are [Outlook=Rainy, Temp=Mild, Humidity=Normal, Windy=t]?***

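Calculation of Posterior Probability:

The "naive" assumption is that the features are conditionally independent given the class. Two random variables A and B are conditionally independent given C exactly when

$$P(A, B \mid C) = P(A \mid C)\,P(B \mid C)$$

so the joint likelihood factorizes into per-feature likelihoods. The denominator is approximated in the same way, by the product of the predictor priors.

$$
\begin{aligned}
P(y=\text{Yes} \mid x) &= P(\text{Yes} \mid \text{Rainy}, \text{Mild}, \text{Normal}, \text{t}) \\
&= \frac{P(\text{Rainy}, \text{Mild}, \text{Normal}, \text{t} \mid \text{Yes})\,P(\text{Yes})}{P(\text{Rainy}, \text{Mild}, \text{Normal}, \text{t})} \\
&= \frac{P(\text{Rainy} \mid \text{Yes})\,P(\text{Mild} \mid \text{Yes})\,P(\text{Normal} \mid \text{Yes})\,P(\text{t} \mid \text{Yes})\,P(\text{Yes})}{P(\text{Rainy})\,P(\text{Mild})\,P(\text{Normal})\,P(\text{t})} \\
&= \frac{(2/9)(4/9)(6/9)(3/9)(9/14)}{(5/14)(6/14)(7/14)(6/14)} \\
&= 0.43
\end{aligned}
$$

$$
\begin{aligned}
P(y=\text{No} \mid x) &= P(\text{No} \mid \text{Rainy}, \text{Mild}, \text{Normal}, \text{t}) \\
&= \frac{P(\text{Rainy}, \text{Mild}, \text{Normal}, \text{t} \mid \text{No})\,P(\text{No})}{P(\text{Rainy}, \text{Mild}, \text{Normal}, \text{t})} \\
&= \frac{P(\text{Rainy} \mid \text{No})\,P(\text{Mild} \mid \text{No})\,P(\text{Normal} \mid \text{No})\,P(\text{t} \mid \text{No})\,P(\text{No})}{P(\text{Rainy})\,P(\text{Mild})\,P(\text{Normal})\,P(\text{t})} \\
&= \frac{(3/5)(2/5)(1/5)(3/5)(5/14)}{(5/14)(6/14)(7/14)(6/14)} \\
&= 0.31
\end{aligned}
$$

Since 0.43 > 0.31, P(Play=Yes|Rainy,Mild,Normal,t) has the highest posterior probability, so the prediction is Yes.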
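The two posteriors can be checked with exact fractions. The sketch below hard-codes the values from the likelihood tables for the query [Rainy, Mild, Normal, t]; the helper name `product` is introduced here only for illustration.

```python
from fractions import Fraction as F

prior = {"Yes": F(9, 14), "No": F(5, 14)}
# P(Rainy|c), P(Mild|c), P(Normal|c), P(t|c) from the likelihood tables.
likelihoods = {
    "Yes": [F(2, 9), F(4, 9), F(6, 9), F(3, 9)],
    "No": [F(3, 5), F(2, 5), F(1, 5), F(3, 5)],
}
# P(Rainy), P(Mild), P(Normal), P(t)
evidence_terms = [F(5, 14), F(6, 14), F(7, 14), F(6, 14)]


def product(terms):
    result = F(1)
    for t in terms:
        result *= t
    return result


evidence = product(evidence_terms)
posterior = {c: product(likelihoods[c]) * prior[c] / evidence for c in prior}
prediction = max(posterior, key=posterior.get)

for c, p in posterior.items():
    print(f"P({c}|x) = {float(p):.2f}")
print("prediction:", prediction)
```

The two values add up to about 0.74 rather than 1, because the denominator is itself an approximation. It is the same for both classes, so it does not change which class wins.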

In [None]:
def fit(self, X, y):

    self.features = list(X.columns)
    self.X_train = X
    self.y_train = y
    self.train_size = X.shape[0]
    self.num_feats = X.shape[1]

    for feature in self.features:
        self.likelihoods[feature] = {}
        self.pred_priors[feature] = {}

        for feat_val in np.unique(self.X_train[feature]):
            self.pred_priors[feature].update({feat_val: 0})

            for outcome in np.unique(self.y_train):
                self.likelihoods[feature].update({feat_val+'_'+outcome:0})
                self.class_priors.update({outcome: 0})

    self._calc_class_prior()
    self._calc_likelihoods()
    self._calc_predictor_prior()

In [None]:
def predict(self, X):

    """ Calculates Posterior probability P(c|x) """

    results = []
    X = np.array(X)

    for query in X:
        probs_outcome = {}
        for outcome in np.unique(self.y_train):
            prior = self.class_priors[outcome]
            likelihood = 1
            evidence = 1

            for feat, feat_val in zip(self.features, query):
                likelihood *= self.likelihoods[feat][feat_val + '_' + outcome]
                evidence *= self.pred_priors[feat][feat_val]

            posterior = (likelihood * prior) / (evidence)

            probs_outcome[outcome] = posterior

        result = max(probs_outcome, key = lambda x: probs_outcome[x])
        results.append(result)

    return np.array(results)

Weather Dataset:

Train Accuracy: 92.86

Query 1:- [['Rainy' 'Mild' 'Normal' 't']] ---> ['yes']

Query 2:- [['Overcast' 'Cool' 'Normal' 't']] ---> ['yes']

Query 3:- [['Sunny' 'Hot' 'High' 't']] ---> ['no']
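The prediction for Query 1 can also be reproduced in log space, which avoids multiplying many small probabilities when there are more features. This is a sketch under the assumption that no likelihood is zero (the logarithm of zero is undefined); the numbers are the table values used earlier, and `log_score` is a name chosen for this example.

```python
import math

class_priors = {"yes": 9 / 14, "no": 5 / 14}
# Per-class likelihoods for the query [Rainy, Mild, Normal, t], from the tables above.
query_likelihoods = {
    "yes": [2 / 9, 4 / 9, 6 / 9, 3 / 9],
    "no": [3 / 5, 2 / 5, 1 / 5, 3 / 5],
}


def log_score(outcome):
    """log P(c) + sum_i log P(x_i | c); the evidence term is dropped."""
    return math.log(class_priors[outcome]) + sum(
        math.log(p) for p in query_likelihoods[outcome]
    )


scores = {c: log_score(c) for c in class_priors}
prediction = max(scores, key=scores.get)
print(prediction)
```

Dropping the evidence is safe here because it is identical for every class and therefore cannot change the argmax.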

In [None]:
import numpy as np
import pandas as pd


class NaiveBayes:

    """
        Bayes Theorem:
                                        Likelihood * Class prior probability
                Posterior Probability = -------------------------------------
                                            Predictor prior probability

                                         P(x|c) * p(c)
                               P(c|x) = ------------------ 
                                              P(x)
    """

    def __init__(self):

        """
            Attributes:
                likelihoods: Likelihood of each feature per class
                class_priors: Prior probabilities of classes 
                pred_priors: Prior probabilities of features 
                features: All features of dataset
        """
        self.features = []
        self.likelihoods = {}
        self.class_priors = {}
        self.pred_priors = {}

        # Set by fit()
        self.X_train = None
        self.y_train = None
        self.train_size = 0
        self.num_feats = 0

    # The method bodies are the ones shown in the cells above.
    def fit(self, X, y): ...

    def _calc_class_prior(self): ...

    def _calc_likelihoods(self): ...

    def _calc_predictor_prior(self): ...

    def predict(self, X): ...


if __name__ == "__main__":

    #Weather Dataset
    print("\nWeather Dataset:")

    df = pd.read_table("../Data/weather.txt")
    #print(df)

    #Split features and target
    X,y  = pre_processing(df)

    nb_clf = NaiveBayes()
    nb_clf.fit(X, y)

    print("Train Accuracy: {}".format(accuracy_score(y, nb_clf.predict(X))))

    #Query 1:
    query = np.array([['Rainy','Mild', 'Normal', 't']])
    print("Query 1:- {} ---> {}".format(query, nb_clf.predict(query)))

    #Query 2:
    query = np.array([['Overcast','Cool', 'Normal', 't']])
    print("Query 2:- {} ---> {}".format(query, nb_clf.predict(query)))

    #Query 3:
    query = np.array([['Sunny','Hot', 'High', 't']])
    print("Query 3:- {} ---> {}".format(query, nb_clf.predict(query)))

# Naive Bayes - GaussianNB

In [None]:
def _calc_likelihoods(self):

        """ P(x|c) - Likelihood """

        for feature in self.features:

            for outcome in np.unique(self.y_train):
                self.likelihoods[feature][outcome]['mean'] = self.X_train[feature][self.y_train[self.y_train == outcome].index.values.tolist()].mean()
                self.likelihoods[feature][outcome]['variance'] = self.X_train[feature][self.y_train[self.y_train == outcome].index.values.tolist()].var()


In [None]:
def fit(self, X, y):

        self.features = list(X.columns)
        self.X_train = X
        self.y_train = y
        self.train_size = X.shape[0]
        self.num_feats = X.shape[1]

        for feature in self.features:
            self.likelihoods[feature] = {}

            for outcome in np.unique(self.y_train):
                self.likelihoods[feature].update({outcome:{}})
                self.class_priors.update({outcome: 0})


        self._calc_class_prior()
        self._calc_likelihoods()

        # print(self.likelihoods)
        # print(self.class_priors)

In [None]:
def predict(self, X):

    """ Calculates Posterior probability P(c|x) """

    results = []
    X = np.array(X)

    for query in X:
        probs_outcome = {}

        # Note: there is no need to calculate the evidence P(x), since it is
        # the same for every class for a given sample. It therefore does not
        # affect the classification and can be ignored.
        for outcome in np.unique(self.y_train):
            prior = self.class_priors[outcome]
            likelihood = 1

            for feat, feat_val in zip(self.features, query):
                mean = self.likelihoods[feat][outcome]['mean']
                var = self.likelihoods[feat][outcome]['variance']
                # Gaussian density of feat_val under N(mean, var)
                likelihood *= (1/np.sqrt(2*np.pi*var)) * np.exp(-(feat_val - mean)**2 / (2*var))

            posterior_numerator = (likelihood * prior)
            probs_outcome[outcome] = posterior_numerator


        result = max(probs_outcome, key = lambda x: probs_outcome[x])
        results.append(result)

    return np.array(results)
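The per-feature likelihood used in `predict` is the Gaussian density. The sketch below writes it out as a standalone function next to a log-density variant. Both function names and the sample statistics are made up for illustration. The log form stays finite where the plain density underflows to zero.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of N(mean, var) at x; var is the variance, not the std."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def gaussian_log_pdf(x, mean, var):
    """Logarithm of the same density, computed without exponentiating."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


# Illustrative (made-up) per-class statistics for a single feature.
mean, var = 5.0, 0.25
peak = gaussian_pdf(mean, mean, var)          # equals 1 / sqrt(2 * pi * var)
far_pdf = gaussian_pdf(40.0, mean, var)       # underflows to 0.0
far_log = gaussian_log_pdf(40.0, mean, var)   # still a finite number
print(peak, far_pdf, far_log)
```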


In [None]:
class  GaussianNB:

    """
        Bayes Theorem:
                                        Likelihood * Class prior probability
                Posterior Probability = -------------------------------------
                                            Predictor prior probability

                                         P(x|c) * p(c)
                               P(c|x) = ------------------ 
                                              P(x)
        Gaussian Naive Bayes:
                                    1
                P(x|c) = ---------------------- * exp(-(x - mean)^2 / (2 * var))
                           sqrt(2 * pi * var)

                Here mean and var are the mean and the variance (std^2) of
                feature x within class c.
    """

    def __init__(self):

        """
            Attributes:
                likelihoods: Likelihood of each feature per class
                class_priors: Prior probabilities of classes  
                features: All features of dataset
        """
        self.features = []
        self.likelihoods = {}
        self.class_priors = {}

        # Set by fit()
        self.X_train = None
        self.y_train = None
        self.train_size = 0
        self.num_feats = 0

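    # The method bodies are the ones shown in the cells above
    # (_calc_class_prior is the same as in NaiveBayes).
    def fit(self, X, y): ...

    def _calc_class_prior(self): ...

    def _calc_likelihoods(self): ...

    def predict(self, X): ...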
        
        
if __name__ == "__main__":

    # Iris Dataset
    print("\nIris Dataset:")

    df = pd.read_csv("../Data/iris.csv")
    #print(df)

    #Split features and target
    X,y  = pre_processing(df)

    #Split data into Training and Testing Sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.1, random_state = 0)

    #print(X_train, y_train)
    gnb_clf = GaussianNB()
    gnb_clf.fit(X_train, y_train)
    #print(X_train, y_train)

    print("Train Accuracy: {}".format(accuracy_score(y_train, gnb_clf.predict(X_train))))
    print("Test Accuracy: {}".format(accuracy_score(y_test, gnb_clf.predict(X_test))))

    #Query 1:
    query = np.array([[5.7, 2.9, 4.2, 1.3]])
    print("Query 1:- {} ---> {}".format(query, gnb_clf.predict(query)))


    #############################################################################################################

    #Gender Classification Dataset
    print("\nGender Dataset:")

    df = pd.read_csv("../Data/gender.csv")
    #print(df)

    #Split features and target
    X,y  = df.drop([df.columns[0]], axis = 1), df[df.columns[0]]

    X_train, y_train = X, y

    gnb_clf = GaussianNB()
    gnb_clf.fit(X_train, y_train)

    print("Train Accuracy: {}".format(accuracy_score(y_train, gnb_clf.predict(X_train))))

    #Query 1:
    query = np.array([[6, 130, 8]])
    print("Query 1:- {} ---> {}".format(query, gnb_clf.predict(query)))

    #Query 2:
    query = np.array([[5, 80, 6]])
    print("Query 2:- {} ---> {}".format(query, gnb_clf.predict(query)))

    #Query 3:
    query = np.array([[7, 140, 14]])
    print("Query 3:- {} ---> {}".format(query, gnb_clf.predict(query)))



naive_bayes.BernoulliNB(*[, alpha, ...]) : **Naive Bayes classifier for multivariate Bernoulli models.**

naive_bayes.CategoricalNB(*[, alpha, ...]) : **Naive Bayes classifier for categorical features.**

naive_bayes.ComplementNB(*[, alpha, ...]) : **The Complement Naive Bayes classifier described in Rennie et al. (2003).**

naive_bayes.GaussianNB(*[, priors, ...]) : **Gaussian Naive Bayes (GaussianNB).**

naive_bayes.MultinomialNB(*[, alpha, ...]) : **Naive Bayes classifier for multinomial models.**

In [None]:
import warnings

from abc import ABCMeta, abstractmethod


import numpy as np
from scipy.special import logsumexp

from .base import BaseEstimator, ClassifierMixin
from .preprocessing import binarize
from .preprocessing import LabelBinarizer
from .preprocessing import label_binarize
from .utils import deprecated
from .utils.extmath import safe_sparse_dot
from .utils.multiclass import _check_partial_fit_first_call
from .utils.validation import check_is_fitted, check_non_negative
from .utils.validation import _check_sample_weight


__all__ = [
    "BernoulliNB",
    "GaussianNB",
    "MultinomialNB",
    "ComplementNB",
    "CategoricalNB",
]

X : array-like of shape (n_samples, n_features) The input samples.

C : ndarray of shape (n_samples,) Predicted target values for X.

_joint_log_likelihood :

    Computes the unnormalized posterior log probability of X,
    i.e. ``log P(c) + log P(x|c)`` for all rows x of X.
    
    predict, predict_proba, and predict_log_proba pass the input
    through _check_X and hand it over to _joint_log_likelihood.
            
predict_log_proba :

    C : array-like of shape (n_samples, n_classes)
         Returns the log-probability of the samples for each class 
         in the model. The columns correspond to the classes in 
         sorted order, as they appear in the attribute 
         :term:`classes_`.
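The normalization behind `predict_log_proba` can be sketched without any library: subtract the log of the summed exponentials of the joint log likelihoods. The `logsumexp` function below is a small stand-in written for this example (scikit-learn itself imports it from `scipy.special`), and the numbers reuse the weather query from earlier in this notebook.

```python
import math


def logsumexp(values):
    """Stable log(sum(exp(v))) for a list of floats."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


# Joint log likelihoods log P(c) + log P(x|c) for one sample and two classes,
# using the weather query [Rainy, Mild, Normal, t].
jll = [
    math.log(9 / 14 * 2 / 9 * 4 / 9 * 6 / 9 * 3 / 9),  # class "yes"
    math.log(5 / 14 * 3 / 5 * 2 / 5 * 1 / 5 * 3 / 5),  # class "no"
]

log_prob_x = logsumexp(jll)                # log P(x), summed over the classes
log_proba = [v - log_prob_x for v in jll]  # normalized log probabilities
proba = [math.exp(v) for v in log_proba]   # probabilities that sum to 1
print(proba)
```

Unlike the hand-computed 0.43 and 0.31 earlier, these probabilities sum to 1, because P(x) is obtained by summing over the classes instead of multiplying predictor priors.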

In [None]:

class _BaseNB(ClassifierMixin, BaseEstimator, metaclass=ABCMeta):
    """Abstract base class for naive Bayes estimators"""

    @abstractmethod
    def _joint_log_likelihood(self, X):
        """Compute the unnormalized posterior log probability of X
        """

    @abstractmethod
    def _check_X(self, X):
        """To be overridden in subclasses with the actual checks.
        Only used in predict* methods.
        """

    def predict(self, X):
        """
        Perform classification on an array of test vectors X.
        """
        check_is_fitted(self)
        X = self._check_X(X)
        jll = self._joint_log_likelihood(X)
        return self.classes_[np.argmax(jll, axis=1)]

    def predict_log_proba(self, X):
        """
        Return log-probability estimates for the test vector X.
        """
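        # 1. compute jll exactly as in the predict method
        check_is_fitted(self)
        X = self._check_X(X)
        jll = self._joint_log_likelihood(X)
        # 2. normalize by P(x) = P(f_1, ..., f_n): log of the sum of
        #    exponentials of jll over the classes, one value per sample (axis=1)
        log_prob_x = logsumexp(jll, axis=1)
        return jll - np.atleast_2d(log_prob_x).T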

    def predict_proba(self, X):
        """
        Return probability estimates for the test vector X.
        """
        return np.exp(self.predict_log_proba(X))

### Parameters

    priors : array-like of shape (n_classes,)
        Prior probabilities of the classes. If specified, the priors are not
        adjusted according to the data.
    var_smoothing : float, default=1e-9
        Portion of the largest variance of all features that is added to
        variances for calculation stability.
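A small sketch of what `var_smoothing` does, assuming the additive term is the stated portion of the largest per-feature variance (the quantity described as `epsilon_` in the attributes). The training matrix is made up and the code uses only the standard library.

```python
from statistics import pvariance

# Made-up training matrix: 4 samples, 2 features on very different scales.
X = [
    [1.0, 1000.0],
    [1.0, 1200.0],
    [1.0, 900.0],
    [1.0, 1100.0],
]
var_smoothing = 1e-9

columns = list(zip(*X))
feature_variances = [pvariance(col) for col in columns]
epsilon = var_smoothing * max(feature_variances)

# The constant first feature has zero variance; adding epsilon keeps the
# Gaussian density well defined (no division by zero).
smoothed = [v + epsilon for v in feature_variances]
print(feature_variances, epsilon, smoothed)
```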
        
### Attributes
 
    class_count_ : ndarray of shape (n_classes,) 
                number of training samples observed in each class.
    class_prior_ : ndarray of shape (n_classes,)  
              probability of each class.
    classes_ : ndarray of shape (n_classes,)  
              class labels known to the classifier.
    epsilon_ : float 
             absolute additive value to variances.
    n_features_in_ : int 
            Number of features seen during :term:`fit`.
    feature_names_in_ : ndarray of shape (`n_features_in_`,) 
            Names of features seen during :term:`fit`. Defined only when `X` has feature
            names that are all strings.
    var_ : ndarray of shape (n_classes, n_features) 
            Variance of each feature per class.
    theta_ : ndarray of shape (n_classes, n_features) 
            mean of each feature per class.
        
        
**fit()**

    X : array-like of shape (n_samples, n_features) : Training vectors, where 
         `n_samples` is the number of samples and `n_features` is the number of 
         features.
    y : array-like of shape (n_samples,)
                Target values.
     sample_weight : array-like of shape (n_samples,), default=None
                Weights applied to individual samples (1. for unweighted).
                   Gaussian Naive Bayes supports fitting with *sample_weight*.

**_update_mean_variance()**

    Parameters
    ----------
            n_past : int
                Number of samples represented in old mean and variance. If sample
                weights were given, this should contain the sum of sample
                weights represented in old mean and variance.
            mu : array-like of shape (number of Gaussians,)
                Means for Gaussians in original set.
            var : array-like of shape (number of Gaussians,)
                Variances for Gaussians in original set.
            sample_weight : array-like of shape (n_samples,), default=None
                Weights applied to individual samples (1. for unweighted).
     Returns
     -------
            total_mu : array-like of shape (number of Gaussians,)
                Updated mean for each Gaussian over the combined set.
            total_var : array-like of shape (number of Gaussians,)
                Updated variance for each Gaussian over the combined set.
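The update described by these parameters can be written as a pooled mean and variance. The formulas below are a sketch in the notation of the docstring, with $\mu_{new}$, $\sigma^2_{new}$ and $n_{new}$ (symbols introduced here) denoting the mean, population variance and (weighted) count of the new chunk:

$$n_{total} = n_{past} + n_{new}$$

$$\mu_{total} = \frac{n_{new}\,\mu_{new} + n_{past}\,\mu}{n_{total}}$$

$$\sigma^2_{total} = \frac{n_{past}\,\sigma^2 + n_{new}\,\sigma^2_{new} + \dfrac{n_{new}\,n_{past}}{n_{total}}\,(\mu - \mu_{new})^2}{n_{total}}$$

The last term in the numerator accounts for the shift between the old and the new mean; when both means coincide, the result is just the count-weighted average of the two variances.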
                
                
**_partial_fit()**

        classes : array-like of shape (n_classes,), default=None
            List of all the classes that can possibly appear in the y vector.
            Must be provided at the first call to partial_fit, can be omitted
            in subsequent calls.
        _refit : bool, default=False
            If true, act as though this were the first time we called
            _partial_fit (ie, throw away any past fitting and start over).
            
        This method is expected to be called several times consecutively
        on different chunks of a dataset so as to implement out-of-core
        or online learning.
        This is especially useful when the whole dataset is too big to fit in
        memory at once.
        This method has some performance and numerical stability overhead,
        hence it is better to call partial_fit on chunks of data that are
        as large as possible (as long as fitting in the memory budget) to
        hide the overhead. 
            

In [None]:
class GaussianNB(_BaseNB):
    """
    Gaussian Naive Bayes (GaussianNB).
    Can perform online updates to model parameters via :meth:`partial_fit`.
    """

    def __init__(self, *, priors=None, var_smoothing=1e-9):
        self.priors = priors
        self.var_smoothing = var_smoothing

    def fit(self, X, y, sample_weight=None):
        """Fit Gaussian Naive Bayes according to X, y.
        """
        y = self._validate_data(y=y)
        return self._partial_fit(
            X, y, np.unique(y), _refit=True, sample_weight=sample_weight
        )

    def _check_X(self, X):
        """Validate X, used only in predict* methods."""
        return self._validate_data(X, reset=False)

    @staticmethod
    def _update_mean_variance(n_past, mu, var, X, sample_weight=None):
        """Compute online update of Gaussian mean and variance.
        return the updated mean and variance. (NB - each dimension (column) 
        in X is treated as independent-- you get variance, not covariance).
        """
        if X.shape[0] == 0:
            return mu, var

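        # 1. Compute the (potentially weighted) mean and variance of the new
        #    data points, column-wise (axis=0).
        if sample_weight is not None:
            n_new = float(sample_weight.sum())
            new_mu = np.average(X, axis=0, weights=sample_weight)
            new_var = np.average((X - new_mu) ** 2, axis=0, weights=sample_weight)
        else:
            n_new = X.shape[0]
            new_var = np.var(X, axis=0)
            new_mu = np.mean(X, axis=0)

        # 2. Nothing to combine with on the first batch.
        if n_past == 0:
            return new_mu, new_var

        # 3. Total (weighted) number of observations.
        n_total = float(n_past + n_new)

        # 4. Combine the old and new means, weighted by their counts.
        total_mu = (n_new * new_mu + n_past * mu) / n_total

        # 5. Combine the old and new variances via their sums of squared
        #    differences (ssd).
        old_ssd = n_past * var
        new_ssd = n_new * new_var
        total_ssd = old_ssd + new_ssd + (n_new * n_past / n_total) * (mu - new_mu) ** 2
        total_var = total_ssd / n_total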

        return total_mu, total_var

    def partial_fit(self, X, y, classes=None, sample_weight=None):
        """Incremental fit on a batch of samples.
        """
        return self._partial_fit(
            X, y, classes, _refit=False, sample_weight=sample_weight
        )

    def _partial_fit(self, X, y, classes=None, _refit=False, sample_weight=None):
        """Actual implementation of Gaussian NB fitting."""
        if _refit:
            self.classes_ = None

        first_call = _check_partial_fit_first_call(self, classes)
        X, y = self._validate_data(X, y, reset=first_call)
        
        # Outline of the remaining steps:
        # 1. Validate the data and check sample_weight.
        # 2. Compute the variance-smoothing term epsilon_. It avoids numerical
        #    errors when the ratio of data variance between dimensions is too
        #    small.
        # 3. If first_call:
        #        3.1 initialize the parameters
        #        3.2 initialize the priors; if priors were provided, check that
        #            - the number of priors matches the number of classes
        #            - the priors sum to 1
        #            - the priors are non-negative
        #            otherwise initialize the class prior to zeros
        #    else:
        #        3.1 if X.shape[1] != self.theta_.shape[1], the number of
        #            features does not match the previous data
        #        3.2 take epsilon_ out of var_ again (it is added back in step 5)
        # 4. Check the unique classes in y; a label that is not among the
        #    known classes does not exist for the classifier and is an error.
        # 5. For each unique class, compute new_theta and new_sigma, then
        #    update var_ with the epsilon_ term.
        # 6. If no priors were provided, update the class prior from the
        #    class counts.

        return self

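    def _joint_log_likelihood(self, X):
        # For each class, add the log prior to the summed per-feature
        # Gaussian log densities.
        joint_log_likelihood = []
        for i in range(np.size(self.classes_)):
            jointi = np.log(self.class_prior_[i])
            n_ij = -0.5 * np.sum(np.log(2.0 * np.pi * self.var_[i, :]))
            n_ij -= 0.5 * np.sum(((X - self.theta_[i, :]) ** 2) / (self.var_[i, :]), 1)
            joint_log_likelihood.append(jointi + n_ij)

        joint_log_likelihood = np.array(joint_log_likelihood).T
        return joint_log_likelihood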

    @deprecated(  # type: ignore
        "Attribute `sigma_` was deprecated in 1.0 and will be removed in "
        "1.2. Use `var_` instead."
    )
    @property
    def sigma_(self):
        return self.var_


_ALPHA_MIN = 1e-10

In [None]:
class _BaseDiscreteNB(_BaseNB):
    """Abstract base class for naive Bayes on discrete/categorical data
    Any estimator based on this class should provide:
    __init__
    _joint_log_likelihood(X) as per _BaseNB
    _update_feature_log_prob(alpha)
    _count(X, Y)
    """

    @abstractmethod
    def _count(self, X, Y):
        """Update counts that are used to calculate probabilities.
        The counts make up a sufficient statistic extracted from the data.
        Accordingly, this method is called each time `fit` or `partial_fit`
        update the model. `class_count_` and `feature_count_` must be updated
        here along with any model specific counts.
        """

    @abstractmethod
    def _update_feature_log_prob(self, alpha):
        """Update feature log probabilities based on counts.
        This method is called each time `fit` or `partial_fit` update the
        model.
        """

    def _check_X(self, X):
        """Validate X, used only in predict* methods."""
        return self._validate_data(X, accept_sparse="csr", reset=False)

    def _check_X_y(self, X, y, reset=True):
        """Validate X and y in fit methods."""
        return self._validate_data(X, y, accept_sparse="csr", reset=reset)

    def _update_class_log_prior(self, class_prior=None):
        """Update class log priors.
        The class log priors are based on `class_prior`, class count or the
        number of classes. This method is called each time `fit` or
        `partial_fit` update the model.
        """
        # 1. If class_prior is given and its length matches the number of
        #    classes: class_log_prior_ is np.log(class_prior).
        # 2. Elif self.fit_prior: class_log_prior_ is the empirical log prior
        #    computed from class_count_, which takes sample weights into account.
        # 3. Else: class_log_prior_ is a uniform prior over all classes.
        
    def _check_alpha(self):
        
        # 1. alpha, the smoothing parameter, should be > 0.
        # 2. alpha should be a scalar or a numpy array of shape [n_features].
        # 3. An alpha that is too small (compare _ALPHA_MIN) results in
        #    numeric errors.

        return self.alpha

    def partial_fit(self, X, y, classes=None, sample_weight=None):
        """Incremental fit on a batch of samples."""
        # 1. Check the input data X, y and classes.
        # 2. On the first call (_check_partial_fit_first_call), initialize the
        #    counters; on later calls, check that the dimensions match.
        # 3. Expand y into an indicator matrix Y with label_binarize():
        #        3.1 binary classifier: use both columns, 1 - Y and Y
        #        3.2 single class (degenerate case): keep only Y
        # 4. Check that the number of rows of Y matches X.
        # 5. Incorporate sample_weight into Y.
        # 6. Update the counts, then call _update_feature_log_prob() (alpha is
        #    applied here for smoothing) and _update_class_log_prior().

        return self

    def fit(self, X, y, sample_weight=None):
        """Fit Naive Bayes classifier according to X, y.
        """
        # 1. Check the data and get the parameters ready.
        # 2. Fit a LabelBinarizer() on y to obtain the indicator matrix Y.
        # 3. Handle the class dimensions for the binary and single-class cases.
        # 4. If sample_weight is present, incorporate it into Y.
        # 5. Count raw events from the data before updating the class log
        #    prior and the feature log probabilities.
        # 6. Call _update_feature_log_prob() (alpha is applied here for
        #    smoothing) and _update_class_log_prior().

        return self

    def _init_counters(self, n_classes, n_features):
        self.class_count_ = np.zeros(n_classes, dtype=np.float64)
        self.feature_count_ = np.zeros((n_classes, n_features), dtype=np.float64)

    def _more_tags(self):
        return {"poor_score": True}

    # TODO: Remove in 1.2
    # mypy error: Decorated property not supported
    @deprecated(  # type: ignore
        "Attribute `n_features_` was deprecated in version 1.0 and will be "
        "removed in 1.2. Use `n_features_in_` instead."
    )
    @property
    def n_features_(self):
        return self.

### Parameters
    alpha : float, default=1.0
        Additive (Laplace/Lidstone) smoothing parameter
        (0 for no smoothing).

    fit_prior : bool, default=True
        Whether to learn class prior probabilities or not.
        If false, a uniform prior will be used.

### Attributes
    class_log_prior_ : ndarray of shape (n_classes,)
        Smoothed empirical log probability for each class.

    feature_count_ : ndarray of shape (n_classes, n_features)
        Number of samples encountered for each (class, feature)
        during fitting. This value is weighted by the sample weight when
        provided.
    feature_log_prob_ : ndarray of shape (n_classes, n_features)
        Empirical log probability of features
        given a class, ``P(x_i|y)``.

In [None]:
class MultinomialNB(_BaseDiscreteNB):
    """
    Naive Bayes classifier for multinomial models.
    The multinomial Naive Bayes classifier is suitable for classification with
    discrete features (e.g., word counts for text classification). The
    multinomial distribution normally requires integer feature counts. However,
    in practice, fractional counts such as tf-idf may also work.
    """

    def __init__(self, *, alpha=1.0, fit_prior=True, class_prior=None):
        self.alpha = alpha
        self.fit_prior = fit_prior
        self.class_prior = class_prior

    def _more_tags(self):
        return {"requires_positive_X": True}

    def _count(self, X, Y):
        """Count and smooth feature occurrences."""
        check_non_negative(X, "MultinomialNB (input X)")
        self.feature_count_ += safe_sparse_dot(Y.T, X)
        self.class_count_ += Y.sum(axis=0)

    def _update_feature_log_prob(self, alpha):
        """Apply smoothing to raw counts and recompute log probabilities"""
        smoothed_fc = self.feature_count_ + alpha
        smoothed_cc = smoothed_fc.sum(axis=1)

        self.feature_log_prob_ = np.log(smoothed_fc) - np.log(smoothed_cc.reshape(-1, 1))

    def _joint_log_likelihood(self, X):
        """Calculate the posterior log probability of the samples X"""
        return safe_sparse_dot(X, self.feature_log_prob_.T) + self.class_log_prior_
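To make the smoothing and the dot product above concrete, here is a pure-Python sketch that mirrors the arithmetic of `_update_feature_log_prob` and `_joint_log_likelihood` on a tiny made-up count matrix. The counts, the uniform prior and the helper name `joint_log_likelihood` are assumptions for illustration only.

```python
import math

# Made-up feature counts summed per class: rows are classes, columns are features.
feature_count = [
    [3.0, 0.0, 1.0],  # class 0
    [1.0, 2.0, 2.0],  # class 1
]
class_log_prior = [math.log(0.5), math.log(0.5)]
alpha = 1.0

# Smooth the raw counts, then normalize within each class.
feature_log_prob = []
for row in feature_count:
    smoothed = [c + alpha for c in row]
    total = sum(smoothed)
    feature_log_prob.append([math.log(c) - math.log(total) for c in smoothed])


def joint_log_likelihood(x):
    """x . feature_log_prob^T + class_log_prior, for one count vector x."""
    return [
        sum(xi * w for xi, w in zip(x, weights)) + prior
        for weights, prior in zip(feature_log_prob, class_log_prior)
    ]


jll = joint_log_likelihood([0, 2, 0])  # a sample in which feature 1 occurs twice
predicted_class = jll.index(max(jll))
print(predicted_class)
```

Feature 1 never occurs in class 0, yet smoothing gives it probability 1/7 there instead of 0, so the log stays finite.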



In [None]:
class ComplementNB(_BaseDiscreteNB):
    """The Complement Naive Bayes classifier described in Rennie et al. (2003).
    The Complement Naive Bayes classifier was designed to correct the "severe
    assumptions" made by the standard Multinomial Naive Bayes classifier. It is
    particularly suited for imbalanced data sets.
    """

    def __init__(self, *, alpha=1.0, fit_prior=True, class_prior=None, norm=False):
        self.alpha = alpha
        self.fit_prior = fit_prior
        self.class_prior = class_prior
        self.norm = norm

    def _more_tags(self):
        return {"requires_positive_X": True}

    def _count(self, X, Y):
        """Count feature occurrences."""
        check_non_negative(X, "ComplementNB (input X)")
        self.feature_count_ += safe_sparse_dot(Y.T, X)
        self.class_count_ += Y.sum(axis=0)
        self.feature_all_ = self.feature_count_.sum(axis=0)

    def _update_feature_log_prob(self, alpha):
        """Apply smoothing to raw counts and compute the weights."""
        comp_count = self.feature_all_ + alpha - self.feature_count_
        logged = np.log(comp_count / comp_count.sum(axis=1, keepdims=True))
        # _BaseNB.predict uses argmax, but ComplementNB operates with argmin.
        if self.norm:
            summed = logged.sum(axis=1, keepdims=True)
            feature_log_prob = logged / summed
        else:
            feature_log_prob = -logged
        self.feature_log_prob_ = feature_log_prob

    def _joint_log_likelihood(self, X):
        """Calculate the class scores for the samples in X."""
        jll = safe_sparse_dot(X, self.feature_log_prob_.T)
        if len(self.classes_) == 1:
            jll += self.class_log_prior_
        return jll
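
To see what the complement counts look like, the following sketch reruns `_update_feature_log_prob` (the `norm=False` branch) on made-up counts in plain NumPy. The arrays are illustrative assumptions, not data from this notebook.

```python
import numpy as np

# Hypothetical toy counts: 2 classes x 3 features.
feature_count = np.array([[3.0, 0.0, 1.0],
                          [1.0, 2.0, 2.0]])
alpha = 1.0
feature_all = feature_count.sum(axis=0)

# For each class, the smoothed counts contributed by all *other* classes.
comp_count = feature_all + alpha - feature_count
logged = np.log(comp_count / comp_count.sum(axis=1, keepdims=True))

# norm=False branch: negate so that the usual argmax picks the class whose
# complement explains the sample worst.
weights = -logged

X = np.array([[2, 0, 1]])
scores = X @ weights.T
predicted = scores.argmax(axis=1)
print(comp_count)
print(predicted)
```

With only two classes, the complement of class 0 is simply the smoothed count row of class 1, which makes the result easy to check by eye.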


In [None]:
class BernoulliNB(_BaseDiscreteNB):
    """Naive Bayes classifier for multivariate Bernoulli models.
    Like MultinomialNB, this classifier is suitable for discrete data. The
    difference is that while MultinomialNB works with occurrence counts,
    BernoulliNB is designed for binary/boolean features.
    
    """

    def __init__(self, *, alpha=1.0, binarize=0.0, fit_prior=True, class_prior=None):
        self.alpha = alpha
        self.binarize = binarize
        self.fit_prior = fit_prior
        self.class_prior = class_prior

    def _check_X(self, X):
        """Validate X, used only in predict* methods."""
        X = super()._check_X(X)
        if self.binarize is not None:
            X = binarize(X, threshold=self.binarize)
        return X

    def _check_X_y(self, X, y, reset=True):
        X, y = super()._check_X_y(X, y, reset=reset)
        if self.binarize is not None:
            X = binarize(X, threshold=self.binarize)
        return X, y

    def _count(self, X, Y):
        """Count and smooth feature occurrences."""
        self.feature_count_ += safe_sparse_dot(Y.T, X)
        self.class_count_ += Y.sum(axis=0)

    def _update_feature_log_prob(self, alpha):
        """Apply smoothing to raw counts and recompute log probabilities"""
        smoothed_fc = self.feature_count_ + alpha
        smoothed_cc = self.class_count_ + alpha * 2

        self.feature_log_prob_ = np.log(smoothed_fc) - np.log(smoothed_cc.reshape(-1, 1))

    def _joint_log_likelihood(self, X):
        """Calculate the posterior log probability of the samples X"""
        n_features = self.feature_log_prob_.shape[1]
        n_features_X = X.shape[1]

        if n_features_X != n_features:
            raise ValueError(
                "Expected input with %d features, got %d instead"
                % (n_features, n_features_X)
            )

        neg_prob = np.log(1 - np.exp(self.feature_log_prob_))
        # Compute  neg_prob · (1 - X).T  as  ∑neg_prob - X · neg_prob
        jll = safe_sparse_dot(X, (self.feature_log_prob_ - neg_prob).T)
        jll += self.class_log_prior_ + neg_prob.sum(axis=1)

        return jll
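
The comment in `_joint_log_likelihood` relies on an algebraic shortcut: the contribution of the absent features, `(1 - X) · neg_prob`, is rewritten as `∑neg_prob - X · neg_prob`. The sketch below checks this identity numerically on made-up binary data. It is only an illustration, and it leaves out the class prior because the prior is added identically in both forms.

```python
import numpy as np

# Hypothetical toy data: per-class number of samples containing each feature.
feature_count = np.array([[3.0, 0.0, 1.0],
                          [1.0, 2.0, 2.0]])
class_count = np.array([4.0, 3.0])
alpha = 1.0

# Same smoothing as BernoulliNB._update_feature_log_prob.
smoothed_fc = feature_count + alpha
smoothed_cc = class_count + alpha * 2
feature_log_prob = np.log(smoothed_fc) - np.log(smoothed_cc.reshape(-1, 1))
neg_prob = np.log(1 - np.exp(feature_log_prob))  # log P(feature absent | class)

X = np.array([[1, 0, 1],
              [0, 1, 0]])

# Direct form: present features use log p, absent features use log (1 - p).
direct = X @ feature_log_prob.T + (1 - X) @ neg_prob.T
# Shortcut form used in the class above: a single product with X.
shortcut = X @ (feature_log_prob - neg_prob).T + neg_prob.sum(axis=1)

print(np.allclose(direct, shortcut))
```

The shortcut matters for sparse input, since `1 - X` would be a dense matrix while `X` itself stays sparse.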


In [None]:
class CategoricalNB(_BaseDiscreteNB):
    """Naive Bayes classifier for categorical features.
    The categorical Naive Bayes classifier is suitable for classification with
    discrete features that are categorically distributed. The categories of
    each feature are drawn from a categorical distribution.
    
    """

    def __init__(
        self, *, alpha=1.0, fit_prior=True, class_prior=None, min_categories=None
    ):
        self.alpha = alpha
        self.fit_prior = fit_prior
        self.class_prior = class_prior
        self.min_categories = min_categories

    def fit(self, X, y, sample_weight=None):
        """Fit Naive Bayes classifier according to X, y.
        """
        return super().fit(X, y, sample_weight=sample_weight)

    def partial_fit(self, X, y, classes=None, sample_weight=None):
        """Incremental fit on a batch of samples.
        """
        return super().partial_fit(X, y, classes, sample_weight=sample_weight)

    def _more_tags(self):
        return {"requires_positive_X": True}

    def _check_X(self, X):
        """Validate X, used only in predict* methods."""
        X = self._validate_data(
            X, dtype="int", accept_sparse=False, force_all_finite=True, reset=False
        )
        check_non_negative(X, "CategoricalNB (input X)")
        return X

    def _check_X_y(self, X, y, reset=True):
        X, y = self._validate_data(
            X, y, dtype="int", accept_sparse=False, force_all_finite=True, reset=reset
        )
        check_non_negative(X, "CategoricalNB (input X)")
        return X, y

    def _init_counters(self, n_classes, n_features):
        self.class_count_ = np.zeros(n_classes, dtype=np.float64)
        self.category_count_ = [np.zeros((n_classes, 0)) for _ in range(n_features)]

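    @staticmethod
    def _validate_n_categories(X, min_categories):
        # 1. Categories are encoded as 0..n-1, so rely on the per-feature max.
        n_categories_X = X.max(axis=0) + 1
        if min_categories is None:
            return n_categories_X
        # 2. Error checks for min_categories: integral type, then shape.
        min_categories_ = np.array(min_categories)
        if not np.issubdtype(min_categories_.dtype, np.signedinteger):
            raise ValueError("'min_categories' should have integral type.")
        n_categories_ = np.maximum(n_categories_X, min_categories_, dtype=np.int64)
        if n_categories_.shape != n_categories_X.shape:
            raise ValueError("'min_categories' should have shape (n_features,).")
        # 3. Return the larger of the observed and the requested category counts.
        return n_categories_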

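    def _count(self, X, Y):
        def _update_cat_count_dims(cat_count, highest_feature):
            # Append a column full of zeros for each category not yet tracked.
            diff = highest_feature + 1 - cat_count.shape[1]
            if diff > 0:
                return np.pad(cat_count, [(0, 0), (0, diff)], "constant")
            return cat_count

        def _update_cat_count(X_feature, Y, cat_count, n_classes):
            for j in range(n_classes):
                # Column j of Y selects (and optionally weights) the samples of class j.
                mask = Y[:, j].astype(bool)
                weights = None if Y.dtype.type == np.int64 else Y[mask, j]
                counts = np.bincount(X_feature[mask], weights=weights)
                indices = np.nonzero(counts)[0]
                cat_count[j, indices] += counts[indices]

        # Update class counts, validate n_categories_, then update each feature.
        self.class_count_ += Y.sum(axis=0)
        self.n_categories_ = self._validate_n_categories(X, self.min_categories)
        for i in range(self.n_features_in_):
            self.category_count_[i] = _update_cat_count_dims(
                self.category_count_[i], self.n_categories_[i] - 1
            )
            _update_cat_count(
                X[:, i], Y, self.category_count_[i], self.class_count_.shape[0]
            )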

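    def _update_feature_log_prob(self, alpha):
        feature_log_prob = []
        # 1. For each feature, smooth category_count_ with alpha and sum per class.
        for i in range(self.n_features_in_):
            smoothed_cat_count = self.category_count_[i] + alpha
            smoothed_class_count = smoothed_cat_count.sum(axis=1)
            # Store log(smoothed_cat_count) - log(smoothed_class_count).
            feature_log_prob.append(
                np.log(smoothed_cat_count) - np.log(smoothed_class_count.reshape(-1, 1))
            )
        # 2. Update the fitted attribute.
        self.feature_log_prob_ = feature_log_prob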

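    def _joint_log_likelihood(self, X):
        # 1. Feature check: X needs one column per fitted feature.
        n_features = len(self.feature_log_prob_)
        if X.shape[1] != n_features:
            raise ValueError(f"Expected input with {n_features} features, got {X.shape[1]} instead")
        # 2. For each feature, add the log probability of the observed category.
        jll = np.zeros((X.shape[0], self.class_count_.shape[0]))
        for i in range(n_features):
            jll += self.feature_log_prob_[i][:, X[:, i]].T
        # 3. Add the class log prior to obtain the total log likelihood.
        return jll + self.class_log_prior_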
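
The CategoricalNB steps (count categories per class, smooth, then look up the log probability of each observed category) can be sketched end to end in plain NumPy. The data below is made up for illustration, the class labels are assumed to be encoded as 0 and 1, and the loop is a simplified stand-in for the class above rather than a copy of it.

```python
import numpy as np

# Hypothetical toy data: 4 samples, 2 categorical features encoded as 0..n-1.
X = np.array([[0, 1],
              [1, 0],
              [2, 1],
              [0, 0]])
y = np.array([0, 0, 1, 1])
alpha = 1.0

classes = np.unique(y)
n_categories = X.max(axis=0) + 1  # categories per feature, here [3, 2]

# One (n_classes, n_categories) table of smoothed log probabilities per feature.
feature_log_prob = []
for i in range(X.shape[1]):
    cat_count = np.zeros((len(classes), n_categories[i]))
    for j, c in enumerate(classes):
        cat_count[j] = np.bincount(X[y == c, i], minlength=n_categories[i])
    smoothed = cat_count + alpha
    feature_log_prob.append(
        np.log(smoothed) - np.log(smoothed.sum(axis=1, keepdims=True))
    )

class_log_prior = np.log(np.bincount(y) / len(y))

# Score a new sample: sum the log probabilities of its observed categories.
X_new = np.array([[2, 1]])
jll = np.zeros((X_new.shape[0], len(classes)))
for i in range(X_new.shape[1]):
    jll += feature_log_prob[i][:, X_new[:, i]].T
jll += class_log_prior
predicted = classes[jll.argmax(axis=1)]
print(predicted)
```

Category 2 of the first feature appears only in class 1, so the sample `[2, 1]` is assigned to class 1. Smoothing keeps the class 0 score finite even though it never saw that category.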