# $$ \text{Bayes' Theorem} $$



### $$P(y|x) = \frac{P(x|y)P(y)}{P(x)}$$

#### $$\text{where:}$$

\begin{align*}
P(y) & : \text{prior probability of event y} \\
P(y|x) & : \text{probability of event y given event x} \\
P(x|y) & : \text{probability of event x given event y} \\
P(x) & : \text{prior probability of event x}
\end{align*}
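As a quick numerical illustration of the theorem, the sketch below plugs in made-up probabilities. The values of `p_y`, `p_x_given_y`, and `p_x_given_not_y` are hypothetical and chosen only to show the arithmetic; the evidence $P(x)$ is obtained from the law of total probability.

```python
# Hypothetical numbers, chosen only to illustrate the formula.
p_y = 0.3              # P(y): prior probability of event y
p_x_given_y = 0.8      # P(x|y): probability of x when y holds
p_x_given_not_y = 0.1  # P(x|not y): probability of x when y does not hold

# Law of total probability gives the evidence P(x).
p_x = p_x_given_y * p_y + p_x_given_not_y * (1 - p_y)

# Bayes' theorem: P(y|x) = P(x|y) P(y) / P(x)
p_y_given_x = p_x_given_y * p_y / p_x

print(p_x)
print(p_y_given_x)
```

Observing `x` raises the probability of `y` well above its prior of 0.3, because `x` is much more likely when `y` holds.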


##### $$\text{Assume that the features are conditionally independent given the class (naive assumption),}$$

##### $$\text{so in our case we get}$$

#### $$P(y|\mathbf{X}) = \frac{P(y)\prod_{i=1}^{n} P(x_i|y)}{P(\mathbf{X})}$$

##### $$\text{where:}$$

\begin{align*}
\mathbf{X} & : \text{vector of features } (x_1, x_2, \ldots, x_n) \\
x_i & : \text{individual feature} \\
n & : \text{number of features} \\
y & : \text{event that the person will live for the next 10 years} \\
P(y) & : \text{prior probability of event } y \\
P(y|\mathbf{X}) & : \text{probability of event } y \text{ given features } \mathbf{X} \text{ (posterior)} \\
P(\mathbf{X}|y) & : \text{probability of features } \mathbf{X} \text{ given event } y \text{ (likelihood)} \\
P(\mathbf{X}) & : \text{prior probability of features } \mathbf{X}
\end{align*}
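To make the factorized posterior concrete, here is a minimal sketch using only the standard library. The per-feature likelihoods and priors are invented for illustration, and the class labels `"y=1"` and `"y=0"` are hypothetical names, not taken from the dataset.

```python
import math

# Hypothetical per-feature likelihoods P(x_i | y) for two classes.
likelihoods = {
    "y=1": [0.9, 0.6, 0.7],
    "y=0": [0.2, 0.5, 0.4],
}
priors = {"y=1": 0.4, "y=0": 0.6}

# Numerator of the naive Bayes posterior: P(y) * prod_i P(x_i | y)
numerators = {c: priors[c] * math.prod(likelihoods[c]) for c in priors}

# The evidence P(X) is the sum of the numerators over all classes.
evidence = sum(numerators.values())
posteriors = {c: numerators[c] / evidence for c in numerators}

print(posteriors)
```

Dividing by the evidence only rescales the numerators so that the posteriors sum to 1.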


##### $$\text{Then we select the class with the highest posterior probability:}$$

### $$y = \arg\max_y P(y|\mathbf{X}) = \arg\max_y \frac{P(y)\prod_{i=1}^{n} P(x_i|y)}{P(\mathbf{X})}$$

##### $$ \text{Since } P(\mathbf{X}) \text{ does not depend on } y \text{, it is the same for every class and can be dropped:} $$

### $$\arg\max_y P(y|\mathbf{X}) = \arg\max_y \left( P(y)\prod_{i=1}^{n} P(x_i|y) \right)$$

##### $$\text{As the probabilities lie between 0 and 1,}$$
##### $$\text{multiplying many of them results in a very small number that can underflow in floating point.}$$
##### $$\text{Since the logarithm is monotonic, we can maximize the log instead, which turns the product into a sum:}$$

## $$y = \arg\max_y \left( \log P(y) + \sum_{i=1}^{n} \log P(x_i|y) \right)$$
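The effect of working in log space can be seen in a small sketch. The list `probs` is a hypothetical set of 1000 per-feature likelihoods, all set to 0.01 for simplicity.

```python
import math

# 1000 hypothetical per-feature likelihoods, each equal to 0.01.
probs = [0.01] * 1000

# The true product is 1e-2000, far below the smallest positive float64,
# so the direct product underflows to exactly 0.0.
product = math.prod(probs)

# The sum of logarithms stays finite and can still be compared across classes.
log_sum = sum(math.log(p) for p in probs)

print(product)
print(log_sum)
```

Once the product has collapsed to zero for every class, the arg max is no longer meaningful, whereas the log sums remain distinguishable.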







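##### $$\text{Finally, we need to calculate the following:}$$

##### $$P(y) \text{: prior probability} \rightarrow \text{frequency of each class}$$

##### $$P(x_i|y) \text{: class-conditional probability} \rightarrow \text{modeled with a Gaussian}$$

## $$P(x_{i}\mid y) = \frac{1}{\sqrt{2\pi \sigma_{y,i}^{2}}} \exp \left(-\frac{(x_{i} -\mu_{y,i})^2}{2\sigma_{y,i}^{2}} \right)$$

##### $$\text{where } \mu_{y,i} \text{ and } \sigma_{y,i}^{2} \text{ are the mean and variance of feature } i \text{ within class } y$$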
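The Gaussian density can be sketched for a single feature as follows. The function names `gaussian_pdf` and `gaussian_log_pdf` are hypothetical helpers for illustration; the second one evaluates the logarithm of the density directly, which is one possible way to avoid taking the log of a value that has underflowed.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def gaussian_log_pdf(x, mean, var):
    """Logarithm of the same density, computed without calling exp."""
    return -0.5 * math.log(2 * math.pi * var) - ((x - mean) ** 2) / (2 * var)

print(gaussian_pdf(0.0, 0.0, 1.0))
print(gaussian_log_pdf(1.5, 1.0, 2.0))
```

Note that a density is not a probability: for a small variance, `gaussian_pdf` can return values greater than 1, and the arg max comparison still works.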

# Steps

### Training
- Calculate the mean and variance of each feature, and the prior (class frequency), for each class
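The training step can be sketched on a tiny dataset with the standard library alone. The data in `X` and `y` and the `params` dictionary layout are hypothetical; `pvariance` is the population variance, which matches the default behaviour of `np.var`.

```python
from statistics import fmean, pvariance

# Tiny hypothetical dataset: two features, binary labels.
X = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [10.0, 0.0], [12.0, 2.0]]
y = [0, 0, 0, 1, 1]

params = {}
for c in sorted(set(y)):
    # Rows that belong to class c
    rows = [x for x, label in zip(X, y) if label == c]
    # Transpose so that each tuple holds one feature's values
    columns = list(zip(*rows))
    params[c] = {
        "mean": [fmean(col) for col in columns],
        "var": [pvariance(col) for col in columns],
        "prior": len(rows) / len(X),
    }

print(params)
```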

### Predictions
- Calculate posterior for each class
- Choose class with highest posterior probability 
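The prediction step can be sketched in the same spirit. The `params` values below are hypothetical stand-ins for what training would produce, and `log_posterior` and `predict` are illustrative helper names. The score is the unnormalized log posterior, $\log P(y) + \sum_i \log P(x_i|y)$.

```python
import math

# Hypothetical per-class parameters, as a training step might produce them.
params = {
    0: {"mean": [2.0, 4.0], "var": [1.0, 2.0], "prior": 0.6},
    1: {"mean": [11.0, 1.0], "var": [1.0, 1.0], "prior": 0.4},
}

def log_posterior(x, p):
    """Unnormalized log posterior: log P(y) + sum_i log P(x_i | y)."""
    total = math.log(p["prior"])
    for xi, mean, var in zip(x, p["mean"], p["var"]):
        # Log of the Gaussian density of feature value xi
        total += -0.5 * math.log(2 * math.pi * var) - (xi - mean) ** 2 / (2 * var)
    return total

def predict(x):
    """Return the class with the highest log posterior."""
    scores = {c: log_posterior(x, p) for c, p in params.items()}
    return max(scores, key=scores.get)

print(predict([2.5, 3.0]))
print(predict([10.0, 1.5]))
```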

In [None]:
import numpy as np


class NaiveBayes:
    """Gaussian naive Bayes classifier."""

    def fit(self, X, y):
        n_samples, n_features = X.shape
        self._classes = np.unique(y)  # e.g. 0 or 1
        n_classes = len(self._classes)

        # Per-class feature means, feature variances, and class priors
        self._mean = np.zeros((n_classes, n_features), dtype=np.float64)
        self._var = np.zeros((n_classes, n_features), dtype=np.float64)
        self._priors = np.zeros(n_classes, dtype=np.float64)
        for i, c in enumerate(self._classes):
            X_c = X[y == c]  # samples that belong to class c
            self._mean[i, :] = X_c.mean(axis=0)  # mean of each feature in class c
            self._var[i, :] = X_c.var(axis=0)  # variance of each feature in class c
            self._priors[i] = X_c.shape[0] / float(n_samples)  # frequency of class c

    def predict(self, X):
        Z = np.array(X)
        y_pred = [self._predict(x) for x in Z]
        return np.array(y_pred)

    def _predict(self, x):
        posteriors = []
        for i, c in enumerate(self._classes):  # log posterior for each class
            prior = np.log(self._priors[i])
            # Sum of log-likelihoods under the per-feature Gaussian model
            posterior = np.sum(np.log(self._pdf(i, x)))
            posterior += prior
            posteriors.append(posterior)
        return self._classes[np.argmax(posteriors)]

    def _pdf(self, i, x):
        # Gaussian density of every feature of x under class i
        mean = self._mean[i]
        var = self._var[i]

        numerator = np.exp(-((x - mean) ** 2) / (2 * var))
        denominator = np.sqrt(2 * np.pi * var)
        return numerator / denominator

In [None]:
def accuracy(y_true, y_pred):
    # Fraction of predictions that match the true labels
    return np.sum(y_true == y_pred) / len(y_true)

In [None]:
NB = NaiveBayes()
NB.fit(X_train, y_train)
y_pred = NB.predict(X_test)

accuracy(y_test, y_pred)

0.7613741875580315