## Model definition:
$$\eqalign{
  & \bar y = \arg \max_y p\left( {Y = y \mid X = x} \right) = \arg \max_y \frac{{p\left( {Y = y} \right)p(X = x \mid Y = y)}}{{p(X = x)}}  \cr 
  &  = \arg \max_y p\left( {Y = y} \right)\prod\limits_{i = 1}^d {p({x_i} \mid Y = y)}  = \arg \max_y \ln \left( {p\left( {Y = y} \right)\prod\limits_{i = 1}^d {p({x_i} \mid Y = y)} } \right)  \cr 
  &  = \arg \max_y \left[ {\ln p\left( {Y = y} \right) + \sum\limits_{i = 1}^d {\ln p({x_i} \mid Y = y)} } \right] \cr} $$

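${\bar y}$ is the predicted label.

The first equality: ${\bar y}$ is the label that has the highest probability given the feature vector $x$.

The second equality: it is the result of Bayes' theorem.

The third equality: the denominator $p(X = x)$ does not depend on $y$, so it can be dropped from the maximization, and the features $x_i$ are assumed to be conditionally independent of each other given the label, so the likelihood factorizes into a product. In practice this assumption rarely holds, which is why the model is called naive.

The remaining equalities: the logarithm is monotonically increasing, so taking it does not change the maximizing label, and it turns the product into a sum.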
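One practical reason to work with the sum of logarithms rather than the raw product is numerical: a product of many small probabilities can underflow in floating point. The snippet below is a minimal sketch of that effect. The value `0.01` and the feature count `784` are illustrative assumptions, not values taken from a fitted model.

```python
import numpy as np

# Hypothetical per-feature likelihoods for one sample:
# 784 features, each with likelihood 0.01 (illustrative values only).
likelihoods = np.full(784, 0.01)

# Multiplying them directly underflows to 0.0 in float64.
direct_product = np.prod(likelihoods)

# Summing the logarithms keeps the score finite and comparable across labels.
log_sum = np.sum(np.log(likelihoods))

print(direct_product)
print(log_sum)
```

If every label's product underflows to zero, the labels can no longer be ranked, whereas the log scores remain distinct.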


## How to fit it:

We estimate the prior probability of each label and, for each label, the mean and the variance of every feature. Let $N_y$ be the number of observations with label $y$ and $x_{j,i}$ the value of feature $i$ in the $j$-th of those observations.
$${\mu _{y,i}} = \frac{{\sum\limits_{j = 1}^{{N_y}} {{x_{j,i}}} }}{{{N_y}}}$$

$${\sigma ^2}_{y,i} = \frac{{\sum\limits_{j = 1}^{{N_y}} {{{({x_{j,i}} - {\mu _{y,i}})}^2}} }}{{{N_y}}}$$
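As a small sketch of these estimates, the snippet below computes the prior, the per-feature mean, and the per-feature variance (squared deviations averaged over the number of observations of the label) on a tiny made-up dataset. The arrays `X` and `y` and the dictionary name `stats` are assumptions for illustration only.

```python
import numpy as np

# Toy data (made up): 4 observations, 2 features, 2 labels.
X = np.array([[1.0, 2.0],
              [3.0, 6.0],
              [10.0, 0.0],
              [14.0, 4.0]])
y = np.array([0, 0, 1, 1])

stats = {}
for label in np.unique(y):
    X_label = X[y == label]                    # observations with this label
    prior = len(X_label) / len(X)              # fraction of observations with this label
    mu = X_label.mean(axis=0)                  # per-feature mean
    var = ((X_label - mu) ** 2).mean(axis=0)   # per-feature variance (divide by the count)
    stats[int(label)] = (prior, mu, var)

for label, (prior, mu, var) in stats.items():
    print(label, prior, mu, var)
```

The variance computed this way agrees with NumPy's `var` (and the square of `std`) with the default `ddof=0`.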



## How to use it:
1. Separate the samples into one matrix per label.
2. Calculate the prior probability and the per-feature means and variances of each label, and store them in a dictionary keyed by label.
3. For each label, evaluate the class-conditional density of every feature with the formula below, where $x_i$ is the $i$-th element of the input:
$$ p\left( {{x_i}|y} \right) = \frac{1}{{\sqrt {2\pi } {\sigma _{y,i}}}}{e^{ - \frac{{{{\left( {{x_i} - {\mu _{y,i}}} \right)}^2}}}{{2{\sigma _{y,i}}^2}}}}$$
4. Calculate the score of each label given the input by adding the log-prior to the sum of the log-densities from step 3.
5. Return the label with the highest score from step 4 as the prediction.
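Steps 3 to 5 can be sketched on toy numbers as follows. The helper names `gaussian_log_pdf` and `predict`, and the values in `summaries`, are hypothetical and chosen only for illustration. The sketch assumes every standard deviation is strictly positive.

```python
import numpy as np

def gaussian_log_pdf(x, mu, sigma):
    """Logarithm of the Gaussian density from step 3 (requires sigma > 0)."""
    return -0.5 * np.log(2 * np.pi) - np.log(sigma) - (x - mu) ** 2 / (2 * sigma ** 2)

# Hypothetical summaries: label -> (prior, means, standard deviations).
summaries = {
    0: (0.5, np.array([2.0, 4.0]), np.array([1.0, 2.0])),
    1: (0.5, np.array([12.0, 2.0]), np.array([2.0, 2.0])),
}

def predict(x, summaries):
    # Step 4: log-prior plus the sum of per-feature log-densities for each label.
    scores = {
        label: np.log(prior) + gaussian_log_pdf(x, mu, sigma).sum()
        for label, (prior, mu, sigma) in summaries.items()
    }
    # Step 5: the label with the highest score is the prediction.
    return max(scores, key=scores.get), scores

label, scores = predict(np.array([3.0, 5.0]), summaries)
print(label, scores)
```

The input `[3.0, 5.0]` lies close to the means of label `0`, so that label receives the higher score.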

In [8]:
import numpy as np


class NaiveBayer:
    """Gaussian naive Bayes classifier."""

    def fit(self, X, y):
        self.X = X
        self.y = y
        self.labels = np.unique(y)
        # label -> (prior, per-feature means, per-feature standard deviations)
        self.summaries = {}
        for label in self.labels:
            X_label = X[y == label]
            means = X_label.mean(axis=0)
            std = X_label.std(axis=0)
            prior = len(X_label) / len(X)
            self.summaries[label] = (prior, means, std)

    def predict(self, x):
        x = np.asarray(x, dtype=float)
        prediction, best_score = None, float('-inf')
        for label in self.labels:
            prior, mean, std = self.summaries[label]
            # Skip zero-variance features: the Gaussian density is undefined
            # for them and the formula would divide by zero.
            valid = std > 0
            # Log-prior plus the Gaussian log-densities of the remaining
            # features; the -0.5*ln(2*pi) term of each log-density is omitted.
            score = np.log(prior) + np.sum(
                -(x[valid] - mean[valid]) ** 2 / (2 * std[valid] ** 2)
                - np.log(std[valid])
            )
            if score > best_score:
                best_score = score
                prediction = label
        return prediction

In [10]:
import pandas as pd
import numpy as np
df_train = pd.read_csv("../data/digit-recognizer/data/train.csv")
df_test = pd.read_csv("../data/digit-recognizer/data/test.csv")

train = df_train.to_numpy()
X = train[:,1:]
y = train[:,0]
classifier = NaiveBayer()
classifier.fit(X, y)

test = df_test.to_numpy()
# Write one "ImageId,Label" row per test image; ImageId starts at 1.
with open("../data/digit-recognizer/outputs/submission_nb.csv", "w") as f:
    f.write("ImageId,Label\n")
    for i in range(len(test)):
        label = classifier.predict(test[i])
        f.write(f"{i + 1},{label}\n")