# Naive Bayes
``` text 
Naive Bayes classifiers are a collection of classification algorithms based on Bayes’ Theorem.
It is not a single algorithm but a family of algorithms that all share a common principle,
i.e. every pair of features being classified is independent of each other.
```

Consider a dataset that describes the weather conditions for playing a game of golf. Given the weather conditions,
each tuple classifies the conditions as fit (“Yes”) or unfit (“No”) for playing golf.

``` text

Outlook     Temperature   Humidity   Windy   Play Golf
-------------------------------------------------------
Rainy       Hot           High       False   No
Rainy       Hot           High       True    No
Overcast    Hot           High       False   Yes
Sunny       Mild          High       False   Yes
Sunny       Cool          Normal     False   Yes
Sunny       Cool          Normal     True    No
Overcast    Cool          Normal     True    Yes
Rainy       Mild          High       False   No
Rainy       Cool          Normal     False   Yes
Sunny       Mild          Normal     False   Yes
Rainy       Mild          Normal     True    Yes
Overcast    Mild          High       True    Yes
Overcast    Hot           Normal     False   Yes
Sunny       Mild          High       True    No


The fundamental Naive Bayes assumption is that each feature makes an independent and equal contribution to the outcome.

With relation to our dataset, this concept can be understood as:
* We assume that no pair of features is dependent. For example, the temperature being ‘Hot’ has nothing to do with the humidity,
  and the outlook being ‘Rainy’ has no effect on the wind. Hence, the features are assumed to be independent.
* Secondly, each feature is given the same weight (or importance). For example, knowing only the temperature
  and humidity can’t predict the outcome accurately. No attribute is irrelevant, and each is assumed to contribute equally to the outcome.
```
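To experiment with the table above, it can be loaded into a pandas DataFrame. The snippet below is a minimal sketch: the variable name `golf_df`, the column names, and the `f`/`t` and `yes`/`no` encodings are assumptions chosen to resemble the DataFrame printed later in this notebook, not something the table itself prescribes.

```python
import pandas as pd

# The 14 rows of the golf table, in the same order as above.
# Column names and value encodings are assumptions for illustration.
rows = [
    ("Rainy", "Hot", "High", "f", "no"),
    ("Rainy", "Hot", "High", "t", "no"),
    ("Overcast", "Hot", "High", "f", "yes"),
    ("Sunny", "Mild", "High", "f", "yes"),
    ("Sunny", "Cool", "Normal", "f", "yes"),
    ("Sunny", "Cool", "Normal", "t", "no"),
    ("Overcast", "Cool", "Normal", "t", "yes"),
    ("Rainy", "Mild", "High", "f", "no"),
    ("Rainy", "Cool", "Normal", "f", "yes"),
    ("Sunny", "Mild", "Normal", "f", "yes"),
    ("Rainy", "Mild", "Normal", "t", "yes"),
    ("Overcast", "Mild", "High", "t", "yes"),
    ("Overcast", "Hot", "Normal", "f", "yes"),
    ("Sunny", "Mild", "High", "t", "no"),
]
golf_df = pd.DataFrame(rows, columns=["Outlook", "Temp", "Humidity", "Windy", "play"])

print(golf_df.shape)                             # (14, 5)
print(golf_df["play"].value_counts().to_dict())  # {'yes': 9, 'no': 5}
```

The class counts (9 "yes", 5 "no") are the raw numbers behind the prior probabilities used in the rest of the notebook.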

# Bayes’ Theorem  
```text 
Bayes’ Theorem finds the probability of an event occurring given the probability of another event that has already occurred.
Bayes’ theorem is stated mathematically as the following equation:

P(A|B) = \frac{P(B|A) P(A)}{P(B)}, where A and B are events and P(B) ≠ 0

* Basically, we are trying to find the probability of event A, given that event B is true. Event B is also termed the evidence.
* P(A) is the prior of A (the prior probability, i.e. the probability of the event before the evidence is seen).
  The evidence is an attribute value of an unknown instance (here, it is event B).
* P(B|A) is the likelihood, i.e. the probability of the evidence given that A is true.
* P(A|B) is the posterior probability of A, i.e. the probability of the event after the evidence is seen.

For our dataset, we can apply Bayes’ theorem in the following way:

P(y|X) = \frac{P(X|y) P(y)}{P(X)}

where y is the class variable and X = (x_1, \ldots, x_n) is the feature vector.

Now it is time to add the naive assumption to Bayes’ theorem, which is independence among the features. So we split the evidence into its independent parts.

If any two events A and B are independent, then

P(A,B) = P(A)P(B)

Hence, we reach the result:

P(y|x_1,\ldots,x_n) = \frac{P(x_1|y)P(x_2|y) \cdots P(x_n|y)P(y)}{P(x_1)P(x_2) \cdots P(x_n)}

which can be expressed as:

P(y|x_1,\ldots,x_n) = \frac{P(y)\prod_{i=1}^{n}P(x_i|y)}{P(x_1)P(x_2) \cdots P(x_n)}

Now, as the denominator remains constant for a given input, we can remove that term:

P(y|x_1,\ldots,x_n) \propto P(y)\prod_{i=1}^{n}P(x_i|y)

Now we need to create a classifier model. For this, we find the probability of the given set of inputs for all possible values of the class variable y and pick the output with the maximum probability. This can be expressed mathematically as:

y = \arg\max_{y} P(y)\prod_{i=1}^{n}P(x_i|y)


We need to find P(x_i | y_j) for each x_i in X and y_j in y. All these calculations are demonstrated in the tables below:
```
![image.png](attachment:image.png)
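As a quick numeric check of Bayes’ theorem on a single feature, the sketch below plugs in counts read off the golf table: 9 of the 14 days are "Yes" days, 4 of the 14 days are Overcast, and 4 of the 9 "Yes" days are Overcast. The variable names are illustrative only.

```python
from fractions import Fraction

# Counts taken from the 14-row golf table
p_yes = Fraction(9, 14)                # P(A): prior of Play = Yes
p_overcast = Fraction(4, 14)           # P(B): evidence, Outlook = Overcast
p_overcast_given_yes = Fraction(4, 9)  # P(B|A): likelihood

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_yes_given_overcast = p_overcast_given_yes * p_yes / p_overcast

print(p_yes_given_overcast)  # 1
```

The result agrees with counting directly: every Overcast row in the table is a "Yes" row, so P(Yes | Overcast) = 4/4 = 1.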

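Let today = (Sunny, Hot, Normal, False) be the conditions to classify.

# Probability of playing golf is given by:
$$ P(\text{Yes} \mid \text{today}) = \frac{P(\text{Sunny Outlook} \mid \text{Yes}) P(\text{Hot Temperature} \mid \text{Yes}) P(\text{Normal Humidity} \mid \text{Yes}) P(\text{No Wind} \mid \text{Yes}) P(\text{Yes})}{P(\text{today})} $$


# Probability of not playing golf is given by:
$$ P(\text{No} \mid \text{today}) = \frac{P(\text{Sunny Outlook} \mid \text{No}) P(\text{Hot Temperature} \mid \text{No}) P(\text{Normal Humidity} \mid \text{No}) P(\text{No Wind} \mid \text{No}) P(\text{No})}{P(\text{today})} $$

* Since P(today) is common to both probabilities, we can ignore it and find proportional probabilities. From the dataset, 3 of the 9 “Yes” rows and 2 of the 5 “No” rows are Sunny, so:

$$ P(\text{Yes} \mid \text{today}) \propto \frac{3}{9} \cdot \frac{2}{9} \cdot \frac{6}{9} \cdot \frac{6}{9} \cdot \frac{9}{14} \approx 0.0212 $$

$$ P(\text{No} \mid \text{today}) \propto \frac{2}{5} \cdot \frac{2}{5} \cdot \frac{1}{5} \cdot \frac{2}{5} \cdot \frac{5}{14} \approx 0.0046 $$

* Since

$$ P(\text{Yes} \mid \text{today}) + P(\text{No} \mid \text{today}) = 1 $$

these numbers can be converted into probabilities by making the sum equal to 1 (normalization):

$$ P(\text{Yes} \mid \text{today}) = \frac{0.0212}{0.0212 + 0.0046} \approx 0.82 $$

$$ P(\text{No} \mid \text{today}) = \frac{0.0046}{0.0212 + 0.0046} \approx 0.18 $$

Since

$$ P(\text{Yes} \mid \text{today}) > P(\text{No} \mid \text{today}) $$

### the prediction is that golf would be played: ‘Yes’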
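The normalization and comparison steps can be sketched as two small helpers. The function names `normalize_scores` and `predict_label` are hypothetical, and the scores passed in at the end are made-up illustrative values, not numbers from the golf dataset.

```python
def normalize_scores(scores):
    """Rescale non-negative class scores so that they sum to 1."""
    total = sum(scores.values())
    if total == 0:
        raise ValueError("all class scores are zero; cannot normalize")
    return {label: value / total for label, value in scores.items()}


def predict_label(scores):
    """Return the label with the largest score (the argmax)."""
    return max(scores, key=scores.get)


# Illustrative, made-up proportional scores (NOT taken from the dataset)
example_scores = {"Yes": 0.03, "No": 0.01}

posteriors = normalize_scores(example_scores)
print(posteriors)                     # values sum to 1
print(predict_label(example_scores))  # Yes
```

Normalizing does not change which class wins, so the argmax can be taken on the proportional scores directly.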

# Code

In [48]:
import numpy as np   # array helpers (np.unique, np.count_nonzero)
import pandas as pd  # DataFrame loading and value counts

In [49]:
def pre_processing(df):
    """Split a DataFrame into features X (all but the last column) and target y (the last column)."""
    X = df.drop([df.columns[-1]], axis=1)
    y = df[df.columns[-1]]

    return X, y

In [51]:
df = pd.read_csv("data.csv")

df

Unnamed: 0,Outlook,Temp,Humidity,Windy,play
0,Rainy,Hot,High,f,no
1,Rainy,Hot,High,t,no
2,Overcast,Hot,High,f,yes
3,Sunny,Mild,High,f,yes
4,Sunny,Cool,Normal,f,yes
5,Sunny,Cool,Normal,t,no
6,Overcast,Cool,Normal,t,yes
7,Rainy,Mild,High,f,no
8,Rainy,Cool,Normal,f,yes
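9,Sunny,Mild,Normal,f,yes
10,Rainy,Mild,Normal,t,yes
11,Overcast,Mild,High,t,yes
12,Overcast,Hot,Normal,f,yes
13,Sunny,Mild,High,t,no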


In [52]:
X,y  = pre_processing(df)

In [53]:
X

Unnamed: 0,Outlook,Temp,Humidity,Windy
0,Rainy,Hot,High,f
1,Rainy,Hot,High,t
2,Overcast,Hot,High,f
3,Sunny,Mild,High,f
4,Sunny,Cool,Normal,f
5,Sunny,Cool,Normal,t
6,Overcast,Cool,Normal,t
7,Rainy,Mild,High,f
8,Rainy,Cool,Normal,f
9,Sunny,Mild,Normal,f
10,Rainy,Mild,Normal,t
11,Overcast,Mild,High,t
12,Overcast,Hot,Normal,f
13,Sunny,Mild,High,t


In [54]:
y

0      no
1      no
2     yes
3     yes
4     yes
5      no
6     yes
7      no
8     yes
9     yes
10    yes
11    yes
12    yes
13     no
Name: play, dtype: object

# Calculate Prior Probability of Classes P(y)
* P(Play=Yes) = 9/14 = 0.64
* P(Play=No) = 5/14 = 0.36
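The same priors can be obtained directly from the label column. This is a minimal sketch that rebuilds the 14 labels printed for `y` above; the names `labels` and `priors` are illustrative.

```python
import pandas as pd

# The 14 class labels, in the order printed for y above
labels = pd.Series(
    ["no", "no", "yes", "yes", "yes", "no", "yes",
     "no", "yes", "yes", "yes", "yes", "yes", "no"],
    name="play",
)

# Relative frequency of each class = prior probability P(y)
priors = labels.value_counts(normalize=True)
print(priors.round(2))
```

`value_counts(normalize=True)` divides each class count by the number of rows, which is the same calculation as 9/14 and 5/14.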

In [40]:
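# Model state shared by the functions below (filled in by fit)
features = []        # feature (column) names
likelihoods = {}     # P(x|c) per feature, keyed by "<value>_<class>"
class_priors = {}    # P(c) per class
pred_priors = {}     # P(x) per feature value

X_train = None       # training features (DataFrame)
y_train = None       # training labels (Series)
train_size = 0       # number of training rows
num_feats = 0        # number of features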

In [64]:
def _calc_class_prior():
    """P(c) - prior class probability."""
    for outcome in np.unique(y_train):
        outcome_count = np.count_nonzero(y_train == outcome)
        class_priors[outcome] = outcome_count / train_size

# Calculate the Likelihood Table for all features
``` text
#Likelihood Table
#Outlook
Play Overcast Rainy Sunny 
Yes  4/9      2/9   3/9
No   0/5      3/5   2/5
     ___      ___   ___
     4/14     5/14  5/14
#Temp
Play  Cool  Mild  Hot
Yes   3/9   4/9   2/9
No    1/5   2/5   2/5
      ___   ___   ___
      4/14  6/14  4/14
#Humidity 
Play  High  Normal
Yes   3/9   6/9
No    4/5   1/5
      ___   ___  
      7/14  7/14 
#Windy
Play   f     t
Yes    6/9   3/9
No     2/5   3/5
       ___   ___ 
       8/14  6/14
```  
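One way to reproduce a likelihood table such as the Outlook one is a row-normalised cross-tabulation. The sketch below rebuilds the Outlook and play columns of the 14-row dataset by hand; it uses `pd.crosstab` as an alternative to the manual counting done in this notebook, and the variable names are illustrative.

```python
import pandas as pd

# Outlook and play columns of the 14-row golf dataset, in row order
outlook = pd.Series(
    ["Rainy", "Rainy", "Overcast", "Sunny", "Sunny", "Sunny", "Overcast",
     "Rainy", "Rainy", "Sunny", "Rainy", "Overcast", "Overcast", "Sunny"],
    name="Outlook",
)
play = pd.Series(
    ["no", "no", "yes", "yes", "yes", "no", "yes",
     "no", "yes", "yes", "yes", "yes", "yes", "no"],
    name="play",
)

# normalize="index" divides each row by its total, so every row
# holds P(Outlook = value | play = class)
likelihood_table = pd.crosstab(play, outlook, normalize="index")
print(likelihood_table)
```

The "yes" row should match 4/9, 2/9, 3/9 and the "no" row 0/5, 3/5, 2/5 for Overcast, Rainy and Sunny respectively.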

In [59]:
def _calc_likelihoods():
    """P(x|c) - likelihood of each feature value given the class."""
    for feature in features:

        for outcome in np.unique(y_train):
            outcome_count = np.count_nonzero(y_train == outcome)
            # Count each value of this feature among the rows of the current class
            feat_likelihood = X_train[feature][y_train == outcome].value_counts().to_dict()

            for feat_val, count in feat_likelihood.items():
                likelihoods[feature][feat_val + '_' + outcome] = count / outcome_count

In [68]:
def _calc_predictor_prior():
    """P(x) - evidence: relative frequency of each feature value."""
    for feature in features:
        feat_vals = X_train[feature].value_counts().to_dict()

        for feat_val, count in feat_vals.items():
            pred_priors[feature][feat_val] = count / train_size

In [65]:
def fit(X, y):
    """Store the training data and compute priors, likelihoods and evidence."""
    global num_feats, train_size, features, X_train, y_train
    features = list(X.columns)
    X_train = X
    y_train = y
    train_size = X.shape[0]
    num_feats = X.shape[1]

    # Initialise every class prior to 0
    for outcome in np.unique(y_train):
        class_priors[outcome] = 0

    # Initialise every (feature value, class) likelihood and every feature-value
    # prior to 0, so combinations never seen in training get probability 0
    for feature in features:
        likelihoods[feature] = {}
        pred_priors[feature] = {}

        for feat_val in np.unique(X_train[feature]):
            pred_priors[feature][feat_val] = 0

            for outcome in np.unique(y_train):
                likelihoods[feature][feat_val + '_' + outcome] = 0

    _calc_class_prior()
    _calc_likelihoods()
    _calc_predictor_prior()

In [46]:
def predict(X):
    """Calculate the posterior probability P(c|x) for each query and return the most probable class."""
    results = []
    X = np.array(X)

    for query in X:
        probs_outcome = {}
        for outcome in np.unique(y_train):
            prior = class_priors[outcome]
            likelihood = 1
            evidence = 1

            # Multiply the per-feature terms (naive independence assumption)
            for feat, feat_val in zip(features, query):
                likelihood *= likelihoods[feat][feat_val + '_' + outcome]
                evidence *= pred_priors[feat][feat_val]

            posterior = (likelihood * prior) / evidence

            probs_outcome[outcome] = posterior

        # Pick the class with the highest posterior
        result = max(probs_outcome, key=probs_outcome.get)
        results.append(result)

    return np.array(results)

In [47]:
def accuracy_score(y_true, y_pred):
    """Percentage of predictions that match the true labels, rounded to 2 decimals."""
    return round(float(sum(y_pred == y_true)) / float(len(y_true)) * 100, 2)

In [72]:
if __name__ == "__main__":
    fit(X, y)

    print("Train Accuracy: {}".format(accuracy_score(y, predict(X))))

    # Query 1:
    query = np.array([['Rainy', 'Mild', 'Normal', 't']])
    print("Query 1:- {} ---> {}".format(query, predict(query)))

    # Query 2:
    query = np.array([['Overcast', 'Cool', 'Normal', 't']])
    print("Query 2:- {} ---> {}".format(query, predict(query)))

	

Train Accuracy: 92.86
Query 1:- [['Rainy' 'Mild' 'Normal' 't']] ---> ['yes']
Query 2:- [['Overcast' 'Cool' 'Normal' 't']] ---> ['yes']
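The result for Query 1 can be checked by hand from the likelihood tables above. The sketch below copies the relevant fractions for Outlook=Rainy, Temp=Mild, Humidity=Normal and Windy=t, and multiplies them with the class priors; the dictionary names are illustrative. It leaves out the evidence term that `predict` divides by, since that term is the same for both classes and does not affect which class wins.

```python
from fractions import Fraction as F

# P(x_i | y) for Query 1, copied from the likelihood tables:
# order is Outlook=Rainy, Temp=Mild, Humidity=Normal, Windy=t
likelihoods_q1 = {
    "yes": [F(2, 9), F(4, 9), F(6, 9), F(3, 9)],
    "no": [F(3, 5), F(2, 5), F(1, 5), F(3, 5)],
}
priors_q1 = {"yes": F(9, 14), "no": F(5, 14)}

scores = {}
for label, terms in likelihoods_q1.items():
    score = priors_q1[label]   # start from P(y)
    for term in terms:         # multiply by each P(x_i | y)
        score *= term
    scores[label] = score

best = max(scores, key=scores.get)
print({label: float(s) for label, s in scores.items()})
print(best)  # yes
```

The "yes" score (about 0.0141) is larger than the "no" score (about 0.0103), which agrees with the `['yes']` printed for Query 1.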
