# Using the Naive Bayes Classifier To Predict Class


Consider the data given in Table 2 from an employee database, where "Status" is the class attribute. Using the Naive Bayes classifier, predict the class value of the  
**record: (marketing, 31-35, 46K-50K)**. _Show all the steps_.

<pre>

| Department | Age   | Salary  | Status | 
|------------|-------|---------|--------| 
| marketing  | 26-30 | 46K-50K | junior | 
| marketing  | 31-35 | 41K-45K | junior | 
| secretary  | 46-50 | 36K-40K | senior | 
| sales      | 26-30 | 25K-30K | junior | 
| systems    | 41-45 | 66K-70K | senior | 
| sales      | 31-35 | 46K-50K | junior | 
| marketing  | 36-40 | 46K-50K | senior | 
| systems    | 21-25 | 46K-50K | junior | 
| systems    | 31-35 | 66K-70K | senior | 
| secretary  | 26-30 | 25K-30K | junior | 
| sales      | 31-35 | 31K-35K | junior | 

</pre>



## Extract Amounts of Elements
_You do not need to know numpy to understand Bayes' rule, but I implemented the counting with numpy. Some numpy tutorials and references:_  
[Numpy](http://cs231n.github.io/python-numpy-tutorial/)  
[Csv to Numpy](https://docs.scipy.org/doc/numpy/reference/generated/numpy.genfromtxt.html)  
[Filtering, Vectorization, Numpy](http://stackoverflow.com/questions/11953111/numpy-how-to-filter-matrix-lines)    
[Numpy get Index](http://stackoverflow.com/questions/18079029/index-of-element-in-numpy-array)  
[Numpy 1 element ndarray to int](http://stackoverflow.com/questions/30311172/convert-list-or-numpy-array-of-single-element-to-float-in-python)  

**Counts needed to apply Bayes' rule**

* Number of junior status members
* Number of senior status members
* Number of all members
* Number of junior status members who are in the marketing department
* Number of junior status members who are in the age group 31-35
* Number of junior status members who receive a salary in the range 46K-50K
* The same three counts for senior status members

In [122]:
import numpy as np
completeFile = np.genfromtxt('bayes.csv', delimiter=';',  dtype = str )
header=completeFile[0,:]
data=completeFile[1:,:]


jun=data[:,-1]=="junior"
sen=data[:,-1]=="senior"
junior=data[jun,:]
senior=data[sen,:]


print("Sample size ",len(data))
print("Junior members", len(junior))
print("Senior members", len(senior))

# Juniors in marketing
# np.where returns a tuple of index arrays; [0][0] picks the first matching column index
department=int(np.where(header=="Department")[0][0])
marketing=junior[:,department]=="marketing"
jun_mark=len(junior[marketing,:])
# Juniors aged 31-35
age=int(np.where(header=="Age")[0][0])
middleAged=junior[:,age]=="31-35"
jun_mAge=len(junior[middleAged,:])
# Juniors earning 46K-50K
salary=int(np.where(header=="Salary")[0][0])
k46_50=junior[:,salary]=="46K-50K"
jun_46K_50K=len(junior[k46_50,:])

print("Junior and marketing ", jun_mark)
print("Junior and 31-35" , jun_mAge)
print("Junior and 46K-50K", jun_46K_50K)

# Seniors in marketing
marketing=senior[:,department]=="marketing"
sen_mark=len(senior[marketing,:])
# Seniors aged 31-35
middleAged=senior[:,age]=="31-35"
sen_mAge=len(senior[middleAged,:])
# Seniors earning 46K-50K
k46_50=senior[:,salary]=="46K-50K"
sen_46K_50K=len(senior[k46_50,:])


print("Senior and marketing ", sen_mark)
print("Senior and 31-35" , sen_mAge)
print("Senior and 46K-50K", sen_46K_50K)

('Sample size ', 11)
('Junior members', 7)
('Senior members', 4)
('Junior and marketing ', 2)
('Junior and 31-35', 3)
('Junior and 46K-50K', 3)
('Senior and marketing ', 1)
('Senior and 31-35', 1)
('Senior and 46K-50K', 1)
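
The same counts can be cross-checked without numpy or the CSV file. The following is a minimal sketch that assumes the table above is typed in by hand as a list of tuples; the names `ROWS`, `class_counts` and `joint_counts` are illustrative and not part of the original notebook.

```python
from collections import Counter

# Rows of the table above: (department, age, salary, status).
ROWS = [
    ("marketing", "26-30", "46K-50K", "junior"),
    ("marketing", "31-35", "41K-45K", "junior"),
    ("secretary", "46-50", "36K-40K", "senior"),
    ("sales", "26-30", "25K-30K", "junior"),
    ("systems", "41-45", "66K-70K", "senior"),
    ("sales", "31-35", "46K-50K", "junior"),
    ("marketing", "36-40", "46K-50K", "senior"),
    ("systems", "21-25", "46K-50K", "junior"),
    ("systems", "31-35", "66K-70K", "senior"),
    ("secretary", "26-30", "25K-30K", "junior"),
    ("sales", "31-35", "31K-35K", "junior"),
]

# Number of records per class, e.g. class_counts["junior"].
class_counts = Counter(row[-1] for row in ROWS)

# Number of records per (column index, feature value, class) combination.
joint_counts = Counter(
    (col, value, row[-1])
    for row in ROWS
    for col, value in enumerate(row[:-1])
)

print("Sample size", len(ROWS))
print("Class counts", dict(class_counts))
for status in ("junior", "senior"):
    print(status, "and marketing", joint_counts[(0, "marketing", status)])
    print(status, "and 31-35", joint_counts[(1, "31-35", status)])
    print(status, "and 46K-50K", joint_counts[(2, "46K-50K", status)])
```

A `Counter` returns 0 for combinations that never occur, so looking up an unseen pairing does not raise an error.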


## Naive Bayes Algorithm 

We now use the Naive Bayes algorithm to predict the class of the example **record: (marketing, 31-35, 46K-50K)**. 

We assume that the features are conditionally independent given the class, which means 
$$ P(X \mid C_i) = P(X_1 \mid C_i)\,P(X_2 \mid C_i)\,P(X_3 \mid C_i) $$  

$C_i$ stands for a value of the class variable (junior or senior); $X = (X_1, X_2, X_3)$ stands for the features (department, age, salary). Bayes' rule then gives  
$$ P(C_i \mid X) = \frac{P(X_1 \mid C_i)\,P(X_2 \mid C_i)\,P(X_3 \mid C_i) \, P(C_i)}{P(X)} $$


$P(X)$ is the same for every class, so we do not need it to compare the classes. The posterior is proportional to 
$$ P(C_i \mid X) \propto P(X_1 \mid C_i)\,P(X_2 \mid C_i)\,P(X_3 \mid C_i) \, P(C_i) $$

I also assume that the age and salary ranges are discrete values, so each conditional probability is simply estimated as a relative frequency from the counts above. 
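
As a worked example, plugging the counts from the table into this product for the record (marketing, 31-35, 46K-50K) gives the following arithmetic. It uses plain relative frequencies with no smoothing, and the factors are ordered as department, age, salary, class prior:

$$ P(\text{junior} \mid X) \propto \frac{2}{7}\cdot\frac{3}{7}\cdot\frac{3}{7}\cdot\frac{7}{11} = \frac{18}{539} \approx 0.0334 $$

$$ P(\text{senior} \mid X) \propto \frac{1}{4}\cdot\frac{1}{4}\cdot\frac{1}{4}\cdot\frac{4}{11} = \frac{1}{176} \approx 0.0057 $$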





In [129]:
def bayesCalculator(sampleSize,nrClasses, lConditionalFeatures):
    # sampleSize: number of all records
    # nrClasses: number of records that belong to the class
    # lConditionalFeatures: per-feature counts of class members matching the record
    sampleSize=float(sampleSize)
    nrClasses=float(nrClasses)
    lCoPr=[]
    for f in lConditionalFeatures:
        lCoPr.append(float(f)/nrClasses)
    ans=1.0
    print("Conditional Probabilities", lCoPr)

    print("Probability of class ", nrClasses/sampleSize  )
    for i in lCoPr:
        ans*=i
        
    # Unnormalized value: proportional to P(C|X), because P(X) is dropped
    ans=ans*nrClasses/sampleSize
    print("P(C|X)" , ans)
    
# Status = junior
print("Junior")
bayesCalculator(len(data), len(junior), [jun_46K_50K,jun_mark,jun_mAge])

# Status = senior
print("Senior")
bayesCalculator(len(data), len(senior), [sen_46K_50K,sen_mark,sen_mAge])

Junior
('Conditional Probabilities', [0.42857142857142855, 0.2857142857142857, 0.42857142857142855])
('Probability of class ', 0.6363636363636364)
('P(C|X)', 0.0333951762523191)
Senior
('Conditional Probabilities', [0.25, 0.25, 0.25])
('Probability of class ', 0.36363636363636365)
('P(C|X)', 0.005681818181818182)
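
The two printed values are unnormalized scores, so they do not sum to 1. If actual posterior probabilities are wanted, one option is to divide each score by their sum, which plays the role of P(X) under the model's own independence assumption. The sketch below does this with exact fractions; the variable names are illustrative and the counts are the ones printed earlier.

```python
from fractions import Fraction

# Unnormalized scores P(X | C) * P(C), built from the counts printed earlier.
score_junior = Fraction(3, 7) * Fraction(2, 7) * Fraction(3, 7) * Fraction(7, 11)
score_senior = Fraction(1, 4) * Fraction(1, 4) * Fraction(1, 4) * Fraction(4, 11)

# Under the naive Bayes model, P(X) is the sum of the scores over all classes.
evidence = score_junior + score_senior
posterior_junior = score_junior / evidence
posterior_senior = score_senior / evidence

print("score junior:", score_junior, "score senior:", score_senior)
print("P(junior | X) =", posterior_junior, "~", round(float(posterior_junior), 4))
print("P(senior | X) =", posterior_senior, "~", round(float(posterior_senior), 4))
```

Normalizing does not change which class wins; it only rescales both scores by the same factor.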


## Rule based on probabilities
Because the (unnormalized) P(C|X) is greater for junior than for senior, about 0.0334 versus 0.0057, we predict that this record belongs to the junior status. 