# **Naive Bayes Classifier**

## **Loan Prediction**

### **Upload and Read Data File**

In [1]:
from google.colab import files
uploaded = files.upload()

Saving simple_loan.csv to simple_loan.csv


In [2]:
import numpy as np
import pandas as pd
df= pd.read_csv('simple_loan.csv')
df

Unnamed: 0,age,employed,own_house,credit,target
0,young,False,n,fair,no
1,young,False,n,good,no
2,young,True,n,good,yes
3,young,True,y,fair,yes
4,young,False,n,fair,no
5,middle,False,n,fair,no
6,middle,False,n,good,no
7,middle,True,y,good,yes
8,middle,False,y,excellent,yes
9,middle,False,y,excellent,yes


Separate the independent variables (the features, `X`) from the dependent variable (the target, `y`).

In [3]:
X=df.drop(['target'], axis=1)
y=df.target

In [8]:
X, y

(       age  employed own_house     credit
 0    young     False         n       fair
 1    young     False         n       good
 2    young      True         n       good
 3    young      True         y       fair
 4    young     False         n       fair
 5   middle     False         n       fair
 6   middle     False         n       good
 7   middle      True         y       good
 8   middle     False         y  excellent
 9   middle     False         y  excellent
 10     old     False         y  excellent
 11     old     False         y       good
 12     old      True         n       good
 13     old      True         n  excellent
 14     old     False         n       fair
 15     old     False         n  excellent
 16   young      True         y       fair,
 0      no
 1      no
 2     yes
 3     yes
 4      no
 5      no
 6      no
 7     yes
 8     yes
 9     yes
 10    yes
 11    yes
 12    yes
 13    yes
 14     no
 15    yes
 16    yes
 Name: target, dtype: object)

### **Label Encoding**

In [10]:
from sklearn.preprocessing import LabelEncoder
def labelEncode(data,columns):
  # Adds an integer-coded copy of each listed column to `data` in place, named '<column>_'
  for i in columns:
    lb=LabelEncoder().fit_transform(data[i])
    data[i+'_'] = lb

In [20]:
f_columns=['age', 'employed','own_house', 'credit']
labelEncode(X,f_columns)
X

Unnamed: 0,age,employed,own_house,credit,age_,employed_,own_house_,credit_
0,young,False,n,fair,2,0,0,1
1,young,False,n,good,2,0,0,2
2,young,True,n,good,2,1,0,2
3,young,True,y,fair,2,1,1,1
4,young,False,n,fair,2,0,0,1
5,middle,False,n,fair,0,0,0,1
6,middle,False,n,good,0,0,0,2
7,middle,True,y,good,0,1,1,2
8,middle,False,y,excellent,0,0,1,0
9,middle,False,y,excellent,0,0,1,0


In [19]:
y_le=LabelEncoder()
y1=y_le.fit_transform(y)
y1

array([0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1])

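`LabelEncoder` sorts the distinct values of each attribute (alphabetically for strings) and numbers them from 0. For `age` the codes are middle = 0, old = 1, young = 2; for `credit` they are excellent = 0, fair = 1, good = 2; and for the target, no = 0 and yes = 1. Only the encoded columns are kept as model input.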
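The mapping from values to integer codes can be read off a fitted encoder's `classes_` attribute. The sketch below assumes scikit-learn is installed; the `ages` list is an illustrative sample of values from the `age` column, not the full data set.

```python
from sklearn.preprocessing import LabelEncoder

# Illustrative values taken from the `age` column of the loan data.
ages = ["young", "middle", "old", "young"]

encoder = LabelEncoder()
codes = encoder.fit_transform(ages)

# classes_ holds the distinct values in sorted order; the position is the integer code.
mapping = {str(label): code for code, label in enumerate(encoder.classes_)}
print(mapping)         # {'middle': 0, 'old': 1, 'young': 2}
print(codes.tolist())  # [2, 0, 1, 2]
```

The `labelEncode` helper above discards its encoders after use, which is why new inputs later in this notebook are encoded by hand.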

In [21]:
X1=X[['age_', 'employed_','own_house_', 'credit_']]
X1

Unnamed: 0,age_,employed_,own_house_,credit_
0,2,0,0,1
1,2,0,0,2
2,2,1,0,2
3,2,1,1,1
4,2,0,0,1
5,0,0,0,1
6,0,0,0,2
7,0,1,1,2
8,0,0,1,0
9,0,0,1,0


### **Model Construction**

In [22]:
from sklearn.naive_bayes import CategoricalNB
model=CategoricalNB()
model.fit(X1,y1)

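`CategoricalNB` estimates each conditional probability with additive (Laplace) smoothing:

$P(x_i=t \mid y=c;\alpha)=\frac{N_{tic}+\alpha}{N_c+\alpha n_i}$

Here $N_{tic}$ is the number of training rows of class $c$ in which feature $i$ takes category $t$, $N_c$ is the number of rows of class $c$, $n_i$ is the number of categories of feature $i$, and $\alpha$ is the smoothing parameter (1 by default).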
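As a worked check of the formula, assume the default $\alpha = 1$, which the code above does not override. Two of the six `no` rows have age = middle and `age` has three categories; none of the six `no` rows has employed = true and `employed` has two categories:

$$P(\text{age}=\text{middle} \mid \text{target}=\text{no}) = \frac{2+1}{6+1\cdot 3} = \frac{1}{3}, \qquad \ln\frac{1}{3} \approx -1.0986$$

$$P(\text{employed}=\text{true} \mid \text{target}=\text{no}) = \frac{0+1}{6+1\cdot 2} = \frac{1}{8}, \qquad \ln\frac{1}{8} \approx -2.0794$$

The second case shows what smoothing buys: a category never seen with a class still gets a small non-zero probability instead of zeroing out the whole product.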

In [23]:
print(model.feature_log_prob_)

[array([[-1.09861229, -1.5040774 , -0.81093022],
       [-1.25276297, -0.84729786, -1.25276297]]), array([[-0.13353139, -2.07944154],
       [-0.77318989, -0.61903921]]), array([[-0.13353139, -2.07944154],
       [-0.95551145, -0.48550782]]), array([[-2.19722458, -0.58778666, -1.09861229],
       [-0.84729786, -1.54044504, -1.02961942]])]


**Age**

Log(P(age=middle|target=no)) = -1.09861229

Log(P(age=old|target=no)) = -1.5040774

Log(P(age=young|target=no)) = -0.81093022

Log(P(age=middle|target=yes)) = -1.25276297

Log(P(age=old|target=yes)) = -0.84729786

Log(P(age=young|target=yes)) = -1.25276297

**Employed**

Log(P(employed=false|target=no)) = -0.13353139

Log(P(employed=true|target=no)) = -2.07944154

Log(P(employed=false|target=yes)) = -0.77318989

Log(P(employed=true|target=yes)) = -0.61903921

**Own_House**

Log(P(own_house=n|target=no)) = -0.13353139

Log(P(own_house=y|target=no)) = -2.07944154

Log(P(own_house=n|target=yes)) = -0.95551145

Log(P(own_house=y|target=yes)) = -0.48550782

**Credit**

Log(P(credit=excellent|target=no)) = -2.19722458

Log(P(credit=fair|target=no)) = -0.58778666

Log(P(credit=good|target=no)) = -1.09861229

Log(P(credit=excellent|target=yes)) = -0.84729786

Log(P(credit=fair|target=yes)) = -1.54044504

Log(P(credit=good|target=yes)) = -1.02961942

In [24]:
print(model.category_count_)

[array([[2., 1., 3.],
       [3., 5., 3.]]), array([[6., 0.],
       [5., 6.]]), array([[6., 0.],
       [4., 7.]]), array([[0., 4., 2.],
       [5., 2., 4.]])]


**Age**

Count(age=middle && target=no) = 2

Count(age=old && target=no) = 1

Count(age=young && target=no) = 3

Count(age=middle && target=yes) = 3

Count(age=old && target=yes) = 5

Count(age=young && target=yes) = 3

**Employed**

Count(employed=false && target=no) = 6

Count(employed=true && target=no) = 0

Count(employed=false && target=yes) = 5

Count(employed=true && target=yes) = 6

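**Own_House**

Count(own_house=n && target=no) = 6

Count(own_house=y && target=no) = 0

Count(own_house=n && target=yes) = 4

Count(own_house=y && target=yes) = 7

**Credit**

Count(credit=excellent && target=no) = 0

Count(credit=fair && target=no) = 4

Count(credit=good && target=no) = 2

Count(credit=excellent && target=yes) = 5

Count(credit=fair && target=yes) = 2

Count(credit=good && target=yes) = 4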
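These counts are enough to reproduce `feature_log_prob_` by hand. The sketch below copies the four tables from the `category_count_` output and applies the smoothing formula with `alpha = 1.0`, which is an assumption (the scikit-learn default). The helper name `smoothed_log_prob` is hypothetical.

```python
import numpy as np

# Count tables copied from the category_count_ output
# (rows: target = no, yes; columns: categories in encoded order).
category_count = [
    np.array([[2., 1., 3.], [3., 5., 3.]]),  # age
    np.array([[6., 0.], [5., 6.]]),          # employed
    np.array([[6., 0.], [4., 7.]]),          # own_house
    np.array([[0., 4., 2.], [5., 2., 4.]]),  # credit
]
alpha = 1.0  # assumed smoothing value (scikit-learn's default)

def smoothed_log_prob(counts, alpha):
    """Apply (N_tic + alpha) / (N_c + alpha * n_i) to each row, then take the log."""
    n_categories = counts.shape[1]
    class_totals = counts.sum(axis=1, keepdims=True)
    return np.log((counts + alpha) / (class_totals + alpha * n_categories))

feature_log_prob = [smoothed_log_prob(c, alpha) for c in category_count]
for name, table in zip(["age", "employed", "own_house", "credit"], feature_log_prob):
    print(name)
    print(np.round(table, 8))
```

The printed tables should agree with the `feature_log_prob_` values listed earlier, up to rounding.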

Two new applicants to classify (their encoded forms are `[0, 1, 1, 1]` and `[1, 0, 0, 2]`):

1. age = “middle”, employed = “true”, own_house = “y”, credit = “fair”
2. age = “old”, employed = “false”, own_house = “n”, credit = “good”
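The two applicants have to be turned into the same integer codes the model was trained on. A minimal sketch of that step follows, using plain lookup tables. The `codes` dictionary is hypothetical and simply restates the sorted-order encoding visible in the encoded table above.

```python
# Hypothetical lookup tables restating the sorted-order codes shown in the encoded table.
codes = {
    "age": {"middle": 0, "old": 1, "young": 2},
    "employed": {False: 0, True: 1},
    "own_house": {"n": 0, "y": 1},
    "credit": {"excellent": 0, "fair": 1, "good": 2},
}

applicants = [
    {"age": "middle", "employed": True, "own_house": "y", "credit": "fair"},
    {"age": "old", "employed": False, "own_house": "n", "credit": "good"},
]

# Encode each applicant feature by feature, in the column order used for training.
encoded = [[codes[col][row[col]] for col in codes] for row in applicants]
print(encoded)  # [[0, 1, 1, 1], [1, 0, 0, 2]]
```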

### **Model Prediction**

In [29]:
new_input=pd.DataFrame([[0,1,1,1],[1,0,0,2]],columns=['age_','employed_','own_house_', 'credit_'])
y_prob_pred = model.predict_proba(new_input)
y_prob_pred

array([[0.0721808 , 0.9278192 ],
       [0.53238717, 0.46761283]])
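The first row of this output can be checked by hand with Bayes' rule: add the log prior of each class to the log-likelihoods of the applicant's four feature values, then normalize. The sketch below copies the relevant values from the outputs above and assumes the class priors are the empirical class frequencies (6 `no` and 11 `yes` rows), which is the scikit-learn default.

```python
import numpy as np

# Class sizes read from y: 6 rows of "no" and 11 rows of "yes".
class_count = np.array([6.0, 11.0])
log_prior = np.log(class_count / class_count.sum())  # assumed empirical priors

# Log-likelihoods for applicant 1 (age=middle, employed=true, own_house=y, credit=fair),
# copied from the feature_log_prob_ output.
log_likelihood = np.array([
    [-1.09861229, -2.07944154, -2.07944154, -0.58778666],  # target = no
    [-1.25276297, -0.61903921, -0.48550782, -1.54044504],  # target = yes
])

# Naive Bayes: joint log-probability = log prior + sum of per-feature log-likelihoods.
joint = log_prior + log_likelihood.sum(axis=1)

# Normalize in log space so the two posteriors sum to 1.
posterior = np.exp(joint - np.logaddexp.reduce(joint))
print(posterior)
```

The result should agree with the first row returned by `predict_proba`, roughly 0.07 for `no` and 0.93 for `yes`.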

In [33]:
y_new_predict=model.predict(new_input)
y_new_predict

array([1, 0])

In [34]:
for n, i in enumerate(y_new_predict, start=1):
  print( 'No' ,n, '=>: ',y_le.classes_[i])

No 1 =>:  yes
No 2 =>:  no
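An alternative to indexing `classes_` is the encoder's `inverse_transform` method, which decodes a whole array of predictions at once. In this sketch `y_le` is a stand-in refitted on the two target labels, and `y_new_predict` is copied from the output above, so that the block runs on its own.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Stand-in for the fitted y_le: the target column only contains "no" and "yes".
y_le = LabelEncoder().fit(["no", "yes"])
y_new_predict = np.array([1, 0])  # predictions copied from the output above

# Map the integer predictions back to the original string labels.
labels = y_le.inverse_transform(y_new_predict)
for n, label in enumerate(labels, start=1):
    print("No", n, "=>: ", label)
```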
