### Naive Bayes Classification
#### Classification Workflow
The first step is to understand the problem and identify the potential features and the label. Features are the characteristics or attributes that affect the value of the label. Classification has two phases: a learning phase and an evaluation phase. In the learning phase, the classifier trains its model on a given dataset; in the evaluation phase, the classifier's performance is tested. Performance is evaluated with measures such as accuracy, error, precision, and recall.
<br >
#### Naive Bayes Overview:
Naive Bayes is a statistical classification technique based on Bayes' theorem. It assumes that the effect of a particular feature on a class is independent of the other features. For example, whether a loan applicant is desirable depends on his/her income, previous loan and transaction history, age, and location. Even if these features depend on each other, the classifier treats each one as contributing independently, which is why it is called naive.
#### Bayes' Theorem:
<img src="https://res.cloudinary.com/dyd911kmh/image/upload/f_auto,q_auto:best/v1543836882/image_3_ijznzs.png">
<ul>
    <li>P(h): the probability of hypothesis h being true (regardless of the data). This is known as the prior probability of h.</li>
    <li>P(D): the probability of the data (regardless of the hypothesis). This is known as the evidence, or the marginal probability of the data.</li>
    <li>P(h|D): the probability of hypothesis h given the data D. This is known as posterior probability.</li>
    <li>P(D|h): the probability of the data D given that the hypothesis h is true. This is known as the likelihood.</li>
</ul>
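For reference, the theorem can be written out in the notation of the list above. The second line is the form Naive Bayes relies on under its independence assumption; the symbols x_1, ..., x_n are introduced here only for illustration and stand for the individual feature values that make up D:

```latex
\[
P(h \mid D) = \frac{P(D \mid h)\, P(h)}{P(D)}
\]

\[
P(h \mid x_1, \dots, x_n) \propto P(h) \prod_{i=1}^{n} P(x_i \mid h)
\]
```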

#### How Naive Bayes Works:
<ol>
    <li>Calculate the prior probability of each class label.</li>
    <li>Find the likelihood of each attribute value for each class.</li>
    <li>Put these values into Bayes' formula and calculate the posterior probability.</li>
    <li>Compare the posterior probabilities of the classes; the input is assigned to the class with the highest probability.</li>
</ol>
<img src="https://res.cloudinary.com/dyd911kmh/image/upload/f_auto,q_auto:best/v1543836883/image_4_lyi0ob.png">
Now suppose you want to calculate the probability of playing when the weather is overcast.<br>
<b>Probability of playing:</b>

<code>P(Yes | Overcast) = P(Overcast | Yes) P(Yes) / P (Overcast)</code> .....................(1)

Calculate Prior Probabilities:

<code>P(Overcast)</code> = 4/14 = 0.29

<code>P(Yes)</code>= 9/14 = 0.64

Calculate the likelihood:

<code>P(Overcast | Yes)</code> = 4/9 = 0.44

Put the prior probabilities and the likelihood into equation (1):

<code>P(Yes | Overcast)</code> = 0.44 * 0.64 / 0.29 = 0.97 (higher). With exact fractions, (4/9)(9/14)/(4/14) = 1; the small difference comes from rounding.

Similarly, you can calculate the probability of not playing:

<b>Probability of not playing:</b>

<code>P(No | Overcast) = P(Overcast | No) P(No) / P (Overcast)</code> .....................(2)

Calculate Prior Probabilities:

<code>P(Overcast)</code> = 4/14 = 0.29

<code>P(No)</code>= 5/14 = 0.36

Calculate the likelihood:

<code>P(Overcast | No)</code> = 0/5 = 0

Put the prior probabilities and the likelihood into equation (2):

<code>P(No | Overcast)</code> = 0 * 0.36 / 0.29 = 0

==> The probability of the 'Yes' class is higher, so if the weather is overcast, the players will play the sport.
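The arithmetic above can be checked with a minimal sketch that uses exact fractions instead of rounded decimals. The counts (14 days, 9 "Yes", 5 "No", 4 overcast days that are all "Yes") are the ones used in the worked example; the variable names are chosen here for illustration.

```python
from fractions import Fraction

# Counts taken from the worked example above.
total_days = 14
play_counts = {"Yes": 9, "No": 5}        # days per class
overcast_counts = {"Yes": 4, "No": 0}    # overcast days per class

# Evidence: P(Overcast)
p_overcast = Fraction(sum(overcast_counts.values()), total_days)

posteriors = {}
for label, n_label in play_counts.items():
    prior = Fraction(n_label, total_days)                    # P(class)
    likelihood = Fraction(overcast_counts[label], n_label)   # P(Overcast | class)
    posteriors[label] = likelihood * prior / p_overcast      # Bayes' theorem

# Pick the class with the highest posterior probability.
best_label = max(posteriors, key=posteriors.get)
print(posteriors, best_label)
```

Working with fractions avoids the rounding error that appears when 0.44, 0.64, and 0.29 are multiplied and divided by hand.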

### Naive Bayes Classifier Building
#### Defining Dataset
The first two variables (weather and temp) are the features, and the third (play) is the label.

In [1]:
# Assigning features and label variables
weather=['Sunny','Sunny','Overcast','Rainy','Rainy','Rainy','Overcast','Sunny','Sunny',
'Rainy','Sunny','Overcast','Overcast','Rainy']
temp=['Hot','Hot','Hot','Mild','Cool','Cool','Cool','Mild','Cool','Mild','Mild','Mild','Hot','Mild']

play=['No','No','Yes','Yes','Yes','No','Yes','No','Yes','Yes','Yes','Yes','Yes','No']

#### Encoding Features
The string labels need to be converted into numbers, for example 'Overcast', 'Rainy', 'Sunny' as 0, 1, 2.

In [2]:
# Import the preprocessing module, which provides LabelEncoder
from sklearn import preprocessing
# Create a LabelEncoder
le = preprocessing.LabelEncoder()
# Convert string labels into numbers
weather_encoded = le.fit_transform(weather)
print(weather_encoded)

[2 2 0 1 1 1 0 2 2 1 2 0 0 1]
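To see which number stands for which string, the fitted encoder can be inspected. The sketch below assumes a separate encoder per column (the name `weather_encoder` is introduced here for illustration), so that the mapping is not lost when another column is fitted later.

```python
from sklearn.preprocessing import LabelEncoder

weather = ['Sunny', 'Sunny', 'Overcast', 'Rainy', 'Rainy', 'Rainy', 'Overcast',
           'Sunny', 'Sunny', 'Rainy', 'Sunny', 'Overcast', 'Overcast', 'Rainy']

# A dedicated encoder keeps the weather mapping available for later use.
weather_encoder = LabelEncoder()
weather_encoded = weather_encoder.fit_transform(weather)

# classes_ holds the original strings in sorted order; the index is the code.
mapping = {str(name): code for code, name in enumerate(weather_encoder.classes_)}
print(mapping)

# inverse_transform turns codes back into the original strings.
decoded = weather_encoder.inverse_transform([0, 2])
print(decoded)
```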


Similarly, encode temp and play

In [3]:
# Encode temp and play the same way
temp_encoded = le.fit_transform(temp)
label = le.fit_transform(play)
print("Temp:", temp_encoded)
print("Play:", label)

Temp: [1 1 1 2 0 0 0 2 0 2 2 2 1 2]
Play: [0 0 1 1 1 0 1 0 1 1 1 1 1 0]


Combine both the features (weather and temp) in a single variable (list of tuples).

In [4]:
#combine weather and temp into tuple

features=list(zip(weather_encoded,temp_encoded))
print(features)

[(2, 1), (2, 1), (0, 1), (1, 2), (1, 0), (1, 0), (0, 0), (2, 2), (2, 0), (1, 2), (2, 2), (0, 2), (0, 1), (1, 2)]


### Generate model by:
<ul>
    <li>Create a Naive Bayes classifier</li>
    <li>Fit the classifier to the dataset</li>
    <li>Perform prediction</li>
</ul>


In [5]:
#Import Gaussian Naive Bayes model
from sklearn.naive_bayes import GaussianNB

#Create a Gaussian Classifier
model = GaussianNB()

# Train the model using the training sets
model.fit(features,label)

#Predict Output: can play or not if it is overcast and mild
predicted= model.predict([[0,2]]) # 0:Overcast, 2:Mild
print("Predicted Value:", predicted)

Predicted Value: [1]


1 indicates "Yes", meaning the players can play.
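Note that `GaussianNB` treats the encoded values as continuous numbers. As a possible alternative, not part of the original walkthrough, scikit-learn also provides `CategoricalNB`, which models each feature as a set of categories. The sketch below assumes scikit-learn 0.22 or later and reuses the encoded values printed above; `cat_model` is a name introduced here.

```python
from sklearn.naive_bayes import CategoricalNB

# Encoded (weather, temp) pairs and labels, copied from the outputs above.
features = [(2, 1), (2, 1), (0, 1), (1, 2), (1, 0), (1, 0), (0, 0),
            (2, 2), (2, 0), (1, 2), (2, 2), (0, 2), (0, 1), (1, 2)]
label = [0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0]

# alpha=1.0 applies additive smoothing to the category counts.
cat_model = CategoricalNB(alpha=1.0)
cat_model.fit(features, label)

# Same query as above: 0 = Overcast, 2 = Mild.
cat_predicted = cat_model.predict([[0, 2]])
cat_proba = cat_model.predict_proba([[0, 2]])
print("Predicted Value:", cat_predicted)
print("Class probabilities:", cat_proba)
```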

### Naive Bayes with Multiple Labels
#### Loading the dataset
The dataset consists of the results of a chemical analysis of wines grown in the same region in Italy but derived from three different cultivars.

The dataset comprises 13 features (alcohol, malic_acid, ash, alcalinity_of_ash, magnesium, total_phenols, flavanoids, nonflavanoid_phenols, proanthocyanins, color_intensity, hue, od280/od315_of_diluted_wines, proline) and the type of wine cultivar. It has three types of wine: class_0, class_1, and class_2.

In [6]:
#Loading data
#Import scikit-learn dataset library
from sklearn import datasets

wine = datasets.load_wine()

#### Exploring Data

In [7]:
# print the names of the 13 features
print("Features: ", wine.feature_names)

# print the label type of wine(class_0, class_1, class_2)
print("\nLabels: ", wine.target_names)

Features:  ['alcohol', 'malic_acid', 'ash', 'alcalinity_of_ash', 'magnesium', 'total_phenols', 'flavanoids', 'nonflavanoid_phenols', 'proanthocyanins', 'color_intensity', 'hue', 'od280/od315_of_diluted_wines', 'proline']

Labels:  ['class_0' 'class_1' 'class_2']


In [8]:
# print data(feature)shape
wine.data.shape

(178, 13)

In [9]:
# print the wine data features (top 5 records)
print(wine.data[0:5])

[[1.423e+01 1.710e+00 2.430e+00 1.560e+01 1.270e+02 2.800e+00 3.060e+00
  2.800e-01 2.290e+00 5.640e+00 1.040e+00 3.920e+00 1.065e+03]
 [1.320e+01 1.780e+00 2.140e+00 1.120e+01 1.000e+02 2.650e+00 2.760e+00
  2.600e-01 1.280e+00 4.380e+00 1.050e+00 3.400e+00 1.050e+03]
 [1.316e+01 2.360e+00 2.670e+00 1.860e+01 1.010e+02 2.800e+00 3.240e+00
  3.000e-01 2.810e+00 5.680e+00 1.030e+00 3.170e+00 1.185e+03]
 [1.437e+01 1.950e+00 2.500e+00 1.680e+01 1.130e+02 3.850e+00 3.490e+00
  2.400e-01 2.180e+00 7.800e+00 8.600e-01 3.450e+00 1.480e+03]
 [1.324e+01 2.590e+00 2.870e+00 2.100e+01 1.180e+02 2.800e+00 2.690e+00
  3.900e-01 1.820e+00 4.320e+00 1.040e+00 2.930e+00 7.350e+02]]


In [10]:
# print the wine labels (0:class_0, 1:class_1, 2:class_2)
print(wine.target)

[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2]
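The raw label array is hard to read at a glance. A short sketch of counting the samples per class follows; `class_counts` is a name introduced here.

```python
import numpy as np
from sklearn.datasets import load_wine

wine = load_wine()

# np.bincount counts how often each integer label (0, 1, 2) occurs.
counts = np.bincount(wine.target)
class_counts = {str(name): int(n) for name, n in zip(wine.target_names, counts)}
print(class_counts)
```

The classes are not perfectly balanced, which is worth keeping in mind when reading the accuracy score later.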


#### Splitting Data
Separate the columns into dependent and independent variables, then split those variables into a training set and a test set, with 70% for training and 30% for testing.

In [12]:
# Import train_test_split function
from sklearn.model_selection import train_test_split

# Split dataset into training set and test set
X_train, X_test, y_train, y_test = train_test_split(wine.data, wine.target, test_size=0.3,random_state=109)
print("X_train", X_train)
print("X_test", X_test)
print("y_train", y_train)
print("y_test", y_test)

X_train [[1.323e+01 3.300e+00 2.280e+00 ... 5.600e-01 1.510e+00 6.750e+02]
 [1.384e+01 4.120e+00 2.380e+00 ... 5.700e-01 1.640e+00 4.800e+02]
 [1.220e+01 3.030e+00 2.320e+00 ... 6.600e-01 1.830e+00 5.100e+02]
 ...
 [1.362e+01 4.950e+00 2.350e+00 ... 9.100e-01 2.050e+00 5.500e+02]
 [1.336e+01 2.560e+00 2.350e+00 ... 7.000e-01 2.470e+00 7.800e+02]
 [1.439e+01 1.870e+00 2.450e+00 ... 1.020e+00 3.580e+00 1.290e+03]]
X_test [[1.330000e+01 1.720000e+00 2.140000e+00 1.700000e+01 9.400000e+01
  2.400000e+00 2.190000e+00 2.700000e-01 1.350000e+00 3.950000e+00
  1.020000e+00 2.770000e+00 1.285000e+03]
 [1.293000e+01 3.800000e+00 2.650000e+00 1.860000e+01 1.020000e+02
  2.410000e+00 2.410000e+00 2.500000e-01 1.980000e+00 4.500000e+00
  1.030000e+00 3.520000e+00 7.700000e+02]
 [1.221000e+01 1.190000e+00 1.750000e+00 1.680000e+01 1.510000e+02
  1.850000e+00 1.280000e+00 1.400000e-01 2.500000e+00 2.850000e+00
  1.280000e+00 3.070000e+00 7.180000e+02]
 [1.253000e+01 5.510000e+00 2.640000e+00 2.500000
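Printing the full arrays produces a lot of output. As a lighter check, the shapes of the splits can be inspected instead. This sketch repeats the same split with the same `random_state`.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split

wine = load_wine()
X_train, X_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.3, random_state=109)

# Rows are samples, columns are the 13 features.
print("Train:", X_train.shape, y_train.shape)
print("Test: ", X_test.shape, y_test.shape)
```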

#### Model Generation
After splitting, you will generate a Gaussian Naive Bayes model on the training set and perform prediction on the test set features.

In [13]:
#Import Gaussian Naive Bayes model
from sklearn.naive_bayes import GaussianNB

#Create a Gaussian Classifier
gnb = GaussianNB()

#Train the model using the training sets
gnb.fit(X_train, y_train)

#Predict the response for test dataset
y_pred = gnb.predict(X_test)
print(y_pred)

[0 0 1 2 0 1 0 0 1 0 2 2 2 2 0 1 1 0 0 1 2 1 0 2 0 0 1 2 0 1 2 1 1 0 1 1 0
 2 2 0 2 1 0 0 0 2 2 0 1 1 2 0 0 2]
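Beyond the predicted labels, it can be useful to look at what the fitted model has learned. The sketch below, which repeats the steps above so that it runs on its own, reads the class priors and per-class feature means from the fitted `GaussianNB` and asks for class probabilities instead of hard labels.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

wine = load_wine()
X_train, X_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.3, random_state=109)

gnb = GaussianNB()
gnb.fit(X_train, y_train)

# Prior probability of each class, estimated from the training labels.
print("Class priors:", gnb.class_prior_)

# Mean of every feature within each class: one row per class.
print("Feature means shape:", gnb.theta_.shape)

# Posterior probability of each class for the first three test samples.
proba = gnb.predict_proba(X_test[:3])
print(proba)
```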


#### Evaluating model
After model generation, check the accuracy using actual and predicted values.

In [39]:
#Import scikit-learn metrics module for accuracy calculation
from sklearn import metrics

print(y_test)
# Model Accuracy, how often is the classifier correct?
print("Accuracy:",metrics.accuracy_score(y_test, y_pred))

[0 0 1 2 0 1 0 1 1 0 1 1 2 2 0 1 1 0 0 1 2 1 0 2 0 0 1 2 0 1 2 1 1 0 1 1 0
 2 2 0 2 0 0 0 0 2 2 0 1 1 2 1 0 2]
Accuracy: 0.9074074074074074
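Accuracy alone does not show which classes are confused with each other. A sketch of the precision and recall measures mentioned in the workflow section, computed with `confusion_matrix` and `classification_report`, could look like this; it repeats the earlier steps so that it runs on its own.

```python
from sklearn import metrics
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

wine = load_wine()
X_train, X_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.3, random_state=109)

y_pred = GaussianNB().fit(X_train, y_train).predict(X_test)

# Rows are the actual classes, columns are the predicted classes.
cm = metrics.confusion_matrix(y_test, y_pred)
print(cm)

# Precision, recall, and F1 score per class.
print(metrics.classification_report(y_test, y_pred,
                                    target_names=wine.target_names))

accuracy = metrics.accuracy_score(y_test, y_pred)
```

The diagonal of the confusion matrix holds the correctly classified samples, so its sum divided by the total number of test samples equals the accuracy.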


### Advantages
<ul>
    <li>It is not only a simple approach but also a fast and accurate method for prediction.</li>
    <li>Naive Bayes has very low computation cost.</li>
    <li>It can efficiently work on a large dataset.</li>
    <li>It performs well with a discrete response variable compared to a continuous one.</li>
    <li>It can be used with multiple class prediction problems.</li>
    <li>It also performs well in the case of text analytics problems.</li>
    <li>When the assumption of independence holds, a Naive Bayes classifier can perform better than other models such as logistic regression.</li>
</ul>

### Disadvantages
<ul>
    <li>It assumes independent features. In practice, it is almost impossible to obtain a set of predictors that are entirely independent.</li>
    <li>If a feature value never occurs together with a particular class in the training data, its likelihood is zero, which makes the posterior probability of that class zero regardless of the other features. In this case, the model cannot make a meaningful prediction for that class. This problem is known as the Zero Probability/Frequency Problem.</li>
</ul>
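One common remedy for the zero-frequency problem is additive (Laplace) smoothing, which adds a small constant to every count. The sketch below applies it to the weather example, where no "No" day was overcast; the function name and its parameters are introduced here for illustration.

```python
from fractions import Fraction

def smoothed_likelihood(count, class_total, n_values, alpha=1):
    """Estimate P(value | class) with additive (Laplace) smoothing.

    count:       times the value occurs within the class
    class_total: number of training samples in the class
    n_values:    number of distinct values the feature can take
    alpha:       smoothing constant (1 gives Laplace smoothing)
    """
    return Fraction(count + alpha, class_total + alpha * n_values)

# P(Overcast | No): 0 of the 5 "No" days were overcast,
# and weather has 3 possible values (Overcast, Rainy, Sunny).
unsmoothed = Fraction(0, 5)
smoothed = smoothed_likelihood(count=0, class_total=5, n_values=3)
print(unsmoothed, smoothed)
```

With smoothing, the likelihood becomes small but non-zero, so a single unseen combination no longer forces the whole posterior to zero.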