# Classification Using Bayesian Classifiers

### Bayesian Classifiers
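Bayesian classifiers are probabilistic classifiers based on Bayes' theorem: they predict the class with the highest posterior probability given the input features. Naïve Bayes in particular is highly scalable, because it assumes the features are conditionally independent given the class, so the number of parameters grows linearly with the number of features. This makes it a common choice when the dimensionality of the inputs is high.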
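For reference, the decision rule can be written as follows (standard notation, not taken from the original notebook). For a class $C$ and a feature vector $\mathbf{x} = (x_1, \dots, x_n)$, Bayes' theorem gives the posterior, and Naïve Bayes additionally factorizes the likelihood under the conditional-independence assumption:

$$
P(C \mid \mathbf{x}) = \frac{P(\mathbf{x} \mid C)\,P(C)}{P(\mathbf{x})},
\qquad
\hat{y} = \arg\max_{C} \; P(C) \prod_{i=1}^{n} P(x_i \mid C)
$$

The denominator $P(\mathbf{x})$ is the same for every class, so it can be dropped when taking the $\arg\max$. In Gaussian Naïve Bayes, each $P(x_i \mid C)$ is modeled as a normal density with a per-class mean and variance.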

### Types
1. Naïve Bayes
2. Bayesian Belief Network

### Problem Statement

UCI dataset: Skin Segmentation Data Set (https://archive.ics.uci.edu/ml/machine-learning-databases/00229/).
The Skin Segmentation dataset is built in the B, G, R color space. The skin and non-skin samples were generated from face images of people of diverse age, gender, and race. Each sample is labeled 1 (skin) or 2 (non-skin), and the task is to predict whether a given BGR combination is a skin color or not.


```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score
from sklearn.metrics import precision_recall_fscore_support
```

The following code reads the tab-separated text file into a DataFrame, splitting the B, G, R values and the class label into separate columns. Each column is named as specified below:

```
Column No.   Expected Column Name
1            BLUE
2            GREEN
3            RED
4            RESULT     
```
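The `names` argument matters here because the file has no header row. A minimal sketch on two hypothetical lines in the same tab-separated layout shows what happens with and without it:

```python
import io

import pandas as pd

# Two hypothetical lines in the same B, G, R, label layout as the data file.
raw = "74\t85\t123\t1\n73\t84\t122\t1\n"

# Without names=, pandas treats the first data row as the header.
df_no_names = pd.read_csv(io.StringIO(raw), sep='\t')
print(df_no_names.columns.tolist())  # ['74', '85', '123', '1']

# With names=, every line is kept as data and the columns get the expected names.
df_named = pd.read_csv(io.StringIO(raw), sep='\t', names=['BLUE', 'GREEN', 'RED', 'RESULT'])
print(df_named.shape)  # (2, 4)
```

Passing `names` implies `header=None`, so no row is lost.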

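```python
# Read the tab-separated file; names= assigns the four column names (the file has no header row)
df = pd.read_csv(
    filepath_or_buffer='data/Skin_NonSkin.txt', sep='\t', names=['BLUE', 'GREEN', 'RED', 'RESULT'])
df.head()  # first five rows
```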

|   | BLUE | GREEN | RED | RESULT |
|---|------|-------|-----|--------|
| 0 | 74   | 85    | 123 | 1      |
| 1 | 73   | 84    | 122 | 1      |
| 2 | 72   | 83    | 121 | 1      |
| 3 | 70   | 81    | 119 | 1      |
| 4 | 70   | 81    | 119 | 1      |


Define `X` containing the B, G, R components and `y` containing the class label, then split them into training and test sets. Holding out a test set lets us estimate the accuracy of the classifier on data it has not seen during training.

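```python
# Features: the B, G, R components; target: the class label (1 = skin, 2 = non-skin)
X = df[['BLUE', 'GREEN', 'RED']]
y = df['RESULT']
# Hold out 25% of the rows as a test set; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
```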
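`train_test_split` shuffles the rows before splitting, so the class proportions in the test set are only approximately those of the full dataset. If exact proportions matter, the `stratify` argument preserves them. The sketch below uses synthetic labels (a 1:4 ratio chosen only for illustration) and is not part of the original analysis:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical labels with a 1:4 ratio, loosely mimicking skin (1) vs non-skin (2).
y_toy = np.array([1] * 20 + [2] * 80)
X_toy = np.arange(100).reshape(-1, 1)  # placeholder feature column

# stratify=y_toy keeps the 20% share of class 1 in both the train and test parts.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.25, random_state=42, stratify=y_toy)

print(len(X_tr), len(X_te))  # 75 25
print(np.mean(y_te == 1))    # 0.2
```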

The code in the next cell classifies the test data in four steps:

1. Create a Gaussian Naïve Bayes classifier (`GaussianNB`, imported above).
2. Fit the model on the training data (`X_train`: attributes, `y_train`: labels).
3. Use the trained model to predict the labels of the test data (`X_test`).
4. Calculate the accuracy score from the actual labels (`y_test`) and the predicted labels (`y_pred`).
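To make steps 2 and 3 concrete, here is a minimal from-scratch sketch of Gaussian Naïve Bayes: fitting estimates a mean, a variance and a prior per class, and prediction picks the class with the largest log posterior. The toy data and the helper `gaussian_nb_predict` are hypothetical. scikit-learn additionally adds a tiny `var_smoothing` term to the variances, which this sketch omits; on well-separated data the predictions still agree.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Hypothetical toy data: two well-separated classes with three features each.
rng = np.random.default_rng(0)
X_toy = np.vstack([rng.normal(60, 10, size=(50, 3)),
                   rng.normal(180, 10, size=(50, 3))])
y_toy = np.array([1] * 50 + [2] * 50)


def gaussian_nb_predict(X_fit, y_fit, X_new):
    """Hypothetical helper: hand-rolled Gaussian Naive Bayes prediction."""
    classes = np.unique(y_fit)
    log_post = []
    for c in classes:
        Xc = X_fit[y_fit == c]
        mean = Xc.mean(axis=0)
        var = Xc.var(axis=0)  # maximum-likelihood variance (ddof=0)
        log_prior = np.log(len(Xc) / len(X_fit))
        # Summing per-feature Gaussian log-densities is the "naive" independence assumption.
        log_lik = -0.5 * np.sum(np.log(2 * np.pi * var) + (X_new - mean) ** 2 / var, axis=1)
        log_post.append(log_prior + log_lik)
    # Pick the class with the highest log posterior for each row.
    return classes[np.argmax(np.column_stack(log_post), axis=1)]


manual = gaussian_nb_predict(X_toy, y_toy, X_toy)
sk = GaussianNB().fit(X_toy, y_toy).predict(X_toy)
print((manual == sk).all())  # True
```

Working in log space avoids underflow when many small densities are multiplied.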

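```python
gnb = GaussianNB()
# Fit on the training data (np.ravel flattens the labels to 1-D), then predict the test labels
y_pred = gnb.fit(X_train, np.ravel(y_train)).predict(X_test)
# Fraction of test samples whose predicted label matches the true label
accuracy_score(np.ravel(y_test), y_pred)
```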

0.9239206724883702
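Because the classes are imbalanced (most samples are non-skin, class 2), this accuracy is easier to judge next to a majority-class baseline, i.e. the accuracy of always predicting the most frequent label. A minimal sketch with scikit-learn's `DummyClassifier` on hypothetical labels; on the real data you would fit it on `X_train`, `y_train` and score it on `X_test`, `y_test`.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score

# Hypothetical imbalanced labels: 20 skin (1) and 80 non-skin (2) samples.
y_toy = np.array([1] * 20 + [2] * 80)
X_toy = np.zeros((100, 3))  # features are ignored by the "most_frequent" strategy

baseline = DummyClassifier(strategy="most_frequent").fit(X_toy, y_toy)
acc = accuracy_score(y_toy, baseline.predict(X_toy))
print(acc)  # 0.8
```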

Compute the precision, recall, and F-score of the GaussianNB predictions on the test set, treating each class in turn as the positive class.
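For a chosen positive class, precision is TP / (TP + FP), recall is TP / (TP + FN), and the F-score is their harmonic mean. A small sketch on hypothetical labels computes these by hand and checks them against `precision_recall_fscore_support`:

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

# Hypothetical labels and predictions (1 = skin, 2 = non-skin); class 1 is the positive class.
y_true = np.array([1, 1, 1, 1, 2, 2, 2, 2])
y_pred = np.array([1, 1, 2, 2, 1, 2, 2, 2])

tp = np.sum((y_pred == 1) & (y_true == 1))  # true positives: 2
fp = np.sum((y_pred == 1) & (y_true != 1))  # false positives: 1
fn = np.sum((y_pred != 1) & (y_true == 1))  # false negatives: 2

precision = tp / (tp + fp)                               # 2/3
recall = tp / (tp + fn)                                  # 1/2
f_score = 2 * precision * recall / (precision + recall)  # 4/7

p, r, f, _ = precision_recall_fscore_support(y_true, y_pred, average='binary', pos_label=1)
print(np.allclose([precision, recall, f_score], [p, r, f]))  # True
```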


```python
# Precision, recall and F-score with each class in turn as the positive class
# (label 1 = skin, label 2 = non-skin)
for label in [1, 2]:
    if label == 2:
        print(" ------------------------------------------------------ ")
    precision, recall, f_score, _ = precision_recall_fscore_support(
        np.ravel(y_test), y_pred, average='binary', pos_label=label)
    print(f"precision for skin color value as {label} = ", precision)
    print(f"recall for skin color value as {label} = ", recall)
    print(f"f_score for skin color value as {label} = ", f_score)
```

```
precision for skin color value as 1 =  0.8738865447726207
recall for skin color value as 1 =  0.7375751820196265
f_score for skin color value as 1 =  0.7999656667095834
 ------------------------------------------------------ 
precision for skin color value as 2 =  0.9344664031620553
recall for skin color value as 2 =  0.9723416068601041
f_score for skin color value as 2 =  0.953027844682502
```
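The per-class figures can also be obtained in a single call: with `average=None`, `precision_recall_fscore_support` returns one value per label, in the order given by `labels=`. A sketch on hypothetical labels:

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

# Hypothetical labels and predictions (1 = skin, 2 = non-skin).
y_true = np.array([1, 1, 1, 1, 2, 2, 2, 2])
y_pred = np.array([1, 1, 2, 2, 1, 2, 2, 2])

# average=None: arrays with one entry per label, here [label 1, label 2].
precision, recall, f_score, support = precision_recall_fscore_support(
    y_true, y_pred, average=None, labels=[1, 2])
print(precision, recall, f_score, support)
```

The fourth return value, `support`, is the number of true samples of each label.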


Train a Multinomial or Bernoulli Naive Bayes classifier on `X_train` and `y_train` using scikit-learn, and calculate the accuracy, precision, recall, and F1-score of the trained model on the test set.

```python
from sklearn.naive_bayes import BernoulliNB

# Bernoulli Naive Bayes with Laplace smoothing (alpha=1.0)
bnb = BernoulliNB(alpha=1.0)
y_pred = bnb.fit(X_train, np.ravel(y_train)).predict(X_test)
# Precision, recall and F-score with each class in turn as the positive class
for label in [1, 2]:
    if label == 2:
        print(" ------------------------------------------------------ ")
    precision, recall, f_score, _ = precision_recall_fscore_support(
        np.ravel(y_test), y_pred, average='binary', pos_label=label)
    print(f"precision for skin color value as {label} = ", precision)
    print(f"recall for skin color value as {label} = ", recall)
    print(f"f_score for skin color value as {label} = ", f_score)
```

```
precision for skin color value as 1 =  0.0
recall for skin color value as 1 =  0.0
f_score for skin color value as 1 =  0.0
 ------------------------------------------------------ 
precision for skin color value as 2 =  0.7937484697625071
recall for skin color value as 2 =  1.0
f_score for skin color value as 2 =  0.8850164704169473
```


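scikit-learn also raised an `UndefinedMetricWarning` for this cell: no test sample was predicted as class 1, so precision and F-score for class 1 are ill-defined and are reported as 0.0.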
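A likely explanation for the 0.0 scores on class 1: `BernoulliNB` binarizes its inputs with `binarize=0.0` by default, mapping every value greater than 0 to 1. Since most B, G, R values are positive, nearly every row becomes `[1, 1, 1]`, so the model effectively sees a single input pattern and predicts the majority class 2 everywhere. The sketch below shows this on a few hypothetical BGR rows (not taken from the dataset); the threshold of 127 at the end is an arbitrary illustration, not a tuned value.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB
from sklearn.preprocessing import binarize

# Hypothetical BGR rows; every channel value is above 0.
X_toy = np.array([[74, 85, 123],
                  [70, 81, 119],
                  [200, 210, 220],
                  [180, 190, 30],
                  [90, 100, 140]])
y_toy = np.array([1, 1, 2, 2, 2])

# With the default threshold of 0.0, every value > 0 becomes 1: all rows collapse to [1, 1, 1].
X_bin = binarize(X_toy, threshold=0.0)
print(X_bin)

# Identical binarized inputs get identical predictions, here the majority class 2.
pred_default = BernoulliNB(alpha=1.0).fit(X_toy, y_toy).predict(X_toy)
print(pred_default)  # [2 2 2 2 2]

# A higher (hypothetical) threshold keeps some information about channel intensity.
print(binarize(X_toy, threshold=127.0))
```

In `BernoulliNB` itself the threshold is set through the constructor's `binarize` parameter.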
