# Naive Bayes Classification using Scikit-learn
Find out more in [this DataCamp tutorial](https://www.datacamp.com/community/tutorials/naive-bayes-scikit-learn).


## Classification Workflow
+ Understand the problem and identify the potential features and the label.
+ Features are the characteristics or attributes that affect the value of the label.
+ Classification has two phases:
    + a learning phase
    + an evaluation phase
+ Performance is evaluated using metrics such as:
    + accuracy, error, precision, and recall.

![Classification workflow](images/nbc.webp "Classification workflow")
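
The evaluation metrics listed above can be sketched in plain Python. This is a minimal illustration for a binary label; the helper name `classification_metrics` and the toy `y_true`/`y_pred` lists are made up for the example and do not come from the tutorial.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, error, precision, and recall for a binary label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    return {
        'accuracy': accuracy,
        'error': 1 - accuracy,
        # Guard against division by zero when a class is never predicted/present
        'precision': tp / (tp + fp) if tp + fp else 0.0,
        'recall': tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical true labels and predictions, for illustration only
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In practice these values would be computed on data held out from the learning phase.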

## What is a Naive Bayes Classifier?
+ A Naive Bayes classifier assumes that, within a class, the effect of a particular feature is independent of the other features.
+ Even if the features actually depend on each other, each one is still treated independently.
+ This assumption simplifies computation, which is why the classifier is called naive.
+ The assumption is called class conditional independence.

![Bayes Theorem Equation for Naive Bayes Classification](images/nbc_equation.webp "Bayes Theorem Equation")
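
For reference, Bayes' theorem and the class conditional independence assumption can be written out as follows. The symbols $y$ (class label) and $x_1, \dots, x_n$ (feature values) are notation chosen here for illustration and may differ from the figure.

$$
P(y \mid x_1, \dots, x_n) = \frac{P(y)\, P(x_1, \dots, x_n \mid y)}{P(x_1, \dots, x_n)}
$$

Under class conditional independence the joint likelihood factorizes, so the posterior is proportional to the prior times the per-feature likelihoods:

$$
P(x_1, \dots, x_n \mid y) = \prod_{i=1}^{n} P(x_i \mid y)
\quad\Longrightarrow\quad
P(y \mid x_1, \dots, x_n) \propto P(y) \prod_{i=1}^{n} P(x_i \mid y)
$$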

## How does the Naive Bayes classifier work?
### First Approach (In case of a single feature)
Naive Bayes classifier calculates the probability of an event in the following steps:

+ **Step 1:** Calculate the prior probability of each class label.
+ **Step 2:** Find the likelihood of each attribute value for each class.
+ **Step 3:** Put these values into Bayes' formula and calculate the posterior probability.
+ **Step 4:** Compare the posteriors; the input is assigned to the class with the higher probability.

![Weather Table](images/wheather-table-1.webp "Weather Table")
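
The four steps can be sketched in plain Python for a single feature. The sketch below uses the weather and play columns from the "Defining Dataset" section of this document; the helper name `posterior` is hypothetical, and the counts are taken directly from the raw data without any smoothing.

```python
from collections import Counter

weather = ['Sunny', 'Sunny', 'Overcast', 'Rainy', 'Rainy', 'Rainy', 'Overcast',
           'Sunny', 'Sunny', 'Rainy', 'Sunny', 'Overcast', 'Overcast', 'Rainy']
play = ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes',
        'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No']


def posterior(feature_values, labels, value, target):
    """Return P(target | value) using Bayes' theorem on raw counts."""
    n = len(labels)
    class_count = Counter(labels)[target]

    # Step 1: prior probability of the class
    prior = class_count / n
    # Step 2: likelihood of the feature value given the class
    matches = sum(1 for f, l in zip(feature_values, labels)
                  if f == value and l == target)
    likelihood = matches / class_count
    # Step 3: divide by the evidence P(value) to get the posterior
    evidence = feature_values.count(value) / n
    return prior * likelihood / evidence


p_yes = posterior(weather, play, 'Sunny', 'Yes')
p_no = posterior(weather, play, 'Sunny', 'No')

# Step 4: the class with the higher posterior wins
prediction = 'Yes' if p_yes > p_no else 'No'
print(f'P(Yes | Sunny) = {p_yes:.2f}, P(No | Sunny) = {p_no:.2f} -> {prediction}')
```

For a sunny day, the posterior for 'No' (3 of the 5 sunny days) is higher than for 'Yes' (2 of 5), so this sketch predicts 'No'.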

### Second Approach (In case of multiple features)

![Weather Table](images/wheather-table-2.webp "Weather Table")
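
With several features, the same calculation multiplies one likelihood per feature, relying on the independence assumption. The following is a minimal sketch using the dataset from the "Defining Dataset" section; `naive_bayes_posteriors` is a hypothetical helper, and no smoothing is applied, so a feature value never seen with a class drives that class's score to zero.

```python
from collections import Counter

weather = ['Sunny', 'Sunny', 'Overcast', 'Rainy', 'Rainy', 'Rainy', 'Overcast',
           'Sunny', 'Sunny', 'Rainy', 'Sunny', 'Overcast', 'Overcast', 'Rainy']
temp = ['Hot', 'Hot', 'Hot', 'Mild', 'Cool', 'Cool', 'Cool',
        'Mild', 'Cool', 'Mild', 'Mild', 'Mild', 'Hot', 'Mild']
play = ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes',
        'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No']


def naive_bayes_posteriors(feature_columns, labels, query):
    """Return normalized P(class | query) under the naive independence assumption."""
    n = len(labels)
    scores = {}
    for cls, count in Counter(labels).items():
        score = count / n  # prior P(class)
        for column, value in zip(feature_columns, query):
            matches = sum(1 for f, l in zip(column, labels)
                          if f == value and l == cls)
            score *= matches / count  # likelihood P(value | class)
        scores[cls] = score

    # Normalize so the posteriors sum to 1 (skipped if every score is zero)
    total = sum(scores.values())
    return {cls: s / total for cls, s in scores.items()} if total else scores


probs = naive_bayes_posteriors([weather, temp], play, ('Overcast', 'Mild'))
print(probs)
```

In this dataset every overcast day has the label 'Yes', so the unsmoothed score for 'No' is exactly zero for this query.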

## Classifier Building in Scikit-learn
### Naive Bayes Classifier
#### Defining Dataset
In this example, you use a dummy dataset with three columns: weather, temperature, and play. The first two (weather and temperature) are the features, and the third (play) is the label.

In [11]:
# Assigning features and label variables

weather = ['Sunny','Sunny','Overcast','Rainy','Rainy','Rainy','Overcast',
           'Sunny','Sunny','Rainy','Sunny','Overcast','Overcast','Rainy']

temp = ['Hot','Hot','Hot','Mild','Cool','Cool','Cool',
        'Mild','Cool','Mild','Mild','Mild','Hot','Mild']

play = ['No','No','Yes','Yes','Yes','No','Yes',
        'No','Yes','Yes','Yes','Yes','Yes','No']

#### Encoding Features
First, you need to convert these string labels into numbers, for example 'Overcast', 'Rainy', 'Sunny' as 0, 1, 2. This is known as label encoding. Scikit-learn provides the LabelEncoder class, which encodes labels with values between 0 and one less than the number of discrete classes.

In [20]:
# Import LabelEncoder
from sklearn.preprocessing import LabelEncoder

# Create a LabelEncoder
le = LabelEncoder()

# Convert string labels into numbers
weather_encoded = le.fit_transform(weather)
print(weather_encoded)

[2 2 0 1 1 1 0 2 2 1 2 0 0 1]
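
The codes follow from the sorted order of the distinct values. As a rough illustration of that idea (not scikit-learn's actual implementation), the same mapping can be reproduced in plain Python; the names `classes`, `mapping`, and `encoded` are chosen here for the example.

```python
weather = ['Sunny', 'Sunny', 'Overcast', 'Rainy', 'Rainy', 'Rainy', 'Overcast',
           'Sunny', 'Sunny', 'Rainy', 'Sunny', 'Overcast', 'Overcast', 'Rainy']

# Distinct values in sorted order: position in this list becomes the code
classes = sorted(set(weather))
mapping = {name: code for code, name in enumerate(classes)}

# Replace each string with its integer code
encoded = [mapping[value] for value in weather]

print(mapping)
print(encoded)
```

With a fitted `LabelEncoder`, the equivalent information is available through its `classes_` attribute, and `inverse_transform` maps codes back to the original strings.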


Similarly, you can encode the temp and play columns.

In [21]:
# Converting string labels into numbers

temp_encoded = le.fit_transform(temp)
label = le.fit_transform(play)

print(f'Temp: {temp_encoded}')
print(f'Play: {label}')

Temp: [1 1 1 2 0 0 0 2 0 2 2 2 1 2]
Play: [0 0 1 1 1 0 1 0 1 1 1 1 1 0]


Now combine both features (weather and temp) into a single variable (a list of tuples).

In [24]:
# Combine weather and temp into a single list of tuples

features_zip = zip(weather_encoded,temp_encoded)
features = list(features_zip)
print(features)

[(2, 1), (2, 1), (0, 1), (1, 2), (1, 0), (1, 0), (0, 0), (2, 2), (2, 0), (1, 2), (2, 2), (0, 2), (0, 1), (1, 2)]


#### Generating Model
Generate a model using the Naive Bayes classifier in the following steps:
+ Create a Naive Bayes classifier
+ Fit the classifier to the dataset
+ Perform prediction

In [25]:
# Import the Gaussian Naive Bayes model
from sklearn.naive_bayes import GaussianNB

# Create a Gaussian Naive Bayes classifier
model = GaussianNB()

# Train the model on the encoded features and labels
model.fit(features, label)

# Predict the output
predicted = model.predict([[0, 2]])  # 0: Overcast, 2: Mild
print(f'Predicted Value: {predicted}')

Predicted Value: [1]


Here, 1 is the encoded value of 'Yes', which indicates that the players can play.
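
To make the prediction less of a black box, here is a rough from-scratch sketch of the Gaussian Naive Bayes idea: for each class, estimate a prior plus a mean and variance per feature, then pick the class with the highest log posterior. It treats the encoded integers as continuous values, as a Gaussian model does. The helpers `fit_gaussian_nb` and `predict_gaussian_nb` are hypothetical names, and the sketch omits the small variance-smoothing term that scikit-learn's `GaussianNB` adds, so it is an approximation of the library's behavior rather than a copy of it.

```python
import math

# Encoded features and labels, copied from the outputs above
features = [(2, 1), (2, 1), (0, 1), (1, 2), (1, 0), (1, 0), (0, 0),
            (2, 2), (2, 0), (1, 2), (2, 2), (0, 2), (0, 1), (1, 2)]
label = [0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0]


def fit_gaussian_nb(X, y):
    """Estimate the prior and per-feature mean/variance for each class."""
    params = {}
    for cls in sorted(set(y)):
        rows = [x for x, target in zip(X, y) if target == cls]
        columns = list(zip(*rows))  # one tuple of values per feature
        means = [sum(col) / len(col) for col in columns]
        variances = [sum((v - m) ** 2 for v in col) / len(col)
                     for col, m in zip(columns, means)]
        params[cls] = (len(rows) / len(y), means, variances)
    return params


def predict_gaussian_nb(params, x):
    """Return the class with the highest log posterior (up to a constant)."""
    def log_posterior(cls):
        prior, means, variances = params[cls]
        total = math.log(prior)
        for value, mean, var in zip(x, means, variances):
            # Log of the normal density for this feature value
            total += -0.5 * math.log(2 * math.pi * var)
            total -= (value - mean) ** 2 / (2 * var)
        return total

    return max(params, key=log_posterior)


params = fit_gaussian_nb(features, label)
predicted_class = predict_gaussian_nb(params, (0, 2))  # 0: Overcast, 2: Mild
print(f'Predicted class: {predicted_class}')
```

On this dataset the sketch agrees with the model's prediction of class 1 for an overcast, mild day.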

### Naive Bayes with Multiple Labels