# Naive Bayes classifiers

## Naive Bayes with categorical features from scratch

The features are assumed to be generated from a simple multinomial distribution.
The multinomial distribution describes the probability of observing counts among a number of categories, and thus multinomial naive Bayes is most appropriate for features that represent counts or count rates.
The goal is to model the data distribution of each class with a best-fit multinomial distribution.
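
For reference, the decision rule that this section implements can be written as below. The notation is an assumption introduced here for illustration: $x_1, \dots, x_d$ are the feature values of one observation and $y$ ranges over the classes. The first expression picks the class with the largest score, and the second normalizes the scores into posterior probabilities.

```latex
\hat{y} = \arg\max_{y} \; P(y) \prod_{j=1}^{d} P(x_j \mid y),
\qquad
P(y \mid x_1, \dots, x_d) =
\frac{P(y) \prod_{j=1}^{d} P(x_j \mid y)}
     {\sum_{y'} P(y') \prod_{j=1}^{d} P(x_j \mid y')}
```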

In [1]:
"""

First, let us design a simple dataset of weather-related features:
    - outlook: in {'sunny', 'overcast', 'rainy'}
    - temp: level of temperature in {'hot', 'cool', 'mild'}
    - humidity: level of humidity in {'high', 'normal'}
    - windy: whether it is windy or not, in {'yes', 'no'}
We then aim to predict whether a tennis match can be played: "play" in {'yes', 'no'}.


"""
import seaborn as sns
import pandas as pd
from sklearn import preprocessing

outlook=['sunny', 'overcast', 'rainy']
temperature=['hot', 'mild', 'cool']
humidity=['high', 'normal']
wind=['yes', 'no']
play=['yes', 'no']

dataset=[["outlook", "temp", "humidity", "windy", "play"],
    ["sunny", "hot", "high", "no", "no"],
    ["sunny", "hot", "high", "yes", "no"],
    ["overcast", "hot", "high", "no", "yes"],
    ["rainy", "mild", "high", "no", "yes"],
    ["rainy", "cool", "normal", "no", "yes"],
    ["rainy", "cool", "normal", "yes", "no"],
    ["overcast", "cool", "normal", "yes", "yes"],
    ["sunny", "mild", "high", "no", "no"],
    ["sunny", "cool", "normal", "no", "yes"],
    ["rainy", "mild", "normal", "no", "yes"],
    ["sunny", "mild", "normal", "yes", "yes"],
    ["overcast", "mild", "high", "yes", "yes"],
    ["overcast", "hot", "normal", "no", "yes"],
    ["rainy", "mild", "high", "yes", "no"]]

df = pd.DataFrame(dataset[1:], columns=dataset[0])
df.head()

    outlook  temp humidity windy play
0     sunny   hot     high    no   no
1     sunny   hot     high   yes   no
2  overcast   hot     high    no  yes
3     rainy  mild     high    no  yes
4     rainy  cool   normal    no  yes


 [To do Students]: Complete the following functions

In [2]:

def prior_probability(play_outcome):
    """
    input:
        - play_outcome: string taking values in ['yes','no']
    output:
        - prior probability P(play = play_outcome) 
    """
    # Count the observations whose target equals play_outcome
    m = 0
    for i in df['play']:
        if i == play_outcome:
            m += 1
    return m/len(df['play'])

def likelihood(feature_name, feature_value, play_outcome):
    """
    inputs: 
        feature_name: string with values in df column names
        feature_value: given value of the variable corresponding to feature_name
        play_outcome: outcome of target variable "play" 
    output:
        Compute the conditional probability P(feature_name = feature_value|play= play_outcome)
    """
    m = 0  # observations of class play_outcome that also have feature_value
    n = 0  # observations of class play_outcome
    for i, j in zip(df[feature_name], df['play']):
        if j == play_outcome:
            n += 1
            if i == feature_value:
                m += 1
    return m/n

def predict_play_outcome(outlook,temp,humidity,windy):
    """
    inputs:
        outlook: value of outlook for a given observation 
        temp: value of temp for a given observation 
        humidity: value of humidity for a given observation 
        windy: value of windy for a given observation 
    output:
        label predicted by naive Bayes for the given observation (outlook,temp,humidity,windy)
    """
    observation = {'outlook': outlook, 'temp': temp, 'humidity': humidity, 'windy': windy}
    # Unnormalized posterior of each class: prior times the product of the per-feature likelihoods
    P_yes = prior_probability('yes')
    P_no = prior_probability('no')
    for name, value in observation.items():
        P_yes *= likelihood(name, value, 'yes')
        P_no *= likelihood(name, value, 'no')
    # The scores are not normalized, so compare them with each other rather than with 0.5
    return 'yes' if P_yes >= P_no else 'no'
predict_play_outcome('sunny', 'cool', 'high', 'yes')  

'no'
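
The products P(play) times the per-feature likelihoods are unnormalized scores, so they do not sum to one. The sketch below recomputes them by plain counting on the same 14 observations and divides by their sum to obtain posterior probabilities. The helper name `joint_score` and the tuple layout of `rows` are assumptions made for this illustration, not part of the exercise template.

```python
# Same 14 weather observations as above: (outlook, temp, humidity, windy, play)
rows = [
    ("sunny", "hot", "high", "no", "no"),
    ("sunny", "hot", "high", "yes", "no"),
    ("overcast", "hot", "high", "no", "yes"),
    ("rainy", "mild", "high", "no", "yes"),
    ("rainy", "cool", "normal", "no", "yes"),
    ("rainy", "cool", "normal", "yes", "no"),
    ("overcast", "cool", "normal", "yes", "yes"),
    ("sunny", "mild", "high", "no", "no"),
    ("sunny", "cool", "normal", "no", "yes"),
    ("rainy", "mild", "normal", "no", "yes"),
    ("sunny", "mild", "normal", "yes", "yes"),
    ("overcast", "mild", "high", "yes", "yes"),
    ("overcast", "hot", "normal", "no", "yes"),
    ("rainy", "mild", "high", "yes", "no"),
]

def joint_score(x, label):
    """Unnormalized score P(label) * prod_j P(x_j | label), estimated by counting."""
    in_class = [r for r in rows if r[-1] == label]
    score = len(in_class) / len(rows)  # prior P(label)
    for j, value in enumerate(x):
        # likelihood P(x_j = value | label)
        score *= sum(1 for r in in_class if r[j] == value) / len(in_class)
    return score

x = ("sunny", "cool", "high", "yes")
scores = {label: joint_score(x, label) for label in ("yes", "no")}

# Normalize the two scores so that they sum to one
total = sum(scores.values())
posteriors = {label: s / total for label, s in scores.items()}
prediction = max(posteriors, key=posteriors.get)

print(scores)
print(posteriors)
print(prediction)
```

Only the ranking of the scores matters for the predicted label; the normalization is needed when the probabilities themselves are to be reported or compared with a library's output.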

## The same using sklearn library

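All features are categorical. To mirror the from-scratch approach, we use scikit-learn's Multinomial Naive Bayes on integer-encoded features (scikit-learn also provides ``CategoricalNB``, which is designed specifically for categorical features).

* Step 1: encode the categories of each feature as integers (MultinomialNB doesn't work with strings)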
* Step 2: fit MultinomialNB
* Step 3: predict with MultinomialNB

In [3]:
from sklearn.preprocessing import LabelEncoder

# instantiate labelencoder object
le = LabelEncoder()

# fit_transform each column of the training data (the same encoder object is refit on every column)
df_enc = df.apply(lambda col: le.fit_transform(col))

df_enc.head()

   outlook  temp  humidity  windy  play
0        2     1         0      0     0
1        2     1         0      1     0
2        0     1         0      0     1
3        1     2         0      0     1
4        1     0         1      0     1


[To do Students]:
- Fit a multinomial Naive Bayes model using sklearn, both without and with Laplace smoothing, and describe the hyperparameters available in the sklearn implementation.
- Compare the predicted probabilities and labels on df_test.
- Design an example to highlight the importance of Laplace smoothing.
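
As one possible starting point for the last item, the sketch below shows why smoothing matters on this dataset: 'overcast' never occurs together with play = 'no', so the unsmoothed likelihood is exactly zero and wipes out the whole product for that class. The function name `smoothed_likelihood` and the two lists are assumptions written for this illustration; in sklearn, the additive smoothing strength is set through the `alpha` parameter of `MultinomialNB`.

```python
# Outlook values of the 14 observations, split by the value of "play"
outlook_given_yes = ["overcast", "rainy", "rainy", "overcast", "sunny",
                     "rainy", "sunny", "overcast", "overcast"]
outlook_given_no = ["sunny", "sunny", "rainy", "sunny", "rainy"]
categories = ["sunny", "overcast", "rainy"]

def smoothed_likelihood(value, observed, alpha):
    """P(outlook = value | class) with additive (Laplace) smoothing of strength alpha."""
    return (observed.count(value) + alpha) / (len(observed) + alpha * len(categories))

# alpha = 0 is the plain relative frequency; alpha = 1 is Laplace smoothing
unsmoothed = smoothed_likelihood("overcast", outlook_given_no, alpha=0)
smoothed = smoothed_likelihood("overcast", outlook_given_no, alpha=1)

print(unsmoothed)  # 0 / 5
print(smoothed)    # (0 + 1) / (5 + 3)
```

With `alpha=1` every category receives one pseudo-count, so a combination that was never observed gets a small non-zero probability instead of vetoing the class.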


In [4]:
from sklearn.naive_bayes import MultinomialNB



In [5]:
# Build new item for prediction
df_test = pd.DataFrame(data=[['sunny', 'cool', 'high', 'yes']],
                    columns=['outlook', 'temp', 'humidity', 'windy'])
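# Encode it and make prediction
# This call fails: `le` was last fitted on the 'play' column, so it does not know the label 'sunny'
df_test_enc= df_test.apply(le.transform) 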




ValueError: y contains previously unseen labels: 'sunny'

In [None]:
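from sklearn import preprocessing

# One encoder per column, so that each keeps its own category-to-integer mapping
encoders = {}
df_enc = df.copy()
for col in df.columns:
    encoders[col] = preprocessing.LabelEncoder()
    df_enc[col] = encoders[col].fit_transform(df[col])

# Encode the test item with the encoders fitted on the training columns
# (df_test has no 'play' column, so iterate over its own columns)
df_test_enc = df_test.copy()
for col in df_test.columns:
    df_test_enc[col] = encoders[col].transform(df_test[col])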


In [None]:
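# Instantiate a Multinomial Naive Bayes model and fit it on the encoded features
nb = MultinomialNB()
# Pass the target as a 1-D Series; a one-column DataFrame triggers a shape warning
nbmodel = nb.fit(df_enc.iloc[:, :4], df_enc['play'])
y_predicted = nbmodel.predict(df_test_enc)
y_predicted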

## Multinomial Naive Bayes: text classification


One place where multinomial naive Bayes is often used is in text classification, where the features are related to word counts or frequencies within the documents to be classified.
We discussed the extraction of such features from text in [Feature Engineering](05.04-Feature-Engineering.ipynb); here we will use the sparse word count features from the 20 Newsgroups corpus to show how we might classify these short documents into categories.

Let's download the data and take a look at the target names:

In [None]:
from sklearn.datasets import fetch_20newsgroups

data = fetch_20newsgroups()
data.target_names

For simplicity, we will select just a few of these categories and download the training and testing sets:

In [None]:
categories = ['talk.religion.misc', 'soc.religion.christian',
              'sci.space', 'comp.graphics']
train = fetch_20newsgroups(subset='train', categories=categories)
test = fetch_20newsgroups(subset='test', categories=categories)

In [None]:
print(train.data[5])

In order to use this data for machine learning, we need to be able to convert the content of each string into a vector of numbers.
For this we will use the well-known TF-IDF vectorizer, and create a pipeline that attaches it to a multinomial naive Bayes classifier:

In [None]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

model = make_pipeline(TfidfVectorizer(), MultinomialNB())

With this pipeline, we can apply the model to the training data, and predict labels for the test data:

In [None]:
model.fit(train.data, train.target)
labels = model.predict(test.data)

Now that we have predicted the labels for the test data, we can evaluate them to learn about the performance of the estimator.
For example, here is the confusion matrix between the true and predicted labels for the test data:

In [None]:
from sklearn.metrics import confusion_matrix
import matplotlib.pyplot as plt
mat = confusion_matrix(test.target, labels)
sns.heatmap(mat.T, square=True, annot=True, fmt='d', cbar=False,
            xticklabels=train.target_names, yticklabels=train.target_names)
plt.xlabel('true label')
plt.ylabel('predicted label');

Evidently, even this very simple classifier can successfully separate space talk from computer talk, but it gets confused between talk about religion and talk about Christianity.
This is perhaps an expected area of confusion!

The very cool thing here is that we now have the tools to determine the category for *any* string, using the ``predict()`` method of this pipeline.
Here's a quick utility function that will return the prediction for a single string:

In [None]:
def predict_category(s, train=train, model=model):
    pred = model.predict([s])
    return train.target_names[pred[0]]

In [None]:
predict_category('sending a payload to the ISS')

In [None]:
predict_category('determining the screen resolution')

Remember that this is nothing more sophisticated than a simple probability model for the (weighted) frequency of each word in the string; nevertheless, the result is striking.
Even a very naive algorithm, when used carefully and trained on a large set of high-dimensional data, can be surprisingly effective.
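
To see the same TF-IDF plus multinomial naive Bayes pipeline at a scale that can be inspected by hand, and without downloading the corpus, here is a minimal sketch on a toy corpus. The four documents, their labels, and the name `toy_model` are invented for illustration and are not taken from 20 Newsgroups.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus, invented for illustration
docs = [
    "the rocket launched into orbit",
    "nasa plans a moon mission",
    "the gpu renders pixels fast",
    "image rendering uses shaders",
]
labels = ["space", "space", "graphics", "graphics"]

# Same construction as the pipeline above: TF-IDF features feeding MultinomialNB
toy_model = make_pipeline(TfidfVectorizer(), MultinomialNB())
toy_model.fit(docs, labels)

# Words never seen during training (such as "and") are simply ignored by the vectorizer
predictions = toy_model.predict(["rocket orbit mission", "shaders and pixels"])
probabilities = toy_model.predict_proba(["rocket orbit mission"])

print(toy_model.classes_)
print(predictions)
print(probabilities)
```

The columns of `probabilities` follow the order of `toy_model.classes_`, which is useful when the predicted probabilities, and not only the labels, need to be compared.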

## Naive Bayes with continuous data from scratch

For continuous data, we assume that the conditional probabilities P(X|Y) follow Gaussian distributions.
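
Written out, the assumption and the resulting posterior are as follows. The symbols are assumptions introduced for illustration: $\mu_y$ and $\sigma_y^2$ denote the mean and variance of the feature within class $y$, both estimated from the training data.

```latex
P(x \mid y) = \frac{1}{\sqrt{2\pi\sigma_y^{2}}}
\exp\!\left(-\frac{(x - \mu_y)^{2}}{2\sigma_y^{2}}\right),
\qquad
P(y \mid x) = \frac{P(y)\, P(x \mid y)}{\sum_{y'} P(y')\, P(x \mid y')}
```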

In [None]:
''' Define the dataset '''
import numpy as np

Yes=[25.2,19.3,18.5,21.7,20.1,24.3,22.8,23.1,19.8]
No=[27.3,30.1,17.4,29.5,15.1]

[TO DO STUDENTS]

- Implement the computation of first and second moments of the continuous feature for each class.
- Implement the computation of the posterior probability P(Y|X)

In [None]:
# First moment (mean) of the feature for each class [Yes, No].
# scipy.stats.moment returns central moments, whose first order is always 0, so np.mean is used instead.
print(np.mean(Yes), np.mean(No))
# Second central moment (variance) of the feature for each class
print(np.var(Yes), np.var(No))

In [None]:
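''' Calculate the posterior probability P(Y|X) '''
def P(x, y=True):
    # Joint P(x, class) = class prior * Gaussian likelihood with the class mean and variance
    joint = {}
    for label, data in ((True, Yes), (False, No)):
        mu, var = np.mean(data), np.var(data)
        joint[label] = len(data) / (len(Yes) + len(No)) * np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    # Normalize by the evidence P(x) to get the posterior of class y (True: 'Yes', False: 'No')
    return joint[y] / (joint[True] + joint[False])


In [None]:
# Posterior of each class for an illustrative value x = 22.0
P(22.0, True), P(22.0, False)

In [None]:
''' Plot the boundaries '''
import matplotlib.pyplot as plt
import math
%matplotlib inline

x_min, x_max = min(Yes+No)-1, max(Yes+No)+1

plt.plot(Yes, np.zeros_like(Yes),'ro')
plt.plot(No, np.zeros_like(No),'bo')

xx=np.linspace(x_min,x_max,100)
# Predict 'Yes' (1) wherever its posterior is at least as large as that of 'No'
zz=[1 if P(x, True)>=P(x, False) else 0 for x in xx]

plt.contourf(xx, [-0.05, 0.05], [zz, zz], alpha=0.3);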

## Naive Bayes with continuous data using sklearn library

The data are continuous, so we use Gaussian Naive Bayes.

In [None]:
from sklearn.naive_bayes import GaussianNB

X = np.array([Yes + No]).reshape(-1, 1)
y = [1]*len(Yes)+[0]*len(No)
# my_y= np.array(y)
"""
[TO DO STUDENTS]
Learn a Gaussian Naive Bayes classifier using sklearn implementation on this data (X,y)
"""
clf = GaussianNB(priors=None, var_smoothing=0.08)
nb= clf.fit(X, y)
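
# The value var_smoothing=0.08 used above is much larger than the library default of 1e-9.
# The scikit-learn documentation describes var_smoothing as the portion of the largest
# feature variance that is added to the variances for calculation stability.

The effect of that setting on the per-class variances can be sketched with the standard library alone. This is a minimal illustration under the assumption that the library adds `var_smoothing` times the largest feature variance of the whole dataset to each class variance; the names `epsilon`, `raw_var` and `smoothed_var` are chosen here and are not sklearn attributes.

```python
import statistics

Yes = [25.2, 19.3, 18.5, 21.7, 20.1, 24.3, 22.8, 23.1, 19.8]
No = [27.3, 30.1, 17.4, 29.5, 15.1]
var_smoothing = 0.08

# With a single feature, the largest feature variance is the population variance of all values
epsilon = var_smoothing * statistics.pvariance(Yes + No)

# Population variance of the feature within each class, before and after smoothing
raw_var = {"yes": statistics.pvariance(Yes), "no": statistics.pvariance(No)}
smoothed_var = {label: v + epsilon for label, v in raw_var.items()}

print(epsilon)
print(raw_var)
print(smoothed_var)
```

A larger `var_smoothing` widens both class Gaussians by the same absolute amount, which affects the narrow 'Yes' class proportionally more than the wide 'No' class and can therefore shift the decision boundaries.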

In [None]:
xx=np.linspace(x_min,x_max,1000).reshape(-1, 1)
zz = nb.predict(xx).tolist()

plt.plot(Yes, np.zeros_like(Yes),'ro')
plt.plot(No, np.zeros_like(No),'bo')
plt.contourf(xx.ravel().tolist(), [-0.05, 0.05], [zz, zz], alpha=0.3);