# COURSE:   PGP [AI&ML]

## Learner :  Chaitanya Kumar Battula
## Module  : Machine Learning
## Topic   : Naive Bayes Algorithm For Text Classification

## Advantages 

* Easier to build and understand
* Faster to train and predict than many other algorithms
* Easily scalable
* Popular choice for text classification problems

## Applications 

* Real-world applications (apps) that must respond to users' requests immediately
* Other: spam filtering, document classification, sentiment prediction

In [None]:
import pandas as pd
import numpy as np

from sklearn.model_selection import train_test_split


# 1. Gaussian Naive Bayes 

* Used when the variables are continuous
* Assumes each variable is normally distributed within each class
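
To make the normality assumption concrete, here is a minimal sketch on synthetic data (the toy arrays and `toy_` names are hypothetical, not part of the Iris example that follows). It checks that the per-class means stored by `GaussianNB` in `theta_` match the means computed by hand.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Hypothetical toy data: one continuous feature, two well-separated classes
rng = np.random.default_rng(0)
X_toy = np.concatenate([rng.normal(0.0, 1.0, 50), rng.normal(5.0, 1.0, 50)]).reshape(-1, 1)
y_toy = np.array([0] * 50 + [1] * 50)

toy_model = GaussianNB().fit(X_toy, y_toy)

# Per-class feature means computed by hand
manual_means = np.array([X_toy[y_toy == c].mean(axis=0) for c in (0, 1)])

# theta_ holds the per-class feature means learned by the model
means_match = np.allclose(toy_model.theta_, manual_means)
print(means_match)

# Points at the centre of each class are assigned to that class
toy_predictions = toy_model.predict([[0.0], [5.0]])
print(toy_predictions)
```

The model only needs a mean and a variance per feature and per class, which is one reason it trains so quickly.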

### Load the data

In [None]:
iris_data = pd.read_csv('data/Iris.csv')
iris_data.head()

In [None]:
# Separate features and target variable
x = iris_data.drop(['Id', 'Species'], axis=1)
y = iris_data['Species']

In [None]:
x

In [None]:
y

### Create train and test sets

In [None]:
#create train and test sets
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.20, random_state=56)

### Implement Gaussian Naive Bayes

In [None]:
from sklearn.naive_bayes import GaussianNB
naive_bayes = GaussianNB()

In [None]:
#train the model and make predictions
naive_bayes.fit(x_train, y_train)

In [None]:
predictions = naive_bayes.predict(x_test)
predictions[:5]

In [None]:
# calculate accuracy
from sklearn.metrics import accuracy_score
accuracy_score(y_test, predictions)

In [None]:
pd.crosstab(y_test, predictions)

# 2. Multinomial Naive Bayes

* Used when the features represent frequencies (for example, word counts)
* Ignores non-occurrences of features
* Works well for text classification problems

### Load the dataset

In [None]:
tweets_data = pd.read_csv('data/tweets.csv')
tweets_data.head()

In [None]:
tweets_data['label'].value_counts(normalize=True)

In [None]:
#separate features and target variable
x = tweets_data['tweet']
y = tweets_data['label']

### Create train and test sets

In [None]:
#create train and test sets
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=56)

### Create bag-of-words

In [None]:
from sklearn.feature_extraction.text import CountVectorizer
count_vector = CountVectorizer(stop_words = 'english')

In [None]:
# Fit the training data
training_data = count_vector.fit_transform(x_train)
# Transform testing data
testing_data = count_vector.transform(x_test)
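
To see what the bag-of-words step produces, here is a small sketch on a hypothetical two-document corpus (the `toy_` names and sentences are made up for illustration, and `stop_words` is left at its default to keep the output easy to follow). It also shows that words never seen during `fit_transform` are dropped when transforming new text.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy corpus, not the tweets data
toy_train = ["good movie good plot", "bad movie"]
toy_test = ["good unseen word"]

toy_vectorizer = CountVectorizer()
toy_train_counts = toy_vectorizer.fit_transform(toy_train).toarray()

# Columns are ordered alphabetically by token
toy_vocab = sorted(toy_vectorizer.vocabulary_)
print(toy_vocab)          # ['bad', 'good', 'movie', 'plot']
print(toy_train_counts)   # rows: [0 2 1 1] and [1 0 1 0]

# 'unseen' and 'word' are not in the learned vocabulary, so they are ignored
toy_test_counts = toy_vectorizer.transform(toy_test).toarray()
print(toy_test_counts)    # [[0 1 0 0]]
```

This is why the vectorizer is fitted on the training data only and then reused, unchanged, on the test data.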

### Implement Multinomial Naive Bayes

In [None]:
from sklearn.naive_bayes import MultinomialNB
naive_bayes = MultinomialNB()

In [None]:
#train model and make predictions
naive_bayes.fit(training_data, y_train)

In [None]:
predictions = naive_bayes.predict(testing_data)
predictions[:5]

In [None]:
from sklearn.metrics import accuracy_score
accuracy_score(y_test, predictions)

In [None]:
pd.crosstab(y_test, predictions)

# 3. Bernoulli Naive Bayes

* Used with binary features (a feature is either present or absent)
* Penalizes the non-occurrence of features
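
As a quick illustration of binary features, the sketch below compares plain counts with presence/absence indicators on a hypothetical toy corpus (the documents and `toy_` names are assumptions for illustration only).

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy documents
toy_docs = ["spam spam spam offer", "meeting offer"]

# Vocabulary order is alphabetical: meeting, offer, spam
toy_counts = CountVectorizer().fit_transform(toy_docs).toarray()
toy_binary = CountVectorizer(binary=True).fit_transform(toy_docs).toarray()

print(toy_counts)  # rows: [0 1 3] and [1 1 0]
print(toy_binary)  # rows: [0 1 1] and [1 1 0]
```

With `binary=True`, repeating a word three times carries no more weight than using it once, which matches the presence/absence view that Bernoulli Naive Bayes takes of each feature.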

### Load the dataset

In [None]:
tweets_data = pd.read_csv('data/tweets.csv')
tweets_data.head()

In [None]:
#separate features and target variable
x = tweets_data['tweet']
y = tweets_data['label']

### Create train and test sets

In [None]:
#create train and test sets
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.33, random_state=42)

In [None]:
from sklearn.feature_extraction.text import CountVectorizer

In [None]:
# Generate binary (presence/absence) features using CountVectorizer
count_vector = CountVectorizer(stop_words = 'english', binary=True)

In [None]:
# Fit the training data 
training_data = count_vector.fit_transform(x_train)

# Transform testing data
testing_data = count_vector.transform(x_test)

### Implement Bernoulli Naive Bayes

In [None]:
from sklearn.naive_bayes import BernoulliNB
naive_bayes = BernoulliNB()

In [None]:
naive_bayes.fit(training_data, y_train)

In [None]:
predictions = naive_bayes.predict(testing_data)
predictions[:5]

In [None]:
from sklearn.metrics import accuracy_score
accuracy_score(y_test, predictions)

In [None]:
pd.crosstab(y_test, predictions)

**Pros**

* It is easy and fast to predict the class of the test data set. It also performs well in multi-class prediction.
* When the assumption of independence holds, a Naive Bayes classifier can perform better than other models such as logistic regression, and it needs less training data.
* It performs well with categorical input variables compared to numerical ones. For numerical variables, a normal distribution is assumed (a bell curve, which is a strong assumption).
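
The independence assumption mentioned above can be written out explicitly. As a brief sketch in standard notation (the symbols $y$ for the class and $x_1, \dots, x_n$ for the features are introduced here for illustration):

$$
P(y \mid x_1, \dots, x_n) \propto P(y) \prod_{i=1}^{n} P(x_i \mid y)
$$

The predicted class is the $y$ that maximizes the right-hand side. The product is where independence is assumed: each feature contributes its own factor $P(x_i \mid y)$, estimated separately from the other features.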



**Cons**

* If a categorical variable has a category in the test data set that was not observed in the training data set, the model assigns it a probability of 0 (zero). Because the class score is a product of probabilities, that single zero wipes out the whole score and the model cannot make a meaningful prediction. This is often known as “Zero Frequency”. To solve this, we can use a smoothing technique. One of the simplest smoothing techniques is called Laplace estimation.
* Naive Bayes is also known to be a poor probability estimator, so the probability outputs from `predict_proba` should not be taken too seriously.
* Another limitation of Naive Bayes is the assumption of independent predictors. In real life, it is almost impossible to get a set of predictors that are completely independent.
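
The zero-frequency problem and its Laplace fix can be sketched with a tiny hypothetical count matrix (the numbers and `toy_` names below are made up for illustration). The hand-computed smoothed probabilities are compared with what `MultinomialNB` stores in `feature_log_prob_`.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Hypothetical word counts: 2 documents (one per class), 3 vocabulary words
toy_counts = np.array([[2, 1, 0], [1, 0, 3]])
toy_labels = np.array([0, 1])

# Class 0 never contains the third word
class0_counts = toy_counts[0]
unsmoothed = class0_counts / class0_counts.sum()
print(unsmoothed)  # the third entry is exactly 0

# Laplace smoothing: add alpha to every count before normalizing
alpha = 1.0
vocab_size = len(class0_counts)
smoothed = (class0_counts + alpha) / (class0_counts.sum() + alpha * vocab_size)
print(smoothed)  # 3/6, 2/6 and 1/6

# MultinomialNB applies the same formula internally
smoothing_model = MultinomialNB(alpha=alpha).fit(toy_counts, toy_labels)
sklearn_probs = np.exp(smoothing_model.feature_log_prob_[0])
print(np.allclose(smoothed, sklearn_probs))
```

After smoothing, the unseen word still has a small but non-zero probability, so it no longer zeroes out the class score.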

**Applications of Naive Bayes**

__There are 4 main applications of this popular and interesting algorithm:__

* **Real-time Prediction :** Naive Bayes is an eager learning classifier and it is fast. Thus, it can be used for making predictions in real time.
* **Multi-class Prediction :** This algorithm is also well known for its multi-class prediction capability. Here we can predict the probability of each of the multiple classes of the target variable.
* **Text classification / Spam Filtering / Sentiment Analysis :** Naive Bayes classifiers are widely used in text classification, where (thanks to good results on multi-class problems and the independence rule) they often achieve a higher success rate than other algorithms. As a result, they are widely used in spam filtering (identifying spam e-mail) and sentiment analysis (in social media analysis, to identify positive and negative customer sentiments).
* **Recommendation System :** A Naive Bayes classifier and collaborative filtering together can build a recommendation system that uses machine learning and data mining techniques to filter unseen information and predict whether a user would like a given resource or not.

**Improve your Naive Bayes Model**

__Tips for improving the power of Naive Bayes Model:__

* If continuous features do not have a normal distribution, we should apply a transformation or another method to bring them closer to a normal distribution.
<br> 

* If the test data set has a zero-frequency issue, apply a smoothing technique such as “Laplace Correction” to predict the class of the test data set.
<br> 

* Remove correlated features, as highly correlated features are effectively counted twice in the model, which can over-inflate their importance.
<br>

* Naive Bayes classifiers have limited options for parameter tuning, such as `alpha=1` for smoothing and `fit_prior=[True|False]` to learn class prior probabilities or not, plus a few other options. We recommend focusing on data pre-processing and feature selection.
<br>

* Documentation - https://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.MultinomialNB.html#sklearn.naive_bayes.MultinomialNB
<br>

* You might think of applying a classifier combination technique such as ensembling, bagging, or boosting, but these methods generally do not help much. Their main purpose is to reduce variance, and Naive Bayes already has very little variance to reduce.
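
The `alpha` and `fit_prior` options listed in the tips above can be searched over with a small grid. The sketch below is one possible setup, not a recommended configuration: the toy corpus, the `toy_` names, the grid values, and the choice of `cv=2` are all assumptions made for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical toy corpus: 1 = positive, 0 = negative
toy_texts = [
    "great product love it", "love this great service",
    "really great experience", "love love love it",
    "terrible product hate it", "hate this awful service",
    "really awful experience", "hate hate hate it",
]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Keeping the vectorizer inside the pipeline means it is refit on each training fold
toy_pipeline = Pipeline([
    ("vectorizer", CountVectorizer()),
    ("classifier", MultinomialNB()),
])

# Illustrative grid over the two main MultinomialNB options
param_grid = {
    "classifier__alpha": [0.1, 0.5, 1.0],
    "classifier__fit_prior": [True, False],
}

toy_search = GridSearchCV(toy_pipeline, param_grid, cv=2, scoring="accuracy")
toy_search.fit(toy_texts, toy_labels)
print(toy_search.best_params_)
```

On a real data set such as the tweets, a larger `cv` value and a wider range of `alpha` values would be more informative.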