In [None]:
# -*- coding: utf-8 -*-
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
# implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# Next buy item prediction with Naive Bayes

## Practical Implementation of Naive Bayes for Next Buy Item Prediction

- This practical implementation uses the Naive Bayes algorithm to predict, from their profile, whether users of a social network platform will make a purchase.
- The objective is to build a Naive Bayes model that predicts purchase behavior accurately and to compare the performance of a single-feature model against a multiple-feature model.

We use **GaussianNB** because `Age` and `EstimatedSalary` are continuous features, and GaussianNB models each feature with a Gaussian (normal) distribution per class. This is a common working assumption for real-world continuous data, although it usually holds only approximately.
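To illustrate what the Gaussian assumption means in practice, the following minimal sketch estimates a mean and variance of a single feature for each class and evaluates the resulting normal density for a new value. The ages are hypothetical and are not taken from the dataset, and the helper names `gaussian_pdf` and `class_likelihoods` are introduced here for illustration only.

```python
import math
from statistics import mean, pvariance

# Hypothetical ages grouped by class label (0 = no purchase, 1 = purchase).
ages_by_class = {
    0: [22, 25, 27, 30, 35],
    1: [40, 45, 48, 52, 58],
}


def gaussian_pdf(x, mu, var):
    """Density of a normal distribution with mean mu and variance var at x."""
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


# One (mean, population variance) pair per class is all the model stores for this feature.
params = {
    label: (mean(values), pvariance(values))
    for label, values in ages_by_class.items()
}


def class_likelihoods(age):
    """Return the Gaussian likelihood of `age` under each class."""
    return {label: gaussian_pdf(age, mu, var) for label, (mu, var) in params.items()}


likelihoods = class_likelihoods(47)
print(likelihoods)
```

With these made-up numbers, an age of 47 is far more likely under class 1 than under class 0, which is the kind of evidence the classifier combines with the class priors.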


### Dataset Overview

**Dataset from: https://www.kaggle.com/datasets/akram24/social-network-ads**

**Features:**
- **User ID:** Unlikely to be predictive of the outcome, so it is dropped from the dataset.
- **Gender:** Categorical feature that will be numerically encoded.
- **Age:** Continuous feature that may influence purchasing behavior.
- **EstimatedSalary:** Continuous feature representing economic power, potentially predictive of purchasing behavior.
- **Purchased:** Binary target variable indicating whether a purchase was made.

### Steps for Implementation
1. **Data Loading**

In [None]:
# Step 1: Import necessary libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
from sklearn.preprocessing import LabelEncoder

# Step 2: Load the dataset
data = pd.read_csv('data/Social_Network_Ads.csv')

2. **Data Preprocessing**

In [None]:
# Step 3: Preprocess the dataset
# Convert 'Gender' to a binary numerical variable
encoder = LabelEncoder()
data['Gender'] = encoder.fit_transform(data['Gender'])

# Drop 'User ID' as it is not a useful feature for prediction
data.drop('User ID', axis=1, inplace=True)

3. **Feature Selection**

In [None]:
# Step 4: Selecting the target variable and feature set for both single and multiple feature models
y = data['Purchased']
X_single = data[['Age']]  # Single feature
X_multiple = data[['Gender', 'Age', 'EstimatedSalary']]  # Multiple features

4. **Data Splitting**

In [None]:
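# Step 5: Splitting the dataset into training and test sets for both single and multiple feature models.
# Both calls use the same y and the same random_state, so they select the same rows;
# the y_train / y_test returned by the second call are identical to those from the first.
X_train_single, X_test_single, y_train, y_test = train_test_split(X_single, y, test_size=0.25, random_state=0)
X_train_multiple, X_test_multiple, y_train, y_test = train_test_split(X_multiple, y, test_size=0.25, random_state=0)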

5. **Model Training**

In [None]:
# Step 6: Creating a Gaussian Naive Bayes classifier and training it on the single feature dataset and on the multiple features dataset.
gnb_single = GaussianNB()
gnb_single.fit(X_train_single, y_train)

gnb_multiple = GaussianNB()
gnb_multiple.fit(X_train_multiple, y_train)

6. **Model Evaluation**

In [None]:
# Step 7: Predicting and evaluating the single feature model.
y_pred_single = gnb_single.predict(X_test_single)
print("Single Feature Model")
print(confusion_matrix(y_test, y_pred_single))
print(classification_report(y_test, y_pred_single))
print(f"Accuracy: {accuracy_score(y_test, y_pred_single)}\n")

# Predicting and evaluating the multiple feature model.
y_pred_multiple = gnb_multiple.predict(X_test_multiple)
print("Multiple Features Model")
print(confusion_matrix(y_test, y_pred_multiple))
print(classification_report(y_test, y_pred_multiple))
print(f"Accuracy: {accuracy_score(y_test, y_pred_multiple)}")

### Conclusion and Interpretation of Model Performance

Upon analyzing the performance metrics of the Naive Bayes classifier with single and multiple features, we notice an interesting outcome. Both models have yielded an accuracy score of 90%. This indicates that for the given dataset, adding more features did not significantly increase the accuracy of the model. However, there are subtle differences in other metrics that are worth noting.

For the single feature model (using 'Age' alone), we observe that while the precision for predicting a 'No' purchase (class 0) is slightly lower than that for a 'Yes' purchase (class 1), the recall for class 0 is considerably higher than that for class 1. This means the single feature model is more conservative, favoring predictions of 'No' purchase, which could be beneficial in scenarios where falsely predicting a purchase is costlier than missing a potential purchase.

The multiple features model, on the other hand, shows a balanced recall, indicating a slight improvement in identifying 'Yes' purchases. The model with multiple features is better-rounded, not significantly biased towards either class, which could be advantageous when a balance between predicting 'Yes' and 'No' purchases is desired.
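To make the precision and recall comparison above concrete, here is a minimal sketch that derives both metrics for each class from the four cells of a binary confusion matrix. The counts are hypothetical and are not the notebook's actual output; `precision_recall` is a helper name introduced for illustration.

```python
def precision_recall(tn, fp, fn, tp):
    """Per-class precision and recall from binary confusion-matrix counts.

    For binary labels, scikit-learn's confusion_matrix lays the cells out as
    [[tn, fp], [fn, tp]] (rows = true class, columns = predicted class).
    """
    return {
        0: {"precision": tn / (tn + fn), "recall": tn / (tn + fp)},
        1: {"precision": tp / (tp + fp), "recall": tp / (tp + fn)},
    }


# Hypothetical counts for 100 test samples.
tn, fp, fn, tp = 60, 5, 10, 25
metrics = precision_recall(tn, fp, fn, tp)
accuracy = (tn + tp) / (tn + fp + fn + tp)

for label, values in metrics.items():
    print(label, values)
print("accuracy:", accuracy)
```

In this made-up case the recall for class 0 is higher than for class 1, which is the pattern described for a model that leans towards predicting 'No'. Two models can share the same accuracy while distributing their errors between `fp` and `fn` differently.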

In practice, the choice between a single feature and multiple features model would depend on the specific requirements of the application. If a simple model is needed, with fewer computations and the ease of interpretation, a single feature might suffice. However, if the context requires a more comprehensive analysis, incorporating multiple features could provide a nuanced understanding, even if the accuracy remains the same.

# Theoretical Introduction to Naive Bayes Classification

Naive Bayes is a probabilistic machine learning model that is used for classification tasks, the fundamental task of assigning a label to a given input sample. This model is particularly known for its simplicity and effectiveness in handling large datasets.

Naive Bayes classifiers are based on Bayes' Theorem, which uses the probabilities of events to make predictions. The 'naive' aspect of the classifier comes from the assumption that the features used for classification are conditionally independent of each other given the class.

### Bayes' Theorem

Bayes' Theorem is stated mathematically as:

$$
P(A|B) = \frac{P(B|A) \times P(A)}{P(B)}
$$

Where:
- `P(A|B)` is the posterior probability of class `A` given predictor `B`.
- `P(B|A)` is the likelihood, that is, the probability of predictor `B` given class `A`.
- `P(A)` is the prior probability of class `A`.
- `P(B)` is the marginal probability of predictor `B`, also called the evidence.
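As a small illustration of the formula, the sketch below evaluates Bayes' Theorem with exact fractions. The probabilities are hypothetical, and the evidence `P(B)` is obtained from the law of total probability over `A` and not-`A`.

```python
from fractions import Fraction


def posterior(likelihood, prior, evidence):
    """Bayes' Theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return likelihood * prior / evidence


# Hypothetical inputs.
p_a = Fraction(1, 10)              # P(A)
p_b_given_a = Fraction(9, 10)      # P(B|A)
p_b_given_not_a = Fraction(2, 10)  # P(B|not A)

# Evidence via total probability: P(B) = P(B|A)P(A) + P(B|not A)P(not A).
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

p_a_given_b = posterior(p_b_given_a, p_a, p_b)
print(p_a_given_b)  # 1/3
```

Even with a high likelihood of 9/10, the small prior of 1/10 keeps the posterior at 1/3.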

### How Naive Bayes Works

A Naive Bayes classifier computes the posterior probability of each class using Bayes' Theorem, and the class with the highest posterior probability is the prediction. The steps involved in this process are:

1. **Calculate the prior probabilities:** The prior probability of each class (e.g., spam or not spam) is calculated by dividing the number of samples in that class by the total number of samples.
2. **Calculate the likelihood:** The probability of each feature value given each class is calculated.
3. **Compute the posterior probability for each class:** The likelihood of the input sample given each class is multiplied by the prior of that class. The result is normalized by dividing by the probability of the data.
4. **Classify the sample based on the highest posterior probability.**
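The four steps above can be sketched in a few lines of plain Python for categorical features. This is a minimal illustration and not how any particular library implements it: the emails are hypothetical, `fit` and `predict_proba` are names chosen for this sketch, and no smoothing is applied.

```python
from collections import Counter, defaultdict
from fractions import Fraction


def fit(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    # value_counts[label][feature_index][value] -> number of matching rows
    value_counts = defaultdict(lambda: defaultdict(Counter))
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[label][i][value] += 1
    return class_counts, value_counts


def predict_proba(sample, class_counts, value_counts):
    """Return the normalized posterior of each class for one sample."""
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = Fraction(n, total)  # step 1: prior
        for i, value in enumerate(sample):
            # step 2: per-feature likelihood, multiplied under the naive assumption
            score *= Fraction(value_counts[label][i][value], n)
        scores[label] = score
    # step 3: normalize (this fails if every class has a zero score)
    evidence = sum(scores.values())
    return {label: score / evidence for label, score in scores.items()}


# Hypothetical emails described by two categorical features.
rows = [
    ("link", "no_greeting"), ("link", "greeting"), ("link", "no_greeting"),
    ("no_link", "no_greeting"), ("no_link", "greeting"), ("no_link", "greeting"),
    ("link", "greeting"),
]
labels = ["spam", "spam", "spam", "spam", "ham", "ham", "ham"]

class_counts, value_counts = fit(rows, labels)
proba = predict_proba(("link", "greeting"), class_counts, value_counts)
prediction = max(proba, key=proba.get)  # step 4: highest posterior wins
print(proba, prediction)
```

A feature value that never occurs within a class gives that class a score of zero, which is why practical implementations usually add a smoothing term to the counts.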

Now that we have a foundational understanding of Naive Bayes classification and how it operates based on Bayes' Theorem, let's explore its application through theoretical examples. These will illustrate how the algorithm uses probability to make predictions, whether we are considering a single feature or multiple features to determine an outcome.

### Various types of Naive Bayes classifiers

Scikit-Learn provides several types of Naive Bayes classifiers, each suited to a different data distribution:

- **GaussianNB:** Assumes that features follow a normal distribution within each class. It is best suited to continuous, non-categorical features.
- **BernoulliNB:** Designed for binary/boolean features, that is, features that take on only two values.
- **MultinomialNB:** Ideal for features that are counts or count rates; commonly used in text classification.
- **ComplementNB:** An adaptation of the standard MultinomialNB that is particularly suited for imbalanced data sets.
- **CategoricalNB:** Appropriate for categorical data; this classifier assumes that each feature has its own categorical distribution.

By selecting the appropriate Naive Bayes model for our data, we enhance the classifier's accuracy and reliability. Each type takes advantage of different data characteristics, which can significantly affect the outcome of our predictions. Let's apply these classifiers to our dataset and observe their performance.

# Single Feature Example
## Purchase Likelihood Based on Product Price Range

We start with a single-feature example that focuses solely on the price range of products. Imagine we have a dataset of customer purchases that records the price range of each product bought. Our goal is to predict the likelihood of a customer making a subsequent purchase based on the price range of the item; in this example, we focus on the '`Budget`' price range.

Our dataset includes instances of purchases across various price ranges:

| Price Range | Made Purchase |
|:------------|:--------------|
| Budget      | Yes           |
| Premium     | No            |
| Budget      | Yes           |
| Midrange    | No            |
| Premium     | Yes           |
| Budget      | No            |
| Midrange    | Yes           |

We aim to assess the probability of a customer making another purchase after buying a '`Budget`' range item.

### Process:

- **Step 1: Calculate Prior Probabilities**
  
We begin by calculating the prior probability of both possible outcomes—making another purchase (`Yes`) and not making another purchase (`No`).

$$
P(\text{Yes}) = \frac{\text{Number of 'Yes' purchases}}{\text{Total purchases}} = \frac{4}{7}
$$

$$
P(\text{No}) = \frac{\text{Number of 'No' purchases}}{\text{Total purchases}} = \frac{3}{7}
$$

- **Step 2: Calculate Likelihood Probabilities**
  
Next, the likelihood of customers making another purchase after previously selecting a '`Budget`' range product is calculated.

$$
P(\text{Budget}|\text{Yes}) = \frac{\text{Number of 'Yes' purchases within 'Budget'}}{\text{Total 'Yes' purchases}} = \frac{2}{4}
$$

$$
P(\text{Budget}|\text{No}) = \frac{\text{Number of 'No' purchases within 'Budget'}}{\text{Total 'No' purchases}} = \frac{1}{3}
$$

- **Step 3: Calculate Posterior Probabilities**
  
Employing Bayes' Theorem, we determine the posterior probabilities for a customer making a subsequent purchase (`Yes`) and not making one (`No`), given the product was in the '`Budget`' category:

$$
P(\text{Yes}|\text{Budget}) = \frac{P(\text{Budget}|\text{Yes}) \times P(\text{Yes})}{P(\text{Budget})}
$$

$$
P(\text{No}|\text{Budget}) = \frac{P(\text{Budget}|\text{No}) \times P(\text{No})}{P(\text{Budget})}
$$

Assuming the probability of selecting a '`Budget`' item `P(Budget)` is calculated as the number of '`Budget`' purchases over the total purchases:

$$
P(\text{Budget}) = \frac{3}{7}
$$

The posterior probabilities are therefore:

$$
P(\text{Yes}|\text{Budget}) = \frac{2}{4} \times \frac{4}{7} \div \frac{3}{7} = \frac{2}{3}
$$

$$
P(\text{No}|\text{Budget}) = \frac{1}{3} \times \frac{3}{7} \div \frac{3}{7} = \frac{1}{3}
$$

- **Step 4: Make the Prediction**
  
We make our prediction by comparing the two posterior probabilities. Since `P(Yes|Budget)` is greater than `P(No|Budget)`, we predict that customers are likely to make another purchase if their previous purchase was a '`Budget`' item.
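The arithmetic in Steps 1 to 4 can be checked with a short script that counts rows of the table directly. This is a minimal sketch; `posterior_given` is a helper name introduced here for illustration.

```python
from fractions import Fraction

# (price_range, made_purchase) rows copied from the table above.
purchases = [
    ("Budget", "Yes"), ("Premium", "No"), ("Budget", "Yes"), ("Midrange", "No"),
    ("Premium", "Yes"), ("Budget", "No"), ("Midrange", "Yes"),
]


def posterior_given(price_range, outcome):
    """P(outcome | price_range) computed as likelihood * prior / evidence."""
    total = len(purchases)
    n_outcome = sum(1 for _, o in purchases if o == outcome)
    n_both = sum(1 for p, o in purchases if p == price_range and o == outcome)
    n_price = sum(1 for p, _ in purchases if p == price_range)

    prior = Fraction(n_outcome, total)         # e.g. P(Yes)
    likelihood = Fraction(n_both, n_outcome)   # e.g. P(Budget|Yes)
    evidence = Fraction(n_price, total)        # P(Budget)
    return likelihood * prior / evidence


p_yes = posterior_given("Budget", "Yes")
p_no = posterior_given("Budget", "No")
print(p_yes, p_no)  # 2/3 1/3
```

The two posteriors sum to 1, which is a quick sanity check that the division by `P(Budget)` has been applied.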

### Conclusion
  
The analysis conducted with the Naive Bayes classifier points to a higher likelihood of repeat purchases within the '`Budget`' price category. 

# Multiple Features Example
## Purchase Likelihood Based on Product Price Range and Category

Building on the single feature scenario, we'll then proceed to a more realistic situation where multiple features influence the likelihood of a customer making a purchase. In this example, we'll predict the likelihood of a customer making a next purchase based on two features: the '`Budget`' price range and '`Electronics`' category.

Consider a dataset that includes records of purchases with the added dimension of product category:

| Price Range | Category    | Made Purchase |
|-------------|-------------|---------------|
| Budget      | Electronics | Yes           |
| Premium     | Clothing    | No            |
| Budget      | Home Goods  | Yes           |
| Midrange    | Electronics | No            |
| Premium     | Home Goods  | Yes           |
| Budget      | Clothing    | No            |
| Midrange    | Clothing    | Yes           |


We want to calculate the likelihood of a customer making another purchase if the item is a '`Budget`' electronics product.

### Process:
- **Step 1: Calculate Prior Probabilities**

Firstly, we establish the prior probabilities of the two possible outcomes—making another purchase (`Yes`) and not making another purchase (`No`).

$$
P(\text{Yes}) = \frac{4}{7}
$$

$$
P(\text{No}) = \frac{3}{7}
$$

- **Step 2: Calculate Likelihood Probabilities**

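We then calculate the likelihood of the combined features, '`Budget`' price range and '`Electronics`' category, given each outcome. Note that these values are estimated by counting rows that match both features jointly; a strictly 'naive' calculation would instead multiply the per-feature likelihoods, for example `P(Budget|Yes)` × `P(Electronics|Yes)`.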

$$
P(\text{Budget} \cap \text{Electronics}|\text{Yes}) = \frac{1}{4}
$$

$$
P(\text{Budget} \cap \text{Electronics}|\text{No}) = \frac{0}{3}
$$

- **Step 3: Calculate Posterior Probabilities**

Applying Bayes' Theorem, we derive the posterior probabilities. The total probability of a '`Budget Electronics`' product is the number of rows matching both features divided by the total number of observations:

$$
P(\text{Budget} \cap \text{Electronics}) = \frac{1}{7}
$$

The posterior probabilities are thus:

$$
P(\text{Yes}|\text{Budget} \cap \text{Electronics}) = \frac{P(\text{Budget} \cap \text{Electronics}|\text{Yes}) \times P(\text{Yes})}{P(\text{Budget} \cap \text{Electronics})} = \frac{1}{4} \times \frac{4}{7} \div \frac{1}{7} = 1
$$

$$
P(\text{No}|\text{Budget} \cap \text{Electronics}) = \frac{P(\text{Budget} \cap \text{Electronics}|\text{No}) \times P(\text{No})}{P(\text{Budget} \cap \text{Electronics})} = \frac{0}{3} \times \frac{3}{7} \div \frac{1}{7} = 0
$$

- **Step 4: Make the Prediction**

Considering our calculated probabilities, the prediction is straightforward. Since `P(Yes|Budget ∩ Electronics)` is greater than `P(No|Budget ∩ Electronics)`, we predict that customers will make another purchase if they have previously bought a '`Budget`' electronics item. The posterior of 1 rests on a single matching row in this small dataset, so it should be read as an estimate and not as a guarantee.
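For comparison, the sketch below recomputes the same query under the naive independence assumption, where the likelihood is the product of per-feature likelihoods and not a joint count. The rows are copied from the table above; `naive_score` is a helper name introduced for illustration, and no smoothing is applied.

```python
from fractions import Fraction

# (price_range, category, made_purchase) rows copied from the table above.
rows = [
    ("Budget", "Electronics", "Yes"), ("Premium", "Clothing", "No"),
    ("Budget", "Home Goods", "Yes"), ("Midrange", "Electronics", "No"),
    ("Premium", "Home Goods", "Yes"), ("Budget", "Clothing", "No"),
    ("Midrange", "Clothing", "Yes"),
]


def naive_score(price_range, category, outcome):
    """Prior times the product of per-feature likelihoods (unnormalized)."""
    in_class = [r for r in rows if r[2] == outcome]
    n = len(in_class)
    prior = Fraction(n, len(rows))
    p_price = Fraction(sum(r[0] == price_range for r in in_class), n)
    p_category = Fraction(sum(r[1] == category for r in in_class), n)
    return prior * p_price * p_category


scores = {o: naive_score("Budget", "Electronics", o) for o in ("Yes", "No")}
evidence = sum(scores.values())  # normalizing constant
posteriors = {o: s / evidence for o, s in scores.items()}
print(posteriors)  # Yes: 3/5, No: 2/5
```

Under the factorized likelihood the prediction is still `Yes`, but with a posterior of 3/5. Because each feature is counted separately, the 'No' class no longer receives a likelihood of zero just because no 'No' row matches both features at once.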

### Conclusion

The computed outcome from the Naive Bayes classifier suggests a strong likelihood of repeat purchases within the 'Budget Electronics' category.