# Naive Bayes Test
#### Author: Darren Colby
#### Date: March 18th, 2022
### Purpose: To test the functionality of the NaiveBayes class

## Setting up the notebook for subsequent questions

In [1]:
# Imports
import numpy as np
from sklearn.model_selection import train_test_split
from NaiveBayes import NaiveBayes

In [2]:
# Load the data
naive = np.loadtxt("hw4_naive.csv", delimiter = ",", skiprows=1)

### Create the train/test split

In [3]:
# Identify features and labels
x = naive[:, :-1]
y = naive[:, -1]

# Create train-test split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=17)

### Fitting the model
The first step here is fitting our model. Unlike other supervised learning methods that rely on gradient descent to learn some weights that minimize a cost function, Naive Bayes is a probabilistic model. This means the only thing we need to do in this step is calculate the priors for each class label.

In [4]:
# Using a multinomial pdf
multinomial = NaiveBayes()

# Fit the model
multinomial.fit(y_train)

# Get predictions and posteriors
multinomial_predictions, multinomial_posteriors = multinomial.predict(x_train, y_train, x_test, 'multinomial')

In [5]:
# Using a gaussian pdf
gaussian = NaiveBayes()

# Fit the model
gaussian.fit(y_train)

# Get predictions and posteriors
gaussian_predictions, gaussian_posteriors = gaussian.predict(x_train, y_train, x_test, 'gaussian')

### Evaluating the model
No matter how complicated a model is, if it does not perform well, there is no sense in using it. Therfore, it would be useful to know some metrics like the number of true positives, true negatives, false positives, false negatives, accuracy, precision, recall, and F1 score, which the function below calculates.

In [6]:
def evaluate(y_actual, y_pred):
    """Calculates true positives, true negatives, false positives, false negatives, accuracy, precision, recall, and F1 score
       ----------------------------------------------------------------------------------------------------------------------
       
       Parameters:
            y_actual: actual labels
            y_pred: predicted labels
            
       Returns a tuple of the true positives, true negatives, false positives, false negatives, accuracy, precision, recall, 
           and F1 score"""
    
    # Store the class labels
    labels = np.unique(y_actual)
    
    # When there are only two class labels
    if y_actual.size == 2:
        
        # Calculate positives and negatives
        true_positive = np.sum(np.where((y_pred == 1) & (y_actual == 1)))
        true_negative = np.sum(np.where((y_pred == 0) & (y_actual == 0)))
        false_positive = np.sum(np.where((y_pred == 1) & (y_actual == 0)))
        false_negative = np.sum(np.where((y_pred == 0) & (y_actual == 1)))
    
    # When doing multiclass classification
    else:
        
        for label in labels:
        
            # Calculate positives and negatives
            true_positive = np.sum(np.where((y_pred == label) & (y_actual == label)))
            true_negative = np.sum(np.where((y_pred != label) & (y_actual != label)))
            false_positive = np.sum(np.where((y_pred == label) & (y_actual != label)))
            false_negative = np.sum(np.where((y_pred != label) & (y_actual == label)))
            
    # Calculate accuracy
    accuracy = (true_positive + true_negative) / (true_positive + true_negative + false_positive + false_negative)
    
    # Calculate precision and recall
    precision = true_positive / (true_positive + false_positive)
    recall = true_positive / (true_positive + false_negative)
    
    # Calculate the F1 score
    f1 = (2 * (precision * recall)) / (precision + recall)
        
    return true_positive, true_negative, false_positive, false_negative, accuracy, precision, recall, f1

### Using the model
Now it is time to use the model. In the first cell, we assume a multinomial distribution to train, predict, ad evaluate our model. The second cell assumes a gaussian distribution.

In [7]:
# Multinomial distribution

# Evaluate
multinomial_evaluations = evaluate(y_test, multinomial_predictions)
print("The F1 score with a multiomial distribution is " + str(multinomial_evaluations[7]))

The F1 score with a multiomial distribution is 0.8505722541329463


In [8]:
# Gaussian distribution

# Evaluate
gaussian_evaluations = evaluate(y_test, gaussian_predictions)
print("The F1 score with a Gaussian distribution is " + str(gaussian_evaluations[7]))

The F1 score with a Gaussian distribution is 0.31365711284270037
