For the Gaussian Naive Bayes classifier, the hyperparameter to tune is var_smoothing: the fraction of the largest feature variance that is added to every per-class feature variance to keep the likelihood calculation numerically stable. We use NumPy's logspace function to generate 100 candidate values for var_smoothing, evenly spaced on a log scale from 1e-12 to 1e-2. The rest of the code follows the previous examples: we split the data into training and testing sets, define the model, let RandomizedSearchCV try 50 of those candidate values with 5-fold cross-validation on the training data, and evaluate the best model on the test data.

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

In [2]:
df=pd.read_excel('RE_Data.xlsx')

In [3]:
df = df.drop(df.columns[df.columns.str.contains('unnamed', case=False)], axis=1)
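Columns named "Unnamed: 0" and similar typically appear when a DataFrame was saved together with its index and then read back. The sketch below reproduces that with a small in-memory CSV (used instead of Excel only so the example needs no extra engine such as openpyxl; whether RE_Data.xlsx was created this way is an assumption) and applies the same filter as the cell above.

```python
import io

import pandas as pd

# Toy data (hypothetical columns) written WITH its index, as to_csv does by default
original = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})
buffer = io.StringIO()
original.to_csv(buffer)  # index=True is the default
buffer.seek(0)

# Reading it back turns the unnamed index header into an "Unnamed: 0" column
raw = pd.read_csv(buffer)
print(list(raw.columns))

# Same filter as in the notebook: drop every column whose name contains "unnamed"
clean = raw.drop(raw.columns[raw.columns.str.contains("unnamed", case=False)], axis=1)
print(list(clean.columns))
```

An alternative is to pass index_col=0 when reading, which restores that column as the index instead of keeping it as a feature; pd.read_excel accepts the same parameter.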

In [4]:
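# Note: this only binds a second name to the same DataFrame; it does not shuffle it.
# The rows are shuffled later by train_test_split (shuffle=True by default).
df_shuff = df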
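In Python, plain assignment such as df_shuff = df creates an alias, not a reordered copy. If an explicit row shuffle is wanted, DataFrame.sample(frac=1) is one common way to get it, as this toy sketch shows (the data is made up). For this notebook it matters little, because train_test_split shuffles the rows by default before splitting.

```python
import pandas as pd

# Made-up toy frame
df = pd.DataFrame({"x": range(10), "label": [0, 1] * 5})

# Plain assignment: both names refer to the same object, row order unchanged
alias = df
print(alias is df)

# sample(frac=1) returns all rows in random order as a new object;
# reset_index(drop=True) renumbers the rows 0..n-1
shuffled = df.sample(frac=1, random_state=42).reset_index(drop=True)
print(shuffled["x"].tolist())

# Same rows, just reordered
print(sorted(shuffled["x"].tolist()) == list(range(10)))
```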

In [5]:
from sklearn.model_selection import train_test_split

In [6]:
# Use every column except 'ph' and 'ph_labels' as features; 'ph_labels' is the target
var_columns = [c for c in df if c not in ['ph','ph_labels']]

X = df.loc[:,var_columns].values
y = df.loc[:,'ph_labels'].values
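The list comprehension iterates over the DataFrame's column names and keeps everything except 'ph' and 'ph_labels'. Excluding 'ph' matters if, as the names suggest, ph_labels is derived from ph: leaving it in would leak the target into the features. The toy sketch below uses hypothetical feature columns (temp, conductivity) to show the resulting shapes, and shows df.drop(columns=...) as an equivalent selection.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with the same target columns as the notebook
df = pd.DataFrame({
    "temp": [20.1, 21.5, 19.8, 22.0],
    "conductivity": [310, 295, 330, 300],
    "ph": [6.8, 7.2, 7.9, 8.4],
    "ph_labels": [0, 1, 2, 3],
})

# Iterating over a DataFrame yields its column names
var_columns = [c for c in df if c not in ["ph", "ph_labels"]]
X = df.loc[:, var_columns].values   # 2-D NumPy array of features
y = df.loc[:, "ph_labels"].values   # 1-D NumPy array of class labels
print(var_columns, X.shape, y.shape)

# Equivalent selection with drop(columns=...)
X_alt = df.drop(columns=["ph", "ph_labels"]).values
print(np.array_equal(X, X_alt))
```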

In [7]:
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.metrics import classification_report
import numpy as np

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Define the Gaussian Naive Bayes classifier model
model = GaussianNB()

# Candidate values for var_smoothing: 100 log-spaced values from 1e-12 to 1e-2
param_dist = {
    'var_smoothing': np.logspace(-12, -2, 100)
}

# Randomized search: evaluate 50 of the 100 candidates with 5-fold cross-validation
search = RandomizedSearchCV(model, param_distributions=param_dist, n_iter=50, cv=5, random_state=42)

# Fit the randomized search object to the training data
search.fit(X_train, y_train)

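# With the default refit=True, the search refits the best model on the whole
# training set, so predict() and score() below use that best model
y_pred = search.predict(X_test)
accuracy_train = search.score(X_train, y_train)  # mean accuracy on training data
accuracy_test = search.score(X_test, y_test)     # mean accuracy on held-out test data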

report_test = classification_report(y_test, y_pred)

print("Gaussian Naive Bayes Classifier train accuracy:", accuracy_train)
print("Gaussian Naive Bayes Classifier test report:\n{}\nAccuracy test: {:.3f}\n".format(report_test, accuracy_test))
print("Gaussian Naive Bayes Classifier best params:\n{}\n".format(search.best_params_))


Gaussian Naive Bayes Classifier train accuracy: 0.4258571428571429
Gaussian Naive Bayes Classifier test report:
              precision    recall  f1-score   support

           0       0.34      0.15      0.21      3818
           1       0.37      0.79      0.50      3701
           2       0.47      0.38      0.42      3719
           3       0.60      0.36      0.45      3762

    accuracy                           0.42     15000
   macro avg       0.44      0.42      0.40     15000
weighted avg       0.44      0.42      0.39     15000

Accuracy test: 0.419

Gaussian Naive Bayes Classifier best params:
{'var_smoothing': 0.004977023564332114}
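Because param_dist contains only a list of values, RandomizedSearchCV samples n_iter=50 of the 100 candidates without replacement (scikit-learn's ParameterSampler does this when every parameter is given as a list), so the search covers half of the grid. The sketch below reproduces the setup on synthetic 4-class data from make_classification (an assumption standing in for RE_Data.xlsx, so its scores will not match the output above), confirms that 50 distinct values were tried, and lists the top-ranked candidates from cv_results_. The best value reported above, about 5e-3, lies near the upper end of the 1e-12 to 1e-2 range; inspecting the top-ranked candidates this way is one simple check on whether the range should be extended.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import ParameterSampler, RandomizedSearchCV, train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in for the notebook's data (assumption): 4 classes, 10 features
X, y = make_classification(n_samples=2000, n_features=10, n_informative=5,
                           n_classes=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

param_dist = {"var_smoothing": np.logspace(-12, -2, 100)}

# With list-only parameters, ParameterSampler draws candidates without replacement
sampled = list(ParameterSampler(param_dist, n_iter=50, random_state=42))
distinct = {p["var_smoothing"] for p in sampled}
print(len(sampled), len(distinct))

search = RandomizedSearchCV(GaussianNB(), param_distributions=param_dist,
                            n_iter=50, cv=5, random_state=42)
search.fit(X_train, y_train)

# Inspect the best-ranked candidates from the cross-validation results
results = pd.DataFrame(search.cv_results_)
cols = ["param_var_smoothing", "mean_test_score", "std_test_score"]
print(results.sort_values("rank_test_score")[cols].head())
print("best:", search.best_params_, "test accuracy:", round(search.score(X_test, y_test), 3))
```

If several of the top-ranked values cluster at the largest candidates, widening the upper bound of np.logspace is a reasonable next experiment.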

