# Naïve Bayes with and without Feature Scaling

Compare the accuracy of a Gaussian Naïve Bayes classifier on the Iris dataset before and after scaling the features.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Load the data and hold out 20% of it as a test set
iris = load_iris()
X = iris.data
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Model 1: Gaussian Naive Bayes on the raw features
gnb_unscaled = GaussianNB()
gnb_unscaled.fit(X_train, y_train)
y_pred_unscaled = gnb_unscaled.predict(X_test)

# Standardize the features; fit the scaler on the training set only to avoid leakage
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Model 2: Gaussian Naive Bayes on the standardized features
gnb_scaled = GaussianNB()
gnb_scaled.fit(X_train_scaled, y_train)
y_pred_scaled = gnb_scaled.predict(X_test_scaled)

# Evaluate both models against the same test labels
accuracy_unscaled = accuracy_score(y_test, y_pred_unscaled)
conf_matrix_unscaled = confusion_matrix(y_test, y_pred_unscaled)
report_unscaled = classification_report(y_test, y_pred_unscaled)

accuracy_scaled = accuracy_score(y_test, y_pred_scaled)
conf_matrix_scaled = confusion_matrix(y_test, y_pred_scaled)
report_scaled = classification_report(y_test, y_pred_scaled)

# Display all results (the notebook echoes the value of the last expression)
accuracy_unscaled, conf_matrix_unscaled, report_unscaled, accuracy_scaled, conf_matrix_scaled, report_scaled
```

```
(1.0,
 array([[10,  0,  0],
        [ 0,  9,  0],
        [ 0,  0, 11]]),
 '              precision    recall  f1-score   support\n\n           0       1.00      1.00      1.00        10\n           1       1.00      1.00      1.00         9\n           2       1.00      1.00      1.00        11\n\n    accuracy                           1.00        30\n   macro avg       1.00      1.00      1.00        30\nweighted avg       1.00      1.00      1.00        30\n',
 1.0,
 array([[10,  0,  0],
        [ 0,  9,  0],
        [ 0,  0, 11]]),
 '              precision    recall  f1-score   support\n\n           0       1.00      1.00      1.00        10\n           1       1.00      1.00      1.00         9\n           2       1.00      1.00      1.00        11\n\n    accuracy                           1.00        30\n   macro avg       1.00      1.00      1.00        30\nweighted avg       1.00      1.00      1.00        30\n')
```

## 🧠 Why are the results identical?

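The Iris dataset is well-behaved, which is why both models score so highly:

- Its features are numerical (continuous).
- Its classes are largely separable: setosa perfectly, versicolor and virginica with only slight overlap.
- All four features are measured in centimetres and have comparable scales.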
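The scale claim is easy to check. This sketch prints each feature's range and standard deviation (NumPy's default population standard deviation) along with the ratio between the largest and smallest spread:

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data

# Range and spread of every feature; all four are lengths in centimetres
for name, col in zip(iris.feature_names, X.T):
    print(f"{name:<20} min={col.min():.1f} max={col.max():.1f} std={col.std():.2f}")

# How far apart the feature scales are, as a single number
stds = X.std(axis=0)
print(f"largest/smallest std ratio: {stds.max() / stds.min():.1f}")
```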
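Gaussian Naïve Bayes assumes that, within each class, the features are independent of one another and each follows a normal distribution. It models each feature using its mean and variance within each class, then predicts the class whose prior probability multiplied by the product of the per-feature likelihoods is largest.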

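That is why scaling does not change the predictions: the Gaussian Naïve Bayes decision rule is invariant to per-feature affine transformations. StandardScaler maps each feature to x' = (x − m) / s with s > 0, so every class mean becomes (μ − m) / s and every class variance becomes σ² / s². The normal density of the transformed value under these new parameters is exactly s times the density of the original value, and because that factor s is the same for every class, it cancels when the classes are compared. The only scale-dependent part of scikit-learn's GaussianNB is its tiny variance-smoothing term (`var_smoothing`, a fraction of the largest feature variance added to all variances), which can cause negligible numerical differences in the predicted probabilities.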
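A quick empirical check of this invariance argument, as a sketch: it refits GaussianNB on standardized features and on an arbitrary per-feature affine transform, then compares labels and probabilities against the unscaled model. The `shift` and `scale` values are hypothetical, chosen only for illustration; very extreme scale ratios would make the `var_smoothing` term noticeable.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Hypothetical per-feature shift and positive scale, for illustration only
shift = np.array([5.0, -3.0, 1.0, 0.5])
scale = np.array([2.0, 0.5, 4.0, 0.25])

transforms = {
    "standardized": StandardScaler().fit(X_train).transform,
    "arbitrary affine": lambda A: (A - shift) / scale,
}

# Reference model trained on the raw features
base = GaussianNB().fit(X_train, y_train)

results = {}
for name, f in transforms.items():
    model = GaussianNB().fit(f(X_train), y_train)
    same_labels = np.array_equal(model.predict(f(X_test)), base.predict(X_test))
    max_prob_gap = np.abs(model.predict_proba(f(X_test)) - base.predict_proba(X_test)).max()
    results[name] = (same_labels, max_prob_gap)
    print(f"{name}: identical labels={same_labels}, max probability gap={max_prob_gap:.2e}")
```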