Q1. (Gaussian Naïve Bayes Classifier)
Implement a Gaussian Naïve Bayes classifier on the Iris dataset from sklearn.datasets using:

 (i) Step-by-step implementation

 (ii) In-built function
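For reference, both parts rely on the same decision rule. Assuming that the d features are conditionally independent given the class (the "naïve" assumption) and that each feature is Gaussian within a class, the predicted class is the one with the largest log posterior, up to a constant that does not depend on the class c:

```latex
\hat{y} = \arg\max_{c}\left[\log P(c) + \sum_{j=1}^{d} \log \mathcal{N}\!\left(x_j;\ \mu_{c,j},\ \sigma^2_{c,j}\right)\right],
\qquad
\log \mathcal{N}(x;\mu,\sigma^2) = -\tfrac{1}{2}\log\!\left(2\pi\sigma^2\right) - \frac{(x-\mu)^2}{2\sigma^2}
```

Here P(c) is the fraction of training samples in class c, and \mu_{c,j}, \sigma^2_{c,j} are the mean and variance of feature j over the training samples of class c. Summing logs instead of multiplying densities keeps the computation numerically stable.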

(i) Step-by-step implementation

In [1]:
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load data
iris = load_iris()
X, y = iris.data, iris.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Step 1: Calculate prior probabilities
classes = np.unique(y_train)
priors = {c: np.mean(y_train == c) for c in classes}

# Step 2: Compute mean and variance for each feature per class
means = {c: X_train[y_train == c].mean(axis=0) for c in classes}
variances = {c: X_train[y_train == c].var(axis=0) for c in classes}

# Step 3: Log of the Gaussian probability density function
def gaussian_log_prob(x, mean, var):
    var = var + 1e-9  # small smoothing term avoids division by zero
    return -0.5 * np.log(2 * np.pi * var) - ((x - mean) ** 2) / (2 * var)

# Step 4: Prediction
def predict(X):
    preds = []
    for x in X:
        posteriors = []
        for c in classes:
            # Sum per-feature log-densities (naive independence) instead of
            # multiplying densities, which can underflow to zero
            log_likelihood = np.sum(gaussian_log_prob(x, means[c], variances[c]))
            posterior = np.log(priors[c]) + log_likelihood  # unnormalized log posterior
            posteriors.append(posterior)
        preds.append(classes[np.argmax(posteriors)])
    return np.array(preds)

# Step 5: Evaluate
y_pred = predict(X_test)
print("Accuracy (manual GNB):", accuracy_score(y_test, y_pred))

Accuracy (manual GNB): 1.0
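The loop in Step 4 scores one sample and one class at a time. As a sketch that goes beyond the original exercise, the same log-space rule can be vectorized with NumPy broadcasting. The helper names fit_gnb and predict_gnb are hypothetical, and the variance smoothing (adding var_smoothing times the largest overall feature variance) is an assumption chosen to mirror GaussianNB's default, so the last lines check that the predictions agree with the in-built model.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB


def fit_gnb(X, y, var_smoothing=1e-9):
    """Hypothetical helper: estimate per-class priors, feature means and variances."""
    classes = np.unique(y)
    priors = np.array([np.mean(y == c) for c in classes])           # shape (k,)
    means = np.array([X[y == c].mean(axis=0) for c in classes])     # shape (k, d)
    variances = np.array([X[y == c].var(axis=0) for c in classes])  # shape (k, d)
    # Assumption: smooth variances like GaussianNB, by adding a small
    # fraction of the largest overall feature variance
    variances += var_smoothing * X.var(axis=0).max()
    return classes, priors, means, variances


def predict_gnb(X, classes, priors, means, variances):
    """Hypothetical helper: log-space prediction for all samples at once."""
    # Broadcast (n, 1, d) against (1, k, d) to get per-feature log-densities (n, k, d)
    diff = X[:, None, :] - means[None, :, :]
    log_dens = (-0.5 * np.log(2 * np.pi * variances)[None, :, :]
                - diff ** 2 / (2 * variances[None, :, :]))
    # Sum over features (naive independence) and add the log prior -> (n, k)
    joint = np.log(priors)[None, :] + log_dens.sum(axis=2)
    return classes[np.argmax(joint, axis=1)]


iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)
params = fit_gnb(X_train, y_train)
y_pred_vec = predict_gnb(X_test, *params)
print("Accuracy (vectorized GNB):", np.mean(y_pred_vec == y_test))

# Sanity check against the in-built classifier
ref = GaussianNB().fit(X_train, y_train).predict(X_test)
print("Agrees with GaussianNB:", np.array_equal(y_pred_vec, ref))
```

Broadcasting replaces the two nested Python loops with a single (n, k, d) array, which is convenient for a small dataset like Iris; for very large inputs that intermediate array may become memory-heavy.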


(ii) In-built function

In [2]:
from sklearn.naive_bayes import GaussianNB
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load data
iris = load_iris()
X, y = iris.data, iris.target

# Split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Model
gnb = GaussianNB()
gnb.fit(X_train, y_train)
y_pred = gnb.predict(X_test)

print("Accuracy (in-built GNB):", accuracy_score(y_test, y_pred))

Accuracy (in-built GNB): 1.0


Q2. Explore the GridSearchCV tool in scikit-learn.

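GridSearchCV is commonly used to tune the hyperparameters of machine learning models: it evaluates every combination of values in a user-supplied parameter grid with cross-validation, selects the combination with the best mean score, and (by default) refits the model with those values on the whole training set.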

Use this tool to find the best value of K for a K-NN classifier on any dataset.
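Conceptually, GridSearchCV runs a loop like the one below: for each candidate K it computes the mean 5-fold cross-validation accuracy on the training set, keeps the best K, and refits that model on the full training data. This hand-rolled version is only an illustration, not how the library is implemented internally. With an integer cv and a classifier, both cross_val_score and GridSearchCV use the same unshuffled stratified folds, so the best mean CV accuracy should match grid.best_score_. If several K values tie, np.argmax keeps the smallest one.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

# Hand-rolled grid search: score every candidate K with 5-fold CV on the training set
k_values = list(range(1, 21))
mean_scores = []
for k in k_values:
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X_train, y_train,
                             cv=5, scoring="accuracy")
    mean_scores.append(scores.mean())

best_idx = int(np.argmax(mean_scores))  # first K with the highest mean CV accuracy
best_k = k_values[best_idx]
best_score = mean_scores[best_idx]

# Refit the winner on the full training set (GridSearchCV does this when refit=True)
best_model = KNeighborsClassifier(n_neighbors=best_k).fit(X_train, y_train)
print("Manual best K:", best_k, "CV accuracy:", best_score)
print("Test accuracy:", best_model.score(X_test, y_test))

# Compare against GridSearchCV on the same grid and folds
grid = GridSearchCV(KNeighborsClassifier(), {"n_neighbors": k_values},
                    cv=5, scoring="accuracy").fit(X_train, y_train)
print("GridSearchCV best CV accuracy:", grid.best_score_)
```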

In [4]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define model and parameter grid
knn = KNeighborsClassifier()
param_grid = {'n_neighbors': list(range(1, 21))}

# Grid search with 5-fold cross-validation
grid = GridSearchCV(knn, param_grid, cv=5, scoring='accuracy')
grid.fit(X_train, y_train)

print("Best K:", grid.best_params_['n_neighbors'])
print("Best Cross-Validation Accuracy:", grid.best_score_)

# Test accuracy using best model
best_knn = grid.best_estimator_
y_pred = best_knn.predict(X_test)
print("Test Accuracy:", accuracy_score(y_test, y_pred))

Best K: 3
Best Cross-Validation Accuracy: 0.9583333333333334
Test Accuracy: 1.0
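Because best_params_ reports only the winning value, it can be useful to inspect the whole accuracy-versus-K curve stored in cv_results_. The sketch below re-runs the same search and prints, for each K, the mean and standard deviation of the fold accuracies and the rank that GridSearchCV assigned.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)
grid = GridSearchCV(KNeighborsClassifier(), {"n_neighbors": list(range(1, 21))},
                    cv=5, scoring="accuracy")
grid.fit(X_train, y_train)

# cv_results_ holds one entry per candidate; print the full curve
res = grid.cv_results_
for k, mean, std, rank in zip(res["param_n_neighbors"], res["mean_test_score"],
                              res["std_test_score"], res["rank_test_score"]):
    print(f"K={int(k):2d}  mean={mean:.4f}  std={std:.4f}  rank={int(rank)}")
```

On a small dataset the differences between neighbouring K values can be within one fold's standard deviation, so the curve gives a better sense of how sensitive the model is to K than the single best value does.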
