### CS559 - Homework #2

**Author**: Sid Bhatia

**Date**: September 18th, 2024

**Pledge**: I pledge my honor that I have abided by the Stevens Honor System.

**Professor**: Dr. In Suk Jang

#### 1. Naive Bayes Classification [40 pts]

Use the following code to generate the training data set. The code generates a random data set with five features and four classes.

```python
from sklearn import datasets
X, y = datasets.make_blobs(n_samples = 400, n_features = 5, centers = 4, cluster_std = 2, random_state = 100)
```
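Here is a quick sanity check of the generated data. It is a minimal sketch that only inspects the arrays returned by `make_blobs`. With `n_samples = 400` and `centers = 4`, scikit-learn splits the samples evenly, so each class should contain 100 points.

```python
import numpy as np
from sklearn import datasets

# Same call as above: 400 samples, 5 features, 4 Gaussian blobs
X, y = datasets.make_blobs(n_samples=400, n_features=5, centers=4,
                           cluster_std=2, random_state=100)

print(X.shape)          # (n_samples, n_features)
print(np.unique(y))     # class labels produced by make_blobs
print(np.bincount(y))   # number of samples per class
```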

a. [5pts] Compute the prior probability of each class, $p(C_k), \; \forall \, k = 1, ..., 4$.

```python
from sklearn import datasets
import numpy as np

X, y = datasets.make_blobs(n_samples = 400, n_features = 5, centers = 4, cluster_std = 2, random_state = 100)

# Prior p(C_k) = N_k / N: the fraction of points in each class
unique_classes, counts = np.unique(y, return_counts=True)
prior_probabilities = counts / len(y)

print(prior_probabilities)
```

```
[0.25 0.25 0.25 0.25]
```
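The prior is the fraction of training points in each class. With $N_k$ the number of points labelled $C_k$ and $N = 400$, the balanced result above follows directly:

$$
p(C_k) = \frac{N_k}{N} = \frac{100}{400} = 0.25, \qquad k = 1, \dots, 4.
$$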


b. [10pts] Compute the likelihood $p(X \mid C_k), \; \forall \, k = 1, ..., 4$.

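```python
from scipy.stats import multivariate_normal

class_means = {}
class_covariances = {}
class_likelihoods = {}

for cls in unique_classes:
    # Training points belonging to this class
    X_cls = X[y == cls]

    # Sample mean of the class
    mean_cls = np.mean(X_cls, axis=0)
    class_means[cls] = mean_cls

    # Unbiased sample covariance. The full matrix models correlations between features;
    # strict naive Bayes would keep only the diagonal (independent per-feature variances).
    cov_cls = np.cov(X_cls, rowvar=False)
    class_covariances[cls] = cov_cls

    # Gaussian class-conditional density p(x | C_k) evaluated at every point in X
    likelihood_cls = multivariate_normal.pdf(X, mean=mean_cls, cov=cov_cls)
    class_likelihoods[cls] = likelihood_cls

    # print(f"\nClass {cls + 1} Mean Vector:\n{mean_cls}")
    # print(f"\nClass {cls + 1} Covariance Matrix:\n{cov_cls}")
    print(f"\nLikelihoods p(X | C_{cls + 1}):\n{likelihood_cls}")
```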
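Strictly speaking, naive Bayes assumes the features are conditionally independent given the class. The class-conditional Gaussian then has a diagonal covariance, and the likelihood factorizes into one-dimensional densities, $p(\mathbf{x} \mid C_k) = \prod_{j=1}^{5} \mathcal{N}(x_j \mid \mu_{kj}, \sigma_{kj}^2)$. The cell above uses the full covariance instead.

Below is a minimal sketch of the factorized version. The helper name `naive_gaussian_likelihood` is hypothetical. The sketch checks the result against `multivariate_normal.pdf` with a diagonal covariance.

```python
import numpy as np
from scipy.stats import multivariate_normal, norm
from sklearn import datasets

def naive_gaussian_likelihood(X, X_cls):
    """Hypothetical helper: p(x | C_k) assuming independent features within the class."""
    mu = X_cls.mean(axis=0)
    # ddof=1 matches the unbiased variances on the diagonal of np.cov
    sigma = X_cls.std(axis=0, ddof=1)
    # Product over features of 1-D Gaussian densities, one value per row of X
    return np.prod(norm.pdf(X, loc=mu, scale=sigma), axis=1)

X, y = datasets.make_blobs(n_samples=400, n_features=5, centers=4,
                           cluster_std=2, random_state=100)

naive_likelihoods = {}
for cls in np.unique(y):
    naive_likelihoods[cls] = naive_gaussian_likelihood(X, X[y == cls])

# Cross-check for class 0: a multivariate normal with a diagonal covariance
X_0 = X[y == 0]
diag_cov = np.diag(X_0.var(axis=0, ddof=1))
reference = multivariate_normal.pdf(X, mean=X_0.mean(axis=0), cov=diag_cov)
print(np.allclose(naive_likelihoods[0], reference, rtol=1e-6, atol=1e-300))
```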

c. [15pts] Compute the posterior probability of each point $p(C_k \mid X), \; \forall \, k = 1, ..., 4$. Assign the class ID to each point.

```python
num_classes = len(unique_classes)
num_samples = X.shape[0]

# Unnormalized posterior: p(x | C_k) * p(C_k) for every point and class
posterior_probabilities = np.zeros((num_samples, num_classes))
for idx, cls in enumerate(unique_classes):
    prior = prior_probabilities[idx]   # priors are ordered like unique_classes
    likelihood = class_likelihoods[cls]
    posterior_probabilities[:, idx] = likelihood * prior

# Divide by the evidence p(x) = sum_k p(x | C_k) p(C_k)
posterior_probabilities_sum = np.sum(posterior_probabilities, axis=1, keepdims=True)
posterior_probabilities_normalized = posterior_probabilities / posterior_probabilities_sum

# Assign each point to the class with the largest posterior
predicted_classes = np.argmax(posterior_probabilities_normalized, axis=1)
predicted_class_labels = unique_classes[predicted_classes]

# Show the posteriors for the first five points (classes printed as 1-4)
for i in range(5):
    print(f"\nData Point {i + 1}:")
    print(f"True Class: {y[i] + 1}")
    for idx, cls in enumerate(unique_classes):
        print(f"p(C_{cls + 1} | x_{i + 1}) = {posterior_probabilities_normalized[i, idx]:.4f}")
    print(f"Assigned Class: {predicted_class_labels[i] + 1}")
```

```
Data Point 1:
True Class: 4
p(C_1 | x_1) = 0.0000
p(C_2 | x_1) = 0.0000
p(C_3 | x_1) = 0.0000
p(C_4 | x_1) = 1.0000
Assigned Class: 4

Data Point 2:
True Class: 3
p(C_1 | x_2) = 0.0000
p(C_2 | x_2) = 0.0000
p(C_3 | x_2) = 1.0000
p(C_4 | x_2) = 0.0000
Assigned Class: 3

Data Point 3:
True Class: 1
p(C_1 | x_3) = 1.0000
p(C_2 | x_3) = 0.0000
p(C_3 | x_3) = 0.0000
p(C_4 | x_3) = 0.0000
Assigned Class: 1

Data Point 4:
True Class: 4
p(C_1 | x_4) = 0.0000
p(C_2 | x_4) = 0.0000
p(C_3 | x_4) = 0.0000
p(C_4 | x_4) = 1.0000
Assigned Class: 4

Data Point 5:
True Class: 4
p(C_1 | x_5) = 0.0000
p(C_2 | x_5) = 0.0000
p(C_3 | x_5) = 0.0000
p(C_4 | x_5) = 1.0000
Assigned Class: 4
```
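Multiplying densities by priors directly can underflow to zero for points far from every class mean. That would make the normalization step divide by zero.

A common remedy is to work with log-densities and normalize with the log-sum-exp trick (`scipy.special.logsumexp`). The sketch below does this under the same full-covariance Gaussian model. The helper name `log_posteriors` is hypothetical.

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import multivariate_normal
from sklearn import datasets

def log_posteriors(X, y):
    """Hypothetical helper: log p(C_k | x) for Gaussian class-conditionals."""
    classes = np.unique(y)
    log_joint = np.empty((X.shape[0], len(classes)))
    for idx, cls in enumerate(classes):
        X_cls = X[y == cls]
        log_prior = np.log(len(X_cls) / len(y))
        log_lik = multivariate_normal.logpdf(
            X, mean=X_cls.mean(axis=0), cov=np.cov(X_cls, rowvar=False))
        log_joint[:, idx] = log_lik + log_prior
    # Subtract log p(x), computed stably as a log-sum-exp over the classes in each row
    return classes, log_joint - logsumexp(log_joint, axis=1, keepdims=True)

X, y = datasets.make_blobs(n_samples=400, n_features=5, centers=4,
                           cluster_std=2, random_state=100)
classes, log_post = log_posteriors(X, y)
posterior = np.exp(log_post)
predicted = classes[np.argmax(log_post, axis=1)]
print(posterior[:5].round(4))
```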


d. [5pts] Construct the confusion matrix to show the classification rate using `sklearn.metrics.confusion_matrix`. The confusion matrix should visualize and summarize the performance of a classification algorithm.

```python
from sklearn.metrics import confusion_matrix

# Rows: true class, columns: predicted class
conf_matrix = confusion_matrix(y, predicted_class_labels)

print(conf_matrix)
```

```
[[100   0   0   0]
 [  0 100   0   0]
 [  0   0 100   0]
 [  0   0   0 100]]
```
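The confusion matrix also gives the classification rate directly. Row $k$ counts the points whose true class is $C_k$, so:

- dividing the diagonal by the row sums gives the fraction of each class classified correctly;
- dividing the trace by the total gives the overall accuracy.

Here is a small sketch on toy labels. The arrays `y_true` and `y_hat` are made up for illustration.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Illustrative labels only, not the homework data
y_true = np.array([0, 0, 1, 1, 2, 2])
y_hat = np.array([0, 1, 1, 1, 2, 0])

cm = confusion_matrix(y_true, y_hat)
per_class_rate = cm.diagonal() / cm.sum(axis=1)   # per-class recall
overall_rate = np.trace(cm) / cm.sum()            # overall accuracy

print(cm)
print(per_class_rate)
print(overall_rate)
```

Applied to `conf_matrix` above, every per-class rate is 1.0.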


e. [5pts] Classify the target using `sklearn.naive_bayes.GaussianNB`. Report the accuracy of the model.

```python
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

gnb = GaussianNB()

# Fit and evaluate on the same (training) data
gnb.fit(X, y)

y_pred = gnb.predict(X)

accuracy = accuracy_score(y, y_pred)

print(accuracy)
```

```
1.0
```
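The accuracy of 1.0 above is measured on the same points used to fit the model, so it is a training accuracy. Below is a minimal sketch of a held-out estimate. It assumes a stratified 75/25 split with `train_test_split`; the split ratio and seed are arbitrary choices, not part of the assignment.

```python
from sklearn import datasets
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = datasets.make_blobs(n_samples=400, n_features=5, centers=4,
                           cluster_std=2, random_state=100)

# Hold out 25% of the points, keeping the class proportions equal in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

gnb = GaussianNB().fit(X_train, y_train)
test_accuracy = accuracy_score(y_test, gnb.predict(X_test))
print(test_accuracy)
```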


#### 2. Perceptron [30 pts]

a. [15pts] The lecture slides provide the methods needed for the perceptron algorithm: `step(X)` and `perceptron_predict(w, X)`. Write a method, `perceptron_fit(w, X, y, learning_rate, iterations)`, that fits the data and returns `w`.

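```python
def step(X):
    # Sign-style activation: +1 for positive input, -1 otherwise
    return 1 if X > 0 else -1

def perceptron_predict(w, X):
    # Prepend the bias input 1 to the feature vector before the dot product
    return step(np.dot(np.append(1, X), w))

def perceptron_fit(w, X, y, learning_rate, iterations):
    # Work on a float copy so the caller's initial weight vector is not modified in place
    w = np.array(w, dtype=float, copy=True)

    # Add a leading column of ones for the bias weight w[0]
    X_bias = np.c_[np.ones(X.shape[0]), X]

    for j in range(iterations):
        for k in range(len(y)):
            h = step(np.dot(X_bias[k], w))

            # Note: step() returns +/-1, so this rule assumes y is encoded as +/-1.
            # With y in {0, 1}, every sample with y = 0 counts as misclassified.
            if h != y[k]:
                error = y[k] - h
                w += learning_rate * error * X_bias[k]

    return w
```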
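The update in `perceptron_fit` is written for targets in $\{-1, +1\}$, which matches the output of `step`. The sketch below implements the textbook rule $w \leftarrow w + \eta\, y_k\, \tilde{x}_k$ for misclassified points, where $\tilde{x}_k$ is $x_k$ with a leading 1 for the bias. It assumes targets arrive as $\{0, 1\}$ and maps them to $\{-1, +1\}$ first. The names `perceptron_fit_signed` and `perceptron_predict01` are hypothetical.

```python
import numpy as np

def perceptron_fit_signed(X, y01, learning_rate=1.0, iterations=10):
    """Hypothetical variant: perceptron with {0, 1} targets mapped to {-1, +1}."""
    y_signed = np.where(np.asarray(y01) > 0, 1, -1)
    X_bias = np.c_[np.ones(X.shape[0]), X]       # leading 1 for the bias weight
    w = np.zeros(X_bias.shape[1])
    for _ in range(iterations):
        mistakes = 0
        for x_k, t_k in zip(X_bias, y_signed):
            h = 1 if x_k @ w > 0 else -1         # same convention as step()
            if h != t_k:
                w += learning_rate * t_k * x_k   # move the boundary toward the correct side
                mistakes += 1
        if mistakes == 0:                        # a full pass without errors: converged
            break
    return w

def perceptron_predict01(w, X):
    """Hypothetical helper: predict {0, 1} labels for every row of X at once."""
    X_bias = np.c_[np.ones(X.shape[0]), X]
    return (X_bias @ w > 0).astype(int)

# Toy linearly separable data
X_toy = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]])
y_toy = np.array([1, 1, 0, 0])
w_toy = perceptron_fit_signed(X_toy, y_toy)
print(w_toy, perceptron_predict01(w_toy, X_toy))
```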

b. [10pts] Create a sample of $X$ in Question 1 whose $y \in \{0, 1\}$. Fit the sample data and find $w$ when the learning rate is 0.001 and the number of iterations is 1. The learning rate and the number of iterations can be tuned to increase performance if necessary.

```python
# Relabel: classes 0 and 1 -> 0, classes 2 and 3 -> 1
y_binary = np.where((y == 0) | (y == 1), 0, 1)

# One bias weight plus one weight per feature (5 features)
w_init_binary = np.zeros(6)

learning_rate_binary = 0.001
iterations_binary = 1

w_trained_binary = perceptron_fit(w_init_binary, X, y_binary, learning_rate_binary, iterations_binary)

w_trained_binary
```

```
array([ 0.036     , -0.00103401,  0.0073094 ,  0.00041277, -0.01454153,
        0.00130611])
```
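The weights start at zero, and every update is the learning rate times a quantity that does not depend on it. As a result, the learning rate in `perceptron_fit` only rescales $w$: the sequence of predictions, and hence the decision boundary, is the same for any positive learning rate.

The sketch below checks this numerically on made-up data. It uses a compact re-implementation of the same loop; `fit` is a hypothetical name.

```python
import numpy as np

def fit(X, y, learning_rate, iterations):
    """Hypothetical re-implementation of the perceptron_fit update, starting from w = 0."""
    X_bias = np.c_[np.ones(X.shape[0]), X]
    w = np.zeros(X_bias.shape[1])
    for _ in range(iterations):
        for x_k, y_k in zip(X_bias, y):
            h = 1 if x_k @ w > 0 else -1
            if h != y_k:
                w += learning_rate * (y_k - h) * x_k
    return w

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(40, 3))
y_demo = np.where(X_demo[:, 0] + 0.5 * X_demo[:, 1] > 0, 1, 0)   # {0, 1} targets as in 2(b)

w_small = fit(X_demo, y_demo, learning_rate=0.001, iterations=5)
w_large = fit(X_demo, y_demo, learning_rate=1.0, iterations=5)

# Dividing by the learning rate recovers the same weight vector
print(np.allclose(w_small / 0.001, w_large))
```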

c. [5pts] Use the final $w$ from 2(b) to classify all observations in $X$. Measure the performance and explain the success. Discuss what other tests can be done in order to improve the classification for $X$.

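```python
def classify(X, w):
    # Apply perceptron_predict to each observation (each prediction is +1 or -1)
    predictions = []
    for i in range(X.shape[0]):
        prediction = perceptron_predict(w, X[i])
        predictions.append(prediction)
    return np.array(predictions)

y_pred_binary = classify(X, w_trained_binary)

# Map the {-1, +1} outputs back to the {0, 1} labels of y_binary
y_pred_binary = np.where(y_pred_binary > 0, 1, 0)

# Accuracy on the training data
accuracy_binary = accuracy_score(y_binary, y_pred_binary)

print(accuracy_binary)
```

```
0.5525
```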
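To judge the 55.25% figure, it helps to compare it against a trivial baseline. The sketch below uses scikit-learn's `DummyClassifier`, which ignores the features and always predicts the most frequent label. It rebuilds `y_binary` exactly as in 2(b).

```python
import numpy as np
from sklearn import datasets
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score

X, y = datasets.make_blobs(n_samples=400, n_features=5, centers=4,
                           cluster_std=2, random_state=100)
y_binary = np.where((y == 0) | (y == 1), 0, 1)   # classes {0, 1} -> 0, {2, 3} -> 1

baseline = DummyClassifier(strategy="most_frequent").fit(X, y_binary)
baseline_accuracy = accuracy_score(y_binary, baseline.predict(X))
print(np.bincount(y_binary), baseline_accuracy)
```

Since the two labels are balanced (200 each), the baseline scores exactly 50%.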


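On the training data, the perceptron reaches an accuracy of 55.25%. Because `y_binary` contains 200 observations of each label, always predicting one label would already score 50%, so the model is only marginally better than chance.

A likely contributor is the label encoding. `step` returns $\pm 1$ while `y_binary` uses $\{0, 1\}$, so `perceptron_fit` treats every observation with $y = 0$ as misclassified. When the model correctly outputs $-1$ for such a point, the update still pushes $w$ toward predicting $+1$. Re-encoding the targets as $\{-1, +1\}$ before fitting is the first fix to try.

Beyond that, performance may improve by increasing the number of iterations to allow convergence and by standardizing the features so that differences in magnitude do not dominate the updates. Tuning the learning rate alone will not help here: with $w$ initialized to zero, it only rescales $w$ and leaves the predictions unchanged.

If the classes are not linearly separable, non-linear models such as kernel Support Vector Machines (SVMs) or neural networks could capture more complex decision boundaries. Finally, cross-validation can be used to tune hyperparameters and to estimate accuracy on held-out data rather than on the training set.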
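Several of the suggestions above (feature standardization, more passes over the data, cross-validation) can be tried together with scikit-learn. This sketch swaps in `sklearn.linear_model.Perceptron` rather than the hand-written `perceptron_fit`; it accepts $\{0, 1\}$ targets directly. The settings `max_iter=100` and 5-fold cross-validation are arbitrary choices, and no accuracy figure is claimed here.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import Perceptron
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = datasets.make_blobs(n_samples=400, n_features=5, centers=4,
                           cluster_std=2, random_state=100)
y_binary = np.where((y == 0) | (y == 1), 0, 1)

# Standardize each feature, then fit a perceptron for up to 100 epochs
model = make_pipeline(StandardScaler(), Perceptron(max_iter=100, random_state=0))

# Accuracy on each of 5 held-out folds (stratified for classifiers)
scores = cross_val_score(model, X, y_binary, cv=5, scoring="accuracy")
print(scores.round(3), scores.mean().round(3))
```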