### Quadratic Discriminant Analysis (QDA) Classifier
1. Get the number of classes and initialize an array of zeros to store the class priors. [2 points]

2. Loop through classes and compute respective means, covariances and priors for each class. [3 points]
The mean vector of each class is the average of all data points in that class:
$$ \text{mean of class k} = \mu_k = \frac{1}{n_k} \sum_{i:y_i=k} x_i $$
The covariance matrix of class k, which has $n_k$ samples, is the unbiased sample covariance:
$$ \text{covariance matrix of class k} = \Sigma_k = \frac{1}{n_k-1}\sum_{i:y_i=k} (x_i - \mu_k)(x_i - \mu_k)^T $$
The prior probability of class k is the fraction of the $n$ training points that belong to class k:
$$ \text{prior of class k} = \pi_k = \frac{n_k}{n} $$

3. Predict the class of the test data using the three quantities computed above.
Use equation (4.28) in the textbook [5 points]: $$\delta_k(x) =  -\frac{1}{2}(x-\mu_k)^T\Sigma_k^{-1}(x-\mu_k)-\frac{1}{2}\log|\Sigma_k|+\log(\pi_k) $$
Hint: The formula splits into 3 terms: the quadratic term, the log-determinant term and the prior term. Each test point is assigned to the class with the largest $\delta_k(x)$.

4. Verify the accuracy and correctness of your classifier using the Breast Cancer dataset.
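
To make the steps above concrete, here is a minimal sketch on a hypothetical six-point toy dataset (the data and the helper names `class_estimates` and `discriminant` are illustrative assumptions, not part of the assignment). It computes the three per-class estimates by hand and then evaluates the discriminant as the sum of its three terms.

```python
import numpy as np

# Hypothetical toy data: 6 points in 2-D, two classes (not from the assignment).
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 3.0],
              [6.0, 5.0], [7.0, 8.0], [8.0, 6.0]])
y = np.array([0, 0, 0, 1, 1, 1])

def class_estimates(X, y, k):
    """Return (mu_k, Sigma_k, pi_k) for class k."""
    X_k = X[y == k]
    mu_k = X_k.mean(axis=0)                            # class mean
    centered = X_k - mu_k
    sigma_k = centered.T @ centered / (len(X_k) - 1)   # unbiased covariance
    pi_k = len(X_k) / len(X)                           # class frequency
    return mu_k, sigma_k, pi_k

def discriminant(x, mu_k, sigma_k, pi_k):
    """Evaluate delta_k(x) as quadratic term + log-determinant term + prior term."""
    diff = x - mu_k
    quadratic_term = -0.5 * diff @ np.linalg.inv(sigma_k) @ diff
    log_det_term = -0.5 * np.log(np.linalg.det(sigma_k))
    prior_term = np.log(pi_k)
    return quadratic_term + log_det_term + prior_term

mu_0, sigma_0, pi_0 = class_estimates(X, y, 0)

# Score a new point against both classes and pick the larger discriminant.
x_new = np.array([2.0, 2.0])
scores = [discriminant(x_new, *class_estimates(X, y, k)) for k in (0, 1)]
predicted = int(np.argmax(scores))
```

The hand-computed covariance should agree with `np.cov(X[y == 0], rowvar=False)`, since `np.cov` also divides by $n_k - 1$ by default.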

In [45]:
import numpy as np

class QDA:
    def fit(self, X, y):

        # Get all unique classes, use np.unique [1 point]
        'YOUR CODE'
        self.classes = np.unique(y)

        # Initialize an array of zeros to store prior probabilities for each class [1 point]
        'YOUR CODE'
        self.class_priors = np.zeros(len(self.classes))

        # Initialize lists to store mean and covariance for each class
        self.class_means = []
        self.class_covariances = []

        # Loop through all the classes
        for i, c in enumerate(self.classes):
            # Get all samples of current class
            X_c = X[y == c]

            # Compute the prior of the current class [1 point]
            'YOUR CODE'
            # Index by position i rather than label c, so labels need not be 0..K-1
            self.class_priors[i] = len(X_c) / len(X)

            # Compute the mean of the current class, use np.mean [1 point]
            'YOUR CODE'
            mean_c = np.mean(X_c, axis=0)
            
            # Compute the covariance matrix of the current class, use np.cov [1 point]
            'YOUR CODE'
            covariance_c = np.cov(X_c, rowvar=False)

            # Add small identity matrix to the covariance matrix to make it non-singular
            covariance_c += np.eye(len(mean_c)) * 1e-8

            self.class_means.append(mean_c)
            self.class_covariances.append(covariance_c)

    def predict(self, X):
        predictions = []
        for x in X:
            scores = []

            # Compute the discriminant score of each class, refer to equation (4.28) in the textbook
            for i, c in enumerate(self.classes):
                # compute the difference between the current sample and the mean of the current class [1 point]
                'YOUR CODE'
                diff = x - self.class_means[i]

                # compute the inverse of the covariance matrix of the current class, use np.linalg.inv() [1 point]
                'YOUR CODE'
                covariance_inv = np.linalg.inv(self.class_covariances[i])

                # compute the quadratic term [1 point]
                'YOUR CODE'
                quadratic_term = -0.5 * diff @ covariance_inv @ diff

                # compute the log determinant term of the current class, use np.linalg.det() [1 point]
                'YOUR CODE'
                log_det_term = -0.5 * np.log(np.linalg.det(self.class_covariances[i]))

                # compute the log prior of the current class, use np.log() [1 point]
                'YOUR CODE'
                log_prior = np.log(self.class_priors[i])

                # Sum the terms to get the discriminant score
                # (the log posterior of the current class, up to an additive constant)
                score = quadratic_term + log_det_term + log_prior

                # Add the score to the list of scores
                scores.append(score)

            # Classify as the class with the highest score;
            # map the argmax index back to the actual class label
            predicted_class = self.classes[np.argmax(scores)]

            predictions.append(predicted_class)

        return np.array(predictions)
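
The assignment asks for `np.linalg.det()` in the log-determinant term, and that works on this dataset. As a side note, the determinant of a high-dimensional covariance matrix with small variances can underflow to zero, which makes its logarithm `-inf`. A minimal sketch of the more stable alternative, `np.linalg.slogdet`, on a hypothetical 400-dimensional diagonal covariance (the matrix is an illustrative assumption, not taken from the assignment):

```python
import numpy as np

# Hypothetical covariance: 400 independent features, each with variance 0.01.
d = 400
sigma = np.eye(d) * 1e-2

# The true determinant is 1e-800, far below the smallest float64, so it underflows.
det = np.linalg.det(sigma)

# slogdet returns the sign and log|det| separately and never forms the determinant itself.
sign, logabsdet = np.linalg.slogdet(sigma)
log_det_term = -0.5 * logabsdet
```

For a valid covariance matrix the returned sign is positive, so `logabsdet` can be used directly in place of `np.log(np.linalg.det(...))`.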



In [47]:
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

data = load_breast_cancer()
X = data.data
y = data.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=12)

qda = QDA()
qda.fit(X_train, y_train)
# Make predictions on the test set
y_pred = qda.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

Accuracy: 0.9415204678362573
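
One way to check correctness, and not only accuracy, is to compare against a reference implementation on the same split. The sketch below fits scikit-learn's `QuadraticDiscriminantAnalysis` with its default settings; the variable names `reference` and `reference_accuracy` are illustrative. Small differences from the hand-written classifier are plausible, since scikit-learn does not add the same `1e-8` ridge to the covariance matrices.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Same data and split as the cell above.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=12)

# Reference QDA with default parameters.
reference = QuadraticDiscriminantAnalysis()
reference.fit(X_train, y_train)

reference_accuracy = accuracy_score(y_test, reference.predict(X_test))
print("Reference accuracy:", reference_accuracy)
```

If the two accuracies differ substantially, comparing the per-sample predictions of both models is a quick way to locate the disagreement.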
