# Tier 2. Module 2: Numerical Programming in Python

## Topic 8 - LDA, QDA algorithms in classification problems

## Homework

Implement the QDA method from scratch and compare it with the implementation in the sklearn library.

This will help consolidate the following skills:

* matrix operations that underlie software implementations of the methods covered;
* the specifics of the QDA method.

### Task

#### 1 - Load the Iris dataset of the sklearn library

In [20]:
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data
y = iris.target
print(X.shape)
print(y.shape)

(150, 4)
(150,)


#### 2 - Split the data into training and test sets

In [21]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42, stratify=y)
print(X_train.shape)
print(X_test.shape)

(112, 4)
(38, 4)


#### 3 - Select the training feature samples separately for each class

In [24]:
import numpy as np

class_labels = np.unique(y_train)
print(class_labels)

X_train_class = {label: X_train[y_train == label] for label in class_labels}
print(len(X_train_class[0]))
print(len(X_train_class[1]))
print(len(X_train_class[2]))

[0 1 2]
38
37
37


#### 4 - Calculate the covariance matrix of the features for each class

In [29]:
cov_matrices = {label: np.cov(X_train_class[label], rowvar=False) for label in class_labels}

print("Covariance matrix of the class '0'")
print(cov_matrices[0])

print("\nCovariance matrix of the class '1'")
print(cov_matrices[1])

print("\nCovariance matrix of the class '2'")
print(cov_matrices[2])

Covariance matrix of the class '0'
[[0.09348506 0.08594595 0.00719772 0.01295875]
 [0.08594595 0.13175676 0.01013514 0.01432432]
 [0.00719772 0.01013514 0.02316501 0.00495021]
 [0.01295875 0.01432432 0.00495021 0.01229018]]

Covariance matrix of the class '1'
[[0.25527027 0.07567568 0.16823574 0.0466967 ]
 [0.07567568 0.08585586 0.06156156 0.03035285]
 [0.16823574 0.06156156 0.21067568 0.06233483]
 [0.0466967  0.03035285 0.06233483 0.03487988]]

Covariance matrix of the class '2'
[[0.41900901 0.1048048  0.32501502 0.03981231]
 [0.1048048  0.11774775 0.09198949 0.04582583]
 [0.32501502 0.09198949 0.3240991  0.05598348]
 [0.03981231 0.04582583 0.05598348 0.07145646]]
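
To make explicit what `np.cov(..., rowvar=False)` computes, the sketch below rebuilds the same estimate by hand: center each feature column, then divide the scatter matrix by `n - 1`. The matrix `X_class` is synthetic stand-in data, not the Iris samples, and the variable names are chosen only for this illustration.

```python
import numpy as np

# Synthetic stand-in for one class's feature matrix
# (rows = samples, columns = features); not the Iris data.
rng = np.random.default_rng(0)
X_class = rng.normal(size=(37, 4))

# Center each feature column, then form the unbiased estimate
# with the n - 1 denominator.
centered = X_class - X_class.mean(axis=0)
manual_cov = centered.T @ centered / (X_class.shape[0] - 1)

# rowvar=False tells np.cov that columns are the variables;
# its default normalization is also n - 1.
numpy_cov = np.cov(X_class, rowvar=False)

print(np.allclose(manual_cov, numpy_cov))
```

Without `rowvar=False`, `np.cov` would treat each of the 37 rows as a variable and return a 37x37 matrix instead of a 4x4 one.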


#### 5 - Calculate inverse covariance matrices

In [32]:
from numpy.linalg import inv

inv_cov_matrices = {label: inv(cov_matrices[label]) for label in class_labels}

print("Inverse covariance matrix of the class '0'")
print(inv_cov_matrices[0])

print("\nInverse covariance matrix of the class '1'")
print(inv_cov_matrices[1])

print("\nInverse covariance matrix of the class '2'")
print(inv_cov_matrices[2])

Inverse covariance matrix of the class '0'
[[ 27.63929    -17.0526898    0.93375397  -9.64384368]
 [-17.0526898   19.28057302  -2.38236505  -3.53179813]
 [  0.93375397  -2.38236505  47.63805004 -17.39542786]
 [ -9.64384368  -3.53179813 -17.39542786 102.65702686]]

Inverse covariance matrix of the class '1'
[[  9.40953681  -4.59920671  -7.69691249   5.16029664]
 [ -4.59920671  19.22502389   2.51081321 -15.05961859]
 [ -7.69691249   2.51081321  16.46122762 -21.29875055]
 [  5.16029664 -15.05961859 -21.29875055  72.92994376]]

Inverse covariance matrix of the class '2'
[[ 12.00062655  -3.1843019  -11.94509098   4.71445678]
 [ -3.1843019   13.80939062   0.57486952  -7.53235652]
 [-11.94509098   0.57486952  15.97750702  -6.2311889 ]
 [  4.71445678  -7.53235652  -6.2311889   21.08034414]]


Verify the inverse covariance matrix: its product with the original matrix should be the identity matrix up to floating-point error.

In [34]:
print(cov_matrices[0] @ inv_cov_matrices[0])

[[ 1.00000000e+00 -8.57137666e-17  3.01320008e-18  4.37365746e-17]
 [-5.82137581e-17  1.00000000e+00  1.29591991e-17 -2.24630199e-17]
 [-1.49504280e-17 -3.90462002e-18  1.00000000e+00  5.60190537e-17]
 [ 1.91139687e-17 -3.66254703e-17  1.46373880e-17  1.00000000e+00]]
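
Reading the product by eye works for a 4x4 matrix, but the same check can be done programmatically. The sketch below, on synthetic data rather than the Iris covariance matrices, compares the product with the identity using `np.allclose` and also reports the condition number, which indicates how much rounding error the inversion may amplify.

```python
import numpy as np

# Synthetic covariance matrix used only for illustration.
rng = np.random.default_rng(1)
samples = rng.normal(size=(40, 4))
cov = np.cov(samples, rowvar=False)
inv_cov = np.linalg.inv(cov)

# The product should equal the identity up to floating-point error.
identity_ok = np.allclose(cov @ inv_cov, np.eye(cov.shape[0]))

# A large condition number means the inverse is numerically fragile.
condition_number = np.linalg.cond(cov)

print(identity_ok, condition_number)
```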


#### 6 - Calculate the a priori probabilities of each class in the training data

In [35]:
priors = {label: X_train_class[label].shape[0] / X_train.shape[0] for label in class_labels}

print("A priori probability of the class '0'")
print(priors[0])

print("\nA priori probability of the class '1'")
print(priors[1])

print("\nA priori probability of the class '2'")
print(priors[2])

A priori probability of the class '0'
0.3392857142857143

A priori probability of the class '1'
0.33035714285714285

A priori probability of the class '2'
0.33035714285714285
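
The same priors can be obtained directly from the label vector, without going through the per-class feature dictionary. The sketch below uses a hypothetical label array, `labels`, built to have the same class counts as the training split above (38, 37, 37).

```python
import numpy as np

# Hypothetical label vector with the class counts of the training split.
labels = np.repeat([0, 1, 2], [38, 37, 37])

# Count the occurrences of each class and normalize by the sample count.
classes, counts = np.unique(labels, return_counts=True)
priors_sketch = dict(zip(classes.tolist(), (counts / counts.sum()).tolist()))

print(priors_sketch)
```

Because the split was stratified, the priors are close to 1/3 each and contribute almost the same constant to every class score.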


#### 7 - Implement a function that computes the value of the discriminant function for one row (vector) of test data

In [37]:
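def discriminant_function(x, mean, inv_cov, prior):
    diff = x - mean
    # log|inv_cov| = -log|cov|, so 0.5 * log_det_inv is the -0.5 * log|cov| term of QDA.
    _, log_det_inv = np.linalg.slogdet(inv_cov)
    # Quadratic (Mahalanobis) term plus the log prior.
    return 0.5 * log_det_inv - 0.5 * diff @ inv_cov @ diff + np.log(prior)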
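
For reference, the QDA discriminant score for class k under the usual Gaussian class-conditional assumption can be written as below. The notation is an assumption of this note: `\mu_k`, `\Sigma_k` and `\pi_k` stand for the class mean, class covariance matrix and class prior (`means[label]`, `cov_matrices[label]` and `priors[label]` in the code).

```latex
\delta_k(x) = -\tfrac{1}{2}\ln\lvert\Sigma_k\rvert
              - \tfrac{1}{2}\,(x-\mu_k)^{\top}\Sigma_k^{-1}(x-\mu_k)
              + \ln\pi_k ,
\qquad
\hat{y}(x) = \arg\max_k \, \delta_k(x)
```

Because each class has its own covariance matrix, the log-determinant term differs between classes and does not cancel in the argmax, unlike in LDA, where the covariance matrix is shared.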

#### 8 - Implement a function that computes the values of the discriminant function and the probabilities of belonging to each class for the entire matrix of test data

In [38]:
def predict_discriminant(X_test, means, inv_cov_matrices, priors):
    # Take the class labels from the keys of `means` instead of a global variable.
    labels = np.array(list(means))
    discriminants = np.zeros((X_test.shape[0], len(labels)))

    for i, label in enumerate(labels):
        for j, x in enumerate(X_test):
            discriminants[j, i] = discriminant_function(
                x, means[label], inv_cov_matrices[label], priors[label]
            )

    # Predict the label of the class with the highest discriminant score
    predictions = labels[np.argmax(discriminants, axis=1)]
    return predictions
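
The function above returns only class labels. One possible way to also obtain class membership probabilities is to apply a row-wise softmax to the discriminant scores, since each score is the log of prior times Gaussian density up to a constant shared by all classes. The sketch below is a vectorized illustration under that assumption; `qda_scores`, `posterior_probabilities` and the two-class demo parameters are hypothetical names and values, not part of the homework code.

```python
import numpy as np


def qda_scores(X, means, covs, priors):
    """Return an (n_samples, n_classes) array of QDA discriminant scores."""
    scores = np.empty((X.shape[0], len(means)))
    for k, (mean, cov, prior) in enumerate(zip(means, covs, priors)):
        diff = X - mean
        # Squared Mahalanobis distance of every row, via a linear solve
        # instead of an explicit matrix inverse.
        maha = np.einsum("ij,ij->i", diff, np.linalg.solve(cov, diff.T).T)
        _, logdet = np.linalg.slogdet(cov)
        scores[:, k] = -0.5 * logdet - 0.5 * maha + np.log(prior)
    return scores


def posterior_probabilities(scores):
    """Normalize scores row-wise with a numerically stable softmax."""
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum(axis=1, keepdims=True)


# Hypothetical two-class, two-feature parameters, used only for illustration.
demo_means = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
demo_covs = [np.eye(2), np.array([[2.0, 0.5], [0.5, 1.0]])]
demo_priors = [0.5, 0.5]
X_demo = np.array([[0.1, -0.2], [2.9, 3.1], [1.5, 1.5]])

proba = posterior_probabilities(qda_scores(X_demo, demo_means, demo_covs, demo_priors))
print(proba.round(3))
print(proba.argmax(axis=1))
```

Subtracting the row maximum before exponentiating avoids underflow when a sample lies far from every class mean.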

In [40]:
means = {label: np.mean(X_train_class[label], axis=0) for label in class_labels}
means

{0: array([4.99473684, 3.45      , 1.48157895, 0.24736842]),
 1: array([5.9972973 , 2.74324324, 4.26486486, 1.31081081]),
 2: array([6.66486486, 2.99459459, 5.60810811, 2.04864865])}

In [41]:
custom_predictions = predict_discriminant(X_test, means, inv_cov_matrices, priors)
custom_predictions

array([0, 1, 1, 1, 0, 1, 2, 2, 2, 2, 2, 2, 1, 1, 0, 0, 0, 1, 0, 1, 2, 1,
       2, 1, 2, 1, 0, 2, 0, 1, 2, 2, 0, 0, 0, 0, 2, 1], dtype=int64)

#### 9 - Make predictions on the test data using the `QuadraticDiscriminantAnalysis` class of the sklearn library and compare the results

In [42]:
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

qda = QuadraticDiscriminantAnalysis()
qda.fit(X_train, y_train)
sklearn_predictions = qda.predict(X_test)
sklearn_predictions

array([0, 1, 1, 1, 0, 1, 2, 2, 2, 2, 2, 2, 1, 1, 0, 0, 0, 1, 0, 1, 2, 1,
       2, 1, 2, 1, 0, 2, 0, 1, 2, 2, 0, 0, 0, 0, 2, 1])
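
Comparing two printed arrays by eye is error-prone, so the agreement can also be measured in code. The following self-contained sketch repeats the pipeline in compact form, with the full QDA score including the log-determinant term, and reports the fraction of test samples on which a manual implementation and sklearn agree. The names `manual_pred`, `sk_pred` and `agreement` are introduced only for this illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Manual QDA scores: one column per class.
classes = np.unique(y_tr)
scores = np.empty((X_te.shape[0], classes.size))
for k, label in enumerate(classes):
    X_k = X_tr[y_tr == label]
    cov = np.cov(X_k, rowvar=False)
    diff = X_te - X_k.mean(axis=0)
    # Squared Mahalanobis distance of every test row to the class mean.
    maha = np.einsum("ij,ij->i", diff, np.linalg.solve(cov, diff.T).T)
    log_prior = np.log(X_k.shape[0] / X_tr.shape[0])
    scores[:, k] = -0.5 * np.linalg.slogdet(cov)[1] - 0.5 * maha + log_prior
manual_pred = classes[scores.argmax(axis=1)]

# Reference predictions from sklearn with default settings.
sk_pred = QuadraticDiscriminantAnalysis().fit(X_tr, y_tr).predict(X_te)

agreement = np.mean(manual_pred == sk_pred)
print(f"agreement: {agreement:.3f}")
```

Agreement on labels is a fairly weak check for a dataset this small; comparing the class probabilities from `predict_proba` with those of the manual implementation would be a stricter test.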

Classification report for the custom QDA algorithm

In [45]:
from sklearn.metrics import classification_report

custom_report = classification_report(y_test, custom_predictions)
print(custom_report)

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        12
           1       1.00      1.00      1.00        13
           2       1.00      1.00      1.00        13

    accuracy                           1.00        38
   macro avg       1.00      1.00      1.00        38
weighted avg       1.00      1.00      1.00        38



Classification report for the sklearn QDA algorithm

In [46]:
sklearn_report = classification_report(y_test, sklearn_predictions)
print(sklearn_report)

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        12
           1       1.00      1.00      1.00        13
           2       1.00      1.00      1.00        13

    accuracy                           1.00        38
   macro avg       1.00      1.00      1.00        38
weighted avg       1.00      1.00      1.00        38



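#### 10 - Draw a conclusion about the degree of similarity between the results of the custom implementation and the `sklearn` library

Both QDA implementations (the custom one and the one provided by sklearn) produce identical predictions on the test set and classify all 38 test samples correctly: precision, recall, and F1-score are 1.00 for each of the three Iris classes. With only 38 test samples, this shows that the two implementations agree on this particular split; it does not by itself show that they are equivalent in general.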