# Module 2 - Activity

During this module we discussed the covariance matrix, a square matrix whose diagonal entries are the variances of each variable and whose off-diagonal entries are the covariances between pairs of variables.

- Reference the lecture notes for more details on the covariance matrix.
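
For reference, one common way to write the sample covariance matrix is given below. The notation is an assumption of this sketch rather than something taken from the lecture notes: $X$ is a $p \times N$ matrix with one variable per row and one observation per column, $\bar{x}$ is the column vector of row means, and $\mathbf{1}$ is a vector of $N$ ones.

$$
C = \frac{1}{N-1}\,(X - \bar{x}\mathbf{1}^{\top})(X - \bar{x}\mathbf{1}^{\top})^{\top},
\qquad
C_{ij} = \frac{1}{N-1}\sum_{k=1}^{N}(X_{ik}-\bar{x}_i)(X_{jk}-\bar{x}_j)
$$

The diagonal entry $C_{ii}$ is the sample variance of variable $i$, and $C_{ij}$ for $i \neq j$ is the sample covariance between variables $i$ and $j$.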

In this activity you will create a covariance matrix for the following data set without using any built-in method.

In [84]:
from sklearn.datasets import load_iris

# Load the Iris dataset
iris = load_iris()

# iris is a Bunch object, similar to a dictionary, containing data and metadata
# The features (measurements) of the Iris dataset are stored in 'data'
iris_data = iris.data

# The labels (species of each instance) are stored in 'target'
iris_labels = iris.target

# The names of the features and labels are also stored
feature_names = iris.feature_names
label_names = iris.target_names

# To see the shape of the dataset
print("Data shape:", iris_data.shape)  # e.g., (150, 4)
print("Labels shape:", iris_labels.shape)  # e.g., (150,)

# If you want to see the first few entries
print("First 5 rows of data:\n", iris_data[:5])
print("First 5 labels:", iris_labels[:5])
print(iris_data[:5].T)


Data shape: (150, 4)
Labels shape: (150,)
First 5 rows of data:
 [[5.1 3.5 1.4 0.2]
 [4.9 3.  1.4 0.2]
 [4.7 3.2 1.3 0.2]
 [4.6 3.1 1.5 0.2]
 [5.  3.6 1.4 0.2]]
First 5 labels: [0 0 0 0 0]
[[5.1 4.9 4.7 4.6 5. ]
 [3.5 3.  3.2 3.1 3.6]
 [1.4 1.4 1.3 1.5 1.4]
 [0.2 0.2 0.2 0.2 0.2]]


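Let's create a function that computes the covariance matrix of a given matrix. It follows the same convention as the default of `np.cov`: each row is a variable and each column is an observation.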

In [85]:
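# Implementation of creating the covariance matrix
import numpy as np

def create_covariance_matrix(mat):
    # Rows are variables, columns are observations (same as np.cov's default)
    N = mat.shape[1]  # number of observations
    # Mean of each row, kept as a column vector so it broadcasts across columns
    mat_mean = np.mean(mat, axis=1, keepdims=True)
    mean_centered = mat - mat_mean
    # Sample covariance: divide by N - 1 (Bessel's correction)
    return mean_centered @ mean_centered.T / (N - 1)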
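
To make explicit what the matrix product is doing, here is a minimal sketch that fills in the covariance matrix one entry at a time. The function name `covariance_matrix_loops` and the small example array are hypothetical and used only for illustration. The sketch keeps the same convention of rows as variables.

```python
import numpy as np

def covariance_matrix_loops(mat):
    """Sample covariance with rows as variables, computed entry by entry."""
    n_vars, n_obs = mat.shape
    # Mean of each variable (row)
    means = [sum(row) / n_obs for row in mat]
    cov = np.zeros((n_vars, n_vars))
    for i in range(n_vars):
        for j in range(n_vars):
            # Sum of products of deviations from the mean for variables i and j
            total = 0.0
            for k in range(n_obs):
                total += (mat[i, k] - means[i]) * (mat[j, k] - means[j])
            cov[i, j] = total / (n_obs - 1)
    return cov

# Tiny hand-checkable example: the second variable is twice the first
small = np.array([[1.0, 2.0, 3.0],
                  [2.0, 4.0, 6.0]])
small_cov = covariance_matrix_loops(small)
print(small_cov)
```

For this example the variances are 1 and 4 and the covariance is 2, which can be verified by hand from the deviations `[-1, 0, 1]` and `[-2, 0, 2]`. The vectorized version is preferable in practice, since the triple loop is slow for large inputs.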

In [86]:
our_cov = create_covariance_matrix(iris_data)

Let's check that `our_cov` matches the result of numpy's `np.cov`, the predefined method for calculating a covariance matrix. Both treat each row as a variable, so for the `(150, 4)` array the result is a 150 x 150 matrix.

In [87]:
our_cov - np.cov(iris_data)

array([[0.0000000e+00, 0.0000000e+00, 0.0000000e+00, ..., 0.0000000e+00,
        4.4408921e-16, 4.4408921e-16],
       [0.0000000e+00, 0.0000000e+00, 0.0000000e+00, ..., 0.0000000e+00,
        0.0000000e+00, 0.0000000e+00],
       [0.0000000e+00, 0.0000000e+00, 0.0000000e+00, ..., 0.0000000e+00,
        4.4408921e-16, 0.0000000e+00],
       ...,
       [0.0000000e+00, 0.0000000e+00, 0.0000000e+00, ..., 0.0000000e+00,
        0.0000000e+00, 4.4408921e-16],
       [4.4408921e-16, 0.0000000e+00, 4.4408921e-16, ..., 0.0000000e+00,
        4.4408921e-16, 0.0000000e+00],
       [4.4408921e-16, 0.0000000e+00, 0.0000000e+00, ..., 4.4408921e-16,
        0.0000000e+00, 0.0000000e+00]])
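
Rather than inspecting the difference by eye, the comparison can also be done programmatically. The sketch below uses `np.allclose`, whose default absolute tolerance is 1e-8. To keep it self-contained it uses randomly generated stand-in data with the same shape as `iris_data`, which is an assumption of this example; the names `data`, `manual_cov`, and `reference` are likewise illustrative.

```python
import numpy as np

# Stand-in data with the same shape as iris_data (150, 4); not the real Iris values
rng = np.random.default_rng(0)
data = rng.normal(size=(150, 4))

# Same computation as create_covariance_matrix: rows are treated as variables
centered = data - data.mean(axis=1, keepdims=True)
manual_cov = centered @ centered.T / (data.shape[1] - 1)

reference = np.cov(data)

# Largest absolute entry-wise difference, and a tolerance-based equality check
max_abs_diff = np.max(np.abs(manual_cov - reference))
matches = np.allclose(manual_cov, reference)
print("Max abs difference:", max_abs_diff)
print("Matches np.cov:", matches)
```

A single boolean and a maximum difference are easier to read than a large array of near-zero values.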

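Looks like `our_cov` generated from `create_covariance_matrix` is the same as `np.cov` within numerical precision: the largest differences are on the order of 1e-16, which is floating-point round-off and far below a tolerance such as 1e-8. So, we have a working implementation for creating the covariance matrix.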
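
The comparison above works on the 150 x 150 matrix over samples, because rows are treated as variables. If the 4 x 4 covariance between the features is wanted instead, one option is to transpose the data first so that each feature becomes a row. The sketch below illustrates this with stand-in random data of the same shape as `iris_data` (an assumption made to keep the example self-contained); `feature_cov` is an illustrative name.

```python
import numpy as np

# Stand-in data with the same shape as iris_data (150, 4); not the real Iris values
rng = np.random.default_rng(0)
data = rng.normal(size=(150, 4))

# Transpose so that each of the 4 features is a row and each sample is a column
features_as_rows = data.T  # shape (4, 150)

# Same steps as create_covariance_matrix, applied to the transposed data
centered = features_as_rows - features_as_rows.mean(axis=1, keepdims=True)
feature_cov = centered @ centered.T / (features_as_rows.shape[1] - 1)

print(feature_cov.shape)

# np.cov gives the same result when told that columns are the variables
print(np.allclose(feature_cov, np.cov(data, rowvar=False)))
```

With the notebook's own objects, the equivalent calls would be `create_covariance_matrix(iris_data.T)` and `np.cov(iris_data, rowvar=False)`.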