# Covariance matrix vs Scale matrix

In statistics, the covariance matrix (also known as the variance-covariance matrix or dispersion matrix) of a set of random variables holds the variance of each variable on its diagonal and the covariance between each pair of variables in its off-diagonal elements. The scale matrix is a closely related concept. In this notebook it is estimated from the same centered data but normalized by `num_samples` rather than `num_samples - 1`, so the two estimates differ only by a constant factor.

The `covariance_matrix` function calculates the sample covariance matrix. We first center the data by subtracting each column's mean from that column. We then multiply the transpose of the centered data by the centered data and divide by `(num_samples - 1)`, which gives the unbiased estimator.
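As a small worked check of these steps, the following sketch applies them to a tiny made-up 3×2 dataset (the values are illustrative and not taken from this notebook) and compares the result with `np.cov`, which uses the same `N - 1` normalization by default.

```python
import numpy as np

# Hypothetical toy data: 3 samples (rows), 2 features (columns)
toy = np.array([[1.0, 2.0],
                [3.0, 6.0],
                [5.0, 10.0]])

# Step 1: center each column by subtracting its mean
centered = toy - toy.mean(axis=0)

# Step 2: multiply the transpose by the centered data, divide by N - 1
manual_cov = centered.T @ centered / (toy.shape[0] - 1)

print(manual_cov)
# [[ 4.  8.]
#  [ 8. 16.]]

# np.cov with rowvar=False uses the same N - 1 normalization by default
print(np.allclose(manual_cov, np.cov(toy, rowvar=False)))  # True
```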

The `scale_matrix` function calculates the scale matrix with NumPy's `np.cov`. Passing `rowvar=False` treats each column as a variable, and `bias=True` normalizes by `num_samples` instead of `num_samples - 1`.

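Finally, we print both matrices and compare them with `np.allclose`, which tests equality within a tolerance rather than exactly, because floating-point rounding can make mathematically equal results differ in their last digits. Here the check reports `False` for a different reason: the two functions use different normalizations, so every entry of the scale matrix is `(num_samples - 1) / num_samples` times the corresponding covariance entry. With 1000 samples that is a relative difference of about 0.1%, far larger than the tolerances used.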

In [4]:
import numpy as np

In [7]:
def generate_data(num_samples, num_features):
    # Draw samples from a standard normal distribution (mean 0, variance 1)
    return np.random.randn(num_samples, num_features)

def covariance_matrix(data):
    # Center each column, then form X_c^T X_c / (N - 1): the unbiased sample covariance
    mean_centered_data = data - np.mean(data, axis=0)
    return np.dot(mean_centered_data.T, mean_centered_data) / (data.shape[0] - 1)

def scale_matrix(data):
    # Scale matrix as used in this notebook: np.cov with bias=True,
    # i.e. normalized by N instead of N - 1
    return np.cov(data, rowvar=False, bias=True)

def main():
    # Define the dimensions of the data
    num_samples = 1000
    num_features = 5

    # Generate random data
    data = generate_data(num_samples, num_features)

    # Calculate the covariance matrix and scale matrix
    cov_matrix = covariance_matrix(data)
    scale_matrix_result = scale_matrix(data)

    # Output the covariance matrix and scale matrix
    print("Covariance Matrix:")
    print(cov_matrix)

    print("\nScale Matrix (Variance-Covariance Matrix):")
    print(scale_matrix_result)

    # Check whether both matrices are equal within a tolerance (these are np.allclose's defaults).
    # Try changing rtol and atol to see how the result changes.
    matrices_equal = np.allclose(cov_matrix, scale_matrix_result, rtol=1e-05, atol=1e-08)
    print("\nAre the Covariance Matrix and Scale Matrix equal?", matrices_equal)

if __name__ == "__main__":
    main()

Covariance Matrix:
[[ 1.07152674 -0.01672361  0.04491799  0.03792043  0.04564139]
 [-0.01672361  1.01288579  0.00360372  0.00975283  0.06587266]
 [ 0.04491799  0.00360372  1.00234635 -0.03316989 -0.00862546]
 [ 0.03792043  0.00975283 -0.03316989  0.9391796  -0.01612904]
 [ 0.04564139  0.06587266 -0.00862546 -0.01612904  0.94788399]]

Scale Matrix (Variance-Covariance Matrix):
[[ 1.07045521 -0.01670689  0.04487308  0.0378825   0.04559575]
 [-0.01670689  1.0118729   0.00360011  0.00974308  0.06580679]
 [ 0.04487308  0.00360011  1.00134401 -0.03313672 -0.00861683]
 [ 0.0378825   0.00974308 -0.03313672  0.93824042 -0.01611291]
 [ 0.04559575  0.06580679 -0.00861683 -0.01611291  0.94693611]]

Are the Covariance Matrix and Scale Matrix equal? False
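
The `False` result can be traced to normalization alone. The sketch below, which uses hypothetical variable names and an arbitrarily seeded generator that are not part of the original notebook, rescales the `N - 1` estimate by `(N - 1) / N` and checks that it then matches the `bias=True` estimate.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily for reproducibility
n = 1000
sample = rng.standard_normal((n, 5))

unbiased = np.cov(sample, rowvar=False)            # divides by n - 1
biased = np.cov(sample, rowvar=False, bias=True)   # divides by n

# Direct comparison fails: entries differ by a relative factor of 1/n
print(np.allclose(unbiased, biased))  # False

# After rescaling, the two agree up to floating-point rounding
print(np.allclose(unbiased * (n - 1) / n, biased))  # True
```

Alternatively, dropping `bias=True` from `scale_matrix` would make the two functions compute the same estimator.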
