# Mean/Covariance of a Dataset and the Effect of an Affine Transformation

In this notebook, we investigate how the mean and (co)variance of a dataset change when we apply an affine transformation to the dataset.

## 1. Mean and Covariance of a Dataset


In [None]:
import numpy as np


def mean_naive(X):
    """Compute the sample mean for a dataset by iterating over the dataset.
    
    Args:
        X: `ndarray` of shape (N, D) representing the dataset. N 
        is the size of the dataset and D is the dimensionality of the dataset.
    Returns:
        mean: `ndarray` of shape (D, ), the sample mean of the dataset `X`.
    """
    N, D = X.shape
    calc = np.zeros(D)
    for n in range(N):
        calc += X[n]
    mean = calc / N
    return mean

In [None]:
def mean(X):
    """Compute the sample mean for a dataset.
    
    Args:
        X: `ndarray` of shape (N, D) representing the dataset.
        N is the size of the dataset (the number of data points) 
        and D is the dimensionality of each data point.
    Returns:
        ndarray: ndarray with shape (D,), the sample mean of the dataset `X`.
    """
    mean = np.sum(X, axis=0) / X.shape[0]
    return mean

In [None]:
def cov_naive(X):
    """Compute the sample covariance for a dataset of shape (N, D),
    where N is the number of data points and D is the dimension."""
    N, D = X.shape
    calc = np.zeros((D, D))
    mean = mean_naive(X)
    for n in range(N):
        diff = X[n] - mean
        # The outer product of the centered point with itself is a (D, D) matrix.
        calc += np.outer(diff, diff)
    covariance = calc / N
    return covariance

In [None]:
def cov(X):
    """Compute the sample covariance for a dataset.
    
    Args:
        X: `ndarray` of shape (N, D) representing the dataset.
        N is the size of the dataset (the number of data points) 
        and D is the dimensionality of each data point.
    Returns:
        ndarray: ndarray with shape (D, D), the sample covariance of the dataset `X`.
    """
    
    # The covariance can be computed with a single matrix multiplication,
    # so we do not need to iterate over the dataset explicitly
    # (looping in Python tends to be slow).
    N = X.shape[0]
    X_centered = X - mean(X)  # broadcasting subtracts the mean from every row
    covariance_matrix = X_centered.T @ X_centered / N
    return covariance_matrix
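
As a quick sanity check, the formulas used in `mean` and `cov` can be compared against NumPy's built-ins. The following is a minimal sketch rather than part of the original notebook: the random dataset and the names `X_demo`, `mean_demo`, `cov_demo`, `mean_ref`, and `cov_ref` are illustrative assumptions, and the formulas are written inline so the block runs on its own. Note that `np.cov` divides by N - 1 by default, so `bias=True` is needed to match the divide-by-N convention used here.

```python
import numpy as np

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 3))  # hypothetical dataset: N=100 points, D=3

# Same formulas as mean(X) and cov(X), written inline (divide by N).
mean_demo = X_demo.sum(axis=0) / X_demo.shape[0]
centered = X_demo - mean_demo
cov_demo = centered.T @ centered / X_demo.shape[0]

# NumPy references: rowvar=False treats columns as variables,
# bias=True normalizes by N instead of N - 1.
mean_ref = np.mean(X_demo, axis=0)
cov_ref = np.cov(X_demo, rowvar=False, bias=True)

print(np.allclose(mean_demo, mean_ref))
print(np.allclose(cov_demo, cov_ref))
```

If either comparison fails on your own data, the most likely cause is a mismatch in the normalization (N versus N - 1) or in the orientation of the data matrix.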

## 2. Affine Transformation of a Dataset
We now verify a few properties of the mean and covariance of affinely transformed random variables.

Consider a data matrix $X$ of size (N, D). We would like to know what happens to the mean and covariance of the new dataset when we apply the affine transformation $Ax_i + b$ to each data point $x_i$ in $X$.
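
The expected result can be sketched with a short derivation. Here $\mu$ and $S$ denote the sample mean and covariance of $X$ (with the divide-by-N convention used in Section 1), and $y_i = Ax_i + b$ denotes a transformed data point; the symbols $\mu$, $y_i$, $\mu_y$, and $S_y$ are notation introduced for this sketch.

$$
\begin{aligned}
\mu_y &= \frac{1}{N}\sum_{i=1}^{N} (A x_i + b) = A\left(\frac{1}{N}\sum_{i=1}^{N} x_i\right) + b = A\mu + b, \\
S_y &= \frac{1}{N}\sum_{i=1}^{N} (y_i - \mu_y)(y_i - \mu_y)^\top
     = \frac{1}{N}\sum_{i=1}^{N} A(x_i - \mu)(x_i - \mu)^\top A^\top
     = A S A^\top.
\end{aligned}
$$

The offset $b$ cancels in $y_i - \mu_y = A(x_i - \mu)$, which is why it shifts the mean but leaves the covariance unchanged.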

In [None]:
def affine_mean(mean, A, b):
    """Compute the mean after affine transformation
    Args:
        mean: ndarray, the mean vector
        A, b: affine transformation applied to x. i.e. Ax + b
    Returns:
        ndarray of size (D, ): mean vector after affine transformation
    """
    affine_m = A @ mean + b
    return affine_m

In [None]:
def affine_covariance(S, A, b):
    """Compute the covariance matrix after affine transformation
    Args:
        S: ndarray of size (D, D), the covariance matrix
        A, b: affine transformation applied to each data point x in X, i.e. Ax + b
            (b is unused here: a constant shift does not change the covariance)
    Returns:
        ndarray of size (D, D): covariance matrix after the transformation
    """
    affine_cov = A @ S @ A.T
    return affine_cov
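
The two properties can also be checked numerically by transforming a dataset directly and comparing the statistics of the result with the closed-form expressions. This is a minimal, self-contained sketch: the random data, the matrix `A`, the offset `b`, and the helpers `sample_mean` and `sample_cov` are hypothetical stand-ins for the functions defined earlier in the notebook.

```python
import numpy as np

rng = np.random.default_rng(42)
N, D = 500, 3
X_demo = rng.normal(size=(N, D))  # hypothetical dataset, one point per row
A = rng.normal(size=(D, D))       # hypothetical linear part of the transform
b = rng.normal(size=D)            # hypothetical offset


def sample_mean(X):
    """Sample mean over rows, shape (D,)."""
    return X.sum(axis=0) / X.shape[0]


def sample_cov(X):
    """Sample covariance with divide-by-N normalization, shape (D, D)."""
    centered = X - sample_mean(X)
    return centered.T @ centered / X.shape[0]


# Apply y_i = A x_i + b to every point. Points are rows, so this is X @ A.T + b.
Y = X_demo @ A.T + b

m = sample_mean(X_demo)
S = sample_cov(X_demo)

# Statistics of the transformed data versus the closed-form expressions.
print(np.allclose(sample_mean(Y), A @ m + b))
print(np.allclose(sample_cov(Y), A @ S @ A.T))
```

Because data points are stored as rows, the transformation is applied as `X @ A.T + b` rather than `A @ X + b`; mixing up the two is a common source of shape errors.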