# 3. Exploiting Correlation

Financial data analysis is not only about processing data but also about understanding how meaningful factors can be used to summarize or represent it. Let's look at the role that correlation and principal components play.

**a. Generate 5 uncorrelated Gaussian random variables that simulate yield changes (they can be positive or negative, with a mean close to 0 and a small standard deviation).**

In [3]:
import numpy as np


# Set the mean and standard deviation
mean = 0
std_dev = 0.01  # adjust this value to change the spread of the data

# Draw one independent sample for each of 5 Gaussian random variables
random_variables = np.random.normal(mean, std_dev, 5)

print(random_variables)

[ 0.00620131 -0.01282688  0.00722685  0.00318651  0.00025526]


This code generates an array of 5 values, each drawn independently from a normal distribution with the specified mean and standard deviation. Because the draws are independent, the underlying variables are uncorrelated. Note that a single draw per variable is not enough to estimate a correlation; that requires many observations of each variable.
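To check the lack of correlation empirically, each variable needs a series of observations. The following is a minimal sketch, assuming 250 observations per variable and a fixed seed (both are illustrative choices, not part of the original exercise); `yield_changes`, `n_obs`, and `n_vars` are hypothetical names.

```python
import numpy as np

# Assumptions for illustration only: fixed seed and 250 observations per variable
rng = np.random.default_rng(42)
n_obs, n_vars = 250, 5
mean, std_dev = 0.0, 0.01

# Each column is one simulated yield-change series; each row is one observation
yield_changes = rng.normal(mean, std_dev, size=(n_obs, n_vars))

# Sample correlation matrix, treating columns as variables
corr = np.corrcoef(yield_changes, rowvar=False)

# Off-diagonal entries should be close to 0 for independently drawn series
off_diag = corr[~np.eye(n_vars, dtype=bool)]

print("Shape:", yield_changes.shape)
print("Largest absolute off-diagonal correlation:", np.abs(off_diag).max())
```

The off-diagonal correlations will not be exactly zero in a finite sample, but they should shrink as the number of observations grows.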

**b. Run a Principal Component Analysis using EITHER the correlation OR the covariance matrix.**

To perform Principal Component Analysis (PCA) using either the correlation or the covariance matrix, follow these steps.

**Using the Covariance Matrix**

1. **Standardize the Data:** Standardize the data (mean of 0 and standard deviation of 1) if the covariance matrix should be equivalent to the correlation matrix.
2. **Compute the Covariance Matrix:** Calculate the covariance matrix of the standardized dataset.
3. **Eigenvalue Decomposition:** Perform eigenvalue decomposition on the covariance matrix to obtain the eigenvalues and eigenvectors.
4. **Sort Eigenvalues and Eigenvectors:** Sort the eigenvalues in descending order and arrange the corresponding eigenvectors accordingly.
5. **Principal Components:** The eigenvectors corresponding to the sorted eigenvalues are the principal components. The eigenvalues represent the variance explained by each principal component.

**Using the Correlation Matrix**

1. **Standardize the Data:** Standardize the data (mean of 0 and standard deviation of 1). The correlation matrix is unaffected by this rescaling, so the step is optional here, but it keeps the procedure parallel to the covariance route.
2. **Compute the Correlation Matrix:** Calculate the correlation matrix of the dataset. For a standardized dataset, the correlation matrix is equivalent to the covariance matrix.
3. **Eigenvalue Decomposition:** Perform eigenvalue decomposition on the correlation matrix to obtain the eigenvalues and eigenvectors.
4. **Sort Eigenvalues and Eigenvectors:** Sort the eigenvalues in descending order and arrange the corresponding eigenvectors accordingly.
5. **Principal Components:** The eigenvectors corresponding to the sorted eigenvalues are the principal components. The eigenvalues represent the variance explained by each principal component.
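The correlation route listed above can be sketched on the same small two-column dataset used in the covariance example. This is an illustrative sketch, not the notebook's own cell: it swaps in `np.linalg.eigh`, which is intended for symmetric matrices, and computes the correlation matrix directly from the raw data with `np.corrcoef`.

```python
import numpy as np

data = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
                 [2.3, 2.7], [2, 1.6], [1, 1.1], [1.5, 1.6], [1.1, 0.9]])

# Correlation matrix of the raw data (columns are variables);
# standardizing first would give the same matrix
correlation_matrix = np.corrcoef(data, rowvar=False)

# eigh handles symmetric matrices and returns eigenvalues in ascending order
eigenvalues, eigenvectors = np.linalg.eigh(correlation_matrix)

# Reorder so the largest eigenvalue (and its eigenvector column) comes first
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

print("Eigenvalues:", eigenvalues)
print("Sum of eigenvalues:", eigenvalues.sum())
print("Principal Components:", eigenvectors)
```

For a correlation matrix the eigenvalues sum to the number of variables (here 2), which makes each eigenvalue easy to read as a share of total variance.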

In [6]:
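# Step 1: Standardize the data
data = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
                 [2.3, 2.7], [2, 1.6], [1, 1.1], [1.5, 1.6], [1.1, 0.9]])
mean = np.mean(data, axis=0)
# Note: np.std defaults to ddof=0 (population), while np.cov below uses ddof=1 (sample),
# so each standardized column has sample variance n/(n-1) = 10/9 rather than exactly 1
std_dev = np.std(data, axis=0)
standardized_data = (data - mean) / std_dev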

# Step 2: Compute the Covariance Matrix
covariance_matrix = np.cov(standardized_data, rowvar=False)

# Step 3: Eigenvalue Decomposition
eigenvalues, eigenvectors = np.linalg.eig(covariance_matrix)

# Step 4: Sort Eigenvalues and Eigenvectors
sorted_indices = np.argsort(eigenvalues)[::-1]
sorted_eigenvalues = eigenvalues[sorted_indices]
sorted_eigenvectors = eigenvectors[:, sorted_indices]

# Step 5: Principal Components (each column is one component)
principal_components = sorted_eigenvectors

print("Eigenvalues (Variance Explained):", sorted_eigenvalues)
print("Principal Components:", principal_components)

Eigenvalues (Variance Explained): [2.13992141 0.08230081]
Principal Components: [[-0.70710678 -0.70710678]
 [-0.70710678  0.70710678]]
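The eigenvalues printed above are variances along each component; dividing by their sum turns them into shares of total variance. A minimal sketch, assuming the two eigenvalues are copied from the printed output (`explained_variance_ratio` and `cumulative` are hypothetical names):

```python
import numpy as np

# Eigenvalues copied from the printed output above
sorted_eigenvalues = np.array([2.13992141, 0.08230081])

# Share of total variance captured by each principal component
explained_variance_ratio = sorted_eigenvalues / sorted_eigenvalues.sum()

# Running total, useful for deciding how many components to keep
cumulative = np.cumsum(explained_variance_ratio)

print("Explained variance ratio:", explained_variance_ratio)
print("Cumulative:", cumulative)
```

Here the first component accounts for roughly 96% of the total variance, which reflects how strongly the two columns move together.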


**Using Singular Value Decomposition (SVD)**

In [8]:
# Step 1: Standardize the Data
data = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0], 
                 [2.3, 2.7], [2, 1.6], [1, 1.1], [1.5, 1.6], [1.1, 0.9]])
mean = np.mean(data, axis=0)
std_dev = np.std(data, axis=0)
standardized_data = (data - mean) / std_dev

# Step 2: Compute SVD
U, S, Vt = np.linalg.svd(standardized_data)

# Step 3: Principal Components and Eigenvalues
eigenvalues = S**2 / (standardized_data.shape[0] - 1)
eigenvectors = Vt.T


print("Eigenvalues (Variance Explained):", eigenvalues)
print("Principal Components:", eigenvectors)

Eigenvalues (Variance Explained): [2.13992141 0.08230081]
Principal Components: [[-0.70710678  0.70710678]
 [-0.70710678 -0.70710678]]


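In both methods, the eigenvalues represent the variance explained by each principal component, and the eigenvectors (the columns of the printed matrices) are the principal components themselves. The second component has the opposite sign in the two outputs; an eigenvector is only defined up to sign, so both outputs describe the same direction.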
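The agreement between the two routes can be checked numerically. The following is a sketch under the same standardization as the cells above; it uses `np.linalg.eigh` and `full_matrices=False` as illustrative substitutions, and `alignment` is a hypothetical helper name.

```python
import numpy as np

data = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
                 [2.3, 2.7], [2, 1.6], [1, 1.1], [1.5, 1.6], [1.1, 0.9]])
standardized = (data - data.mean(axis=0)) / data.std(axis=0)
n = standardized.shape[0]

# Route 1: eigendecomposition of the covariance matrix
cov = np.cov(standardized, rowvar=False)
evals, evecs = np.linalg.eigh(cov)
order = np.argsort(evals)[::-1]
evals, evecs = evals[order], evecs[:, order]

# Route 2: SVD of the standardized data (singular values come sorted descending)
_, S, Vt = np.linalg.svd(standardized, full_matrices=False)
svd_evals = S**2 / (n - 1)
svd_evecs = Vt.T

# Eigenvalues should match; eigenvectors should match up to a sign flip,
# so compare the absolute value of each column-wise dot product with 1
same_eigenvalues = np.allclose(evals, svd_evals)
alignment = np.abs(np.sum(evecs * svd_evecs, axis=0))
same_directions = np.allclose(alignment, 1.0)

print("Eigenvalues agree:", same_eigenvalues)
print("Directions agree up to sign:", same_directions)
```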