## Principal Component Analysis (PCA) on U.S. Interest Rates

This exercise is a simplified version of the classic yield-curve decomposition into level, slope, and curvature. I encourage you to try this technique on a forward curve of your choice.

In [None]:
import Haver
import pandas as pd
import numpy as np
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
Haver.path()

In [None]:
# Raw string so the backslashes in the Windows path are not read as escape sequences
Haver.path(r'c:\DLX\dat')

In [None]:
df = Haver.data(['ftb3', 'fcm1', 'fcm5', 'fcm7', 'fcm30'], 'us1plus', dates=True)

In [None]:
# Keep only dates on which every series has a value
df = df.dropna()

In [None]:
plt.plot(df)
plt.legend(list(df.columns))  # label each line with its series code
plt.ylabel('Percent')
plt.xlabel('Date')
plt.title('U.S. Interest Rates, 3-Month to 30-Year')
plt.show()

In [None]:
pca = PCA(n_components=5)
pca.fit(df)

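We can inspect the loadings of each PC to better understand what it might represent. Each row of `pca.components_` is one principal component (an eigenvector), and each column corresponds to one of the five input rates.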

In [None]:
pca.components_
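
The raw array above is hard to read without labels. Below is a minimal sketch of one way to label the loadings, assuming a synthetic stand-in for the Haver data, since the real series require a Haver license. The names `demo_df`, `demo_pca`, and `loadings` and the simulated level and slope factors are illustrative assumptions, not part of the original exercise.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Synthetic stand-in for the Haver rates (hypothetical data, not real yields):
# a shared random-walk "level" factor plus a maturity-dependent "slope" factor.
rng = np.random.default_rng(0)
n_obs = 500
maturities = np.array([0.25, 1.0, 5.0, 7.0, 30.0])  # in years
level = np.cumsum(rng.normal(0, 0.10, n_obs))
slope = np.cumsum(rng.normal(0, 0.03, n_obs))
noise = rng.normal(0, 0.01, (n_obs, 5))
rates = 3.0 + level[:, None] + slope[:, None] * np.log(maturities)[None, :] + noise
demo_df = pd.DataFrame(rates, columns=['ftb3', 'fcm1', 'fcm5', 'fcm7', 'fcm30'])

demo_pca = PCA(n_components=5).fit(demo_df)

# Each row of components_ is one eigenvector; each column is an input series.
loadings = pd.DataFrame(
    demo_pca.components_,
    index=[f'PC{i + 1}' for i in range(demo_pca.n_components_)],
    columns=demo_df.columns,
)
print(loadings.round(3))
```

On real yield data, a row whose loadings all share the same sign is usually read as a level factor, and a row whose loadings change sign once across maturities as a slope factor. The sign of each row is arbitrary, so only the relative pattern matters.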

If you have 5 variables, you will have 5 eigenvectors, and the explained variance ratios have to sum to 1. Remember that the goal of PCA is dimensionality reduction: you are hoping that 1 or 2 components explain a large portion of the total variance among your variables.

In [None]:
print(pca.explained_variance_ratio_)
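
To make the link between eigenvalues and explained variance concrete, here is a small sketch on made-up data (the array `X` and its single common factor are assumptions for illustration only). It checks that the ratios sum to 1 and that scikit-learn's `explained_variance_` matches the eigenvalues of the sample covariance matrix.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
# Hypothetical data: 5 correlated columns driven mostly by one common factor.
common = rng.normal(size=(300, 1))
X = common @ np.ones((1, 5)) + 0.2 * rng.normal(size=(300, 5))

demo_pca = PCA(n_components=5).fit(X)

# Eigenvalues of the sample covariance matrix, sorted largest first.
eigvals = np.sort(np.linalg.eigvalsh(np.cov(X, rowvar=False)))[::-1]

ratios = demo_pca.explained_variance_ratio_
print(ratios.sum())                                         # ratios add up to 1
print(np.allclose(demo_pca.explained_variance_, eigvals))   # variances are the eigenvalues
print(np.allclose(ratios, eigvals / eigvals.sum()))         # ratios are eigenvalue shares
```

Because the columns in this toy example share one dominant factor, the first ratio ends up far larger than the rest, which is the situation the paragraph above describes as the hoped-for outcome.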

In [None]:
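pca = PCA().fit(df)
cumulative = np.cumsum(pca.explained_variance_ratio_)
# Start the x-axis at 1 so it matches the number of components retained
plt.plot(np.arange(1, len(cumulative) + 1), cumulative)
plt.xlabel('number of components')
plt.ylabel('cumulative explained variance');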

In [None]:
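pc = pca.fit_transform(df)
# Keep the original dates as the index so each score lines up with its observation
pc = pd.DataFrame(pc, index=df.index, columns=['PC1', 'PC2', 'PC3', 'PC4', 'PC5'])
pc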
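
The scores above keep all five components. Since the stated goal is dimensionality reduction, the sketch below shows what keeping only two components might look like, using hypothetical two-factor data rather than the Haver series. The names `X`, `pca2`, `scores`, and `X_approx` are illustrative. It uses `inverse_transform` to map the two scores back to five columns and measures how much variance is lost.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
# Hypothetical data: two underlying factors spread across 5 columns plus small noise.
factors = rng.normal(size=(400, 2))
mixing = rng.normal(size=(2, 5))
X = factors @ mixing + 0.05 * rng.normal(size=(400, 5))

pca2 = PCA(n_components=2).fit(X)
scores = pca2.transform(X)                 # reduced representation, shape (400, 2)
X_approx = pca2.inverse_transform(scores)  # approximation in the original 5 columns

# Share of total variance lost by dropping the remaining three components.
residual_share = ((X - X_approx) ** 2).sum() / ((X - X.mean(axis=0)) ** 2).sum()
print(scores.shape, X_approx.shape)
print(residual_share, 1 - pca2.explained_variance_ratio_.sum())
```

The two printed numbers on the last line agree: the share of variance lost in the reconstruction equals one minus the cumulative explained variance ratio of the components you kept.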