# 0. PCA UK YIELD CURVE: A "2 FACTOR" MODEL OF ALL SPOT RATES

This notebook is based on Nathan Thomas's notebook published at:
https://towardsdatascience.com/applying-pca-to-the-yield-curve-4d2023e555b3
which we have annotated and extended.

We are going to show how to apply PCA to the UK yield curve.
We will show that the first and second principal component transforms
(what we have been calling the transformed "Z1" and "Z2" features in 4.PCAInMoreDepth.pptx, slides 25 to 32)
are "latent" or hidden features (as per slide 37) that
drive the behavior of the spot rates as a whole, and that
they correspond closely to the 10 Year Spot Rate and the UK Inflation Rate.

# 1. Import and clean data

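First we import the data. For the yield curve, these are the spot rates at 20 maturities from 6 months to 10 years (20 rates = features in columns). The cell below, used for the assignment, instead loads indu_dly.csv: daily prices of the 28 Dow Jones Industrial Average stocks (one ticker per column).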

In [50]:

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler

# Import daily Dow Jones stock prices (one column per ticker) from CSV
df = pd.read_csv("indu_dly.csv", 
                   index_col=0)

df.head(20)

Date,CSCO,DIS,XOM,BA,UNH,MMM,HD,VZ,TRV,JNJ,IBM,PG,NKE,WBA,JPM,MRK,CVX,KO,PFE,WMT,GS,AAPL,UTX,MCD,AXP,MSFT,INTC,CAT
1/2/2008,20.984388,27.342632,67.229462,64.505844,49.060883,61.666119,19.503134,22.685774,39.460739,46.841499,76.52536,50.629448,10.203184,29.631153,32.001144,37.507103,61.992542,18.792377,14.51558,35.724121,179.894257,18.559986,57.574043,40.849876,42.19574,26.874113,17.842361,49.893444
1/3/2008,21.150425,27.28252,67.459534,64.773933,49.285957,61.658646,19.286514,22.790775,39.988277,46.855709,76.678848,50.629448,10.111278,27.821856,31.781075,37.474411,62.755371,18.989252,14.686652,35.328033,177.502563,18.568562,58.393135,40.730358,41.674877,26.988579,17.363743,49.716854
1/4/2008,20.652304,26.732916,66.201355,63.910072,48.480827,60.95784,18.644131,22.365517,38.729691,46.791748,73.923096,50.426399,9.954878,27.210827,31.060158,37.167137,61.919586,19.026157,14.464897,34.825302,173.247818,17.151121,57.428581,40.111637,40.624966,26.23316,15.956061,48.409992
1/7/2008,20.66021,26.75868,65.583069,61.713223,49.242664,60.473217,18.950386,22.759279,39.21957,47.51664,73.133636,50.755497,10.014534,27.829788,31.371283,37.866669,61.110355,19.47529,14.718331,35.465145,168.741821,16.921555,57.16066,40.800659,40.806843,26.408663,16.103865,48.551277
1/8/2008,20.106749,26.22625,64.741875,59.508919,48.662632,59.8022,18.479799,22.282015,38.021275,47.573505,71.335457,50.888527,9.969388,27.163223,30.126766,39.004238,60.327671,19.555275,14.87039,35.015739,164.209808,16.312857,54.956001,40.132729,39.641163,25.523544,15.667488,47.435162
1/9/2008,20.747189,25.899933,65.827492,59.799355,48.645313,59.787308,18.472334,22.536732,38.53376,48.1847,71.861755,50.790497,9.99196,26.782312,30.551727,39.553432,61.402203,20.07822,15.15551,35.724121,166.1595,17.089205,55.08614,40.406944,40.575344,26.278934,16.012363,47.491676
1/10/2008,20.747189,26.337894,65.899414,61.333412,48.558754,59.8022,18.88316,23.056765,38.865356,48.262878,73.038612,50.748497,10.083866,27.393347,31.363697,39.533802,60.957783,20.167431,15.256886,36.866684,170.630814,16.957756,55.415287,40.899097,40.443069,26.195017,15.864564,47.618793
1/11/2008,20.454641,26.037329,64.921638,59.963165,48.402924,57.841347,18.457399,22.552643,38.782459,48.241539,71.393929,49.138092,9.787188,26.74263,31.007036,39.586105,60.14193,19.616798,15.218871,36.348724,172.216644,16.450031,54.871788,38.192181,36.375626,25.874537,15.477448,46.62986
1/14/2008,20.786722,26.063087,65.30265,60.819561,47.97871,58.527264,18.965324,22.812664,38.473461,48.291302,75.238831,49.215115,9.792024,26.433151,31.386461,39.082706,60.301136,19.779833,15.187185,36.310635,174.738266,17.030151,55.262203,38.642155,36.78899,26.240786,16.244637,47.519909
1/15/2008,20.438824,25.633717,64.001366,57.982265,47.571808,57.543121,18.950386,22.377531,37.440987,48.156265,74.434792,48.802006,9.200279,26.195086,29.72456,38.036652,58.549973,19.567572,14.946425,35.792675,167.493988,16.102345,54.626816,37.798439,35.358761,25.943209,15.970139,46.198952


In [51]:
# Drop nan values
df = df.dropna(how="any")

In [52]:
df.head()

Date,CSCO,DIS,XOM,BA,UNH,MMM,HD,VZ,TRV,JNJ,IBM,PG,NKE,WBA,JPM,MRK,CVX,KO,PFE,WMT,GS,AAPL,UTX,MCD,AXP,MSFT,INTC,CAT
1/2/2008,20.984388,27.342632,67.229462,64.505844,49.060883,61.666119,19.503134,22.685774,39.460739,46.841499,76.52536,50.629448,10.203184,29.631153,32.001144,37.507103,61.992542,18.792377,14.51558,35.724121,179.894257,18.559986,57.574043,40.849876,42.19574,26.874113,17.842361,49.893444
1/3/2008,21.150425,27.28252,67.459534,64.773933,49.285957,61.658646,19.286514,22.790775,39.988277,46.855709,76.678848,50.629448,10.111278,27.821856,31.781075,37.474411,62.755371,18.989252,14.686652,35.328033,177.502563,18.568562,58.393135,40.730358,41.674877,26.988579,17.363743,49.716854
1/4/2008,20.652304,26.732916,66.201355,63.910072,48.480827,60.95784,18.644131,22.365517,38.729691,46.791748,73.923096,50.426399,9.954878,27.210827,31.060158,37.167137,61.919586,19.026157,14.464897,34.825302,173.247818,17.151121,57.428581,40.111637,40.624966,26.23316,15.956061,48.409992
1/7/2008,20.66021,26.75868,65.583069,61.713223,49.242664,60.473217,18.950386,22.759279,39.21957,47.51664,73.133636,50.755497,10.014534,27.829788,31.371283,37.866669,61.110355,19.47529,14.718331,35.465145,168.741821,16.921555,57.16066,40.800659,40.806843,26.408663,16.103865,48.551277
1/8/2008,20.106749,26.22625,64.741875,59.508919,48.662632,59.8022,18.479799,22.282015,38.021275,47.573505,71.335457,50.888527,9.969388,27.163223,30.126766,39.004238,60.327671,19.555275,14.87039,35.015739,164.209808,16.312857,54.956001,40.132729,39.641163,25.523544,15.667488,47.435162


In [53]:
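# Daily percentage returns; the first row has no previous price, so its NaN is set to 0.
# fill_method=None avoids the deprecated 'pad' keyword (NaN rows were already dropped).
df = df.pct_change(periods=1, fill_method=None).fillna(0)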
df.head()

Date,CSCO,DIS,XOM,BA,UNH,MMM,HD,VZ,TRV,JNJ,IBM,PG,NKE,WBA,JPM,MRK,CVX,KO,PFE,WMT,GS,AAPL,UTX,MCD,AXP,MSFT,INTC,CAT
1/2/2008,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1/3/2008,0.007912,-0.002198,0.003422,0.004156,0.004588,-0.000121,-0.011107,0.004628,0.013369,0.000303,0.002006,0.0,-0.009008,-0.061061,-0.006877,-0.000872,0.012305,0.010476,0.011785,-0.011087,-0.013295,0.000462,0.014227,-0.002926,-0.012344,0.004259,-0.026825,-0.003539
1/4/2008,-0.023551,-0.020145,-0.018651,-0.013337,-0.016336,-0.011366,-0.033307,-0.018659,-0.031474,-0.001365,-0.035939,-0.00401,-0.015468,-0.021962,-0.022684,-0.0082,-0.013318,0.001943,-0.015099,-0.01423,-0.02397,-0.076336,-0.016518,-0.015191,-0.025193,-0.02799,-0.08107,-0.026286
1/7/2008,0.000383,0.000964,-0.009339,-0.034374,0.015714,-0.00795,0.016426,0.017606,0.012649,0.015492,-0.010679,0.006526,0.005993,0.022747,0.010017,0.018821,-0.013069,0.023606,0.017521,0.018373,-0.026009,-0.013385,-0.004665,0.017178,0.004477,0.00669,0.009263,0.002919
1/8/2008,-0.026789,-0.019897,-0.012826,-0.035719,-0.011779,-0.011096,-0.024833,-0.02097,-0.030553,0.001197,-0.024588,0.002621,-0.004508,-0.023951,-0.039671,0.030041,-0.012808,0.004107,0.010331,-0.012672,-0.026858,-0.035972,-0.03857,-0.016371,-0.028566,-0.033516,-0.027098,-0.022988


# Assignment code:

In [38]:
from sklearn.decomposition import PCA
pca = PCA()
pca.fit(df)

PCA(copy=True, iterated_power='auto', n_components=None, random_state=None,
    svd_solver='auto', tol=0.0, whiten=False)

In [44]:
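pca1_vec = pca.components_          # rows are the eigenvectors (principal axes), sorted by eigenvalue
pca1_val = pca.explained_variance_  # the matching eigenvalues (variances of the projections)
print("eigenvectors:", pca.components_)
print("eigenvalues:", pca.explained_variance_)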

eigenvectors: [[ 2.03719228e-01  2.06641638e-01  1.73781907e-01  1.95281520e-01
   1.95801634e-01  1.63117366e-01  1.86500328e-01  1.32984392e-01
   2.05211273e-01  1.06099069e-01  1.46525912e-01  1.08875660e-01
   1.75939879e-01  1.50845441e-01  3.26604765e-01  1.50399869e-01
   1.96664449e-01  1.09202189e-01  1.45752314e-01  9.78738340e-02
   2.87411042e-01  1.81236642e-01  1.85548794e-01  1.09779301e-01
   2.97261759e-01  1.85523075e-01  1.98151502e-01  2.34421393e-01]
 [-8.09442496e-02 -7.88344607e-02 -1.87346057e-01 -1.18024853e-01
  -2.11255006e-01 -7.45102778e-02 -1.41670972e-03 -8.73280332e-02
   2.17168402e-02 -1.30983466e-01 -5.21703978e-02 -1.13334725e-01
  -6.50588338e-02 -1.30376905e-01  6.05239407e-01 -1.79667798e-01
  -1.79087359e-01 -1.41874015e-01 -1.10102130e-01 -1.13087020e-01
   4.49849053e-01 -5.90111290e-02 -9.59156578e-02 -1.04407895e-01
   3.05700472e-01 -1.30757260e-01 -9.85319284e-02 -6.36984260e-02]
 [-2.34371526e-01 -1.32873990e-02 -2.02432178e-02 -7.9039829
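As a sanity check on what `components_` and `explained_variance_` contain, the sketch below uses synthetic data (an assumption; the stock returns are not needed) and compares them with an eigendecomposition of the sample covariance matrix. The rows of `components_` should match the eigenvectors only up to sign, because an eigenvector is defined only up to a factor of -1.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for the return matrix: 500 days x 5 series sharing one factor
rng = np.random.default_rng(0)
market = rng.normal(size=(500, 1))
X = 0.8 * market + 0.3 * rng.normal(size=(500, 5))

pca_check = PCA().fit(X)

# Sample covariance with ddof=1, the same convention as explained_variance_
cov = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending order
order = np.argsort(eigvals)[::-1]        # re-sort descending, like sklearn
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Eigenvalues match explained_variance_
print(np.allclose(eigvals, pca_check.explained_variance_))

# Each row of components_ equals an eigenvector (a column of eigvecs) up to sign
signs = np.sign(np.sum(pca_check.components_ * eigvecs.T, axis=1))
print(np.allclose(pca_check.components_, signs[:, None] * eigvecs.T))
```

The sign ambiguity is also why a correlation such as the one with ^DJI below could, in principle, come out negative for a different run or library.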

In [81]:
proj = pca.transform(df)
pc1_proj = proj[:,0]
print(pc1_proj)

[-0.00278049 -0.01714361 -0.12467995 ... -0.03511155 -0.00619676
 -0.01883786]


In [67]:
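# Daily percentage returns of the Dow Jones index itself (^DJI), to compare with PC1
df2 = pd.read_csv("indu_index_dly.csv",
                  index_col=0)
# ffill() reproduces the old fill_method='pad' behaviour without the deprecated keyword
indu_index = df2.ffill().pct_change(periods=1, fill_method=None).fillna(0)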

indu_index.head(20)

Date,^DJI
2008-01-02,0.0
2008-01-03,0.000978
2008-01-04,-0.019648
2008-01-07,0.002134
2008-01-08,-0.018587
2008-01-09,0.011616
2008-01-10,0.009248
2008-01-11,-0.019201
2008-01-14,0.013632
2008-01-15,-0.021681


In [68]:
correlation = np.corrcoef(pc1_proj, indu_index['^DJI'])
print(correlation)

[[1.         0.98062386]
 [0.98062386 1.        ]]


In [87]:
'''Calculate the explained variance (in percent) of the first eigenvector (1).
Working backwards from all the principal component projections,
calculate their covariance matrix, and
calculate the variance (in percent) of the first component projection (2).
(1) and (2) should be the same.'''
explained_variance = pca1_val[0]
print('explained variance first vector:', explained_variance)
principal_component_projections = pd.DataFrame(proj)
# np.var defaults to ddof=0, while sklearn's explained_variance_ uses ddof=1;
# this is why the two printed values differ in the fourth significant digit
print('variance:', pc1_proj.var())
# Correlation matrix of the projections: the identity, i.e. the components are uncorrelated
principal_component_projections.corr()


explained variance first vector: 0.004923951155053203
variance: 0.0049217791430748


,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27
0,1.0,2.77219e-14,-1.63666e-16,2.566861e-16,-1.068315e-16,1.97037e-16,5.839056e-16,-3.944198e-16,-6.935713e-16,-2.497058e-16,-7.589786e-17,-7.799038e-18,-4.437187e-16,2.626346e-16,-1.245223e-16,1.729557e-17,1.413365e-15,-4.776721e-16,5.306129e-16,5.327313e-16,-3.795667e-16,-2.924942e-16,3.728847e-17,5.2163910000000005e-17,1.001573e-16,7.250758000000001e-17,1.95699e-16,2.533328e-16
1,2.77219e-14,1.0,-2.298452e-16,-1.0096760000000001e-17,-3.395331e-16,1.1387130000000001e-17,4.580247e-16,-2.304086e-16,-9.357598e-18,1.768762e-16,4.2823630000000005e-17,-1.275678e-16,-2.440464e-16,5.841096000000001e-17,1.385113e-16,-2.157036e-16,-2.717818e-16,5.467945e-16,-3.153747e-16,-8.329304e-17,1.752422e-16,5.0579660000000006e-17,4.027474e-16,-1.33004e-16,1.93882e-16,8.264728e-17,-2.033778e-17,3.736979e-16
2,-1.63666e-16,-2.298452e-16,1.0,-2.029315e-16,-4.4500470000000004e-17,1.0476310000000002e-17,1.239193e-16,8.735913000000001e-17,9.287560000000001e-17,-7.292751e-17,-3.622758e-16,6.193525000000001e-17,1.9297820000000002e-17,4.425136e-16,1.550484e-16,-2.1741800000000002e-17,-1.525261e-16,2.401164e-16,-2.202181e-16,-4.6627840000000006e-17,-7.029678e-17,1.912405e-16,-3.909961e-16,-1.027189e-17,-5.637113000000001e-17,2.784e-18,-1.14766e-16,6.565744e-16
3,2.566861e-16,-1.0096760000000001e-17,-2.029315e-16,1.0,9.813209000000001e-17,5.878077e-16,-9.219719e-18,-7.860394e-16,3.189957e-17,4.471832e-16,2.696499e-16,-3.968061e-16,3.3563110000000004e-17,-7.087933e-16,1.103096e-16,-2.508397e-16,-5.69789e-16,3.190941e-16,-1.320521e-16,-6.469511e-17,-1.012833e-16,-6.948429e-17,5.705452000000001e-17,-4.6260780000000006e-17,-3.665116e-16,3.416344e-16,3.8325e-16,-5.927563e-16
4,-1.068315e-16,-3.395331e-16,-4.4500470000000004e-17,9.813209000000001e-17,1.0,1.84517e-16,3.461897e-16,2.464635e-16,1.102864e-15,-8.272226e-16,2.94985e-16,1.701065e-16,1.145846e-17,3.466868e-16,-3.477976e-16,1.489111e-16,4.586095e-16,-2.00011e-16,-5.097749e-16,-4.626512e-16,-2.500878e-16,2.7529e-16,9.725135e-16,8.602354000000001e-17,1.073983e-16,4.855147e-16,7.250356e-16,3.808667e-17
5,1.97037e-16,1.1387130000000001e-17,1.0476310000000002e-17,5.878077e-16,1.84517e-16,1.0,-1.87939e-16,1.145856e-15,-5.341904000000001e-17,1.193674e-16,-2.6055350000000004e-17,4.8709840000000005e-17,3.17732e-16,-1.019157e-15,2.495663e-16,-9.296136000000001e-17,-4.092215e-16,1.482584e-16,-3.207432e-17,-1.425827e-16,-1.224322e-16,1.5938290000000003e-17,3.96778e-16,2.5762360000000002e-17,-3.159254e-16,-1.037744e-16,-3.613952e-16,2.377092e-16
6,5.839056e-16,4.580247e-16,1.239193e-16,-9.219719e-18,3.461897e-16,-1.87939e-16,1.0,1.114631e-16,5.003492e-16,-1.635965e-16,4.105432e-16,-3.987458e-16,2.69468e-16,-1.16108e-16,2.114259e-16,2.32911e-17,-5.349921000000001e-17,-3.769946e-16,6.605051e-17,-3.466056e-16,-2.616855e-16,6.880316e-17,8.462128000000001e-17,-6.933747000000001e-17,-5.474904e-17,8.625947e-17,5.149602e-16,-1.247457e-16
7,-3.944198e-16,-2.304086e-16,8.735913000000001e-17,-7.860394e-16,2.464635e-16,1.145856e-15,1.114631e-16,1.0,-3.3979250000000005e-17,-2.650821e-16,-1.117038e-16,2.626904e-16,6.13634e-17,2.270161e-16,-8.862239000000001e-17,-1.854362e-16,2.259114e-16,-4.247345e-16,-9.194014000000001e-17,7.108735e-18,1.134651e-17,-1.208959e-16,9.568557e-17,-5.0566480000000006e-17,-1.8601900000000003e-17,2.408004e-16,2.085916e-16,-8.543069e-17
8,-6.935713e-16,-9.357598e-18,9.287560000000001e-17,3.189957e-17,1.102864e-15,-5.341904000000001e-17,5.003492e-16,-3.3979250000000005e-17,1.0,-1.631054e-15,1.992925e-16,-2.401144e-16,-7.648874e-17,3.82088e-16,2.016074e-18,3.16954e-16,6.986757000000001e-17,8.120893000000001e-17,-1.812369e-16,8.217724e-17,-8.416917000000001e-17,2.160893e-16,2.512449e-16,2.4953570000000002e-17,-4.76325e-17,-2.348575e-16,1.937271e-16,1.005939e-16
9,-2.497058e-16,1.768762e-16,-7.292751e-17,4.471832e-16,-8.272226e-16,1.193674e-16,-1.635965e-16,-2.650821e-16,-1.631054e-15,1.0,9.97007e-16,-8.048162e-16,1.375826e-16,-4.435667e-16,2.583657e-16,2.650121e-16,-9.920878000000001e-18,-2.096346e-16,-2.565796e-16,-2.972234e-16,-1.763991e-16,-2.411757e-16,3.15113e-18,1.0378130000000001e-17,-2.520561e-16,-3.554014e-16,2.33772e-16,-1.838292e-16
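The two printed values above differ slightly, which is consistent with `np.var` dividing by n while sklearn divides by n - 1. The sketch below, on synthetic data (an assumption standing in for the returns), completes both parts of the task: (1) `explained_variance_ratio_` gives the percentage directly, and (2) the covariance matrix of the projections, computed with ddof=1, has the eigenvalues on its diagonal and numerical zeros elsewhere, so the same percentage can be recovered from it.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Synthetic stand-in for the daily return matrix (assumption: 1000 days x 4 series)
rng = np.random.default_rng(1)
factor = rng.normal(scale=0.01, size=(1000, 1))
X = factor + 0.005 * rng.normal(size=(1000, 4))

pca_demo = PCA().fit(X)
Z = pca_demo.transform(X)  # principal component projections

# (1) Explained variance of the first component, in percent
pct_first = 100 * pca_demo.explained_variance_ratio_[0]
print(f"PC1 explains {pct_first:.2f}% of the variance")

# (2) Variance of the first projection: with ddof=1 it matches sklearn exactly
var_ddof1 = Z[:, 0].var(ddof=1)
print(np.isclose(var_ddof1, pca_demo.explained_variance_[0]))

# Same percentage, worked backwards from the projections' covariance matrix
cov_Z = pd.DataFrame(Z).cov()  # pandas .cov() uses ddof=1
pct_from_Z = 100 * cov_Z.iloc[0, 0] / np.trace(cov_Z.values)
print(np.isclose(pct_first, pct_from_Z))

# Off-diagonal covariances are zero up to rounding
off_diag = cov_Z.values - np.diag(np.diag(cov_Z.values))
print(np.allclose(off_diag, 0, atol=1e-12))
```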


In [77]:
# Beta of each stock on PC1: regress each stock's returns on the first projection
from sklearn.linear_model import LinearRegression
X = pc1_proj.reshape(-1, 1)  # single regressor, shaped (n_samples, 1)
betas_by_regression = []
for col in df.columns:
    y = df[col]
    reg = LinearRegression().fit(X, y)
    betas_by_regression.append(reg.coef_)
betas_by_regression

[array([0.20371923]),
 array([0.20664164]),
 array([0.17378191]),
 array([0.19528152]),
 array([0.19580163]),
 array([0.16311737]),
 array([0.18650033]),
 array([0.13298439]),
 array([0.20521127]),
 array([0.10609907]),
 array([0.14652591]),
 array([0.10887566]),
 array([0.17593988]),
 array([0.15084544]),
 array([0.32660477]),
 array([0.15039987]),
 array([0.19666445]),
 array([0.10920219]),
 array([0.14575231]),
 array([0.09787383]),
 array([0.28741104]),
 array([0.18123664]),
 array([0.18554879]),
 array([0.1097793]),
 array([0.29726176]),
 array([0.18552308]),
 array([0.1981515]),
 array([0.23442139])]

In [80]:
betas_by_pc1_eigenvector = pca1_vec[0]
print(betas_by_pc1_eigenvector)

[0.20371923 0.20664164 0.17378191 0.19528152 0.19580163 0.16311737
 0.18650033 0.13298439 0.20521127 0.10609907 0.14652591 0.10887566
 0.17593988 0.15084544 0.32660477 0.15039987 0.19666445 0.10920219
 0.14575231 0.09787383 0.28741104 0.18123664 0.18554879 0.1097793
 0.29726176 0.18552308 0.1981515  0.23442139]
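The two lists match because of a short identity. For centred returns x with covariance matrix Sigma, and a unit eigenvector v1 with eigenvalue lambda1, the projection is z1 = x . v1, so the OLS slope of stock j on z1 is beta_j = cov(x_j, z1) / var(z1) = (Sigma v1)_j / lambda1 = lambda1 v1_j / lambda1 = v1_j. LinearRegression fits an intercept, which takes care of the centring. The sketch below checks this on synthetic data (an assumption), using plain NumPy slopes instead of a loop over LinearRegression.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic returns (assumption): 800 days x 6 stocks sharing one market factor
rng = np.random.default_rng(2)
X = rng.normal(size=(800, 1)) + 0.5 * rng.normal(size=(800, 6))

pca_beta = PCA().fit(X)
z1 = pca_beta.transform(X)[:, 0]  # first principal component projection

# OLS slope of each column on z1 (with intercept): cov(x_j, z1) / var(z1)
z1_c = z1 - z1.mean()             # already ~0 mean; centred for safety
X_c = X - X.mean(axis=0)
betas = X_c.T @ z1_c / (z1_c @ z1_c)

# The slopes reproduce the first eigenvector (first row of components_)
print(np.allclose(betas, pca_beta.components_[0]))
```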


# Sample Code:

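We examine the correlation of the 20 rates.
As you can see, the correlation among the rates is very high,
suggesting that one or two major underlying features may drive the entire system of rates over time.
This sample code assumes that df holds the 20 Bank of England spot rates, with columns labelled by maturity, rather than the stock returns used above.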

In [None]:
df.corr()

We will now standardize the 20 rates by subtracting the mean and dividing by the standard deviation.
This standardization prevents a given rate from dominating the PCA
simply because that series happens to have a markedly larger scale (variance)
than the other series with which it is compared.
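To see why the scaling step matters, the sketch below builds three synthetic series (an assumption) that share one common driver, then multiplies one of them by 100. Without standardization, the first eigenvector points almost entirely at the rescaled series; after standardization the loadings are roughly equal. Note that sklearn's `StandardScaler` divides by the population standard deviation (ddof=0) while `df.std()` uses ddof=1; this rescales the eigenvalues by n/(n-1) but leaves the eigenvectors unchanged.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Three noisy copies of one driver; the third is on a 100x larger scale
rng = np.random.default_rng(3)
driver = rng.normal(size=(1000, 1))
X = driver + 0.3 * rng.normal(size=(1000, 3))
X[:, 2] *= 100

raw_loadings = PCA().fit(X).components_[0]
std_loadings = PCA().fit(StandardScaler().fit_transform(X)).components_[0]

print("raw PC1 loadings:         ", np.round(np.abs(raw_loadings), 3))
print("standardized PC1 loadings:", np.round(np.abs(std_loadings), 3))
```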

In [None]:
# Standardise the data in the df into z scores (pandas .std() uses ddof=1)
df_std = (df - df.mean()) / df.std()
df_std.head()

We will now apply the np.cov (covariance) function to the 20 standardized rates.
Because the rates are standardized, the covariance matrix is the correlation matrix.
To perform PCA, we use this correlation matrix as input to the function np.linalg.eig.
We could instead skip the covariance/correlation step
and apply sklearn.decomposition.PCA to the standardized rates (df_std) directly.
Both approaches give the same eigenvalues, and the same eigenvectors up to sign,
once the output of np.linalg.eig (which is not sorted) is ordered by decreasing eigenvalue.
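The claim that the two routes agree can be checked on synthetic data (an assumption: five "maturities" driven by a hypothetical level and slope factor). The sketch sorts the output of `np.linalg.eig`, compares eigenvectors by absolute value because their signs are arbitrary, and confirms that the eigenvalues of a correlation matrix sum to the number of columns.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Synthetic stand-in for the rates: 5 columns driven by a level and a slope factor
rng = np.random.default_rng(4)
level = rng.normal(size=(600, 1))
slope = rng.normal(size=(600, 1))
weights = np.linspace(-1, 1, 5)  # hypothetical slope loading by "maturity"
data = pd.DataFrame(level + slope * weights + 0.1 * rng.normal(size=(600, 5)))

data_std = (data - data.mean()) / data.std()  # ddof=1 z-scores

# Route 1: eigendecomposition of the correlation matrix, sorted descending
corr = np.cov(data_std, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eig(corr)
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# Route 2: sklearn PCA applied directly to the standardized data
pca_std = PCA().fit(data_std)

print(np.allclose(eigenvalues, pca_std.explained_variance_))
print(np.allclose(np.abs(eigenvectors.T), np.abs(pca_std.components_)))
print(np.isclose(eigenvalues.sum(), data.shape[1]))
```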

In [49]:
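# Covariance matrix of the standardized rates, i.e. their correlation matrix
# (rowvar=False: each column is a variable, each row an observation)
corr_matrix_array = np.cov(df_std, rowvar=False)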

## 2. Compute the eigenvalues & eigenvectors of the correlation matrix

In [None]:
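# Perform eigendecomposition
eigenvalues, eigenvectors = np.linalg.eig(corr_matrix_array)

# np.linalg.eig does not sort its output: order by decreasing eigenvalue
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Put data into a DataFrame and save to Excel
df_eigval = pd.DataFrame({"Eigenvalues": eigenvalues}, index=range(1, len(eigenvalues) + 1))

df_eigval.to_excel("df_eigval.xlsx")
eigenvalues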

In [None]:
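# Save output to Excel: column k holds the k-th eigenvector, one row per rate
df_eigvec = pd.DataFrame(eigenvectors, index=df_std.columns,
                         columns=range(1, eigenvectors.shape[1] + 1))

df_eigvec.to_excel("df_eigvec.xlsx")
# The first eigenvector is the first column of the matrix, not its first row
eigenvectors[:, 0]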

We will now calculate the proportion of variance explained by each eigenvector:
its eigenvalue divided by the sum of all the eigenvalues calculated by np.linalg.eig.

In [None]:
# Work out explained proportion 
df_eigval["Explained proportion"] = df_eigval["Eigenvalues"] / np.sum(df_eigval["Eigenvalues"])

#Format as percentage
df_eigval.style.format({"Explained proportion": "{:.2%}"})
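With made-up eigenvalues (hypothetical, chosen so the arithmetic is easy to check by hand), the same formula gives the proportions below; a running total shows how many components are needed to reach a given share of the variance.

```python
import numpy as np
import pandas as pd

# Illustrative eigenvalues (hypothetical), already sorted in descending order
eigvals = np.array([3.0, 0.75, 0.25])

summary = pd.DataFrame({"Eigenvalues": eigvals}, index=range(1, len(eigvals) + 1))
summary["Explained proportion"] = summary["Eigenvalues"] / summary["Eigenvalues"].sum()
summary["Cumulative proportion"] = summary["Explained proportion"].cumsum()

# Smallest number of components that explain at least 90% of the variance
n_needed = int(np.argmax(summary["Cumulative proportion"].to_numpy() >= 0.9)) + 1

print(summary)
print("components needed for 90%:", n_needed)  # -> 2
```

Here the proportions are 75%, 18.75% and 6.25%, so two components cross the 90% threshold.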


# Sample: PCA projections

We now calculate the PCA projections (what we have been calling the
transformed "Z" features in 4.PCAInMoreDepth.pptx, slides 25 to 32).
These are "latent" or hidden features (as per slide 37) that
drive the movement of the rates as a whole.
The most important of these Z features is Z1,
the one with the highest eigenvalue, 19.660842,
corresponding to an explained variance proportion of 98.30%
(19.660842 / 20, since the 20 eigenvalues of a 20-by-20 correlation matrix sum to 20).
We select this feature with principal_components[0] below and plot it.
The plot of the first principal component looks very similar
to the history of the 10-year spot rate.

In [None]:
principal_components = df_std.dot(eigenvectors)
principal_components.head()

In [None]:
#plt.style.use('ggplot')
plt.figure(figsize=(9, 5))
plt.plot(principal_components[0])
plt.title("First Principal Component")
plt.show()

In [None]:
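# Column 19 holds the 10-year spot rate (the last of the 20 maturities)
df_10 = pd.DataFrame(df.iloc[:, [19]].values, columns=['10Y-Rate'], index=df.index)
# Pass figsize to plot(); creating a figure afterwards only adds an empty one
ax = df_10.plot(y='10Y-Rate', legend=False, figsize=(9, 5))
ax.set_title("10Y Spot Rate")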

The second principal component, Z2, is
the one with the second highest eigenvalue, 0.309852,
corresponding to an explained variance proportion of 1.55%.
The up and down movements of this second latent Z feature
correspond closely to those of the UK inflation rate plotted below,
with which it is highly correlated (Pearson correlation = 0.95856134).
Here the UK inflation rate is approximated by the slope of the curve, the difference
between two spot rates of different maturity: 10Y - 2Y.

In [None]:
# Calculate the 10Y-2Y slope (columns are labelled by maturity in years)
df_s = df[[2, 10]].copy()  # copy() avoids writing into a view of df
df_s["slope"] = df_s[10] - df_s[2]
df_s.head()

In [None]:
ax = df_s.plot(y="slope", legend=False, figsize=(9, 5))
ax.set_title("10Y - 2Y slope")

In [None]:
np.corrcoef(principal_components[1], df_s["slope"])

We call this a "two factor" model of all spot rates,
with the 10-year spot rate and the
UK inflation rate as the underlying driving forces.

These results make sense. Recall from pca_yield_curve_INTRO.ipynb that:
the first eigenvector and principal component have been identified with the sensitivity of the TSIR (term structure of interest rates) to changes in yield curve level,
the second with its sensitivity to changes in yield curve slope,
and the third with its sensitivity to changes in yield curve curvature.
The results of this notebook are consistent with this finding because:
the changes in yield curve level are closely tracked by the changes in the level of the 10 Year Spot Rate, and
the changes in yield curve slope are closely tracked by the changes in the Inflation Rate (proxied here by the 10Y - 2Y slope).
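The level/slope interpretation can be illustrated on a purely synthetic curve. This is not the Bank of England data: it assumes 20 hypothetical maturities from 0.5 to 10 years built from a level factor and a slope factor drawn independently each day, plus small noise. Real level and slope factors are persistent rather than independent daily draws; independence is a simplification that keeps the two factors uncorrelated in sample. PC1 should track the 10Y rate and PC2 the 10Y - 2Y slope; absolute correlations are reported because each eigenvector's sign is arbitrary.

```python
import numpy as np
import pandas as pd

# Hypothetical grid of 20 maturities: 0.5y, 1.0y, ..., 10.0y
maturities = np.arange(1, 21) * 0.5
n_days = 1500
rng = np.random.default_rng(5)

# Latent drivers (assumption): independent daily draws of a level and a slope factor
level = rng.normal(loc=3.0, scale=0.5, size=n_days)
slope = rng.normal(loc=1.0, scale=0.3, size=n_days)
shape = (maturities - maturities.mean()) / (maturities.max() - maturities.min())  # -0.5 ... 0.5

rates = (level[:, None] + slope[:, None] * shape
         + rng.normal(scale=0.01, size=(n_days, maturities.size)))
df_syn = pd.DataFrame(rates, columns=maturities)

# Correlation-matrix PCA, as in the sample code (eigh suits a symmetric matrix)
df_syn_std = (df_syn - df_syn.mean()) / df_syn.std()
eigval, eigvec = np.linalg.eigh(df_syn_std.cov().to_numpy())
order = np.argsort(eigval)[::-1]
eigval, eigvec = eigval[order], eigvec[:, order]
pcs = df_syn_std.to_numpy() @ eigvec

# Absolute correlations, because each eigenvector's sign is arbitrary
corr_pc1_10y = abs(np.corrcoef(pcs[:, 0], df_syn[10.0])[0, 1])
corr_pc2_slope = abs(np.corrcoef(pcs[:, 1], df_syn[10.0] - df_syn[2.0])[0, 1])
print(f"|corr(PC1, 10Y)|      = {corr_pc1_10y:.3f}")
print(f"|corr(PC2, 10Y - 2Y)| = {corr_pc2_slope:.3f}")
```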

# Sample: PCA projections: eigenvalue calculation by hand

It is possible to calculate the eigenvalues (and hence the associated explained variance)
by working backwards from the PCA-projected Z features.
The eigenvalues appear as the diagonal elements of the covariance matrix of the Z features.
So:
1.966084e+01 is the first eigenvalue, 19.66084,
3.098525e-01 is the second eigenvalue, 0.309852,
etc.
To calculate the sum of the diagonal, use np.trace(); for standardized data it equals the sum of the eigenvalues, i.e. the number of rates (20).

In [None]:
principal_components.cov()
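A minimal sketch of the same check on synthetic data (an assumption; seven correlated series instead of the 20 rates): projecting standardized data onto the eigenvectors of its correlation matrix gives Z features whose covariance diagonal holds the eigenvalues, and whose trace equals the number of standardized series.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(6)
raw = pd.DataFrame(rng.normal(size=(400, 1)) + rng.normal(size=(400, 7)))
raw_std = (raw - raw.mean()) / raw.std()  # ddof=1 z-scores

eigenvalues_demo, eigenvectors_demo = np.linalg.eigh(np.cov(raw_std, rowvar=False))
Z = raw_std.dot(eigenvectors_demo)        # projected Z features (DataFrame)

cov_Z = Z.cov().to_numpy()
print(np.allclose(np.diag(cov_Z), eigenvalues_demo))  # diagonal = eigenvalues
print(np.isclose(np.trace(cov_Z), raw.shape[1]))      # trace = number of series
```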