# Sector PCA

The Select Sector SPDR funds track specific sectors within the S&P 500. The purpose of this exercise is to guide you through a principal component analysis (PCA) of the returns of these sector ETFs.

#### 1) Import any packages that you think you will need.

In [1]:


import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt





#### 2) Read the data in `sector_pca.csv` into a `DataFrame` called `df_sector`.

In [2]:

# Assuming the file name is 'sector_pca.csv' and it's in the same directory
df_sector = pd.read_csv('sector_pca.csv')

# Check the first few rows of the data
print(df_sector.head())




         date       XLB       XLE       XLF       XLI       XLK       XLP       XLU       XLV       XLY
0  2005-01-03 -0.010142 -0.037592 -0.003610 -0.008728 -0.008563  0.000000 -0.011557 -0.011661 -0.006539
1  2005-01-04 -0.018173 -0.005159 -0.009578 -0.012415 -0.018809 -0.003472 -0.006926 -0.008075 -0.011764
2  2005-01-05 -0.013938 -0.004320 -0.001661 -0.006927 -0.003414 -0.005231 -0.015853 -0.001691 -0.003759
3  2005-01-06  0.006645  0.017168  0.004974  0.003305 -0.002935  0.004796  0.005189  0.007418 -0.005229
4  2005-01-07  0.001741 -0.007975 -0.005307 -0.004630  0.001958  0.004773  0.000000 -0.000672 -0.000291
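As a variant (not the approach this notebook takes), the `date` column can be parsed into a `DatetimeIndex` at read time using the standard `pd.read_csv` parameters `parse_dates` and `index_col`. So that the sketch runs without the file, it feeds three rows copied from the output above through `io.StringIO`; in the notebook you would pass `'sector_pca.csv'` instead.

```python
import io

import pandas as pd

# Three rows copied from the head() output above, standing in for sector_pca.csv.
csv_text = """date,XLB,XLE,XLF,XLI,XLK,XLP,XLU,XLV,XLY
2005-01-03,-0.010142,-0.037592,-0.003610,-0.008728,-0.008563,0.000000,-0.011557,-0.011661,-0.006539
2005-01-04,-0.018173,-0.005159,-0.009578,-0.012415,-0.018809,-0.003472,-0.006926,-0.008075,-0.011764
2005-01-05,-0.013938,-0.004320,-0.001661,-0.006927,-0.003414,-0.005231,-0.015853,-0.001691,-0.003759
"""

# parse_dates converts the 'date' strings to datetimes; index_col moves them into the index,
# so only the numeric return columns remain as regular columns.
df_sector = pd.read_csv(io.StringIO(csv_text), parse_dates=['date'], index_col='date')

print(type(df_sector.index).__name__)
print(list(df_sector.columns))
```

With the dates in the index, every remaining column is a return series, so the feature matrix in step 3 could simply be `df_sector` itself.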


#### 3) Select the features into a variable called `X`. 

In [3]:

# Dropping the 'date' column to keep only the numeric data for PCA
X = df_sector.drop(columns=['date'])
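Dropping `date` by name works for this file. A sketch of an alternative that does not depend on knowing the names of the non-numeric columns uses `DataFrame.select_dtypes`; the small frame `df` below is a hypothetical stand-in built from the first two rows and three columns shown above.

```python
import pandas as pd

# Hypothetical frame with the same layout as df_sector: a 'date' column plus numeric returns.
df = pd.DataFrame({
    'date': ['2005-01-03', '2005-01-04'],
    'XLB': [-0.010142, -0.018173],
    'XLE': [-0.037592, -0.005159],
})

# Keep every numeric column, whatever non-numeric columns happen to exist.
X = df.select_dtypes(include='number')
print(X.columns.tolist())
```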




#### 4) Scale the features to have zero mean and unit standard deviation. Call the scaled feature set `Xs`.

In [4]:


# Scaling the features
scaler = StandardScaler()
Xs = scaler.fit_transform(X)
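`StandardScaler` computes a column-by-column z-score: it subtracts each column's mean and divides by that column's standard deviation. The sketch below checks that equivalence on synthetic random returns (`X_demo` is made-up data, not the sector file).

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for X: 250 days x 9 "sectors" of random returns (not the real data).
rng = np.random.default_rng(0)
X_demo = rng.normal(loc=0.0005, scale=0.01, size=(250, 9))

Xs_demo = StandardScaler().fit_transform(X_demo)

# Manual z-score. NumPy's std defaults to ddof=0 (population std), matching StandardScaler.
Xs_manual = (X_demo - X_demo.mean(axis=0)) / X_demo.std(axis=0)

print(np.allclose(Xs_demo, Xs_manual))
print(Xs_demo.mean(axis=0).round(10))
print(Xs_demo.std(axis=0).round(10))
```

Note that pandas' `DataFrame.std()` defaults to `ddof=1`, so scaling in pandas with `(X - X.mean()) / X.std()` gives slightly different values from `StandardScaler`.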



#### 5) Fit a 9-component PCA to `Xs`.

In [6]:
# Fit PCA with 9 components
pca = PCA(n_components=9)
pca.fit(Xs)
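Because `Xs` is standardized, fitting PCA to it amounts to an eigendecomposition of the correlation matrix of the original returns: the explained-variance ratios are the correlation-matrix eigenvalues divided by their sum (which equals the number of features, 9), and the loadings are the eigenvectors up to sign. The sketch checks this on synthetic one-factor data (`market` and `X_demo` are illustrative, not the sector returns).

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic correlated returns: one common factor plus independent noise (illustrative only).
rng = np.random.default_rng(1)
market = rng.normal(size=(500, 1))
X_demo = 0.8 * market + 0.5 * rng.normal(size=(500, 9))

Xs_demo = StandardScaler().fit_transform(X_demo)
pca_demo = PCA(n_components=9).fit(Xs_demo)

# eigh returns eigenvalues in ascending order, with eigenvectors as columns.
corr = np.corrcoef(X_demo, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)

# Largest first; the eigenvalues of a 9x9 correlation matrix sum to its trace, 9.
ratio_from_corr = eigvals[::-1] / eigvals.sum()

print(np.allclose(pca_demo.explained_variance_ratio_, ratio_from_corr))
# The first PCA loading vector matches the top eigenvector up to an arbitrary sign.
print(np.allclose(np.abs(pca_demo.components_[0]), np.abs(eigvecs[:, -1])))
```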



#### 6) How much of the total variance is explained by the first principal component?

In [7]:
# Get explained variance ratio
explained_variance_ratio = pca.explained_variance_ratio_

# Get the variance explained by the first principal component
first_pc_variance = explained_variance_ratio[0]

print(f'The first principal component explains {first_pc_variance:.4f} of the total variance.')





The first principal component explains 0.7219 of the total variance.
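Step 6 looks only at the first component. A natural follow-up is the cumulative share of variance and the number of components needed to reach a target such as 90%. The sketch uses synthetic one-factor data (`X_demo` is illustrative); in the notebook you would apply `np.cumsum` to `pca.explained_variance_ratio_` directly.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic one-factor returns as a stand-in for Xs (illustrative only, not the sector data).
rng = np.random.default_rng(2)
X_demo = 0.8 * rng.normal(size=(500, 1)) + 0.5 * rng.normal(size=(500, 9))
pca_demo = PCA(n_components=9).fit(StandardScaler().fit_transform(X_demo))

# Running total of explained variance; the last entry is 1 when all 9 components are kept.
cumulative = np.cumsum(pca_demo.explained_variance_ratio_)

# searchsorted finds the first index where the running total reaches 0.90; +1 turns it into a count.
n_for_90 = int(np.searchsorted(cumulative, 0.90) + 1)

print(cumulative.round(3))
print(n_for_90)
```

scikit-learn's `PCA` also accepts a float `n_components` between 0 and 1 to choose this count automatically; check the `n_components` documentation for which `svd_solver` settings support it.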


#### 7) Take a look at the first principal component.  What intuitive interpretation would you give this component?  What does the answer to #6 say about these ETFs?

In [9]:
# Loadings for the first principal component
pc1_loadings = pca.components_[0]

# Create a DataFrame for better readability
pc1_df = pd.DataFrame({'ETF': X.columns, 'PC1 Loadings': pc1_loadings})

# Sort the loadings by absolute value for better interpretation
pc1_df_sorted = pc1_df.reindex(pc1_df['PC1 Loadings'].abs().sort_values(ascending=False).index)

pc1_df_sorted


|   | ETF | PC1 Loadings |
|---|-----|--------------|
| 3 | XLI | 0.365522 |
| 8 | XLY | 0.356696 |
| 4 | XLK | 0.351669 |
| 0 | XLB | 0.348404 |
| 7 | XLV | 0.325643 |
| 5 | XLP | 0.324077 |
| 2 | XLF | 0.322089 |
| 1 | XLE | 0.315809 |
| 6 | XLU | 0.282062 |
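The sign of a principal component is arbitrary: multiplying PC1 by -1 describes the same direction, and a different solver or library version could in principle return the flipped vector. A sketch of a guard normalizes the sign so that the loadings sum to a positive number, then sorts by absolute value with `sort_values(key=...)` (available in pandas 1.1 and later). It rebuilds `pc1_df` from the loadings in the table above.

```python
import numpy as np
import pandas as pd

# PC1 loadings as printed in the table above, in the original column order.
pc1_df = pd.DataFrame({
    'ETF': ['XLB', 'XLE', 'XLF', 'XLI', 'XLK', 'XLP', 'XLU', 'XLV', 'XLY'],
    'PC1 Loadings': [0.348404, 0.315809, 0.322089, 0.365522, 0.351669,
                     0.324077, 0.282062, 0.325643, 0.356696],
})

# Flip the component if needed so its loadings sum to a positive number (a no-op for these values).
sign = np.sign(pc1_df['PC1 Loadings'].sum())
pc1_df['PC1 Loadings'] = sign * pc1_df['PC1 Loadings']

# Same ordering as the reindex approach in the cell above, expressed with a sort key.
pc1_df_sorted = pc1_df.sort_values('PC1 Loadings', key=lambda s: s.abs(), ascending=False)
print(pc1_df_sorted['ETF'].tolist())
```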


Intuitive Interpretation:
- Same-sign loadings across all ETFs: Every loading on the first principal component (PC1) has the same sign. (The overall sign of a principal component is arbitrary, so flipping all nine signs would not change the interpretation.) PC1 therefore represents a common, market-wide factor that moves all sectors in the same direction, driven by broad economic conditions or overall market sentiment.
- Key contributors: The largest loadings belong to industrials (XLI), consumer discretionary (XLY), and technology (XLK). These cyclical sectors are typically sensitive to economic growth and downturns, so they track the common factor most closely. The spread is small, however: the loadings range only from about 0.28 to 0.37, close to the value 1/sqrt(9) = 1/3 that every sector would have if PC1 were an equal-weighted combination of the nine sectors.
- Lower loadings: Sectors with lower loadings, notably utilities (XLU), still move with the market but are less tied to the common factor. Because the returns were standardized in step 4, a lower loading reflects a weaker correlation with the other sectors rather than lower volatility. Utilities are a defensive sector, so their returns depend less on the business cycle than those of cyclical sectors.
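The bullets single out XLI, XLY and XLK, but the PC1 loadings are close together. A quick way to quantify how "market-like" PC1 is: compute the cosine similarity between the PC1 loadings (taken from the table above) and the equal-weight direction across the nine sectors. A value near 1 means PC1 is almost an equal-weighted basket of all sectors.

```python
import numpy as np

# PC1 loadings from the table above, in column order XLB, XLE, XLF, XLI, XLK, XLP, XLU, XLV, XLY.
pc1 = np.array([0.348404, 0.315809, 0.322089, 0.365522, 0.351669,
                0.324077, 0.282062, 0.325643, 0.356696])

# Equal-weighted direction across 9 sectors: every entry is 1/sqrt(9) = 1/3, so it has unit length.
equal_weight = np.full(9, 1 / np.sqrt(9))

# PCA components have unit length (up to rounding in the printed table),
# so the dot product is the cosine of the angle between the two directions.
print(round(float(np.linalg.norm(pc1)), 4))  # → 1.0
cosine = float(pc1 @ equal_weight)
print(round(cosine, 4))  # → 0.9973
```

A cosine of about 0.997 supports reading PC1 as a broad market factor; the ranking of individual sectors reflects fairly small departures from equal weights.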

The first principal component explains 72.19% of the total variance, so most of the movement in these sector ETFs comes from a single common, market-wide factor; the remaining eight components together account for only about 28%. Despite their different sector exposures, the ETFs largely move in sync, responding mainly to broad market trends rather than to sector-specific factors.
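To see PC1 as a market factor directly, one can compare each day's PC1 score with a simple equal-weighted average of the standardized returns. The sketch below uses a synthetic one-factor model (`market` and `X_demo` are made-up data, not the sector file); in the notebook, the analogous comparison would use `pca.transform(Xs)[:, 0]` and `Xs.mean(axis=1)`.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic one-factor model (illustrative only): each of 9 return series loads on a shared
# "market" shock plus its own idiosyncratic noise.
rng = np.random.default_rng(3)
market = rng.normal(size=(1000, 1))
X_demo = 0.8 * market + 0.5 * rng.normal(size=(1000, 9))

Xs_demo = StandardScaler().fit_transform(X_demo)
pca_demo = PCA(n_components=9).fit(Xs_demo)

# PC1 score for each day: projection of that day's standardized returns onto the PC1 loadings.
pc1_scores = pca_demo.transform(Xs_demo)[:, 0]

# Equal-weighted average of the standardized returns, a simple "market" proxy.
equal_weight_proxy = Xs_demo.mean(axis=1)

# abs() because the sign of PC1 is arbitrary.
score_corr = abs(np.corrcoef(pc1_scores, equal_weight_proxy)[0, 1])
print(round(score_corr, 4))
print(round(pca_demo.explained_variance_ratio_[0], 4))
```

When one common factor dominates, as in this simulation, the PC1 score is nearly indistinguishable from the equal-weighted average, which is the pattern the loadings table suggests for the sector ETFs.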