In this notebook we implement Principal Component Analysis (PCA) to reduce the dimensionality of our dataset.
We start by importing the libraries we will use.

In [1]:
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

We now load our .csv file.

In [2]:
data = pd.read_csv('simulated_health_wellness_data.csv')

Now that we have the dataset to work with, we treat every column as a feature and standardize each one to zero mean and unit variance. This matters because PCA is sensitive to the scale of the variables.

In [3]:
features = data.columns  
scaler = StandardScaler()
scaled_data = scaler.fit_transform(data[features])
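
Note that `data.columns` selects every column, so the cell above assumes that all columns in the file are numeric. As a minimal sketch of the same step restricted to numeric columns, here is a small made-up DataFrame (the names `toy`, `numeric_features`, and the column names and values are illustrative assumptions, not taken from the dataset):

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the real dataset; names and values are made up.
toy = pd.DataFrame({
    'sleep_hours': [6.0, 7.5, 8.0, 5.5],
    'steps': [4000, 12000, 8000, 6000],
    'label': ['a', 'b', 'c', 'd'],  # non-numeric column that cannot be scaled
})

# Keep only the numeric columns before scaling.
numeric_features = toy.select_dtypes(include='number').columns
toy_scaled = StandardScaler().fit_transform(toy[numeric_features])

# After scaling, each column has mean ~0 and population standard deviation ~1.
col_means = toy_scaled.mean(axis=0)
col_stds = toy_scaled.std(axis=0)
print(col_means, col_stds)
```

If the real file contains only numeric columns, the original cell and this sketch behave the same way.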

We now set the number of components for the PCA and run it on the scaled_data from above.

In [4]:
n_components = 2  
pca = PCA(n_components=n_components)
principal_components = pca.fit_transform(scaled_data)
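
Fixing `n_components = 2` is one choice. As a possible alternative, scikit-learn's `PCA` also accepts a float between 0 and 1 and then keeps as many components as are needed to reach that fraction of explained variance. The sketch below uses random stand-in data (`rng`, `X`, and the 0.90 threshold are assumptions for illustration, not values from this notebook):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Random stand-in data: 200 samples, 5 features (not the real dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X_scaled = StandardScaler().fit_transform(X)

# Ask PCA to keep enough components to explain at least 90% of the variance.
# svd_solver='full' is set explicitly because a fractional n_components
# requires a full decomposition.
pca_90 = PCA(n_components=0.90, svd_solver='full')
components_90 = pca_90.fit_transform(X_scaled)

print(f'Components kept: {pca_90.n_components_}')
print(f'Variance retained: {pca_90.explained_variance_ratio_.sum():.3f}')
```

This trades a fixed output width for a guaranteed minimum of retained variance. With exactly two components, as in the cell above, the result is easier to plot.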

We now create a DataFrame from the principal components. The list comprehension passed to columns names them automatically as PC1, PC2, and so on, by adding 1 to the zero-based index i.

In [5]:
principal_df = pd.DataFrame(data=principal_components, columns=[f'PC{i+1}' for i in range(n_components)])

At this point we want to see how much of the total variance in the data each principal component captures. To do this we print the explained variance ratio with the code below.

In [6]:
print(f'Explained variance ratio: {pca.explained_variance_ratio_}')

Explained variance ratio: [0.23691549 0.22082517]
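
The two ratios above sum to about 0.458, so the two components together retain roughly 46% of the variance in the standardized data. A minimal sketch of that arithmetic, with the values copied from the printed output (the names `explained`, `cumulative`, and `total_retained` are illustrative):

```python
import numpy as np

# Values copied from the printed output above.
explained = np.array([0.23691549, 0.22082517])

# Running total of variance captured as components are added.
cumulative = np.cumsum(explained)
total_retained = cumulative[-1]

print(f'Cumulative explained variance: {cumulative}')
print(f'Total retained by 2 components: {total_retained:.1%}')
```

In the notebook itself, the same figure could be computed directly with `pca.explained_variance_ratio_.sum()`.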


We now save the principal components to a new .csv file.

In [7]:
principal_df.to_csv('principal_components.csv', index=False)