In [8]:
# Import the required modules and load the DataFrame of engineered features.

import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import QuantileTransformer, StandardScaler

model_df = pd.read_csv('../Data/model_data.csv', index_col = 0)
X = model_df.drop(['industry', 'office'], axis = 1)
df_columns = X.columns

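Two scalers, each with default parameters, will be compared by how they interact with PCA.
StandardScaler is a straightforward linear rescaling to zero mean and unit variance, so outliers keep their relative distance from the rest of the data and can dominate the result.
QuantileTransformer is a nonlinear method that maps each feature onto its quantiles (a uniform distribution on [0, 1] by default), which compresses outliers to a much greater extent than linear scalers do.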
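
To make the difference concrete, here is a minimal sketch of how each scaler treats a single extreme value. The data is synthetic and is not drawn from model_data.csv; the variable names and the choice of `n_quantiles=100` (set to the toy sample size) are assumptions made for this illustration only.

```python
import numpy as np
from sklearn.preprocessing import QuantileTransformer, StandardScaler

# Synthetic single feature: 99 ordinary values plus one extreme outlier.
rng = np.random.default_rng(0)
values = np.append(rng.normal(loc=1.5, scale=0.5, size=99), 500.0).reshape(-1, 1)

standard = StandardScaler().fit_transform(values)
# n_quantiles is lowered to the sample size to suit this small toy set.
quantile = QuantileTransformer(n_quantiles=100).fit_transform(values)

# StandardScaler leaves the outlier many standard deviations from the rest...
standard_outlier = standard[-1, 0]
# ...while QuantileTransformer maps it to the top of the [0, 1] range.
quantile_outlier = quantile[-1, 0]

print(f"StandardScaler outlier value:      {standard_outlier:.2f}")
print(f"QuantileTransformer outlier value: {quantile_outlier:.2f}")
print(f"QuantileTransformer output range:  [{quantile.min():.2f}, {quantile.max():.2f}]")
```

After quantile scaling, the outlier is simply the largest value in a bounded range, so it no longer stretches the variance of the feature the way it does after standard scaling.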

In [13]:
# With StandardScaler, the explained variance declines gradually across the principal components, with no single
# component dominating. This does not necessarily mean that this better reflects reality, however.

scaler = StandardScaler()
X_standard = scaler.fit_transform(X)
pca_df = pd.DataFrame(X_standard, columns = df_columns)

pca = PCA(n_components = 8, svd_solver = 'full')

pca.fit(pca_df)

print(pca.explained_variance_ratio_)
print(pca.feature_names_in_)

[0.20647195 0.20179142 0.1470405  0.12501594 0.12324258 0.08883371
 0.05997031 0.04763358]
['current_ratio' 'operating_cash_flow' 'debt_to_equity'
 'interest_coverage' 'operating_margin' 'return_on_assets'
 'return_on_equity' 'has_interest_payments']


In [14]:
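# With this scaler, the leading components explain a larger share of the variance, and the trailing ones a smaller share.
# The first component alone explains about 42.5% of the variance, followed distantly by the second (19.1%) and third (13.4%).
# Note that these ratios belong to the principal components, not to the individual features listed by feature_names_in_.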

scaler_quantile = QuantileTransformer()
X_quantile = scaler_quantile.fit_transform(X)
pca_df = pd.DataFrame(X_quantile, columns = df_columns)

pca = PCA(n_components = 8, svd_solver = 'full')

pca.fit(pca_df)

print(pca.explained_variance_ratio_)
print(pca.feature_names_in_)

[0.42502874 0.19123835 0.13394252 0.0999914  0.0791696  0.04079046
 0.01779352 0.01204541]
['current_ratio' 'operating_cash_flow' 'debt_to_equity'
 'interest_coverage' 'operating_margin' 'return_on_assets'
 'return_on_equity' 'has_interest_payments']
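
The values in explained_variance_ratio_ are ordered by principal component, not by the feature order in feature_names_in_. To see how much each original feature contributes to a given component, one option is to inspect `pca.components_`. The sketch below does this on synthetic data; the three column names are borrowed from the features above purely as labels, and `toy`, `pca_toy`, `loadings`, and `variance` are hypothetical names introduced for this example.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import QuantileTransformer

# Synthetic stand-in for the real data: three columns named after features above.
rng = np.random.default_rng(0)
toy = pd.DataFrame(
    rng.lognormal(size=(200, 3)),
    columns=['current_ratio', 'operating_cash_flow', 'debt_to_equity'],
)

scaled = pd.DataFrame(
    QuantileTransformer(n_quantiles=200).fit_transform(toy), columns=toy.columns
)
pca_toy = PCA(svd_solver='full').fit(scaled)

# Rows are principal components, columns are the original features.
component_names = [f'PC{i + 1}' for i in range(pca_toy.n_components_)]
loadings = pd.DataFrame(pca_toy.components_, index=component_names, columns=toy.columns)
variance = pd.Series(pca_toy.explained_variance_ratio_, index=component_names)

print(loadings.round(3))
print(variance.round(3))
```

Each row of `loadings` is a unit-length vector of feature weights, so a feature with a large absolute weight in the PC1 row is one that drives the first component.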


QuantileTransformer results in a more decisive PCA: the first component explains about 42.5% of the variance, compared with about 20.6% under StandardScaler. This will ultimately be important when attempting to correctly classify companies across a large set of industries. For that reason, I will be using it in my final pipeline.
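
The final pipeline itself is not shown here. As a rough sketch of how the scaling and PCA steps could be chained, the example below uses scikit-learn's `Pipeline` on synthetic data with eight columns. The step names, the `reduction` variable, and `n_quantiles=300` are assumptions for illustration, and the classifier that would follow these steps is deliberately left out because it is not specified above.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import QuantileTransformer

# Synthetic stand-in for the eight engineered features.
rng = np.random.default_rng(0)
X_toy = rng.lognormal(size=(300, 8))

# Step names ('scale', 'pca') are arbitrary labels chosen for this sketch.
reduction = Pipeline([
    ('scale', QuantileTransformer(n_quantiles=300)),
    ('pca', PCA(n_components=8, svd_solver='full')),
])

# Fitting the pipeline fits the scaler first, then PCA on the scaled output.
components = reduction.fit_transform(X_toy)
ratios = reduction.named_steps['pca'].explained_variance_ratio_

print(components.shape)
print(ratios.round(3))
```

Wrapping both steps in one object means the scaler and PCA are always fitted on the same training data, which helps avoid leaking information from a held-out set once a classifier is added.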