### PCA

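Principal Component Analysis (PCA) is a dimensionality reduction technique widely used in machine learning and statistics. It applies an orthogonal linear transformation that maps the data to a new coordinate system whose axes, the principal components, are ordered by how much of the data's variance they capture. The first component is the direction in feature space along which the data varies the most, the second is the direction of greatest remaining variance orthogonal to the first, and so on. Keeping only the first few components reduces the number of dimensions while retaining most of the variance.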
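To make the idea of "directions of greatest variance" concrete, here is a minimal NumPy sketch that finds principal components from the eigendecomposition of the covariance matrix. The function name `pca_eig` and the synthetic data are illustrative assumptions, not part of scikit-learn, and this is one possible formulation rather than the library's implementation.

```python
import numpy as np

def pca_eig(X, n_components):
    """Hypothetical helper: PCA via eigendecomposition of the covariance matrix."""
    # Center each feature; principal components are defined on mean-centered data
    X_centered = X - X.mean(axis=0)
    # Sample covariance matrix of the features, shape (n_features, n_features)
    cov = np.cov(X_centered, rowvar=False)
    # eigh is meant for symmetric matrices and returns eigenvalues in ascending order
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Reorder so the largest-variance direction comes first, then truncate
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order].T      # shape (n_components, n_features)
    scores = X_centered @ components.T    # shape (n_samples, n_components)
    return scores, components, eigvals[order]

# Synthetic correlated data, for illustration only
rng = np.random.default_rng(0)
mixing = np.array([[3.0, 0.0, 0.0],
                   [1.0, 1.0, 0.0],
                   [0.5, 0.2, 0.1]])
X_demo = rng.normal(size=(200, 3)) @ mixing

scores, components, variances = pca_eig(X_demo, n_components=2)
print(variances)  # variance captured along each component, largest first
```

The sign of each component is arbitrary, so projections from this sketch can differ from scikit-learn's by a factor of -1 per column. scikit-learn's `PCA` computes the same subspace through a singular value decomposition instead.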

In [1]:
from sklearn.decomposition import PCA
from sklearn.datasets import load_iris
import pandas as pd

# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Create a DataFrame for better visualization
df = pd.DataFrame(data=X, columns=[f'Feature_{i+1}' for i in range(X.shape[1])])

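# Optional: standardize each feature to zero mean and unit variance.
# PCA centers the data itself but does not rescale it, so features with
# larger variance dominate the components unless you standardize first.
# X = (X - X.mean(axis=0)) / X.std(axis=0)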

# Create a PCA instance
pca = PCA(n_components=2)  # Choose the number of components

# Fit and transform the data
X_pca = pca.fit_transform(X)

# Add PCA components to DataFrame
df['PCA_Component_1'] = X_pca[:, 0]
df['PCA_Component_2'] = X_pca[:, 1]

# Display the DataFrame with PCA components
print(df[['PCA_Component_1', 'PCA_Component_2']])


     PCA_Component_1  PCA_Component_2
0          -2.684126         0.319397
1          -2.714142        -0.177001
2          -2.888991        -0.144949
3          -2.745343        -0.318299
4          -2.728717         0.326755
..               ...              ...
145         1.944110         0.187532
146         1.527167        -0.375317
147         1.764346         0.078859
148         1.900942         0.116628
149         1.390189        -0.282661

[150 rows x 2 columns]
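
The projected coordinates alone do not show how much information the two components retain. As a short follow-up sketch, the fitted estimator's `explained_variance_ratio_` and `components_` attributes can be inspected. The block reloads the data so that it runs on its own, and the variable name `ratio` is an illustrative choice.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

# Refit on the unscaled Iris features so this block is self-contained
X = load_iris().data
pca = PCA(n_components=2).fit(X)

# Fraction of the total variance captured by each retained component
ratio = pca.explained_variance_ratio_
print(ratio)
print(ratio.sum())  # share of variance kept after reducing 4 features to 2

# Each row is one component direction expressed in the original feature space
print(pca.components_.shape)  # (2, 4)
```

If the summed ratio is too low for the task at hand, increasing `n_components` keeps more of the variance at the cost of more dimensions.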
