## 🌸 PCA Workflow on Iris Dataset

This section explains step by step how to apply PCA to the Iris dataset.

---

### 1. Standardize the Data
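PCA is scale-sensitive: features measured on larger numeric ranges have larger variance and would dominate the components. Before applying it, we standardize the features so that each has mean = 0 and standard deviation = 1.  
This puts all features on a comparable scale, so none dominates merely because of its units.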

In [17]:
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn import datasets

# Load the Iris data (150 samples, 4 features)
iris = datasets.load_iris()
X = iris.data

# Standardize each feature to mean 0 and standard deviation 1
Xs = StandardScaler().fit_transform(X)
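
As a quick sanity check on the standardization step, the sketch below recomputes the column means and standard deviations of the scaled data. The names `col_means` and `col_stds` are illustrative and not part of the original notebook.

```python
import numpy as np
from sklearn import datasets
from sklearn.preprocessing import StandardScaler

X = datasets.load_iris().data
Xs = StandardScaler().fit_transform(X)

# Each column should now have mean ~0 and standard deviation ~1.
# StandardScaler divides by the population standard deviation (ddof=0),
# which is also NumPy's default for std().
col_means = Xs.mean(axis=0)
col_stds = Xs.std(axis=0)

print("Column means:", np.round(col_means, 6))
print("Column stds: ", np.round(col_stds, 6))
```

The means come out as zero only up to floating-point error, so a tolerance-based check such as `np.allclose` is safer than an exact comparison.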

### 2. Fit PCA with All Components
We first fit PCA keeping all components (as many as there are original features, here 4). The fitted model gives us:  
- **Eigenvalues**: the variance along each component (`explained_variance_`).  
- **Eigenvectors**: the directions of maximum variance, i.e., the principal components (`components_`).

In [19]:
from sklearn.decomposition import PCA

# With no n_components argument, PCA keeps all components
pca_full = PCA()
pca_full.fit(Xs)

print("Explained variance ratio:", pca_full.explained_variance_ratio_)
print("Cumulative variance:", pca_full.explained_variance_ratio_.cumsum())


Explained variance ratio: [0.72962445 0.22850762 0.03668922 0.00517871]
Cumulative variance: [0.72962445 0.95813207 0.99482129 1.        ]
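
To make the link between PCA and eigenvalues/eigenvectors concrete, the following sketch compares scikit-learn's fitted attributes with a direct eigendecomposition of the sample covariance matrix. It is an illustration, not how scikit-learn computes PCA internally; the names `cov`, `eigvals`, `eigvecs`, and `same_directions` are introduced here.

```python
import numpy as np
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

Xs = StandardScaler().fit_transform(datasets.load_iris().data)
pca_full = PCA().fit(Xs)

# Sample covariance matrix (features in columns, n - 1 denominator)
cov = np.cov(Xs, rowvar=False)

# eigh returns eigenvalues in ascending order; reverse to match PCA's ordering
eigvals, eigvecs = np.linalg.eigh(cov)
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]

print("Eigenvalues (NumPy):", eigvals)
print("explained_variance_:", pca_full.explained_variance_)

# Eigenvectors are only defined up to sign, so compare absolute values
same_directions = np.allclose(np.abs(eigvecs.T), np.abs(pca_full.components_))
print("Same directions up to sign:", same_directions)
```

Note that `components_` stores one principal component per row, whereas `np.linalg.eigh` returns eigenvectors as columns, hence the transpose.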


### 3. Inspect Explained Variance
- **Explained variance ratio** shows how much variance each PC captures.  
- **Cumulative explained variance** tells us how many PCs are needed to retain most of the information.  
- This is usually visualized in a **scree plot** (elbow-shaped curve).  
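
A scree plot along these lines could be drawn with Matplotlib as sketched below. Matplotlib is an assumption here (the notebook does not import it), and the 95% reference line is only an illustrative threshold.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

Xs = StandardScaler().fit_transform(datasets.load_iris().data)
ratios = PCA().fit(Xs).explained_variance_ratio_
components = np.arange(1, len(ratios) + 1)

fig, ax = plt.subplots(figsize=(6, 4))

# Bars: variance explained by each individual component
ax.bar(components, ratios, alpha=0.6, label="Per component")

# Line: running total of explained variance
ax.plot(components, ratios.cumsum(), marker="o", color="black", label="Cumulative")

# Illustrative reference line at 95% of the variance
ax.axhline(0.95, linestyle="--", color="gray", label="95% threshold")

ax.set_xticks(components)
ax.set_xlabel("Principal component")
ax.set_ylabel("Explained variance ratio")
ax.set_title("Scree plot (standardized Iris)")
ax.legend()
fig.tight_layout()
```

In a notebook the figure renders at the end of the cell; in a plain script, add `plt.show()` or `fig.savefig(...)`.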

---

### 4. Decide Number of Components
- Pick a fixed number of PCs (e.g., 2 for visualization).  
- Or choose a variance threshold (e.g., keep enough PCs to explain 95% of variance).  
- Example: on the standardized Iris data, the first **2 PCs** capture about 95.8% of the variance (see the cumulative variance output above).

In [21]:
# Option A: Fixed number of components
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(Xs)

# Option B: Automatic based on variance threshold
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(Xs)

print("Reduced shape:", X_reduced.shape)

Reduced shape: (150, 2)
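
The variance-threshold rule can also be worked out by hand from the cumulative variance, which shows what `PCA(n_components=0.95)` is doing. This is a minimal sketch; `threshold`, `k_manual`, and `pca_auto` are illustrative names.

```python
import numpy as np
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

Xs = StandardScaler().fit_transform(datasets.load_iris().data)

threshold = 0.95  # illustrative value, matching the example above
cumulative = PCA().fit(Xs).explained_variance_ratio_.cumsum()

# Position of the first cumulative value that exceeds the threshold, plus one
k_manual = int(np.argmax(cumulative > threshold)) + 1

# scikit-learn applies the same rule when n_components is a float in (0, 1)
pca_auto = PCA(n_components=threshold).fit(Xs)
print("Manual:", k_manual, "| scikit-learn:", pca_auto.n_components_)
```

The fitted attribute `n_components_` reports how many components were actually kept, which is useful whenever the number is chosen by threshold rather than fixed in advance.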


### 5. Refit PCA and Transform Data
Once the number of components is chosen:
- Refit PCA with that number.  
- Apply `.fit_transform()` to project the standardized data (`Xs`) into the reduced space.  
- Result: A new dataset with fewer features (PC1, PC2, …) that captures most of the structure.  
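
To show what the projection does under the hood, the sketch below reproduces `.fit_transform()` with a plain matrix product and then maps the reduced data back to the original feature space to see how much is lost. The names `X_manual`, `X_restored`, and `mse` are introduced here for illustration.

```python
import numpy as np
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

Xs = StandardScaler().fit_transform(datasets.load_iris().data)
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(Xs)

# The projection is a centering step followed by a matrix product
X_manual = (Xs - pca.mean_) @ pca.components_.T
print("Matches fit_transform:", np.allclose(X_manual, X_reduced))

# Map back to the 4 standardized features and measure what was lost
X_restored = pca.inverse_transform(X_reduced)
mse = np.mean((Xs - X_restored) ** 2)
print("Reconstruction MSE:", mse)
```

Because the data were standardized, this mean squared error should equal the fraction of variance that the two retained components do not explain, i.e. `1 - pca.explained_variance_ratio_.sum()`.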
