[Kaggle notebook](https://www.kaggle.com/code/mrtaiech/pca-1-demo)

### Finding the optimal number of principal components

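`explained_variance_` → absolute variance captured by each component (the eigenvalues of the covariance matrix)

`explained_variance_ratio_` → fraction of the total variance captured by each component (the values sum to 1 when all components are kept)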
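
The relationship between the two attributes can be checked directly. The sketch below uses synthetic data (the array `X_demo` and its per-feature scales are assumptions made up for illustration) and recomputes the ratio by hand from the eigenvalues.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic data (an assumption for illustration): 200 samples, 5 features
# with different spreads so the eigenvalues are clearly distinct
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 5)) * np.array([5.0, 3.0, 2.0, 1.0, 0.5])

pca_demo = PCA().fit(X_demo)

# Each ratio is one component's variance divided by the total variance
ratio_by_hand = pca_demo.explained_variance_ / pca_demo.explained_variance_.sum()

print(pca_demo.explained_variance_)        # eigenvalues, largest first
print(pca_demo.explained_variance_ratio_)  # the same values as fractions of the total
print(np.allclose(ratio_by_hand, pca_demo.explained_variance_ratio_))
```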

## Steps 

1) Fit PCA without fixing components

```python
from sklearn.decomposition import PCA
import numpy as np

pca = PCA()
pca.fit(X_train)  # X_train: training feature matrix
```

2) Get explained variance ratio

```python
explained_var = pca.explained_variance_ratio_
```

3) Compute cumulative sum

```python
cumulative_var = np.cumsum(explained_var)
```

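4) Find the number of components needed for 90% variance

```python
# argmax returns the index of the first True, i.e. the first component at
# which the cumulative variance reaches 90%; +1 turns the index into a count
n_components_90 = np.argmax(cumulative_var >= 0.90) + 1
print("Components needed for ~90% variance:", n_components_90)
```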
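
Steps 1 to 4 can be run end to end as in the sketch below. The dataset is an assumption: scikit-learn's bundled digits data stands in for `X_train`, since the notes do not say which data the notebook uses. The last lines show a shortcut that should give the same count: passing a float between 0 and 1 as `n_components` asks scikit-learn to keep just enough components to exceed that share of variance.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split

# Stand-in data (an assumption): 8x8 digit images flattened to 64 features
X, _ = load_digits(return_X_y=True)
X_train, X_test = train_test_split(X, test_size=0.2, random_state=0)

# Steps 1-3: fit with all components, then accumulate the variance ratios
pca = PCA().fit(X_train)
cumulative_var = np.cumsum(pca.explained_variance_ratio_)

# Step 4: first component count at which the cumulative ratio reaches 90%
n_components_90 = int(np.argmax(cumulative_var >= 0.90) + 1)
print("Components needed for ~90% variance:", n_components_90)

# Shortcut: let PCA pick the count from a variance threshold
pca_90 = PCA(n_components=0.90).fit(X_train)
print("Chosen by PCA(n_components=0.90):", pca_90.n_components_)
```

Fitting the threshold version on the training split only, and then calling `pca_90.transform(X_test)`, keeps information from the test data out of the projection.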




## When PCA does not work


Principal Component Analysis (PCA) is a powerful dimensionality reduction technique, but it has several limitations. PCA may not be suitable in the following scenarios:

### 1. Non-linear relationships in data
PCA is a linear transformation method. If the underlying structure of the data is non-linear (e.g., circular or spiral patterns), PCA fails to capture meaningful structure.

**Alternative:** Kernel PCA, t-SNE, UMAP
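
A small illustration of this limitation, assuming two concentric circles as the non-linear structure. The RBF kernel and `gamma=10` are hand-picked assumptions for this toy data, and a logistic regression is used only as a quick check of how linearly separable each projection is.

```python
from sklearn.datasets import make_circles
from sklearn.decomposition import PCA, KernelPCA
from sklearn.linear_model import LogisticRegression

# Toy data: an inner and an outer circle, one class each
X_c, y_c = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

# Linear PCA only rotates the plane; the circles stay nested
X_lin = PCA(n_components=2).fit_transform(X_c)

# Kernel PCA with an RBF kernel (gamma chosen by hand for this example)
X_rbf = KernelPCA(n_components=2, kernel="rbf", gamma=10).fit_transform(X_c)

# Training accuracy of a linear classifier on each projection
acc_lin = LogisticRegression().fit(X_lin, y_c).score(X_lin, y_c)
acc_rbf = LogisticRegression().fit(X_rbf, y_c).score(X_rbf, y_c)
print(f"linear PCA: {acc_lin:.2f}, RBF kernel PCA: {acc_rbf:.2f}")
```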

---

### 2. When class separation is important
PCA is an unsupervised technique and does not consider class labels. High-variance directions may not correspond to directions that best separate classes.

**Alternative:** Linear Discriminant Analysis (LDA)
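
The sketch below builds a hypothetical two-feature dataset in which the high-variance feature carries no class information and the low-variance feature separates the classes. The helper `class_gap` is an ad hoc measure introduced here for illustration, not a scikit-learn function.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
n = 500
y_lab = rng.integers(0, 2, size=n)

noisy = rng.normal(0.0, 10.0, size=n)                 # high variance, no class info
signal = 2.0 * y_lab + rng.normal(0.0, 0.3, size=n)   # low variance, separates classes
X_lab = np.column_stack([noisy, signal])

# Project onto one dimension with each method
z_pca = PCA(n_components=1).fit_transform(X_lab).ravel()
z_lda = LinearDiscriminantAnalysis(n_components=1).fit_transform(X_lab, y_lab).ravel()

def class_gap(z, labels):
    """Distance between the two class means, in units of the projection's std."""
    return abs(z[labels == 0].mean() - z[labels == 1].mean()) / z.std()

print(f"class gap after PCA: {class_gap(z_pca, y_lab):.2f}")
print(f"class gap after LDA: {class_gap(z_lda, y_lab):.2f}")
```

PCA keeps the noisy direction because it has the most variance, so the classes overlap in its projection, while LDA uses the labels to pick the direction that pulls the class means apart.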

---

### 3. When features are not scaled
PCA is sensitive to the scale of features. If features are not standardized, variables with larger scales dominate the principal components.

**Solution:** Apply standardization before PCA.
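
A minimal sketch of the effect, using two independent synthetic features on very different scales (the "income" and "age" labels are hypothetical). Wrapping `StandardScaler` and `PCA` in a pipeline ensures the scaling learned on the training data is reused at transform time.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Two independent features with very different units (hypothetical)
X_raw = np.column_stack([
    rng.normal(50_000, 10_000, size=500),  # e.g. income
    rng.normal(40, 10, size=500),          # e.g. age
])

# Without scaling, the large-scale feature dominates the first component
pca_raw = PCA().fit(X_raw)

# With standardization, both features contribute on equal footing
scaled_pca = make_pipeline(StandardScaler(), PCA()).fit(X_raw)
pca_scaled = scaled_pca.named_steps["pca"]

print("unscaled:", pca_raw.explained_variance_ratio_)
print("scaled:  ", pca_scaled.explained_variance_ratio_)
```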

---

### 4. When interpretability is required
Principal components are linear combinations of original features, making them difficult to interpret and explain to stakeholders.

**Alternative:** Feature selection methods
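
To see why components are hard to explain, it can help to print their weights. The sketch below assumes the bundled iris data as an example and writes each component out as a weighted sum of the original features, using the rows of `components_`.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

iris = load_iris()
X_iris = StandardScaler().fit_transform(iris.data)
pca_iris = PCA(n_components=2).fit(X_iris)

# Each row of components_ holds one component's weights over the original features
for i, weights in enumerate(pca_iris.components_, start=1):
    terms = "  ".join(f"{w:+.2f}*[{name}]" for w, name in zip(weights, iris.feature_names))
    print(f"PC{i} = {terms}")
```

Every component mixes all four measurements, so there is no single original feature a stakeholder can point to.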

---

### 5. Small dataset with high dimensionality
When the number of samples is small compared to the number of features, PCA can produce unstable and noisy components due to poor covariance estimation.
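
A rough way to see this is to run PCA on pure noise. In the sketch below (sizes chosen arbitrarily for illustration) no direction is truly more important than any other, yet with only 20 samples in 200 dimensions the top component appears to explain far more than its fair share of variance.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
n_samples, n_features = 20, 200

# Isotropic Gaussian noise: every direction has the same true variance
X_small = rng.normal(size=(n_samples, n_features))

pca_small = PCA().fit(X_small)
top_ratio = pca_small.explained_variance_ratio_[0]

print(f"Top component claims {top_ratio:.1%} of the variance; "
      f"with equal true variances each direction holds {1 / n_features:.1%}")
```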

---

### 6. When variance does not represent useful information
PCA assumes that higher variance corresponds to more information, which may not always be true. Noise can have high variance, while important signals may have low variance.

---

### 7. Sparse or categorical data
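PCA performs poorly on sparse matrices and categorical features (e.g., one-hot encoded data). The mean-centering step that PCA relies on turns a sparse matrix into a dense one, and variance is not a natural measure of structure for 0/1 indicator columns.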

**Alternative:** Truncated SVD for sparse data
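
A minimal sketch of the alternative, assuming a randomly generated sparse matrix as stand-in data. `TruncatedSVD` does not center the input, so the matrix stays sparse throughout the fit.

```python
from scipy import sparse
from sklearn.decomposition import TruncatedSVD

# Stand-in data (an assumption): 1000 x 500 matrix with about 1% non-zero entries
X_sparse = sparse.random(1000, 500, density=0.01, format="csr", random_state=0)

# Reduce to 10 dimensions without densifying the input
svd = TruncatedSVD(n_components=10, random_state=0)
X_reduced = svd.fit_transform(X_sparse)

print(X_reduced.shape)                      # (1000, 10)
print(svd.explained_variance_ratio_.sum())  # share of variance kept by 10 components
```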

---

### Summary
PCA is most effective for dense, numerical, and linearly correlated data. It should be avoided when data is non-linear, labels are crucial, interpretability matters, or variance does not reflect true information.
