# 👩‍💻 Reducing Dimensionality with PCA: From 64 Features to 2

## 📋 Overview
In this lab, you will implement Principal Component Analysis (PCA) on the Digits dataset, reducing its 64 pixel features to 2 principal components. By the end of this activity, you will understand how PCA combines the original features into a small number of new components that capture most of the variance in the data while preserving its core structure.

## 🎯 Learning Outcomes
By the end of this lab, you will be able to:

- ✅ Apply Principal Component Analysis (PCA) to reduce the dimensions of a dataset
- ✅ Analyze the explained variance to evaluate the effectiveness of PCA
- ✅ Visualize the results of PCA to interpret data clustering

## Task 1: Data Import and Preparation

**Context:** Proper data preparation is essential before applying PCA.

**Steps:**

1. Standardize the dataset to ensure that all features contribute equally to PCA, eliminating biases due to different scales.
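To see why this step matters, here is a minimal sketch on synthetic data (not the Digits dataset). The two features, their scales, and the variable names are assumptions chosen purely for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic data (an assumption for illustration): two independent features
# on very different scales.
rng = np.random.default_rng(0)
small = rng.normal(loc=0.0, scale=1.0, size=500)      # standard deviation near 1
large = rng.normal(loc=0.0, scale=1000.0, size=500)   # standard deviation near 1000
X_toy = np.column_stack([small, large])

# Without standardization, the large-scale feature dominates the first component.
ratio_raw = PCA(n_components=2).fit(X_toy).explained_variance_ratio_

# After standardization, both features contribute on an equal footing.
X_toy_std = StandardScaler().fit_transform(X_toy)
ratio_std = PCA(n_components=2).fit(X_toy_std).explained_variance_ratio_

print("Raw explained variance ratio:         ", ratio_raw)
print("Standardized explained variance ratio:", ratio_std)
```

On the raw data, nearly all of the variance is attributed to the first component simply because one feature has larger numbers. After standardization the split should be much closer to even.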

```python
# Required Imports
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.datasets import load_digits
from sklearn.preprocessing import StandardScaler

# Load the Digits dataset
digits = load_digits()
X = digits.data
y = digits.target

# Data Preparation
# your code here...
```

💡 **Tip:** The dataset is already loaded with `load_digits()`; use `StandardScaler` to standardize `X`.

⚙️ **Test Your Work:**
- Display the first 5 rows of the standardized dataset.

**Expected output:** Standardized feature values for the first 5 samples.
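Beyond printing the first rows, an optional sanity check is to confirm that each column is centered and scaled. The sketch below is one way to do that; the names `X_std`, `col_means`, and `col_stds` are illustrative choices, not required by the lab.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.preprocessing import StandardScaler

X = load_digits().data
X_std = StandardScaler().fit_transform(X)

col_means = X_std.mean(axis=0)
col_stds = X_std.std(axis=0)

# Every column should be centered at (numerically) zero.
print("Largest absolute column mean:", np.abs(col_means).max())

# Columns should have unit standard deviation, except any pixel that is
# constant across all images: StandardScaler leaves such a column at zero.
constant_cols = np.isclose(col_stds, 0.0)
print("Number of constant columns:", int(constant_cols.sum()))
print("All other columns have std 1:", np.allclose(col_stds[~constant_cols], 1.0))
```

If the largest absolute mean is not close to zero, the scaler was probably not applied to the array being inspected.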

## Task 2: Implement PCA

**Context:** Applying PCA reduces the dimensionality of the dataset while retaining the directions of greatest variance.

**Steps:**

1. Apply PCA to transform the original dataset into two principal components.
2. Analyze the explained variance to evaluate how well these two components represent the overall variance in the data.

```python
# Task 2: Implement PCA
# your code here...
```

💡 **Tip:** Use `PCA` from `sklearn.decomposition` with `n_components=2`.

⚙️ **Test Your Work:**
- Print the explained variance ratio for the two principal components.

**Expected output:** The fraction of total variance explained by each of the two components (multiply by 100 for a percentage).
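To make the meaning of the explained variance ratio concrete, the following sketch computes it by hand on synthetic data and compares it with scikit-learn's result. It assumes the standard definition: each component's ratio is its covariance-matrix eigenvalue divided by the sum of all eigenvalues. The data and variable names are illustrative only.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic correlated data: 200 samples, 5 features (illustrative only).
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))

# Eigenvalues of the sample covariance matrix, sorted largest first.
cov = np.cov(X_demo, rowvar=False)
eigvals = np.sort(np.linalg.eigvalsh(cov))[::-1]

# Share of the total variance carried by the two largest eigenvalues.
manual_ratio = eigvals[:2] / eigvals.sum()

# The same quantity as reported by scikit-learn.
sk_ratio = PCA(n_components=2).fit(X_demo).explained_variance_ratio_

print("Manual :", manual_ratio)
print("sklearn:", sk_ratio)
print("Match  :", np.allclose(manual_ratio, sk_ratio))
```

The two ratios sum to less than 1 whenever some variance lies along the discarded components, which is the information lost by keeping only two dimensions.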

## Task 3: Visualize PCA Results

**Context:** Visualization helps you interpret the results of PCA and see how the digits cluster in the reduced space.

**Steps:**

1. Plot the reduced dimensions and visualize the clusters of digits.
2. Interpret the visualization to understand how different digits are grouped based on the principal components.

```python
# Task 3: Visualize PCA Results
# your code here...
```

💡 **Tip:** Use `matplotlib` for plotting with appropriate labels and title.

⚙️ **Test Your Work:**
- Display a scatter plot of the two principal components with color coding for different digits.

**Expected output:** A visual representation showing clusters of different digits.
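As one possible layout, the sketch below draws each digit as its own scatter series so that the plot gets a discrete legend instead of a continuous colorbar. This is an optional variation rather than the required approach, and the marker size, transparency, and legend settings are arbitrary choices.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

digits = load_digits()
y = digits.target
X_std = StandardScaler().fit_transform(digits.data)
X_pca = PCA(n_components=2).fit_transform(X_std)

fig, ax = plt.subplots(figsize=(10, 6))

# One scatter call per digit, so each class gets its own color and legend entry.
for digit in np.unique(y):
    mask = y == digit
    ax.scatter(X_pca[mask, 0], X_pca[mask, 1], s=12, alpha=0.7, label=str(digit))

ax.set_xlabel("Principal Component 1")
ax.set_ylabel("Principal Component 2")
ax.set_title("Digits projected onto the first two principal components")
ax.legend(title="Digit", ncol=2)
plt.show()
```

A discrete legend can make it easier to name which digits overlap, since neighboring values on a continuous colormap are hard to tell apart.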

### ✅ Success Checklist

- Successfully loaded and standardized the dataset
- Applied PCA to reduce the dataset to two principal components
- Analyzed the explained variance of the principal components
- Visualized the PCA results to interpret data clustering
- Reflected on the PCA process and its applications

### 🔍 Common Issues & Solutions

**Problem:** Dataset not loading.  
**Solution:** Check that `load_digits` is imported from `sklearn.datasets` and called with parentheses: `load_digits()`.

**Problem:** PCA implementation errors.  
**Solution:** Check that `PCA` is created with `n_components=2` and fitted on the standardized data, not the raw data.

**Problem:** Visualization issues.  
**Solution:** Pass `c=y` to `plt.scatter()` for color coding, and set the axis labels with `plt.xlabel()` and `plt.ylabel()`.

### 🔑 Key Points

- PCA is a powerful dimensionality-reduction technique that projects data onto the directions of greatest variance.
- Proper data standardization is crucial before applying PCA.
- Visualizing PCA results helps in understanding and interpreting data clustering.

## 💻 Exemplar Solution

<details>    
<summary><strong>Click HERE to see an exemplar solution</strong></summary>    

```python
# Required Imports
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.datasets import load_digits
from sklearn.preprocessing import StandardScaler

# Load the Digits dataset
digits = load_digits()
X = digits.data
y = digits.target

# Standardize data
scaler = StandardScaler()
X_std = scaler.fit_transform(X)

# Apply PCA
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_std)

# Visualize
plt.figure(figsize=(10, 6))
scatter = plt.scatter(X_pca[:, 0], X_pca[:, 1], c=y, cmap='Spectral', alpha=0.7, edgecolors='w')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.title('PCA on Digits Dataset')
plt.colorbar(scatter)
plt.show()

# Explained variance
explained_variance = pca.explained_variance_ratio_
print(f"Explained Variance: {explained_variance}")
print(f"Total Explained Variance by {pca.n_components} components: {np.sum(explained_variance):0.4f}")
```