# Experiment 4: Understanding the Concept and Applications of PCA with Scikit-Learn

## Aim
To understand the concept of Principal Component Analysis (PCA) and apply it using Scikit-Learn for dimensionality reduction.

## Objectives
- Learn the theoretical foundation of PCA.
- Perform PCA on a high-dimensional dataset.
- Visualize the results and understand the reduction in dimensionality.

## Tools Used
- **Scikit-Learn**: For PCA implementation.
- **Pandas** and **NumPy**: For data manipulation.
- **Matplotlib** and **Seaborn**: For visualizations.

## Implementation

### Step 1: Import Libraries
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import load_iris
```

### Step 2: Load and Explore the Dataset
```python
# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target
feature_names = iris.feature_names

# Create a DataFrame for better readability
df = pd.DataFrame(X, columns=feature_names)
df['Target'] = y

# Display the first few rows
print("Original Dataset:")
print(df.head())

# Visualize the pairplot of the original features
sns.pairplot(df, hue='Target', corner=True)
plt.show()
```

### Step 3: Standardize the Data
```python
# Standardize the features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

print("\nStandardized Features:")
print(pd.DataFrame(X_scaled, columns=feature_names).head())
```
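As a quick sanity check on Step 3, the sketch below confirms that each standardized column has a mean of roughly 0 and a standard deviation of roughly 1. It reloads the data so it can run on its own; the names `col_means` and `col_stds` are illustrative and not part of the original experiment.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

# Reload and standardize so this check is self-contained
X = load_iris().data
X_scaled = StandardScaler().fit_transform(X)

# StandardScaler divides by the population standard deviation (ddof=0),
# which is also NumPy's default for std()
col_means = X_scaled.mean(axis=0)
col_stds = X_scaled.std(axis=0)

print("Column means:", np.round(col_means, 6))
print("Column standard deviations:", np.round(col_stds, 6))
```

PCA is sensitive to feature scale, so a column with a much larger spread than the others would otherwise dominate the first component.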

### Step 4: Apply PCA
```python
# Apply PCA to reduce dimensions
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

# Display explained variance ratio
print("\nExplained Variance Ratio:")
print(pca.explained_variance_ratio_)

# Create a DataFrame for PCA results
pca_df = pd.DataFrame(X_pca, columns=['Principal Component 1', 'Principal Component 2'])
pca_df['Target'] = y

print("\nPCA Transformed Dataset:")
print(pca_df.head())
```
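To connect Step 4 to the theory, the sketch below reproduces the same projection with plain NumPy: it eigen-decomposes the covariance matrix of the standardized data and projects onto the two leading eigenvectors. This is one standard way to derive PCA, shown here for illustration only; Scikit-Learn's own solver may take a different numerical route. The names `X_manual` and `manual_ratio` are assumptions introduced for this example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X_scaled = StandardScaler().fit_transform(load_iris().data)

# Covariance matrix of the features (rows are samples, columns are features)
cov = np.cov(X_scaled, rowvar=False)

# eigh handles symmetric matrices and returns eigenvalues in ascending order,
# so reorder them from largest to smallest
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Project the (already centered) data onto the two leading eigenvectors
X_manual = X_scaled @ eigvecs[:, :2]
manual_ratio = eigvals[:2] / eigvals.sum()

# Reference result from Scikit-Learn
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

# The sign of each component is arbitrary, so compare absolute values
print("Projections match:", np.allclose(np.abs(X_manual), np.abs(X_pca)))
print("Variance ratios match:", np.allclose(manual_ratio, pca.explained_variance_ratio_))
```

The eigenvalues are the variances along each principal direction, which is why their normalized values correspond to the explained variance ratio.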

### Step 5: Visualize the Results
```python
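# Map numeric class labels to species names so the legend entries
# are guaranteed to match the plotted colors
species_names = dict(enumerate(iris.target_names))
plot_df = pca_df.assign(Species=pca_df['Target'].map(species_names))

# Scatter plot of the first two principal components
plt.figure(figsize=(10, 6))
sns.scatterplot(
    x='Principal Component 1',
    y='Principal Component 2',
    hue='Species',
    palette='Set1',
    data=plot_df
)
plt.title("PCA Result: First Two Principal Components")
plt.xlabel("Principal Component 1")
plt.ylabel("Principal Component 2")
plt.legend(title='Species')
plt.show()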
```
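To interpret the axes of the scatter plot, it can help to look at how each original feature contributes to each principal component. The sketch below tabulates `pca.components_`, whose rows are the component directions in the original feature space. The row labels `PC1` and `PC2` and the name `weights` are illustrative choices, not part of the original experiment.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

iris = load_iris()
X_scaled = StandardScaler().fit_transform(iris.data)
pca = PCA(n_components=2).fit(X_scaled)

# Each row of components_ is a unit-length direction in feature space;
# larger absolute values mean the feature contributes more to that component
weights = pd.DataFrame(
    pca.components_,
    columns=iris.feature_names,
    index=['PC1', 'PC2'],  # illustrative row labels
)
print(weights.round(3))
```

Because the sign of a component is arbitrary, the relative magnitudes within a row are more meaningful than whether a given weight is positive or negative.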

### Step 6: Evaluate PCA Impact
```python
# Print the cumulative explained variance ratio
cumulative_variance = np.cumsum(pca.explained_variance_ratio_)
print("\nCumulative Explained Variance Ratio:")
print(cumulative_variance)

# Plot cumulative explained variance
plt.figure(figsize=(8, 4))
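components = list(range(1, len(cumulative_variance) + 1))
plt.plot(components, cumulative_variance, marker='o', linestyle='--')
plt.xticks(components)  # component counts are integers, so avoid fractional ticks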
plt.title("Cumulative Explained Variance")
plt.xlabel("Number of Principal Components")
plt.ylabel("Cumulative Explained Variance")
plt.grid()
plt.show()
```
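With only two components fitted, the cumulative curve has just two points. A fuller picture comes from fitting PCA with every component retained and then reading off how many are needed to reach a chosen variance target. The sketch below does this; the 95% `threshold` is a hypothetical target picked for illustration, and `pca_full`, `cumulative_full`, and `n_needed` are names introduced here.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X_scaled = StandardScaler().fit_transform(load_iris().data)

# n_components=None keeps every component (four for Iris)
pca_full = PCA(n_components=None).fit(X_scaled)
cumulative_full = np.cumsum(pca_full.explained_variance_ratio_)

threshold = 0.95  # hypothetical variance target, chosen for illustration

# argmax returns the first index where the condition holds; add 1 for a count
n_needed = int(np.argmax(cumulative_full >= threshold)) + 1

print("Cumulative explained variance:", np.round(cumulative_full, 4))
print("Components needed for {:.0%} variance: {}".format(threshold, n_needed))
```

Scikit-Learn also accepts a float between 0 and 1 for `n_components`, in which case it selects the number of components needed to reach that share of variance.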

### Step 7: Summary and Observations
```python
print("\nSummary:")
print("1. PCA reduced the dimensionality of the Iris dataset from 4 to 2 features while retaining most of the variance.")
print("2. The first two principal components explained {:.2f}% of the total variance.".format(cumulative_variance[1] * 100))
print("3. PCA helped visualize the high-dimensional data in 2D, enabling easier interpretation and analysis.")
```