# Lab PCA: Principal Component Analysis (PCA) using prince on the Iris Dataset

**Lab Duration:** 3 Hours

**Level:** Intermediate

**Prerequisites:** Basic understanding of Python, Pandas, Matplotlib, and Principal Component Analysis (PCA)

## Lab Objectives
By the end of this lab, students will be able to:
1. Perform PCA using the prince library.
2. Visualize the explained variance using a Scree plot.
3. Create a Correlation Circle to interpret the relationships between original features.
4. Generate a Score Plot to visualize the data in the principal component space.
5. Build a Biplot that overlays the observation scores and the feature loadings in one figure.

## Lab Outline
**Part 1: Introduction to PCA and the Iris Dataset (15 minutes)**
 * Overview of PCA.
 * Description of the Iris dataset.

**Part 2: Setting up the Environment and Loading Data (15 minutes)**
 * Installing required libraries.
 * Loading the Iris dataset using sklearn.
 * Preprocessing data with Pandas.

**Part 3: Performing PCA using prince (30 minutes)**
 * Initializing the prince PCA model.
 * Fitting the model to the data.
 * Extracting principal components and explained variance.

**Part 4: Visualization (60 minutes)**
 * Scree Plot: Visualizing the variance explained by each principal component.
 * Correlation Circle: Understanding relationships between original features and principal components.
 * Score Plot: Visualizing the dataset in the new principal component space.
 * Biplot: Viewing observation scores and feature loadings together.

**Part 5: Interpretation and Analysis (30 minutes)**
 * Discussing the insights obtained from the plots.
 * Interpreting the explained variance and the position of the features in the correlation circle.
 * Analyzing the grouping of species in the score plot.

**Part 6: Conclusion and Q&A (30 minutes)**
 * Summarizing key points.
 * Opening the floor for questions and additional discussion.

## Part 1: Introduction to PCA and the Iris Dataset
**Time:** 15 minutes

### PCA Overview
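 * Principal Component Analysis (PCA) is a dimensionality reduction technique that transforms a dataset into a new set of uncorrelated variables called principal components. Each component is a linear combination of the original features, the component directions are mutually orthogonal, and the components are ordered by the amount of variance they capture.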
 * PCA is useful for reducing the complexity of data, visualizing high-dimensional datasets, and identifying patterns.
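
The two properties named above (orthogonal directions, uncorrelated components) can be checked numerically. The following is a minimal sketch that uses only NumPy on synthetic data, not the Iris dataset; the variable names (`latent`, `data`, `Z`, `scores`) are illustrative and do not come from prince.

```python
import numpy as np

# Synthetic, illustrative data: 200 samples, 3 features, two of them strongly correlated.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
data = np.hstack([
    latent + 0.1 * rng.normal(size=(200, 1)),
    2.0 * latent + 0.1 * rng.normal(size=(200, 1)),
    rng.normal(size=(200, 1)),
])

# Standardize each feature to zero mean and unit variance.
Z = (data - data.mean(axis=0)) / data.std(axis=0)

# The covariance of standardized data is the correlation matrix.
corr = np.cov(Z, rowvar=False, ddof=0)

# eigh is meant for symmetric matrices and returns eigenvalues in ascending order.
eigenvalues, eigenvectors = np.linalg.eigh(corr)
order = np.argsort(eigenvalues)[::-1]  # largest variance first
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Principal component scores: projection of the data onto the eigenvectors.
scores = Z @ eigenvectors

# The scores are uncorrelated: their covariance matrix is diagonal,
# with the eigenvalues (component variances) on the diagonal.
score_cov = np.cov(scores, rowvar=False, ddof=0)
print(np.round(score_cov, 3))
```

The off-diagonal entries of the printed matrix are zero up to rounding, which is what "uncorrelated" means here. The signs of the eigenvectors are arbitrary, so a component may appear flipped from one run or library to another.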

### The Iris Dataset
 * The Iris dataset consists of 150 samples of iris flowers, with 50 samples each from three species: Setosa, Versicolor, and Virginica.
 * The dataset has four numeric features, each measured in centimeters: sepal length, sepal width, petal length, and petal width.

## Part 2: Setting up the Environment and Loading Data
**Time:** 15 minutes

### Step 1: Install Required Libraries
 Install the required Python packages:
```bash
pip install prince pandas seaborn matplotlib scikit-learn
```

### Step 2: Load the Iris Dataset
 Load the data with scikit-learn and wrap it in Pandas objects:

In [None]:
from sklearn.datasets import load_iris
import pandas as pd

iris = load_iris()
X = pd.DataFrame(iris.data, columns=iris.feature_names)
y = pd.Series(iris.target, name='species')
y = y.map({0: 'setosa', 1: 'versicolor', 2: 'virginica'})

### Step 3: Data Preprocessing
 Briefly explore the dataset:

In [None]:
X.head()

In [None]:
# Exploratory data analysis: pairplot of every feature pair, colored by species
import seaborn as sns
import matplotlib.pyplot as plt

sns.pairplot(X.assign(species=y), hue='species')
plt.show()

## Part 3: Performing PCA using prince
**Time:** 30 minutes

### Step 1: Initialize the PCA Model
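 Initialize PCA with 2 components. With its default settings, prince centers and scales each feature before fitting (`rescale_with_mean=True`, `rescale_with_std=True`), so the analysis is based on the correlation matrix: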

In [None]:
import prince
pca = prince.PCA(n_components=2)

### Step 2: Fit the PCA Model
 Fit the PCA model to the Iris dataset:

In [None]:
pca = pca.fit(X)

### Step 3: Extract Principal Components and Explained Variance
 Transform the data to principal components:

In [None]:
principal_components = pca.transform(X)

In [None]:
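# Print the proportion of variance explained by each component.
# Note: `explained_inertia_` is the attribute name in older prince releases (before 0.8);
# newer releases expose `percentage_of_variance_` and `eigenvalues_summary` instead.
print(pca.explained_inertia_)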
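
Because prince's attribute names vary between releases, it can help to cross-check the explained variance with scikit-learn. The sketch below assumes that prince is standardizing the features (its default behaviour, as far as this lab relies on it), so the data is passed through `StandardScaler` first. The names `features`, `standardized`, `sk_pca`, and `explained` are chosen for this example only.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Load the raw feature matrix (150 samples x 4 features).
features = load_iris().data

# Standardize so that every feature has zero mean and unit variance.
standardized = StandardScaler().fit_transform(features)

# Fit a 2-component PCA and read the share of variance per component.
sk_pca = PCA(n_components=2).fit(standardized)
explained = sk_pca.explained_variance_ratio_

print("Per component:", explained)
print("First two components together:", explained.sum())
```

If the values printed by prince differ noticeably from these, the most likely cause is a difference in scaling options rather than an error in either library.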

## Part 4: Visualization
**Time:** 60 minutes

### Step 1: Scree Plot
 **Objective:** To visualize the variance explained by each principal component.
 **Code:**

In [None]:
pca.plot_eigenvalues()
plt.show()

### Step 2: Correlation Circle
 **Objective:** To visualize how the original features correlate with the principal components.
 **Code:**

In [None]:
pca.plot_correlation_circle(axes=(0, 1))
plt.show()

### Step 3: Score Plot
 **Objective:** To visualize the data in the principal component space.
 **Code:**

In [None]:
ax = pca.plot_row_coordinates(
    X,
    ax=None,
    figsize=(6, 6),
    x_component=0,
    y_component=1,
    labels=None,
    color_labels=y,
    ellipse_outline=False,
    ellipse_fill=True,
    show_points=True,
)
plt.show()

### Step 4: Biplot
 **Objective:** To visualize the scores of the observations (rows) and the loadings of the features (columns) together in the new principal component space.
 **Code:**

In [None]:
biplot = pca.biplot(X, y)
plt.show()

## Part 5: Interpretation and Analysis
**Time:** 30 minutes

### Scree Plot Interpretation
 Discuss how much variance is explained by the first two components and the importance of higher components.
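
To support this discussion with numbers, the cumulative share of variance can be computed directly from the eigenvalues of the correlation matrix. This is a sketch using NumPy rather than prince; the 95% cut-off (`threshold`) is an illustrative choice, not a rule from this lab.

```python
import numpy as np
from sklearn.datasets import load_iris

features = load_iris().data

# Eigenvalues of the correlation matrix, sorted from largest to smallest.
corr = np.corrcoef(features, rowvar=False)
eigenvalues = np.linalg.eigvalsh(corr)[::-1]

# Share of total variance per component and the running total.
proportion = eigenvalues / eigenvalues.sum()
cumulative = np.cumsum(proportion)

for i, (p, c) in enumerate(zip(proportion, cumulative), start=1):
    print(f"PC{i}: {p:.1%} of variance, {c:.1%} cumulative")

# Smallest number of components whose cumulative share reaches the threshold.
threshold = 0.95  # illustrative cut-off
n_keep = int(np.searchsorted(cumulative, threshold) + 1)
print("Components needed:", n_keep)
```

A scree plot shows the same `proportion` values as bars or a line; the "elbow" is where adding another component contributes little additional variance.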

### Correlation Circle Interpretation
 Discuss how each feature contributes to the principal components and what the circle tells us about feature correlations.
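
The coordinates drawn in a correlation circle are the correlations between each original feature and each component. The sketch below computes them with NumPy, assuming standardized features; `feature_pc_corr` and the other names are illustrative and independent of prince's plotting code.

```python
import numpy as np
from sklearn.datasets import load_iris

iris_data = load_iris()
features = iris_data.data

# Standardize, then eigen-decompose the correlation matrix.
Z = (features - features.mean(axis=0)) / features.std(axis=0)
eigenvalues, eigenvectors = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
scores = Z @ eigenvectors

# Correlation of feature j with component k, for every pair (j, k).
n_features = Z.shape[1]
feature_pc_corr = np.array([
    [np.corrcoef(Z[:, j], scores[:, k])[0, 1] for k in range(n_features)]
    for j in range(n_features)
])

# The first two columns are the (x, y) position of each arrow in the circle.
for name, (c1, c2) in zip(iris_data.feature_names, feature_pc_corr[:, :2]):
    length = np.hypot(c1, c2)
    print(f"{name:20s} PC1={c1:+.2f} PC2={c2:+.2f} arrow length={length:.2f}")
```

An arrow whose length is close to 1 is well represented by the first two components. Arrows pointing in similar directions indicate positively correlated features, opposite directions indicate negative correlation, and roughly perpendicular arrows indicate weak correlation. Component signs are arbitrary, so the whole picture may appear mirrored compared with prince's plot.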

### Score Plot Interpretation
 Discuss how the different species are grouped in the principal component space and what this tells us about the separability of the species.
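
One way to make the separability discussion concrete is to compare the range of first-component scores per species. This is a sketch with NumPy on standardized features; `pc1` and `setosa_separated` are names introduced for this example.

```python
import numpy as np
from sklearn.datasets import load_iris

iris_data = load_iris()
features, target = iris_data.data, iris_data.target

# Standardize and project onto the first principal component.
Z = (features - features.mean(axis=0)) / features.std(axis=0)
_, eigenvectors = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
pc1 = Z @ eigenvectors[:, -1]  # eigh sorts ascending, so the last column is PC1

# Range of PC1 scores for each species.
for label, name in enumerate(iris_data.target_names):
    values = pc1[target == label]
    print(f"{name:10s} PC1 min={values.min():+.2f} max={values.max():+.2f}")

# Setosa is separated if its PC1 range does not overlap the other two species.
setosa = pc1[target == 0]
others = pc1[target != 0]
setosa_separated = bool(setosa.max() < others.min() or setosa.min() > others.max())
print("Setosa separated on PC1:", setosa_separated)
```

Comparing the ranges for versicolor and virginica in the same output shows whether those two species overlap along the first component, which is the pattern to look for in the score plot.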

## Part 6: Conclusion and Q&A
**Time:** 30 minutes

### Summary
 * Recap the steps of performing PCA using prince.
 * Review the insights gained from each of the visualizations.

### Q&A
 * Open the floor for questions.
 * Discuss any challenges faced during the lab.
 * Explore possible extensions or applications of PCA to other datasets.

## Additional Resources
 * **Documentation:** [Prince Documentation](https://prince.readthedocs.io/en/latest/)
 * **Further Reading:** Articles on PCA and dimensionality reduction techniques.