# Lab 05 - Iris Dataset Analysis

Start by copying this lab notebook into your notebook folder, then run it cell by cell from there.

The Iris dataset is a well-known dataset in the machine learning community, often used to demonstrate classification algorithms and exploratory data analysis. It contains 150 samples, 50 from each of three species of iris flowers, with four measurements per sample.

## Description of the Dataset

The DataFrame used in this lab has the following columns:
- `sepal length (cm)`
- `sepal width (cm)`
- `petal length (cm)`
- `petal width (cm)`
- `species` (Iris-setosa, Iris-versicolor, or Iris-virginica)

A common task is to classify the iris species from these four measurements.

## Loading the Dataset

We will begin by loading the dataset and taking a look at its structure.

In [None]:
import pandas as pd
from sklearn.datasets import load_iris
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px

# Load Iris dataset
iris = load_iris()
iris_df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
iris_df['species'] = iris.target

# Map target numbers to actual species names
species_map = {0: 'Iris-setosa', 1: 'Iris-versicolor', 2: 'Iris-virginica'}
iris_df['species'] = iris_df['species'].map(species_map)

# Display the first few rows of the dataset
iris_df.head()
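
Beyond the first few rows, it can help to check the overall size of the table and how many samples each species has. The cell below is a minimal sketch that rebuilds the same DataFrame so it runs on its own; the names `n_rows`, `n_cols`, and `species_counts` are introduced here for illustration only.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Rebuild the DataFrame exactly as in the loading cell above
iris = load_iris()
iris_df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
species_map = {0: 'Iris-setosa', 1: 'Iris-versicolor', 2: 'Iris-virginica'}
iris_df['species'] = pd.Series(iris.target).map(species_map)

# Number of rows (samples) and columns (four features plus the species label)
n_rows, n_cols = iris_df.shape
print(f"rows: {n_rows}, columns: {n_cols}")

# Number of samples per species
species_counts = iris_df['species'].value_counts()
print(species_counts)
```

Note that the column count includes the `species` label, so the number of measured features is one less than `n_cols`.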

## Summary Statistics

Next, we look at summary statistics (count, mean, standard deviation, minimum, quartiles, and maximum) for the numerical columns in the dataset.

In [None]:
# Summary statistics
iris_df.describe()
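
`describe()` pools all three species together, which can hide differences between them. As a possible follow-up, the sketch below groups the rows by species and computes the mean of each feature. It rebuilds the DataFrame so that it is self-contained; `species_means` is a name chosen here for illustration.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Rebuild the DataFrame as in the loading cell
iris = load_iris()
iris_df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
species_map = {0: 'Iris-setosa', 1: 'Iris-versicolor', 2: 'Iris-virginica'}
iris_df['species'] = pd.Series(iris.target).map(species_map)

# Mean of every feature, computed separately for each species
species_means = iris_df.groupby('species').mean()
print(species_means.round(2))
```

Comparing the rows of `species_means` gives a first hint of which features separate the species most clearly.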

## Data Visualization

Visualizing the data helps show how each feature is distributed and how the species differ from one another. We start with a pairplot from `seaborn`, which draws a scatter plot for every pair of features and the distribution of each feature on the diagonal, colored by species.

In [None]:
# Pairplot of the dataset
sns.pairplot(iris_df, hue='species')
plt.show()
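
The pairplot shows the pairwise relationships visually. One way to put numbers on them is a Pearson correlation matrix of the four features, sketched below. This pools all species together, which is an assumption made here for simplicity; correlations within a single species can look quite different. The name `corr` is illustrative.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Rebuild the DataFrame as in the loading cell
iris = load_iris()
iris_df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
species_map = {0: 'Iris-setosa', 1: 'Iris-versicolor', 2: 'Iris-virginica'}
iris_df['species'] = pd.Series(iris.target).map(species_map)

# Pearson correlation between the four numeric features (species label dropped)
corr = iris_df.drop(columns='species').corr()
print(corr.round(2))
```

Values close to 1 or -1 correspond to the pairplot panels where the points fall near a straight line.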

## 3D Scatter Plot using Plotly

Finally, we create an interactive 3D scatter plot with Plotly to see how each species spreads across three selected features: sepal length, sepal width, and petal length.

In [None]:
# 3D Scatter plot
fig = px.scatter_3d(iris_df, 
                    x='sepal length (cm)', 
                    y='sepal width (cm)', 
                    z='petal length (cm)', 
                    color='species',
                    title='3D Scatter plot of the Iris Dataset')
fig.show()


## Wrap up

This notebook provides an overview and initial exploratory analysis of the Iris dataset. We examined the structure and summary statistics of the data, and visualized it using pairplots and a 3D scatter plot to help understand the relationships between the different species and their features.

Update your Overleaf.

In particular, note how many dimensions the Iris dataset has.