# Goal
Create a notebook that loads the Iris dataset and generates large charts from it.

## Load the dataset

### Subtask:
Load the Iris dataset using scikit-learn.

**Reasoning**:
Import `load_iris` from `sklearn.datasets`, call it to load the Iris dataset, and print the keys of the returned object to see what it contains.


In [1]:
from sklearn.datasets import load_iris

iris_data = load_iris()

print(iris_data.keys())
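
The object returned by `load_iris()` is dictionary-like, so its contents can be checked before any plotting. The following is a minimal sketch of such a check; the names `iris`, `X`, and `y` are illustrative and do not appear elsewhere in this notebook.

```python
from sklearn.datasets import load_iris

iris = load_iris()

X = iris.data    # feature matrix: one row per flower, one column per measurement
y = iris.target  # integer class labels, one per row of X

print(X.shape)                  # number of samples and number of features
print(iris.feature_names)       # measurement names, in column order
print(list(iris.target_names))  # species names, indexed by the integer labels
```

The integer labels in `y` index into `target_names`, which is what the `species` mapping in the next cell relies on.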

## Create visualizations

### Subtask:
Generate various visualizations of the Iris dataset, adjusting plot parameters to make them large.

**Reasoning**:
Create a pandas DataFrame from the Iris data and add a `species` column. Then generate two large scatter plots and a large pair plot, each colored by species.


In [2]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# 1. Create a pandas DataFrame
df = pd.DataFrame(iris_data['data'], columns=iris_data['feature_names'])

# 2. Add a new column named 'species'
df['species'] = iris_data['target']
df['species'] = df['species'].map({i: name for i, name in enumerate(iris_data['target_names'])})

# 3. Generate a large scatter plot of 'sepal length (cm)' vs 'sepal width (cm)'
plt.figure(figsize=(10, 8))
sns.scatterplot(data=df, x='sepal length (cm)', y='sepal width (cm)', hue='species')
plt.title('Sepal Length vs Sepal Width by Species')
plt.show()

# 4. Generate a large scatter plot of 'petal length (cm)' vs 'petal width (cm)'
plt.figure(figsize=(10, 8))
sns.scatterplot(data=df, x='petal length (cm)', y='petal width (cm)', hue='species')
plt.title('Petal Length vs Petal Width by Species')
plt.show()

# 5. Generate a large pair plot of the entire dataset
# height is the side length in inches of each subplot (seaborn's default is 2.5),
# so the grid is roughly height * number_of_features inches per side.
sns.pairplot(df, hue='species', height=4)
plt.suptitle('Pair Plot of Iris Dataset by Species', y=1.02) # Add a title above the pair plot
plt.show()
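
The scatter plots above repeat `figsize=(10, 8)` in each `plt.figure()` call. As a hedged alternative, the sketch below sets the size once through matplotlib's `rc_context`, so every figure created inside the block picks it up. The dpi value of 100 is an assumption chosen for illustration, not a setting taken from this notebook.

```python
import matplotlib.pyplot as plt

# Apply a larger default size to every figure created inside this block,
# instead of passing figsize=(10, 8) to each plt.figure() call.
with plt.rc_context({"figure.figsize": (10, 8), "figure.dpi": 100}):
    fig, ax = plt.subplots()
    width_in, height_in = fig.get_size_inches()
    # Pixel dimensions are the size in inches multiplied by dots per inch.
    width_px = width_in * fig.dpi
    height_px = height_in * fig.dpi
    print(width_in, height_in, width_px, height_px)

plt.close(fig)  # release the empty demonstration figure
```

Note that `sns.pairplot` sizes its grid from `height` and `aspect` rather than from `figure.figsize`, so the pair plot still needs its own `height` argument.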

## Display the charts

### Subtask:
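Display the generated charts. The `plt.show()` calls in the previous cell already render each figure inline, so this step needs no additional code.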

## Summary

### Data Analysis Key Findings

*   The Iris dataset was successfully loaded using scikit-learn.
*   A pandas DataFrame was created from the dataset, including a 'species' column mapped to the target names.
*   Large scatter plots were generated visualizing the relationship between sepal length and width, and petal length and width, differentiated by species.
*   A large pair plot of the entire dataset was generated, showing relationships between all features, colored by species.

### Insights or Next Steps

*   In the petal plots, setosa forms a cluster that is well separated from the other two species, while versicolor and virginica lie close together with a small overlap. The sepal measurements overlap far more, so the petal measurements appear to be stronger discriminators.
*   Further analysis could involve building a classification model using these features to predict the species of an Iris flower.
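
As a rough check on both points above, the sketch below compares cross-validated accuracy for a classifier trained on sepal features only, petal features only, and all four features. The choice of `LogisticRegression`, 5-fold cross-validation, and the variable names are assumptions made for illustration; this is one possible next step, not an analysis carried out in the notebook.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

iris = load_iris()
X, y = iris.data, iris.target

# Column order in load_iris: sepal length, sepal width, petal length, petal width.
sepal_X = X[:, :2]
petal_X = X[:, 2:]

# max_iter is raised so the solver converges on the unscaled measurements.
model = LogisticRegression(max_iter=1000)

# Mean accuracy over 5 stratified folds for each feature subset.
sepal_acc = cross_val_score(model, sepal_X, y, cv=5).mean()
petal_acc = cross_val_score(model, petal_X, y, cv=5).mean()
all_acc = cross_val_score(model, X, y, cv=5).mean()

print(f"sepal only: {sepal_acc:.3f}")
print(f"petal only: {petal_acc:.3f}")
print(f"all four:   {all_acc:.3f}")
```

If the petal-only score is clearly higher than the sepal-only score, that supports the visual impression from the scatter plots.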

## Final Statistics

### Subtask:
Display summary statistics of the Iris dataset.

**Reasoning**:
Use the `describe()` method to display summary statistics (count, mean, standard deviation, minimum, quartiles, and maximum) for the numeric columns of the DataFrame.

In [4]:
display(df.describe())
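
`describe()` pools all three species together. A per-species breakdown can be sketched with `groupby`, as below. The block rebuilds the DataFrame so that it runs on its own, and the name `per_species` is illustrative.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
# Index the species names with the integer labels to get one name per row.
df["species"] = iris.target_names[iris.target]

# Mean of each measurement within each species.
per_species = df.groupby("species").mean()
print(per_species)
```

Comparing the rows of `per_species` shows how far apart the species means are for each measurement.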