# Exploring the Iris Dataset: A Jupyter Notebook Example

# Task 1: Load and Explore the Dataset

In this section, we load the Iris dataset using `sklearn.datasets`. Then we preview the dataset and explore its structure, including data types and missing values.


In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_iris

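# Load the Iris dataset as a pandas DataFrame (as_frame=True needs scikit-learn >= 0.23)
try:
    iris = load_iris(as_frame=True)
    df = iris.frame  # 150 rows: four feature columns plus the integer 'target' column
    print("✅ Dataset loaded successfully.")
except Exception as e:
    print("❌ Error loading dataset:", e)
    raise  # stop here, because every later cell depends on `df`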
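
As a minimal sketch, assuming you prefer a loader that fails loudly instead of printing and continuing, the step can be wrapped in a function that also checks the expected shape. `load_iris_frame` is a hypothetical helper written for this example, not part of scikit-learn.

```python
import pandas as pd
from sklearn.datasets import load_iris


def load_iris_frame() -> pd.DataFrame:
    """Hypothetical helper: load Iris as a DataFrame and sanity-check its shape."""
    iris = load_iris(as_frame=True)
    frame = iris.frame.copy()
    # Iris ships with 150 samples: 4 numeric feature columns plus the integer 'target' column.
    if frame.shape != (150, 5):
        raise ValueError(f"Unexpected Iris shape: {frame.shape}")
    return frame


iris_df = load_iris_frame()
print(iris_df.shape)  # (150, 5)
```

Raising an exception here means a broken environment stops the notebook at the first cell rather than surfacing later as a confusing `NameError`.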


## Preview the dataset

Let's look at the first few rows to understand the structure and content.


In [None]:
df.head()


## Check data structure and missing values

We inspect the data types and check for any null values. This tells us whether the data needs cleaning.


In [None]:
df.info()
print("\nMissing values:\n", df.isnull().sum())
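
`df.info()` and `df.isnull().sum()` produce two separate outputs. A minimal sketch that combines dtype, non-null count, and missing count into one table is shown below; `column_report` is a hypothetical helper, and the tiny `demo` frame (with one deliberately missing value) is illustrative only.

```python
import numpy as np
import pandas as pd


def column_report(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: one row per column with dtype, non-null and missing counts."""
    return pd.DataFrame({
        "dtype": frame.dtypes.astype(str),
        "non_null": frame.notna().sum(),
        "missing": frame.isna().sum(),
    })


# Illustrative frame: column 'a' has one NaN, column 'b' is complete.
demo = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": ["x", "y", "z"]})
report = column_report(demo)
print(report)
```

Calling `column_report(df)` on the Iris frame should show zero missing values in every column.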


# Task 2: Basic Data Analysis

We compute basic descriptive statistics and analyze grouped summaries to better understand the dataset.


In [None]:
# Basic statistics
df.describe()
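
`describe()` reports count, mean, standard deviation, minimum, quartiles, and maximum. The sketch below recomputes those statistics by hand for a small illustrative series, assuming pandas' defaults: the sample standard deviation (`ddof=1`) and linear interpolation for quantiles, which matches NumPy's default `np.percentile`.

```python
import numpy as np
import pandas as pd

# Small illustrative series (not Iris data) so every number is easy to check.
values = pd.Series([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
summary = values.describe()

arr = values.to_numpy()
manual = {
    "count": float(len(arr)),
    "mean": arr.mean(),
    # pandas uses the sample standard deviation (divides by n - 1).
    "std": arr.std(ddof=1),
    "min": arr.min(),
    # Quartiles use linear interpolation between sorted values.
    "25%": np.percentile(arr, 25),
    "50%": np.percentile(arr, 50),
    "75%": np.percentile(arr, 75),
    "max": arr.max(),
}

for stat, expected in manual.items():
    assert np.isclose(summary[stat], expected), stat
print(summary)
```

Here the mean is 5.0 and the median is 4.5, halfway between the fourth and fifth sorted values.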


## Grouping by species

We first map the integer `target` codes to species names, then group by species and compute the mean of each numeric feature.


In [None]:
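# Map integer target codes (0, 1, 2) to species names; the plots below rely on this column
df['species'] = df['target'].map(dict(enumerate(iris.target_names)))
# Mean of each feature per species ('target' is dropped, since its mean is just the code)
df.drop(columns='target').groupby('species').mean()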
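
`groupby(...).mean()` is a split-apply-combine operation. The sketch below spells the same computation out by hand on a tiny illustrative frame (the column names `species` and `length` and the values are made up for the example) and checks that both routes agree.

```python
import pandas as pd

demo = pd.DataFrame({
    "species": ["a", "a", "b", "b", "b"],
    "length": [1.0, 3.0, 4.0, 5.0, 6.0],
})

# Built-in split-apply-combine.
grouped = demo.groupby("species")["length"].mean()

# The same steps by hand: split rows by label, apply mean, combine into a Series.
manual = pd.Series({
    label: demo.loc[demo["species"] == label, "length"].mean()
    for label in sorted(demo["species"].unique())
})

print(grouped)
assert grouped.to_dict() == manual.to_dict()
```

`groupby` sorts the group keys by default, which is why the manual version iterates over sorted labels.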


# Task 3: Data Visualization

Now we create visualizations to better understand patterns and relationships in the data. We create four types of chart: a line chart, a bar chart, a histogram, and a scatter plot.


## 1. Line Chart - Average Sepal Length per Species

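A simple line plot of the average sepal length for each species. Because species is categorical, the connecting line only guides the eye; it does not imply an ordering between species.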


In [None]:
sns.set(style="whitegrid")
avg_sepal = df.groupby('species')['sepal length (cm)'].mean()
avg_sepal.plot(kind='line', marker='o', title="Average Sepal Length per Species")
plt.xlabel('Species')
plt.ylabel('Sepal Length (cm)')
plt.grid(True)
plt.show()
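
A minimal sketch, assuming you want explicit control over where each category sits on the x-axis: `plot_category_means` is a hypothetical helper that places categories at integer positions 0, 1, 2, … and labels one tick per category. The demo series is illustrative, not Iris data.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_category_means(means: pd.Series, ylabel: str):
    """Hypothetical helper: draw per-category means as a marked line, one tick per category."""
    fig, ax = plt.subplots()
    positions = list(range(len(means)))
    ax.plot(positions, means.to_numpy(), marker="o")
    ax.set_xticks(positions)
    ax.set_xticklabels(list(means.index))
    ax.set_xlabel(means.index.name or "category")
    ax.set_ylabel(ylabel)
    return fig, ax


# Illustrative input; in the notebook this would be df.groupby('species')['sepal length (cm)'].mean().
demo_means = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])
fig, ax = plot_category_means(demo_means, "Value")
```

Returning the figure and axes instead of calling `plt.show()` keeps the helper reusable; in the notebook you would call `plt.show()` afterwards.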


## 2. Bar Chart - Average Petal Length per Species

This bar chart compares the average petal length across species.


In [None]:
avg_petal = df.groupby('species')['petal length (cm)'].mean()
avg_petal.plot(kind='bar', color='coral', title="Average Petal Length per Species")
plt.ylabel('Petal Length (cm)')
plt.xlabel('Species')
plt.tight_layout()
plt.show()


## 3. Histogram - Distribution of Sepal Width

This histogram shows how sepal width values are distributed across 15 equal-width bins.


In [None]:
plt.hist(df['sepal width (cm)'], bins=15, color='skyblue', edgecolor='black')
plt.title("Histogram of Sepal Width")
plt.xlabel("Sepal Width (cm)")
plt.ylabel("Frequency")
plt.grid(True)
plt.show()
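
`plt.hist` delegates binning to NumPy's `np.histogram`. With an integer `bins`, the range from the minimum to the maximum value is split into that many equal-width intervals; each interval is half-open except the last, which also includes the maximum. A worked example on a small illustrative sample:

```python
import numpy as np

# Illustrative sample; in the notebook this would be df['sepal width (cm)'].
values = np.array([2.0, 2.5, 3.0, 3.0, 3.5, 4.0])

# 4 equal-width bins over [2.0, 4.0] -> width 0.5 and 5 edges.
counts, edges = np.histogram(values, bins=4)

print(edges)   # [2.  2.5 3.  3.5 4. ]
print(counts)  # [1 1 2 2]
```

The value 4.0 lands in the last bin because that bin is closed on the right, so the counts always sum to the number of samples.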


## 4. Scatter Plot - Sepal Length vs Petal Length

This scatter plot shows the relationship between sepal length and petal length, colored by species.


In [None]:
sns.scatterplot(data=df, x='sepal length (cm)', y='petal length (cm)', hue='species')
plt.title("Sepal Length vs Petal Length by Species")
plt.xlabel("Sepal Length (cm)")
plt.ylabel("Petal Length (cm)")
plt.legend(title="Species")
plt.show()
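
The scatter plot suggests a positive relationship between sepal length and petal length. A minimal sketch that quantifies it with the Pearson correlation coefficient, computed by hand from centered values and cross-checked against pandas' `Series.corr`:

```python
import numpy as np
from sklearn.datasets import load_iris

iris_df = load_iris(as_frame=True).frame
x = iris_df["sepal length (cm)"].to_numpy()
y = iris_df["petal length (cm)"].to_numpy()

# Pearson r = sum(xc * yc) / sqrt(sum(xc^2) * sum(yc^2)), with xc, yc centered on their means.
xc, yc = x - x.mean(), y - y.mean()
r_manual = (xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum())

# pandas computes Pearson correlation by default.
r_pandas = iris_df["sepal length (cm)"].corr(iris_df["petal length (cm)"])
print(f"Pearson r: {r_manual:.3f}")
```

A value close to +1 indicates a strong positive linear relationship; `df[features].corr()` gives the full matrix for all feature pairs.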


# Findings and Observations

Based on our analysis and visualizations, we summarize the main insights from the Iris dataset.


In [None]:
print("""
🔎 Findings:
- On average, Setosa has shorter petals and sepals than the other two species.
- Versicolor and Virginica have overlapping distributions, but Virginica tends to have longer petals.
- Sepal length and petal length are positively correlated.
- The sepal width distribution appears approximately normal.
""")
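
The findings above are read off the charts. A minimal sketch that backs several of them with numbers: per-species petal-length ranges for the separation and overlap claims, species means for the "shorter" claim, and sample skewness as a rough (not formal) check on the symmetry of sepal width.

```python
from sklearn.datasets import load_iris

iris = load_iris(as_frame=True)
frame = iris.frame.copy()
frame["species"] = frame["target"].map(dict(enumerate(iris.target_names)))

# Min, mean and max petal length per species.
petal = frame.groupby("species")["petal length (cm)"].agg(["min", "mean", "max"])
print(petal)

# Setosa is separated if its longest petal is shorter than the shortest of the others.
setosa_separated = petal.loc["setosa", "max"] < petal.drop(index="setosa")["min"].min()

# Two ranges overlap if each one starts before the other ends.
overlap = (petal.loc["versicolor", "max"] >= petal.loc["virginica", "min"]
           and petal.loc["virginica", "max"] >= petal.loc["versicolor", "min"])

# Setosa should have the smallest mean sepal length.
sepal_means = frame.groupby("species")["sepal length (cm)"].mean()

# Skewness near 0 is consistent with (but does not prove) a roughly symmetric distribution.
sepal_width_skew = frame["sepal width (cm)"].skew()
print(setosa_separated, overlap, round(sepal_width_skew, 3))
```

For a formal normality test you would need a dedicated statistical test rather than skewness alone.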
