# Task 1: Exploring and Visualizing the Iris Dataset

#### Problem Statement and Goal

The goal of this task is to explore the classic Iris dataset, visualize the relationships between its features, and identify how the three Iris species (setosa, versicolor, virginica) differ from one another.
This analysis also lays the groundwork for a later classification task.

## Step 1: Import Required Libraries

```python
# Import required libraries
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Set the default theme for all plots
sns.set_theme(style="darkgrid")
```

## Step 2: Load the Dataset

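The Iris dataset is available through Seaborn's `sns.load_dataset`, which downloads it from the online seaborn-data repository and caches it locally, so the first call needs an internet connection.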

```python
# Load the Iris dataset
iris = sns.load_dataset('iris')
```

## Step 3: Inspect the Dataset

We’ll check the dataset’s shape, column names, and the first few rows.

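```python
from IPython.display import display  # available by default in Jupyter; imported explicitly here

# Shape, column names, and a preview of the data
print("Dataset Shape:", iris.shape)
print("\nColumn Names:", iris.columns.tolist())
print("\nFirst 5 Rows:")
display(iris.head())
```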


Observation:

The dataset has 150 rows and 5 columns.

The columns are four numeric measurements in centimeters (sepal_length, sepal_width, petal_length, petal_width) and the categorical target column species.
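The observation can also be checked in code rather than by eye. The sketch below is a minimal check, assuming the column names produced by `sns.load_dataset('iris')`; `check_structure` and the toy frame are hypothetical, for illustration only. On the real data, `check_structure(iris)` should report 50 rows for each species, i.e. the classes are balanced.

```python
import pandas as pd

# Column layout assumed to match sns.load_dataset('iris')
FEATURES = ["sepal_length", "sepal_width", "petal_length", "petal_width"]


def check_structure(df: pd.DataFrame) -> pd.Series:
    """Raise if an expected column is missing; otherwise return rows per species."""
    missing = [c for c in FEATURES + ["species"] if c not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    # Sorting by species name gives a stable, readable order
    return df["species"].value_counts().sort_index()


# Toy rows in the iris layout (illustrative values, not the full dataset)
toy = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepal_width": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "petal_length": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    "petal_width": [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    "species": ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"],
})

counts = check_structure(toy)
print(counts)
# In the notebook: check_structure(iris)
```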

## Step 4: Summary Information

```python
print("\n--- Dataset Info ---")
iris.info()  # info() prints its report itself and returns None, so it is not wrapped in print()

print("\n--- Statistical Summary ---")
display(iris.describe())
```


Insights:

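The four measurement columns are numeric (float64); species is a categorical column stored as strings.

No missing values are present: every column reports 150 non-null entries.

All measurements are in centimeters, but their ranges differ (for example, petal width spans roughly 0.1 to 2.5 cm and sepal length roughly 4.3 to 7.9 cm), so scale-sensitive models such as k-nearest neighbors or SVMs may benefit from standardization if we proceed to training later.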
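Both insights can be checked, and the scale difference addressed, with plain pandas. This is a minimal sketch, assuming z-score standardization is the chosen remedy; `standardize` and the toy frame are hypothetical. In an actual modeling step, the mean and standard deviation should be computed on the training split only and then reused for the test split, so no test information leaks into training.

```python
import pandas as pd

FEATURES = ["sepal_length", "sepal_width", "petal_length", "petal_width"]


def standardize(df: pd.DataFrame, cols=FEATURES) -> pd.DataFrame:
    """Return a copy with each column in `cols` z-scored: (x - mean) / std."""
    out = df.copy()
    out[cols] = (df[cols] - df[cols].mean()) / df[cols].std()
    return out


# Toy rows in the iris layout (illustrative values, not the full dataset)
toy = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepal_width": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "petal_length": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    "petal_width": [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    "species": ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"],
})

missing_per_column = toy.isna().sum()  # 0 in every column means no missing values
scaled = standardize(toy)
print(missing_per_column)
print(scaled[FEATURES].agg(["mean", "std"]).round(3))  # means near 0, stds equal to 1
```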

## Step 5: Data Visualization

### (a) Scatter Plot — Relationship Between Features

We’ll explore how sepal length and petal length vary among species.

```python
plt.figure(figsize=(8,6))
sns.scatterplot(
    x='sepal_length',
    y='petal_length',
    hue='species',
    data=iris,
    palette='viridis'
)
plt.title('Scatter Plot: Sepal Length vs Petal Length')
plt.show()
```


Interpretation:

Setosa forms a distinct cluster with short petals, clearly separated from versicolor and virginica. Those two species overlap partially, although virginica tends toward longer sepals and petals.
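The visual separation of setosa can be confirmed numerically: if setosa's largest petal length is smaller than the smallest petal length of every other species, the clusters cannot overlap along that axis. A minimal sketch with hypothetical helper names; on the full data, `lowest_group_separated(feature_ranges(iris, "petal_length"))` should return True, while the versicolor and virginica ranges overlap.

```python
import pandas as pd


def feature_ranges(df: pd.DataFrame, feature: str) -> pd.DataFrame:
    """Min and max of `feature` for each species, ordered by the minimum."""
    return df.groupby("species")[feature].agg(["min", "max"]).sort_values("min")


def lowest_group_separated(ranges: pd.DataFrame) -> bool:
    """True if the species with the smallest values lies entirely below all others."""
    lowest, others = ranges.iloc[0], ranges.iloc[1:]
    return bool(lowest["max"] < others["min"].min())


# Toy rows (illustrative values, not the full dataset)
toy = pd.DataFrame({
    "petal_length": [1.4, 1.3, 4.7, 4.5, 6.0, 5.1],
    "species": ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"],
})

ranges = feature_ranges(toy, "petal_length")
print(ranges)
print("Lowest species separated:", lowest_group_separated(ranges))
```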

### (b) Histograms — Distribution of Feature Values

```python
# DataFrame.hist draws one histogram per numeric column (species is skipped)
iris.hist(figsize=(10,8), bins=15, color='skyblue', edgecolor='black')
plt.suptitle('Histograms of Iris Features')
plt.tight_layout()
plt.show()
```


Interpretation:

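Sepal length and sepal width look roughly unimodal and bell-shaped. Petal length and petal width are clearly bimodal: a separate group of small values (the setosa flowers, as the scatter plot suggests) sits apart from the rest. Because these histograms pool all species, the link to individual species is inferred rather than shown directly.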
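Because `iris.hist` pools all species, the clusters in the petal histograms have to be matched to species indirectly. A minimal sketch that colors one histogram by species with plain matplotlib; `species_histogram` is a hypothetical helper, and shared bin edges are a design choice so the bars of different species line up. Seaborn's `sns.histplot(data=iris, x='petal_length', hue='species')` should give a comparable figure with less code.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def species_histogram(df: pd.DataFrame, feature: str, bins: int = 15):
    """Overlay one semi-transparent histogram of `feature` per species."""
    # Shared bin edges so the species' bars are directly comparable
    edges = np.histogram_bin_edges(df[feature], bins=bins)
    fig, ax = plt.subplots(figsize=(8, 5))
    for name, group in df.groupby("species"):
        ax.hist(group[feature], bins=edges, alpha=0.5, edgecolor="black", label=name)
    ax.set_xlabel(feature)
    ax.set_ylabel("count")
    ax.set_title(f"Distribution of {feature} by species")
    ax.legend(title="species")
    return fig, ax


# Toy rows (illustrative values, not the full dataset)
toy = pd.DataFrame({
    "petal_length": [1.4, 1.3, 4.7, 4.5, 6.0, 5.1],
    "species": ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"],
})

fig, ax = species_histogram(toy, "petal_length", bins=10)
# In the notebook: species_histogram(iris, "petal_length"); plt.show()
```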

### (c) Box Plots — Identifying Outliers

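```python
plt.figure(figsize=(10,6))
# Standard box plots: whiskers reach 1.5 x IQR; points beyond them are drawn as outliers
sns.boxplot(data=iris.drop(columns=['species']))
plt.title('Box Plots of Iris Features')
plt.show()
```

Interpretation:

Because species is dropped, these box plots summarize each feature across all 150 flowers rather than per species, so on their own they cannot show species differences. Petal length has by far the widest spread. Only sepal width has a few points beyond the whiskers, and they lie just outside them, so there are no significant outliers in the dataset.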
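Standard box plots (for example `sns.boxplot`, whose default is `whis=1.5`) mark a point as an outlier when it lies more than 1.5 × IQR below the first quartile or above the third quartile. Counting those points per feature gives a concrete check on the outlier claim. A minimal sketch, with `iqr_outliers` as a hypothetical helper; on the full dataset this rule should flag a handful of sepal_width values and nothing in the other three features.

```python
import pandas as pd


def iqr_outliers(df: pd.DataFrame, cols=None, k: float = 1.5) -> pd.Series:
    """Count values outside [Q1 - k*IQR, Q3 + k*IQR] in each column (Tukey's rule)."""
    if cols is None:
        cols = df.select_dtypes("number").columns  # skip non-numeric columns such as species
    data = df[cols]
    q1, q3 = data.quantile(0.25), data.quantile(0.75)
    iqr = q3 - q1
    # Each column is compared against its own fences (alignment on column names)
    outside = data.lt(q1 - k * iqr) | data.gt(q3 + k * iqr)
    return outside.sum()


# Toy rows (illustrative values, not the full dataset); 4.4 is far above the rest
toy = pd.DataFrame({
    "sepal_width": [3.0, 3.1, 3.2, 3.3, 3.4, 4.4],
    "petal_width": [0.2, 0.3, 1.3, 1.5, 1.8, 2.3],
    "species": ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"],
})

counts = iqr_outliers(toy)
print(counts)
# In the notebook: iqr_outliers(iris)
```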

### (d) Pairplot — Combined Relationships and Distributions

```python
sns.pairplot(iris, hue='species', palette='husl')
plt.suptitle('Pairplot of Iris Dataset', y=1.02)
plt.show()
```


Interpretation:

The pairplot reveals that petal_length and petal_width provide the clearest separation between species.
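The pairplot reading can be backed by a single number per feature. One common choice, assumed here rather than taken from the analysis above, is the one-way ANOVA F statistic: between-species variance divided by within-species variance, so larger values mean better-separated species. `anova_f` and `rank_features` are hypothetical helpers; `scipy.stats.f_oneway` computes the same statistic if SciPy is available. On the full data, the petal features should come out on top.

```python
import pandas as pd


def anova_f(df: pd.DataFrame, feature: str, group: str = "species") -> float:
    """One-way ANOVA F statistic: between-group variance over within-group variance."""
    grouped = df.groupby(group)[feature]
    sizes, means = grouped.size(), grouped.mean()
    overall = df[feature].mean()
    ss_between = (sizes * (means - overall) ** 2).sum()
    ss_within = ((df[feature] - grouped.transform("mean")) ** 2).sum()
    k, n = len(sizes), len(df)
    return float((ss_between / (k - 1)) / (ss_within / (n - k)))


def rank_features(df: pd.DataFrame, features) -> pd.Series:
    """Sort features by F statistic, most separating first."""
    return pd.Series({f: anova_f(df, f) for f in features}).sort_values(ascending=False)


# Toy rows (illustrative values, not the full dataset)
toy = pd.DataFrame({
    "sepal_width": [3.4, 3.0, 3.2, 2.8, 3.0, 3.3],
    "petal_length": [1.4, 1.5, 4.5, 4.7, 5.8, 6.0],
    "species": ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"],
})

ranking = rank_features(toy, ["sepal_width", "petal_length"])
print(ranking)
# In the notebook: rank_features(iris, ["sepal_length", "sepal_width", "petal_length", "petal_width"])
```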

## Step 6: Insights and Conclusion

#### Key Takeaways:

1. The dataset is clean: there are no missing values, and the only outliers are a few mild sepal-width values.

2. Petal length and width are the most discriminative features for identifying species.

3. Setosa is easily separable from the other two species.

4. This exploratory analysis provides a solid foundation for building a classification model in future tasks.
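Takeaway 3 already suggests a trivial baseline that any future classifier should beat: a single threshold on petal length. The sketch below assumes a cutoff of 2.5 cm, a hypothetical value chosen to sit in the gap between setosa and the other two species; `predict_setosa` and `rule_accuracy` are illustrative helpers. On the full dataset this rule should be correct for all 150 flowers, but it only answers "setosa or not". Separating versicolor from virginica, whose measurements overlap, is the part that needs an actual model.

```python
import pandas as pd

# Hypothetical cutoff placed in the gap between setosa and the other species
SETOSA_CUTOFF_CM = 2.5


def predict_setosa(df: pd.DataFrame, cutoff: float = SETOSA_CUTOFF_CM) -> pd.Series:
    """Boolean prediction: True where petal_length is below the cutoff."""
    return df["petal_length"] < cutoff


def rule_accuracy(df: pd.DataFrame, cutoff: float = SETOSA_CUTOFF_CM) -> float:
    """Share of rows where the rule agrees with whether the flower is setosa."""
    actual = df["species"].eq("setosa")
    return float((predict_setosa(df, cutoff) == actual).mean())


# Toy rows (illustrative values, not the full dataset)
toy = pd.DataFrame({
    "petal_length": [1.4, 1.3, 4.7, 4.5, 6.0, 5.1],
    "species": ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"],
})

print("Predicted setosa:", predict_setosa(toy).tolist())
print("Accuracy:", rule_accuracy(toy))
# In the notebook: rule_accuracy(iris)
```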