## Introduction

In this notebook, I explore the Iris dataset using Python.

The goal is to load the dataset, understand its structure, and perform basic exploratory data analysis (EDA).

This includes generating summary statistics and visualizing the data with scatter plots, histograms, box plots, and a pair plot.

These steps help identify relationships between features, understand data distribution, and detect any patterns or outliers.

## Data Exploration

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
df = sns.load_dataset('iris')

In [None]:
df.shape

In [None]:
df.columns

In [None]:
df.head()

In [None]:
df.info()

In [None]:
df.describe()
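
`df.describe()` summarizes all 150 rows together, which can hide differences between species. A minimal sketch of a per-species summary follows. The helper name `summarize_by_species` and the small `toy` frame are hypothetical and only illustrate the column layout; the toy values are not real Iris measurements.

```python
import pandas as pd


def summarize_by_species(frame: pd.DataFrame, label_col: str = "species") -> pd.DataFrame:
    """Return count, mean, and standard deviation of each numeric column per class."""
    numeric_cols = list(frame.select_dtypes(include="number").columns)
    return frame.groupby(label_col)[numeric_cols].agg(["count", "mean", "std"])


# Hand-made rows that mimic the Iris column layout (illustrative values only)
toy = pd.DataFrame({
    "sepal_length": [5.0, 5.2, 6.0, 6.4],
    "petal_length": [1.4, 1.6, 4.5, 4.7],
    "species": ["setosa", "setosa", "versicolor", "versicolor"],
})

summary = summarize_by_species(toy)
print(summary)

# Class balance and missing values are two more quick structural checks
print(toy["species"].value_counts())
print(toy.isna().sum())
```

With the real data, the same call would be `summarize_by_species(df)`.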

## Data Visualization

In [None]:
sns.set_style('whitegrid')

In [None]:
# Scatter Plot using Subplots

fig, axs = plt.subplots(1, 2, figsize=(10, 5))

sns.scatterplot(data=df, x='sepal_length', y='sepal_width', hue='species', ax=axs[0])
axs[0].set_title('Relation between Sepal Length and Width')
axs[0].set_xlabel('Sepal Length')
axs[0].set_ylabel('Sepal Width')

sns.scatterplot(data=df, x='petal_length', y='petal_width', hue='species', ax=axs[1])
axs[1].set_title('Relation between Petal Length and Width')
axs[1].set_xlabel('Petal Length')
axs[1].set_ylabel('Petal Width')

plt.suptitle('Scatter Plot between Lengths and Widths')

axs[1].legend(title='Species', loc='lower right')

plt.tight_layout()
plt.show()


In [None]:
# Histograms for all 4 numeric columns

fig, axs = plt.subplots(2, 2, figsize=(10, 8))

sns.histplot(data=df, x='sepal_length', hue='species', kde=True, ax=axs[0, 0])
axs[0,0].set_title('Distribution of Sepal Length')

sns.histplot(data=df, x='sepal_width', hue='species', kde=True, ax=axs[0, 1])
axs[0,1].set_title('Distribution of Sepal Width')

sns.histplot(data=df, x='petal_length', hue='species', kde=True, ax=axs[1, 0])
axs[1,0].set_title('Distribution of Petal Length')

sns.histplot(data=df, x='petal_width', hue='species', kde=True, ax=axs[1, 1])
axs[1, 1].set_title('Distribution of Petal Width')

plt.suptitle('Data Distribution using Histogram')
plt.tight_layout()
plt.show()


In [None]:
# BoxPlot of Length and Width for each Species

palette = sns.color_palette('Set2', 3)

fig, axs = plt.subplots(2, 2, figsize=(10, 8))

# hue='species' with legend=False colors each box by species without a redundant legend (needs seaborn 0.13+)
sns.boxplot(data=df, x='species', y='sepal_length', hue='species', legend=False, ax=axs[0, 0], palette=palette)
axs[0,0].set_title('Sepal Length for each Species')

sns.boxplot(data=df, x='species', y='sepal_width', hue='species', legend=False, ax=axs[0, 1], palette=palette)
axs[0,1].set_title('Sepal Width for each Species')

sns.boxplot(data=df, x='species', y='petal_length', hue='species', legend=False, ax=axs[1, 0], palette=palette)
axs[1,0].set_title('Petal Length for each Species')

sns.boxplot(data=df, x='species', y='petal_width', hue='species', legend=False, ax=axs[1, 1], palette=palette)
axs[1,1].set_title('Petal Width for each Species')

plt.suptitle('Boxplot of Length and Width for each Species')

plt.tight_layout()
plt.show()
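
The points drawn beyond the whiskers in a box plot can also be listed numerically. The sketch below applies the common 1.5 × IQR rule, which is the default whisker setting in matplotlib and seaborn. The function name `iqr_outliers` and the `toy` values are hypothetical, chosen only to show the mechanics.

```python
import pandas as pd


def iqr_outliers(values: pd.Series, whisker: float = 1.5) -> pd.Series:
    """Return the entries lying more than `whisker` * IQR outside the quartiles."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    lower = q1 - whisker * iqr
    upper = q3 + whisker * iqr
    return values[(values < lower) | (values > upper)]


# Illustrative values only: five similar measurements and one extreme value
toy = pd.Series([4.9, 5.0, 5.1, 5.0, 5.2, 7.9], name="sepal_length")

flagged = iqr_outliers(toy)
print(flagged)
```

On the real data, something like `df.groupby('species')['sepal_width'].apply(iqr_outliers)` should list roughly the same points that the box plots draw as fliers.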

In [None]:
# Pairwise relationship

sns.pairplot(df, hue='species')

plt.show()
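
A correlation matrix is one way to put numbers on the pairwise relationships that the pair plot shows. This is a minimal sketch, assuming pandas 1.5 or later for the `numeric_only` argument. The helper `feature_correlations` and the `toy` frame are hypothetical, and the toy values are deliberately perfectly linear rather than real measurements.

```python
import pandas as pd


def feature_correlations(frame: pd.DataFrame) -> pd.DataFrame:
    """Return Pearson correlations between the numeric columns, skipping labels."""
    return frame.corr(numeric_only=True)


# Illustrative values only: two columns rise together, one falls
toy = pd.DataFrame({
    "petal_length": [1.0, 2.0, 3.0, 4.0],
    "petal_width": [0.5, 1.0, 1.5, 2.0],
    "sepal_width": [4.0, 3.0, 2.0, 1.0],
    "species": ["a", "a", "b", "b"],
})

corr = feature_correlations(toy)
print(corr.round(2))
```

With the notebook's data the call would be `feature_correlations(df)`, and `sns.heatmap(corr, annot=True)` is one option for displaying the result.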