# Exploratory Data Analysis (EDA) on the Iris Dataset

## Background

The Iris dataset is one of the most widely used datasets for practicing data analysis and machine learning. It contains four measurements (sepal length, sepal width, petal length, and petal width) for 150 iris flowers from three species. This exercise explores the dataset to find patterns and insights.

## Dataset Overview

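CSV distributions of the Iris dataset typically include the following columns:

- SepalLengthCm: Length of the sepal in centimeters.
- SepalWidthCm: Width of the sepal in centimeters.
- PetalLengthCm: Length of the petal in centimeters.
- PetalWidthCm: Width of the petal in centimeters.
- Species: Species of the iris flower (Setosa, Versicolor, Virginica).

The scikit-learn copy loaded below names the four feature columns `sepal length (cm)`, `sepal width (cm)`, `petal length (cm)`, and `petal width (cm)`, and stores the species separately as integer codes in `target` with the matching names in `target_names`.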


## 1. Data Loading
Load the Iris dataset into a DataFrame using a data analysis library like pandas in Python.

In [1]:
import pandas as pd # for data manipulation
from sklearn.datasets import load_iris # load the iris dataset

In [2]:
# load the dataset once and build a DataFrame from its feature matrix
iris = load_iris()
df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
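
As an alternative to building the DataFrame by hand, scikit-learn can return pandas objects directly. The sketch below assumes scikit-learn 0.23 or newer, where `load_iris` accepts `as_frame=True`; the name `iris_frame` is chosen here for illustration only and is not used elsewhere in this exercise.

```python
from sklearn.datasets import load_iris

# as_frame=True asks scikit-learn (0.23+) to return pandas objects
iris_bunch = load_iris(as_frame=True)

# .frame holds the four feature columns plus an integer "target" column
iris_frame = iris_bunch.frame

print(iris_frame.shape)             # (150, 5)
print(iris_frame.columns.tolist())
```

The extra `target` column holds the species as codes 0, 1, and 2, so this frame carries the class labels that the feature-only `df` above does not.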

## 2. Data Inspection

- Use methods like .head(), .info(), and .describe() to get an overview of the dataset.
- Identify the unique species present in the dataset and their distribution.
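
The species labels are not part of the feature columns, so the second step needs the `target` and `target_names` attributes of the loaded dataset. A minimal sketch, assuming the scikit-learn copy of the data; the variable names `species`, `unique_species`, and `species_counts` are illustrative, and the labels are kept in a separate Series so that `df` is left unchanged:

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()

# Map the integer class codes (0, 1, 2) to their species names
species = pd.Series(
    pd.Categorical.from_codes(iris.target, iris.target_names),
    name="species",
)

# Unique species present in the dataset
unique_species = species.unique().tolist()
print(unique_species)

# Number of samples per species
species_counts = species.value_counts()
print(species_counts)
```

The dataset is balanced: each of the three species accounts for 50 of the 150 rows.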

In [3]:
# check the first 5 rows of the dataset
display(df.head())

|   | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) |
|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 |


In [4]:
# check dataset info
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 4 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   sepal length (cm)  150 non-null    float64
 1   sepal width (cm)   150 non-null    float64
 2   petal length (cm)  150 non-null    float64
 3   petal width (cm)   150 non-null    float64
dtypes: float64(4)
memory usage: 4.8 KB


In [5]:
# describe the dataset
df.describe()

|   | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) |
|---|---|---|---|---|
| count | 150.0 | 150.0 | 150.0 | 150.0 |
| mean | 5.843333 | 3.057333 | 3.758 | 1.199333 |
| std | 0.828066 | 0.435866 | 1.765298 | 0.762238 |
| min | 4.3 | 2.0 | 1.0 | 0.1 |
| 25% | 5.1 | 2.8 | 1.6 | 0.3 |
| 50% | 5.8 | 3.0 | 4.35 | 1.3 |
| 75% | 6.4 | 3.3 | 5.1 | 1.8 |
| max | 7.9 | 4.4 | 6.9 | 2.5 |
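
To make the rows of the summary concrete, the sketch below recomputes the statistics for one column with explicit pandas calls and compares them with `.describe()`. The choice of `petal length (cm)` and the names `col`, `summary`, and `described` are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)

col = "petal length (cm)"

# Each entry mirrors one row of the describe() output for this column
summary = {
    "count": df[col].count(),
    "mean": df[col].mean(),
    "std": df[col].std(),            # sample standard deviation (ddof=1)
    "min": df[col].min(),
    "25%": df[col].quantile(0.25),   # linear interpolation by default
    "50%": df[col].median(),
    "75%": df[col].quantile(0.75),
    "max": df[col].max(),
}

described = df[col].describe()
for name, value in summary.items():
    print(f"{name:>5}: manual={value:.6f}  describe={described[name]:.6f}")
```

For this column the median (4.35) is well above the mean (3.758), which suggests the values are not symmetrically distributed and may be worth examining per species.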
