---
title: "Iris Flower Dataset Analysis"
author: "Azizbek Ganiev"
date: today
format:
  html:
    toc: true
    toc-depth: 2
    toc-expand: 2
    toc-title: "Contents"
    toc-location: body
    smooth-scroll: true
    theme: superhero
    code-fold: true
    code-summary: "Show/hide code"
    embed-resources: true
number-sections: true
number-depth: 2
execute:
  echo: true
  warning: false
  freeze: auto
code-annotations: below
title-block-banner: true
---

## Introduction

The Iris flower dataset is a multivariate dataset introduced by the British statistician and biologist Ronald Fisher in his 1936 paper "The use of multiple measurements in taxonomic problems". It is one of the best-known datasets in the pattern recognition literature.

The dataset contains 150 samples in total: 50 from each of three species of Iris flowers (*Iris setosa*, *Iris versicolor*, and *Iris virginica*). Four features were measured on each sample, in centimeters: sepal length, sepal width, petal length, and petal width.
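
The dataset's dimensions and per-species counts can be checked directly. The snippet below is a minimal sketch that assumes the copy of the data bundled with scikit-learn (`load_iris`); the variable names `n_samples`, `n_features`, and `counts` are illustrative only.

```python
import numpy as np
from sklearn.datasets import load_iris

# Assumption: scikit-learn's bundled copy of the Iris data
iris = load_iris()

# Rows are samples, columns are the measured features
n_samples, n_features = iris.data.shape

# Count how many samples carry each species label
counts = {
    str(name): int(n)
    for name, n in zip(iris.target_names, np.bincount(iris.target))
}

print(f"samples: {n_samples}, features: {n_features}")
print(counts)
```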


```{python}
import pandas as pd
from sklearn.datasets import load_iris

# Load Iris dataset
iris = load_iris()
df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
df['species'] = pd.Categorical.from_codes(iris.target, iris.target_names)
df.head()
```

## Basic Statistics

```{python}
df.describe()
```

## Visualize Features

```{python}
import matplotlib.pyplot as plt
import seaborn as sns

sns.pairplot(df, hue='species')
plt.show()
```

## Observations

- *Iris setosa* is linearly separable from the other two species.
- *Iris versicolor* and *Iris virginica* overlap on every one of the four measurements, so no single feature separates them.
- Petal length and width separate the species more cleanly than sepal length and width.
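
These observations can be checked numerically rather than only by eye. The sketch below is one possible check, not a definitive analysis: it compares per-species ranges of petal length, then compares 5-fold cross-validated accuracy of a logistic regression trained on sepal features only versus petal features only. The choice of logistic regression, the 5 folds, and the helper name `cv_accuracy` are assumptions made for illustration.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df["species"] = iris.target_names[iris.target]  # plain string labels

# Smallest and largest petal length observed for each species
petal_range = df.groupby("species")["petal length (cm)"].agg(["min", "max"])
print(petal_range)

# If setosa's largest petal is shorter than every other species' smallest,
# a single threshold on petal length separates setosa from the rest.
setosa_max = petal_range.loc["setosa", "max"]
others_min = petal_range.drop(index="setosa")["min"].min()
print("setosa separable by petal length alone:", setosa_max < others_min)

# versicolor and virginica overlap if their petal-length ranges intersect
overlap = petal_range.loc["versicolor", "max"] >= petal_range.loc["virginica", "min"]
print("versicolor/virginica petal lengths overlap:", overlap)


def cv_accuracy(columns):
    """Mean 5-fold accuracy of logistic regression on the given columns (hypothetical helper)."""
    model = LogisticRegression(max_iter=1000)
    return cross_val_score(model, df[columns], iris.target, cv=5).mean()


sepal_acc = cv_accuracy(["sepal length (cm)", "sepal width (cm)"])
petal_acc = cv_accuracy(["petal length (cm)", "petal width (cm)"])
print(f"sepal-only accuracy: {sepal_acc:.3f}")
print(f"petal-only accuracy: {petal_acc:.3f}")
```

A different classifier or validation scheme would give different accuracy values, so the comparison is best read as a rough indication of which feature pair carries more class information.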
