# Iris Dataset Data Exploration

**Author:** Data Science Essentials Project  
**Date:** September 19, 2025  
**Purpose:** Basic data loading and exploration of the Iris dataset

This notebook demonstrates how to use our custom `PandasDataReader` class to load the Iris dataset and explore it with `head()`, `tail()`, and `describe()`.

## Prerequisites

**Before running this notebook**, make sure you have set up the project structure and downloaded the required data:

```bash
# From the project root directory
python setup.py
```

This will:
- Create the `data/` directory structure  
- Download the Iris dataset to `data/raw/iris.csv`
- Install required dependencies
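
To confirm that the setup step worked, a check like the following can be run from anywhere inside the project. It is a minimal sketch under stated assumptions: the helper names `find_project_root` and `iris_csv_path` are hypothetical, the project root is taken to be the nearest parent directory containing `setup.py`, and the data file is expected at `data/raw/iris.csv` as listed above.

```python
from pathlib import Path
from typing import Optional


def find_project_root(start: Path, marker: str = "setup.py") -> Optional[Path]:
    """Return the nearest directory at or above `start` that contains `marker`."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).is_file():
            return candidate
    return None


def iris_csv_path(start: Path) -> Optional[Path]:
    """Return the expected location of iris.csv, or None if no root is found."""
    root = find_project_root(start)
    if root is None:
        return None
    # Assumed layout: <project root>/data/raw/iris.csv
    return root / "data" / "raw" / "iris.csv"


if __name__ == "__main__":
    path = iris_csv_path(Path.cwd())
    if path is None:
        print("Could not locate the project root (no setup.py found).")
    elif path.is_file():
        print(f"Found dataset: {path}")
    else:
        print(f"Dataset missing; run `python setup.py` first. Expected: {path}")
```

Resolving the path from a marker file rather than a fixed `../../` prefix keeps the check independent of the directory the notebook is launched from.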

---

## 1. Setup and Data Loading

In [4]:
# Add project root to Python path
import sys
import os
sys.path.append('../../')

from source.data_readers import PandasDataReader

# Check that iris.csv exists; if it does not, print setup instructions and stop
iris_path = '../../data/raw/iris.csv'
if not os.path.exists(iris_path):
    print("Data file not found!")
    print("It looks like the project hasn't been set up yet.")
    print()
    print("To download the required data files, run:")
    print("   cd ../../")
    print("   python setup.py")
    print()
    print("This will create the data/ directory structure and download iris.csv")
    raise FileNotFoundError(f"Data file not found: {iris_path}")

# Load Iris dataset from data/raw/ directory
data_reader = PandasDataReader(
    file_path=iris_path,
    separator=',',
    decimal='.',
    header=False,
    names=['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'target']
)

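
The implementation of `PandasDataReader` lives in `source/data_readers` and is not shown in this notebook. As a rough mental model only, a reader with this interface could be a thin wrapper around `pandas.read_csv`. The sketch below is an assumption, not the project's actual code: the class name `SketchDataReader` is hypothetical, and it guesses that `header=False` means the file has no header row.

```python
import io

import pandas as pd


class SketchDataReader:
    """Hypothetical stand-in for PandasDataReader; not the project's real class."""

    def __init__(self, file_path, separator=",", decimal=".", header=True, names=None):
        # Assumption: header=False means "no header row", which pandas spells header=None.
        self.df = pd.read_csv(
            file_path,
            sep=separator,
            decimal=decimal,
            header=0 if header else None,
            names=names,
        )

    def head(self, n=5):
        """Return the first n rows."""
        return self.df.head(n)

    def tail(self, n=5):
        """Return the last n rows."""
        return self.df.tail(n)

    def describe(self):
        """Return summary statistics for the numeric columns."""
        return self.df.describe()


# Usage with an in-memory file holding the first two rows of the dataset
columns = ["sepal_length", "sepal_width", "petal_length", "petal_width", "target"]
sample = io.StringIO(
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "4.9,3.0,1.4,0.2,Iris-setosa\n"
)
reader = SketchDataReader(sample, header=False, names=columns)
print(reader.head(1))
```

Passing `names` together with a headerless file is what gives the columns readable labels; without it, pandas would number the columns 0 through 4.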

## 2. Basic Data Exploration

In [None]:
# Display first 5 rows of the dataset
data_reader.head()

|   | sepal_length | sepal_width | petal_length | petal_width | target |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |


In [None]:
# Display last 5 rows of the dataset
data_reader.tail()

|   | sepal_length | sepal_width | petal_length | petal_width | target |
|---|---|---|---|---|---|
| 145 | 6.7 | 3.0 | 5.2 | 2.3 | Iris-virginica |
| 146 | 6.3 | 2.5 | 5.0 | 1.9 | Iris-virginica |
| 147 | 6.5 | 3.0 | 5.2 | 2.0 | Iris-virginica |
| 148 | 6.2 | 3.4 | 5.4 | 2.3 | Iris-virginica |
| 149 | 5.9 | 3.0 | 5.1 | 1.8 | Iris-virginica |


In [None]:
# Display first 2 rows of the dataset
data_reader.head(2)

|   | sepal_length | sepal_width | petal_length | petal_width | target |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |


In [None]:
# Display column names
data_reader.df.columns.tolist()

['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'target']

In [None]:
# Generate descriptive statistics
data_reader.describe()

|   | sepal_length | sepal_width | petal_length | petal_width |
|---|---|---|---|---|
| count | 150.0 | 150.0 | 150.0 | 150.0 |
| mean | 5.843333 | 3.054 | 3.758667 | 1.198667 |
| std | 0.828066 | 0.433594 | 1.76442 | 0.763161 |
| min | 4.3 | 2.0 | 1.0 | 0.1 |
| 25% | 5.1 | 2.8 | 1.6 | 0.3 |
| 50% | 5.8 | 3.0 | 4.35 | 1.3 |
| 75% | 6.4 | 3.3 | 5.1 | 1.8 |
| max | 7.9 | 4.4 | 6.9 | 2.5 |
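
The `std` and quartile rows deserve a short note. Assuming `data_reader.describe()` forwards to pandas' `DataFrame.describe` with default settings, `std` is the sample standard deviation (divisor n - 1) and the 25%, 50%, and 75% rows are quantiles computed with linear interpolation. The sketch below reproduces those definitions with the standard library on the five `sepal_length` values returned by `head()`; it illustrates the definitions only and does not reproduce the 150-row result.

```python
import statistics

# sepal_length values from the first five rows of the dataset
values = [5.1, 4.9, 4.7, 4.6, 5.0]

summary = {
    "count": len(values),
    "mean": statistics.mean(values),
    "std": statistics.stdev(values),  # sample standard deviation (n - 1 divisor)
    "min": min(values),
    "max": max(values),
}

# method="inclusive" interpolates linearly between the sorted data points
q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
summary.update({"25%": q1, "50%": median, "75%": q3})

for name, value in summary.items():
    print(f"{name:>5}: {value:.4f}")
```

Because the divisor is n - 1 rather than n, the reported `std` is slightly larger than the population standard deviation, which matters mostly for small samples like this one.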
