# Iris Dataset Data Exploration

**Author:** Data Science Essentials Project  
**Date:** September 22, 2025  
**Purpose:** Basic data loading and exploration of the Iris dataset

This notebook demonstrates how to use our custom `PandasSource` class to load and explore the famous Iris dataset.

## Prerequisites

**Before running this notebook**, make sure you have set up the project structure and downloaded the required data:

```bash
# From the project root directory
python setup.py
```

This will:
- Create the `data/` directory structure  
- Download the Iris dataset to `data/raw/iris.csv`
- Install required dependencies

---

## 1. Setup and Data Loading

In [None]:
# Add project root to Python path
import sys
import os
sys.path.append('../../')

from src.data.sources import PandasSource

# Check that iris.csv exists; if it does not, print setup instructions and stop
iris_path = '../../data/raw/iris.csv'
if not os.path.exists(iris_path):
    print("Data file not found!")
    print("It looks like the project hasn't been set up yet.")
    print()
    print("To download the required data files, run:")
    print("   cd ../../")
    print("   python setup.py")
    print()
    print("This will create the data/ directory structure and download iris.csv")
    raise FileNotFoundError(f"Data file not found: {iris_path}")

# Load Iris dataset from data/raw/ directory
data_source = PandasSource(
    file_path=iris_path,
    separator=',',
    header=False,
    names=['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'target']
)
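
`PandasSource` is this project's own class, and its internals are not shown in this notebook. Assuming it forwards these arguments to `pandas.read_csv`, the load above would be roughly equivalent to the sketch below. Note that pandas itself expresses "no header row" as `header=None`; that the wrapper maps `header=False` onto it is an assumption. The inline rows are a small sample standing in for `data/raw/iris.csv`, and the `Iris-setosa` style labels assume the UCI layout of the file.

```python
import io

import pandas as pd

# Sample rows in a headerless, comma-separated layout (stand-in for the real file).
SAMPLE_CSV = """5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
"""

COLUMNS = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'target']

# header=None: the first line is data, not column labels.
# names=COLUMNS: supply the labels ourselves.
sample_df = pd.read_csv(io.StringIO(SAMPLE_CSV), sep=',', header=None, names=COLUMNS)

print(sample_df.head(2))
```

Without `header=None`, pandas would consume the first data row as column labels, which is why the column names are passed explicitly.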

## 2. Basic Data Exploration

In [None]:
# Display first 5 rows of the dataset
data_source.head()

In [None]:
# Display last 5 rows of the dataset
data_source.tail()

In [None]:
# Display first 2 rows of the dataset
data_source.head(2)

In [None]:
# Display column names
data_source.df.columns.tolist()

In [None]:
# Generate descriptive statistics
data_source.describe()
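
If `data_source.describe()` delegates to pandas' `DataFrame.describe()` (an assumption about the custom class), each numeric column is summarized by its count, mean, sample standard deviation, minimum, quartiles, and maximum. The sketch below reproduces those figures for a single column using only the standard library; `summarize` is a hypothetical helper for illustration, not part of the project.

```python
import statistics


def summarize(values):
    """Return describe()-style statistics for one numeric column."""
    # method='inclusive' interpolates linearly between the min and max,
    # which matches the default quantile interpolation in pandas.
    q1, median, q3 = statistics.quantiles(values, n=4, method='inclusive')
    return {
        'count': len(values),
        'mean': statistics.mean(values),
        'std': statistics.stdev(values),  # sample standard deviation (n - 1)
        'min': min(values),
        '25%': q1,
        '50%': median,
        '75%': q3,
        'max': max(values),
    }


# Four sepal lengths used purely as sample input.
for name, value in summarize([5.1, 4.9, 7.0, 6.3]).items():
    print(f"{name:>5}: {value:.3f}")
```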

## 3. Metadata Information

Inspect the dataset's metadata through the `metadata` attribute of the new `PandasSource` API.

In [None]:
# Display dataset metadata
data_source.metadata
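
The exact contents of `data_source.metadata` depend on the `PandasSource` implementation, which is not shown here. As a rough idea of what dataset metadata usually covers, the sketch below gathers shape, column types, and missing-value counts from a plain DataFrame. The helper `summarize_metadata` and its keys are hypothetical and may not match what `PandasSource` actually returns.

```python
import pandas as pd


def summarize_metadata(df):
    """Collect basic structural facts about a DataFrame (hypothetical helper)."""
    return {
        'n_rows': df.shape[0],
        'n_columns': df.shape[1],
        'columns': df.columns.tolist(),
        'dtypes': {name: str(dtype) for name, dtype in df.dtypes.items()},
        'missing_values': {name: int(n) for name, n in df.isna().sum().items()},
    }


# Small sample frame with the same column names as the notebook's dataset.
sample = pd.DataFrame({
    'sepal_length': [5.1, 7.0, 6.3],
    'sepal_width': [3.5, 3.2, 3.3],
    'petal_length': [1.4, 4.7, 6.0],
    'petal_width': [0.2, 1.4, 2.5],
    'target': ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica'],
})

for key, value in summarize_metadata(sample).items():
    print(f"{key}: {value}")
```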