# Exploratory Data Analysis (EDA) on Iris Dataset
**Author**: Elen Tesfai  
**Date**: September 14, 2024  
**Purpose**: This notebook performs exploratory data analysis on the Iris dataset. The aim is to visualize and understand the distributions of its four measurements and the relationships between them across the three species.

## Overview of Steps
1. **Importing Dependencies**  
   - Load necessary libraries.

2. **Data Acquisition**  
   - Load the Iris dataset and display initial rows.

3. **Initial Data Inspection**  
   - Examine the shape and data types.

4. **Descriptive Statistics**  
   - Calculate summary statistics.

5. **Data Visualization**  
   - Visualize distributions and relationships.

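6. **Data Cleaning**  
   - Identify and handle missing values if needed.

7. **Data Transformation**  
   - Derive a sepal area feature from sepal length and sepal width.

8. **Final Visualizations**  
   - Compare sepal area across species with a boxplot.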
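
The data cleaning step mentions handling missing values. The scikit-learn copy of Iris contains no missing values, so for this dataset a check is usually all that is needed. For data that does have gaps, one possible approach is sketched below. The helper name `handle_missing_values`, its `strategy` parameter, and the toy rows are assumptions for illustration, not part of the notebook.

```python
import numpy as np
import pandas as pd


def handle_missing_values(df, strategy="median"):
    """Return a copy of df with missing values handled.

    Hypothetical helper: 'drop' removes incomplete rows, while 'median'
    fills gaps in numeric columns with that column's median.
    """
    cleaned = df.copy()
    if strategy == "drop":
        return cleaned.dropna().reset_index(drop=True)
    if strategy == "median":
        numeric_cols = cleaned.select_dtypes(include="number").columns
        medians = cleaned[numeric_cols].median()
        cleaned[numeric_cols] = cleaned[numeric_cols].fillna(medians)
        return cleaned
    raise ValueError(f"Unknown strategy: {strategy!r}")


# Toy frame with one gap (made-up values, not real Iris rows)
toy = pd.DataFrame({
    "sepal length (cm)": [5.0, np.nan, 6.0],
    "species": ["setosa", "setosa", "virginica"],
})
filled = handle_missing_values(toy, strategy="median")
dropped = handle_missing_values(toy, strategy="drop")
print(filled)
print(dropped)
```

Working on a copy keeps the original DataFrame available for comparison, which is convenient when deciding between dropping and filling.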

In [None]:
# Step 1: Importing Dependencies
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from sklearn.datasets import load_iris  # required by load_data()

# Step 2: Data Acquisition
def load_data():
    iris = load_iris()
    iris_df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
    iris_df['species'] = iris.target
    iris_df['species'] = iris_df['species'].map({0: 'setosa', 1: 'versicolor', 2: 'virginica'})
    return iris_df

# Step 3: Initial Data Inspection
def inspect_data(df):
    print(df.head())
    print("Shape:", df.shape)
    print("Data Types:\n", df.dtypes)

# Step 4: Descriptive Statistics
def descriptive_statistics(df):
    print("Summary Statistics:\n", df.describe())

# Step 5: Data Visualization
def visualize_data(df):
    sns.pairplot(df, hue='species')
    plt.show()

# Step 6: Data Cleaning
def check_missing_values(df):
    print("Missing Values:\n", df.isnull().sum())

# Step 7: Data Transformation
def transform_data(df):
    # Example: Create a new column for sepal area
    df['sepal_area'] = df['sepal length (cm)'] * df['sepal width (cm)']
    return df

# Step 8: Final Transformations and Visualizations
def final_visualizations(df):
    # Create the sepal area column if Step 7 was skipped
    if 'sepal_area' not in df.columns:
        df['sepal_area'] = df['sepal length (cm)'] * df['sepal width (cm)']

    plt.figure(figsize=(10, 6))
    # The column name uses an underscore ('sepal_area'), matching Step 7
    sns.boxplot(data=df, x='species', y='sepal_area')
    plt.title('Sepal Area by Species')
    plt.show()

# Main execution flow: placed last so every step function is defined before it runs
if __name__ == "__main__":
    iris_df = load_data()
    inspect_data(iris_df)
    descriptive_statistics(iris_df)
    visualize_data(iris_df)
    check_missing_values(iris_df)
    iris_df = transform_data(iris_df)  # Apply the transformation
    final_visualizations(iris_df)
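
As a numeric companion to the sepal area boxplot, a per-species summary can be sketched with a pandas `groupby`. The helper `sepal_area_summary` and the toy measurements below are hypothetical. In the notebook, the DataFrame returned by `load_data()` would be passed in instead.

```python
import pandas as pd


def sepal_area_summary(df):
    """Summarize sepal area per species without modifying df.

    Hypothetical helper: assumes the column names used in this notebook
    ('sepal length (cm)', 'sepal width (cm)', 'species').
    """
    with_area = df.assign(
        sepal_area=df["sepal length (cm)"] * df["sepal width (cm)"]
    )
    return with_area.groupby("species")["sepal_area"].agg(
        ["mean", "median", "min", "max"]
    )


# Toy measurements (made-up values, not real Iris rows)
toy = pd.DataFrame({
    "sepal length (cm)": [5.0, 5.5, 6.0, 6.5],
    "sepal width (cm)": [3.0, 4.0, 3.0, 2.0],
    "species": ["setosa", "setosa", "virginica", "virginica"],
})
summary = sepal_area_summary(toy)
print(summary)
```

Using `assign` returns a new DataFrame, so the summary can be computed before or after the transformation step without side effects.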