# Required Imports for This Notebook
- **Pandas**: Used for data handling, exploration, and manipulation.
- **Scikit-learn**: Provides tools for loading datasets and applying machine learning techniques.
- **Matplotlib**: Used for visualizing data with plots and charts.

In [None]:
import pandas as pd
import sklearn as skl
import sklearn.datasets  # Load the datasets submodule explicitly so that skl.datasets is available
import matplotlib.pyplot as plt

# Task 1: Source the Data Set

## Importing the Iris Dataset
We will import the Iris dataset from the `sklearn.datasets` module using the `load_iris()` function.

### Understanding `load_iris()`
- The `load_iris()` function returns a dictionary-like object called a **Bunch**.
- The **Bunch** contains attributes that allow access to both the data and metadata of the dataset.
- The dataset consists of four **numerical features** (sepal length, sepal width, petal length, petal width), measured in centimetres, and three **target classes** representing the species (setosa, versicolor, virginica).
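
As a minimal sketch of how a **Bunch** behaves, the snippet below reads the same array through attribute access and through dictionary-style key access. The variable names (`bunch`, `data_shape`, `same_object`, `species_names`) are illustrative only and are not used elsewhere in this notebook.

```python
from sklearn.datasets import load_iris

# Load the dataset as a Bunch (a dictionary-like object)
bunch = load_iris()

# Attribute access and key access return the same underlying array
data_shape = bunch.data.shape              # (number of samples, number of features)
same_object = bunch["data"] is bunch.data

# Convert the NumPy strings to plain Python strings for readable printing
species_names = [str(name) for name in bunch.target_names]

print(sorted(bunch.keys()))   # all keys stored in the Bunch
print(data_shape)             # (150, 4)
print(species_names)          # ['setosa', 'versicolor', 'virginica']
```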

## Resources
- [Scikit-learn datasets documentation](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_iris.html)


In [None]:
# Load the iris dataset
iris_dataset = skl.datasets.load_iris()

# Task 2: Explore the Data Structure

In this task, we examined the structure of the Iris dataset by performing the following steps:

- **Printed the shape of the dataset** to determine the number of samples (rows) and features (columns).
- **Displayed the first 5 rows** to get an initial view of the data.
- **Displayed the last 5 rows** to check the end of the dataset.
- **Listed the feature names** to understand the measured attributes (sepal and petal dimensions).
- **Listed the target class names** to identify the species classifications.
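
The `target` column stores integer codes (0, 1, 2) rather than species names. The sketch below shows one possible way to relate the two, assuming the dataset is loaded with `as_frame=True`; the `species` column and the `frame` variable are hypothetical names introduced only for this illustration.

```python
import pandas as pd
from sklearn.datasets import load_iris

# as_frame=True makes the Bunch expose a ready-made DataFrame in .frame
iris = load_iris(as_frame=True)
frame = iris.frame.copy()  # four feature columns plus the integer "target" column

# Map the integer class codes to their species names (hypothetical "species" column)
frame["species"] = pd.Categorical.from_codes(
    frame["target"].to_numpy(), categories=iris.target_names
)

print(frame.shape)                      # (150, 6)
print(frame["species"].value_counts())  # number of samples per species
```

Each of the three species should appear 50 times, which confirms that the classes are balanced.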

## Resources
- [Pandas dataframe documentation](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html)

In [None]:
# Convert the dataset into a Pandas DataFrame
iris_dataframe = pd.DataFrame(iris_dataset.data, columns=iris_dataset.feature_names)

# Add the target column to the DataFrame
iris_dataframe["target"] = iris_dataset.target

# 1. Print the shape of the dataset
print("Shape of the dataset:", iris_dataframe.shape)

# 2. Print the first 5 rows of the dataset
print("First 5 rows of the dataset:")
display(iris_dataframe.head())  # Use display() in Jupyter for better formatting

# 3. Print the last 5 rows of the dataset
print("Last 5 rows of the dataset:")
display(iris_dataframe.tail())

# 4. Print the feature names (column names)
print("Feature Names:", iris_dataset.feature_names)

# 5. Print the target class names (species)
print("Target Classes:", iris_dataset.target_names)

# Task 3: Summarize the Data
This task calculates key statistical metrics (mean, minimum, maximum, standard deviation, and median) for each feature in `iris_dataframe` and stores them in a new DataFrame.  
It then applies styling to left-align the table content and headers before displaying the formatted table.
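
As an alternative sketch (not the approach used in the cell below), the same five statistics can be computed in one call with `DataFrame.agg`. The example assumes the data is loaded with `as_frame=True`, so that `iris.data` is already a DataFrame holding only the four feature columns; `features` and `summary` are illustrative names.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris(as_frame=True)
features = iris.data  # DataFrame with the four feature columns only

# agg() returns one row per statistic; transpose to get one row per feature
summary = features.agg(["mean", "min", "max", "std", "median"]).T

print(summary.round(2))
```

Note that pandas computes `std` as the sample standard deviation (`ddof=1`) by default.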

## Resources
- [Pandas DataFrame descriptive statistics](https://pandas.pydata.org/docs/reference/frame.html#computations-descriptive-stats)

In [None]:
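# Select only the four feature columns (exclude the integer "target" column)
iris_features = iris_dataframe[iris_dataset.feature_names]

# Create a DataFrame to store statistics for each feature
iris_stats_dataframe = pd.DataFrame({
    "Mean": iris_features.mean(),
    "Minimum": iris_features.min(),
    "Maximum": iris_features.max(),
    "Standard Deviation": iris_features.std(),
    "Median": iris_features.median()
})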

# Apply table styling to align text to the left
feature_stats_styled = iris_stats_dataframe.style.set_table_styles([
    {'selector': 'th', 'props': [('text-align', 'left')]},  # Align column headers to the left
    {'selector': 'td', 'props': [('text-align', 'left')]}   # Align table content to the left
])
# Display the statistics in a well-formatted table
display(feature_stats_styled)

# Task 4: Visualize Features

This code plots histograms for the four features of `iris_dataframe` in a 2×2 grid, setting the figure size to 12×8 inches.  
Each histogram has 20 bins, black edges, and 70% opacity, with "Histogram of" followed by the feature name as the title, the feature name as the x-axis label, and "Frequency" as the y-axis label.  
The layout is adjusted to prevent overlapping.

In [None]:
# Set figure size
plt.figure(figsize=(12, 8))

# Iterate through the first four columns (the features), numbering the panels from 1
for i, column in enumerate(iris_dataframe.columns[:4], 1):
    # Select the i-th panel of the 2×2 grid
    plt.subplot(2, 2, i)
    # Histogram with 20 bins, black edges, and 70% opacity.
    plt.hist(iris_dataframe[column], bins=20, edgecolor="black", alpha=0.7)
    # Title
    plt.title(f"Histogram of {column}")
    # X-axis label
    plt.xlabel(column)
    # Y-axis label
    plt.ylabel("Frequency")

# Adjust layout to prevent overlapping
plt.tight_layout()
plt.show()