# Iris Dataset Analysis & Visualization
This notebook performs data loading, exploration, analysis, and visualization of the Iris dataset using **pandas**, **matplotlib**, and **seaborn**.

## Step 0: Save the Iris Dataset as CSV
We first export scikit-learn's built-in copy of the Iris dataset to a CSV file so that Step 1 can load it with pandas.

In [None]:
from sklearn.datasets import load_iris
import pandas as pd

# Load Iris dataset
iris = load_iris()
df_iris = pd.DataFrame(data=iris.data, columns=iris.feature_names)
df_iris['species'] = pd.Categorical.from_codes(iris.target, iris.target_names)

# Save to CSV
csv_file = "iris_dataset.csv"
df_iris.to_csv(csv_file, index=False)
print(f"Iris dataset saved as '{csv_file}'")
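One side effect of the CSV round trip is worth checking. `pd.Categorical.from_codes` gives `species` a categorical dtype, but a CSV file stores only text, so `pd.read_csv` brings the column back as plain strings unless told otherwise. The minimal sketch below uses two hand-typed rows written to an in-memory buffer (`io.StringIO`) in place of the real file, and shows that passing `dtype={'species': 'category'}` to `read_csv` restores the categorical dtype.

```python
import io

import pandas as pd

# Two hand-typed rows in the same layout as iris_dataset.csv (illustrative only)
df_small = pd.DataFrame({
    "sepal length (cm)": [5.1, 7.0],
    "species": pd.Categorical(["setosa", "versicolor"]),
})

# Write to an in-memory buffer instead of a file on disk
buffer = io.StringIO()
df_small.to_csv(buffer, index=False)

# Default read: the categorical dtype is lost and species comes back as text
buffer.seek(0)
plain = pd.read_csv(buffer)

# Explicit dtype: species is parsed back into a categorical column
buffer.seek(0)
restored = pd.read_csv(buffer, dtype={"species": "category"})

print(plain["species"].dtype)     # a text dtype (object, or str in newer pandas)
print(restored["species"].dtype)  # category
```

For the notebook as written this makes no difference to the results, since `groupby` and seaborn's `hue` work on plain strings too.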

## Step 1: Load and Explore the Dataset

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

file_path = "iris_dataset.csv"

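# Re-raise errors instead of calling exit(): exit() is meant for scripts,
# and in a notebook it can shut down the kernel rather than just stop this cell
try:
    df = pd.read_csv(file_path)
    print("Dataset loaded successfully!\n")
except FileNotFoundError:
    print(f"Error: File '{file_path}' not found. Run Step 0 first.")
    raise
except pd.errors.EmptyDataError:
    print(f"Error: File '{file_path}' is empty.")
    raise
except Exception as e:
    print(f"An unexpected error occurred: {e}")
    raise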

# Inspect first 5 rows
print("First 5 rows of the dataset:")
print(df.head())

# Dataset info
print("\nDataset info:")
df.info()  # info() prints its summary itself and returns None

# Check for missing values
print("\nMissing values per column:")
print(df.isnull().sum())

# Forward-fill missing values (not needed for Iris, but included for completeness).
# fillna(method='ffill') is deprecated in recent pandas; ffill() is the replacement.
df = df.ffill()
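Forward filling copies the last valid value down a column, so it only makes sense when row order is meaningful. Because the Iris rows are grouped by species, a plain forward fill could copy a value from the last setosa row into the first versicolor row. The sketch below uses a small hypothetical frame with deliberately missing values (the real Iris file has none) to contrast a plain `ffill()` with a per-species `groupby(...).ffill()`, which never fills across a species boundary.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with gaps; the real Iris CSV has no missing values
df_gaps = pd.DataFrame({
    "species": ["setosa", "setosa", "versicolor", "versicolor"],
    "petal length (cm)": [1.4, np.nan, np.nan, 4.5],
})

# Plain forward fill: the versicolor gap inherits the setosa value 1.4
plain_fill = df_gaps.ffill()

# Per-group forward fill: gaps are only filled from earlier rows of the same species
group_fill = df_gaps.copy()
group_fill["petal length (cm)"] = (
    df_gaps.groupby("species")["petal length (cm)"].ffill()
)

print(plain_fill["petal length (cm)"].tolist())  # [1.4, 1.4, 1.4, 4.5]
print(group_fill["petal length (cm)"].tolist())  # [1.4, 1.4, nan, 4.5]
```

The remaining `nan` in the grouped version is deliberate: the first versicolor row has no earlier versicolor value to borrow, so it stays missing rather than receiving a setosa measurement.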

## Step 2: Basic Data Analysis

In [None]:
# Basic statistics
print("\nBasic statistics:")
print(df.describe())

# Group by species and compute mean
grouped = df.groupby('species').mean()
print("\nMean values per species:")
print(grouped)

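# Observations (based on the per-species means above)
print("\nObservations:")
print("- Setosa has by far the smallest petals and the shortest sepals, but the widest sepals.")
print("- Versicolor is intermediate in sepal length and in both petal measurements, and has the narrowest sepals.")
print("- Virginica has the longest sepals and the largest petals.")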
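The observations above are typed by hand. A sketch of deriving them from the table instead: `idxmax()` and `idxmin()` on the grouped means return, for each measurement column, the species label with the largest or smallest mean. The six rows below are a hand-typed sample in the Iris column layout, used only so the snippet runs on its own; on the notebook's full `grouped` table the same two calls apply directly.

```python
import pandas as pd

# Small hand-typed sample (two rows per species) in the Iris column layout
sample = pd.DataFrame({
    "sepal length (cm)": [5.1, 4.9, 5.5, 6.5, 6.3, 7.1],
    "sepal width (cm)": [3.5, 3.0, 2.3, 2.8, 3.3, 3.0],
    "petal length (cm)": [1.4, 1.4, 4.0, 4.6, 6.0, 5.9],
    "petal width (cm)": [0.2, 0.2, 1.3, 1.5, 2.5, 2.1],
    "species": ["setosa", "setosa", "versicolor", "versicolor",
                "virginica", "virginica"],
})

grouped_sample = sample.groupby("species").mean()

# For every measurement column, which species has the largest / smallest mean?
largest = grouped_sample.idxmax()
smallest = grouped_sample.idxmin()

for column in grouped_sample.columns:
    print(f"- {column}: largest in {largest[column]}, smallest in {smallest[column]}")
```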

## Step 3: Data Visualization
We create four plots: a line chart, a bar chart, a histogram, and a scatter plot.

In [None]:
sns.set(style="whitegrid")  # Seaborn style

# 1) Line chart: Sepal length trend across samples
plt.figure(figsize=(8,5))
plt.plot(df.index, df['sepal length (cm)'], color='blue', label='Sepal Length')
plt.title('Sepal Length Trend Across Samples')
plt.xlabel('Sample Index')
plt.ylabel('Sepal Length (cm)')
plt.legend()
plt.show()
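The x-axis of this line chart is the row position, so its shape depends on row order. In scikit-learn's copy of the dataset the rows are sorted by species (50 setosa, then 50 versicolor, then 50 virginica), so the "trend" is largely a step between species blocks. One way to make that step visible, sketched below on a hand-typed six-row sample, is `groupby(...).transform('mean')`, which returns each row's species mean aligned to the original index, ready to be drawn as a second line on the same axes.

```python
import pandas as pd

# Hand-typed sample, ordered by species like the notebook's CSV
sample = pd.DataFrame({
    "sepal length (cm)": [5.1, 4.9, 5.5, 6.5, 6.3, 7.1],
    "species": ["setosa", "setosa", "versicolor", "versicolor",
                "virginica", "virginica"],
})

# transform('mean') broadcasts each species mean back onto that species' rows,
# so the result has the same index and length as the input
species_mean = sample.groupby("species")["sepal length (cm)"].transform("mean")

print(species_mean.round(2).tolist())  # [5.0, 5.0, 6.0, 6.0, 6.7, 6.7]
```

In the notebook, the same call on `df` could be passed as the y-values of a second `plt.plot(df.index, ...)` before `plt.legend()` to overlay the per-species means on the raw values.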

In [None]:
# 2) Bar chart: average petal length per species
plt.figure(figsize=(6,4))
grouped['petal length (cm)'].plot(kind='bar', color=['green','orange','red'])
plt.title('Average Petal Length per Species')
plt.xlabel('Species')
plt.ylabel('Petal Length (cm)')
plt.show()
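A bar of means hides how much the measurements vary within each species. A minimal sketch of computing the spread alongside the mean with `groupby(...).agg(['mean', 'std'])`, again on a hand-typed sample rather than the full CSV:

```python
import pandas as pd

# Hand-typed sample (two rows per species); values are illustrative only
sample = pd.DataFrame({
    "petal length (cm)": [1.4, 1.4, 4.0, 4.6, 6.0, 5.9],
    "species": ["setosa", "setosa", "versicolor", "versicolor",
                "virginica", "virginica"],
})

# One row per species with the mean and the sample standard deviation (ddof=1)
stats = sample.groupby("species")["petal length (cm)"].agg(["mean", "std"])

print(stats.round(3))
```

Computed from `df` instead of the sample, the `std` column could be passed as error bars to the existing chart, e.g. `stats['mean'].plot(kind='bar', yerr=stats['std'])`.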

In [None]:
# 3) Histogram: distribution of petal width
plt.figure(figsize=(6,4))
plt.hist(df['petal width (cm)'], bins=10, color='purple', edgecolor='black')
plt.title('Distribution of Petal Width')
plt.xlabel('Petal Width (cm)')
plt.ylabel('Frequency')
plt.show()
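`plt.hist` computes its bins with NumPy: with `bins=10` and no `range` argument, it splits the interval from the smallest to the largest petal width into ten equal-width bins. The sketch below computes the same counts directly with `np.histogram` on a hand-typed set of petal widths, which is a quick way to check what each bar represents.

```python
import numpy as np

# Hand-typed petal widths (cm) within the range the Iris data covers
petal_width = np.array([0.2, 0.2, 0.3, 1.3, 1.5, 1.4, 2.5, 1.9, 2.1])

# Same binning rule as plt.hist(..., bins=10): 10 equal-width bins over [min, max];
# every bin is half-open except the last, which also includes the maximum
counts, edges = np.histogram(petal_width, bins=10)

for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"{left:.2f} to {right:.2f} cm: {n}")
```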

In [None]:
# 4) Scatter plot: sepal length vs petal length colored by species
plt.figure(figsize=(6,4))
sns.scatterplot(data=df, x='sepal length (cm)', y='petal length (cm)', hue='species', palette='Set1')
plt.title('Sepal Length vs Petal Length')
plt.xlabel('Sepal Length (cm)')
plt.ylabel('Petal Length (cm)')
plt.legend(title='Species')  # re-create the seaborn legend, keeping a title
plt.show()
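The scatter plot suggests that sepal length and petal length rise together. A sketch of putting a number on that with Pearson correlation (`Series.corr`), both over all rows and within each species. Across species, part of the relationship comes from the gaps between the species clusters, so the within-species values can be noticeably weaker. The nine rows are a hand-typed sample, so the values it prints are illustrative only.

```python
import pandas as pd

# Hand-typed sample (three rows per species); values are illustrative only
sample = pd.DataFrame({
    "sepal length (cm)": [5.1, 4.9, 4.7, 7.0, 5.5, 6.5, 6.3, 7.1, 6.5],
    "petal length (cm)": [1.4, 1.4, 1.3, 4.7, 4.0, 4.6, 6.0, 5.9, 5.8],
    "species": ["setosa"] * 3 + ["versicolor"] * 3 + ["virginica"] * 3,
})

x, y = "sepal length (cm)", "petal length (cm)"

# Pearson correlation over all rows (the default method of Series.corr)
overall_r = sample[x].corr(sample[y])

# The same statistic computed separately inside each species group
per_species_r = {
    name: group[x].corr(group[y]) for name, group in sample.groupby("species")
}

print(f"all species: r = {overall_r:.2f}")
for name, r in per_species_r.items():
    print(f"{name}: r = {r:.2f}")
```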

### All tasks completed successfully!
- Dataset loaded and inspected
- Basic analysis performed
- Four visualizations created with titles and axis labels, plus legends on the line chart and scatter plot
- File `iris_dataset.csv` is included for submission