{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Assignment: Pandas & Matplotlib Data Analysis\n",
    "**Objective**: Load, explore, analyze, and visualize the Iris dataset using pandas and matplotlib.\n",
    "\n",
    "**Student Name**: [Your Name Here]\n",
    "\n",
    "**Date**: November 2025"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import required libraries\n",
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt\n",
    "import seaborn as sns\n",
    "from sklearn.datasets import load_iris\n",
    "\n",
    "# Set a seaborn theme for better-looking plots (set_theme supersedes the older sns.set alias)\n",
    "sns.set_theme(style=\"whitegrid\", palette=\"muted\")\n",
    "plt.rcParams['figure.figsize'] = (10, 6)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Task 1: Load and Explore the Dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Load Iris dataset\n",
    "try:\n",
    "    iris = load_iris()\n",
    "    df = pd.DataFrame(data=iris.data, columns=iris.feature_names)\n",
    "    df['species'] = pd.Categorical.from_codes(iris.target, iris.target_names)\n",
    "    print(\"Dataset loaded successfully!\")\n",
    "except Exception as e:\n",
    "    print(f\"Error loading dataset: {e}\")\n",
    "    raise  # re-raise so later cells do not fail with a confusing NameError on df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Display first 10 rows\n",
    "df.head(10)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Check structure and data types\n",
    "print(\"Dataset Info:\")\n",
    "df.info()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Check for missing values\n",
    "print(\"\\nMissing Values:\")\n",
    "print(df.isnull().sum())\n",
    "\n",
    "# The scikit-learn copy of Iris has no missing values, so every count above\n",
    "# should be 0 and no imputation or row dropping is needed."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Task 2: Basic Data Analysis"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Basic statistics\n",
    "df.describe()"
   ]
  },
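  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Group by species and compute the mean of each measurement.\n",
    "# observed=True limits the result to categories that actually occur and avoids\n",
    "# the FutureWarning newer pandas versions raise for categorical group keys.\n",
    "print(\"Average measurements by species:\")\n",
    "grouped = df.groupby('species', observed=True).mean()\n",
    "grouped"
   ]
  },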
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Findings from Grouping:\n",
    "- **Virginica** has the largest average petal length and petal width.\n",
    "- **Setosa** has the smallest petal dimensions but the largest average sepal width.\n",
    "- **Versicolor** falls between the other two species on both petal measurements, so the petal means alone already rank the three species."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Task 3: Data Visualization"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1. Line Chart: Trend of Sepal Length Across Samples"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.figure(figsize=(12, 5))\n",
    "for species in df['species'].unique():\n",
    "    subset = df[df['species'] == species]\n",
    "    plt.plot(subset.index, subset['sepal length (cm)'], label=species, marker='o', alpha=0.7)\n",
    "\n",
    "plt.title('Trend of Sepal Length Across Iris Samples', fontsize=16)\n",
    "plt.xlabel('Sample Index')\n",
    "plt.ylabel('Sepal Length (cm)')\n",
    "plt.legend()\n",
    "plt.grid(True, alpha=0.3)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2. Bar Chart: Average Petal Length by Species"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.figure(figsize=(8, 6))\n",
    "# observed=True avoids the pandas FutureWarning for categorical group keys\n",
    "avg_petal = df.groupby('species', observed=True)['petal length (cm)'].mean()\n",
    "avg_petal.plot(kind='bar', color=['#FF9999', '#66B2FF', '#99FF99'])\n",
    "\n",
    "plt.title('Average Petal Length by Species', fontsize=16)\n",
    "plt.xlabel('Species')\n",
    "plt.ylabel('Petal Length (cm)')\n",
    "plt.xticks(rotation=0)\n",
    "# Annotate each bar with its mean value\n",
    "for i, v in enumerate(avg_petal):\n",
    "    plt.text(i, v + 0.05, f\"{v:.2f}\", ha='center', fontweight='bold')\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3. Histogram: Distribution of Sepal Width"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.figure(figsize=(8, 6))\n",
    "plt.hist(df['sepal width (cm)'], bins=20, color='skyblue', edgecolor='black', alpha=0.7)\n",
    "plt.title('Distribution of Sepal Width', fontsize=16)\n",
    "plt.xlabel('Sepal Width (cm)')\n",
    "plt.ylabel('Frequency')\n",
    "plt.axvline(df['sepal width (cm)'].mean(), color='red', linestyle='--', label='Mean')\n",
    "plt.legend()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4. Scatter Plot: Petal Length vs Petal Width"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.figure(figsize=(9, 7))\n",
    "# One fixed color per species; color= is the unambiguous keyword for a single color\n",
    "colors = {'setosa': 'red', 'versicolor': 'blue', 'virginica': 'green'}\n",
    "for species in df['species'].unique():\n",
    "    subset = df[df['species'] == species]\n",
    "    plt.scatter(subset['petal length (cm)'], subset['petal width (cm)'],\n",
    "                label=species, color=colors[species], s=80, alpha=0.8, edgecolors='k')\n",
    "\n",
    "plt.title('Petal Length vs Petal Width by Species', fontsize=16)\n",
    "plt.xlabel('Petal Length (cm)')\n",
    "plt.ylabel('Petal Width (cm)')\n",
    "plt.legend(title='Species')\n",
    "plt.grid(True, alpha=0.3)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Final Observations & Insights\n",
    "\n",
    "1. **Clear Separation**: In the scatter plot, Setosa forms a tight cluster well apart from the other two species. Versicolor and Virginica sit next to each other and overlap slightly at the boundary, so petal measurements separate most, but not all, samples.\n",
    "2. **Strong Correlation**: Petal length and petal width are strongly positively correlated across the full dataset. Much of that relationship comes from differences between species; within a single species the points are more scattered.\n",
    "3. **Setosa is Unique**: It has the smallest petals and the widest sepals on average.\n",
    "4. **Roughly Bell-Shaped Distribution**: The sepal width histogram is roughly symmetric and centered near 3.0 cm.\n",
    "\n",
    "**Conclusion**: The clear patterns in petal dimensions make the Iris dataset a good fit for simple classification algorithms such as KNN or decision trees.\n",
    "\n",
    "---\n",
    "**Assignment Complete**: All tasks fulfilled with error handling, a missing-value check, analysis, and 4 customized visualizations."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "version": "3.11"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}