# 📊 Data Analysis and Visualization Assignment

## Objective
- Load and analyze a dataset using **pandas**.
- Perform simple data exploration and cleaning.
- Create plots using **matplotlib** and **seaborn**.
- Document findings and observations.


## Task 1: Load and Explore the Dataset
We will use the **Iris dataset**, a classic dataset for classification problems.


In [2]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_iris

# Make plots look nicer
sns.set_theme(style="whitegrid")  # set_theme() is the preferred name; sns.set() is an alias for it

# Load dataset
iris = load_iris(as_frame=True)
df = iris.frame
df.head()

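Note: this cell needs scikit-learn. If it is not installed, the import fails with `ModuleNotFoundError: No module named 'sklearn'`; install it with `pip install scikit-learn` and rerun the cell.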
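
The cell above imports four third-party packages, so a quick availability check before running it can save a failed run. The sketch below uses only the standard library; the helper name `missing_packages` and the `REQUIRED` list are illustrative assumptions, not part of the assignment.

```python
import importlib.util

# Top-level import names used by this notebook (assumed list, adjust as needed).
REQUIRED = ["pandas", "matplotlib", "seaborn", "sklearn"]


def missing_packages(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```

Note that the import name `sklearn` differs from the package name used for installation, `scikit-learn`.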

In [None]:
# Inspect dataset structure
print("\nDataset Info:")
df.info()  # info() prints its report itself and returns None, so print(df.info()) would add a stray "None"

print("\nMissing values:")
print(df.isnull().sum())

# Drop missing values if any
df = df.dropna()
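
The Iris data as shipped with scikit-learn has no missing values, so the check above reports zeros and `dropna()` leaves the frame unchanged. To see what the two calls do when values are missing, here is a minimal sketch on a made-up three-row frame (the column names and numbers are invented for illustration).

```python
import pandas as pd

# Toy frame with one missing value in each column (made-up numbers).
toy = pd.DataFrame({"length": [1.0, None, 3.0], "width": [0.5, 0.7, None]})

# Count missing entries per column.
missing_per_column = toy.isnull().sum()
print(missing_per_column)

# Drop every row that contains at least one missing value.
cleaned = toy.dropna()
print(cleaned)
```

By default `dropna()` removes a row if any column is missing, so only the first row survives here.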

## Task 2: Basic Data Analysis
We compute summary statistics and perform a grouping operation.


In [None]:
# Basic statistics
df.describe()
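
As a rough guide to reading the `describe()` output, the sketch below runs it on a made-up single-column frame where the statistics are easy to verify by hand. The values are invented and are not taken from the Iris data.

```python
import pandas as pd

# Four made-up measurements, chosen so the summary is easy to check by hand.
toy = pd.DataFrame({"petal length (cm)": [1.0, 2.0, 3.0, 4.0]})

summary = toy.describe()
print(summary)

# Individual statistics can be read off by row label.
mean_value = summary.loc["mean", "petal length (cm)"]
median_value = summary.loc["50%", "petal length (cm)"]
```

The `50%` row is the median, and `std` is the sample standard deviation (dividing by n - 1).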

In [None]:
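# Grouping: mean petal length per species
# (index relabeled from the numeric target codes 0-2 to the names in iris.target_names)
species_names = dict(enumerate(iris.target_names))
grouped = df.groupby("target")["petal length (cm)"].mean().rename(index=species_names)
grouped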
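
The same grouping pattern extends to several statistics at once through `agg`. The following is a minimal sketch on made-up data with two groups; the numbers are invented for illustration only.

```python
import pandas as pd

# Two groups with two made-up measurements each.
toy = pd.DataFrame(
    {
        "target": [0, 0, 1, 1],
        "petal length (cm)": [1.0, 2.0, 4.0, 5.0],
    }
)

# One row per group, one column per statistic.
stats = toy.groupby("target")["petal length (cm)"].agg(["mean", "min", "max"])
print(stats)
```

Applied to `df`, this would give the spread of petal length within each species alongside the mean.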

## Task 3: Data Visualization
We will create four different plots:
1. Line chart
2. Bar chart
3. Histogram
4. Scatter plot


In [None]:
# 1. Line chart - Cumulative petal length
df["cumulative_petal_length"] = df["petal length (cm)"].cumsum()
plt.figure(figsize=(8,5))
plt.plot(df.index, df["cumulative_petal_length"], label="Cumulative Petal Length")
plt.title("Line Chart - Cumulative Petal Length")
plt.xlabel("Index")
plt.ylabel("Cumulative Petal Length")
plt.legend()
plt.show()
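
To make the behavior of `cumsum()` concrete, here is a small sketch on a made-up series. It illustrates that a running total of positive values can only rise, and that each step up equals the value added at that row.

```python
import pandas as pd

# Made-up lengths; all positive, like real petal lengths.
lengths = pd.Series([1.4, 1.3, 4.7, 5.1])

running_total = lengths.cumsum()
print(running_total)

# A cumulative sum of positive numbers never decreases.
always_rising = running_total.is_monotonic_increasing

# The step between consecutive totals recovers the original values.
steps = running_total.diff().iloc[1:]
```

This is why the slope of the line chart reflects the size of the values at each index rather than a trend over time.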

In [None]:
# 2. Bar chart - Average petal length per species
plt.figure(figsize=(8,5))
grouped.plot(kind="bar", color=["skyblue", "lightgreen", "salmon"])
plt.title("Bar Chart - Average Petal Length per Species")
plt.xlabel("Species")
plt.ylabel("Average Petal Length")
plt.show()

In [None]:
# 3. Histogram - Sepal Length Distribution
plt.figure(figsize=(8,5))
plt.hist(df["sepal length (cm)"], bins=15, color="purple", edgecolor="black")
plt.title("Histogram - Sepal Length Distribution")
plt.xlabel("Sepal Length (cm)")
plt.ylabel("Frequency")
plt.show()
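
The `bins=15` argument controls how the value range is cut into equal-width intervals. Since `plt.hist` relies on `numpy.histogram` for the binning, the sketch below calls that function directly on made-up values to show the counts and bin edges behind the bars.

```python
import numpy as np

# Made-up values; three equal-width bins over the range 1..4.
values = np.array([1, 2, 2, 3, 3, 3, 4])

counts, edges = np.histogram(values, bins=3)
print("bin edges:", edges)
print("counts:", counts)
```

Every bin includes its left edge and excludes its right edge, except the last bin, which includes both. That is why the value 4 is counted together with the 3s here.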

In [None]:
# 4. Scatter plot - Sepal Length vs Petal Length
plt.figure(figsize=(8,5))
sns.scatterplot(
    x="sepal length (cm)", 
    y="petal length (cm)", 
    hue="target", 
    palette="deep", 
    data=df
)
plt.title("Scatter Plot - Sepal Length vs Petal Length")
plt.xlabel("Sepal Length (cm)")
plt.ylabel("Petal Length (cm)")
plt.legend(title="Species")
plt.show()

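Note: `NameError: name 'plt' is not defined` appears when the import cell in Task 1 has not been run in the current kernel session. Run that cell first, then rerun the plotting cells.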

## Findings / Observations
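- Mean petal length differs clearly between species, which makes it a useful feature for classification.
- Sepal length looks roughly bell-shaped in the histogram; this is a visual impression, not the result of a normality test.
- In the scatter plot, setosa (target 0) separates cleanly from the other two species, while versicolor and virginica overlap partially.
- The cumulative line chart can only rise, since every petal length is positive; its slope changes along the index because the rows are ordered by species.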
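
Observations about the shape of a distribution can be backed with a few numbers. The sketch below is one possible check, not part of the original assignment: the helper `shape_summary` is a hypothetical name, and the input lists are made up. Skewness near zero and a mean close to the median are consistent with a symmetric distribution, but they do not prove normality.

```python
import pandas as pd


def shape_summary(values):
    """Return mean, median, and sample skewness for a sequence of numbers."""
    s = pd.Series(values, dtype="float64")
    return {"mean": s.mean(), "median": s.median(), "skew": s.skew()}


# Symmetric made-up data: skewness is zero, mean equals median.
symmetric = shape_summary([1, 2, 3, 4, 5])

# Right-skewed made-up data: one large value pulls the mean above the median.
right_skewed = shape_summary([1, 1, 1, 2, 10])

print(symmetric)
print(right_skewed)
```

In the notebook, the same helper could be called as `shape_summary(df["sepal length (cm)"])` to put a number on the histogram's visual impression.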
