# 📊 Assignment: Data Analysis & Visualization with Pandas & Matplotlib


## 🎯 Objectives
- Load and analyze a dataset using **pandas** in Python  
- Perform basic data exploration and statistical analysis  
- Create simple plots with **matplotlib** (and seaborn for styling)  
- Present findings and observations  


## ✅ Task 1: Load and Explore the Dataset

In [None]:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_iris

# Load dataset
iris_data = load_iris()

# Convert to pandas DataFrame
df = pd.DataFrame(data=iris_data.data, columns=iris_data.feature_names)
df["species"] = pd.Categorical.from_codes(iris_data.target, iris_data.target_names)

# Display first rows
df.head()


In [None]:

# Check dataset info
print("Dataset Info:")
df.info()  # info() prints its report directly and returns None, so no print() wrapper

# Check missing values
print("\nMissing Values:")
print(df.isnull().sum())


## ✅ Task 2: Basic Data Analysis

In [None]:

# Descriptive statistics
df.describe()


In [None]:

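# Group by species and calculate the mean of each feature
# (observed=True avoids the categorical-groupby FutureWarning in recent pandas)
df.groupby("species", observed=True).mean()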



**Observations:**
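- The dataset has 150 samples (50 per species), 4 numerical features, and 1 categorical column (`species`).  
- Setosa has the smallest petals: its mean petal length is about 1.5 cm, compared with about 4.3 cm for Versicolor and about 5.6 cm for Virginica.  
- Petal length and width differ far more between species than the sepal measurements do, so they separate the species most clearly.  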
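
The observations above can be checked with a per-species summary of the two petal measurements. The following is a minimal sketch rather than part of the original assignment; the names `petal_cols` and `petal_summary` are introduced here for illustration, and the cell reloads the data so that it runs on its own.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Rebuild the same DataFrame as in Task 1 so this cell is self-contained
iris_data = load_iris()
df = pd.DataFrame(data=iris_data.data, columns=iris_data.feature_names)
df["species"] = pd.Categorical.from_codes(iris_data.target, iris_data.target_names)

# Minimum, mean and maximum of each petal measurement, per species
petal_cols = ["petal length (cm)", "petal width (cm)"]  # illustrative name
petal_summary = (
    df.groupby("species", observed=True)[petal_cols]
      .agg(["min", "mean", "max"])
)

print("Shape:", df.shape)
print(petal_summary.round(2))
```

If the largest Setosa petal length in this table is below the smallest Versicolor petal length, the petal measurements alone are enough to tell Setosa apart.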


## ✅ Task 3: Data Visualization

In [None]:

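# 1. Line Chart
# Note: rows are grouped by species (50 per species), so shifts along the x-axis
# reflect a change of species, not a trend over time.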
plt.figure(figsize=(8,5))
plt.plot(df.index, df["sepal length (cm)"], label="Sepal Length")
plt.plot(df.index, df["petal length (cm)"], label="Petal Length")
plt.title("Line Chart: Sepal vs Petal Length across Samples")
plt.xlabel("Sample Index")
plt.ylabel("Length (cm)")
plt.legend()
plt.show()


In [None]:

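# 2. Bar Chart (average petal length per species)
# observed=True avoids the categorical-groupby FutureWarning in recent pandas
mean_petal_length = df.groupby("species", observed=True)["petal length (cm)"].mean()
mean_petal_length.plot(kind="bar", color=["#4CAF50", "#2196F3", "#FF5722"], figsize=(6,4))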
plt.title("Average Petal Length by Species")
plt.xlabel("Species")
plt.ylabel("Average Petal Length (cm)")
plt.show()


In [None]:

# 3. Histogram (Sepal Length)
df["sepal length (cm)"].hist(bins=20, color="skyblue", edgecolor="black", figsize=(6,4))
plt.title("Histogram of Sepal Length")
plt.xlabel("Sepal Length (cm)")
plt.ylabel("Frequency")
plt.show()


In [None]:

# 4. Scatter Plot (Sepal Length vs Petal Length)
sns.scatterplot(data=df, x="sepal length (cm)", y="petal length (cm)", hue="species", palette="deep")
plt.title("Scatter Plot: Sepal Length vs Petal Length")
plt.xlabel("Sepal Length (cm)")
plt.ylabel("Petal Length (cm)")
plt.legend(title="Species")
plt.show()



## 📌 Final Findings
1. The Iris dataset has no missing values in any column, so no cleaning was needed.  
2. Petal length and width are the most distinguishing features among species.  
3. Setosa is clearly separable due to its small petals.  
4. Versicolor and Virginica overlap, but Virginica generally has larger petals.  
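
Findings 3 and 4 can be made concrete with a short check on petal length. This is a rough sketch under stated assumptions, not a classifier: the 2.5 cm cut-off (`SETOSA_CUTOFF_CM`) is a value picked here for illustration, and the other variable names are likewise hypothetical.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Rebuild the same DataFrame as in Task 1 so this cell is self-contained
iris_data = load_iris()
df = pd.DataFrame(data=iris_data.data, columns=iris_data.feature_names)
df["species"] = pd.Categorical.from_codes(iris_data.target, iris_data.target_names)

pl = "petal length (cm)"

# Finding 3: does one petal-length cut-off isolate Setosa?
SETOSA_CUTOFF_CM = 2.5  # assumed threshold, chosen for illustration only
is_small = df[pl] < SETOSA_CUTOFF_CM
is_setosa = df["species"] == "setosa"
setosa_rule_correct = bool((is_small == is_setosa).all())
print("Cut-off isolates Setosa exactly:", setosa_rule_correct)

# Finding 4: petal-length interval shared by Versicolor and Virginica
versicolor = df.loc[df["species"] == "versicolor", pl]
virginica = df.loc[df["species"] == "virginica", pl]
overlap_low, overlap_high = virginica.min(), versicolor.max()

# Samples of either species whose petal length falls inside the shared interval
in_overlap = df[
    df["species"].isin(["versicolor", "virginica"])
    & df[pl].between(overlap_low, overlap_high)
]
print(f"Shared petal-length interval: {overlap_low} to {overlap_high} cm")
print("Samples inside the interval:", len(in_overlap))
```

Samples inside the shared interval cannot be assigned to a species from petal length alone, which is why the scatter plot shows the two groups touching.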
