# Pandas Intro EDA (Iris dataset)
This notebook runs a quick exploratory data analysis (EDA) on the classic Iris dataset:
- Loads the data into a pandas DataFrame
- Inspects it with `head()`, `info()`, and `describe()`
- Computes per-species aggregates with `groupby`
- Draws a few Matplotlib charts

You can replace the dataset later with your own CSVs.

In [1]:
# Imports
import pandas as pd
import matplotlib.pyplot as plt

# Ensure plots show inside the notebook
%matplotlib inline

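If this cell raises `ModuleNotFoundError: No module named 'matplotlib'`, install the package (for example `pip install matplotlib`) and restart the kernel. The next cell also requires `scikit-learn`.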
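Before running the rest of the notebook, you can check which of its dependencies are importable. This is a minimal sketch using the standard library's `importlib.util.find_spec`; the package list is an assumption drawn from this notebook's imports, and `missing_packages` is a hypothetical helper name.

```python
import importlib.util

# Assumed dependency list, based on the imports used in this notebook.
REQUIRED = ["pandas", "matplotlib", "sklearn"]

def missing_packages(names):
    """Return the top-level package names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_packages(REQUIRED)
if missing:
    # sklearn is published on PyPI under the name scikit-learn.
    pip_names = ["scikit-learn" if m == "sklearn" else m for m in missing]
    print("Missing packages:", ", ".join(missing))
    print("Try: pip install", " ".join(pip_names))
else:
    print("All required packages are available.")
```

`find_spec` only locates a module without importing it, so the check is cheap and does not itself raise `ModuleNotFoundError` for top-level names.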

In [None]:
# Load the Iris dataset from sklearn (no internet required).
from sklearn.datasets import load_iris
iris = load_iris(as_frame=True)
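# rename() returns a new DataFrame, so no explicit copy or inplace=True is needed
df = iris.frame.rename(columns={'target': 'species_index'})
# Map the integer class codes (0, 1, 2) to their species names
df['species'] = df['species_index'].map(dict(enumerate(iris.target_names)))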
df.head()
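The `map(dict(enumerate(...)))` step stores species as plain strings. A possible alternative, sketched below, is `pd.Categorical.from_codes`, which builds the same labels as a categorical dtype that remembers every category and its order. The `codes` and `names` values are hypothetical stand-ins for `iris.target` and `iris.target_names`.

```python
import pandas as pd

# Hypothetical stand-ins for iris.target and iris.target_names.
codes = pd.Series([0, 1, 2, 1, 0])
names = ["setosa", "versicolor", "virginica"]

# Approach used in the cell above: a plain string (object/str) column.
species_obj = codes.map(dict(enumerate(names)))

# Alternative: a categorical column built directly from the integer codes.
species_cat = pd.Series(pd.Categorical.from_codes(codes, categories=names))

print(species_obj.tolist())
print(species_cat.dtype)
print(list(species_cat.cat.categories))
```

Both columns hold the same labels; the categorical version is more compact for repeated values and keeps the category order when grouping or plotting.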

In [None]:
# Basic structure
print("Shape:", df.shape)
print("\nInfo:")
df.info()  # info() prints its report directly and returns None, so don't wrap it in print()
display(df.describe(include='all'))
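`info()` reports dtypes and non-null counts as printed text. If you want the same facts as a DataFrame you can sort or filter, a small helper like the hypothetical `column_summary` below is one option. The toy frame is made-up data with one missing value, standing in for `df`.

```python
import pandas as pd

def column_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """One row per column: dtype, number of missing values, number of distinct values."""
    return pd.DataFrame({
        "dtype": frame.dtypes.astype(str),
        "n_missing": frame.isna().sum(),
        "n_unique": frame.nunique(),  # NaN is not counted as a distinct value
    })

# Hypothetical toy frame standing in for df.
toy = pd.DataFrame({
    "sepal length (cm)": [5.1, 4.9, None],
    "species": ["setosa", "setosa", "versicolor"],
})
print(column_summary(toy))
```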

In [None]:
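# Groupby example: mean, min, and max of each measurement by species
# (the numeric species_index code is dropped so it isn't aggregated too)
grouped = df.drop(columns='species_index').groupby('species').agg(['mean', 'min', 'max'])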
grouped
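`agg([...])` produces two-level column labels such as `('sepal length (cm)', 'mean')`. When you want flat column names, pandas named aggregation is one option, and flattening the MultiIndex by hand is another. The toy frame below is made-up data standing in for `df`, and the output column names are arbitrary choices.

```python
import pandas as pd

# Hypothetical toy data standing in for df.
toy = pd.DataFrame({
    "species": ["setosa", "setosa", "virginica", "virginica"],
    "sepal length (cm)": [5.0, 5.2, 6.4, 6.8],
    "petal length (cm)": [1.4, 1.5, 5.6, 5.8],
})

# Named aggregation: each keyword becomes one flat output column.
summary = toy.groupby("species").agg(
    sepal_len_mean=("sepal length (cm)", "mean"),
    sepal_len_max=("sepal length (cm)", "max"),
    petal_len_mean=("petal length (cm)", "mean"),
)
print(summary)

# Alternative: flatten the (column, statistic) pairs produced by agg([...]).
wide = toy.groupby("species").agg(["mean", "max"])
wide.columns = [f"{col}_{stat}" for col, stat in wide.columns]
print(wide.columns.tolist())
```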

In [None]:
# Histogram: sepal length distribution
plt.figure()
df['sepal length (cm)'].plot(kind='hist', bins=20)
plt.title('Sepal Length Distribution')
plt.xlabel('sepal length (cm)')
plt.ylabel('count')
plt.show()
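To inspect the numbers behind a histogram without plotting, `np.histogram` splits the range from the minimum to the maximum into equal-width bins and counts the values in each. The histogram above should use equivalent bins for `bins=20`, although that is an assumption about plotting internals. The sample values are hypothetical stand-ins for `df['sepal length (cm)']`.

```python
import numpy as np
import pandas as pd

# Hypothetical sample standing in for df['sepal length (cm)'].
sepal_length = pd.Series([4.3, 5.0, 5.1, 5.8, 6.3, 6.7, 7.9])

# 20 equal-width bins spanning min..max; the last bin includes the maximum.
counts, edges = np.histogram(sepal_length, bins=20)

print(len(counts), len(edges))  # 20 counts, 21 bin edges
for left, right, n in zip(edges[:-1], edges[1:], counts):
    if n:
        print(f"[{left:.2f}, {right:.2f}): {n}")
```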

In [None]:
# Scatter: sepal length vs sepal width, colored by species (legend via labels)
plt.figure()
for sp, sub in df.groupby('species'):
    plt.scatter(sub['sepal length (cm)'], sub['sepal width (cm)'], label=sp, alpha=0.7)
plt.title('Sepal Length vs Width by Species')
plt.xlabel('sepal length (cm)')
plt.ylabel('sepal width (cm)')
plt.legend()
plt.show()
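The scatter plot shows the relationship visually. A numeric companion is the Pearson correlation between the two plotted features within each species, sketched below with `groupby` plus `Series.corr`. The toy frame uses made-up values chosen so one group is perfectly positive and the other perfectly negative.

```python
import pandas as pd

# Hypothetical toy data standing in for df.
toy = pd.DataFrame({
    "species": ["a", "a", "a", "b", "b", "b"],
    "sepal length (cm)": [1.0, 2.0, 3.0, 1.0, 2.0, 3.0],
    "sepal width (cm)": [2.0, 4.0, 6.0, 3.0, 2.0, 1.0],
})

features = ["sepal length (cm)", "sepal width (cm)"]

# Select only the feature columns so apply() doesn't receive the grouping column.
per_species_r = toy.groupby("species")[features].apply(
    lambda g: g["sepal length (cm)"].corr(g["sepal width (cm)"])  # Pearson by default
)
print(per_species_r)
```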

## Next steps
- Swap in your own CSV with `pd.read_csv('yourfile.csv')`.
- Try `groupby` on a categorical column you care about.
- Add more plots (one per cell), such as boxplots or scatter plots of other feature pairs.
- Save the cleaned data to CSV with `df.to_csv(path, index=False)` for later modeling.
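A minimal sketch of the save-and-reload round trip, using an in-memory `io.StringIO` buffer in place of a real file such as a hypothetical `iris_clean.csv`. The `cleaned` frame is made-up data standing in for your cleaned DataFrame.

```python
import io
import pandas as pd

# Hypothetical cleaned data standing in for df.
cleaned = pd.DataFrame({
    "sepal length (cm)": [5.1, 7.0],
    "species": ["setosa", "versicolor"],
})

# index=False keeps the RangeIndex out of the file, so reloading
# doesn't produce an extra "Unnamed: 0" column.
buffer = io.StringIO()  # stand-in for a path like 'iris_clean.csv'
cleaned.to_csv(buffer, index=False)

buffer.seek(0)  # rewind before reading back
reloaded = pd.read_csv(buffer)
print(reloaded.equals(cleaned))
```

With a real file, pass the path string to both `to_csv` and `read_csv` instead of the buffer.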