This project explores the classic Iris dataset using Python. It demonstrates foundational data science skills including data loading, cleaning, statistical analysis, and visualization. The goal is to uncover patterns across iris species and present insights through expressive plots and structured code.
- Loaded the Iris dataset via `sklearn.datasets.load_iris()`
- Converted it into a pandas DataFrame
- Inspected structure, data types, and missing values
- Verified dataset cleanliness (no missing values)
- Computed descriptive statistics (mean, median, std)
- Grouped data by species to compare feature averages
- Identified key patterns in petal and sepal dimensions
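
The loading, inspection, and summary steps above can be sketched roughly as follows. This is a minimal illustration, not necessarily the project's exact code: the variable names (`df`, `feature_stats`, `species_means`) and the `species` column name are assumptions made for the example.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Load the dataset and build a DataFrame from the feature matrix.
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)

# Map the integer targets (0, 1, 2) to species names.
# The column name "species" is an assumption for this sketch.
df["species"] = iris.target_names[iris.target]

# Inspect structure and data types, then count missing values per column.
df.info()
missing_per_column = df.isna().sum()
print(missing_per_column)

# Descriptive statistics for the four numeric features.
feature_cols = list(iris.feature_names)
feature_stats = df[feature_cols].agg(["mean", "median", "std"])
print(feature_stats)

# Compare feature averages across species.
species_means = df.groupby("species")[feature_cols].mean()
print(species_means)
```

Grouping by the species name rather than the integer target keeps the resulting table readable without a separate lookup.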
- Line chart: Sepal vs Petal Length across index
- Bar chart: Average Petal Length per species
- Histogram: Sepal Width distribution
- Scatter plot: Sepal Length vs Petal Length by species
All plots are customized with titles, axis labels, and legends using matplotlib and seaborn.
- Setosa has the smallest petal dimensions and the shortest sepals, although its sepals are the widest on average.
- Virginica shows the largest petal length and width.
- Petal dimensions are highly discriminative across species, making them ideal for classification tasks.
- Visualizations reveal strong clustering potential.
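
The claim that petal dimensions separate the species can be checked with a quick comparison of per-species ranges. The sketch below is illustrative only; the names `petal_ranges`, `setosa_max`, and `others_min` are hypothetical and not taken from the project.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df["species"] = iris.target_names[iris.target]  # assumed column name

# Minimum and maximum petal length for each species.
petal_ranges = df.groupby("species")["petal length (cm)"].agg(["min", "max"])
print(petal_ranges)

# If the longest setosa petal is shorter than the shortest petal of the
# other two species, a single threshold on petal length isolates setosa.
setosa_max = petal_ranges.loc["setosa", "max"]
others_min = petal_ranges.drop(index="setosa")["min"].min()
print(f"setosa separable by petal length alone: {setosa_max < others_min}")
```

On the standard dataset the setosa range does not overlap with the other two species, which is why a simple threshold on petal length is often enough to isolate it.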
- Python 3.x
- pandas
- numpy
- matplotlib
- seaborn
- scikit-learn