# Iris Dataset Analysis Assignment
**Student Name:** [Your Name]  
**Course:** Data Analysis with Python  
**Date:** [Insert Date]


This assignment analyzes the Iris dataset: 150 flower samples from three species, each described by four measurements (sepal length, sepal width, petal length, and petal width). The tasks are to load and clean the dataset, compute basic descriptive statistics, and create visualizations that show the structure of the data and the relationships between its features.


In [None]:

import pandas as pd
from sklearn.datasets import load_iris

# Load the dataset from sklearn and convert to DataFrame
iris_data = load_iris()
df = pd.DataFrame(data=iris_data.data, columns=iris_data.feature_names)
df['species'] = iris_data.target
df['species'] = df['species'].map(dict(zip(range(3), iris_data.target_names)))

# Display the first few rows
df.head()
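
As a quick sanity check on the loading step, the sketch below rebuilds the same DataFrame and confirms its shape and class balance. It assumes the scikit-learn copy of the dataset; the names `iris`, `check_df`, and `species_counts` are illustrative and not part of the original cells.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Rebuild the DataFrame the same way as above so this cell runs on its own
iris = load_iris()
check_df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
check_df['species'] = pd.Series(iris.target).map(dict(enumerate(iris.target_names)))

# Rows and columns: four measurements plus the species label
print(check_df.shape)

# Number of samples per species
species_counts = check_df['species'].value_counts()
print(species_counts)
```

A balanced class count matters later on, because the per-species means are then computed from equally sized groups.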


In [None]:

# Check the structure of the dataset: column dtypes and non-null counts
df.info()

# Count missing values per column
df.isnull().sum()


In [None]:

# Drop any missing values if they exist (though this dataset has none)
df.dropna(inplace=True)
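
To confirm that dropping missing values really changes nothing here, the row count can be compared before and after the call. This is a minimal sketch that rebuilds the DataFrame so it runs on its own; `clean_df`, `rows_before`, and `rows_after` are illustrative names.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Rebuild the DataFrame so this cell does not depend on earlier cells
iris = load_iris()
clean_df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
clean_df['species'] = pd.Series(iris.target).map(dict(enumerate(iris.target_names)))

rows_before = len(clean_df)
clean_df = clean_df.dropna()  # returns a new DataFrame rather than modifying in place
rows_after = len(clean_df)

print(f"Rows dropped: {rows_before - rows_after}")
```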


In [None]:

# Basic statistics of numerical columns
df.describe()


In [None]:

# Group by species and compute average measurements
df.groupby('species').mean()



From the grouped statistics, *Iris virginica* has the largest mean petal length and petal width, while *Iris setosa* has the smallest.
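
The comparison above can also be checked programmatically instead of by reading the table. The sketch below is one possible way to do it, using `idxmax` and `idxmin` on the per-species means; `means_df`, `petal_cols`, `largest`, and `smallest` are illustrative names.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Rebuild the DataFrame so this cell runs on its own
iris = load_iris()
means_df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
means_df['species'] = pd.Series(iris.target).map(dict(enumerate(iris.target_names)))

# Mean petal measurements for each species
petal_cols = ['petal length (cm)', 'petal width (cm)']
petal_means = means_df.groupby('species')[petal_cols].mean()
print(petal_means.round(3))

# Species with the largest and smallest mean, reported per column
largest = petal_means.idxmax()
smallest = petal_means.idxmin()
print("Largest:", largest.to_dict())
print("Smallest:", smallest.to_dict())
```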


In [None]:

import matplotlib.pyplot as plt
import seaborn as sns

sns.set(style="whitegrid")

# Line chart showing average sepal length by species
# (species is categorical, so the connecting line is a visual guide, not a trend)
species_means = df.groupby('species').mean().reset_index()
plt.figure(figsize=(8, 5))
plt.plot(species_means['species'], species_means['sepal length (cm)'], marker='o')
plt.title('Average Sepal Length by Species')
plt.xlabel('Species')
plt.ylabel('Sepal Length (cm)')
plt.grid(True)
plt.show()


In [None]:

# Bar chart: average petal length per species
# (barplot aggregates with the mean by default; older seaborn versions reject a string estimator)
plt.figure(figsize=(8, 5))
sns.barplot(x='species', y='petal length (cm)', data=df)
plt.title('Average Petal Length per Species')
plt.xlabel('Species')
plt.ylabel('Petal Length (cm)')
plt.show()


In [None]:

# Histogram of sepal width
plt.figure(figsize=(8, 5))
plt.hist(df['sepal width (cm)'], bins=20, color='skyblue', edgecolor='black')
plt.title('Distribution of Sepal Width')
plt.xlabel('Sepal Width (cm)')
plt.ylabel('Frequency')
plt.show()


In [None]:

# Scatter plot: Sepal length vs Petal length
plt.figure(figsize=(8, 5))
sns.scatterplot(data=df, x='sepal length (cm)', y='petal length (cm)', hue='species')
plt.title('Sepal Length vs Petal Length')
plt.xlabel('Sepal Length (cm)')
plt.ylabel('Petal Length (cm)')
plt.legend(title='Species')
plt.show()
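
The relationship shown in the scatter plot can be summarized numerically with a Pearson correlation coefficient. The sketch below computes it for the whole dataset and for each species separately, since a pooled correlation across groups can differ from the correlation within each group. The names `corr_df`, `r_overall`, and `r_by_species` are illustrative.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Rebuild the DataFrame so this cell runs on its own
iris = load_iris()
corr_df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
corr_df['species'] = pd.Series(iris.target).map(dict(enumerate(iris.target_names)))

x_col, y_col = 'sepal length (cm)', 'petal length (cm)'

# Pearson correlation over all samples
r_overall = corr_df[x_col].corr(corr_df[y_col])
print(f"Overall r = {r_overall:.3f}")

# Pearson correlation within each species
r_by_species = {
    name: group[x_col].corr(group[y_col])
    for name, group in corr_df.groupby('species')
}
for name, r in r_by_species.items():
    print(f"{name}: r = {r:.3f}")
```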


In [None]:

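# Example of error handling while reading a file.
# The result goes into a separate variable so that a successful read
# would not overwrite the Iris DataFrame `df`.
try:
    external_df = pd.read_csv("non_existent_file.csv")
except FileNotFoundError:
    print("Error: File not found. Please check the path.")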
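
The same pattern can be wrapped in a small helper so the caller gets either a DataFrame or `None`. This is a sketch rather than a required part of the assignment: the function name `load_csv_safely` is hypothetical, and the extra exception types (`pd.errors.EmptyDataError`, `pd.errors.ParserError`) are additions beyond the original example.

```python
from typing import Optional

import pandas as pd


def load_csv_safely(path: str) -> Optional[pd.DataFrame]:
    """Return the CSV at `path` as a DataFrame, or None if it cannot be read."""
    try:
        return pd.read_csv(path)
    except FileNotFoundError:
        print(f"Error: '{path}' not found. Please check the path.")
    except pd.errors.EmptyDataError:
        print(f"Error: '{path}' is empty.")
    except pd.errors.ParserError as exc:
        print(f"Error: could not parse '{path}': {exc}")
    return None


# The file does not exist, so the helper reports the problem and returns None
result = load_csv_safely("non_existent_file.csv")
if result is None:
    print("Falling back to the scikit-learn copy of the dataset.")
```

Returning `None` keeps the failure explicit at the call site, so later cells can decide whether to stop or fall back to another data source.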



### Conclusion

The Iris dataset offers a straightforward yet insightful way to practice data analysis. Through this assignment, I was able to explore data cleaning, perform basic statistical analysis, and visualize trends and relationships in the dataset. The visualizations particularly helped in understanding species differences in petal and sepal dimensions.
