
# Analyzing Data with Pandas and Visualizing Results with Matplotlib

This notebook walks through basic data analysis and visualization of the Iris dataset. It uses pandas for data manipulation and Matplotlib and Seaborn for plotting.


In [None]:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_iris

# Load the Iris dataset into a DataFrame
try:
    iris_data = load_iris()
    df = pd.DataFrame(data=iris_data.data, columns=iris_data.feature_names)
    # Store the integer class codes (0, 1, 2), then map them to species names
    df['species'] = iris_data.target
    df['species'] = df['species'].map(dict(enumerate(iris_data.target_names)))
    print("Dataset loaded successfully.")
except Exception as e:
    print(f"Error loading dataset: {e}")
    raise  # Re-raise so later cells do not fail with a confusing NameError on df

df.head()


In [None]:

# Check data types and missing values
print("Data Types:")
print(df.dtypes)

print("\nMissing Values:")
print(df.isnull().sum())


In [None]:

# This dataset has no missing values, but here is how you could handle them:
# df = df.dropna()  # drop rows that contain any missing value
# df = df.ffill()   # or forward-fill; fillna(method='ffill') is deprecated in pandas 2.1+
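
As a minimal sketch of the two options mentioned above, the cell below applies them to a small hypothetical frame (`toy`, with made-up values) rather than to the Iris data, which has no gaps.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with gaps; not taken from the Iris dataset
toy = pd.DataFrame({
    "sepal length (cm)": [5.1, np.nan, 4.7, 5.0],
    "petal length (cm)": [1.4, 1.4, np.nan, 1.5],
})

# Option 1: drop every row that has at least one missing value
dropped = toy.dropna()

# Option 2: forward-fill each gap with the value from the previous row
filled = toy.ffill()

print(dropped)
print(filled)
```

Forward-filling assumes that row order is meaningful. That is probably not the case for a dataset like Iris, so dropping rows or imputing a per-species statistic may be the safer choice there.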


In [None]:

# Descriptive statistics
df.describe()


In [None]:

# Grouping by species and calculating mean
grouped = df.groupby('species').mean()
print("Average measurements per species:")
print(grouped)
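
If the means alone are not enough, `groupby` can compute several statistics in one call. The sketch below uses a small hypothetical frame (`toy`) so that it runs on its own; with the notebook's `df`, the same `agg` call should apply unchanged.

```python
import pandas as pd

# Hypothetical measurements, chosen only to illustrate the call
toy = pd.DataFrame({
    "species": ["setosa", "setosa", "virginica", "virginica"],
    "petal length (cm)": [1.4, 1.6, 5.5, 6.5],
})

# Mean, standard deviation, and row count per species
summary = toy.groupby("species")["petal length (cm)"].agg(["mean", "std", "count"])
print(summary)
```

Reporting the standard deviation next to the mean makes it easier to judge whether the gap between two species is large relative to the spread within each one.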


In [None]:

# Line chart – Mean measurements per species
grouped.T.plot(kind='line', marker='o')
plt.title('Mean Feature Measurements per Species')
plt.xlabel('Features')
plt.ylabel('Measurement (cm)')
plt.legend(title='Species')
plt.grid(True)
plt.tight_layout()
plt.show()


In [None]:

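# Bar chart – Average petal length per species
# Assigning hue with legend=False colors each bar (seaborn >= 0.13, where palette without hue is deprecated)
sns.barplot(x='species', y='petal length (cm)', hue='species', data=df, estimator='mean', palette='Set2', legend=False)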
plt.title('Average Petal Length per Species')
plt.ylabel('Petal Length (cm)')
plt.xlabel('Species')
plt.tight_layout()
plt.show()


In [None]:

# Histogram – Distribution of sepal length
plt.hist(df['sepal length (cm)'], bins=15, color='skyblue', edgecolor='black')
plt.title('Distribution of Sepal Length')
plt.xlabel('Sepal Length (cm)')
plt.ylabel('Frequency')
plt.tight_layout()
plt.show()


In [None]:

# Scatter Plot – Sepal Length vs Petal Length
sns.scatterplot(x='sepal length (cm)', y='petal length (cm)', hue='species', data=df, palette='Set1')
plt.title('Sepal Length vs Petal Length by Species')
plt.xlabel('Sepal Length (cm)')
plt.ylabel('Petal Length (cm)')
plt.legend(title='Species')
plt.tight_layout()
plt.show()



### Observations

- **Iris setosa** has noticeably shorter petals than the other two species.
- **Petal length and sepal length** are positively correlated overall; the relationship is clearest for versicolor and virginica, while the setosa points cluster at short petal lengths.
- The **distribution of sepal length** is approximately normal in shape, with values concentrated around 5.0–6.0 cm.
- The per-species means are clearly separated across the three species, most visibly in the petal measurements.
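
The correlation observation above can be checked numerically rather than by eye. Here is a minimal sketch, assuming a frame shaped like the notebook's `df`; the helper name `correlation_by_species` and the toy values are hypothetical.

```python
import pandas as pd

def correlation_by_species(frame, x, y, group="species"):
    """Return the Pearson correlation between columns x and y within each group."""
    return frame.groupby(group)[[x, y]].apply(lambda g: g[x].corr(g[y]))

# Hypothetical values for illustration only
toy = pd.DataFrame({
    "species": ["a", "a", "a", "b", "b", "b"],
    "sepal length (cm)": [1.0, 2.0, 3.0, 1.0, 2.0, 3.0],
    "petal length (cm)": [2.0, 4.0, 6.0, 3.0, 2.0, 1.0],
})

per_species = correlation_by_species(toy, "sepal length (cm)", "petal length (cm)")
print(per_species)
```

Calling `correlation_by_species(df, 'sepal length (cm)', 'petal length (cm)')` on the notebook's `df` would give one coefficient per species, which can then be compared with the scatter plot.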
