# Data Analysis with Pandas and Matplotlib
## Using the Iris Dataset

This notebook demonstrates data analysis and visualization in Python using pandas and matplotlib, with seaborn for plot styling and scikit-learn as the source of the Iris dataset.

In [None]:
# Import required libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_iris

# Apply seaborn's default theme for better visualizations
# (the 'seaborn' style name in plt.style.use was deprecated in Matplotlib 3.6)
sns.set_theme()

## Task 1: Load and Explore the Dataset

First, we'll load the Iris dataset and explore its basic structure.

In [None]:
# Load the Iris dataset
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['species'] = pd.Categorical.from_codes(iris.target, iris.target_names)

# Display the first few rows
print("First 5 rows of the dataset:")
print(df.head())

print("\nDataset Info:")
df.info()  # info() prints its summary directly and returns None

print("\nBasic Statistics:")
print(df.describe())

## Task 2: Basic Data Analysis

Let's analyze the data by species and look at some key statistics.

In [None]:
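# Group by species and calculate the mean of each feature
# (observed=True keeps only categories present in the data and
# avoids a FutureWarning on categorical keys in recent pandas versions)
species_means = df.groupby('species', observed=True).mean()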
print("Mean values by species:")
print(species_means)

# Check for any missing values
print("\nMissing values in dataset:")
print(df.isnull().sum())
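
The per-species means are easiest to compare when each species contributes a similar number of samples. The following is a minimal sketch of how that could be checked with `shape` and `value_counts`. The names `n_rows`, `n_cols`, `species_counts`, and `total_missing` are introduced here for illustration, and the DataFrame is rebuilt only so the cell runs on its own.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Rebuild the DataFrame so this cell runs on its own
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['species'] = pd.Categorical.from_codes(iris.target, iris.target_names)

# Number of rows and columns (the feature columns plus the species label)
n_rows, n_cols = df.shape
print(f"Rows: {n_rows}, columns: {n_cols}")

# Number of samples per species
species_counts = df['species'].value_counts()
print("\nSamples per species:")
print(species_counts)

# Total number of missing values across the whole table
total_missing = int(df.isnull().sum().sum())
print(f"\nTotal missing values: {total_missing}")
```

If `df` is already defined from the earlier cell, the first three statements after the imports can be skipped.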

## Task 3: Data Visualization

Let's create various plots to visualize the data.

In [None]:
# Create a figure with subplots
plt.figure(figsize=(15, 10))

# 1. Line plot
plt.subplot(2, 2, 1)
for species in iris.target_names:
    species_data = df[df['species'] == species]
    plt.plot(species_data['sepal length (cm)'].values, label=species)
plt.title('Sepal Length Patterns')
plt.xlabel('Sample Index (within species)')
plt.ylabel('Sepal Length (cm)')
plt.legend()

# 2. Bar plot
plt.subplot(2, 2, 2)
species_means['petal length (cm)'].plot(kind='bar')
plt.title('Average Petal Length by Species')
plt.xlabel('Species')
plt.ylabel('Petal Length (cm)')

# 3. Histogram
plt.subplot(2, 2, 3)
plt.hist(df['sepal width (cm)'], bins=20)
plt.title('Distribution of Sepal Width')
plt.xlabel('Sepal Width (cm)')
plt.ylabel('Frequency')

# 4. Scatter plot
plt.subplot(2, 2, 4)
for species in iris.target_names:
    species_data = df[df['species'] == species]
    plt.scatter(species_data['sepal length (cm)'],
                species_data['petal length (cm)'],
                label=species)
plt.title('Sepal Length vs Petal Length')
plt.xlabel('Sepal Length (cm)')
plt.ylabel('Petal Length (cm)')
plt.legend()

plt.tight_layout()
plt.show()
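
The histogram above pools all three species into one distribution. As an optional complement, here is a sketch that overlays one sepal width histogram per species using matplotlib's object-oriented interface. The bin count and transparency are arbitrary choices made for this illustration, not values taken from the notebook.

```python
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

# Rebuild the DataFrame so this cell runs on its own
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['species'] = pd.Categorical.from_codes(iris.target, iris.target_names)

fig, ax = plt.subplots(figsize=(8, 5))

# Draw one semi-transparent histogram per species on the same axes
for name, group in df.groupby('species', observed=True):
    ax.hist(group['sepal width (cm)'], bins=10, alpha=0.5, label=name)

ax.set_title('Distribution of Sepal Width by Species')
ax.set_xlabel('Sepal Width (cm)')
ax.set_ylabel('Frequency')
ax.legend()

plt.show()
```

Overlaying the groups makes it possible to see whether the shape of the pooled histogram comes from one species or is shared by all three.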

## Findings and Observations

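1. The dataset contains 150 samples with 4 numeric features (sepal length, sepal width, petal length, petal width) and no missing values.
2. There are three species of Iris flowers (setosa, versicolor, virginica) with 50 samples each.
3. The scatter plot shows setosa as a clearly separate cluster with short petals, while versicolor and virginica partially overlap.
4. Petal length shows the largest differences between species means in absolute terms (cm).
5. Sepal width, pooled across all species, has a roughly bell-shaped distribution in the histogram.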
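
Findings 4 and 5 are read off the plots, so a numeric check can help. The sketch below shows one possible way to do that, not the only one. It measures between-species variation as the gap between the largest and smallest species mean for each feature, and uses sample skewness as a rough symmetry indicator for sepal width. Both measures, and the names `mean_spread` and `sepal_width_skew`, are assumptions made for this illustration.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Rebuild the DataFrame so this cell runs on its own
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['species'] = pd.Categorical.from_codes(iris.target, iris.target_names)

# Mean of each feature per species
species_means = df.groupby('species', observed=True).mean()

# Gap between the largest and smallest species mean, per feature (in cm)
mean_spread = species_means.max() - species_means.min()
print("Spread of species means by feature (cm):")
print(mean_spread.sort_values(ascending=False))

# Skewness of the pooled sepal width; values near 0 suggest a symmetric shape
sepal_width_skew = df['sepal width (cm)'].skew()
print(f"\nSkewness of sepal width: {sepal_width_skew:.3f}")
```

The raw spread is measured in centimeters, so it favors features with larger values. Dividing by each feature's standard deviation would give a scale-free comparison and may rank the features differently.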