In [None]:
# 1. Loading and Exploring the Dataset

import pandas as pd
from sklearn.datasets import load_iris

# Load the Iris dataset
iris = load_iris()
df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
# Store the target as integer codes (0 = setosa, 1 = versicolor, 2 = virginica)
df['species'] = iris.target

# Display the first few rows
print(df.head())

# 2. Checking Data Types and Missing Values

# Check data types
print(df.dtypes)

# Check for missing values
print(df.isnull().sum())

# 3. Cleaning the Dataset
# Drop rows with missing values, if any (Iris has none, so this is a no-op here)
df = df.dropna()

# 4. Basic Data Analysis
# Compute basic statistics
print(df.describe())

# Group by species and compute the mean
grouped_df = df.groupby('species').mean()
print(grouped_df)

# 5. Creating Visualizations
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_theme(style="whitegrid")

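# Line Chart
# Plot only the measurements; the integer species codes are labels, not values in cm
df.drop(columns='species').plot.line()
plt.title('Line Chart of Iris Data')
plt.xlabel('Index')
plt.ylabel('Measurement (cm)')
plt.show()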

# Bar Chart
grouped_df['petal length (cm)'].plot.bar()
plt.title('Average Petal Length per Species')
plt.xlabel('Species')
plt.ylabel('Average Petal Length (cm)')
plt.show()

# Histogram
df['sepal length (cm)'].plot.hist()
plt.title('Histogram of Sepal Length')
plt.xlabel('Sepal Length (cm)')
plt.ylabel('Frequency')
plt.show()

# Scatter Plot
df.plot.scatter(x='sepal length (cm)', y='petal length (cm)')
plt.title('Sepal Length vs. Petal Length')
plt.xlabel('Sepal Length (cm)')
plt.ylabel('Petal Length (cm)')
plt.show()



## Findings and Observations

### Basic Data Analysis

- The mean, median (the 50% row), and standard deviation reported by `df.describe()` summarize the central tendency and variability of each feature.
- Grouping by species revealed distinct differences in mean petal length and mean sepal length among the three species.
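
The grouped comparison is easier to read with species names instead of integer codes. The following is a minimal sketch, not part of the original analysis: the `species_name` column and the `summary` variable are names introduced here for illustration, and the choice of mean, median, and standard deviation is an assumption about which statistics are of interest.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)

# Hypothetical helper column: map integer codes (0, 1, 2) to names
df["species_name"] = iris.target_names[iris.target]

# Per-species mean, median, and standard deviation for the two features
# discussed above
summary = (
    df.groupby("species_name")[["sepal length (cm)", "petal length (cm)"]]
    .agg(["mean", "median", "std"])
)
print(summary.round(2))
```

The result has one row per species and a two-level column index (feature, statistic), so a single value can be read with, for example, `summary.loc["setosa", ("petal length (cm)", "mean")]`.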

### Visualizations

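- **Line Chart**: Plotted the feature values against the row index. Because the rows are ordered by species, the lines shift in steps from one species block to the next.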
- **Bar Chart**: Demonstrated that species 2 (virginica in `iris.target_names`) had the highest average petal length.
- **Histogram**: Showed a roughly bell-shaped distribution of sepal length with a slight right skew.
- **Scatter Plot**: Revealed a positive correlation between sepal length and petal length.
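
The histogram and scatter plot observations can be checked numerically. The sketch below assumes that sample skewness and the Pearson correlation coefficient are adequate summaries; `sepal_skew` and `corr` are names introduced here.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)

# Sample skewness: a positive value indicates a longer right tail
sepal_skew = df["sepal length (cm)"].skew()

# Pearson correlation between the two features shown in the scatter plot
corr = df["sepal length (cm)"].corr(df["petal length (cm)"])

print(f"Sepal length skewness: {sepal_skew:.2f}")
print(f"Sepal vs. petal length correlation: {corr:.2f}")
```

A skewness slightly above zero is consistent with a mild right skew, and a correlation close to 1 is consistent with the upward trend in the scatter plot.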

### Key Insights

- The Iris species differ clearly in their physical measurements, which makes these features useful for classification.
- The visualizations clarify the distributions of individual features and the relationships between them.
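
To illustrate the classification point, the sketch below fits a small decision tree and reports cross-validated accuracy. This is an assumption-laden example rather than part of the original analysis: the choice of `DecisionTreeClassifier`, `max_depth=2`, and 5-fold cross-validation are all illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Features and integer species codes as NumPy arrays
X, y = load_iris(return_X_y=True)

# A deliberately small tree (illustrative choice): at most two levels of splits
clf = DecisionTreeClassifier(max_depth=2, random_state=0)

# 5-fold cross-validated accuracy; folds are stratified for classifiers
scores = cross_val_score(clf, X, y, cv=5)
print(f"Mean accuracy: {scores.mean():.2f}")
```

If even a tree this shallow separates the species well, that supports the observation that the measurements differ clearly between species.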
