# Exploratory Data Analysis (EDA) on Iris Dataset

## What is Exploratory Data Analysis (EDA)?

EDA is the process of analyzing data sets to summarize their main characteristics, often using visual methods. It helps to uncover patterns, detect anomalies, test hypotheses, and check assumptions.

## Why is EDA Important?
- Helps avoid choosing a model that does not suit the data.
- Catches data problems early, so a sound model is not trained on flawed data.
- Guides the selection of relevant features.

## How to perform EDA?
- Programming Languages: Python, R
- Visualization Tools: Tableau, Power BI, Infogram, Plotly

## About the Iris Dataset
- 150 rows and 4 feature columns: sepal length, sepal width, petal length, petal width
- Target label: `species` (setosa, versicolor, virginica)
- Balanced dataset (50 samples per class)
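
The properties listed above can be checked programmatically once the data is loaded. The following is a minimal sketch: `summarize_layout` is a hypothetical helper (not part of pandas), and the toy frame uses made-up values purely to show the call.

```python
import pandas as pd

def summarize_layout(df: pd.DataFrame, target: str = "species") -> dict:
    """Return row count, feature columns, class counts, and a balance flag."""
    counts = df[target].value_counts()
    return {
        "n_rows": len(df),
        "feature_columns": [c for c in df.columns if c != target],
        "class_counts": counts.to_dict(),
        # Balanced means every class has the same number of rows.
        "balanced": bool(counts.nunique() == 1),
    }

# Toy frame with made-up values; the real dataset has 150 rows.
toy = pd.DataFrame({
    "sepal_length": [5.0, 5.1, 6.0, 6.1, 7.0, 7.1],
    "petal_length": [1.4, 1.5, 4.5, 4.6, 6.0, 6.1],
    "species": ["setosa"] * 2 + ["versicolor"] * 2 + ["virginica"] * 2,
})
layout = summarize_layout(toy)
print(layout)
```

On the real data, `summarize_layout(iris)` should report 150 rows, four feature columns, and `balanced` set to `True` if the file matches the description above.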

---

## Import Necessary Libraries
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sb
import plotly.express as px
```

## Load the Dataset
```python
iris = pd.read_csv("iris.csv")
```
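
The rest of this walkthrough assumes the CSV uses the column names `sepal_length`, `sepal_width`, `petal_length`, `petal_width`, and `species`. Copies of the Iris data found elsewhere may name the columns differently, so a small guard can help. `load_iris_csv` below is a hypothetical helper, sketched on an in-memory sample rather than the real file.

```python
import io
import pandas as pd

EXPECTED_COLUMNS = ["sepal_length", "sepal_width", "petal_length", "petal_width", "species"]

def load_iris_csv(source) -> pd.DataFrame:
    """Read the CSV (path or file-like object) and fail early on unexpected columns."""
    df = pd.read_csv(source)
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"missing expected columns: {missing}")
    return df

# In-memory sample so the sketch runs without iris.csv on disk.
sample = io.StringIO(
    "sepal_length,sepal_width,petal_length,petal_width,species\n"
    "5.1,3.5,1.4,0.2,setosa\n"
)
demo = load_iris_csv(sample)

# A file with the wrong header is rejected with a clear message.
try:
    load_iris_csv(io.StringIO("a,b\n1,2\n"))
except ValueError as err:
    error_message = str(err)
    print(error_message)
```

With the real file, the call would be `iris = load_iris_csv("iris.csv")`.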

## Display First and Last 5 Rows
```python
# A notebook cell renders only its last expression, so print both explicitly.
print(iris.head())
print(iris.tail())
```

## Check Dataset Shape
```python
iris.shape
```

## Display Column Names
```python
iris.columns
```

## Class Distribution
```python
iris['species'].value_counts()
```

## Basic Info
```python
iris.info()
```
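
`info()` reports non-null counts per column. A more explicit check for missing values and duplicate rows might look like the sketch below. `quality_report` is a hypothetical helper, and the toy frame contains made-up values chosen to trigger both checks.

```python
import numpy as np
import pandas as pd

def quality_report(df: pd.DataFrame) -> dict:
    """Count missing values per column and fully duplicated rows."""
    return {
        "missing_per_column": df.isna().sum().to_dict(),
        # duplicated() marks every repeat of an earlier row as True.
        "duplicate_rows": int(df.duplicated().sum()),
    }

# Toy frame: one NaN and one repeated row.
toy = pd.DataFrame({
    "petal_length": [1.4, 1.4, np.nan, 4.7],
    "species": ["setosa", "setosa", "versicolor", "versicolor"],
})
report = quality_report(toy)
print(report)
```

Duplicate rows are not necessarily errors in measurement data, so they are worth inspecting before dropping.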

---

# Visualizing the Data

## Species Distribution Plot
```python
# Plots the species label of each row against its row index.
plt.plot(iris["species"])
plt.xlabel("Row index")
plt.ylabel("Species")
plt.title("Species Distribution")
plt.show()
```

## Histogram of Species
```python
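# Species is categorical, so a bar chart of counts per class is clearer than numeric bins.
iris["species"].value_counts().plot(kind="bar", color="green")
plt.ylabel("Count")
plt.title("Species Histogram")
plt.show()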
```

## Statistical Summary
```python
# display() exists only in IPython/Jupyter; print() works everywhere.
print(iris.describe())
```

---

# Scatter Plot and Pair Plot

## Scatter Plot: Sepal Length vs Sepal Width
```python
sb.set_style('whitegrid')
sb.scatterplot(data=iris, x='sepal_length', y='sepal_width', hue='species')
plt.title("Sepal Length vs Sepal Width")
plt.show()
```

## Pair Plot
```python
sb.pairplot(iris, hue='species', height=3)
plt.show()
```
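
The pair plot shows visually which features move together. A correlation matrix gives the same information as numbers. This is a minimal sketch: `feature_correlations` is a hypothetical helper, and the toy values are made up so that the expected correlations are obvious.

```python
import pandas as pd

def feature_correlations(df: pd.DataFrame) -> pd.DataFrame:
    """Pearson correlation between numeric columns; the label column is excluded."""
    return df.select_dtypes(include="number").corr()

# Toy values: petal_width rises with petal_length, sepal_width falls.
toy = pd.DataFrame({
    "petal_length": [1.0, 2.0, 3.0, 4.0],
    "petal_width": [2.0, 4.0, 6.0, 8.0],
    "sepal_width": [4.0, 3.0, 2.0, 1.0],
    "species": ["a", "a", "b", "b"],
})
corr = feature_correlations(toy)
print(corr)
```

On the real data, `sb.heatmap(feature_correlations(iris), annot=True)` would render the matrix as a heatmap.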

---

# 1D Scatter Plot and Histograms

## 1D Scatter Plot for Petal Length
```python
iris_setosa = iris.loc[iris['species'] == 'setosa']
iris_versicolor = iris.loc[iris['species'] == 'versicolor']
iris_virginica = iris.loc[iris['species'] == 'virginica']

plt.plot(iris_setosa['petal_length'], np.zeros_like(iris_setosa['petal_length']), 'o', label='setosa')
plt.plot(iris_virginica['petal_length'], np.zeros_like(iris_virginica['petal_length']), 'o', label='virginica')
plt.plot(iris_versicolor['petal_length'], np.zeros_like(iris_versicolor['petal_length']), 'o', label='versicolor')
plt.xlabel("Petal Length")
plt.legend()
plt.grid()
plt.title("1D Scatter Plot of Petal Length")
plt.show()
```

## Histogram and Density Plot
```python
sb.FacetGrid(iris, hue="species").map(sb.histplot, 'petal_length', kde=True).add_legend()
plt.title("Histogram and PDF of Petal Length")
plt.show()
```

---

# PDF and CDF Analysis

## PDF and CDF for Setosa Petal Length
```python
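# Bin the values into 10 equal-width bins.
counts, bin_edges = np.histogram(iris_setosa['petal_length'], bins=10, density=True)
# Normalising by the total gives the fraction of points per bin (a binned PDF estimate).
pdf = counts / counts.sum()
# The running total of those fractions is the CDF evaluated at each right bin edge.
cdf = np.cumsum(pdf)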

plt.plot(bin_edges[1:], pdf, label='PDF')
plt.plot(bin_edges[1:], cdf, label='CDF')
plt.xlabel('Petal Length')
plt.title("PDF and CDF for Setosa Petal Length")
plt.legend()
plt.grid()
plt.show()
```
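
The curves above depend on the choice of 10 bins. An alternative that needs no binning is the empirical CDF, which steps up by 1/n at each sorted observation. The sketch below uses a hypothetical helper `empirical_cdf` and four made-up values.

```python
import numpy as np

def empirical_cdf(values):
    """Return sorted values x and the fraction of points <= each x."""
    x = np.sort(np.asarray(values, dtype=float))
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y

# Made-up sample of four petal lengths.
x, y = empirical_cdf([1.5, 1.3, 1.4, 1.4])
print(x)
print(y)
```

To draw it for the real data, something like `plt.step(*empirical_cdf(iris_setosa['petal_length']), where='post')` should work.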

## PDF and CDF for All Species
```python
for species_data, label in zip([iris_setosa, iris_versicolor, iris_virginica], ['setosa', 'versicolor', 'virginica']):
    counts, bin_edges = np.histogram(species_data['petal_length'], bins=10, density=True)
    pdf = counts / counts.sum()  # fraction of points per bin
    cdf = np.cumsum(pdf)         # cumulative fraction up to each right bin edge
    plt.plot(bin_edges[1:], pdf, label=f'{label} PDF')
    plt.plot(bin_edges[1:], cdf, label=f'{label} CDF')

plt.xlabel('Petal Length')
plt.legend()
plt.title("PDF and CDF of Petal Length for All Species")
plt.grid()
plt.show()
```

---

# Statistical Measures

## Mean, Standard Deviation, and Median
```python
print("Means:")
print("Setosa:", np.mean(iris_setosa['petal_length']))
print("Versicolor:", np.mean(iris_versicolor['petal_length']))
print("Virginica:", np.mean(iris_virginica['petal_length']))

print("\nStandard Deviations:")
print("Setosa:", np.std(iris_setosa['petal_length']))
print("Versicolor:", np.std(iris_versicolor['petal_length']))
print("Virginica:", np.std(iris_virginica['petal_length']))

print("\nMedians:")
print("Setosa:", np.median(iris_setosa['petal_length']))
print("Versicolor:", np.median(iris_versicolor['petal_length']))
print("Virginica:", np.median(iris_virginica['petal_length']))
```
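
The same statistics can be computed for every species in one call with `groupby`, which avoids the three hand-made subsets. This is a sketch with a hypothetical helper name and made-up toy values. `ddof=0` is set explicitly to match the default of `np.std`; pandas' own `std` defaults to `ddof=1`.

```python
import pandas as pd

def summarize_by_species(df: pd.DataFrame, column: str = "petal_length") -> pd.DataFrame:
    """Mean, population standard deviation, and median of one column per species."""
    return df.groupby("species")[column].agg(
        mean="mean",
        std=lambda s: s.std(ddof=0),  # population std, same default as np.std
        median="median",
    )

# Toy values chosen so the results are easy to verify by hand.
toy = pd.DataFrame({
    "species": ["setosa", "setosa", "versicolor", "versicolor", "versicolor"],
    "petal_length": [1.0, 3.0, 4.0, 4.0, 7.0],
})
summary = summarize_by_species(toy)
print(summary)
```

For versicolor in the toy frame the mean (5.0) sits above the median (4.0), a reminder that the mean is pulled toward extreme values while the median is not.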

## 90th Percentile
```python
print("\n90th Percentiles:")
print("Setosa:", np.percentile(iris_setosa['petal_length'], 90))
print("Versicolor:", np.percentile(iris_versicolor['petal_length'], 90))
print("Virginica:", np.percentile(iris_virginica['petal_length'], 90))
```

---

# Advanced Visualizations

## Boxplot
```python
sb.boxplot(x='species', y='petal_length', data=iris)
plt.title("Boxplot of Petal Length by Species")
plt.grid()
plt.show()
```
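
The numbers behind a boxplot can be computed directly. The sketch below assumes the common convention of whiskers reaching the furthest points within 1.5 × IQR of the quartiles, with anything beyond drawn as an outlier. `box_stats` is a hypothetical helper and the sample values are made up.

```python
import numpy as np

def box_stats(values, whis: float = 1.5) -> dict:
    """Quartiles, whisker ends, and outliers under the whis * IQR rule."""
    x = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1
    low_fence, high_fence = q1 - whis * iqr, q3 + whis * iqr
    inside = x[(x >= low_fence) & (x <= high_fence)]
    return {
        "q1": float(q1),
        "median": float(median),
        "q3": float(q3),
        # Whiskers end at the most extreme data points inside the fences.
        "whisker_low": float(inside.min()),
        "whisker_high": float(inside.max()),
        "outliers": sorted(float(v) for v in x[(x < low_fence) | (x > high_fence)]),
    }

# Made-up sample with one obvious outlier.
stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 30])
print(stats)
```

Running `box_stats` on each species subset gives the values to compare against the boxes drawn above.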

## Violin Plot
```python
sb.violinplot(x='species', y='petal_length', data=iris)
plt.title("Violin Plot of Petal Length by Species")
plt.grid()
plt.show()
```

## 3D Scatter Plot
```python
fig = px.scatter_3d(iris, x='sepal_length', y='sepal_width', z='petal_width', color='species')
fig.show()
```