# Iris Dataset Classification with Feature Normalization

This notebook compares three feature normalization techniques (StandardScaler, MinMaxScaler, and RobustScaler) on the Iris dataset as a preprocessing step for classification.

## Dataset Overview
- **Features**: 4 numerical features (sepal length, sepal width, petal length, petal width)
- **Target**: 3 species classes (Iris-setosa, Iris-versicolor, Iris-virginica)
- **Samples**: 150 (50 per class)


In [None]:
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler, MinMaxScaler, RobustScaler
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.metrics import classification_report, accuracy_score
import warnings
# Suppress library warnings to keep the notebook output readable
warnings.filterwarnings('ignore')

# Set style for better plots
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")


## 1. Load and Explore the Dataset


In [None]:
# Load the Iris dataset
df = pd.read_csv('Iris.csv')

print(f"Dataset shape: {df.shape}")
print("\nColumn information:")
df.info()

print("\nFirst 5 rows:")
df.head()


## 2. Feature Normalization Demonstration
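
Each of the three scalers used in this section applies a simple per-feature formula. The sketch below spells those formulas out in plain Python on a made-up toy column (not the Iris data) that contains one deliberate outlier. The function names are hypothetical helpers written for illustration. The quantile method is chosen on the assumption that it matches the linear interpolation scikit-learn relies on by default.

```python
import statistics

def standard_scale(values):
    """z = (x - mean) / std, using the population std (ddof=0)."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return [(x - mu) / sigma for x in values]

def minmax_scale(values):
    """x' = (x - min) / (max - min), which maps the column onto [0, 1]."""
    lo, hi = min(values), max(values)
    return [(x - lo) / (hi - lo) for x in values]

def robust_scale(values):
    """x' = (x - median) / IQR, where IQR = Q3 - Q1."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return [(x - median) / (q3 - q1) for x in values]

# Toy column (an assumption for illustration): four inliers and one outlier
toy = [1.0, 2.0, 3.0, 4.0, 100.0]

print("standard:", [round(v, 3) for v in standard_scale(toy)])
print("min-max: ", [round(v, 3) for v in minmax_scale(toy)])
print("robust:  ", [round(v, 3) for v in robust_scale(toy)])
```

On this toy column, min-max scaling squeezes the four inliers into roughly the bottom 3% of the [0, 1] range because the outlier sets the maximum, while the median/IQR version keeps them spread between -1 and 0.5 and leaves the outlier far away at 48.5.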


In [None]:
# Separate features and target
feature_columns = ['SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm']
X = df[feature_columns].copy()
y = df['Species'].copy()

print("Original feature statistics:")
print(X.describe())

# Apply different normalization techniques
standard_scaler = StandardScaler()
minmax_scaler = MinMaxScaler()
robust_scaler = RobustScaler()

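# Fit on the full dataset for visualization only; for model evaluation,
# fit each scaler on the training split to avoid leaking test statistics.
X_standard = pd.DataFrame(standard_scaler.fit_transform(X), columns=feature_columns, index=X.index)
X_minmax = pd.DataFrame(minmax_scaler.fit_transform(X), columns=feature_columns, index=X.index)
X_robust = pd.DataFrame(robust_scaler.fit_transform(X), columns=feature_columns, index=X.index)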

# Visualize the differences
fig, axes = plt.subplots(2, 2, figsize=(15, 10))

X.boxplot(ax=axes[0, 0])
axes[0, 0].set_title('Original Data')
axes[0, 0].tick_params(axis='x', rotation=45)

X_standard.boxplot(ax=axes[0, 1])
axes[0, 1].set_title('StandardScaler (mean=0, std=1)')
axes[0, 1].tick_params(axis='x', rotation=45)

X_minmax.boxplot(ax=axes[1, 0])
axes[1, 0].set_title('MinMaxScaler (range 0-1)')
axes[1, 0].tick_params(axis='x', rotation=45)

X_robust.boxplot(ax=axes[1, 1])
axes[1, 1].set_title('RobustScaler (median/IQR)')
axes[1, 1].tick_params(axis='x', rotation=45)

plt.tight_layout()
plt.show()

print("\n=== NORMALIZATION SUMMARY ===")
print("1. StandardScaler: centers each feature at mean 0 and scales it to unit variance")
print("2. MinMaxScaler: rescales each feature to the range [0, 1]")
print("3. RobustScaler: centers on the median and scales by the IQR, so it is less sensitive to outliers")
print("\nFor the Iris dataset, all three methods should work well because the features are clean and already on similar scales (all in cm).")
print("StandardScaler is a common default for SVMs and neural networks.")
print("MinMaxScaler is a common choice for distance-based algorithms such as KNN, but a single outlier can compress the remaining values.")
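
The scalers above are fit on the full dataset, which is fine for a visual comparison. When scaling feeds a classifier, one common approach is to fit the scaler on the training split only, so that no test-set statistics leak into preprocessing. The following is a minimal sketch of that idea, not necessarily how the rest of this notebook proceeds. It assumes scikit-learn's bundled `load_iris` as a stand-in for `Iris.csv`, and the `_demo` variable names, the 80/20 split, and the choice of `LogisticRegression` are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: the bundled copy of Iris stands in for Iris.csv
X_demo, y_demo = load_iris(return_X_y=True)

# Stratify so each species keeps the same share in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=42
)

# The pipeline fits the scaler on the training rows only, then reuses
# those training statistics when transforming the test rows.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

test_accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Test accuracy: {test_accuracy:.3f}")
```

Swapping `StandardScaler()` for `MinMaxScaler()` or `RobustScaler()` in the pipeline would let the three techniques be compared under the same split.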
