# Data Preprocessing with Seaborn Datasets
You will practice data preprocessing techniques using Seaborn datasets: **Tips**, **Flights**, and **Titanic**.
We will cover:
- Handling Missing Values
- Feature Scaling
- Encoding
- Binning
- Normalization

Each step includes explanation, justification, and trade-off analysis.

In [None]:
# Imports
import seaborn as sns
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler, LabelEncoder
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')

In [None]:
# Load datasets
tips = sns.load_dataset('tips')
flights = sns.load_dataset('flights')
titanic = sns.load_dataset('titanic')
tips.head(), flights.head(), titanic.head()

## Handling Missing Values

In [None]:
# Check missing values
titanic.isnull().sum()

In [None]:
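# Fill missing age with the median and embark_town/embarked with the mode; drop deck because most of its values are null
# Assign the result back: fillna(..., inplace=True) on a selected column is chained assignment, which recent pandas versions warn about
titanic['age'] = titanic['age'].fillna(titanic['age'].median())
titanic['embark_town'] = titanic['embark_town'].fillna(titanic['embark_town'].mode()[0])
titanic['embarked'] = titanic['embarked'].fillna(titanic['embarked'].mode()[0])
titanic = titanic.drop(columns=['deck'])
titanic.isnull().sum()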
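
The choice between dropping a column and imputing it can be made explicit by looking at the share of missing values per column. The sketch below uses a small made-up frame rather than the real Titanic data, and the 50% cut-off (`DROP_THRESHOLD`) is a hypothetical value chosen for illustration, not a rule from this notebook.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the Titanic columns used above (values are made up).
toy = pd.DataFrame({
    "age": [22.0, 38.0, np.nan, 35.0, 80.0, np.nan],
    "embark_town": ["Southampton", "Cherbourg", "Southampton", None,
                    "Southampton", "Queenstown"],
    "deck": [None, "C", None, None, None, "E"],
})

# Fraction of missing values per column, largest first.
missing_share = toy.isnull().mean().sort_values(ascending=False)

# Hypothetical cut-off: drop columns that are missing in more than half of the rows.
DROP_THRESHOLD = 0.5
to_drop = missing_share[missing_share > DROP_THRESHOLD].index.tolist()
cleaned = toy.drop(columns=to_drop)

# Impute what remains: median for the numeric column, mode for the categorical one.
cleaned["age"] = cleaned["age"].fillna(cleaned["age"].median())
cleaned["embark_town"] = cleaned["embark_town"].fillna(cleaned["embark_town"].mode()[0])

print(missing_share)
print(cleaned)
```

The median is less sensitive to extreme values than the mean, which is one reason to prefer it for a skewed column such as age. The trade-off is that any constant fill reduces the variance of the imputed column.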

## Feature Scaling

In [None]:
# Standardize 'total_bill' and 'tip' from tips to zero mean and unit variance
scaler = StandardScaler()
tips[['total_bill_scaled', 'tip_scaled']] = scaler.fit_transform(tips[['total_bill', 'tip']])
tips.head()
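
To make the transformation concrete, the sketch below checks `StandardScaler` against the formula z = (x - mean) / std on a few made-up bill amounts. The array values are illustrative and are not taken from the Tips dataset.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up bill amounts, shaped (n_samples, 1) as scikit-learn expects.
bills = np.array([[10.0], [20.0], [30.0], [40.0]])

scaled = StandardScaler().fit_transform(bills)

# Manual z-score; np.std uses the population standard deviation (ddof=0) by default,
# which matches what StandardScaler computes.
manual = (bills - bills.mean(axis=0)) / bills.std(axis=0)

print(np.allclose(scaled, manual))
```

Note that `pandas.Series.std()` defaults to the sample standard deviation (`ddof=1`), so a z-score computed with pandas defaults will differ slightly from the scaler's output.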

## Encoding

In [None]:
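# Encode 'sex' and 'embarked' in Titanic
# Use one encoder per column so each keeps its own classes_ for inverse_transform
sex_encoder = LabelEncoder()
embarked_encoder = LabelEncoder()
titanic['sex_encoded'] = sex_encoder.fit_transform(titanic['sex'])
# Fill any remaining missing ports with the mode so NaN is not encoded as a separate class
titanic['embarked_encoded'] = embarked_encoder.fit_transform(titanic['embarked'].fillna(titanic['embarked'].mode()[0]))
titanic[['sex', 'sex_encoded', 'embarked', 'embarked_encoded']].head()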
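
Label encoding assigns integers in sorted class order, which implies an ordering that a nominal feature such as the port of embarkation does not have. The sketch below contrasts it with one-hot encoding via `pd.get_dummies` on a made-up column. One-hot encoding is an alternative shown for comparison, not the method used in the cell above.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Made-up sample of embarkation codes.
toy = pd.DataFrame({"embarked": ["S", "C", "Q", "S"]})

# Label encoding: classes are sorted, so C -> 0, Q -> 1, S -> 2.
encoder = LabelEncoder()
toy["embarked_encoded"] = encoder.fit_transform(toy["embarked"])
mapping = dict(zip(encoder.classes_, range(len(encoder.classes_))))

# One-hot encoding: one indicator column per class, no implied order.
one_hot = pd.get_dummies(toy["embarked"], prefix="embarked")

print(mapping)
print(one_hot)
```

The trade-off is dimensionality: one-hot encoding adds a column per category, while label encoding keeps a single column but can mislead distance-based or linear models into treating the codes as magnitudes.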

## Binning

In [None]:
# Bin age into categories in Titanic; pd.cut intervals are right-closed by default, so age 12 falls in 'Child' and age 20 in 'Teen'
bins = [0, 12, 20, 40, 60, 100]
labels = ['Child', 'Teen', 'Adult', 'Middle Age', 'Senior']
titanic['age_group'] = pd.cut(titanic['age'], bins=bins, labels=labels)
titanic[['age', 'age_group']].head()
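
How boundary ages are assigned depends on the `right` argument of `pd.cut`. The sketch below reuses the same bins and labels on a handful of made-up ages that sit on or near the edges.

```python
import pandas as pd

bins = [0, 12, 20, 40, 60, 100]
labels = ['Child', 'Teen', 'Adult', 'Middle Age', 'Senior']

# Made-up ages chosen to sit on or near the bin edges.
ages = pd.Series([0.42, 12, 12.5, 20, 40, 80])

# Default (right=True): intervals are (0, 12], (12, 20], ...
right_closed = pd.cut(ages, bins=bins, labels=labels)

# right=False: intervals are [0, 12), [12, 20), ...
left_closed = pd.cut(ages, bins=bins, labels=labels, right=False)

print(pd.DataFrame({"age": ages, "right_closed": right_closed, "left_closed": left_closed}))
```

With the default right-closed intervals, a value exactly equal to the lowest edge (here 0) falls outside every bin and becomes NaN unless `include_lowest=True` is passed.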

## Normalization

In [None]:
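# Normalize passenger counts in the Flights dataset to the [0, 1] range
min_max = MinMaxScaler()
# fit_transform returns an (n, 1) array; ravel() flattens it to one value per row
flights['passengers_normalized'] = min_max.fit_transform(flights[['passengers']]).ravel()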
flights.head()
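
Min-max normalization maps each value to (x - min) / (max - min). The sketch below checks `MinMaxScaler` against that formula on a few illustrative passenger counts; the array is a small stand-in, not the full Flights column.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Illustrative monthly passenger counts, shaped (n_samples, 1).
passengers = np.array([[112.0], [118.0], [132.0], [129.0], [121.0]])

scaled = MinMaxScaler().fit_transform(passengers)

# Manual min-max formula for comparison.
col_min = passengers.min(axis=0)
col_max = passengers.max(axis=0)
manual = (passengers - col_min) / (col_max - col_min)

flat = scaled.ravel()
print(np.allclose(scaled, manual), flat.min(), flat.max())
```

Because the result depends only on the minimum and maximum, a single outlier can compress all other values into a narrow band. Standardization is less affected by this, at the cost of not producing a bounded range.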

## Conclusion
All preprocessing steps have been applied. Justify your methods based on data distribution, scale requirements, and ML algorithm expectations.