# 🧠 Session 3: Data Preparation Essentials

## 🕒 00:00–00:10 – Why Data Preparation Matters
A model can only be as good as the data it learns from. In real-world ML projects, cleaning and preparing data often takes more of the effort than modeling itself.

**Common problems:**
- Missing values
- Inconsistent formats
- Duplicates
- Outliers
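
A quick audit for each of the four problems above might look like the sketch below. The toy DataFrame and its column names (`city`, `temp_c`) are hypothetical, and the 1.5 × IQR rule is only one common outlier heuristic, chosen here for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data that contains one instance of each problem.
df = pd.DataFrame({
    "city": ["Oslo", "oslo ", "Bergen", "Bergen", "Oslo"],
    "temp_c": [12.0, np.nan, 11.0, 11.0, 95.0],
})

# Missing values: count of nulls per column.
missing_per_column = df.isnull().sum()

# Inconsistent formats: compare distinct labels before and after normalising.
raw_labels = df["city"].nunique()
clean_labels = df["city"].str.strip().str.lower().nunique()

# Duplicates: rows that exactly repeat an earlier row.
duplicate_rows = df.duplicated().sum()

# Outliers: values outside 1.5 * IQR of the quartiles (an assumed heuristic).
q1, q3 = df["temp_c"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["temp_c"] < q1 - 1.5 * iqr) | (df["temp_c"] > q3 + 1.5 * iqr)]

print(missing_per_column, raw_labels, clean_labels, duplicate_rows, sep="\n")
print(outliers)
```

A gap between `raw_labels` and `clean_labels` suggests the same category is spelled in more than one way.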

**Discussion:** What might happen if we skip this step?

## 🕒 00:10–00:25 – Core Techniques in Data Preparation
**1. Handling Missing Values**
- Drop rows/columns
- Impute with mean/median/mode
- Model-based imputation
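
The three options above can be sketched as follows. The data and column names are made up for illustration, and `KNNImputer` is just one possible choice of model-based imputer, not necessarily the one you would use in your project.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# Hypothetical toy data; column names are illustrative only.
df = pd.DataFrame({
    "age": [25.0, np.nan, 40.0, 31.0],
    "income": [50000.0, 62000.0, np.nan, 58000.0],
    "city": ["Oslo", None, "Bergen", "Oslo"],
})

# Option 1: drop every row that has at least one missing value.
dropped = df.dropna()

# Option 2: impute numeric columns with the median, categorical with the mode.
imputed = df.copy()
imputed["age"] = imputed["age"].fillna(imputed["age"].median())
imputed["income"] = imputed["income"].fillna(imputed["income"].median())
imputed["city"] = imputed["city"].fillna(imputed["city"].mode()[0])

# Option 3: model-based imputation, here k-nearest neighbours on numeric columns.
knn = KNNImputer(n_neighbors=2)
knn_values = knn.fit_transform(df[["age", "income"]])
knn_imputed = pd.DataFrame(knn_values, columns=["age", "income"], index=df.index)

print(len(df), "rows before dropping,", len(dropped), "after")
```

Dropping is simplest but discards data. Mean or median imputation keeps every row but shrinks the variance of the column, so compare the options on your own dataset.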

**2. Encoding Categorical Variables**
- One-hot encoding
- Label encoding
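
A minimal sketch of both encodings, using hypothetical `color` and `size` columns. For integer codes it uses scikit-learn's `OrdinalEncoder` rather than `LabelEncoder`, because scikit-learn documents `LabelEncoder` as intended for target labels rather than input features. The category order passed to the encoder is an assumption about this toy data.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical toy data: "color" has no natural order, "size" does.
df = pd.DataFrame({
    "color": ["red", "green", "blue", "green"],
    "size": ["small", "large", "medium", "small"],
})

# One-hot encoding: one 0/1 indicator column per category.
one_hot = pd.get_dummies(df, columns=["color"], dtype=int)

# Integer (label-style) encoding with an explicit category order.
encoder = OrdinalEncoder(categories=[["small", "medium", "large"]])
df["size_code"] = encoder.fit_transform(df[["size"]]).ravel().astype(int)

print(one_hot.columns.tolist())
print(df[["size", "size_code"]])
```

Integer codes imply an order, so they suit ordered categories such as sizes. For unordered categories such as colors, one-hot encoding avoids suggesting that one value is "larger" than another.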

**3. Feature Scaling**
- StandardScaler (Z-score)
- MinMaxScaler (0–1 range)
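
The sketch below applies both scalers to a small made-up array. Splitting into `X_train` and `X_test` and fitting only on the training part is an assumption about the workflow, shown because fitting a scaler on all rows would let test-set statistics leak into training.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical numeric features on very different scales.
X = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0], [4.0, 400.0]])
X_train, X_test = X[:3], X[3:]

# StandardScaler: subtract the training mean, divide by the training std.
standard = StandardScaler().fit(X_train)
X_train_std = standard.transform(X_train)
X_test_std = standard.transform(X_test)  # reuses the training statistics

# MinMaxScaler: map the training minimum to 0 and the training maximum to 1.
minmax = MinMaxScaler().fit(X_train)
X_train_mm = minmax.transform(X_train)
X_test_mm = minmax.transform(X_test)  # may fall outside [0, 1]

print(X_train_std.mean(axis=0), X_train_std.std(axis=0))
print(X_train_mm.min(axis=0), X_train_mm.max(axis=0))
```

Note that test values transformed with `MinMaxScaler` can land outside the 0–1 range when they exceed what the scaler saw during fitting.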

**4. Removing Duplicates and Fixing Data Types**
- Use `.duplicated()`, `.drop_duplicates()`
- Use `.astype()` for type conversion
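
As a rough illustration, the snippet below removes an exact duplicate row and then converts string columns to proper types. The DataFrame and its columns (`id`, `price`, `signup`) are hypothetical; `pd.to_datetime` is used for the date column as one common way to parse date strings.

```python
import pandas as pd

# Hypothetical toy data: everything was read in as strings, one row repeats.
df = pd.DataFrame({
    "id": ["1", "2", "2", "3"],
    "price": ["9.99", "19.50", "19.50", "5.00"],
    "signup": ["2024-01-05", "2024-02-10", "2024-02-10", "2024-03-15"],
})

# Count exact duplicate rows, then keep only the first occurrence of each.
n_duplicates = df.duplicated().sum()
df = df.drop_duplicates().reset_index(drop=True)

# Fix data types so numeric and date operations behave as expected.
df["id"] = df["id"].astype(int)
df["price"] = df["price"].astype(float)
df["signup"] = pd.to_datetime(df["signup"])

print(n_duplicates, "duplicate row(s) removed")
print(df.dtypes)
```

If duplicates should be judged on a key rather than on the whole row, `drop_duplicates` accepts a `subset` argument with the relevant column names.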

In [None]:
# TODO: Load your dataset and display basic info
import pandas as pd
# Example:
# df = pd.read_csv('your_data.csv')
# df.info()

In [None]:
# TODO: Check for missing values
# df.isnull().sum()

In [None]:
# TODO: Encode categorical variables if needed
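# Use pd.get_dummies() for one-hot encoding, or sklearn.preprocessing.OrdinalEncoder for integer codes
# (scikit-learn's LabelEncoder is meant for the target column, not input features)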

In [None]:
# TODO: Scale numerical features if needed
# from sklearn.preprocessing import StandardScaler, MinMaxScaler

## 🕒 00:25–00:40 – Feature Engineering Overview
**Feature Engineering Techniques:**
- Date/time decomposition
- Text statistics (length, keywords)
- Interaction terms (feature combinations)
- Binning (e.g., age groups)

**Prompt:**
_“If you had a dataset of bike rentals with timestamps and weather, what features would you create?”_

In [None]:
# TODO: Create new features from existing ones
# Example: df['hour'] = pd.to_datetime(df['timestamp']).dt.hour
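
One possible take on the four techniques listed above, using a tiny bike-rental-style table. All column names (`timestamp`, `temp_c`, `humidity`, `comment`) and the temperature bin edges are assumptions made for this sketch, not part of any real dataset.

```python
import pandas as pd

# Hypothetical bike-rental records with a timestamp and weather readings.
df = pd.DataFrame({
    "timestamp": ["2024-06-01 08:15:00", "2024-06-03 17:40:00", "2024-06-05 13:05:00"],
    "temp_c": [14.0, 23.5, 29.0],
    "humidity": [0.80, 0.55, 0.40],
    "comment": ["light rain", "clear and sunny", "clear"],
})

# Date/time decomposition: pull out parts of the timestamp.
ts = pd.to_datetime(df["timestamp"])
df["hour"] = ts.dt.hour
df["dayofweek"] = ts.dt.dayofweek  # Monday is 0, Sunday is 6
df["is_weekend"] = ts.dt.dayofweek >= 5

# Text statistics: length and a simple keyword flag.
df["comment_len"] = df["comment"].str.len()
df["mentions_rain"] = df["comment"].str.contains("rain")

# Interaction term: product of two existing features.
df["temp_x_humidity"] = df["temp_c"] * df["humidity"]

# Binning: group a continuous value into labelled ranges (assumed edges).
df["temp_band"] = pd.cut(df["temp_c"], bins=[-50, 15, 25, 60],
                         labels=["cold", "mild", "hot"])

print(df[["hour", "is_weekend", "comment_len", "mentions_rain", "temp_band"]])
```

Which of these features actually help depends on the model and the data, so treat them as candidates to evaluate rather than a fixed recipe.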

## 🕒 00:40–00:45 – Recap & Prep for Next Session
**Recap:**
- Data cleaning is essential
- Proper preparation improves model performance

**Next:**
Preview your dataset using the following commands.

In [None]:
# TODO: Preview your cleaned data
# df.head()      # first few rows
# df.describe()  # summary statistics (numeric columns by default)

## ✅ Exit Reflection
_Which part of your own project data might require the most cleaning?_