# Preprocessing Overview

## Introduction

In our previous sessions, we explored how to implement classifiers and regressors using both custom code and sklearn. However, in real-world scenarios, data is rarely clean and ready for modeling. This is where preprocessing comes into play. 

This notebook provides an overview of key preprocessing concepts that we'll explore in depth in the following notebooks.

## Preprocessing

Preprocessing involves transforming raw data into a format that's more suitable for modeling. Key preprocessing steps include:

1. **Normalization**: Rescaling features to a common scale so that no single feature dominates the model merely because of its units or magnitude.
   - Standard Scaling: Transforming features to have mean=0 and variance=1.
   - Min-Max Scaling: Scaling features to a fixed range, usually [0, 1].
   - Log Scaling: Not strictly a "normalization" method, but an important step for power-law (heavy-tailed) data, since the logarithm compresses very large values.

2. **Handling Outliers**: Identifying and dealing with extreme values that could skew our model.

3. **Encoding Categorical Variables**: Converting non-numeric data into a format our model can understand.
   - One-Hot Encoding: Creating binary columns for each category.
   - Ordinal Encoding: Assigning integer values to categories, which suits categories with a natural order (e.g. small < medium < large).

4. **Imputation**: Dealing with missing data by filling in values.
   - Simple strategies include using the mean, median, or mode of the feature.
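
The three scaling options above can be sketched with scikit-learn's `StandardScaler` and `MinMaxScaler` plus NumPy's `log1p`. This is a minimal illustration, assuming scikit-learn and NumPy are installed; the two-column array and variable names are made up and are not taken from the upcoming notebooks:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Made-up feature matrix: column 0 is small, column 1 spans several orders of magnitude
X = np.array([[1.0, 100.0],
              [2.0, 1000.0],
              [3.0, 10000.0],
              [4.0, 100000.0]])

# Standard scaling: each column is shifted to mean 0 and scaled to variance 1
X_std = StandardScaler().fit_transform(X)

# Min-max scaling: each column is mapped linearly onto [0, 1]
X_minmax = MinMaxScaler().fit_transform(X)

# Log scaling of the heavy-tailed column; log1p(x) = log(1 + x), which also handles zeros
X_log = X.copy()
X_log[:, 1] = np.log1p(X_log[:, 1])

print("standard: mean", X_std.mean(axis=0), "std", X_std.std(axis=0))
print("min-max:  min", X_minmax.min(axis=0), "max", X_minmax.max(axis=0))
print("log column:", X_log[:, 1])
```

In a real workflow the scaler would be fitted on the training split only and then reused via `transform` on validation and test data, so that no information leaks from the held-out set.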
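
The overview does not commit to a particular outlier method. One common choice, assumed here purely for illustration, is the interquartile-range (IQR) rule: values below `Q1 - k*IQR` or above `Q3 + k*IQR` are treated as outliers and clipped to those bounds. The helper `clip_outliers_iqr` is hypothetical, and `k = 1.5` is a conventional default rather than a fixed rule:

```python
import numpy as np


def clip_outliers_iqr(x, k=1.5):
    """Hypothetical helper: clip values outside the IQR fences [Q1 - k*IQR, Q3 + k*IQR]."""
    x = np.asarray(x, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])  # first and third quartiles
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Values inside the fences are unchanged; extremes are pulled back to the nearest fence
    return np.clip(x, lower, upper)


values = np.array([10, 12, 11, 13, 12, 95])  # made-up data; 95 is the obvious outlier
clipped = clip_outliers_iqr(values)
print(clipped)  # → [10. 12. 11. 13. 12. 15.]
```

Clipping keeps every row, which is convenient for small datasets. Dropping the affected rows, or choosing a model that is less sensitive to extremes, are reasonable alternatives depending on whether the outliers are recording errors or genuine observations.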
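
Both categorical encodings are available in scikit-learn as `OneHotEncoder` and `OrdinalEncoder`. The sketch below applies them to made-up `colors` and `sizes` columns. Passing `categories` to `OrdinalEncoder` is what makes the integers follow the intended order; by default the categories are sorted alphabetically, which would give `large < medium < small`:

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Made-up single-column categorical data (2D arrays, as the encoders expect)
colors = np.array([["red"], ["green"], ["blue"], ["green"]])
sizes = np.array([["small"], ["large"], ["medium"], ["small"]])

# One-hot: one binary column per category (categories are sorted alphabetically).
# The default output is a sparse matrix, so .toarray() converts it to a dense array.
onehot = OneHotEncoder()
colors_oh = onehot.fit_transform(colors).toarray()

# Ordinal: the explicit category list fixes the mapping small=0, medium=1, large=2
ordinal = OrdinalEncoder(categories=[["small", "medium", "large"]])
sizes_ord = ordinal.fit_transform(sizes)

print(onehot.categories_[0])  # column order of the one-hot output
print(colors_oh)
print(sizes_ord.ravel())
```

One-hot encoding avoids implying an order between unordered categories such as colors, at the cost of one extra column per category; ordinal encoding stays compact but only makes sense when the order is real.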
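
The simple imputation strategies map onto scikit-learn's `SimpleImputer`, where `strategy="mean"`, `"median"`, and `"most_frequent"` (the mode) correspond to the options listed above. A minimal sketch, assuming a small made-up array in which `np.nan` marks the missing entries:

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Made-up data with one missing value per column
X = np.array([[1.0, 10.0],
              [np.nan, 20.0],
              [3.0, np.nan],
              [8.0, 40.0]])

# Each imputer learns one fill value per column from the non-missing entries
X_mean = SimpleImputer(strategy="mean").fit_transform(X)
X_median = SimpleImputer(strategy="median").fit_transform(X)

print(X_mean)    # gaps filled with column means (4.0 and 70/3)
print(X_median)  # gaps filled with column medians (3.0 and 20.0)
```

The median is less affected by extreme values than the mean, which is why it is often preferred for skewed features. Because the fill values are learned in `fit`, in practice they should come from the training split only.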

