# Week 2. Feature Engineering

---

- Making data useful before training a model
- Representing data in forms that help models learn
- Increasing predictive quality
- Reducing dimensionality with feature engineering

<img src = "https://i.gyazo.com/46f5217676f0e03f5c6dbf7216b726e9.png" width = "400px">

### Key points

- Feature engineering can be difficult and time-consuming, but it is also critical to success
- Squeezing the most out of data through feature engineering enables models to learn better
- Concentrating predictive information in fewer features enables more efficient use of compute resources
- Feature engineering during training must also be applied correctly during serving
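
One way to keep training and serving consistent is to compute the transformation statistics once, on the training data, and reuse the stored values at serving time. The sketch below illustrates that idea only; `fit_scaler` and `apply_scaler` are hypothetical helper names, and JSON stands in for whatever artifact format a real pipeline would use.

```python
import json
import statistics

def fit_scaler(train_values):
    """Compute scaling statistics from the training data only."""
    return {
        "mean": statistics.fmean(train_values),
        "std": statistics.pstdev(train_values),
    }

def apply_scaler(value, params):
    """Apply stored statistics to one value, at training or serving time."""
    return (value - params["mean"]) / params["std"]

train_ages = [20.0, 30.0, 40.0]
params = fit_scaler(train_ages)

# Persist the statistics alongside the model (serialized here as JSON).
saved = json.dumps(params)

# At serving time, reload the same statistics instead of recomputing them.
serving_params = json.loads(saved)
scaled = apply_scaler(30.0, serving_params)
```

Recomputing the mean and standard deviation from serving traffic would give a different transformation than the one the model was trained on, which is the skew this pattern is meant to avoid.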

## Preprocessing Operations

---
- Main preprocessing operations
- Mapping raw data into features
- Mapping numeric values
- Mapping categorical values
- Empirical knowledge of data

### Main preprocessing operations
- Data cleansing: Correcting erroneous data
- Feature tuning
- Representation transformation
- Feature extraction
- Feature construction

### Empirical knowledge of data
- **Text:** stemming, lemmatization, TF-IDF, n-grams, embedding lookup
- **Images:** clipping, resizing, cropping, blur, Canny filters, Sobel filters, photometric distortions
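
As a small illustration of one of the text techniques listed above, the sketch below extracts word n-grams from a tokenized sentence. The `ngrams` helper and the whitespace tokenization are assumptions made for the example, not part of the course material.

```python
def ngrams(tokens, n):
    """Return the contiguous n-token tuples of a token list, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Naive whitespace tokenization, good enough for a demonstration.
tokens = "feature engineering improves models".split()
bigrams = ngrams(tokens, 2)
```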

### Key points
Data preprocessing transforms raw data into a clean, training-ready dataset.

Feature engineering:
- Maps raw data into feature vectors
- Converts integer values to floating-point values
- Normalizes numerical values
- Maps strings and categorical values to vectors of numeric values
- Maps data from one space into a different space
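
Mapping strings and categorical values to numeric vectors can be sketched with a one-hot encoding. This is a minimal example under stated assumptions: `build_vocabulary` and `one_hot` are hypothetical helpers, and encoding unknown categories as all zeros is one possible policy among several.

```python
def build_vocabulary(values):
    """Map each distinct category to an integer index (sorted for determinism)."""
    return {category: index for index, category in enumerate(sorted(set(values)))}

def one_hot(value, vocabulary):
    """Encode a category as a one-hot list of floats; unknown values become all zeros."""
    vector = [0.0] * len(vocabulary)
    index = vocabulary.get(value)
    if index is not None:
        vector[index] = 1.0
    return vector

colors = ["red", "green", "blue", "green"]
vocabulary = build_vocabulary(colors)
encoded = [one_hot(color, vocabulary) for color in colors]
```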

## Feature Engineering Techniques

---

- Feature Scaling
- Normalization and Standardization
- Bucketizing / Binning
- Other techniques (dimensionality reduction)

<img src = "https://i.gyazo.com/ab0624bb50379deba8d12e9d943affa1.png" width = "300px">

### Scaling
- Converts values from their natural range into a prescribed range
- Helps neural nets converge faster
- Helps avoid NaN errors during training
- Helps the model learn appropriate weights for each feature

For example, <code>image = (image - 127.5)/127.5</code> rescales grayscale pixel intensities from $[0, 255]$ to $[-1, 1]$.
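
The pixel rescaling can be checked on a tiny array. The sketch below assumes NumPy and a made-up 2x3 grayscale image; the explicit cast to `float32` is a choice made here so that the output dtype is predictable.

```python
import numpy as np

# A hypothetical 2x3 grayscale image with 8-bit pixel intensities.
image = np.array([[0, 64, 128], [191, 254, 255]], dtype=np.uint8)

# Cast to float before scaling so the result has a known floating-point dtype.
scaled = (image.astype(np.float32) - 127.5) / 127.5
```

The darkest possible pixel (0) maps to -1 and the brightest (255) maps to 1.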

### Normalization
Commonly used when the data does not follow a Gaussian distribution

$$X_{norm} = \frac{X - X_{\text{min}}}{X_{\text{max}} - X_{\text{min}}}, \quad X_{norm} \in [0, 1] $$
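
A minimal NumPy sketch of the min-max formula follows. The function name `min_max_normalize` is an assumption for illustration, and returning zeros for a constant feature is one possible way to handle the zero denominator.

```python
import numpy as np

def min_max_normalize(x):
    """Rescale values linearly to [0, 1] using the observed min and max."""
    x = np.asarray(x, dtype=np.float64)
    x_min, x_max = x.min(), x.max()
    if x_max == x_min:
        # Constant feature: the formula would divide by zero, so return zeros.
        return np.zeros_like(x)
    return (x - x_min) / (x_max - x_min)

values = [10.0, 20.0, 25.0, 50.0]
normalized = min_max_normalize(values)
```

Because the minimum and maximum come from the data, a single extreme outlier compresses all other values into a narrow part of the range.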

### Standardization (z-score)
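The z-score expresses how many standard deviations a value lies from the mean. The standardized feature has mean 0 and standard deviation 1; if $X$ is Gaussian, it follows the standard normal distribution.
$$X_{std} = \frac{X - \mu}{\sigma} \sim \mathcal{N}(0, 1)$$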
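
The z-score computation can be sketched with NumPy as below. The `standardize` name is hypothetical, and the example uses the population standard deviation (`ddof=0`), which is an assumption; some tools use the sample standard deviation instead.

```python
import numpy as np

def standardize(x):
    """Subtract the mean and divide by the standard deviation."""
    x = np.asarray(x, dtype=np.float64)
    mu = x.mean()
    sigma = x.std()  # population standard deviation (ddof=0)
    return (x - mu) / sigma

values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # mean 5, std 2
z = standardize(values)
```

After the transformation the values have mean 0 and standard deviation 1, regardless of the shape of the original distribution.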

### Bucketizing / Binning

<img src = "https://i.gyazo.com/41d32c703c5e8b4bf4e910108d553dc9.png" width = "500px">
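
Bucketizing replaces a continuous value with the index of the interval it falls into. The sketch below uses the standard-library `bisect` module; the age boundaries are made up for illustration, and treating each boundary as the inclusive lower edge of its bucket is a convention chosen here.

```python
from bisect import bisect_right

def bucketize(value, boundaries):
    """Return the index of the bucket that value falls into.

    With sorted boundaries [b0, b1, ...], the buckets are
    (-inf, b0), [b0, b1), ..., [b_last, +inf), so n boundaries give n + 1 buckets.
    """
    return bisect_right(boundaries, value)

boundaries = [18, 35, 65]  # hypothetical age boundaries
ages = [5, 18, 34, 35, 70]
buckets = [bucketize(age, boundaries) for age in ages]
```

The resulting bucket index is a categorical value, so it is often one-hot encoded before being fed to a model.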

### Other techniques

Dimensionality reduction in embeddings
- Principal component analysis (PCA)
- t-Distributed stochastic neighbor embedding (t-SNE)
- Uniform manifold approximation and projection (UMAP)
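
Of the three techniques, PCA is the simplest to sketch directly. The example below is a bare-bones version built on NumPy's SVD, applied to synthetic 3-D "embeddings" generated for the demonstration; the `pca` function is an illustrative helper, and in practice a library implementation such as scikit-learn's `PCA` would normally be used. t-SNE and UMAP are not covered here.

```python
import numpy as np

def pca(x, n_components):
    """Project the rows of x onto the top principal components."""
    x = np.asarray(x, dtype=np.float64)
    centered = x - x.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing singular value.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    explained_variance = singular_values[:n_components] ** 2 / (len(x) - 1)
    return centered @ components.T, explained_variance

rng = np.random.default_rng(0)
# Synthetic data that varies mostly along one direction, plus small noise.
t = rng.normal(size=(100, 1))
embeddings = t @ np.array([[1.0, 2.0, 3.0]]) + 0.01 * rng.normal(size=(100, 3))

reduced, variance = pca(embeddings, n_components=2)
```

Here almost all of the variance lands in the first component, which is the sense in which predictive information can be concentrated in fewer features.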

### Key points

Feature engineering
- Prepares, tunes, transforms, extracts and constructs features

Feature engineering is key to model refinement

## Feature Crosses

---

- Combines multiple features together into a new feature
- Encodes nonlinearity in the feature space, or encodes the same information in fewer features
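
A small sketch may help show how a cross encodes nonlinearity. In the XOR-like data below (made up for the example), no single threshold on `x1` or `x2` alone separates the classes, but a threshold on their product does. The `cross` helper is hypothetical, and multiplying two numeric features is only one way to form a cross; categorical features are typically crossed by combining their values instead.

```python
def cross(x1, x2):
    """Synthetic feature: the product of two numeric features."""
    return x1 * x2

# XOR-like points: the label is 1 when x1 and x2 have the same sign.
points = [(1.0, 1.0), (-1.0, -1.0), (1.0, -1.0), (-1.0, 1.0)]
labels = [1, 1, 0, 0]

crossed = [cross(x1, x2) for x1, x2 in points]

# A single threshold on the crossed feature separates the two classes.
predictions = [1 if value > 0 else 0 for value in crossed]
```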

### Key points
- Feature crossing: synthetic feature encoding nonlinearity in feature space
- Feature coding: transforming a categorical variable into a continuous one