# Feature Engineering and Variable Transformation

## Transforming Data: Background 

Models used in Machine Learning Workflows often make assumptions about the data.

A common example is the **linear regression model**, which assumes a linear relationship between the features and the target (outcome) variable.
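
As a sketch of what that assumption means, a linear regression model can be written as below. The symbols are notation introduced here for illustration and do not appear in the original notes: $y$ is the target, $x_1, \dots, x_p$ are the features, $\beta_0, \dots, \beta_p$ are the coefficients to be estimated, and $\varepsilon$ is the residual (error) term.

$$
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + \varepsilon
$$

If the true relationship between a feature and the target is curved, a transformation of that feature (for example a log or a polynomial term) can bring the data closer to this form.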

---
### Transforming Data: Background

![image.png](attachment:image.png)

### Transformation of Data Distributions 

Linear regression models assume that the residuals are **normally distributed**.

Features and target variables are often **skewed** (the distribution is asymmetric, with a long tail on one side).

**Data transformation** can often reduce this skew.

![image-2.png](attachment:image-2.png)

    
```python
# Useful transformation functions 
from numpy import log, log1p
from scipy.stats import boxcox
```
![image-3.png](attachment:image-3.png)
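
The three functions imported above behave slightly differently, and a small example may make the differences easier to see. This is a minimal sketch on a made-up, right-skewed array (`values` is hypothetical and not part of the original notebook).

```python
import numpy as np
from scipy.stats import boxcox

# Hypothetical right-skewed sample (illustration only, not the course data)
values = np.array([0.5, 1.0, 2.0, 4.0, 8.0, 64.0])

# Natural log: only defined for strictly positive inputs
log_values = np.log(values)

# log(1 + x): also defined at x = 0, which is handy for count data
log1p_values = np.log1p(values)

# Box-Cox: a power transform; with no lambda supplied, scipy estimates
# the lambda parameter from the data and returns it alongside the result
boxcox_values, fitted_lambda = boxcox(values)

print(log_values)
print(log1p_values)
print(boxcox_values, fitted_lambda)
```

Note that `boxcox` also requires strictly positive data, so `log1p` is usually the simpler choice when zeros are present.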

---
### Log Transformation Example

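```python
import seaborn as sns

# Plot a histogram with a density (KDE) curve.
# histplot replaces distplot, which is deprecated in recent seaborn releases.
sns.histplot(data['Unemployment'], bins=20, kde=True);
```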
![image-4.png](attachment:image-4.png)

```python
import numpy as np

# Apply the natural log to every value in the column
log_data = np.log(data['Unemployment'])

# Plot the transformed distribution
sns.histplot(log_data, bins=20, kde=True);
```
![image-5.png](attachment:image-5.png)
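
Besides comparing the two plots by eye, the effect of the log transformation can be checked numerically. The sketch below uses `scipy.stats.skew` on synthetic log-normal data; the array `raw` is a stand-in generated for illustration and is not the `Unemployment` column from the notebook.

```python
import numpy as np
from scipy.stats import skew

# Synthetic right-skewed data standing in for a real column (assumption)
rng = np.random.default_rng(0)
raw = rng.lognormal(mean=1.5, sigma=0.6, size=1_000)

# Log-transform the data
logged = np.log(raw)

# Sample skewness: values near 0 indicate a roughly symmetric distribution
skew_raw = skew(raw)
skew_logged = skew(logged)

print(f"skewness before: {skew_raw:.2f}, after: {skew_logged:.2f}")
```

On data like this, the skewness after the transformation should be much closer to zero than before.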

---

### Transformations: Log Features
![image-6.png](attachment:image-6.png)

---
### Transformations: Polynomial Features

![image-7.png](attachment:image-7.png)

![image-8.png](attachment:image-8.png)

---
### Polynomial Features: Syntax
```python
# Import the class containing the transformation method
from sklearn.preprocessing import PolynomialFeatures

# Create an instance of the class (choose the polynomial degree)
polyFeat = PolynomialFeatures(degree = 2)

# Fit to the data, then generate the polynomial features
polyFeat = polyFeat.fit(X_data)
X_poly = polyFeat.transform(X_data)
```
---
### Variable Selection: Background 

**Variable selection** involves choosing the set of features to include in the model.

Variables must often be transformed before they can be included in models. In addition to log and polynomial transformations, this can involve:

- **Encoding**: Converting non-numeric features to numeric features.
- **Scaling**: Converting the scale of numeric data so that features are comparable.

The appropriate method of scaling or encoding depends on the type of feature. 

### Feature Encoding: Types of Features

**Encoding** is often applied to **categorical features**, which take non-numerical values.

There are two primary types:

- **Nominal**: Categorical variables that take values in unordered categories (e.g. Red, Blue, Green; True, False)

- **Ordinal**: Categorical variables that take values in ordered categories (e.g. High, Medium, Low)

### Feature Encoding: Approaches

There are several common approaches to encoding variables: 

- **Binary encoding**: Converts a variable to either 0 or 1. It is suitable for variables that take two possible values (e.g. True, False).

- **One-hot encoding**: Converts a variable that takes multiple values into several binary (0 or 1) variables, one for each category. This creates several new variables.

- **Ordinal encoding**: Converts ordered categories to numerical values, usually by creating one variable whose integer values follow the order of the categories (e.g. 0, 1, 2, 3, ...), with as many distinct values as there are categories.
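
The three approaches above can be sketched on a small, made-up table. The DataFrame `df` and its column names are hypothetical. This example uses `pandas.get_dummies` for one-hot encoding and scikit-learn's `OrdinalEncoder` for ordinal encoding; other tools would work equally well.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical data with one column of each kind
df = pd.DataFrame({
    "is_member": [True, False, True],        # two possible values
    "colour": ["Red", "Blue", "Green"],      # nominal (unordered)
    "priority": ["Low", "High", "Medium"],   # ordinal (ordered)
})

# Binary encoding: True/False becomes 1/0
binary = df["is_member"].astype(int)

# One-hot encoding: one 0/1 column per category
one_hot = pd.get_dummies(df["colour"], prefix="colour", dtype=int)

# Ordinal encoding: pass the category order explicitly, otherwise the
# encoder sorts the labels alphabetically (High, Low, Medium)
encoder = OrdinalEncoder(categories=[["Low", "Medium", "High"]])
ordinal = encoder.fit_transform(df[["priority"]]).ravel()

print(binary.tolist())
print(one_hot)
print(ordinal)
```

Supplying the order explicitly matters for ordinal features: the integer codes are only meaningful if they follow the real ranking of the categories.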

---
### Feature Scaling: Background

**Feature scaling** involves adjusting a variable's scale. This allows comparison of variables with different scales.

Different continuous (numeric) features often have different scales.

Why might this be an issue?

### Feature Scaling: Example

![image-9.png](attachment:image-9.png)
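
One way to see the problem is to compute distances between observations, as distance-based models do. The sketch below uses three made-up customers described by age (in years) and income (in dollars); the numbers are hypothetical and may differ from the example in the figure.

```python
import numpy as np

# Hypothetical customers: [age in years, income in dollars]
a = np.array([25.0, 50_000.0])
b = np.array([65.0, 50_500.0])   # very different age, similar income
c = np.array([26.0, 52_000.0])   # similar age, somewhat different income

# Euclidean distances on the raw, unscaled features
dist_ab = np.linalg.norm(a - b)
dist_ac = np.linalg.norm(a - c)

print(f"distance a-b: {dist_ab:.1f}")
print(f"distance a-c: {dist_ac:.1f}")
```

On the raw values, `a` comes out closer to `b` than to `c`, even though `a` and `b` are 40 years apart in age. The income column dominates the distance simply because its numbers are larger.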

### Feature Scaling: Approaches

There are many approaches to scaling features. Some of the more common ones include:

- **Standard Scaling**: Rescales features to have a mean of 0 and a standard deviation of 1 (by subtracting the mean and dividing by the standard deviation).

- **Min-Max Scaling**: Converts variables to continuous variables in the [0, 1] interval by mapping the minimum value to 0 and the maximum value to 1. _This type of scaling is sensitive to outliers._

- **Robust Scaling**: Is similar to min-max scaling, but uses the **interquartile range** (the 75th percentile value minus the 25th percentile value) in place of the full range, so the interquartile range maps to an interval of width 1. This means the variable itself takes values outside of that interval, and outliers have less influence on the scaling.
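
A short sketch comparing the three scalers on one column that contains an outlier. The array `X` is made up for illustration. Note that scikit-learn's `RobustScaler`, with its default settings, subtracts the median before dividing by the interquartile range.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler, RobustScaler

# One feature with an obvious outlier (hypothetical values)
X = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

# Mean 0, standard deviation 1
standard = StandardScaler().fit_transform(X)

# Minimum maps to 0, maximum maps to 1
min_max = MinMaxScaler().fit_transform(X)

# Median maps to 0, interquartile range maps to a width of 1
robust = RobustScaler().fit_transform(X)

print(standard.ravel())
print(min_max.ravel())
print(robust.ravel())
```

With min-max scaling, the outlier squeezes the four ordinary values into a small part of the [0, 1] interval. With robust scaling, the ordinary values stay spread out around 0 and only the outlier lands far away.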

---
### Common Variable Transformations

| Feature type                                                  | Transformation                     | Code                                                                           |
| ------------                                                  | --------------                     | ----                                                                           |
| Continuous: Numerical values                                  | Standard, Min-Max, Robust Scaling  | `from sklearn.preprocessing import StandardScaler, MinMaxScaler, RobustScaler` |
| Nominal: Categorical, unordered features (e.g. True or False) | Binary, One-hot Encoding (0,1)     | `from sklearn.preprocessing import LabelEncoder, LabelBinarizer, OneHotEncoder`, `from pandas import get_dummies` |
| Ordinal: Categorical, ordered features (e.g. movie ratings)   | Ordinal Encoding (0,1,2,3...)      | `from sklearn.preprocessing import OrdinalEncoder`, `from sklearn.feature_extraction import DictVectorizer` |