# Feature preprocessing and generation with respect to models
### - Numeric features

## SUMMARY

1. Numeric feature preprocessing is different for tree and non-tree models:
  * Tree-based models don't depend on scaling
  * Non-tree-based models hugely depend on scaling
2. Most often used preprocessings:
  * `MinMaxScaler` - `[0, 1]`
  * `StandardScaler` - `mean=0, std=1`
  * Rank - makes the spacing between sorted values equal
  * `np.log(1+x)` and `np.sqrt(1+x)`
3. Feature generation is powered by:
  * Prior knowledge
  * Exploratory data analysis


## Models which depend on feature scale or not
* kNN models depend on feature scale because they rely on a distance metric: a feature with a larger range dominates the distance.
* **Linear models also have difficulties with differently scaled features** - the impact of regularization turns out to be proportional to feature scale.
* Gradient descent methods can converge slowly or diverge without proper scaling.
* Neural nets also require feature scaling, just like linear models.
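
As a rough illustration of the kNN point above, the sketch below uses made-up numbers and a hypothetical helper `nearest_label` (neither comes from the original notes) to show how the nearest neighbour can change once both features are min-max scaled.

```python
import numpy as np

# Made-up data: feature 0 spans about 1 unit, feature 1 spans about 1000.
X_train = np.array([[1.0, 1000.0],   # label 0
                    [2.0, 0.0]])     # label 1
y_train = np.array([0, 1])
query = np.array([1.9, 600.0])

def nearest_label(X, y, q):
    """Label of the training point closest to q (1-nearest neighbour, Euclidean)."""
    distances = np.linalg.norm(X - q, axis=1)
    return int(y[np.argmin(distances)])

# Raw features: the large-scale feature 1 dominates the distance.
raw_label = nearest_label(X_train, y_train, query)

# Min-max scale each feature using statistics of the training data only.
col_min = X_train.min(axis=0)
col_range = X_train.max(axis=0) - col_min
scaled_label = nearest_label((X_train - col_min) / col_range,
                             y_train,
                             (query - col_min) / col_range)

print(raw_label, scaled_label)  # 0 1
```

On the raw data the query is matched to the point that is close in feature 1 only; after scaling, feature 0 gets a comparable say and the other point wins.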

<br>

![feature-pre-gen-1](img/feature-pre-gen-1.png)

* Preprocessing
  - Tree-based models
    * Search for the most useful split for each feature
    * Multiplying a feature by a constant and retraining does not change the model's behavior or its predictions
  - Non-tree-based models (kNN, linear, neural nets)
  
* Feature generation
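
The claim about tree-based models in the list above can be checked with a small experiment. This is only a sketch on synthetic data; the dataset, the `max_depth=4` setting, and the factor of 1000 are arbitrary assumptions, not part of the original notes.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
# Integer-valued toy features keep the arithmetic exact after rescaling.
X_train = rng.integers(0, 100, size=(200, 2)).astype(float)
y_train = X_train[:, 0] + 3 * X_train[:, 1]
X_test = rng.integers(0, 100, size=(50, 2)).astype(float)

# Multiply the first feature by a constant, leave the second one alone.
scale = np.array([1000.0, 1.0])

tree_raw = DecisionTreeRegressor(max_depth=4, random_state=0)
tree_raw.fit(X_train, y_train)

tree_scaled = DecisionTreeRegressor(max_depth=4, random_state=0)
tree_scaled.fit(X_train * scale, y_train)

pred_raw = tree_raw.predict(X_test)
pred_scaled = tree_scaled.predict(X_test * scale)

# Split thresholds move with the feature, so the predictions should match.
same_predictions = np.allclose(pred_raw, pred_scaled)
print(same_predictions)
```

The tree only cares about the ordering of values within a feature, which a positive constant factor does not change.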



## Preprocessing: scaling

`sklearn.preprocessing.MinMaxScaler`
$$X = \frac{(X - X.min())}{(X.max() - X.min())}$$

`sklearn.preprocessing.StandardScaler`
$$X = \frac{(X - X.mean())}{X.std()}$$
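
A minimal usage sketch of the two scalers, on made-up data. Fitting on the training set and reusing the fitted statistics for test data is a common convention assumed here rather than something stated in these notes.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Toy data: one feature per column (values are made up for illustration).
X_train = np.array([[1.0, 100.0],
                    [2.0, 300.0],
                    [3.0, 500.0]])
X_test = np.array([[2.5, 200.0]])

# Fit on the training data only, then reuse the fitted statistics on test data.
minmax = MinMaxScaler().fit(X_train)
X_train_mm = minmax.transform(X_train)     # each column now spans [0, 1]
X_test_mm = minmax.transform(X_test)       # [[0.75, 0.25]]

standard = StandardScaler().fit(X_train)
X_train_std = standard.transform(X_train)  # each column: mean 0, std 1
X_test_std = standard.transform(X_test)
```

Note that `StandardScaler` divides by the population standard deviation (`ddof=0`), which matters only for very small samples.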



### We can tune the scaling parameter of a feature to boost features that seem more important to us!

## Preprocessing: outliers

* To protect linear models from the massive impact of outliers, **we can clip feature values between a chosen lower and upper bound.**
  * The bounds can also be chosen as **percentiles** of the feature (e.g. the 0th and 99th percentiles)!
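
Percentile-based clipping could look like the sketch below. The synthetic data and the choice of the 1st and 99th percentiles are assumptions for illustration; the bounds are computed on training data and then reused for new data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic feature: roughly standard normal, plus one extreme outlier.
x_train = rng.normal(size=1000)
x_train[0] = 1e5

# Bounds come from the training data; 1 and 99 are an arbitrary choice.
lower, upper = np.percentile(x_train, [1, 99])

x_train_clipped = np.clip(x_train, lower, upper)

# The same bounds are reused for new data.
x_test = np.array([-50.0, 0.1, 5000.0])
x_test_clipped = np.clip(x_test, lower, upper)
```

After clipping, the outlier sits at the upper bound instead of stretching the feature range by several orders of magnitude.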
  
![](img/feature-pre-gen-2.png)

## Preprocessing: rank

Can be a better option than `MinMaxScaler` if we have outliers
* because rank transformation moves the outliers closer to the other objects.

**example**
- If we apply rank to a sorted array, it replaces each value with its index.
- If we apply rank to an unsorted array, it replaces each value with the index that value would have in the sorted array.
```
rank([-100, 0, 1e5]) == [0, 1, 2]
rank([1000, 1, 10]) == [2, 0, 1]
```

### Linear models, kNN, and neural nets can benefit from this kind of transformation if we have no time to handle outliers manually.

`scipy.stats.rankdata`
* You need to store the created mapping from feature values to their rank values so that it can be applied to test data.
* Alternatively, you can concatenate train and test data before applying rank transformation.
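
The sketch below shows both options on made-up arrays. Note that `scipy.stats.rankdata` returns 1-based ranks, unlike the 0-based `rank` in the example above. Using `np.interp` to carry the stored mapping over to test values is just one possible choice assumed here, and it expects the training values to be distinct.

```python
import numpy as np
from scipy.stats import rankdata

# rankdata returns 1-based ranks; subtract 1 to get 0-based ranks.
print(rankdata([-100, 0, 1e5]) - 1)   # [0. 1. 2.]
print(rankdata([1000, 1, 10]) - 1)    # [2. 0. 1.]

train = np.array([1000.0, 1.0, 10.0])
test = np.array([5.0, 2000.0])

# Option 1: rank train and test together, then split them again.
combined_ranks = rankdata(np.concatenate([train, test]))
train_ranks = combined_ranks[:len(train)]   # [4. 1. 3.]
test_ranks = combined_ranks[len(train):]    # [2. 5.]

# Option 2: store the value -> rank mapping from train and
# interpolate it for test values (values outside the range are clamped).
sorted_values = np.sort(train)
sorted_ranks = rankdata(sorted_values)
test_ranks_interp = np.interp(test, sorted_values, sorted_ranks)
```

With option 2, a test value between two training values receives a fractional rank, and values beyond the training range receive the lowest or highest training rank.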

## Preprocessing: others
Especially useful for neural nets - these transformations pull overly large values closer to the feature's average value. Along with this, values near zero become more distinguishable.

1. Log transform : `np.log(1 + x)`
2. Raising to the power < 1: `np.sqrt(x + 2/3)`
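
A small numeric sketch of both transforms on a made-up, non-negative feature. `np.log1p(x)` is used as the numerically stable equivalent of `np.log(1 + x)`; the two ratio variables are hypothetical names introduced only to quantify the effect.

```python
import numpy as np

# Made-up non-negative feature with a heavy right tail.
x = np.array([0.0, 0.1, 1.0, 10.0, 1000.0])

x_log = np.log1p(x)          # same as np.log(1 + x), but more stable near 0
x_sqrt = np.sqrt(x + 2 / 3)  # raising to a power < 1

# Compare the gap between the two largest values with the gap
# between the two smallest values, before and after the transform.
raw_ratio = (x[-1] - x[-2]) / (x[1] - x[0])
log_ratio = (x_log[-1] - x_log[-2]) / (x_log[1] - x_log[0])

print(raw_ratio, log_ratio)
```

The ratio drops by roughly two orders of magnitude after the log transform, which reflects large values being pulled in while differences near zero stay visible.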

### Another important point, which holds true for all preprocessings:
Sometimes it is beneficial
* to **train a model on concatenated data frames produced by different preprocessings**
* to **mix models trained on differently preprocessed data**.

kNN, linear models, and neural nets can benefit hugely from this!
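
The first option, concatenating differently preprocessed versions of the same feature, might be sketched as follows. The data is made up and plain NumPy arrays stand in for data frames; mixing separately trained models is not shown.

```python
import numpy as np
from scipy.stats import rankdata

# Made-up single feature with an outlier, shaped (n_samples, 1).
x = np.array([[1.0], [3.0], [2.0], [500.0]])

x_minmax = (x - x.min(axis=0)) / (x.max(axis=0) - x.min(axis=0))
x_log = np.log1p(x)
x_rank = rankdata(x[:, 0]).reshape(-1, 1)

# Stack the differently preprocessed versions side by side as extra columns.
X_combined = np.hstack([x_minmax, x_log, x_rank])
print(X_combined.shape)  # (4, 3)
```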

## Feature Generation

Ways to proceed:
* prior knowledge
* EDA

![feature-gen-ex1](img/feature-gen-ex1.png)

![feature-gen-ex2](img/feature-gen-ex2.png)

### It is useful to know that additions, multiplications, divisions, and other feature interactions can help not only linear models.
* Gradient boosted decision tree models are powerful, but they still experience **difficulties with approximating multiplications and divisions**.
* Adding such features explicitly can lead to a more robust model with fewer trees.
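
As a hedged illustration of a division interaction, the sketch below builds a price-per-square-metre feature. The `price` and `area` arrays are hypothetical and are not taken from the figures in these notes.

```python
import numpy as np

# Hypothetical housing data: total price and area in square metres.
price = np.array([300000.0, 450000.0, 120000.0])
area = np.array([100.0, 90.0, 60.0])

# Division interaction: price per square metre.
price_per_m2 = price / area   # [3000., 5000., 2000.]

# Append the generated feature as an extra column.
X = np.column_stack([price, area, price_per_m2])
```

A tree ensemble would need many axis-aligned splits on `price` and `area` to approximate this ratio, whereas the explicit column can be split on directly.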

![feature-gen-ex3](img/feature-gen-ex3.png)

