# Feature Preprocessing and Engineering

## Numerical Features

1. Scaling and rank transforms for numeric features

    a. Tree-based models (e.g. GBDT, XGBoost, LightGBM) do not depend on them, because their splits depend only on the order of values.
    
    b. Non-tree-based models (linear models, kNN, neural nets) depend heavily on them.
    
    
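2. Most commonly used preprocessing methods

    a. `MinMaxScaler` rescales values linearly to the range [0, 1]. Because the transform is linear, the shape of the distribution is preserved.
    
    b. `StandardScaler` shifts and rescales values to mean = 0 and std = 1. This is also a linear transform, so it does not make the distribution normal.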
    
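    c. `Rank` replaces each value with its position in sorted order, so the spacing between consecutive sorted values becomes equal. This limits the influence of outliers.
        
        Example:
        before ranking [0, 10, 18]
        after ranking  [0, 1, 2]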
        
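    d. `np.log(1 + x)` and `np.sqrt(1 + x)`
        
        Both compress large values far more than small ones, pulling outliers toward the bulk of the data.
        This makes differences among values near 0 relatively more visible; np.log() compresses more strongly than np.sqrt().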
        
    e. Clip outliers
        
        Example:
        np.clip(x, np.percentile(x, 1), np.percentile(x, 99)) caps values below the 1st and above the 99th percentile at those bounds (values are replaced, not removed)
        
3. Feature generation is powered by:

    a. Prior knowledge
    
    b. Exploratory Data Analysis (EDA)
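
The scaling and rank methods from item 2 can be sketched with plain NumPy. This is a minimal sketch, not the scikit-learn implementation: the helper names `min_max_scale`, `standard_scale` and `rank_transform` are hypothetical, each assumes a 1-D numeric input, and `min_max_scale` also assumes the values are not all equal.

```python
import numpy as np

# Hypothetical helpers; each takes a 1-D array-like of numbers.


def min_max_scale(x):
    """Linearly map values to [0, 1]; assumes x is not constant."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())


def standard_scale(x):
    """Shift to mean 0 and rescale to std 1 (population std, ddof=0)."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()


def rank_transform(x):
    """Replace each value by its 0-based position in sorted order.

    Ties are broken by order of appearance (stable sort).
    """
    order = np.argsort(np.asarray(x), kind="stable")
    ranks = np.empty_like(order)
    ranks[order] = np.arange(len(order))  # scatter positions back to original slots
    return ranks


x = np.array([0.0, 10.0, 18.0])
print(min_max_scale(x))
print(standard_scale(x))
print(rank_transform(x))  # [0 1 2]
```

In practice the min/max or mean/std are computed on the training data and then reused for the test data; `MinMaxScaler` and `StandardScaler` in `sklearn.preprocessing` do this via `fit` and `transform`. `scipy.stats.rankdata` is another way to get ranks, though it starts at 1 and averages ties by default.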
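
The log/sqrt transforms and percentile clipping from items 2d and 2e can be sketched the same way, assuming non-negative inputs (the function names are hypothetical). The sketch uses `np.log1p`, which computes `log(1 + x)` more accurately than `np.log(1 + x)` when `x` is close to 0.

```python
import numpy as np


def log_transform(x):
    # np.log1p(x) equals log(1 + x), computed accurately near 0; requires x > -1
    return np.log1p(np.asarray(x, dtype=float))


def sqrt_transform(x):
    # Square root of (1 + x); requires x >= -1
    return np.sqrt(1.0 + np.asarray(x, dtype=float))


def clip_percentiles(x, lower=1.0, upper=99.0):
    # Cap values at the given percentiles; outliers are replaced by the bounds, not dropped
    x = np.asarray(x, dtype=float)
    low, high = np.percentile(x, [lower, upper])
    return np.clip(x, low, high)


# 99 ordinary values plus one extreme outlier (made-up data)
x = np.append(np.arange(99, dtype=float), 1e6)
print(log_transform(x).max())     # the outlier shrinks to about 13.8
print(sqrt_transform(x).max())    # about 1000
print(clip_percentiles(x).max())  # capped at the 99th percentile
```

As with scaling, the percentile bounds would normally be estimated on the training data. Clipping at percentiles like this is often called winsorization.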

## Categorical Features

1. Values of `ordinal` features follow a meaningful order (e.g. ticket class: 1st, 2nd, 3rd)


2. `Label encoding` maps each category to an integer. It is mostly used with tree-based models.
    
    Example (codes assigned in order of first appearance):

    | city    | city_label_encoded |
    |---------|--------------------|
    | Boston  | 0                  |
    | Boston  | 0                  |
    | NYC     | 1                  |
    | Chicago | 2                  |
    | Boston  | 0                  |
    | NYC     | 1                  |
    | NYC     | 1                  |

3. `Frequency encoding` maps each category to its frequency (count or share of rows) in the data. It is mostly used with tree-based models.


4. `One-hot encoding` creates a new binary column for every unique value of a categorical feature. It is mostly used with non-tree-based models (linear models, kNN, neural nets).


5. Interactions of categorical features can help linear models and kNN. Put simply, concatenate two categorical columns into a single combined feature and one-hot encode it.
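
The encodings above can be sketched with pandas, reusing the cities from the label-encoding example. The second column `plan` and its values are hypothetical, added only to show an interaction feature; `pd.factorize` assigns codes in order of first appearance, which reproduces the label-encoding table.

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["Boston", "Boston", "NYC", "Chicago", "Boston", "NYC", "NYC"],
    # Hypothetical second categorical column, used only for the interaction
    "plan": ["basic", "pro", "pro", "basic", "basic", "pro", "basic"],
})

# Label encoding: integer codes in order of first appearance (Boston=0, NYC=1, Chicago=2)
codes, uniques = pd.factorize(df["city"])
df["city_label_encoded"] = codes

# Frequency encoding: share of rows that carry each category
freq = df["city"].value_counts(normalize=True)
df["city_freq_encoded"] = df["city"].map(freq)

# One-hot encoding: one 0/1 column per distinct city
one_hot = pd.get_dummies(df["city"], prefix="city", dtype=int)

# Interaction: concatenate the two columns, then one-hot encode the combination
df["city_plan"] = df["city"] + "_" + df["plan"]
interaction = pd.get_dummies(df["city_plan"], dtype=int)

print(df)
print(one_hot)
print(interaction)
```

Here Boston and NYC receive the same frequency encoding (3/7) because they appear equally often, which is a known limitation of this encoding. `sklearn.preprocessing.OneHotEncoder` is an alternative to `pd.get_dummies` that remembers the categories seen during `fit`, which helps keep train and test columns aligned.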

## Datetime and Coordinates

1. Datetime

    a. `Periodicity`: periodic components such as day of week, week of year, or month
    
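    b. `Time since an event`, either row-independent (the same reference moment for every row) or row-dependent (a different event for each row).
        
        Example:
        time since the customer's last purchase can be a very useful feature.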
    
    c. `Difference between dates`.
        
        Example:
        cancellation date - purchase date can be a very useful feature
        
2. Coordinates

    a. Distance from each point in the train and test data to places of interest.
    
        Example:
        When predicting house prices, distance to shopping malls or squares can help, since houses closer to them may be more expensive.
    
    b. Centers of clusters
        
        Example:
        Locations can form clusters, and the distance from each point to the nearest cluster center might be a useful feature.
    
    c. Aggregated statistics
        
        Example:
        Average price in neighborhood
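
The datetime features from item 1 can be sketched in pandas as follows. The purchase log, the column names and the reference date 2021-01-01 are all hypothetical; `dayofweek` counts Monday as 0.

```python
import pandas as pd

# Hypothetical purchase log; None marks an order that was never cancelled
orders = pd.DataFrame({
    "customer": ["a", "a", "b", "a"],
    "purchase_date": pd.to_datetime(["2021-01-04", "2021-01-10", "2021-01-05", "2021-02-01"]),
    "cancel_date": pd.to_datetime(["2021-01-06", None, "2021-01-20", None]),
})

# a. Periodicity: calendar components of each timestamp
orders["day_of_week"] = orders["purchase_date"].dt.dayofweek  # Monday = 0
orders["month"] = orders["purchase_date"].dt.month

# b. Row-independent event: days since a fixed (arbitrarily chosen) reference date
orders["days_since_2021"] = (orders["purchase_date"] - pd.Timestamp("2021-01-01")).dt.days

# b. Row-dependent event: days since the same customer's previous purchase
orders = orders.sort_values(["customer", "purchase_date"])
orders["days_since_prev_purchase"] = orders.groupby("customer")["purchase_date"].diff().dt.days

# c. Difference between dates: days from purchase to cancellation (NaN if never cancelled)
orders["days_to_cancel"] = (orders["cancel_date"] - orders["purchase_date"]).dt.days

print(orders.sort_index())
```

Sorting by customer and date before `diff` matters: the row-dependent feature is only meaningful if each customer's purchases are in chronological order.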
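
The coordinate features from item 2 can be sketched under some stated assumptions. The notes do not name a distance metric or a clustering method, so this sketch uses the haversine great-circle distance and a tiny hand-written k-means (Lloyd's algorithm). The houses, coordinates, prices and mall location are made up, and `haversine_km` and `kmeans_centers` are hypothetical helpers.

```python
import numpy as np
import pandas as pd


def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance in km, treating the Earth as a sphere of radius 6371 km
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * np.arcsin(np.sqrt(a))


def kmeans_centers(points, k, n_iter=50, seed=0):
    # Plain Lloyd's k-means on raw (lat, lon) pairs; adequate only for small areas
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its points; keep it in place if the cluster is empty
        centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
    return centers


# Hypothetical houses in two areas; all values are made up
houses = pd.DataFrame({
    "lat": [42.36, 42.35, 42.37, 40.71, 40.72, 40.73],
    "lon": [-71.06, -71.07, -71.05, -74.00, -74.01, -73.99],
    "neighborhood": ["A", "A", "A", "B", "B", "B"],
    "price": [500, 520, 480, 900, 950, 870],
})
mall_lat, mall_lon = 42.36, -71.06  # hypothetical point of interest

# a. Distance to a point of interest
houses["dist_to_mall_km"] = haversine_km(houses["lat"], houses["lon"], mall_lat, mall_lon)

# b. Distance to the nearest cluster center
centers = kmeans_centers(houses[["lat", "lon"]].to_numpy(), k=2)
houses["dist_to_center_km"] = np.min(
    [haversine_km(houses["lat"], houses["lon"], c_lat, c_lon) for c_lat, c_lon in centers],
    axis=0,
)

# c. Aggregated statistics: mean price per neighborhood
houses["neigh_mean_price"] = houses.groupby("neighborhood")["price"].transform("mean")

print(houses)
```

Aggregating the target itself (here, price) includes each row's own label in its feature, which can leak information; such statistics are often computed out-of-fold instead. `sklearn.cluster.KMeans`, via its `cluster_centers_` attribute, is a more robust replacement for the toy `kmeans_centers`.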
    