# Feature engineering examples

This notebook contains some examples of feature engineering using SAM.


We use the following example dataset:

In [1]:
import pandas as pd
from sam.datasets import load_rainbow_beach

data = load_rainbow_beach()
data.head()

| TIME | batttery_life | transducer_depth | turbidity | water_temperature | wave_height | wave_period |
|---|---|---|---|---|---|---|
| 2014-06-15 00:00:00 | 11.6 | 1.495 | 0.85 | 16.6 | 0.136 | 3.0 |
| 2014-06-15 01:00:00 | 11.6 | 1.42 | 0.87 | 16.3 | 0.117 | 4.0 |
| 2014-06-15 02:00:00 | 11.6 | 1.478 | 0.79 | 16.1 | 0.114 | 7.0 |
| 2014-06-15 03:00:00 | 11.6 | 1.518 | 0.76 | 15.9 | 0.111 | 3.0 |
| 2014-06-15 04:00:00 | 11.6 | 1.507 | 0.77 | 15.7 | 0.107 | 3.0 |


## Simple feature engineering for timeseries data

The class `sam.feature_engineering.SimpleFeatureEngineer` creates two kinds of common features for timeseries data: rolling features and time components.

A rolling feature can be parameterized by a tuple of the form `(column_name, method, rolling_window)`, where `column_name` is the name of the column to be used as the time series, `method` is the type of rolling feature (e.g. "mean", "lag", "max"), and `rolling_window` is the size of the rolling window.
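To make the tuple concrete, here is a minimal sketch in plain pandas of what a rolling feature with a `"1D"` window computes. It uses the first five rows of the example dataset. The variable names are illustrative, and the code only mirrors the idea; it is not SAM's internal implementation.

```python
import pandas as pd

# First five hourly readings from the example dataset
index = pd.to_datetime(
    [
        "2014-06-15 00:00:00",
        "2014-06-15 01:00:00",
        "2014-06-15 02:00:00",
        "2014-06-15 03:00:00",
        "2014-06-15 04:00:00",
    ]
)
sample = pd.DataFrame(
    {
        "water_temperature": [16.6, 16.3, 16.1, 15.9, 15.7],
        "turbidity": [0.85, 0.87, 0.79, 0.76, 0.77],
    },
    index=index,
)

# ("water_temperature", "mean", "1D"): mean over all rows in the past day.
# With a time-based window, pandas uses whatever rows are available,
# so the first rows are averaged over fewer than 24 observations.
rolling_mean_1d = sample["water_temperature"].rolling("1D").mean()

# ("turbidity", "max", "1D"): maximum over all rows in the past day
rolling_max_1d = sample["turbidity"].rolling("1D").max()

print(rolling_mean_1d)
print(rolling_max_1d)
```

Because the sample covers less than one day, each value here is effectively the running mean or maximum since the first row.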

A time component can be described either by dummy variables (one-hot encoding) or by a cyclical variable (a sin/cos pair). To parameterize a cyclical variable, a tuple of the form `(period, component_type)` is used, for example `("hour_of_day", "cyclical")`.
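The cyclical encoding can be sketched with NumPy as follows. This is an illustration of the sin/cos idea under the assumption that the hour is scaled by a period of 24 and the day of the week (Monday = 0) by a period of 7; the variable names are illustrative and the code is not taken from SAM.

```python
import numpy as np
import pandas as pd

timestamps = pd.to_datetime(
    ["2014-06-15 00:00:00", "2014-06-15 01:00:00", "2014-06-15 06:00:00"]
)

# Hour of day: map 0..23 onto a circle with period 24
hour = timestamps.hour.to_numpy()
hour_sin = np.sin(2 * np.pi * hour / 24)
hour_cos = np.cos(2 * np.pi * hour / 24)

# Day of week: pandas uses Monday=0 ... Sunday=6, so the period is 7
day = timestamps.dayofweek.to_numpy()
day_sin = np.sin(2 * np.pi * day / 7)
day_cos = np.cos(2 * np.pi * day / 7)

cyclical = pd.DataFrame(
    {
        "hour_sin": hour_sin,
        "hour_cos": hour_cos,
        "day_sin": day_sin,
        "day_cos": day_cos,
    },
    index=timestamps,
)
print(cyclical.round(6))
```

The advantage over using the raw hour is that 23:00 and 00:00 end up close together on the circle, whereas the integers 23 and 0 are far apart.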



For this example we will create the following set of features:
- Mean water_temperature of the past day
- Mean water_temperature of the past two days
- Maximum turbidity of the past day
- Maximum turbidity of the past two days
- Cyclical features (sin/cos) of hour of the day
- Cyclical features (sin/cos) of day of the week

The following code shows how SAM can be used to create a feature engineering pipeline.

In [2]:
from sam.feature_engineering import SimpleFeatureEngineer


sfe = SimpleFeatureEngineer(
    rolling_features=[
        ("water_temperature", "mean", "1D"),
        ("water_temperature", "mean", "2D"),
        ("turbidity", "max", "1D"),
        ("turbidity", "max", "2D"),
    ],
    time_features=[
        ("hour_of_day", "cyclical"),
        ("day_of_week", "cyclical"),
    ],
)

sfe.fit_transform(data).head()

| TIME | water_temperature_mean_1D | water_temperature_mean_2D | turbidity_max_1D | turbidity_max_2D | hour_of_day_cyclical_sin | hour_of_day_cyclical_cos | day_of_week_cyclical_sin | day_of_week_cyclical_cos |
|---|---|---|---|---|---|---|---|---|
| 2014-06-15 00:00:00 | 16.6 | 16.6 | 0.85 | 0.85 | 0.0 | 1.0 | -0.781831 | 0.62349 |
| 2014-06-15 01:00:00 | 16.45 | 16.45 | 0.87 | 0.87 | 0.258819 | 0.965926 | -0.781831 | 0.62349 |
| 2014-06-15 02:00:00 | 16.333333 | 16.333333 | 0.87 | 0.87 | 0.5 | 0.866025 | -0.781831 | 0.62349 |
| 2014-06-15 03:00:00 | 16.225 | 16.225 | 0.87 | 0.87 | 0.707107 | 0.707107 | -0.781831 | 0.62349 |
| 2014-06-15 04:00:00 | 16.12 | 16.12 | 0.87 | 0.87 | 0.866025 | 0.5 | -0.781831 | 0.62349 |


## Custom feature engineering function

If you want more freedom to customize your feature engineering, you can use `sam.feature_engineering.FeatureEngineer` to turn a feature engineering function into a transformer. This class provides the methods that keep the interface compatible with SAM models.


In [3]:
from sam.feature_engineering import FeatureEngineer

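def my_feature_engineering(X, y=None):
    """Keep two columns and add the squared 24-row rolling mean of water_temperature."""
    # Copy the selected columns so the input DataFrame is left untouched
    X_out = X[["water_temperature", "turbidity"]].copy()
    # With hourly data, 24 rows span one day; the first 23 rows are NaN
    X_out["my_feature"] = X_out["water_temperature"].rolling(window=24).mean().pow(2)
    return X_out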

my_fe = FeatureEngineer(my_feature_engineering)

my_fe.fit_transform(data)

| TIME | water_temperature | turbidity | my_feature |
|---|---|---|---|
| 2014-06-15 00:00:00 | 16.6 | 0.85 | NaN |
| 2014-06-15 01:00:00 | 16.3 | 0.87 | NaN |
| 2014-06-15 02:00:00 | 16.1 | 0.79 | NaN |
| 2014-06-15 03:00:00 | 15.9 | 0.76 | NaN |
| 2014-06-15 04:00:00 | 15.7 | 0.77 | NaN |
| ... | ... | ... | ... |
| 2014-07-15 16:00:00 | 18.7 | 2.48 | 342.250000 |
| 2014-07-15 17:00:00 | 18.7 | 2.48 | 341.325625 |
| 2014-07-15 18:00:00 | 18.7 | 2.48 | 340.556267 |
| 2014-07-15 19:00:00 | 18.8 | 2.80 | 340.248767 |


## Customized feature engineering class

If a single function does not fit your needs, you can write your own feature engineering class by subclassing `sam.feature_engineering.BaseFeatureEngineer`. Only the `feature_engineer_` method is required. If some parameters have to be learned from the data, implement the `fit` method as well. See the current implementation of `BaseFeatureEngineer` or `SimpleFeatureEngineer` for an example.
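The pattern can be sketched as follows. To keep the example runnable without SAM, it uses a local stand-in base class, `StandInBaseFeatureEngineer`, which is hypothetical: its `fit`/`transform` wiring and the `feature_engineer_(self, X)` signature are assumptions, not SAM's implementation. The subclass `CenteredTemperatureEngineer` and its `mean_` attribute are also made up for illustration. With SAM installed, you would subclass `sam.feature_engineering.BaseFeatureEngineer` instead and check its source for the exact contract.

```python
from abc import ABC, abstractmethod

import pandas as pd


class StandInBaseFeatureEngineer(ABC):
    """Hypothetical stand-in for sam.feature_engineering.BaseFeatureEngineer."""

    @abstractmethod
    def feature_engineer_(self, X):
        """Return a DataFrame with the engineered features."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return self.feature_engineer_(X)

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)


class CenteredTemperatureEngineer(StandInBaseFeatureEngineer):
    """Centers water_temperature on the mean learned during fit."""

    def fit(self, X, y=None):
        # Learn a parameter from the training data
        self.mean_ = X["water_temperature"].mean()
        return self

    def feature_engineer_(self, X):
        # Copy the selected column so the input DataFrame is left untouched
        X_out = X[["water_temperature"]].copy()
        X_out["water_temperature_centered"] = X_out["water_temperature"] - self.mean_
        return X_out


sample = pd.DataFrame({"water_temperature": [16.6, 16.3, 16.1, 15.9, 15.7]})
centered = CenteredTemperatureEngineer().fit_transform(sample)
print(centered)
```

When subclassing the real base class, it is probably safest to call `super().fit(X, y)` inside your own `fit`, in case the base class does bookkeeping there.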

