# Feature engineering examples

This notebook contains some examples of feature engineering using SAM.

In [35]:
import pandas as pd

data = pd.read_parquet('../data/rainbow_beach.parquet')

# Original column names: Battery Life, Transducer Depth, Turbidity, Water Temperature, Wave Height, Wave Period
# data = data.rename(columns={
#     'Battery Life': 'battery_life',
#     'Transducer Depth': 'transducer_depth',
#     'Turbidity': 'turbidity',
#     'Water Temperature': 'water_temperature',
#     'Wave Height': 'wave_height',
#     'Wave Period': 'wave_period'
# })

# data.columns.name = None

# data.to_parquet('../data/rainbow_beach.parquet')

data.head()

Unnamed: 0,TIME,battery_life,transducer_depth,turbidity,water_temperature,wave_height,wave_period
0,2014-06-15 00:00:00,11.6,1.495,0.85,16.6,0.136,3.0
1,2014-06-15 01:00:00,11.6,1.42,0.87,16.3,0.117,4.0
2,2014-06-15 02:00:00,11.6,1.478,0.79,16.1,0.114,7.0
3,2014-06-15 03:00:00,11.6,1.518,0.76,15.9,0.111,3.0
4,2014-06-15 04:00:00,11.6,1.507,0.77,15.7,0.107,3.0


## Simple feature engineering for timeseries data

The class `sam.feature_engineering.SimpleFeatureEngineer` creates common features for timeseries data: rolling features and time components.

A rolling feature can be parameterized by a tuple of the form `(column_name, method, rolling_window)`, where `column_name` is the name of the column to be used as the time series, `method` is the type of rolling feature (e.g. "mean", "lag", "max"), and `rolling_window` is the size of the rolling window.
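
To make the `(column_name, method, rolling_window)` tuple concrete, here is a minimal pandas sketch of what `("water_temperature", "mean", "1D")` and `("turbidity", "max", "1D")` plausibly correspond to. It uses plain pandas time-based rolling windows on the first five rows of the dataset. This is an illustration under that assumption, not SAM's actual implementation; the names `sample`, `temperature_mean_1d` and `turbidity_max_1d` are made up for this example.

```python
import pandas as pd

# First five rows of the dataset, copied from the output of data.head()
sample = pd.DataFrame({
    "TIME": pd.to_datetime([
        "2014-06-15 00:00:00",
        "2014-06-15 01:00:00",
        "2014-06-15 02:00:00",
        "2014-06-15 03:00:00",
        "2014-06-15 04:00:00",
    ]),
    "water_temperature": [16.6, 16.3, 16.1, 15.9, 15.7],
    "turbidity": [0.85, 0.87, 0.79, 0.76, 0.77],
})

# Time-based windows need a datetime index
indexed = sample.set_index("TIME")

# A "1D" window covers all rows in the 24 hours up to and including each timestamp
temperature_mean_1d = indexed["water_temperature"].rolling("1D").mean()
turbidity_max_1d = indexed["turbidity"].rolling("1D").max()

print(temperature_mean_1d.round(6).tolist())
print(turbidity_max_1d.tolist())
```

Because the window is defined in time rather than in rows, the first rows are averaged over the few observations available so far instead of being left empty.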

A time component can be described either by dummy variables (one-hot encoding) or by a cyclical variable (sin/cos). To parameterize a time feature, a tuple of the form `(period, component_type)` is used, for example `("hour_of_day", "cyclical")`.
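
The sin/cos encoding itself is easy to sketch with the standard library. The example below maps a value onto a circle with a given period, so that the end of a cycle (hour 23) lies next to its start (hour 0). The helper `cyclical_encode` is a hypothetical name used for illustration, not part of SAM, and the periods (24 hours, 7 days with Monday as 0) are assumptions.

```python
import math
from datetime import datetime


def cyclical_encode(value, period):
    """Map a periodic value to a (sin, cos) pair on the unit circle."""
    angle = 2 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


# Second row of the dataset: Sunday 2014-06-15 at 01:00
timestamp = datetime(2014, 6, 15, 1, 0)

hour_sin, hour_cos = cyclical_encode(timestamp.hour, 24)
# weekday() counts Monday as 0, so Sunday is 6
day_sin, day_cos = cyclical_encode(timestamp.weekday(), 7)

print(round(hour_sin, 6), round(hour_cos, 6))
print(round(day_sin, 6), round(day_cos, 6))
```

Using both sin and cos is what makes the encoding unambiguous: a single sine value would be identical for two different hours of the day.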

In [29]:
from sam.feature_engineering import SimpleFeatureEngineer


sfe = SimpleFeatureEngineer(
    rolling_features=[
        ("water_temperature", "mean", "1D"),
        ("water_temperature", "mean", "2D"),
        ("turbidity", "max", "1D"),
        ("turbidity", "max", "2D"),
    ],
    time_features=[
        ("hour_of_day", "cyclical"),
        ("day_of_week", "cyclical"),
    ],
    time_col="TIME",  # leave as None if using a time index
)

sfe.fit_transform(data)

Unnamed: 0,water_temperature_mean_1D,water_temperature_mean_2D,turbidity_max_1D,turbidity_max_2D,hour_of_day_cyclical_sin,hour_of_day_cyclical_cos,day_of_week_cyclical_sin,day_of_week_cyclical_cos
0,16.600000,16.600000,0.85,0.85,0.000000,1.000000,-0.781831,0.62349
1,16.450000,16.450000,0.87,0.87,0.258819,0.965926,-0.781831,0.62349
2,16.333333,16.333333,0.87,0.87,0.500000,0.866025,-0.781831,0.62349
3,16.225000,16.225000,0.87,0.87,0.707107,0.707107,-0.781831,0.62349
4,16.120000,16.120000,0.87,0.87,0.866025,0.500000,-0.781831,0.62349
...,...,...,...,...,...,...,...,...
661,18.604762,18.408889,4.91,4.91,-0.500000,-0.866025,0.781831,0.62349
662,18.576190,18.373333,4.91,4.91,-0.707107,-0.707107,0.781831,0.62349
663,18.538095,18.333333,4.91,4.91,-0.866025,-0.500000,0.781831,0.62349
664,18.452632,18.213953,4.91,4.91,-0.965926,0.258819,0.781831,0.62349


## Custom feature engineering function

If you want more freedom to customize your feature engineering, you can use `sam.feature_engineering.FeatureEngineer` to create your own feature engineering transformer from a feature engineering function. This class provides the methods needed to keep the interface compatible with SAM models.


In [30]:
from sam.feature_engineering import FeatureEngineer

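def my_feature_engineering(X, y=None):
    """Keep three sensor columns and add the squared 24-row rolling mean
    of the water temperature. (Don't forget to document your own function.)
    """
    # Select the columns first, then copy, so the input frame is never modified
    X_out = X[["battery_life", "water_temperature", "turbidity"]].copy()
    # The window is counted in rows (24 rows is about one day of hourly data);
    # the first 23 rows are NaN because the window is not yet full
    X_out['my_feature'] = X_out['water_temperature'].rolling(window=24).mean().pow(2)
    return X_out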

my_fe = FeatureEngineer(my_feature_engineering)

my_fe.fit_transform(data)

Unnamed: 0,battery_life,water_temperature,turbidity,my_feature
0,11.6,16.6,0.85,
1,11.6,16.3,0.87,
2,11.6,16.1,0.79,
3,11.6,15.9,0.76,
4,11.6,15.7,0.77,
...,...,...,...,...
661,11.4,18.7,2.22,345.030625
662,11.4,18.8,3.20,346.425156
663,11.4,18.7,2.48,347.045851
664,11.4,18.8,2.80,346.890625


## Customized feature engineering class

If a single function does not fit your needs, you can create your own feature engineering class. By creating a subclass of `sam.feature_engineering.BaseFeatureEngineer`, you can implement your own feature engineering function. You only need to implement the `feature_engineer_` method. If you want to fit certain parameters, you can implement the `fit` method as well. Check the current implementation of `BaseFeatureEngineer` or `SimpleFeatureEngineer` for an example.
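
As a rough sketch of this pattern, the example below learns a parameter in `fit` and uses it in `feature_engineer_`. To stay runnable without SAM installed, it subclasses a hypothetical stand-in, `MinimalBaseFeatureEngineer`, which only mimics the interface described above. With SAM you would subclass `sam.feature_engineering.BaseFeatureEngineer` instead, and its real `fit` and `transform` may do more (for example input validation). `TemperatureAnomalyEngineer` is likewise an invented example class.

```python
from abc import ABC, abstractmethod

import pandas as pd


class MinimalBaseFeatureEngineer(ABC):
    """Hypothetical stand-in for the base class, NOT the real SAM class."""

    @abstractmethod
    def feature_engineer_(self, X: pd.DataFrame) -> pd.DataFrame:
        """Return the engineered features for X."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return self.feature_engineer_(X)

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)


class TemperatureAnomalyEngineer(MinimalBaseFeatureEngineer):
    """Adds the deviation of a column from its mean in the training data."""

    def __init__(self, column="water_temperature"):
        self.column = column

    def fit(self, X, y=None):
        # Fitted parameter: the column mean seen during training
        self.mean_ = X[self.column].mean()
        return self

    def feature_engineer_(self, X):
        X_out = X[[self.column]].copy()
        X_out[f"{self.column}_anomaly"] = X_out[self.column] - self.mean_
        return X_out


train = pd.DataFrame({"water_temperature": [16.6, 16.3, 16.1, 15.9, 15.7]})
features = TemperatureAnomalyEngineer().fit_transform(train)
print(features)
```

Storing the mean in `fit` means new data passed to `transform` is compared against the training mean rather than against its own mean.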

