# Feature Engineering

## Libraries

In [1]:
import numpy as np
import pandas as pd
from cnr_methods import get_simplified_data 

# Feature Engineering Library for Time Series
from tsfresh import extract_features
from tsfresh.utilities.dataframe_functions import make_forecasting_frame
from tsfresh.utilities.dataframe_functions import impute

from sklearn.ensemble import RandomForestRegressor
# Feature Selection Library
from boruta import BorutaPy

## Read Data

In [7]:
full_data = get_simplified_data()

TypeError: unsupported operand type(s) for -: 'int' and 'str'

In [6]:
full_data

NameError: name 'full_data' is not defined

In [None]:
full_data.head()

To keep things simple, we generate features for a single Wind Farm here. For modelling, the features (like the models themselves) will be generated separately for each Wind Farm.

In [None]:
WF = 'WF1'
data = full_data[full_data['WF']==WF]
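
The per-farm workflow mentioned above could be sketched with a `groupby`. This is only an illustration: the toy frame and the `build_features` helper are hypothetical stand-ins and are not part of `cnr_methods`. Only the `WF` and `T` column names come from the notebook.

```python
import pandas as pd

# Toy stand-in for the simplified data; the values are made up for illustration.
toy = pd.DataFrame({
    "WF": ["WF1", "WF1", "WF2", "WF2"],
    "T": [280.1, 281.3, 279.5, 278.9],
})

def build_features(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical placeholder for the per-farm feature pipeline."""
    out = frame.copy()
    # The difference is computed within one farm only, never across farms.
    out["T_diff"] = out["T"].diff()
    return out

# One feature table per Wind Farm, keyed by the farm identifier.
features_by_wf = {wf: build_features(group) for wf, group in toy.groupby("WF")}
```

Keeping one table per farm prevents values from one Wind Farm leaking into the lagged features of another.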

## Tsfresh

First, we use tsfresh, a Python library that automates feature engineering for time series data. We generate new features for each column of the simplified data, as done below.

In [None]:
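# Collect the per-variable feature tables and concatenate once at the end
# (DataFrame.append was removed in pandas 2.0).
feature_frames = []
# `data` already holds a single Wind Farm, so no loop over farms is needed here.
for variable in ['T', 'CLCT', 'Wind Speed 100m', 'Wind Direction 100m', 'Wind Speed 10m', 'Wind Direction 10m']:
    # Rolling windows of up to 20 past values; y is the value to forecast
    df_shift, y = make_forecasting_frame(data[variable], kind=variable, max_timeshift=20, rolling_direction=1)
    X = extract_features(df_shift, column_id="id", column_sort="time", column_value="value", impute_function=impute, show_warnings=False, n_jobs=3)
    X['Feature'] = variable
    feature_frames.append(X)
feature_data = pd.concat(feature_frames)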
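
To give an intuition for the rolling windows used above, here is a simplified pandas-only sketch: for each time step it summarises up to `max_timeshift` preceding values and pairs them with the current value as the target. It is not how tsfresh is implemented, and tsfresh computes far more features than these three. The `rolling_summary` helper and the toy series are assumptions made for illustration.

```python
import pandas as pd

def rolling_summary(series: pd.Series, max_timeshift: int = 20) -> pd.DataFrame:
    """Summarise the values that precede each time step (illustrative only)."""
    # shift(1) excludes the current value, so a window never sees its own target.
    past = series.shift(1).rolling(window=max_timeshift, min_periods=1)
    features = pd.DataFrame({
        "mean": past.mean(),
        "min": past.min(),
        "max": past.max(),
    })
    features["target"] = series
    # The first time step has no history, so it is dropped.
    return features.iloc[1:]

# Made-up wind speeds, just to exercise the helper.
wind = pd.Series([4.0, 6.0, 5.0, 7.0], name="Wind Speed 100m")
summary = rolling_summary(wind, max_timeshift=2)
```

With `max_timeshift=2`, the last row of `summary` is built from the two preceding values (6.0 and 5.0) and has 7.0 as its target.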

## Feature Selection

Here we do the feature selection using BorutaPy, a Python implementation of the Boruta method, which was originally released as an R package. As the underlying estimator we use a Random Forest Regressor.

In [None]:
# The target is continuous, so use the regressor imported above (class_weight only applies to classifiers)
rf = RandomForestRegressor(n_jobs=-1, max_depth=5)

In [None]:
feat_selector = BorutaPy(rf, n_estimators='auto', verbose=2, random_state=1)

In [None]:
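# BorutaPy expects numeric NumPy arrays, so drop the string 'Feature' label column first
feat_selector.fit(X.drop(columns='Feature').values, y.values)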
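
The idea behind Boruta can be sketched as a single round of comparing real features against shuffled "shadow" copies. The snippet below is a minimal illustration on synthetic data, not BorutaPy's implementation: the full method repeats this over many rounds and applies a statistical test to the accumulated hits. All names and data here are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic data: column 0 drives the target, column 1 is pure noise.
X_toy = rng.normal(size=(300, 2))
y_toy = 3.0 * X_toy[:, 0] + rng.normal(scale=0.1, size=300)

# Shadow features: each real column shuffled independently, which breaks any
# relationship with the target while keeping the column's distribution.
X_shadow = np.column_stack(
    [rng.permutation(X_toy[:, j]) for j in range(X_toy.shape[1])]
)

forest = RandomForestRegressor(n_estimators=100, max_depth=5, random_state=0)
forest.fit(np.hstack([X_toy, X_shadow]), y_toy)

n_real = X_toy.shape[1]
real_importance = forest.feature_importances_[:n_real]
shadow_max = forest.feature_importances_[n_real:].max()

# A real feature scores a "hit" in this round if it beats the best shadow.
hits = real_importance > shadow_max
```

A feature that cannot outperform shuffled noise repeatedly is a candidate for rejection, which is why the comparison is made against the best shadow rather than a fixed threshold.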

In [None]:
feat_selector.ranking_
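
The ranking is easier to read next to the feature names. In BorutaPy, rank 1 marks confirmed features, rank 2 tentative ones, and higher ranks rejected ones. The sketch below uses made-up names and ranks purely for illustration; in the notebook they would come from the columns passed to `fit` and from `feat_selector.ranking_`.

```python
import pandas as pd

# Hypothetical feature names and ranks, made up for illustration.
feature_names = ["value__mean", "value__maximum", "value__skewness", "value__kurtosis"]
ranking = [1, 3, 1, 2]

# Pair each feature with its rank and list the best-ranked features first.
ranking_table = (
    pd.DataFrame({"feature": feature_names, "rank": ranking})
    .sort_values("rank", kind="stable")
    .reset_index(drop=True)
)

confirmed = ranking_table.loc[ranking_table["rank"] == 1, "feature"].tolist()
tentative = ranking_table.loc[ranking_table["rank"] == 2, "feature"].tolist()
```

Here `confirmed` holds the features one would keep for modelling, while `tentative` holds those that may deserve a second look.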