# Create a Pipeline
We provide three ways of creating a pipeline.
* Functional API
* Imperative API
* Constructor API

In the following, we briefly describe all three APIs.

In [1]:
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_regression

# Import the pipeline and related utilities from pyWATTS
from pywatts.callbacks import LinePlotCallback
from pywatts_pipeline.core.util.computation_mode import ComputationMode
from pywatts_pipeline.core.pipeline import Pipeline
# Import all modules required for the pipeline
from pywatts.modules import CalendarExtraction, CalendarFeature, ClockShift, LinearInterpolater, SKLearnWrapper, Sampler
from pywatts.modules.preprocessing.select import Select
from pywatts.summaries import RMSE



## Functional API

The functional API provides an easy way to create pipelines. However, it requires that the transformers/modules used implement the `__call__` method, which is the case for pyWATTS transformers.
The API is inspired by the functional API of Keras. In general, the notation is as follows:

```Transformer()(x=predecessor, y=predecessor, ...)```
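
To make the notation more concrete, the following is a minimal sketch of how a `__call__`-based API can record steps in a pipeline. All names here (`ToyPipeline`, `ToyStepInfo`, `ToyTransformer`) are hypothetical and only illustrate the idea; this is not the pyWATTS implementation.

```python
class ToyStepInfo:
    """Hypothetical handle that links a step name to its pipeline."""

    def __init__(self, pipeline, name):
        self.pipeline = pipeline
        self.name = name


class ToyPipeline:
    """Hypothetical pipeline that only records the steps added to it."""

    def __init__(self):
        self.steps = []  # entries are (step name, transformer, input mapping)

    def __getitem__(self, column):
        # Indexing the pipeline yields a handle to an input column.
        return ToyStepInfo(self, column)


class ToyTransformer:
    """Hypothetical transformer that registers itself when it is called."""

    def __init__(self, name):
        self.name = name

    def __call__(self, **inputs):
        # The pipeline is taken from the first predecessor handle.
        pipeline = next(iter(inputs.values())).pipeline
        step_name = f"{self.name}_{len(pipeline.steps) + 1}"
        mapping = {key: handle.name for key, handle in inputs.items()}
        pipeline.steps.append((step_name, self, mapping))
        # Returning a handle lets later transformers use this step as input.
        return ToyStepInfo(pipeline, step_name)


toy = ToyPipeline()
imputed = ToyTransformer("imputer")(x=toy["load"])
scaled = ToyTransformer("scaler")(x=imputed)
step_names = [name for name, _, _ in toy.steps]
print(step_names)  # ['imputer_1', 'scaler_2']
```

Each call returns a handle, so the result of one transformer can be passed directly as a keyword argument to the next one.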

In the following, we show how a simple pipeline can be created with the functional API.

In [2]:
functional_api_pipeline = Pipeline(path="../results")

# Extract dummy calendar features, using holidays from Germany
# NOTE: CalendarExtraction can't return multiple features.
calendar = CalendarExtraction(continent="Europe",
                              country="Germany",
                              features=[CalendarFeature.month, CalendarFeature.weekday,
                                        CalendarFeature.weekend]
                              )(x=functional_api_pipeline["load_power_statistics"])

# Impute missing values with the LinearInterpolater (here with method="nearest")
imputer_power_statistics = LinearInterpolater(method="nearest", dim="time",
                                              name="imputer_power"
                                              )(x=functional_api_pipeline["load_power_statistics"])

# Scale the data using a standard SKLearn scaler
power_scaler = SKLearnWrapper(module=StandardScaler(), name="scaler_power")
scale_power_statistics = power_scaler(x=imputer_power_statistics)

# Create lagged time series to later be used as regressors
lag_features = Select(start=-2, stop=0, step=1)(x=scale_power_statistics)

scaler_target = SKLearnWrapper(module=StandardScaler(), name="scaler_power")
scaled_target = scaler_target(x=imputer_power_statistics)
target_multiple_output = Select(start=0, stop=24, step=1, name="sampled_data")(x=scaled_target)

# Select features based on F-statistic
selected_features = SKLearnWrapper(
    module=SelectKBest(score_func=f_regression, k=2), name="kbest"
)(
    lag_features=lag_features,
    calendar=calendar,
    target=scale_power_statistics,
)

# Create a linear regression that uses the lagged values to predict the current value
# NOTE: SKLearnWrapper has to collect all **kwargs itself and fit it against target.
#       It is also possible to implement a join/collect class
regressor_power_statistics = SKLearnWrapper(
    module=LinearRegression(fit_intercept=True)
)(
    features=selected_features,
    target=target_multiple_output,
    callbacks=[LinePlotCallback("linear_regression")],
)

# Rescale the predictions to be on the original scale
inverse_power_scale = scaler_target(
    x=regressor_power_statistics, computation_mode=ComputationMode.Transform,
    method="inverse_transform", callbacks=[LinePlotCallback("rescale")]
)

# Calculate the root mean squared error (RMSE) between the predictions of the
# linear regression and the true values, and save it as a CSV file
rmse = RMSE()(y_hat=inverse_power_scale, y=target_multiple_output)
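
The two `Select` steps in the cell above build the lagged inputs and the multi-step target. As a rough illustration, the sketch below assumes that `start` and `stop` describe a half-open range of offsets relative to each time step, which is consistent with `stop=24` producing 24 output columns. The helper `window` is hypothetical, and it simply drops time steps whose window would reach outside the series; how pyWATTS treats these edges is not shown here.

```python
def window(series, start, stop):
    """Collect series[t + start], ..., series[t + stop - 1] for every valid index t.

    Assumption: offsets form the half-open range [start, stop).
    """
    first = max(0, -start)                   # earliest t with a complete window
    last = len(series) - max(0, stop - 1)    # exclusive upper bound for t
    return {
        t: [series[t + offset] for offset in range(start, stop)]
        for t in range(first, last)
    }


series = [10, 11, 12, 13, 14]

# Two lagged values per time step, as with start=-2, stop=0
lags = window(series, -2, 0)
print(lags)     # {2: [10, 11], 3: [11, 12], 4: [12, 13]}

# Current value plus the next two values, analogous to start=0, stop=24
targets = window(series, 0, 3)
print(targets)  # {0: [10, 11, 12], 1: [11, 12, 13], 2: [12, 13, 14]}
```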


In [3]:
functional_api_pipeline.steps

[(<pywatts_pipeline.core.steps.start_step.StartStep at 0x1eb3e6cbee0>,
  'load_power_statistics'),
 (<pywatts_pipeline.core.steps.step.Step at 0x1eb3e77cb80>,
  'CalendarExtraction_1'),
 (<pywatts_pipeline.core.steps.step.Step at 0x1eb02b1b400>, 'imputer_power_2'),
 (<pywatts_pipeline.core.steps.step.Step at 0x1eb3e7ba0a0>, 'scaler_power_3'),
 (<pywatts_pipeline.core.steps.step.Step at 0x1eb3e7ba6d0>, 'SampleModule_4'),
 (<pywatts_pipeline.core.steps.step.Step at 0x1eb3e7baa90>, 'scaler_power_5'),
 (<pywatts_pipeline.core.steps.step.Step at 0x1eb3e7baa60>, 'sampled_data_6'),
 (<pywatts_pipeline.core.steps.step.Step at 0x1eb3e7c0880>, 'kbest_7'),
 (<pywatts_pipeline.core.steps.step.Step at 0x1eb3e7c02e0>,
  'LinearRegression_8'),
 (<pywatts_pipeline.core.steps.step.Step at 0x1eb3e7c6730>,
  'scaler_power_5_9'),
 (<pywatts_pipeline.core.steps.summary_step.SummaryStep at 0x1eb3e7c6520>,
  'RMSE_10')]
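
The last entry, `RMSE_10`, is the summary step. Assuming it follows the standard definition of the root mean squared error (the pyWATTS implementation is not shown here), the computation can be sketched in plain Python. The helper `rmse` below is hypothetical and is not the pyWATTS class.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equally long sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Errors are 0, 0 and 2, so the mean squared error is 4 / 3.
error = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
print(round(error, 4))  # 1.1547
```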

## Imperative API

The imperative API is an alternative API for pyWATTS pipelines. It can be used if the transformers do not implement the `__call__` method.
The general notation is as follows:

```pipeline.add(Transformer(), "name", {"x": "predecessor_name", ...}, **kwargs)```
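
As a rough illustration of the imperative style, the sketch below uses a hypothetical `ToyPipeline` whose `add` method takes a transformer, a step name, a mapping from argument names to predecessor names, and optional keyword arguments. Transformers are plain callables here; none of this is the pyWATTS implementation.

```python
class ToyPipeline:
    """Hypothetical pipeline that stores steps and runs them in insertion order."""

    def __init__(self):
        self.steps = []  # entries are (transformer, name, inputs, kwargs)

    def add(self, transformer, name, inputs, **kwargs):
        if any(name == existing for _, existing, _, _ in self.steps):
            raise ValueError(f"step name {name!r} is already used")
        self.steps.append((transformer, name, inputs, kwargs))

    def run(self, data):
        # Start with the raw input columns, then add each step's result by name.
        results = dict(data)
        for transformer, name, inputs, kwargs in self.steps:
            arguments = {key: results[source] for key, source in inputs.items()}
            results[name] = transformer(**arguments, **kwargs)
        return results


pipe = ToyPipeline()
pipe.add(lambda x: [v * 2 for v in x], "doubled", {"x": "load"})
pipe.add(lambda x, offset: [v + offset for v in x], "shifted",
         {"x": "doubled"}, offset=1)

out = pipe.run({"load": [1, 2, 3]})
print(out["shifted"])  # [3, 5, 7]
```

Because predecessors are referenced by name, the transformers themselves do not need to know anything about the pipeline.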

In the following, we implement the same pipeline as above with the imperative API.

In [4]:
imperative_api_pipeline = Pipeline()

imperative_api_pipeline.add(
    CalendarExtraction(continent="Europe", country="Germany",
                       features=[CalendarFeature.month, CalendarFeature.weekday,
                                 CalendarFeature.weekend]),
    "calendar",
    {"x": "load_power_statistics"}
)

imperative_api_pipeline.add(
    LinearInterpolater(method="nearest", dim="time", name="imputer_power"),
    "imputer",
    {"x": "load_power_statistics"}
)
power_scaler = SKLearnWrapper(module=StandardScaler(), name="scaler_power")

imperative_api_pipeline.add(
    power_scaler,
    "scaler",
    {"x": "imputer"}
)

imperative_api_pipeline.add(
    Select(start=-2, stop=0, step=1),
    "lag_features",
    {"x": "scaler"}
)

imperative_api_pipeline.add(
    Select(start=0, stop=24, step=1),
    "target",
    {"x": "scaler"}
)

imperative_api_pipeline.add(
    SKLearnWrapper(module=SelectKBest(score_func=f_regression, k=2), name="kbest"),
    "selected_features",
    {"lag_features": "lag_features",
     "calendar": "calendar",
     "target": "scaler"}
)

imperative_api_pipeline.add(
    SKLearnWrapper(module=LinearRegression(fit_intercept=True)),
    "regression",
    {"selected_features": "selected_features",
     "target": "target"}
)

imperative_api_pipeline.add(
    power_scaler,
    "inverse_scaler",
    {"x": "regression"},
    method="inverse_transform",
    callbacks=[LinePlotCallback("rescale")],
    computation_mode=ComputationMode.Transform
)

imperative_api_pipeline.add(
    RMSE(),
    "rmse",
    {"y_hat": "inverse_scaler",
     "y": "target"},
)


<pywatts_pipeline.core.steps.step_information.StepInformation at 0x1eb3e7c0e20>

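## Constructor API

With the constructor API, all steps are passed to the `Pipeline` constructor as a list of tuples. Each tuple contains the transformer, the step name, the mapping of inputs to predecessor names, and a dictionary of additional keyword arguments. The following cell implements the same pipeline as above.

In [5]: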
pipeline = Pipeline(
    steps=[
        (CalendarExtraction(continent="Europe", country="Germany",
                            features=[CalendarFeature.month, CalendarFeature.weekday,
                                      CalendarFeature.weekend]),
         "calendar",
         {"x": "load_power_statistics"}, {}),
        (LinearInterpolater(method="nearest", dim="time", name="imputer_power"),
         "imputer",
         {"x": "load_power_statistics"}, {}),
        (power_scaler,
         "scaler",
         {"x": "imputer"}, {}),
        (Select(start=-2, stop=0, step=1),
         "lag_features",
         {"x": "scaler"}, {}),
        (Select(start=0, stop=24, step=1),
         "target",
         {"x": "scaler"}, {}),
        (SKLearnWrapper(module=SelectKBest(score_func=f_regression, k=2), name="kbest"),
         "selected_features",
         {"lag_features": "lag_features",
          "calendar": "calendar",
          "target": "scaler"}, {}),
        (SKLearnWrapper(module=LinearRegression(fit_intercept=True)),
         "regression",
         {"selected_features": "selected_features",
          "target": "target"}, {}),
        (power_scaler,
         "inverse_scaler",
         {"x": "regression"},
         {"method": "inverse_transform",
          "callbacks": [LinePlotCallback("rescale")],
          "computation_mode": ComputationMode.Transform}),
        (RMSE(),
         "rmse",
         {"y_hat": "inverse_scaler",
          "y": "target"}, {})
    ]
)
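
The cell above passes every step to the `Pipeline` constructor as a four-element tuple. One way to think about this form is as a shorthand for repeated `add` calls. The sketch below shows that equivalence with a hypothetical `ToyPipeline`; whether pyWATTS implements its constructor this way is an assumption.

```python
class ToyPipeline:
    """Hypothetical pipeline that only records its steps."""

    def __init__(self, steps=None):
        self.steps = []
        # Each entry is (transformer, name, input mapping, extra keyword arguments).
        for transformer, name, inputs, kwargs in steps or []:
            self.add(transformer, name, inputs, **kwargs)

    def add(self, transformer, name, inputs, **kwargs):
        self.steps.append((transformer, name, inputs, kwargs))


def double(x):
    return [v * 2 for v in x]


def shift(x, offset=0):
    return [v + offset for v in x]


via_constructor = ToyPipeline(steps=[
    (double, "doubled", {"x": "load"}, {}),
    (shift, "shifted", {"x": "doubled"}, {"offset": 1}),
])

via_add = ToyPipeline()
via_add.add(double, "doubled", {"x": "load"})
via_add.add(shift, "shifted", {"x": "doubled"}, offset=1)

same = via_constructor.steps == via_add.steps
print(same)  # True
```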


In [6]:

# The pipelines are now complete, so we can run them and explore the results.
# First, load the data and split it into a training and a test set
data = pd.read_csv("../data/getting_started_data.csv",
                   index_col="time",
                   parse_dates=["time"],
                   infer_datetime_format=True,
                   sep=",")
train = data.iloc[:6000, :]
test = data.iloc[6000:, :]


In [7]:
pipeline.train(data=train)
pipeline.test(data=test)

  y = column_or_1d(y, warn=True)


({'inverse_scaler': <xarray.DataArray (time: 2758, dim_0: 24)>
  array([[43784.37968947, 44585.98767846, 46033.62055118, ...,
          49929.25950573, 48459.51745291, 47531.47191973],
         [43863.43546587, 45095.85801789, 46830.32152086, ...,
          48623.11376573, 47609.78597417, 47286.44651005],
         [44481.44016144, 46053.34130171, 47980.73351623, ...,
          47835.39356861, 47265.71193995, 47493.23873665],
         ...,
         [48946.71547956, 47752.69557672, 47542.44980835, ...,
          58382.33128245, 55634.01406664, 52644.64087873],
         [48881.27605756, 48852.27352017, 49434.30822163, ...,
          54728.87453212, 53172.59737633, 51781.11801085],
         [49325.56967491, 49982.3026384 , 51001.92720637, ...,
          52788.26133735, 52006.60213568, 51620.25933718]])
  Coordinates:
    * time     (time) datetime64[ns] 2018-09-08T02:00:00 ... 2018-12-31T23:00:00
    * dim_0    (dim_0) int32 0 1 2 3 4 5 6 7 8 9 ... 14 15 16 17 18 19 20 21 22 23},
 '# Summa

In [8]:
functional_api_pipeline.train(data=train)
functional_api_pipeline.test(data=test)

  y = column_or_1d(y, warn=True)


({'scaler_power_5_9': <xarray.DataArray (time: 2758, dim_0: 24)>
  array([[43784.37968947, 44585.98767846, 46033.62055118, ...,
          49929.25950573, 48459.51745291, 47531.47191973],
         [43863.43546587, 45095.85801789, 46830.32152086, ...,
          48623.11376573, 47609.78597417, 47286.44651005],
         [44481.44016144, 46053.34130171, 47980.73351623, ...,
          47835.39356861, 47265.71193995, 47493.23873665],
         ...,
         [48946.71547956, 47752.69557672, 47542.44980835, ...,
          58382.33128245, 55634.01406664, 52644.64087873],
         [48881.27605756, 48852.27352017, 49434.30822163, ...,
          54728.87453212, 53172.59737633, 51781.11801085],
         [49325.56967491, 49982.3026384 , 51001.92720637, ...,
          52788.26133735, 52006.60213568, 51620.25933718]])
  Coordinates:
    * time     (time) datetime64[ns] 2018-09-08T02:00:00 ... 2018-12-31T23:00:00
    * dim_0    (dim_0) int32 0 1 2 3 4 5 6 7 8 9 ... 14 15 16 17 18 19 20 21 22 23},
 '# Sum

In [9]:
imperative_api_pipeline.train(data=train)
imperative_api_pipeline.test(data=test)

  y = column_or_1d(y, warn=True)


({'inverse_scaler': <xarray.DataArray (time: 2758, dim_0: 24)>
  array([[43784.37968947, 44585.98767846, 46033.62055118, ...,
          49929.25950573, 48459.51745291, 47531.47191973],
         [43863.43546587, 45095.85801789, 46830.32152086, ...,
          48623.11376573, 47609.78597417, 47286.44651005],
         [44481.44016144, 46053.34130171, 47980.73351623, ...,
          47835.39356861, 47265.71193995, 47493.23873665],
         ...,
         [48946.71547956, 47752.69557672, 47542.44980835, ...,
          58382.33128245, 55634.01406664, 52644.64087873],
         [48881.27605756, 48852.27352017, 49434.30822163, ...,
          54728.87453212, 53172.59737633, 51781.11801085],
         [49325.56967491, 49982.3026384 , 51001.92720637, ...,
          52788.26133735, 52006.60213568, 51620.25933718]])
  Coordinates:
    * time     (time) datetime64[ns] 2018-09-08T02:00:00 ... 2018-12-31T23:00:00
    * dim_0    (dim_0) int32 0 1 2 3 4 5 6 7 8 9 ... 14 15 16 17 18 19 20 21 22 23},
 '# Summa