## Running a Gordo workflow locally

This notebook demonstrates the basic Gordo workflow, run locally.

---

### Import and initialize a Gordo dataset
In this case we use the `DataLakeProvider`; an `InfluxDataProvider` is also available.

In [1]:
import dateutil.parser
import yaml

from datetime import datetime

from gordo_components.dataset.datasets import TimeSeriesDataset
from gordo_components.data_provider.providers import DataLakeProvider
from gordo_components import serializer

data_provider = DataLakeProvider(storename="dataplatformdlsdev", interactive=True)
dataset = TimeSeriesDataset(from_ts=dateutil.parser.isoparse('2018-01-01T00:10:00+00:00'),
    to_ts=dateutil.parser.isoparse('2018-12-10T00:00:00+00:00'),
    tag_list=[
       "GRA-FIC -13-0041X.PV"
    ],
    asset="2000-emj",
    data_provider=data_provider)

[2019-11-26 14:51:04,254] INFO [gordo_components.dataset.sensor_tag.normalize_sensor_tags:140] Normalizing list of sensors in some format into SensorTags: ['GRA-FIC -13-0041X.PV']
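
The `from_ts` and `to_ts` arguments above are timezone-aware datetimes parsed from ISO 8601 strings. As a minimal sketch (standard library only, assuming Python 3.7+ where `datetime.fromisoformat` accepts a `+00:00` offset), the same values can be built and sanity-checked before the dataset is constructed. The variable names are illustrative and not part of Gordo:

```python
from datetime import datetime, timedelta

# Same timestamps as in the cell above, parsed with the standard library.
from_ts = datetime.fromisoformat("2018-01-01T00:10:00+00:00")
to_ts = datetime.fromisoformat("2018-12-10T00:00:00+00:00")

# Both values carry an explicit UTC offset, so they are timezone-aware.
assert from_ts.tzinfo is not None and to_ts.tzinfo is not None
assert from_ts.utcoffset() == timedelta(0)

# The requested window must be non-empty.
assert from_ts < to_ts
window = to_ts - from_ts
print(window)
```

Checking the window up front is cheap compared with a round trip to the data lake for an empty or inverted range.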



### We'll need to log in to Azure to authenticate before loading data from the Data Lake

In [2]:
X, y = dataset.get_data()

[2019-11-26 14:51:04,266] INFO [gordo_components.data_provider.azure_utils.get_datalake_token:34] Attempting to use interactive azure authentication
[2019-11-26 14:51:04,267] DEBUG [adal-python.debug:121] 1b0c9d9c-202f-45a1-92e9-367c6ac44b7c - Authority:Performing instance discovery: ...
[2019-11-26 14:51:04,268] DEBUG [adal-python.debug:121] 1b0c9d9c-202f-45a1-92e9-367c6ac44b7c - Authority:Performing static instance discovery
[2019-11-26 14:51:04,268] DEBUG [adal-python.debug:121] 1b0c9d9c-202f-45a1-92e9-367c6ac44b7c - Authority:Authority validated via static instance discovery
[2019-11-26 14:51:04,269] INFO [adal-python.info:114] 1b0c9d9c-202f-45a1-92e9-367c6ac44b7c - CodeRequest:Getting user code info.
[2019-11-26 14:51:04,274] DEBUG [urllib3.connectionpool._new_conn:959] Starting new HTTPS connection (1): login.microsoftonline.com:443
[2019-11-26 14:51:04,800] DEBUG [urllib3.connectionpool._make_request:437] https://login.microsoftonline.com:443 "POST /common/oauth2/devicecode?api-

To sign in, use a web browser to open the page https://microsoft.com/devicelogin and enter the code DAVM5FGB9 to authenticate.


[2019-11-26 14:51:05,252] DEBUG [urllib3.connectionpool._make_request:437] https://login.microsoftonline.com:443 "POST /common/oauth2/token HTTP/1.1" 400 469
[2019-11-26 14:51:10,259] DEBUG [urllib3.connectionpool._new_conn:959] Starting new HTTPS connection (1): login.microsoftonline.com:443
[2019-11-26 14:51:10,701] DEBUG [urllib3.connectionpool._make_request:437] https://login.microsoftonline.com:443 "POST /common/oauth2/token HTTP/1.1" 400 469
[2019-11-26 14:51:15,713] DEBUG [urllib3.connectionpool._new_conn:959] Starting new HTTPS connection (1): login.microsoftonline.com:443
[2019-11-26 14:51:16,370] DEBUG [urllib3.connectionpool._make_request:437] https://login.microsoftonline.com:443 "POST /common/oauth2/token HTTP/1.1" 200 6959
[2019-11-26 14:51:16,380] DEBUG [adal-python.debug:121] 1b0c9d9c-202f-45a1-92e9-367c6ac44b7c - OAuth2Client:Get token with device code Server returned this correlation_id: 1b0c9d9c-202f-45a1-92e9-367c6ac44b7c
[2019-11-26 14:51:16,385] DEBUG [adal-python
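
The log above shows the pattern of an interactive device-code login: the token endpoint answers `400` roughly every five seconds until the code has been entered in the browser, then answers `200`. The authentication library does this internally; the sketch below only illustrates that polling pattern. `poll_for_token` and `request_token` are hypothetical names, and the interval and timeout values are assumptions:

```python
import time


def poll_for_token(request_token, interval=5.0, timeout=900.0, sleep=time.sleep):
    """Call `request_token()` until it succeeds or `timeout` seconds have been waited.

    `request_token` is a hypothetical callable returning (status_code, body).
    """
    waited = 0.0
    while waited <= timeout:
        status, body = request_token()
        if status == 200:
            return body  # the user finished signing in
        if status != 400:
            raise RuntimeError(f"unexpected status {status}")
        # 400 here means the login is still pending: wait and try again.
        sleep(interval)
        waited += interval
    raise TimeoutError("the device login was not completed in time")


# Simulate the sequence seen in the log: 400, 400, then 200.
responses = iter([(400, None), (400, None), (200, "token")])
naps = []
token = poll_for_token(lambda: next(responses), sleep=naps.append)
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting.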

In [3]:
X.head()

| | GRA-FIC -13-0041X.PV |
|---|---|
| 2018-08-29 07:00:00+00:00 | 0.0 |
| 2018-08-29 07:10:00+00:00 | 0.0 |
| 2018-08-29 07:20:00+00:00 | 0.0 |
| 2018-08-29 07:30:00+00:00 | 0.0 |
| 2018-08-29 07:40:00+00:00 | 0.0 |


### Define a pipeline for model building

In [4]:
config = yaml.safe_load(
    """
    sklearn.pipeline.Pipeline:
        steps:
          - sklearn.preprocessing.data.MinMaxScaler
          - gordo_components.model.models.KerasAutoEncoder:
              kind: feedforward_hourglass
    """
)
pipe = serializer.pipeline_from_definition(config)
pipe

  
[2019-11-26 14:04:43,216] DEBUG [gordo_components.serializer.pipeline_from_definition._build_step:115] Building step: {'sklearn.pipeline.Pipeline': {'steps': ['sklearn.preprocessing.data.MinMaxScaler', {'gordo_components.model.models.KerasAutoEncoder': {'kind': 'feedforward_hourglass'}}]}}
[2019-11-26 14:04:43,217] DEBUG [gordo_components.serializer.pipeline_from_definition._build_step:115] Building step: sklearn.preprocessing.data.MinMaxScaler
[2019-11-26 14:04:43,418] DEBUG [gordo_components.serializer.pipeline_from_definition._build_step:115] Building step: {'gordo_components.model.models.KerasAutoEncoder': {'kind': 'feedforward_hourglass'}}


Pipeline(memory=None,
         steps=[('step_0', MinMaxScaler(copy=True, feature_range=(0, 1))),
                ('step_1', KerasAutoEncoder(kind='feedforward_hourglass'))],
         verbose=False)
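
The definition above names each step by its import path, either as a plain string or as a one-key mapping from the path to keyword arguments. The following is a simplified sketch of how such a definition could be resolved into objects. It is not the implementation of `serializer.pipeline_from_definition`; `build_step` is a hypothetical helper, and standard-library classes stand in for the real estimators:

```python
import importlib


def build_step(definition):
    """Turn a dotted path, or a {dotted path: kwargs} mapping, into an object."""
    if isinstance(definition, str):
        path, kwargs = definition, {}
    else:
        # A mapping with exactly one key: the import path. Its value: the kwargs.
        (path, kwargs), = definition.items()
    module_name, _, class_name = path.rpartition(".")
    cls = getattr(importlib.import_module(module_name), class_name)
    return cls(**(kwargs or {}))


# Stand-in classes from the standard library; the point is the lookup by path.
definitions = ["collections.OrderedDict", {"datetime.timedelta": {"hours": 1}}]
steps = [build_step(d) for d in definitions]
```

The two shapes mirror the YAML above: `MinMaxScaler` is given as a bare path, while `KerasAutoEncoder` is a mapping that carries `kind: feedforward_hourglass`.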

### Fit the pipeline and reconstruct the input

An AutoEncoder is trained to reproduce its own input, so the target passed to `fit` is a copy of `X`. The cell below calls `fit` and then `predict` on the whole pipeline; use different datasets for the two calls if you want to treat training and scoring data separately.

In [5]:
pipe.fit(X, y=X.copy())  # Our target is just X
xhat = pipe.predict(X)



Epoch 1/1


---
### `xhat` is now the auto-encoded result

The first half of each resulting sample is the _input_ to the model and the second half is the _output_.

In [6]:
xhat

array([[0.32019111, 0.29786357, 0.32511118, 0.30381536],
       [0.32019111, 0.29786357, 0.32511118, 0.30381536],
       [0.32019111, 0.29786357, 0.32511118, 0.30381536],
       ...,
       [0.32429197, 0.26762422, 0.32392147, 0.27555713],
       [0.32429197, 0.26762422, 0.32392147, 0.27555713],
       [0.32450209, 0.26762422, 0.32392147, 0.27555713]])
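
Assuming the layout described above (input in the first half of each row, reconstruction in the second half), a per-sample reconstruction error can be computed by comparing the two halves. This is a minimal sketch in plain Python; `reconstruction_errors` is a hypothetical helper, and mean absolute error is just one possible choice of metric:

```python
def reconstruction_errors(rows):
    """Mean absolute difference between the input half and output half of each row."""
    errors = []
    for row in rows:
        half = len(row) // 2
        inputs, outputs = row[:half], row[half:]
        errors.append(sum(abs(i - o) for i, o in zip(inputs, outputs)) / half)
    return errors


# First and last rows of the array printed above.
rows = [
    [0.32019111, 0.29786357, 0.32511118, 0.30381536],
    [0.32450209, 0.26762422, 0.32392147, 0.27555713],
]
errors = reconstruction_errors(rows)
print(errors)
```

Rows with an unusually large error are the ones the model reconstructs poorly.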

## Using custom or multiple aggregation methods
`TimeSeriesDataset` lets you choose the aggregation method applied to each resampled bucket, and it can apply several aggregation methods at once.

### Custom aggregation method

In [6]:
# Remember to run the first cell to have the required imports
dataset = TimeSeriesDataset(from_ts=dateutil.parser.isoparse('2016-07-01T00:10:00+00:00'),
    to_ts=dateutil.parser.isoparse('2017-01-01T00:00:00+00:00'),
    tag_list=[
        'asgb.19ZT3950%2FY%2FPRIM',
        'asgb.19PST3925%2FDispMeasOut%2FPRIM'
    ],
    aggregation_methods="max",
    data_provider=data_provider)
X, y = dataset.get_data()
X.head()

| | asgb.19ZT3950%2FY%2FPRIM | asgb.19PST3925%2FDispMeasOut%2FPRIM |
|---|---|---|
| 2016-07-01 07:40:00+00:00 | 100.032417 | 46.330772 |
| 2016-07-01 07:50:00+00:00 | 100.032417 | 46.330772 |
| 2016-07-01 08:00:00+00:00 | 100.032417 | 46.330772 |
| 2016-07-01 08:10:00+00:00 | 100.032417 | 46.330772 |
| 2016-07-01 08:20:00+00:00 | 100.032417 | 46.330772 |
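
To illustrate what aggregating resampled buckets means, here is a small standard-library sketch that floors each timestamp to a fixed-width bucket and applies an aggregation function per bucket. It is not how `TimeSeriesDataset` is implemented; `resample` is a hypothetical helper, the readings are made up, and the 10-minute resolution is inferred from the index spacing in the output above:

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def resample(samples, resolution, aggregate):
    """Group (timestamp, value) pairs into fixed-width buckets and aggregate each."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    buckets = defaultdict(list)
    for ts, value in samples:
        # Floor the timestamp to the start of its bucket.
        start = ts - (ts - epoch) % resolution
        buckets[start].append(value)
    return {start: aggregate(values) for start, values in sorted(buckets.items())}


# Made-up raw readings: two in the 07:40 bucket, one in the 07:50 bucket.
t0 = datetime(2016, 7, 1, 7, 40, tzinfo=timezone.utc)
samples = [
    (t0 + timedelta(minutes=1), 99.9),
    (t0 + timedelta(minutes=4), 100.0),
    (t0 + timedelta(minutes=12), 100.2),
]
ten_minutes = timedelta(minutes=10)
maxima = resample(samples, ten_minutes, max)
means = resample(samples, ten_minutes, lambda values: sum(values) / len(values))
```

Swapping `max` for another function changes the value reported per bucket but not the bucket boundaries, which is the effect `aggregation_methods` has on the frame above.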


### Multiple aggregation methods
When multiple aggregation methods are used, the returned dataframe has multi-level columns, with the tag name as the top level and the aggregation method as the second level.

In [7]:
# Remember to run the first cell to have the required imports
dataset = TimeSeriesDataset(from_ts=dateutil.parser.isoparse('2016-07-01T00:10:00+00:00'),
    to_ts=dateutil.parser.isoparse('2017-01-01T00:00:00+00:00'),
    tag_list=[
        'asgb.19ZT3950%2FY%2FPRIM',
        'asgb.19PST3925%2FDispMeasOut%2FPRIM'
    ],
    aggregation_methods=["max","min","mean"],
    data_provider=data_provider)
X, y = dataset.get_data()
X.head()

| tag | asgb.19ZT3950%2FY%2FPRIM | asgb.19ZT3950%2FY%2FPRIM | asgb.19ZT3950%2FY%2FPRIM | asgb.19PST3925%2FDispMeasOut%2FPRIM | asgb.19PST3925%2FDispMeasOut%2FPRIM | asgb.19PST3925%2FDispMeasOut%2FPRIM |
|---|---|---|---|---|---|---|
| aggregation_method | max | min | mean | max | min | mean |
| 2016-07-01 07:40:00+00:00 | 100.032417 | 99.945984 | 99.989201 | 46.330772 | 46.327229 | 46.329 |
| 2016-07-01 07:50:00+00:00 | 100.032417 | 99.945984 | 99.989201 | 46.330772 | 46.327229 | 46.329 |
| 2016-07-01 08:00:00+00:00 | 100.032417 | 99.945984 | 99.989201 | 46.330772 | 46.327229 | 46.329 |
| 2016-07-01 08:10:00+00:00 | 100.032417 | 99.945984 | 99.989201 | 46.330772 | 46.327229 | 46.329 |
| 2016-07-01 08:20:00+00:00 | 100.032417 | 99.945984 | 99.989201 | 46.330772 | 46.327229 | 46.329 |
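
Multi-level columns can be selected by tag, by aggregation method, or by both. The sketch below rebuilds two rows of the output above as a standalone pandas frame (so it runs without a data provider) and shows the three selections. The level names `tag` and `aggregation_method` are taken from the printed header:

```python
import pandas as pd

tags = ["asgb.19ZT3950%2FY%2FPRIM", "asgb.19PST3925%2FDispMeasOut%2FPRIM"]
methods = ["max", "min", "mean"]
columns = pd.MultiIndex.from_product([tags, methods], names=["tag", "aggregation_method"])

# Two rows copied from the output above.
index = pd.to_datetime(["2016-07-01 07:40:00+00:00", "2016-07-01 07:50:00+00:00"])
row = [100.032417, 99.945984, 99.989201, 46.330772, 46.327229, 46.329]
X = pd.DataFrame([row, row], index=index, columns=columns)

# All aggregations for one tag: a frame with columns max, min, mean.
one_tag = X[tags[0]]

# One aggregation across all tags: a frame with one column per tag.
all_max = X.xs("max", axis=1, level="aggregation_method")

# A single series: the mean of the second tag.
mean_series = X[(tags[1], "mean")]
```

Selecting with `xs` is convenient when a model should only see one aggregation, for example only the `mean` columns.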
