# Attention: AML Workspace & Auth
> When connecting to the AML workspace, watch for the device code prompt, which asks you to complete an interactive authentication in your browser.  
> The first section may not require a workspace, but we encourage you to have one ready for the full notebook.

# Load our dataset

In this example, we use a small dataset containing New York City energy demand over time.
The dataset is already split into training and validation data.

In [1]:
import os
from pathlib import Path

import pandas as pd

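# Directory containing the example CSV files
data_dir = os.path.join(".", "data")

# Parse the timeStamp column as datetimes; parse_dates=True would only try to parse the index
training_data = pd.read_csv(os.path.join(data_dir, "nyc_data_train.csv"), parse_dates=["timeStamp"])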
training_data.head()


Unnamed: 0,timeStamp,demand,precip,temp
0,2012-01-01 00:00:00,4937.5,0.0,46.13
1,2012-01-01 01:00:00,4752.1,0.0,45.89
2,2012-01-01 02:00:00,4542.6,0.0,45.04
3,2012-01-01 03:00:00,4357.7,0.0,45.03
4,2012-01-01 04:00:00,4275.5,0.0,42.61


In [2]:
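# Parse the timeStamp column as datetimes; parse_dates=True would only try to parse the index
validation_data = pd.read_csv(os.path.join(data_dir, "nyc_data_validation.csv"), parse_dates=["timeStamp"])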
validation_data.head()

Unnamed: 0,timeStamp,demand,precip,temp
0,2016-07-25 14:00:00,10406.97,0.0,88.87
1,2016-07-25 15:00:00,10543.54,0.0,90.77
2,2016-07-25 16:00:00,10483.26,0.0091,89.9
3,2016-07-25 17:00:00,10087.47,0.0,85.11
4,2016-07-25 18:00:00,9593.675,0.3383,83.7
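
Forecasting models are evaluated on data that comes after the training period, so it can be worth confirming that the split is chronological and that the timestamps are evenly spaced. The following is a minimal sketch using only the standard library. The inline rows are copied from the two previews above, and `parse_timestamps`, `is_chronological_split`, and `is_hourly` are hypothetical helpers, not part of pandas or Zendikon.

```python
import csv
import io
from datetime import datetime, timedelta

# Rows copied from the previews above; the real files contain many more rows.
TRAIN_PREVIEW = """timeStamp,demand,precip,temp
2012-01-01 00:00:00,4937.5,0.0,46.13
2012-01-01 01:00:00,4752.1,0.0,45.89
2012-01-01 02:00:00,4542.6,0.0,45.04
"""
VALIDATION_PREVIEW = """timeStamp,demand,precip,temp
2016-07-25 14:00:00,10406.97,0.0,88.87
2016-07-25 15:00:00,10543.54,0.0,90.77
2016-07-25 16:00:00,10483.26,0.0091,89.9
"""


def parse_timestamps(csv_text, column="timeStamp"):
    """Return the values of `column` as datetime objects (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [datetime.strptime(row[column], "%Y-%m-%d %H:%M:%S") for row in reader]


def is_chronological_split(train_times, validation_times):
    """True when every training timestamp precedes every validation timestamp."""
    return max(train_times) < min(validation_times)


def is_hourly(times):
    """True when consecutive timestamps are exactly one hour apart."""
    return all(b - a == timedelta(hours=1) for a, b in zip(times, times[1:]))


train_times = parse_timestamps(TRAIN_PREVIEW)
validation_times = parse_timestamps(VALIDATION_PREVIEW)
print(is_chronological_split(train_times, validation_times))
print(is_hourly(train_times) and is_hourly(validation_times))
```

The same checks could be run on the full `timeStamp` columns of the loaded data rather than on the preview rows.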


# Running our pipeline on AML

From this point on, we will require an AML workspace to work with.
For more details on creating an AML workspace, see [Quickstart: Create workspace resources you need to get started with Azure Machine Learning](https://docs.microsoft.com/en-us/azure/machine-learning/quickstart-create-resources)  

**Replace the placeholder AML workspace details below with the workspace you have access to.**
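
One way to avoid hard-coding workspace details in the notebook is to read them from environment variables. This is only a sketch: the variable names (`AML_SUBSCRIPTION_ID`, `AML_RESOURCE_GROUP`, `AML_WORKSPACE_NAME`) and the helper `load_workspace_settings` are assumptions made for illustration, not something the AML SDK or Zendikon defines.

```python
import os

# Hypothetical environment variable names; choose whatever fits your setup.
REQUIRED_VARS = {
    "subscription_id": "AML_SUBSCRIPTION_ID",
    "resource_group": "AML_RESOURCE_GROUP",
    "workspace_name": "AML_WORKSPACE_NAME",
}


def load_workspace_settings(env=None):
    """Collect workspace details from environment variables.

    Raises ValueError listing every missing variable, so that a
    misconfiguration is reported before any call to Azure is made.
    """
    env = os.environ if env is None else env
    missing = [var for var in REQUIRED_VARS.values() if not env.get(var)]
    if missing:
        raise ValueError("Missing environment variables: " + ", ".join(missing))
    return {key: env[var] for key, var in REQUIRED_VARS.items()}


# Example with an explicit mapping instead of the real environment.
example_env = {
    "AML_SUBSCRIPTION_ID": "<SUBSCRIPTION ID HERE>",
    "AML_RESOURCE_GROUP": "<RESOURCE GROUP HERE>",
    "AML_WORKSPACE_NAME": "<AML WORKSPACE NAME HERE>",
}
settings = load_workspace_settings(example_env)
print(sorted(settings))
```

The returned dictionary uses the same keys as the keyword arguments passed to `Workspace(...)` in this notebook, so it could be unpacked as `Workspace(**settings)`.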

# Experiment Setup

We begin by using the AML SDK to set up the workspace, experiment, and compute target used in the rest of the notebook.

In [3]:
from azureml.core import Workspace, Experiment, ComputeTarget, Dataset, Environment
from azureml.core.compute import AmlCompute
from azureml.core.compute_target import ComputeTargetException

# Replace these with your own AML workspace details
subscription_id = "<SUBSCRIPTION ID HERE>"
resource_group = "<RESOURCE GROUP HERE>"
workspace_name = "<AML WORKSPACE NAME HERE>"

# Choose a name for your CPU cluster; an existing cluster can be reused.
cpu_cluster_name = "zendikon-cpu-ds3"

# AML Setup
workspace = Workspace(
      subscription_id=subscription_id,
      resource_group=resource_group,
      workspace_name=workspace_name
)
print('Workspace name: ' + workspace.name,
      'Subscription id: ' + workspace.subscription_id,
      'Resource group: ' + workspace.resource_group, sep='\n')

experiment_name = "reusable_pipeline_time_series_forecasting"
experiment = Experiment(workspace=workspace, name=experiment_name)

# Reuse the cluster if it already exists; otherwise create it
try:
    compute_target = ComputeTarget(workspace=workspace, name=cpu_cluster_name)
    print('Found existing cluster, use it.')
except ComputeTargetException:
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',
                                                           max_nodes=4, 
                                                           idle_seconds_before_scaledown=2400)
    compute_target = ComputeTarget.create(workspace, cpu_cluster_name, compute_config)

compute_target.wait_for_completion(show_output=True)

Workspace name: zendikon-test
Subscription id: 7df29d08-a878-4f14-8044-00033608a1db
Resource group: zendikon-test
Found existing cluster, use it.
Succeeded
AmlCompute wait for completion finished

Minimum number of nodes requested have been provisioned


# Preparing dataset for pipeline
Zendikon pipelines take datasets registered in AML as their inputs. A dataset can be registered in several ways:

1. Use [AML Studio (UI)](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-connect-data-ui?tabs=credential) to use an existing external datastore and register datasets from it.
2. The same can be achieved with the [AML SDK](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-create-register-datasets).

For simplicity, however, we use the SDK to upload and register the small dataset loaded earlier in this example.
Run this step only if you choose to register the dataset via code instead of the UI.

In [8]:
datastore = workspace.get_default_datastore()
training_dataset_name = "nyc_energy_training_data"
validation_dataset_name = "nyc_energy_validation_data"

# Upload and register the NYC energy training and validation datasets in the workspace
training_dataset = Dataset.Tabular.register_pandas_dataframe(training_data, datastore, name=training_dataset_name, show_progress=True)
validation_dataset = Dataset.Tabular.register_pandas_dataframe(validation_data, datastore, name=validation_dataset_name, show_progress=True)

Validating arguments.
Arguments validated.
Successfully obtained datastore reference and path.
Uploading file to managed-dataset/c94f36f0-abc3-40af-a47c-1214421d2bee/
Successfully uploaded file to datastore.
Creating and registering a new dataset.
Successfully created and registered a new dataset.
Validating arguments.
Arguments validated.
Successfully obtained datastore reference and path.
Uploading file to managed-dataset/4a393720-eb46-41e5-b4a3-35e4db2863b3/
Successfully uploaded file to datastore.
Creating and registering a new dataset.
Successfully created and registered a new dataset.


## Prepare the forecasting parameters to be used in the pipeline

In [9]:
from azureml.automl.core.forecasting_parameters import ForecastingParameters

target_column_name = "demand"
time_column_name = "timeStamp"
forecast_horizon = 48
freq = "H"

# Forecasting Parameters
forecasting_parameters = ForecastingParameters(
    time_column_name=time_column_name,
    forecast_horizon=forecast_horizon,
    freq=freq,
)
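
With `freq = "H"` (hourly) and `forecast_horizon = 48`, each forecast covers 48 hourly steps, that is, two days. The sketch below makes that arithmetic explicit with the standard library. `forecast_window` is a hypothetical helper, and using the first validation timestamp as the first forecast step is an assumption made only for illustration.

```python
from datetime import datetime, timedelta


def forecast_window(first_step, horizon, step=timedelta(hours=1)):
    """Return the first and last timestamps covered by `horizon` steps."""
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    return first_step, first_step + (horizon - 1) * step


# Illustration only: treat the first validation timestamp as the first forecast step.
start, end = forecast_window(datetime(2016, 7, 25, 14), horizon=48)
print(start, "->", end)  # 2016-07-25 14:00:00 -> 2016-07-27 13:00:00
```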

## Create the pipeline instance

By default, the pipeline uses `primary_metric="NormRMSE"` (normalized root mean squared error). To change the primary metric, pass the `primary_metric` parameter when calling `TimeSeriesForecastingPipeline.from_default_settings`.
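
To give an intuition for the metric, here is a minimal sketch of a normalized RMSE, assuming the RMSE is divided by the range (maximum minus minimum) of the actual values. This is one common convention; the exact normalization used by Zendikon and AutoML is not stated in this notebook and may differ, for example by normalizing per time series. `normalized_rmse` is a hypothetical helper.

```python
import math


def normalized_rmse(actual, predicted):
    """RMSE divided by the range of the actual values (assumed convention)."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and of equal length")
    value_range = max(actual) - min(actual)
    if value_range == 0:
        raise ValueError("actual values are constant, so the range is zero")
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return math.sqrt(mse) / value_range


# Each prediction is off by 1 and the actual values span 10, so the result is 0.1.
print(normalized_rmse([0.0, 10.0], [1.0, 9.0]))  # 0.1
```

Because the error is scaled by the spread of the target, values are comparable across series with very different demand levels, and lower is better.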


In [10]:
from zendikon.pipelines.time_series.forecasting import TimeSeriesForecastingPipeline

pipeline = TimeSeriesForecastingPipeline.from_default_settings(
    training_dataset=training_dataset_name,
    validation_dataset=validation_dataset_name,
    forecasting_parameters=forecasting_parameters,
    label_column_name=target_column_name,
    compute_targets=[compute_target])


If this is not your intention, we recommend interrupting the command using Ctrl + c and check your pipeline config.



# Submit Pipeline
For the first submission, you need to specify `add_zendikon_feed=True` and provide a `personal_access_token`. This allows the Zendikon package to be installed from the Zendikon feed while the pipeline runs on your AML workspace. This only needs to be done once. For later submissions, you can set `add_zendikon_feed=False` (the default) and leave `personal_access_token` as `None`.
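
The first-versus-later distinction can be captured in a small helper that also keeps the token out of the notebook. This is a sketch under stated assumptions: `build_submit_options` and the environment variable name `ZENDIKON_PAT` are hypothetical, and only the two keyword arguments described above are taken from this notebook.

```python
import os


def build_submit_options(first_submission, env=None, token_var="ZENDIKON_PAT"):
    """Return the feed-related keyword arguments for a pipeline submission.

    `token_var` is a hypothetical environment variable holding the
    personal access token; it is only read for the first submission.
    """
    env = os.environ if env is None else env
    if not first_submission:
        return {"add_zendikon_feed": False, "personal_access_token": None}
    token = env.get(token_var)
    if not token:
        raise ValueError(f"Set {token_var} before the first submission")
    return {"add_zendikon_feed": True, "personal_access_token": token}


# Example with explicit mappings instead of the real environment.
first = build_submit_options(True, env={"ZENDIKON_PAT": "<YOUR PAT>"})
later = build_submit_options(False, env={})
print(first["add_zendikon_feed"], later["add_zendikon_feed"])  # True False
```

The resulting dictionary could then be unpacked into the submit call used in this notebook, for example `pipeline.submit(experiment, wait_for_completion=False, **first)`.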

The pipeline will now execute remotely on our specified compute target, and we can track the progress in AML Studio with the generated link below:

In [11]:
pipeline.submit(experiment, wait_for_completion=False, add_zendikon_feed=True, personal_access_token="<YOUR PAT>")

Created step zendikon_automl [b54fe52d][88cfd81c-29d4-4820-8520-cc458d5f6614], (This step will run and generate new outputs)
Created step zendikon AutoML evaluation [1c94dfda][0b401bd3-4593-4aca-ab07-5faf064b8bd4], (This step will run and generate new outputs)
Created step Models metrics summary [4c79941e][7393a29c-efbe-4bc2-958f-22f259999571], (This step will run and generate new outputs)
Submitted PipelineRun 217d1684-69f3-4d8c-9e43-c649655937e6
Link to Azure Machine Learning Portal: https://ml.azure.com/runs/217d1684-69f3-4d8c-9e43-c649655937e6?wsid=/subscriptions/7df29d08-a878-4f14-8044-00033608a1db/resourcegroups/zendikon-test/workspaces/zendikon-test&tid=72f988bf-86f1-41af-91ab-2d7cd011db47


Experiment,Id,Type,Status,Details Page,Docs Page
reusable_pipeline_time_series_forecasting,217d1684-69f3-4d8c-9e43-c649655937e6,azureml.PipelineRun,Running,Link to Azure Machine Learning studio,Link to Documentation
