# AutoML 016: auto-ml-datetime-feature-transformer-test
Copyright (c) Microsoft Corporation. All rights reserved.

Licensed under the MIT License.


In this example we demonstrate how AutoML extracts features from datetime data.

Make sure you have executed the [setup](setup.ipynb) before running this notebook.

In this notebook you will see:

1. Extraction of date/time features
2. Validation of generated features using a validation set

In [None]:
# AzureML imports
import azureml.core
from azureml.core import RunConfiguration
from azureml.core.workspace import Workspace
from azureml.core.experiment import Experiment
from azureml.train.automl import AutoMLConfig
from azureml.train.automl.run import AutoMLRun

# Logging, NumPy and pandas imports
import logging
import numpy as np
import pandas as pd

In [None]:
# Get the workspace
ws = Workspace.from_config()

# choose a name for the run history container in the workspace
experiment_name = 'automl-local-datetime-features'
# project folder
project_folder = './sample_projects/automl-local-datetime-features'
# Create an experiment
experiment = Experiment(ws, experiment_name)

output = {}
output['SDK version'] = azureml.core.VERSION
output['Subscription ID'] = ws.subscription_id
output['Workspace Name'] = ws.name
output['Resource Group'] = ws.resource_group
output['Location'] = ws.location
output['Project Directory'] = project_folder
output['Experiment Name'] = experiment.name
# Show full column contents (None replaces the deprecated -1 in pandas >= 1.0)
pd.set_option('display.max_colwidth', None)
pd.DataFrame(data = output, index = ['']).T

## Diagnostics
Opt in to diagnostics collection to improve the experience, quality, and security of future releases.

In [None]:
from azureml.telemetry import set_diagnostics_collection
set_diagnostics_collection(send_diagnostics=True)

In [None]:
# Columns to read from raw data
fields = ['date', 'label']

# Read the test data
dataset = pd.read_csv('featurizers_test_data.csv', usecols=fields)

# Number of samples in test data
number_of_samples_in_test_data = 10

# Output label
y = dataset['label']

# Training data
X = dataset.drop('label', axis=1)

# Dump first ten rows of X
print(X.head(10))

# Split the data into train and test
X_train = X.iloc[0:X.shape[0] - number_of_samples_in_test_data]
X_test = X.iloc[X.shape[0] - number_of_samples_in_test_data:X.shape[0]]
y_train = y.iloc[0:y.shape[0] - number_of_samples_in_test_data].values
y_test = y.iloc[y.shape[0] - number_of_samples_in_test_data:y.shape[0]].values
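
The split above is positional: the last `number_of_samples_in_test_data` rows become the test set and every earlier row is used for training, with no shuffling. A minimal sketch of the same slicing on a plain Python list follows; the helper name `holdout_tail` and the toy data are hypothetical, introduced only for illustration.

```python
def holdout_tail(rows, n_test):
    """Split a sequence into (train, test), holding out the last n_test items."""
    if not 0 < n_test < len(rows):
        raise ValueError("n_test must be between 1 and len(rows) - 1")
    split_at = len(rows) - n_test
    return rows[:split_at], rows[split_at:]


# Toy data standing in for the rows of the CSV file
rows = list(range(25))
train, test = holdout_tail(rows, n_test=10)
print(len(train), len(test))
```

Because the rows are not shuffled, the test set always contains the same rows, which is what allows the transformed test data to be compared against hard-coded expected values later in this notebook.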

## Instantiate Auto ML Classifier

This defines the AutoML configuration for the experiment created above. You can reuse the configuration to trigger multiple runs. Each run will be part of the same experiment.

|Property|Description|
|-|-|
|**primary_metric**|This is the metric that you want to optimize.<br> Auto ML Classifier supports the following primary metrics <br><i>AUC_macro</i><br><i>AUC_weighted</i><br><i>accuracy</i><br><i>weighted_accuracy</i><br><i>norm_macro_recall</i><br><i>balanced_accuracy</i><br><i>average_precision_score_weighted</i>|
|**iteration_timeout_minutes**|Time limit in minutes for each iteration|
|**iterations**|Number of iterations. In each iteration Auto ML Classifier trains a model on the data with a specific pipeline|
|**n_cross_validations**|Number of cross validation splits|

In [None]:
automl_config = AutoMLConfig(task = 'classification',
                             debug_log = 'automl_errors.log',
                             primary_metric = 'accuracy',
                             iteration_timeout_minutes = 60,
                             iterations = 5,
                             n_cross_validations = 4,
                             preprocess=True,
                             verbosity = logging.INFO,
                             X = X_train, 
                             y = y_train,
                             path=project_folder)

## Training the Model

You can call the *submit* method on the experiment instance and pass the AutoML configuration. For local runs the execution is synchronous. Depending on the data and the number of iterations, this can run for a while.
You will see the currently running iterations printed to the console.

The *submit* method on the experiment triggers the training of the model. It can be called with the following parameters:

|**Parameter**|**Description**|
|-|-|
|**automl_config**|Indicates the AutoML configuration to use|
|**show_output**| True/False to turn on/off console output|

In [None]:
local_run = experiment.submit(automl_config, show_output=True)

## Looking at the AutoML datetime transformations

#### Viewing the transformed feature names

Because each datetime value in the data is transformed into several features, these features are identified by their engineered feature names. To view the names, retrieve the best fitted model with the method *get_output* and call *get_engineered_feature_names* on its data transformer.

In [None]:
# Get the parent run
parent_run = AutoMLRun(experiment=experiment, run_id=local_run.run_id)

# Find the best fitted model for the given run
best_run, fitted_model = parent_run.get_output(metric='accuracy')

# Print the engineered feature names
print("List of engineered feature names")
for engineered_feature_name in fitted_model.named_steps.datatransformer.get_engineered_feature_names():
    print('\t' + engineered_feature_name)

# Test that the engineered feature names for the transformed data match the expected names
expected_engineered_feature_names = ['date_ModeImputer_Year', 
                                     'date_ModeImputer_Month', 
                                     'date_ModeImputer_Day', 
                                     'date_ModeImputer_DayOfWeek', 
                                     'date_ModeImputer_DayOfYear', 
                                     'date_ModeImputer_QuarterOfYear', 
                                     'date_ModeImputer_WeekOfMonth', 
                                     'date_ModeImputer_Hour', 
                                     'date_ModeImputer_Minute', 
                                     'date_ModeImputer_Second']
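# Compare the full lists; zip() alone would stop at the shorter one and hide missing names
actual_engineered_feature_names = list(fitted_model.named_steps.datatransformer.get_engineered_feature_names())
assert actual_engineered_feature_names == expected_engineered_feature_names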
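
The engineered names listed above appear to follow a `<raw column>_<transformer>_<feature>` pattern. That pattern is inferred from the names in this notebook and is an assumption, not documented SDK behavior. Under that assumption, a name can be split into its parts as sketched below; `split_engineered_name` is a hypothetical helper.

```python
def split_engineered_name(name):
    """Split '<column>_<transformer>_<feature>' from the right.

    Assumes the last two underscore-separated parts are the transformer and
    the feature, so a raw column name may itself contain underscores.
    """
    column, transformer, feature = name.rsplit('_', 2)
    return column, transformer, feature


names = ['date_ModeImputer_Year', 'date_ModeImputer_DayOfWeek']
for name in names:
    print(split_engineered_name(name))
```

Splitting from the right keeps the sketch usable for raw column names that contain underscores, as long as the transformer and feature parts do not.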

#### Transforming the test data using the data transformer
Given the best fitted model, transform the test data.

In [None]:
# Transform the test data using the data transformer for datetime data
x_test_transform = pd.DataFrame(fitted_model.named_steps.datatransformer.transform(X_test))

# Dump the transformed data
print(x_test_transform)

# Test the transformed data against the expected transformed data frame
expected_x_test_transform = [[2017, 12, 14,  3, 348, 4, 2, 0, 0, 0],
                             [2017, 11, 30,  3, 334, 4, 5, 0, 0, 0],
                             [2018,  1,  8,  0,   8, 1, 2, 0, 0, 0],
                             [2017, 12, 19,  1, 353, 4, 3, 0, 0, 0],
                             [2018,  2, 10,  5,  41, 1, 2, 0, 0, 0],
                             [2018,  5, 12,  5, 132, 2, 2, 0, 0, 0],
                             [2017, 11, 28,  1, 332, 4, 4, 0, 0, 0],
                             [2018,  3, 19,  0,  78, 1, 3, 0, 0, 0],
                             [2018,  2, 12,  0,  43, 1, 2, 0, 0, 0],
                             [2018,  3, 19,  0,  78, 1, 3, 0, 0, 0]]
assert((x_test_transform.values == pd.DataFrame(data=expected_x_test_transform).values).all())
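
The expected rows above can be cross-checked against the calendar using only the Python standard library. The sketch below is not how AutoML computes these features internally. In particular, the `WeekOfMonth` formula `(day - 1) // 7 + 1` is an assumption chosen because it reproduces the expected values in this notebook, and `datetime_features` is a hypothetical helper.

```python
from datetime import datetime


def datetime_features(ts):
    """Return [Year, Month, Day, DayOfWeek, DayOfYear, QuarterOfYear,
    WeekOfMonth, Hour, Minute, Second] for a datetime.

    DayOfWeek uses Monday = 0. The WeekOfMonth formula is an assumption
    chosen to match the expected rows in this notebook.
    """
    return [ts.year, ts.month, ts.day,
            ts.weekday(),
            ts.timetuple().tm_yday,
            (ts.month - 1) // 3 + 1,
            (ts.day - 1) // 7 + 1,
            ts.hour, ts.minute, ts.second]


# Three of the expected rows from the cell above
expected_rows = [[2017, 12, 14, 3, 348, 4, 2, 0, 0, 0],
                 [2017, 11, 30, 3, 334, 4, 5, 0, 0, 0],
                 [2018, 5, 12, 5, 132, 2, 2, 0, 0, 0]]

for row in expected_rows:
    year, month, day = row[:3]
    hour, minute, second = row[7:]
    # Rebuild the timestamp from the row and recompute every feature
    assert datetime_features(datetime(year, month, day, hour, minute, second)) == row
print("expected rows are consistent with the calendar")
```

A check like this helps separate errors in the hard-coded expectations from errors in the transformer itself.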