# VI - Automatic Model Selection from Parametric Sweep using Task Dependencies
In this notebook we will be taking the example from the [Parametric Sweep](04_Parameter_Sweep.ipynb) notebook and automating the entire chain using task dependencies in a single Azure Batch job.

* [Setup](#section1)
* [Configure job](#section2)
* [Submit job](#section3)
* [Download best model](#section4)
* [Delete job](#section5)

<a id='section1'></a>

## Setup

Create a simple alias for Batch Shipyard

In [None]:
%alias shipyard SHIPYARD_CONFIGDIR=config python $HOME/batch-shipyard/shipyard.py %l

Check that everything is working

In [None]:
shipyard

Read in the account information we saved earlier

In [None]:
import json

def read_json(filename):
    with open(filename, 'r') as infile:
        return json.load(infile)
    
account_info = read_json('account_information.json')

storage_account_key = account_info['storage_account_key']
storage_account_name = account_info['storage_account_name']

<a id='section2'></a>

## Configure job

As with the previous job, which ran on a single node, we will run this job on GPU-enabled nodes. The difference here is that we create one task per parameter combination. Each task passes a different set of parameters to our model training script. These parameters affect how the model trains and, in the end, how well it performs. The model and the results of its evaluation are recorded and stored on the node. At the end of each task the results are copied into the specified storage container.
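
To make the "one task per combination" idea concrete, here is a minimal sketch of how a parameter dictionary expands into individual combinations. It uses `itertools.product` as a standard-library stand-in for `ParameterGrid`; the helper name `expand_grid` is hypothetical, and the parameter values are the ones used later in this notebook.

```python
from itertools import product

# Same sweep values that are defined later in this notebook
parameters = {
    "num_convolution_layers": [2, 3, 4],
    "minibatch_size": [32, 64, 128],
}

def expand_grid(param_dict):
    """Yield one dict per combination of parameter values (hypothetical helper)."""
    names = sorted(param_dict)
    for values in product(*(param_dict[name] for name in names)):
        yield dict(zip(names, values))

combinations = list(expand_grid(parameters))
print('number of combinations: {}'.format(len(combinations)))
for combo in combinations[:3]:
    print(combo)
```

With three values for each of the two parameters, the sweep produces nine combinations and therefore nine training tasks.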

In [None]:
from copy import copy
import os
import random
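try:
    from sklearn.model_selection import ParameterGrid  # scikit-learn 0.18 and later
except ImportError:
    from sklearn.grid_search import ParameterGrid  # older scikit-learn releases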
from toolz import curry, pipe

def write_json_to_file(json_dict, filename):
    """ Simple function to write JSON dictionaries to files
    """
    with open(filename, 'w') as outfile:
        json.dump(json_dict, outfile)

CNTK_TRAIN_DATA_FILE = 'Train_cntk_text.txt'
CNTK_TEST_DATA_FILE = 'Test_cntk_text.txt'
URL_FMT = 'https://{}.blob.core.windows.net/{}/{}'

def select_random_data_storage_container():
    """Randomly select a storage account and container for CNTK train/test data.
    This is specific for the workshop to help distribute attendee load. This
    function will only work on Python2"""
    ss = random.randint(0, 4)
    cs = random.randint(0, 4)
    sa = '{}{}bigai'.format(ss, chr(ord('z') - ss))
    cont = '{}{}{}'.format(cs, chr(ord('i') - cs * 2), chr(ord('j') - cs * 2))
    return sa, cont

def create_resource_file_list():
    sa, cont = select_random_data_storage_container()
    ret = [{
        'file_path': CNTK_TRAIN_DATA_FILE,
        'blob_source': URL_FMT.format(sa, cont, CNTK_TRAIN_DATA_FILE)
    }]
    sa, cont = select_random_data_storage_container()
    ret.append({
        'file_path': CNTK_TEST_DATA_FILE,
        'blob_source': URL_FMT.format(sa, cont, CNTK_TEST_DATA_FILE)
    })
    return ret

def compose_command(num_convolution_layers, minibatch_size, max_epochs=30):
    cmd_str = ' '.join(("source /cntk/activate-cntk;",
                        "python /code/ConvNet_CIFAR10.py",
                        "--num_convolution_layers {num_convolution_layers}",
                        "--minibatch_size {minibatch_size}",
                        "--max_epochs {max_epochs}")).format(num_convolution_layers=num_convolution_layers,
                                                             minibatch_size=minibatch_size,
                                                             max_epochs=max_epochs)
    return 'bash -c "{}"'.format(cmd_str)

@curry
def append_parameter(param_name, param_value, data_dict):
    data_dict[param_name]=param_value
    return data_dict

Create the task template:

In [None]:
STORAGE_ACCOUNT_ALIAS = "mystorageaccount"

IMAGE_NAME = "masalvar/cntkcifar" # Custom CNTK image

_TASK_TEMPLATE = {
    "image": IMAGE_NAME,
    "remove_container_after_exit": True,
    "gpu": True,
    "output_data": {
        "azure_storage": [
            {
                "storage_account_settings": STORAGE_ACCOUNT_ALIAS,
                "container": "output",
                "source": "$AZ_BATCH_TASK_DIR/wd/Models"
            },
        ]
    },
}

Here we modify `task_generator` to give each task definition an `id`, numbered from `0`. This lets the selection task, which must run only after every training task has completed successfully, reference each task it depends on.
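
If the `toolz` helpers are unfamiliar, the `pipe`/`curry` pattern used by the task generator in the next cell can be sketched with the standard library alone. This is an illustrative equivalent rather than the notebook's actual code: the local `pipe` function, the template contents, and the `echo` commands are hypothetical stand-ins.

```python
from copy import copy
from functools import partial, reduce

def append_parameter(param_name, param_value, data_dict):
    """Set one key on the task dictionary and return it."""
    data_dict[param_name] = param_value
    return data_dict

def pipe(value, *funcs):
    """Pass value through each function in turn (stand-in for toolz.pipe)."""
    return reduce(lambda acc, func: func(acc), funcs, value)

template = {"image": "example/image", "gpu": True}  # hypothetical template

tasks = [
    pipe(copy(template),
         partial(append_parameter, "command", "echo {}".format(size)),
         partial(append_parameter, "id", str(task_id)))
    for task_id, size in enumerate([32, 64, 128])
]

print([task["id"] for task in tasks])
print("id" in template)  # the shared template itself is left untouched
```

Because each task starts from a copy of the template and only top-level keys are added, the shared template is never modified.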

In [None]:
def task_generator(parameters):
    # enumerate numbers the tasks from 0 without shadowing the built-in id()
    for task_id, params in enumerate(ParameterGrid(parameters)):
        yield pipe(copy(_TASK_TEMPLATE),
                   append_parameter('command', compose_command(**params)),
                   append_parameter('resource_files', create_resource_file_list()),
                   append_parameter('id', str(task_id)))

Generate the `jobs.json` configuration file

In [None]:
parameters = {
    "num_convolution_layers": [2, 3, 4],
    "minibatch_size": [32, 64, 128]
}

In [None]:
JOB_ID = 'cntk-ps-as-job'

jobs = {
    "job_specifications": [
        {
            "id": JOB_ID,
            "tasks": list(task_generator(parameters))    
        }
    ]
}

num_parameter_sweep_tasks = len(jobs['job_specifications'][0]['tasks'])
print('number of tasks for parametric sweep {}: {}'.format(JOB_ID, num_parameter_sweep_tasks))

Now we'll create the Python program that performs the best model selection. Note that this code is nearly identical to the code for selecting the best model locally in the [Parameter sweep notebook](04_Parameter_Sweep.ipynb).
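
At its core, the selection step is a minimum over the per-task evaluation results. The following is a minimal sketch, assuming each result exposes a `test_metric` field as in the script that follows; the configuration names and metric values are made up for illustration.

```python
# Hypothetical results keyed by configuration name; values are invented
results_dict = {
    "config_a": {"test_metric": 0.31},
    "config_b": {"test_metric": 0.24},
    "config_c": {"test_metric": 0.28},
}

# Pick the (name, result) pair with the smallest test error
best_key, best_result = min(results_dict.items(),
                            key=lambda item: item[1]["test_metric"])

print('task with smallest error: {} ({})'.format(best_key, best_result["test_metric"]))
```

A lower `test_metric` is treated as better here, which matches the use of `min` in the script.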

In [None]:
%%writefile autoselect.py
import json
import os
import shutil

def read_json(filename):
    with open(filename, 'r') as infile:
        return json.load(infile)

def scandir(basedir):
    for root, dirs, files in os.walk(basedir):
        for f in files:
            yield os.path.join(root, f) 

MODELS_DIR = os.path.join('wd', 'Models')
            
results_dict = {}
for model in scandir(MODELS_DIR):
    if not model.endswith('.json'):
        continue
    key = model.split(os.sep)[2]  # due to MODELS_DIR path change
    results_dict[key] = read_json(model)

# use items() instead of iteritems() as this will be run in python3
tuple_min_error = min(results_dict.items(), key=lambda x: x[1]['test_metric'])
configuration_with_min_error = tuple_min_error[0]
print('task with smallest error: {} ({})'.format(configuration_with_min_error, tuple_min_error[1]['test_metric']))

# copy best model to wd
MODEL_NAME = 'ConvNet_CIFAR10_model.dnn'
shutil.copy(os.path.join(MODELS_DIR, configuration_with_min_error, MODEL_NAME), '.')

We now need to prepare the file to be uploaded to the Azure Storage account to be referenced in the task:

In [None]:
INPUT_CONTAINER = 'input-autoselect'
OUTPUT_CONTAINER = 'output-autoselect'
UPLOAD_DIR = 'autoselect_upload'

!rm -rf $UPLOAD_DIR
!mkdir -p $UPLOAD_DIR
!mv autoselect.py $UPLOAD_DIR
!ls -alF $UPLOAD_DIR

Alias `blobxfer` and use it to upload the contents of `UPLOAD_DIR` to `INPUT_CONTAINER`:

In [None]:
%alias blobxfer python -m blobxfer

In [None]:
blobxfer $storage_account_name $INPUT_CONTAINER $UPLOAD_DIR --upload --storageaccountkey $storage_account_key

Now we'll append the `auto-model-selection` task, which depends on the prior training tasks. The important property here is `depends_on_range`, which specifies the range of task ids that the `auto-model-selection` task depends on. Additionally, this task requires data from the prior training tasks, which is specified in `input_data`.
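
As a quick sanity check on the range, the sketch below expands a `depends_on_range` value into the task ids it covers and compares them with the ids assigned to the training tasks. It assumes both ends of the range are inclusive, as the `[0, num_parameter_sweep_tasks - 1]` value used in this notebook suggests; the variable names `covered_ids` and `training_ids` are hypothetical.

```python
num_parameter_sweep_tasks = 9  # 3 x 3 parameter grid from this notebook

# Assumption: both ends of the range are inclusive
depends_on_range = [0, num_parameter_sweep_tasks - 1]
start, end = depends_on_range

# Task ids are strings, numbered from 0, as assigned by task_generator
covered_ids = [str(i) for i in range(start, end + 1)]
training_ids = [str(i) for i in range(num_parameter_sweep_tasks)]

print(covered_ids == training_ids)
print(covered_ids)
```

If the two lists differ, the selection task would either wait on a task that does not exist or start before some training task has finished.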

In [None]:
def generate_input_data_spec(job_id, task_id):
    return {
        "job_id": job_id,
        "task_id": task_id,
        "include": ["wd/Models/*_{}_{}/*".format(task_id, job_id)]
    }

input_data = []
for x in range(0, num_parameter_sweep_tasks):
    input_data.append(generate_input_data_spec(JOB_ID, str(x)))

model_selection_task = {
    "id": "auto-model-selection",
    "command": "python autoselect.py",
    "depends_on_range": [0, num_parameter_sweep_tasks - 1],
    "image": IMAGE_NAME,
    "remove_container_after_exit": True,
    "input_data": {
        "azure_batch": input_data,
        "azure_storage": [
            {
                "storage_account_settings": STORAGE_ACCOUNT_ALIAS,
                "container": INPUT_CONTAINER
            }
        ]
    },
    "output_data": {
        "azure_storage": [
            {
                "storage_account_settings": STORAGE_ACCOUNT_ALIAS,
                "container": OUTPUT_CONTAINER,
                "include": ["*wd/ConvNet_CIFAR10_model.dnn"],
                "blobxfer_extra_options": "--delete --strip-components 2"
            }
        ]
    }
}

# append auto-model-selection task to jobs
jobs['job_specifications'][0]['tasks'].append(model_selection_task)

In [None]:
write_json_to_file(jobs, os.path.join('config', 'jobs.json'))
print(json.dumps(jobs, indent=4, sort_keys=True))

<a id='section3'></a>

## Submit job
Check that everything is OK with our pool before we submit our job.

In [None]:
shipyard pool list

Now that we have confirmed everything is working we can execute our job using the command below. 

In [None]:
shipyard jobs add

Using the command below we can check the status of the tasks in our job. Once all tasks have an exit code we can continue. You can also view the **heatmap** of this pool under your Batch account in the [Azure Portal](https://portal.azure.com) to monitor the progress of this job on the compute nodes.

In [None]:
shipyard jobs listtasks --jobid $JOB_ID

<a id='section4'></a>

## Download best model
The best performing model from the parametric sweep job should now be saved to `OUTPUT_CONTAINER` by the `auto-model-selection` task. Let's download this model to a local directory, `MODELS_DIR`:

In [None]:
MODELS_DIR = 'auto-selected-model'

Download the best performing model:

In [None]:
blobxfer $storage_account_name $OUTPUT_CONTAINER $MODELS_DIR --remoteresource . --download --storageaccountkey $storage_account_key

The best model file (`ConvNet_CIFAR10_model.dnn`) is now ready for use.

In [None]:
!ls -alF $MODELS_DIR

<a id='section5'></a>

## Delete job

To delete the job use the command below. Just be aware that this will get rid of all the files created by the job and tasks.

In [None]:
shipyard jobs del -y --termtasks --wait

## Next Steps
You can proceed to the [Notebook: Clean Up](05_Clean_Up.ipynb) if you are done for now, or proceed to one of the following additional Notebooks:
* [Notebook: Tensorboard Visualization](07_Advanced_Tensorboard.ipynb) - note this requires running this notebook on your own machine
* [Notebook: Parallel and Distributed](08_Advanced_Parallel_and_Distributed.ipynb)
* [Notebook: Keras with TensorFlow](09_Keras_Single_GPU_Training_With_Tensorflow.ipynb)