# Flight Delay Demo - Deep Learning & Labeling

## Install prerequisites

Before running the notebook, make sure the correct versions of these libraries are installed.

In [None]:
!pip install --upgrade tensorflow-gpu==1.13.2 tensorflow==1.13.2

In [None]:
import warnings
warnings.filterwarnings("ignore")

import logging
logging.basicConfig(level = logging.ERROR)

## Import open source Python libraries

Import the open source dependencies and modules that will be used throughout this notebook.

In [None]:
import os
import requests
import utils
import numpy as np
from matplotlib import pyplot as plt
import environment_definition
import string_int_label_map_pb2

## Import Azure Machine Learning Python SDK

Import Azure Machine Learning SDK modules.

In [None]:
from azureml.core import Workspace, Experiment
from azureml.core.model import Model
from azureml.core.run import Run
from azureml.widgets import RunDetails
from azureml.core.image import ContainerImage
from azureml.train.dnn import TensorFlow
from azureml.core.runconfig import AzureContainerRegistry, DockerEnvironment, EnvironmentDefinition, PythonEnvironment

from azureml.core.compute import AksCompute, ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException
from azureml.core.webservice import Webservice, AksWebservice

from azureml.core.image import Image

## Connect to Azure Machine Learning workspace

In the next cell, we create a `Workspace` object from the `<subscription_id>`, `<resource_group_name>`, and `<workspace_name>`. This fetches the matching Workspace and prompts you for authentication. Click the link that is displayed and enter the details provided.

For more information on **Workspace**, please visit: [Microsoft Workspace Documentation](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.workspace.workspace?view=azure-ml-py)

`<subscription_id>` = You can get this ID from the landing page of your Resource Group.

`<resource_group_name>` = This is the name of your Resource Group.

`<workspace_name>` = This is the name of your Workspace.
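
Hard-coding these three values in a notebook is easy to get wrong. As a minimal sketch, they could be read from environment variables and checked before being passed to `Workspace(...)`. The variable names `AML_SUBSCRIPTION_ID`, `AML_RESOURCE_GROUP`, and `AML_WORKSPACE_NAME` and the helper `workspace_settings_from_env` are assumptions made for this example; they are not part of the Azure Machine Learning SDK.

```python
import os

# Hypothetical environment variable names mapped to Workspace(...) arguments.
ENV_TO_ARGUMENT = {
    "AML_SUBSCRIPTION_ID": "subscription_id",
    "AML_RESOURCE_GROUP": "resource_group",
    "AML_WORKSPACE_NAME": "workspace_name",
}


def workspace_settings_from_env(environ=None):
    """Return keyword arguments for Workspace(...) read from the environment."""
    environ = os.environ if environ is None else environ
    settings = {}
    missing = []
    for env_name, argument in ENV_TO_ARGUMENT.items():
        value = environ.get(env_name, "").strip()
        # Treat an unset value or an unreplaced <placeholder> as missing.
        if not value or (value.startswith("<") and value.endswith(">")):
            missing.append(env_name)
        else:
            settings[argument] = value
    if missing:
        raise ValueError("Missing workspace settings: " + ", ".join(missing))
    return settings


# Demonstration with an in-memory mapping instead of the real environment.
demo = workspace_settings_from_env({
    "AML_SUBSCRIPTION_ID": "0000",
    "AML_RESOURCE_GROUP": "my-rg",
    "AML_WORKSPACE_NAME": "my-ws",
})
print(demo)
```

With real environment variables set, the result could be passed on as `Workspace(**workspace_settings_from_env())`.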

In [None]:
from azureml.core.workspace import Workspace

try:    
    # Get instance of the Workspace and write it to config file
    ws = Workspace(
        subscription_id = '<subscription_id>', 
        resource_group = '<resource_group_name>',
        workspace_name = '<workspace_name>')

    # Writes workspace config file
    ws.write_config()
    
    print('Library configuration succeeded')
except Exception as e:
    print(e)
    print('Workspace not found')

## Collect and prepare training data

Let's take a look at a subset of images used for training our model.

<TABLE>
    <TR>
        <TD><img src="./images/train/default/000.jpg" /></TD>
        <TD><img src="./images/train/default/001.jpg" /></TD>
        <TD><img src="./images/train/default/002.jpg" /></TD>
        <TD><img src="./images/train/default/003.jpg" /></TD>
    </TR>
    <TR>
        <TD><img src="./images/train/default/004.jpg" /></TD>
        <TD><img src="./images/train/default/005.jpg" /></TD>
        <TD><img src="./images/train/default/006.jpg" /></TD>
        <TD><img src="./images/train/default/007.jpg" /></TD>
    </TR>
</TABLE>

## Keras: Data augmentation

The `tf.keras.preprocessing.image.ImageDataGenerator` class generates batches of tensor image data with real-time data augmentation.
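
The iterator returned by `flow_from_directory` yields batches indefinitely, so the loop in this section has to stop itself with a counter. The sketch below replaces the Keras iterator with a plain infinite iterator (an assumption made so the example runs without TensorFlow) to show how many batches such a counter lets through. The helper name `batches_drawn` is hypothetical.

```python
from itertools import count


def batches_drawn(limit):
    """Count the batches consumed by a loop that breaks once i > limit."""
    i = 0
    drawn = 0
    for _batch in count():  # stand-in for an endless batch generator
        drawn += 1          # the batch has already been produced at this point
        i += 1
        if i > limit:
            break
    return drawn


print(batches_drawn(10))  # 11
```

With a limit of 10 the loop draws 11 batches, so at a batch size of 32 at most 352 augmented images are written to the output folder.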

In [None]:
import tensorflow as tf
import os

augmented_folder = './images/augmented'

# Working directory
if not os.path.exists(augmented_folder):
    os.makedirs(augmented_folder)
    
gen = tf.keras.preprocessing.image.ImageDataGenerator(rotation_range=17, width_shift_range=0.12,
                     height_shift_range=0.12, zoom_range=0.12, horizontal_flip=True)

path = 'images/'
i = 0
for batch in gen.flow_from_directory('images/train', target_size=(224,224),
    class_mode=None, shuffle=False, batch_size=32,
    save_to_dir=augmented_folder, save_prefix='hi'):

    i += 1
    if i > 10:
        break  

## Keras: Augmented dataset sample

Review images generated by Keras.

In [None]:
from IPython.display import Image, display
from glob import glob

listofImageNames = glob(path+'augmented/*.png', recursive=True)
for imageName in listofImageNames[:1]:
    display(Image(filename=imageName))
    print(imageName)

In [None]:
import shutil

shutil.rmtree(augmented_folder)

## Create Azure Machine Learning experiment

The `Experiment` constructor creates an experiment instance. It takes the current workspace (the `ws` object created above, which can also be loaded from the saved config file by calling `Workspace.from_config()`) and an experiment name.

For more information on **Experiment**, please visit: [Microsoft Experiment Documentation](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.experiment.experiment?view=azure-ml-py)

In [None]:
experiment_name = 'flight-delay-tf'
experiment = Experiment(workspace=ws, name=experiment_name)

## Create auto-scaling AML Compute GPU cluster

Firstly, check for the existence of the cluster. If it already exists, we are able to reuse it. Checking for the existence of the cluster can be performed by calling the constructor `ComputeTarget()` with the current workspace and name of the cluster.

If the cluster does not exist, the next step is to define a configuration for the new AML cluster by calling `AmlCompute.provisioning_configuration()`. It takes as parameters the VM size and the minimum and maximum number of nodes that the cluster can scale between. Once the configuration has been created, `ComputeTarget.create()` should be called with the workspace object, the cluster name, and the previously created configuration object.

For more information on **ComputeTarget**, please visit: [Microsoft ComputeTarget Documentation](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.compute.computetarget?view=azure-ml-py)

For more information on **AmlCompute**, please visit: [Microsoft AmlCompute Documentation](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.compute.amlcompute?view=azure-ml-py)


**Note:** Please wait for the execution of the cell to finish before moving forward.

In [None]:
# Choose a name for your GPU cluster
cluster_name = "gpucluster"

# Reuse the cluster if it already exists; otherwise create it
try:
    gpu_cluster = ComputeTarget(workspace = ws, name = cluster_name)
except ComputeTargetException:
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6',
                                                           min_nodes=0,
                                                           max_nodes=4,
                                                           admin_username="theadmin",
                                                           admin_user_password="Password123")
                                                           
    gpu_cluster = ComputeTarget.create(ws, cluster_name, compute_config)

gpu_cluster.wait_for_completion(show_output=True)

## Upload training data to Azure Machine Learning Data Store

To register our training data with our Workspace we need to get the data into the data store. The Workspace will already have a default data store. The function `ws.get_default_datastore()` returns an instance of the data store associated with the Workspace, to which files will be uploaded by calling `ds.upload()`.

For more information on **Datastore**, please visit: [Microsoft Datastore Documentation](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.datastore?view=azure-ml-py)
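
Before uploading, it can be worth checking how much data the local folder holds. The following sketch uses only the standard library. `summarize_directory` is a hypothetical helper, and the temporary folder stands in for `./data` so the example runs anywhere.

```python
import os
import tempfile


def summarize_directory(root):
    """Return (file_count, total_bytes) for every file under root."""
    file_count = 0
    total_bytes = 0
    for folder, _subfolders, file_names in os.walk(root):
        for file_name in file_names:
            file_count += 1
            total_bytes += os.path.getsize(os.path.join(folder, file_name))
    return file_count, total_bytes


# Build a tiny throwaway folder tree to demonstrate the helper.
with tempfile.TemporaryDirectory() as demo_dir:
    with open(os.path.join(demo_dir, "a.jpg"), "wb") as handle:
        handle.write(b"abc")
    os.makedirs(os.path.join(demo_dir, "nested"))
    with open(os.path.join(demo_dir, "nested", "b.jpg"), "wb") as handle:
        handle.write(b"de")
    file_count, total_bytes = summarize_directory(demo_dir)

print(file_count, total_bytes)  # 2 5
```

In the notebook, the same helper could be pointed at `./data` before `ds.upload()` is called.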

In [None]:
# Prepare data
ds = ws.get_default_datastore()
ds.upload('./data')

In [None]:
env_def = EnvironmentDefinition()
env_def.docker = environment_definition.docker_config
env_def.python = environment_definition.python_config

print('Base docker image: ' + env_def.docker.base_image)

## Train model using TensorFlow estimator on the GPU cluster

Create the TensorFlow estimator and submit the experiment. The `TensorFlow` estimator takes as parameters the `compute_target`, which is the GPU cluster created previously, and the `entry_script`, which points to our main training script, `train.py`. The `inputs` and `environment_definition` parameters take care of making the data store contents available on the remote training cluster (downloaded to `/data`) and of defining the dependencies required on the cluster for training to start.
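
The keys in `script_params` reach the entry script as command-line arguments. The following is only a sketch of how such a script might read them with `argparse`; it is not the project's `train.py`, and the `parse_args` helper and its defaults are assumptions.

```python
import argparse


def parse_args(argv=None):
    """Parse the two arguments that script_params passes to the entry script."""
    parser = argparse.ArgumentParser(description="Sketch of training-script arguments")
    parser.add_argument("--model_dir", default="./outputs",
                        help="Folder where checkpoints and the exported model are written")
    parser.add_argument("--pipeline_config_path", required=True,
                        help="Path to the model pipeline configuration file")
    return parser.parse_args(argv)


# Simulate the command line that the estimator would build from script_params.
args = parse_args([
    "--model_dir", "./outputs",
    "--pipeline_config_path", "./faster_rcnn_resnet101_bird.config",
])
print(args.model_dir)             # ./outputs
print(args.pipeline_config_path)  # ./faster_rcnn_resnet101_bird.config
```

Azure Machine Learning uploads files written to the `./outputs` folder to the run history, which is why `--model_dir` points there and why the model can later be downloaded from `outputs/`.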

In [None]:
script_params = {
    '--model_dir': './outputs',
    '--pipeline_config_path': './faster_rcnn_resnet101_bird.config'
}

tf_est = TensorFlow(source_directory = './train/src',
                    script_params=script_params,
                    compute_target=gpu_cluster,
                    entry_script='train.py',
                    inputs=[ds.as_download(path_on_compute='/data')],
                    environment_definition=env_def
                   )
run = experiment.submit(tf_est)

## Display train.py

Let's take a look at our training script. As you can see, it's a standard TensorFlow training script.

In [None]:
with open("./train/src/train.py", "r") as f:
    print(f.read())

## Experiment run details

While the experiment is running, we can monitor it through the AML widget.

In [None]:
run = Run(experiment=experiment, run_id=run.id)
RunDetails(run).show() 

## Start TensorBoard

The `export_to_tensorboard` function exports the experiment run history as TensorBoard logs, ready for visualization in TensorBoard.

For more information on ***tensorboard Package***, please visit: [Microsoft tensorboard Package Documentation](https://docs.microsoft.com/en-us/python/api/azureml-tensorboard/azureml.tensorboard?view=azure-ml-py)

In [None]:
# Export Run History to Tensorboard logs
from azureml.tensorboard.export import export_to_tensorboard
from azureml.tensorboard import Tensorboard
import os

logdir = 'exportedTBlogs'
log_path = os.path.join(os.getcwd(), logdir)
os.makedirs(log_path, exist_ok=True)

export_to_tensorboard(run, logdir)

# The Tensorboard constructor takes a list of runs; an empty list is passed here because the logs were already exported to logdir
tb = Tensorboard([], local_root=logdir, port=6006)

# If successful, start() returns a string with the URI of the instance.
tb.start()

## Stop TensorBoard

The `Tensorboard.stop()` function stops the TensorBoard instance.

In [None]:
tb.stop()

## Hyperparameter training

Hyperparameters are adjustable parameters for model training that guide the training process. The HyperDrive package helps automate choosing these parameters.

The `choice` function specifies a discrete set of options to sample from.

The `HyperDriveConfig Class` is a configuration that defines a HyperDrive run. HyperDrive configuration includes information about hyperparameter space sampling, termination policy, primary metric, resume from configuration, estimator, and the compute target to execute the experiment runs on.

The `normal` function specifies a real value that is normally-distributed with mean mu and standard deviation sigma.

The `PrimaryMetricGoal Enum` defines supported metric goals for hyperparameter tuning. A metric goal is used to determine whether a higher value for a metric is better or worse. Metric goals are used when comparing runs based on the primary metric. For example, you may want to maximize accuracy or minimize error.

The `RandomParameterSampling Class` defines random sampling over a hyperparameter search space.

For more information on ***HyperDrive Package***, please visit: [Microsoft HyperDrive Package Documentation](https://docs.microsoft.com/en-us/python/api/azureml-train-core/azureml.train.hyperdrive?view=azure-ml-py)
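
To make the two sampling expressions concrete, here is a rough stand-in built on Python's `random` module. It is not HyperDrive code: `sample_config` is a hypothetical helper that only mimics what `choice` and `normal` describe, using the same values as the cell below.

```python
import random


def sample_config(rng, batch_sizes=(1, 4, 8, 16), mu=0.0002, sigma=0.0006):
    """Draw one configuration: a discrete batch size and a normally distributed learning rate."""
    return {
        "--batch_size": rng.choice(batch_sizes),
        "--learning_rate": rng.gauss(mu, sigma),
    }


rng = random.Random(0)  # fixed seed so repeated runs give the same draws
samples = [sample_config(rng) for _ in range(10000)]

# Share of draws whose learning rate is below zero.
negative = sum(s["--learning_rate"] < 0 for s in samples) / len(samples)
print(f"share of negative learning rates: {negative:.2f}")
```

With a mean of 0.0002 and a standard deviation of 0.0006, roughly 37% of the draws fall below zero, so a training script should be prepared to reject such values, or the search space could use a narrower distribution.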

In [None]:
from azureml.train.hyperdrive import BanditPolicy, choice, HyperDriveConfig, normal, PrimaryMetricGoal, RandomParameterSampling

param_sampling = RandomParameterSampling({
    "--batch_size": choice(1, 4, 8, 16),
    # normal(mu, sigma): with sigma larger than mu, some sampled learning rates will be negative
    "--learning_rate": normal(0.0002, 0.0006)
})

hyperdrive_run_config = HyperDriveConfig(estimator=tf_est,
                          hyperparameter_sampling=param_sampling,
                          primary_metric_name="loss", 
                          primary_metric_goal=PrimaryMetricGoal.MINIMIZE,
                          max_total_runs=10,
                          max_concurrent_runs=4)

hyperdrive_run = experiment.submit(hyperdrive_run_config)

In [None]:
from azureml.widgets import RunDetails

hyperdrive_run = Run(experiment, run_id=hyperdrive_run.id)
RunDetails(hyperdrive_run).show()

## Register model

After the experiment has completed successfully, download its outputs so that the model can be registered with our Azure Machine Learning workspace.

The `get_file_names()` function lists the files that are stored in association with the run.

The `download_file()` function downloads an associated file from storage. It takes the `name` of the artifact to download and the `output_file_path`, the local path where the artifact is saved.

For more information on ***Run Class***, please visit: [Microsoft Run Class Documentation](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.run.run?view=azure-ml-py#download-file-name--output-file-path-none---validate-checksum-false-)
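
When a run stores many artifacts, it helps to narrow the list before downloading. The sketch below filters plain file names by prefix. The sample names are made up for illustration, and `select_artifacts` is a hypothetical helper rather than an SDK function.

```python
def select_artifacts(file_names, prefixes):
    """Keep the names that start with any of the given prefixes."""
    return [name for name in file_names if name.startswith(tuple(prefixes))]


# Made-up artifact names standing in for the result of run.get_file_names().
files = [
    "azureml-logs/70_driver_log.txt",
    "outputs/checkpoint",
    "outputs/frozen_inference_graph.pb",
    "outputs/model.ckpt-1000.index",
]

wanted = select_artifacts(
    files,
    ["outputs/model", "outputs/checkpoint", "outputs/frozen_inference_graph"],
)
print(wanted)
```

Matching on prefixes rather than substrings avoids picking up files in other folders whose paths happen to contain the same text.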


In [None]:
files = run.get_file_names()
results = [file for file in files if ('outputs/model' in file or 'outputs/checkpoint' in file or 'outputs/events' in file or 'outputs/graph' in file or 'outputs/frozen_inference_graph' in file)]
run.download_file('outputs/frozen_inference_graph.pb', './outputs/frozen_inference_graph.pb')

Next, register the downloaded model for deployment. The `Model.register()` function registers a model with the provided workspace.

In [None]:
# register the model for deployment
model = Model.register(model_path = "./outputs/frozen_inference_graph.pb",
                       model_name = "frozen_inference_graph.pb",
                       description = "Flight Delay Image",
                       workspace = ws)

# Deployment
## Fetch Azure Kubernetes Cluster

Let's get a reference to the AKS cluster `flight-delay-aks`, creating it first if it does not already exist.

In [None]:
from azureml.core.compute import AksCompute
from azureml.core.compute import ComputeTarget
from azureml.exceptions import ComputeTargetException

prov_config = AksCompute.provisioning_configuration(location='westus2')

try:
    aks_target = AksCompute(ws, 'flight-delay-aks')
except ComputeTargetException:
    # Create the cluster
    aks_target = ComputeTarget.create(workspace = ws, 
                            name = 'flight-delay-aks', 
                            provisioning_configuration = prov_config)
    aks_target.wait_for_completion(True)

Now that the AKS cluster has been deployed, it’s time to create an `InferenceConfig` object by calling its constructor and passing the runtime type, the path to the `entry_script` (score.py), and the `conda_file` (the previously created file that holds the environment dependencies).

Next, define the configuration of the web service to deploy. This is done by calling `AksWebservice.deploy_configuration()` and passing along the number of `cpu_cores` and `memory_gb` that the service needs.

Finally, in order to deploy the model and service to the created AKS cluster, the function `Model.deploy()` should be called, passing along the workspace object, a list of models to deploy, the defined inference configuration, deployment configuration, and the AKS object created in the step above.

For more information on **InferenceConfig**, please visit: [Microsoft InferenceConfig Documentation](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.model.inferenceconfig?view=azure-ml-py)

For more information on **AksWebservice**, please visit: [Microsoft AksWebservice Documentation](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.webservice.akswebservice?view=azure-ml-py)

For more information on **Model**, please visit: [Microsoft Model Documentation](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.model.model?view=azure-ml-py)


**Note:** Please wait for the execution of the cell to finish before moving forward.

In [None]:
from azureml.core.model import InferenceConfig
from azureml.core.webservice import AksWebservice
from azureml.core.model import Model
from azureml.exceptions import WebserviceException

# Create an inference config object based on score.py and score.yml
inference_config = InferenceConfig(runtime= "python",
                                    entry_script="score.py",
                                    conda_file="score.yml")

deployment_config = AksWebservice.deploy_configuration(cpu_cores = 1, 
                                                        memory_gb = 1)

try:
    service = AksWebservice(ws, 'fd-image-service')
    print(service.state)
except WebserviceException:
    service = Model.deploy(ws, 
                        'fd-image-service', 
                        [model], 
                        inference_config, 
                        deployment_config, 
                        aks_target)
    service.wait_for_deployment(show_output = True)
    print(service.state)

## Test the service

To consume the web service, first read a test image as raw bytes. Next, obtain an instance of the web service by calling the `AksWebservice()` constructor with the Workspace object and the service name as parameters. Finally, call the service via POST using the `requests` module: `requests.post()` takes as parameters the scoring URL, the test data, and a headers dictionary that contains the authentication key.

For more information on **Webservice**, please visit: [Microsoft Webservice Documentation](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.webservice?view=azure-ml-py)
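
A failed call is easier to diagnose if the HTTP status is checked before the body is parsed. The sketch below wraps the request in a hypothetical `score_image` helper. The `post` parameter exists only so the function can be exercised without a live service; in the notebook it would be `requests.post`. The `FakeResponse` class is likewise a stand-in that imitates the two `requests.Response` methods used here.

```python
def build_headers(api_key):
    """Headers for a key-authenticated scoring request with a binary body."""
    return {
        "Authorization": "Bearer " + api_key,
        "Content-Type": "application/octet-stream",
    }


def score_image(uri, api_key, payload, post):
    """POST raw image bytes and return the decoded JSON response."""
    response = post(url=uri, data=payload, headers=build_headers(api_key))
    response.raise_for_status()  # raise on 4xx/5xx before touching the body
    return response.json()


class FakeResponse:
    """Offline stand-in for requests.Response (illustration only)."""

    def raise_for_status(self):
        pass

    def json(self):
        return {"detection_scores": [0.9]}


captured = {}


def fake_post(url, data, headers):
    # Record what would have been sent, then return a canned response.
    captured.update(url=url, data=data, headers=headers)
    return FakeResponse()


demo_result = score_image("http://example.invalid/score", "secret-key", b"\x00", fake_post)
print(demo_result)
```

Against the deployed service, the call might look like `score_image(aks_service.scoring_uri, aks_service.get_keys()[0], input_data, requests.post)`.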

In [None]:
# Test the service
test_image = './images/train/default/000.JPG'
with open(test_image, 'rb') as image:
    input_data = image.read()

aks_service_name = 'fd-image-service'
aks_service = AksWebservice(workspace=ws, name=aks_service_name)

auth = 'Bearer ' + aks_service.get_keys()[0]
uri = aks_service.scoring_uri

res = requests.post(url=uri,
                    data=input_data,
                    headers={'Authorization': auth, 'Content-Type': 'application/octet-stream'})

results = res.json()

Let's parse the response received from the Webservice.
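
Before drawing, it can be useful to discard weak detections and convert the boxes to pixels. The sketch below assumes a response for a single image in which each box is `[ymin, xmin, ymax, xmax]` in normalized coordinates. That is the TensorFlow Object Detection API convention and is consistent with `use_normalized_coordinates=True`, but the service's actual output format is not shown in this notebook. The `min_score` threshold, the sample data, and the `strong_detections` helper are illustrative.

```python
def strong_detections(results, image_width, image_height, min_score=0.5):
    """Return detections at or above min_score with boxes converted to pixels."""
    kept = []
    for box, cls, score in zip(results["detection_boxes"],
                               results["detection_classes"],
                               results["detection_scores"]):
        if score < min_score:
            continue
        ymin, xmin, ymax, xmax = box  # assumed normalized [ymin, xmin, ymax, xmax]
        kept.append({
            "class": int(cls),
            "score": score,
            # (left, top, right, bottom) in pixels
            "box_px": (round(xmin * image_width), round(ymin * image_height),
                       round(xmax * image_width), round(ymax * image_height)),
        })
    return kept


# Made-up response with one strong and one weak detection.
sample = {
    "detection_boxes": [[0.1, 0.2, 0.5, 0.6], [0.0, 0.0, 1.0, 1.0]],
    "detection_classes": [1, 2],
    "detection_scores": [0.9, 0.2],
}
detections = strong_detections(sample, image_width=200, image_height=100)
print(detections)
```

Only the first detection survives the default threshold in this example; its box covers pixels (40, 10) to (120, 50).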

In [None]:
#import utils
from PIL import Image

# Show the results
image = Image.open(test_image)
image_np = utils.load_image_into_numpy_array(image)
category_index = utils.create_category_index_from_labelmap('./score/samples/label_map.pbtxt', use_display_name=True)

utils.visualize_boxes_and_labels_on_image_array(
    image_np,
    np.array(results['detection_boxes']),
    np.array(results['detection_classes']),
    np.array(results['detection_scores']),
    category_index,
    instance_masks=results.get('detection_masks'),
    use_normalized_coordinates=True,
    line_thickness=8)

plt.figure(figsize=(24, 16))
plt.imshow(image_np)