Copyright (c) Microsoft Corporation. All rights reserved.

Licensed under the MIT License.

# Creating and Updating a Docker Image before Deployment as a Webservice

This notebook demonstrates how to make changes to an existing docker image, before deploying it as a webservice.  

Knowing how to do this is helpful if, for example, you need to debug the execution script of a webservice you're developing, and debugging involves several iterations of code changes.  Redeploying your application as a webservice at every iteration is impractical, because the deployment time will slow you down significantly.  In some cases it may be easier to simply run the execution script on the command line, but this is not an option if your script accumulates data across individual calls.

We describe the following process:

1. Configure your Azure Workspace.
2. Create a Docker image, using the Azure ML SDK.
3. Test your Application by running the Docker container locally.
4. Update the execution script inside your running Docker container.
5. Commit the changes in your Docker container to its image.
6. Update the image in the Azure Container Registry (ACR).
7. Deploy your Docker image as an Azure Container Instance ([ACI](https://azure.microsoft.com/en-us/services/container-instances/)) Webservice.
8. Test your Webservice.
    
> Several cells below are completely commented out, because they can only be run on Jupyter, not on Azure Databricks.  If you have access to Jupyter, we recommend exploring these cells, because they give you insight into how to debug a Docker container.

### Prerequisites
- You need to have an [Azure](https://azure.microsoft.com) subscription. You will also need to know your subscription_id and resource_group.  One way to find them is to visit the [Azure portal](https://portal.azure.com) and use the same values as those of your DSVM.

**Note:** 
- This code was tested on a Data Science Virtual Machine ([DSVM](https://azure.microsoft.com/en-us/services/virtual-machines/data-science-virtual-machines/)), running Ubuntu Linux 16.04 (Xenial). **Do *not* try to run this notebook on *Databricks***, because you may run into trouble executing some of the shell and docker commands.
- If you get an error message when trying to import `azureml` in the first cell below, you probably have to switch to using the correct kernel: `Python [conda env:amladpm]`.

## Configure your Azure Workspace

We start by selecting your workspace and making sure we have access to it from here.  For this to work, make sure you followed the instructions for creating a workspace in your development environment:

- [DSVM](../lab0.0_Setting_Up_Env/configure_environment_DSVM.ipynb)
- [Azure Databricks](../lab0.0_Setting_Up_Env/configure_environment_ADB.ipynb)

In [None]:
# %matplotlib inline  

import os
from azureml.core import Workspace
import pandas as pd
import urllib.request  # the request submodule must be imported explicitly

config_path = '/dbfs/tmp/'

# # run this if you are using Jupyter (instead of Azure Databricks)
# config_path = os.path.expanduser('~')

ws = Workspace.from_config(path=os.path.join(config_path, 'aml_config', 'config.json'))

ws.get_details()

Let's make sure that you have the correct version of the Azure ML SDK installed on your workstation or VM.  If you don't have the right version, please follow these [Installation Instructions](https://docs.microsoft.com/en-us/azure/machine-learning/service/quickstart-create-workspace-with-python#install-the-sdk).

In [None]:
import azureml.core

# display the core SDK version number
print("Azure ML SDK Version: ", azureml.core.VERSION)

## Create a Docker image using the Azure ML SDK

### Create a template execution script for your application

We are going to start with just a barebones execution script for your webservice.  This script will calculate the running average of numbers thrown at it.

We recommend that you execute the cells for your `score.py` scripts twice.

1. With a `#` sign at the beginning of the first line. This way you can detect typos and syntax errors during execution.
2. If the script runs OK, you can remove the `#` sign, to write the script to a file instead of executing it.

In [None]:
# #%%writefile score.py

# import json # we use json in order to interact with the anomaly detection service via a RESTful API

# # The init function is only run once, when the webservice (or Docker container) is started
# def init():
#     global running_avg, curr_n
    
#     running_avg = 0.0
#     curr_n = 0
    
#     pass

# # the run function is run every time we interact with the service
# def run(raw_data):
#     """
#     Calculates rolling average according to Welford's online algorithm.
#     https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Online

#     :param raw_data: raw_data should be a json query containing a dictionary with the key 'value'
#     :return: running_avg (float, json response)
#     """
#     global running_avg, curr_n
    
#     value = json.loads(raw_data)['value']
#     n_arg = 5 # we calculate the average over the last "n" measures
    
#     curr_n += 1
#     n = min(curr_n, n_arg) # in case we don't have "n" measures yet
    
#     running_avg += (value - running_avg) / n
    
#     return json.dumps(running_avg)
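
Because the template script keeps state between calls, it can help to exercise it in plain Python before building an image. The following is a minimal sketch that repeats the `init` and `run` functions from the cell above and feeds them three illustrative values (the values are made up for this example).

```python
import json


def init():
    """Reset the state that persists between calls to run()."""
    global running_avg, curr_n
    running_avg = 0.0
    curr_n = 0


def run(raw_data):
    """Update the running average with one new value and return it as JSON."""
    global running_avg, curr_n
    value = json.loads(raw_data)["value"]
    n_arg = 5  # same window size as in the template script
    curr_n += 1
    n = min(curr_n, n_arg)  # fewer than n_arg values seen so far
    running_avg += (value - running_avg) / n
    return json.dumps(running_avg)


init()  # the service calls this once at start-up

# Each call to run() sees the state left behind by the previous call.
responses = [json.loads(run(json.dumps({"value": v}))) for v in (2.0, 4.0, 6.0)]
print(responses)  # [2.0, 3.0, 4.0]
```

For the first five calls the result is the exact mean of all values seen so far. After that, each new value enters with a fixed weight of 1/5, so the script behaves like exponential smoothing rather than a strict five-sample window.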

### Create environment file for your Conda environment

Next, create an environment file (`environment.yml`) that specifies all the Python dependencies of your script. This file is used to ensure that all of those dependencies are installed in the Docker image.  In this example, we assume your Webservice requires ``scikit-learn``, ``pynacl``, ``pyculiarity``, ``pandas``, and ``numpy``.

In [None]:
from azureml.core.conda_dependencies import CondaDependencies 

myenv = CondaDependencies()
myenv.add_conda_package("scikit-learn")
myenv.add_pip_package("pynacl==1.2.1")
myenv.add_pip_package("pyculiarity")
myenv.add_pip_package("pandas")
myenv.add_pip_package("numpy")

with open("environment.yml","w") as f:
    f.write(myenv.serialize_to_string())

Review the content of the `environment.yml` file.

In [None]:
with open("environment.yml","r") as f:
    print(f.read())

### Create the initial Docker image

The next step is to create a docker image.  

We start by downloading a scoring script that we prepared for this course.  You can skip the details of this script if you are in a hurry.  Briefly, it contains our solution to online anomaly detection from the previous lab.

In [None]:
# download the solution script and save it locally as score.py
filename = 'AD_score.py'
urllib.request.urlretrieve(
    # build the URL by string concatenation; os.path.join is meant for file paths
    'https://raw.githubusercontent.com/Azure/LearnAI-ADPM/master/solutions/' + filename,
    filename='score.py')

with open('score.py') as f:
    print(f.read())

In [None]:
%%time

from azureml.core.image import ContainerImage

# configure the image
image_config = ContainerImage.image_configuration(execution_script = "score.py", 
                                                  runtime = "python",
                                                  conda_file = "environment.yml")

# create the docker image. this should take less than 5 minutes
image = ContainerImage.create(name = "my-docker-image",
                              image_config = image_config,
                              models = [],
                              workspace = ws)

# we wait until the image has been created
image.wait_for_creation(show_output=True)

# let's save the image location
imageLocation = image.serialize()['imageLocation']

## Test your Application by running the Docker container locally

### Download the created Docker image from the Azure Container Registry ([ACR](https://azure.microsoft.com/en-us/services/container-registry/))

Here we use some [cell magic](https://ipython.readthedocs.io/en/stable/interactive/magics.html) to exchange variables between python and bash.

In [None]:
# %%bash -s "$imageLocation" 

# # get the location of the docker image in ACR
# imageLocation=$1

# # extract the address of the repository within ACR
# repository=$(echo $imageLocation | cut -f 1 -d ".")

# echo "Attempting to login to repository $repository"
# az acr login --name $repository
# echo

# echo "Trying to pull image $imageLocation"
# docker pull $imageLocation

### Start the docker container

We use standard Docker commands to start the container locally.

In [None]:
# %%bash -s "$imageLocation"

# # extract image name and tag from imageLocation
# image_name=$(echo $1 | cut -f 1 -d ":")
# tag=$(echo $1 | cut -f 2 -d ":")
# echo "Image name: $image_name, tag: $tag"

# # extract image ID from list of downloaded docker images
# image_id=$(docker images $image_name:$tag --format "{{.ID}}")
# echo "Image ID: $image_id"

# # we forward TCP port 5001 of the docker container to local port 8888 for testing
# echo "Starting docker container"
# docker run -d -p 8888:5001 $image_id
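
The `cut` commands in the two bash cells above split the image location into a registry name, an image name, and a tag. As a rough Python equivalent, the sketch below does the same with string methods. It assumes the location has the form `<registry>.azurecr.io/<name>:<tag>`; the helper `parse_image_location` and the example value are hypothetical and not part of the Azure ML SDK.

```python
def parse_image_location(image_location):
    """Split '<registry>.azurecr.io/<name>:<tag>' into its parts.

    Assumes the location contains both a '/' and a ':<tag>' suffix.
    """
    registry_host, _, remainder = image_location.partition("/")
    name, _, tag = remainder.rpartition(":")
    return {
        # what `cut -f 1 -d "."` extracts for `az acr login --name`
        "registry": registry_host.split(".")[0],
        # what `cut -f 1 -d ":"` extracts as $image_name
        "image_name": f"{registry_host}/{name}",
        # what `cut -f 2 -d ":"` extracts as $tag
        "tag": tag,
    }


# Hypothetical value; the real one comes from image.serialize()['imageLocation'].
example_location = "myregistry.azurecr.io/my-docker-image:1"
parts = parse_image_location(example_location)
print(parts)
```

Splitting the tag from the right with `rpartition` keeps the result intact even if the host part of the location were to contain a colon.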


### Test the docker container

We test the docker container, by sending some data to it to see how it responds - just as we would with a Webservice.

> If you get an error message below, you may just have to wait a couple of seconds.

In [None]:
# import json
# import requests
# import numpy as np
# import matplotlib.pyplot as plt

# values = np.random.normal(0,1,100)
# values = np.cumsum(values)


# running_avgs = []

# for value in values:
#     raw_data = {"value": value}

#     r = requests.post('http://localhost:8888/score', json=raw_data)

#     result = json.loads(r.json())
#     running_avgs.append(result)

# plt.close()
# plt.plot(values)
# plt.plot(running_avgs)
# display()
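
Instead of rerunning the cell by hand while the container starts up, you could wrap the request in a small retry loop. The sketch below is one possible way to do this; `call_with_retry` and `flaky_request` are hypothetical names, and `flaky_request` merely simulates a container that is not ready for the first two calls.

```python
import time


def call_with_retry(func, attempts=5, delay_seconds=2.0, exceptions=(Exception,)):
    """Call func() until it succeeds or the attempts are used up."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == attempts:
                raise  # give up and surface the last error
            time.sleep(delay_seconds)


# Stand-in for a request to the container that fails twice during start-up.
calls = {"count": 0}


def flaky_request():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("container not ready yet")
    return "ok"


result = call_with_retry(flaky_request, attempts=5, delay_seconds=0.01)
print(result, calls["count"])  # ok 3
```

In the test cell above, `func` could be a lambda around `requests.post('http://localhost:8888/score', json=raw_data)`, with `exceptions=(requests.exceptions.ConnectionError,)` so that only connection failures are retried.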

## Modifying the container

Let's make a change to the execution script: copy the changed ``AD_score.py`` into the running Docker container and commit the changes to the container image.

In [None]:
# %%bash -s $imageLocation

# image_location=$1

# # extract image name and tag from imageLocation
# image_name=$(echo $image_location | cut -f 1 -d ":")
# tag=$(echo $image_location | cut -f 2 -d ":")

# echo "Image name: $image_name, tag: $tag"

# # extract image id
# image_id=$(docker images $image_name:$tag --format "{{.ID}}")

# echo "Image ID: $image_id"

# # extract container ID
# container_id=$(docker ps | tail -n1 | cut -f 1 -d " ")
# echo "Container ID: $container_id"

# # copy modified scoring script again
# docker cp AD_score.py $container_id:/var/azureml-app/score.py
# sleep 1

# # restart the container so that it loads the updated script
# docker restart $container_id

## Test the container

### Load telemetry data

In [None]:
base_path = 'https://sethmottstore.blob.core.windows.net'
data_dir = os.path.join(base_path, 'predmaint')

print("Reading data ... ", end="")
telemetry = pd.read_csv(os.path.join(data_dir, 'telemetry.csv'))
print("Done.")

print("Parsing datetime...", end="")
telemetry['datetime'] = pd.to_datetime(telemetry['datetime'], format="%m/%d/%Y %I:%M:%S %p")
telemetry.columns = ['timestamp', 'machineID', 'volt', 'rotate', 'pressure', 'vibration']
print("Done.")
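
The format string passed to `pd.to_datetime` describes a 12-hour clock with an AM/PM marker. The standard library uses the same format directives, so a quick check of how a single timestamp is interpreted could look like the sketch below. The sample strings are made up for illustration and are not taken from `telemetry.csv`.

```python
from datetime import datetime

# Same format string as in the cell above: month/day/year, 12-hour clock, AM/PM.
TELEMETRY_FORMAT = "%m/%d/%Y %I:%M:%S %p"

sample = "1/1/2015 6:00:00 PM"  # hypothetical timestamp
parsed = datetime.strptime(sample, TELEMETRY_FORMAT)
print(parsed.isoformat())  # 2015-01-01T18:00:00

# With %I and %p, 12 AM maps to hour 0.
midnight = datetime.strptime("1/1/2015 12:30:00 AM", TELEMETRY_FORMAT)
print(midnight.hour)  # 0
```

Passing an explicit format, as the notebook does, also tends to make parsing faster than letting pandas infer the format for every row.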

In [None]:
# import numpy as np
# import json
# import requests

# def test_docker(telemetry, n=None):
#     """
#         n is the number of sensor readings we are simulating
#         """

#     if not n:
#         n = telemetry.shape[0]

#     machine_ids = [1] # telemetry['machineID'].unique()
#     timestamps = telemetry['timestamp'].unique()

#     out_df = pd.DataFrame()
#     for timestamp in timestamps[:10]:
#         np.random.shuffle(machine_ids)
#         for machine_id in machine_ids:
#             data = telemetry.loc[(telemetry['timestamp'] == timestamp) & (telemetry['machineID'] == machine_id), :]
#             json_data = data.to_json()
#             input_data = {"data": json_data}
           
#             r = requests.post('http://localhost:8888/score', json=input_data)

#             result = pd.read_json(json.loads(r.json()))
#             out_df = out_df.append(result, ignore_index=True)
#     return out_df

In [None]:
# print("Processing ... ")
# out_df = test_docker(telemetry)
# print("Done.")
# out_df

### Push the updated container to ACR

**First**, test your Docker container again (run the json query above), to ensure that the changes are having the expected effect. 

**Then** you can push the image into ACR, so that it can be retrieved by the Azure ML SDK when you want to deploy your Webservice.

In [None]:
# %%bash -s "$imageLocation"

# image_location=$1

# # extract container ID
# container_id=$(docker ps | tail -n1 | cut -f 1 -d " ")
# echo "Container ID: $container_id"

# # commit changes made in the container to the local copy of the image
# docker commit $container_id $image_location

# docker push $image_location

Let's try to deploy the container to ACI, just to make sure everything behaves as expected.

In [None]:
%%time
from azureml.core.webservice import Webservice
from azureml.core.image import ContainerImage
from azureml.core.webservice import AciWebservice

# create configuration for ACI
aciconfig = AciWebservice.deploy_configuration(cpu_cores=1, 
                                               memory_gb=1, 
                                               tags={"data": "some data",  "method" : "machine learning"}, 
                                               description="Does machine learning on some data")
# pull the image
image = ContainerImage(ws, name='my-docker-image')

# deploy webservice
service_name = 'my-web-service'
service = Webservice.deploy_from_image(deployment_config = aciconfig,
                                            image = image,
                                            name = service_name,
                                            workspace = ws)
service.wait_for_deployment(show_output = True)
print(service.state)


In [None]:
import numpy as np
import json
import requests

def test_webservice(telemetry, n=None):
    """
        n is the number of sensor readings we are simulating
        """

    if not n:
        n = telemetry.shape[0]

    machine_ids = [1] # telemetry['machineID'].unique()
    timestamps = telemetry['timestamp'].unique()

    out_df = pd.DataFrame()
    for timestamp in timestamps[:n]:
        np.random.shuffle(machine_ids)
        for machine_id in machine_ids:
            data = telemetry.loc[(telemetry['timestamp'] == timestamp) & (telemetry['machineID'] == machine_id), :]
            json_data = data.to_json()
            input_data = bytes(json.dumps({"data": json_data}), encoding = 'utf8')
    
            result = pd.read_json(json.loads(service.run(input_data=input_data)))

            out_df = out_df.append(result, ignore_index=True)
    return out_df
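
Note that `test_webservice` encodes the data twice: `data.to_json()` produces a JSON string, which is then placed inside a dictionary that is serialized to JSON again. The sketch below illustrates this round trip with the standard library only. The record in `row` is hypothetical, and the decoding steps are an assumption about what the scoring script has to do, since its code is not shown in this notebook.

```python
import json

# Hypothetical record, shaped like the column-oriented output of DataFrame.to_json().
row = {"machineID": {"0": 1}, "volt": {"0": 176.2}}
json_data = json.dumps(row)  # stands in for data.to_json()

# Client side: wrap the JSON string in a dictionary and encode it as bytes,
# as done in test_webservice.
input_data = bytes(json.dumps({"data": json_data}), encoding="utf8")

# Service side (assumed): decode the outer JSON first, then the inner JSON string.
request = json.loads(input_data.decode("utf8"))
decoded_row = json.loads(request["data"])

print(isinstance(request["data"], str))  # True: the inner payload is still a string
print(decoded_row == row)                # True
```

The same pattern applies in the other direction, which is why the response is passed through `json.loads` before `pd.read_json`.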

In [None]:
print("Processing ... ")
out_df = test_webservice(telemetry, n=10)
print("Done.")
out_df

In [None]:
service.serialize()

## Clean up resources

To keep the resource group and workspace for other tutorials and exploration, you can delete only the ACI deployment using this API call:

In [None]:
service.delete()

# The end
