# Azure Machine Learning Service Framework

## Set up Development Environment

#### Initialize Workspace

* Import base Azure ML packages
* Check the SDK version
* Connect to the workspace

In [1]:
# base packages to work with AMLS
import azureml.core
from azureml.core import Workspace

# check core SDK version number
print("Azure ML SDK Version: ", azureml.core.VERSION)

# load workspace configuration from the config.json file in the current folder.
ws = Workspace.from_config()
print(ws.name, ws.location, ws.resource_group, ws.location, sep='\t')

Azure ML SDK Version:  1.0.65
azureml-demo	eastus2	jcantrell-rg1	eastus2
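
`Workspace.from_config()` reads the workspace coordinates from a `config.json` file. As a minimal sketch, the snippet below writes such a file with the three keys the SDK looks for (`subscription_id`, `resource_group`, `workspace_name`). The subscription ID is a placeholder, and writing to a temporary directory is an assumption made here so the example cannot overwrite a real file.

```python
import json
import os
import tempfile

# Placeholder values; substitute the details of your own workspace.
workspace_config = {
    "subscription_id": "<your-subscription-id>",
    "resource_group": "jcantrell-rg1",
    "workspace_name": "azureml-demo",
}

# Written to a temporary folder so the sketch does not touch a real config.json.
config_dir = tempfile.mkdtemp()
config_path = os.path.join(config_dir, "config.json")
with open(config_path, "w") as f:
    json.dump(workspace_config, f, indent=4)

# Read the file back to confirm it is valid JSON with the expected keys.
with open(config_path) as f:
    loaded = json.load(f)
print(sorted(loaded))
```

Saving the same content as `config.json` next to the notebook should let `Workspace.from_config()` pick it up.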


#### Create Experiment

In [2]:
# Name your experiment here.
experiment_name = 'framework'

from azureml.core import Experiment
exp = Experiment(workspace=ws, name=experiment_name)

#### Create a directory for the Training script and any custom Python code.

In [3]:
# Directory that holds the training script and any custom code.
import os
script_folder = os.path.join(os.getcwd(), "AzureMLFramework")
os.makedirs(script_folder, exist_ok=True)

#Upload Data
ds = ws.get_default_datastore()
print(ds.datastore_type, ds.account_name, ds.container_name)
#ds.upload(src_dir=data_folder, target_path='mnist', overwrite=True, show_progress=True)

AzureBlob azuremldemo7980688565 azureml-blobstore-12aea559-641a-4a9b-919d-04ce0cc6ddd6


#### Create or Attach an existing compute resource

I've added two sets of code to create the compute cluster. The first cell is a simple version that uses defaults to create the cluster. The second cell is an example of a more configurable version.

In [4]:
# Compute cluster creation.
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

# Choose a name for your CPU cluster
cpu_cluster_name = "AzureMLFramework"

# Verify that cluster does not exist already
try:
    cpu_cluster = ComputeTarget(workspace=ws, name=cpu_cluster_name)
    print('Found existing cluster, use it.')
except ComputeTargetException:
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',
                                                           max_nodes=4)
    cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, compute_config)

cpu_cluster.wait_for_completion(show_output=True)

Found existing cluster, use it.
Succeeded
AmlCompute wait for completion finished
Minimum number of nodes requested have been provisioned
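
A more configurable variant of the cluster creation above might look like the following sketch. The helper names `cluster_settings` and `create_cluster` are hypothetical, and the default values are illustrative. The keyword names (`min_nodes`, `idle_seconds_before_scaledown`, `vm_priority`) are the ones `AmlCompute.provisioning_configuration` accepts as far as I know, so check them against your SDK version.

```python
def cluster_settings(vm_size="STANDARD_D2_V2", min_nodes=0, max_nodes=4,
                     idle_seconds=1200, low_priority=False):
    """Return keyword arguments for AmlCompute.provisioning_configuration."""
    if min_nodes < 0 or max_nodes < max(min_nodes, 1):
        raise ValueError("need 0 <= min_nodes <= max_nodes and max_nodes >= 1")
    return {
        "vm_size": vm_size,
        "min_nodes": min_nodes,
        "max_nodes": max_nodes,
        "idle_seconds_before_scaledown": idle_seconds,
        "vm_priority": "lowpriority" if low_priority else "dedicated",
    }


def create_cluster(ws, name, **overrides):
    """Provision a cluster. The SDK is imported lazily so cluster_settings runs without it."""
    from azureml.core.compute import AmlCompute, ComputeTarget

    config = AmlCompute.provisioning_configuration(**cluster_settings(**overrides))
    cluster = ComputeTarget.create(ws, name, config)
    cluster.wait_for_completion(show_output=True)
    return cluster


# Inspect the settings before sending anything to Azure.
settings = cluster_settings(max_nodes=2, low_priority=True)
print(settings)
```

Calling `create_cluster(ws, cpu_cluster_name, max_nodes=2)` would then provision the cluster with these settings. Setting `min_nodes=0` lets the cluster scale down to zero nodes when idle.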


#### Environment Setup

This section sets up the environment used by the training run and the deployed service. It MUST be included and filled out. Dependencies are declared in two groups, Conda dependencies and pip dependencies, so list each package under the installer it actually comes from.

In [5]:
from azureml.core import Environment
from azureml.core.conda_dependencies import CondaDependencies

fwrk = Environment("fwrk")

fwrk.docker.enabled = True
fwrk.python.conda_dependencies = CondaDependencies.create(conda_packages=['scikit-learn',
                                                                          'pandas',
                                                                          'numpy',
                                                                          'seaborn',
                                                                          'category_encoders',
                                                                          'lightgbm',
                                                                          'papermill'
                                                                         ])
fwrk.python.conda_dependencies.add_pip_package("inference-schema[numpy-support]")
fwrk.python.conda_dependencies.save_to_file(".", script_folder + "/fwrk.yml")

'/mnt/azmnt/code/Users/jcantrell/AzureMLFramework/fwrk.yml'
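
To illustrate the split between Conda and pip packages, the sketch below builds the general shape of a Conda environment specification as a plain dictionary: Conda packages sit directly in `dependencies`, while pip packages are nested under a `pip` key. The function name `build_conda_spec` is hypothetical, and the file that `CondaDependencies` actually saves may contain additional entries (for example a Python version or SDK packages) that are not shown here.

```python
import json


def build_conda_spec(name, conda_packages, pip_packages):
    """Return a dict shaped like a Conda environment file."""
    dependencies = list(conda_packages)
    if pip_packages:
        # pip itself is a Conda package; its packages go in a nested mapping.
        dependencies.append("pip")
        dependencies.append({"pip": list(pip_packages)})
    return {"name": name, "dependencies": dependencies}


spec = build_conda_spec(
    "fwrk",
    conda_packages=["scikit-learn", "pandas", "numpy"],
    pip_packages=["inference-schema[numpy-support]"],
)
print(json.dumps(spec, indent=2))
```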

## Train the Predictive Model

#### Create Training Script

This section creates a training script to be used by the Experiment to build the Machine Learning Model. The output is a pickle file that is used to create a web service for making predictions using the ML model.

In [6]:
%%writefile $script_folder/train.py

import argparse
import os
import numpy as np

from sklearn.svm import SVC
from sklearn.externals import joblib
import pickle

from azureml.core import Run

# let the user pass two parameters: the location of the data files (from the datastore) and a regularization rate
# (both are parsed here but not yet used by the SVC model below)
parser = argparse.ArgumentParser()
parser.add_argument('--data-folder', type=str, dest='data_folder', help='data folder mounting point')
parser.add_argument('--regularization', type=float, dest='reg', default=0.01, help='regularization rate')
args = parser.parse_args()

# height, weight, shoe size
X = [[181, 80, 44], [177, 70, 43], [160, 60, 38], [154, 54, 37], [166, 65, 40], [190, 90, 47], [175, 64, 39],
     [177, 70, 40], [159, 55, 37], [171, 75, 42], [181, 85, 43]]

Y = ['male', 'male', 'female', 'female', 'male', 'male', 'female', 'female', 'female', 'male', 'male']

clf = SVC()
clf = clf.fit(X, Y)

print('Predicted value:', clf.predict([[190, 70, 43]]))
print('Accuracy', clf.score(X,Y))

print('Export the model to fwrk.pkl')
with open('fwrk.pkl', 'wb') as f:
    pickle.dump(clf, f)

print('Import the model from fwrk.pkl')
with open('fwrk.pkl', 'rb') as f2:
    clf2 = pickle.load(f2)

X_new = [[154, 54, 35]]
print('New Sample:', X_new)
print('Predicted class:', clf2.predict(X_new))

os.makedirs('outputs', exist_ok=True)
# note: files saved in the outputs folder are automatically uploaded to the experiment record
joblib.dump(value=clf, filename='outputs/fwrk.pkl')

Overwriting /mnt/azmnt/code/Users/jcantrell/AzureMLFramework/train.py
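
The training script parses `--regularization` but never passes it to the classifier. One possible way to wire it in is sketched below, using only the standard library so it runs on its own. The mapping `C = 1 / reg` is an assumption chosen for illustration (in scikit-learn's `SVC`, the regularization strength is inversely proportional to `C`), and `svc_params` is a hypothetical helper name.

```python
import argparse


def parse_args(argv=None):
    """Parse the same two options that train.py defines."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--data-folder', type=str, dest='data_folder',
                        help='data folder mounting point')
    parser.add_argument('--regularization', type=float, dest='reg', default=0.01,
                        help='regularization rate')
    return parser.parse_args(argv)


def svc_params(reg):
    """Translate a regularization rate into SVC keyword arguments (assumed C = 1 / reg)."""
    if reg <= 0:
        raise ValueError("regularization rate must be positive")
    return {"C": 1.0 / reg}


# Simulate the command line: python train.py --regularization 0.5
args = parse_args(['--regularization', '0.5'])
params = svc_params(args.reg)
print(params)
```

In `train.py`, the classifier could then be created as `SVC(**svc_params(args.reg))`.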


#### Submit the Training Job to the Compute Cluster

Run the experiment by submitting a `ScriptRunConfig` object. You can then navigate to the Azure portal to monitor the run.

In [7]:
from azureml.core import ScriptRunConfig
from azureml.core.runconfig import DEFAULT_CPU_IMAGE

src = ScriptRunConfig(source_directory=script_folder, script='train.py')

# Set compute target to the one created in previous step
src.run_config.target = cpu_cluster.name

# Set environment
src.run_config.environment = fwrk
 
run = exp.submit(config=src)
run

Experiment,Id,Type,Status,Details Page,Docs Page
framework,framework_1570576914_938b648c,azureml.scriptrun,Starting,Link to Azure Portal,Link to Documentation


Model training happens in the background. You can use `wait_for_completion` to block and wait until the model has completed training before running more code.

In [8]:
%%time
# set show_output to True for a verbose log
run.wait_for_completion(show_output=True) 

RunId: framework_1570576914_938b648c
Web View: https://mlworkspace.azure.ai/portal/subscriptions/09f45657-2113-4b94-8f6f-cca0449e7fc3/resourceGroups/jcantrell-rg1/providers/Microsoft.MachineLearningServices/workspaces/azureml-demo/experiments/framework/runs/framework_1570576914_938b648c

Streaming azureml-logs/55_azureml-execution-tvmps_40c352e2384c559aac6cafd69b46c12ee22246ff0d14e4ea61f578b2b0de87e2_d.txt

2019-10-08T23:24:49Z Successfully mounted a/an Azure File Shares at /mnt/batch/tasks/shared/LS_root/jobs/azureml-demo/azureml/framework_1570576914_938b648c/mounts/workspacefilestore
2019-10-08T23:24:50Z Mounted //azuremldemo7980688565.file.core.windows.net/azureml-filestore-12aea559-641a-4a9b-919d-04ce0cc6ddd6 at /mnt/batch/tasks/shared/LS_root/jobs/azureml-demo/azureml/framework_1570576914_938b648c/mounts/workspacefilestore
2019-10-08T23:24:50Z No blob file systems configured
2019-10-08T23:24:50Z No unmanaged file systems configured
2019-10-08T23:24:50Z Starting output-watcher...

{'endTimeUtc': '2019-10-08T23:27:46.084669Z',
 'inputDatasets': [],
 'logFiles': {'azureml-logs/55_azureml-execution-tvmps_40c352e2384c559aac6cafd69b46c12ee22246ff0d14e4ea61f578b2b0de87e2_d.txt': 'https://azuremldemo7980688565.blob.core.windows.net/azureml/ExperimentRun/dcid.framework_1570576914_938b648c/azureml-logs/55_azureml-execution-tvmps_40c352e2384c559aac6cafd69b46c12ee22246ff0d14e4ea61f578b2b0de87e2_d.txt?sv=2018-11-09&sr=b&sig=4gQY7xvSQHQJI2pwprDfwwuDLxCoeCZfK6cIGBbGtqs%3D&st=2019-10-08T23%3A17%3A47Z&se=2019-10-09T07%3A27%3A47Z&sp=r',
  'azureml-logs/65_job_prep-tvmps_40c352e2384c559aac6cafd69b46c12ee22246ff0d14e4ea61f578b2b0de87e2_d.txt': 'https://azuremldemo7980688565.blob.core.windows.net/azureml/ExperimentRun/dcid.framework_1570576914_938b648c/azureml-logs/65_job_prep-tvmps_40c352e2384c559aac6cafd69b46c12ee22246ff0d14e4ea61f578b2b0de87e2_d.txt?sv=2018-11-09&sr=b&sig=35klDQzvm8FMMuXSIFXP91CgsQSyaWVpgjqJyodYNps%3D&st=2019-10-08T23%3A17%3A47Z&se=2019-10-09T07%3A27%3A47Z&sp=r'
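
The dictionary returned by `wait_for_completion` maps each log name under `logFiles` to a URL that carries a signed query string. As a rough sketch, the helper below lists the log names with the query string removed, which is convenient when sharing output. The names `strip_query` and `summarize_logs` are hypothetical, and `sample_details` is made-up data that only mimics the shape of the output above.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_query(url):
    """Drop the query string and fragment from a URL."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def summarize_logs(details):
    """Map each log file name to its URL without the signed query string."""
    return {name: strip_query(url) for name, url in details.get("logFiles", {}).items()}


# Made-up sample with the same shape as the run details shown above.
sample_details = {
    "endTimeUtc": "2019-10-08T23:27:46.084669Z",
    "logFiles": {
        "azureml-logs/example.txt":
            "https://example.blob.core.windows.net/azureml/example.txt?sv=2018-11-09&sig=REDACTED&sp=r",
    },
}

for name, url in summarize_logs(sample_details).items():
    print(name, url)
```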

#### Register the Model

Register the model in the workspace so that you (or other collaborators) can later query, examine, and deploy this model.

In [9]:
# register model 
model = run.register_model(model_name='fwrk', model_path='outputs/fwrk.pkl')
print(model.name, model.id, model.version, sep='\t')

from azureml.core import Workspace
from azureml.core.model import Model
import os 
ws = Workspace.from_config()
model=Model(ws, 'fwrk')

model.download(target_dir=os.getcwd(), exist_ok=True)

# verify the downloaded model file
file_path = os.path.join(os.getcwd(), "fwrk.pkl")

os.stat(file_path)

fwrk	fwrk:6	6


os.stat_result(st_mode=33279, st_ino=14123310421666430976, st_dev=45, st_nlink=1, st_uid=0, st_gid=0, st_size=1889, st_atime=1570533517, st_mtime=1570577268, st_ctime=1570577268)

## Deploy Model as an AzureML Service

#### Create the Scoring Script

Create the scoring script, score.py. The web service calls this script to load the model and make predictions.

You must include two required functions in the scoring script:
* The `init()` function, which typically loads the model into a global object. This function runs only once, when the Docker container starts.

* The `run(input_data)` function, which uses the model to predict a value based on the input data. Inputs and outputs of `run` typically use JSON for serialization and deserialization, but other formats are supported.
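
The contract between `init()` and `run()` can be tried locally with a stand-in model. The sketch below is not the scoring script used in this notebook: `EchoModel` is a hypothetical placeholder for the real classifier, and `run` parses raw JSON itself instead of relying on the schema decorators.

```python
import json

model = None  # populated once by init()


class EchoModel:
    """Hypothetical stand-in that labels each row by its number of features."""

    def predict(self, rows):
        return ["row-with-%d-features" % len(row) for row in rows]


def init():
    # Runs once at container start; a real script would load the registered model here.
    global model
    model = EchoModel()


def run(raw_data):
    # Runs on every request; the input arrives as a JSON string.
    try:
        rows = json.loads(raw_data)["data"]
        return model.predict(rows)
    except Exception as exc:
        # Return the error text so the caller can see what went wrong.
        return str(exc)


init()
print(run(json.dumps({"data": [[190, 70, 43]]})))
```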

In [10]:
%%writefile score.py
import json
import numpy as np
import os
import pickle
import pandas as pd
from sklearn.externals import joblib
from sklearn.svm import SVC

from inference_schema.schema_decorators import input_schema, output_schema
from inference_schema.parameter_types.numpy_parameter_type import NumpyParameterType
from inference_schema.parameter_types.pandas_parameter_type import PandasParameterType

from azureml.core.model import Model

def init():
    global model
    #model = joblib.load('recommender.pkl')
    model_path = Model.get_model_path('fwrk')
    model = joblib.load(model_path)

input_sample = pd.DataFrame(data=[{
              "input_name_1": 5,    # Sample value. Use the data type that matches this column in your data
              "input_name_2": 5,    # Sample value. Use the data type that matches this column in your data
              "input_name_3": 3     # Sample value. Use the data type that matches this column in your data
            }])

output_sample = np.array([0])       # Sample result. Use the data type that matches the expected prediction

@input_schema('data', PandasParameterType(input_sample))
@output_schema(NumpyParameterType(output_sample))
def run(data):
    try:
        result = model.predict(data)
        # you can return any datatype as long as it is JSON-serializable
        return result.tolist()
    except Exception as e:
        error = str(e)
        return error

Overwriting score.py


#### Deploy in ACI

Configure the image and deploy. The following code goes through these steps:

1. Build an image using:
   * The scoring file
   * The environment file
   * The model file
1. Register that image under the workspace. 
1. Send the image to the ACI container.
1. Start up a container in ACI using the image.
1. Get the web service HTTP endpoint.

In [11]:
from azureml.core.webservice import AciWebservice

aciconfig = AciWebservice.deploy_configuration(cpu_cores=1, 
                                               memory_gb=1, 
                                               tags={"data": "SVM",  "method" : "sklearn"}, 
                                               description='Predict gender with sklearn SVM')

In [12]:
%%time
from azureml.core.webservice import Webservice
from azureml.core.image import ContainerImage
from azureml.exceptions import WebserviceException

# configure the image
image_config = ContainerImage.image_configuration(execution_script="score.py", 
                                                  runtime="python", 
                                                  conda_file="fwrk.yml")

service_name = 'fwrk'

# delete service if it exists
try:
    service = Webservice(ws, name=service_name)
    if service:
        service.delete()
except WebserviceException as e:
    # typically raised when no service with this name exists yet, so there is nothing to delete
    print(e)
    
service = Webservice.deploy_from_model(workspace=ws, 
                                       name=service_name, 
                                       deployment_config=aciconfig, 
                                       models=[model], 
                                       image_config=image_config)

service.wait_for_deployment(show_output=True)

Creating image
Running..........................................................
Succeeded
Image creation operation finished for image fwrk:4, operation "Succeeded"
Running.................
SucceededACI service creation operation finished, operation "Succeeded"
CPU times: user 680 ms, sys: 59.2 ms, total: 740 ms
Wall time: 6min 33s


In [13]:
print(service.scoring_uri)

http://b657addc-ef8a-4710-b48f-87c64d8e2389.eastus2.azurecontainer.io/score


#### Test deployed service

Earlier you scored all the test data with the local version of the model. Now, you can test the deployed model.  

The following code goes through these steps:
1. Send the data as a JSON array to the web service hosted in ACI. 

1. Post the request to the scoring URI with `requests`. You can also invoke the service with the SDK's `run` API or with any HTTP tool such as curl.

1. Print the returned predictions.

In [14]:
import requests
import json

headers = {'Content-Type':'application/json'}

if service.auth_enabled:
    headers['Authorization'] = 'Bearer '+service.get_keys()[0]

print(headers)
    
test_sample = json.dumps({'data': [[190, 70, 43]]})

response = requests.post(service.scoring_uri, data=test_sample, headers=headers)
print(response.status_code)
print(response.elapsed)
print(response.json())

{'Content-Type': 'application/json'}
200
0:00:00.160056
['male']
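
The same request can be assembled with only the standard library, which may help when `requests` is not available. This is a sketch under assumptions: `build_scoring_request` is a hypothetical helper, and the URL and key below are placeholders rather than values from the deployed service. The request is built but not sent, so the snippet runs without a live endpoint.

```python
import json
import urllib.request


def build_scoring_request(scoring_uri, rows, key=None):
    """Build a POST request carrying the rows as a JSON body."""
    headers = {"Content-Type": "application/json"}
    if key is not None:
        # Only needed when authentication is enabled on the service.
        headers["Authorization"] = "Bearer " + key
    body = json.dumps({"data": rows}).encode("utf-8")
    return urllib.request.Request(scoring_uri, data=body, headers=headers, method="POST")


# Placeholder URL and key for illustration only.
request = build_scoring_request("http://localhost:8080/score", [[190, 70, 43]],
                                key="hypothetical-key")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing `request` to `urllib.request.urlopen` would send it to the service.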
