# Operationalizing Machine Learning
**Project 2**
[[View Rubric](https://review.udacity.com/#!/rubrics/2893/view)]


This notebook consists of the following chapters:

0. Python Initialization
1. Authentication
2. Automated ML Experiment
3. Deployment
4. Enable logging
5. Swagger Documentation
6. Consume the model endpoints
7. Create, Publish and consume a pipeline
8. Documentation
9. Optional: Benchmarking
10. Optional: Cleanup


## 0. Python Initialization




In [None]:
# Not needed when running the notebook on Azure:
# !pip install --upgrade -q -r requirements.txt
# !python --version
# May be needed on some Azure machines:
!pip install pyopenssl

In [1]:
import logging
import os
import csv
import json
import threading

import pickle
import joblib
import pkg_resources
import requests

import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

import sklearn
from sklearn import datasets
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder

import azureml.core
from azureml.core import Model
# from azureml.core.model import Model
from azureml.core.authentication import InteractiveLoginAuthentication
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException
from azureml.core.dataset import Dataset
from azureml.data.dataset_factory import TabularDatasetFactory
from azureml.core.experiment import Experiment
from azureml.core.resource_configuration import ResourceConfiguration
from azureml.core.run import Run
from azureml.core.webservice import LocalWebservice, Webservice
from azureml.core.workspace import Workspace
from azureml.pipeline.core import Pipeline, PipelineData, TrainingOutput
from azureml.pipeline.core.run import PipelineRun
from azureml.pipeline.steps import AutoMLStep
from azureml.train.automl import AutoMLConfig
from azureml.widgets import RunDetails


# Check core SDK version number
print("Azure SDK version:", azureml.core.VERSION)

Azure SDK version: 1.19.0


## 1. Authentication
### Local CLI configuration
I skipped granting local shell rights because I am using the Azure environment provided by Udacity.
### Azure Python SDK initialization

In [3]:
ws = Workspace.from_config()
# ws = Workspace.get(name="quick-starts-ws-128192") # UPDATE THIS LINE WITH EACH NEW VM INSTANCE!

print('Workspace name: ' + ws.name, 
      'Azure region: ' + ws.location, 
      'Subscription id: ' + ws.subscription_id, 
      'Resource group: ' + ws.resource_group, sep = '\n')

# DON'T FORGET TO CLICK THE LOGIN LINK!

Performing interactive authentication. Please follow the instructions on the terminal.
To sign in, use a web browser to open the page https://microsoft.com/devicelogin and enter the code A3G6N96VY to authenticate.
You have logged in. Now let us find all the subscriptions to which you have access...
Interactive authentication successfully completed.
Workspace name: quick-starts-ws-132956
Azure region: southcentralus
Subscription id: d7f39349-a66b-446e-aba6-0053c2cf1c11
Resource group: aml-quickstarts-132956


In [3]:
### Create an Azure Experiment object. An Experiment is a container of trials that represent multiple model runs.
experiment_name = 'ml-experiment-1'
exp = Experiment(workspace=ws, name=experiment_name)

NameError: name 'Experiment' is not defined

In [124]:
# retrieve the `auth_header` which will later be used for authenticating at the API endpoint
auth_header = InteractiveLoginAuthentication().get_authentication_header()
auth_header

{'Authorization': 'Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiIsIng1dCI6IjVPZjlQNUY5Z0NDd0NtRjJCT0hIeEREUS1EayIsImtpZCI6IjVPZjlQNUY5Z0NDd0NtRjJCT0hIeEREUS1EayJ9.eyJhdWQiOiJodHRwczovL21hbmFnZW1lbnQuY29yZS53aW5kb3dzLm5ldC8iLCJpc3MiOiJodHRwczovL3N0cy53aW5kb3dzLm5ldC82NjBiMzM5OC1iODBlLTQ5ZDItYmM1Yi1hYzFkYzkzYjUyNTQvIiwiaWF0IjoxNjA5NjA2NjU0LCJuYmYiOjE2MDk2MDY2NTQsImV4cCI6MTYwOTYxMDU1NCwiYWNyIjoiMSIsImFpbyI6IkUySmdZRkRiVjJyVStYV040MHVWbVFyMkorVG1ISThLNnp2OXdmWnUyNTNJVDZFR2w2MEEiLCJhbXIiOlsicHdkIl0sImFwcGlkIjoiMDRiMDc3OTUtOGRkYi00NjFhLWJiZWUtMDJmOWUxYmY3YjQ2IiwiYXBwaWRhY3IiOiIwIiwiZmFtaWx5X25hbWUiOiIxMzI5NTYiLCJnaXZlbl9uYW1lIjoiT0RMX1VzZXIiLCJpcGFkZHIiOiIxMy42Ni44NC4yMjQiLCJuYW1lIjoiT0RMX1VzZXIgMTMyOTU2Iiwib2lkIjoiYTI2NGQwMDUtODM5NC00NTdmLTgzZTUtMDU3M2VjNzkwM2VhIiwicHVpZCI6IjEwMDMyMDAxMDg3NzZFMzEiLCJyaCI6IjAuQUFBQW1ETUxaZzY0MGttOFc2d2R5VHRTVkpWM3NBVGJqUnBHdS00Qy1lR19lMFpTQUVZLiIsInNjcCI6InVzZXJfaW1wZXJzb25hdGlvbiIsInN1YiI6Ik4yMnI3VjJpV3ZKdEZTWEtNMkxZX0ZDcVkzTGRCUjhNSTRVOWhUaG9DSjQiLCJ0aWQiOiI2N

## 2. Automated ML Experiment
In this step we will create an AutoML experiment to find the best model for classifying our dataset.
We will later do the same with a pipeline.

### Prepare Dataset

Before doing AutoML, we need to prepare the dataset. For this, we will be using the cleaning function from the first project.

In [4]:
# Try to load the dataset from the Workspace. Otherwise, create it from the file
found = False
ds_key = "Bank-marketing"

if ds_key in ws.datasets.keys(): 
        found = True
        ds = ws.datasets[ds_key] 

if not found:
        # Create AML Dataset and register it into Workspace
        # Create TabularDataset using TabularDatasetFactory
        # https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.data.dataset_factory.tabulardatasetfactory?view=azure-ml-py  
        # I download and import the _train.csv, so no further splitting is necessary
        dataset_path = 'https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/bankmarketing_train.csv'
        ds = TabularDatasetFactory.from_delimited_files(path=dataset_path)  
        #Register Dataset in Workspace
        ds = ds.register(workspace=ws,
                        name=ds_key,
                        description="Bank Marketing DataSet for Udacity Course 2")


In [5]:
# data cleaning like in project-01

def clean_data(data):
    # Dict for cleaning data
    months = {"jan":1, "feb":2, "mar":3, "apr":4, "may":5, "jun":6, "jul":7, "aug":8, "sep":9, "oct":10, "nov":11, "dec":12}
    weekdays = {"mon":1, "tue":2, "wed":3, "thu":4, "fri":5, "sat":6, "sun":7}

    # Clean and one hot encode data
    x_df = data.to_pandas_dataframe().dropna()
    jobs = pd.get_dummies(x_df.job, prefix="job")
    x_df.drop("job", inplace=True, axis=1)
    x_df = x_df.join(jobs)
    x_df["marital"] = x_df.marital.apply(lambda s: 1 if s == "married" else 0)
    x_df["default"] = x_df.default.apply(lambda s: 1 if s == "yes" else 0)
    x_df["housing"] = x_df.housing.apply(lambda s: 1 if s == "yes" else 0)
    x_df["loan"] = x_df.loan.apply(lambda s: 1 if s == "yes" else 0)
    contact = pd.get_dummies(x_df.contact, prefix="contact")
    x_df.drop("contact", inplace=True, axis=1)
    x_df = x_df.join(contact)
    education = pd.get_dummies(x_df.education, prefix="education")
    x_df.drop("education", inplace=True, axis=1)
    x_df = x_df.join(education)
    x_df["month"] = x_df.month.map(months)
    x_df["day_of_week"] = x_df.day_of_week.map(weekdays)
    x_df["poutcome"] = x_df.poutcome.apply(lambda s: 1 if s == "success" else 0)

    y_df = x_df.pop("y").apply(lambda s: 1 if s == "yes" else 0)

    return x_df, y_df

found_clean = False
if ds_key +"-clean" in ws.datasets.keys(): 
        found_clean = True
        ds_clean = ws.datasets[ds_key +"-clean"] 

if not found_clean:
    # Use the clean_data function to clean your data.
    x, y = clean_data(ds)
    df_clean = x.join(y)

    #Register cleaned Dataset in Workspace
    ds_clean = TabularDatasetFactory.register_pandas_dataframe(df_clean, ws.get_default_datastore(), ds_key +"-clean",
                                                                description="Cleaned Bank Marketing DataSet for Udacity Course 2")

Method register_pandas_dataframe: This is an experimental method, and may change at any time. For more information, see https://aka.ms/azuremlexperimental.


Validating arguments.
Arguments validated.
Successfully obtained datastore reference and path.
Uploading file to managed-dataset/e7dbda71-a120-496c-a124-96038b47fcec/
Successfully uploaded file to datastore.
Creating and registering a new dataset.
Successfully created and registered a new dataset.
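To make the encodings in `clean_data` easier to follow, here is a small plain-Python sketch that applies the binary and ordinal mappings to a single record. It is only an illustration: the function name `encode_record` and the sample record are made up, and the one-hot columns (`job`, `contact`, `education`) are left out.

```python
MONTHS = {"jan": 1, "feb": 2, "mar": 3, "apr": 4, "may": 5, "jun": 6,
          "jul": 7, "aug": 8, "sep": 9, "oct": 10, "nov": 11, "dec": 12}
WEEKDAYS = {"mon": 1, "tue": 2, "wed": 3, "thu": 4, "fri": 5, "sat": 6, "sun": 7}


def encode_record(row):
    """Apply the binary and ordinal encodings used in clean_data to one raw record."""
    out = dict(row)
    out["marital"] = 1 if row["marital"] == "married" else 0
    for col in ("default", "housing", "loan"):
        out[col] = 1 if row[col] == "yes" else 0
    # Unknown labels become None here (pandas' .map() would yield NaN instead)
    out["month"] = MONTHS.get(row["month"])
    out["day_of_week"] = WEEKDAYS.get(row["day_of_week"])
    out["poutcome"] = 1 if row["poutcome"] == "success" else 0
    out["y"] = 1 if row["y"] == "yes" else 0
    return out


# Made-up record for illustration only
sample = {"age": 57, "marital": "married", "default": "no", "housing": "no",
          "loan": "yes", "month": "may", "day_of_week": "mon",
          "poutcome": "nonexistent", "y": "no"}
encoded = encode_record(sample)
print(encoded)
```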


In [6]:

df_clean = ds_clean.to_pandas_dataframe()
df_clean.describe()

Unnamed: 0,age,marital,default,housing,loan,month,day_of_week,duration,campaign,pdays,...,contact_telephone,education_basic_4y,education_basic_6y,education_basic_9y,education_high_school,education_illiterate,education_professional_course,education_university_degree,education_unknown,y
count,32950.0,32950.0,32950.0,32950.0,32950.0,32950.0,32950.0,32950.0,32950.0,32950.0,...,32950.0,32950.0,32950.0,32950.0,32950.0,32950.0,32950.0,32950.0,32950.0,32950.0
mean,40.040212,0.605948,9.1e-05,0.522974,0.151806,6.605281,2.980789,257.335205,2.56173,962.17478,...,0.36431,0.101153,0.056055,0.147496,0.229226,0.000455,0.128346,0.294901,0.042367,0.112049
std,10.432313,0.488653,0.009542,0.499479,0.358838,2.041099,1.41158,257.3317,2.763646,187.646785,...,0.481243,0.301536,0.230031,0.354605,0.420341,0.021332,0.33448,0.456005,0.201429,0.315431
min,17.0,0.0,0.0,0.0,0.0,3.0,1.0,0.0,1.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,32.0,0.0,0.0,0.0,0.0,5.0,2.0,102.0,1.0,999.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
50%,38.0,1.0,0.0,1.0,0.0,6.0,3.0,179.0,2.0,999.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
75%,47.0,1.0,0.0,1.0,0.0,8.0,4.0,318.0,3.0,999.0,...,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0
max,98.0,1.0,1.0,1.0,1.0,12.0,5.0,4918.0,56.0,999.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0


In [7]:
df_clean.head(5)

Unnamed: 0,age,marital,default,housing,loan,month,day_of_week,duration,campaign,pdays,...,contact_telephone,education_basic_4y,education_basic_6y,education_basic_9y,education_high_school,education_illiterate,education_professional_course,education_university_degree,education_unknown,y
0,57,1,0,0,1,5,1,371,1,999,...,0,0,0,0,1,0,0,0,0,0
1,55,1,0,1,0,5,4,285,2,999,...,1,0,0,0,0,0,0,0,1,0
2,33,1,0,0,0,5,5,52,1,999,...,0,0,0,1,0,0,0,0,0,0
3,36,1,0,0,0,6,5,355,4,999,...,1,0,0,0,1,0,0,0,0,0
4,27,1,0,1,0,7,5,189,2,999,...,0,0,0,0,1,0,0,0,0,0


Screenshot of “Registered Datasets” in ML Studio showing that the Bank-marketing dataset and its cleaned version are available:
![registered_datasets](images/registered_datasets.jpg)

### Create a compute cluster

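The project instructions ask for a "Standard_DS12_v2" compute cluster with a minimum of 1 node. Note that the cell below instead provisions "STANDARD_D2_V2" with `max_nodes=4` and does not set a minimum node count.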
([Documentation](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-create-attach-compute-cluster?tabs=python))

In [88]:
# Choose a name for your CPU cluster
cpu_cluster_name = "auto-ml"

# Verify that cluster does not exist already
try:
    cpu_cluster = ComputeTarget(workspace=ws, name=cpu_cluster_name)
    print('Found existing cluster, use it.')
except ComputeTargetException:
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',# for GPU, use "STANDARD_NC6"
                                                           #vm_priority = 'lowpriority', # optional
                                                           max_nodes=4)
    cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, compute_config)

cpu_cluster.wait_for_completion(show_output=True) # , min_node_count = 1, timeout_in_minutes = 10
# For a more detailed view of current AmlCompute status, use get_status().


Creating
Succeeded
AmlCompute wait for completion finished

Minimum number of nodes requested have been provisioned


In [89]:
cpu_cluster.get_status()

<azureml.core.compute.amlcompute.AmlComputeStatus at 0x7fb4f4be7fd0>

### Create an AutoML Experiment

In [8]:
automl_config = AutoMLConfig(
    compute_target=cpu_cluster,
    experiment_timeout_minutes=30,
    task="classification",
    primary_metric="accuracy",
    training_data=ds_clean,
    label_column_name="y",
    n_cross_validations=3)

In [None]:
# Submit automl run
automl_run = exp.submit(config=automl_config)
RunDetails(automl_run).show()

In [None]:
automl_run.wait_for_completion()

## 3. Deployment
After the experiment run completes, a summary of all the models and their metrics is shown, including explanations. The Best Model is shown in the Details tab. In the Models tab, it appears first (at the top). Make sure you select the best model for deployment.

Deploying the Best Model exposes it as an HTTP API service, so you can interact with the model by sending data in POST requests.

### Select the best model for deployment

In [None]:
# Retrieve and save your best automl model.
# Get your best run and save the model from that run.
#best_run = hyperdrive_run.get_best_run_by_primary_metric()
best_automl_run, best_automl_model = automl_run.get_output()
best_automl_run_metrics = best_automl_run.get_metrics()

#parameter_values = best_automl_run.get_details()['runDefinition']['arguments']

# examine metrics of best model

#print('Best Run Id: ', best_automl_run.id)
print('Accuracy:', best_automl_run_metrics['accuracy'])
print('Metrics:', best_automl_run_metrics)
#print('Inverse of regularization strength:',parameter_values[1])
#print('Maximum number of iterations to converge:',parameter_values[3])
print("Model",best_automl_model)

In [None]:
# save best model
print("Files", best_automl_run.get_file_names())
# https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.run(class)?view=azure-ml-py#download-file-name--output-file-path-none---validate-checksum-false-
best_automl_run.download_file('outputs/model.pkl', output_file_path='best_automl_model.joblib')

# register best model
best_automl_model_reg = best_automl_run.register_model(model_name='best_automl_model', model_path='outputs/model.pkl', 
                            model_framework=Model.Framework.SCIKITLEARN,
                            description = "Best Model to classify the Bank Marketing Dataset",
                            model_framework_version=sklearn.__version__,
                            resource_configuration=ResourceConfiguration(cpu=1, memory_in_gb=0.5))

#### Deployment of the best Model

Deploy the model using Azure Container Instance (ACI) and enable "Authentication"

(Documentation for [Deploy Model](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-deploy-and-where?tabs=python#deploy-your-model),
[model.deploy()](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.model.model?preserve-view=true&view=azure-ml-py#deploy-workspace--name--models--inference-config-none--deployment-config-none--deployment-target-none--overwrite-false-), [ONNX](https://docs.microsoft.com/en-us/python/api/azureml-automl-runtime/azureml.automl.runtime.onnx_convert.onnx_converter.onnxconverter?view=azure-ml-py), [Register Model](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.run.run?view=azure-ml-py#register-model-model-name--model-path-none--tags-none--properties-none--model-framework-none--model-framework-version-none--description-none--datasets-none--sample-input-dataset-none--sample-output-dataset-none--resource-configuration-none----kwargs-), [No Code](https://docs.microsoft.com/de-de/azure/machine-learning/how-to-deploy-no-code-deployment))

In [None]:
best_automl_model_pub = Model.deploy(ws, 'my-model-service', [best_automl_model_reg])
best_automl_model_pub.wait_for_deployment(show_output = True)
print(best_automl_model_pub.state)

## 4. Enable logging / Application Insights
Now that the Best Model is deployed, enable Application Insights and retrieve logs. Although this is configurable at deploy time with a check-box, it is useful to be able to run code that will enable it for you.


In [None]:
# Ensure `az` is installed, as well as the Python SDK for Azure
# Create a new virtual environment with Python3
# Write and run code to enable Application Insights


TODO: Take a screenshot showing that "Application Insights" is enabled in the Details tab of the endpoint.


In [None]:
from opencensus.ext.azure.log_exporter import AzureLogHandler
import logging

logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)
logger.addHandler(logging.StreamHandler())

# Assumes the environment variable APPLICATIONINSIGHTS_CONNECTION_STRING is already set
logger.addHandler(AzureLogHandler())
logger.warning("I will be sent to Application Insights")


In [86]:

# Use the provided `logs.py` code to view the logs


# Requires the config to be downloaded first to the current working directory
# ws = Workspace.from_config()

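# Load the deployed web service by its deployment name. A published pipeline is not a
# Webservice, so its name cannot be used here (see the WebserviceNotFound error below).
service = Webservice(name=best_automl_model_pub.name, workspace=ws)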
logs = service.get_logs()

for line in logs.split('\n'):
    print(line)


WebserviceException: WebserviceException:
	Message: WebserviceNotFound: Webservice with name bankmarketing_pipeline not found in provided workspace
	InnerException None
	ErrorResponse 
{
    "error": {
        "message": "WebserviceNotFound: Webservice with name bankmarketing_pipeline not found in provided workspace"
    }
}

TODO: Take a screenshot showing logs by running the provided logs.py script (see the cell above)

## 5. Swagger Documentation
In this step, you will consume the deployed model using Swagger.

Azure provides a Swagger JSON file for deployed models. Head to the Endpoints section and find your deployed model there; it should be the first one on the list.

A few things you need to pay attention to:

swagger.sh will download the latest Swagger container, and it will run it on port 80. If you don't have permissions for port 80 on your computer, update the script to a higher number (above 9000 is a good idea).

serve.py will start a Python server on port 8000. This script needs to be right next to the downloaded swagger.json file. NOTE: this will not work if swagger.json is not in the same directory.



Question: we deployed a pipeline, not a model, so I can't find the swagger.json file.

In [None]:
# Download the swagger.json file
# Interact with the swagger instance running with the documentation for the HTTP API of the model.
# Display the contents of the API for the model
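As a rough sketch of the "display the contents of the API" step, the snippet below lists the HTTP operations found in a Swagger document. The function name `list_operations` is hypothetical and the miniature spec is made up for the demo; it only relies on the `paths` object that Swagger 2.0 documents contain.

```python
import json

HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options"}


def list_operations(spec):
    """Return sorted (METHOD, path, summary) tuples for every operation in a Swagger spec."""
    operations = []
    for path, item in spec.get("paths", {}).items():
        for method, details in item.items():
            # Path items may also hold non-operation keys such as "parameters"
            if method.lower() in HTTP_METHODS:
                operations.append((method.upper(), path, details.get("summary", "")))
    return sorted(operations)


# Made-up miniature spec; a real one would be loaded from the downloaded file
demo_spec = {
    "swagger": "2.0",
    "paths": {
        "/": {"get": {"summary": "Health check"}},
        "/score": {"post": {"summary": "Run inference"}},
    },
}

for method, path, summary in list_operations(demo_spec):
    print(method, path, summary)
```

For the real file, load it first with `with open("swagger.json") as f: spec = json.load(f)` and pass `spec` to the function.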

In [106]:
# Run the swagger.sh and serve the json file

def serve_swaggerjson():
    os.system('python3 swagger/serve.py 8000 > swagger/serve_log.txt 2>&1 & ')

def run_swaggerui():
    os.system('swagger/swagger.sh > swagger/run_log.txt 2>&1 & ')


threading.Thread(target=serve_swaggerjson).start()
threading.Thread(target=run_swaggerui).start()

TODO: Take a screenshot showing that swagger runs on localhost showing the HTTP API methods and responses for the model


## 6. Consume model endpoints
Once the model is deployed, use the endpoint.py script provided to interact with the trained model. In this step, you need to run the script, modifying both the scoring_uri and the key to match the key for your service and the URI that was generated after deployment.

Hint: This URI can be found in the Details tab, above the Swagger URI.




In [None]:
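# URL for the web service, should be similar to:
# 'http://8530a665-66f3-49c8-a953-b82a2d312917.eastus.azurecontainer.io/score' #URI that was generated after deployment
scoring_uri = best_automl_model_pub.scoring_uri
# If the service is authenticated, set its key or token here as a plain string,
# e.g. best_automl_model_pub.get_keys()[0] when key authentication is enabled.
# `auth_header` from section 1 is a dict for the pipeline REST endpoint and must not be used as the key.
key = ''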

# Two sets of data to score, so we get two results back
data = {"data":
        [
          {
            "age": 17,
            "campaign": 1,
            "cons.conf.idx": -46.2,
            "cons.price.idx": 92.893,
            "contact": "cellular",
            "day_of_week": "mon",
            "default": "no",
            "duration": 971,
            "education": "university.degree",
            "emp.var.rate": -1.8,
            "euribor3m": 1.299,
            "housing": "yes",
            "job": "blue-collar",
            "loan": "yes",
            "marital": "married",
            "month": "may",
            "nr.employed": 5099.1,
            "pdays": 999,
            "poutcome": "failure",
            "previous": 1
          },
          {
            "age": 87,
            "campaign": 1,
            "cons.conf.idx": -46.2,
            "cons.price.idx": 92.893,
            "contact": "cellular",
            "day_of_week": "mon",
            "default": "no",
            "duration": 471,
            "education": "university.degree",
            "emp.var.rate": -1.8,
            "euribor3m": 1.299,
            "housing": "yes",
            "job": "blue-collar",
            "loan": "yes",
            "marital": "married",
            "month": "may",
            "nr.employed": 5099.1,
            "pdays": 999,
            "poutcome": "failure",
            "previous": 1
          },
      ]
    }
# Convert to JSON string
input_data = json.dumps(data)
with open("data.json", "w") as _f:
    _f.write(input_data)

# Set the content type
headers = {'Content-Type': 'application/json'}
# If authentication is enabled, set the authorization header
if key:
    headers['Authorization'] = f'Bearer {key}'

# Make the request and display the response
resp = requests.post(scoring_uri, input_data, headers=headers)
print(resp.json())

# output should be similar to this: {"result": ["yes", "no"]}
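The request logic above can be wrapped in a small helper that only adds the `Authorization` header when a key is supplied. This is a sketch using the standard library's `urllib` instead of `requests`; the helper name `build_scoring_request`, the URL, and the key are placeholders, and the request is only prepared, not sent.

```python
import json
import urllib.request


def build_scoring_request(scoring_uri, records, key=None):
    """Prepare (but do not send) a JSON POST request for a scoring endpoint."""
    body = json.dumps({"data": records}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if key:
        # Only add the header when the service requires a key or token
        headers["Authorization"] = f"Bearer {key}"
    return urllib.request.Request(scoring_uri, data=body, headers=headers, method="POST")


# Placeholder URL and key, for illustration only
req = build_scoring_request("http://localhost:8890/score", [{"age": 17}], key="hypothetical-key")
print(req.get_method(), req.full_url)
print(req.get_header("Authorization"))
```

Sending it would be `urllib.request.urlopen(req)`; keeping construction separate from sending makes the headers easy to inspect first.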

TODO: Take a screenshot showing that the `endpoint.py` script runs against the API producing JSON output from the model.

## 7. Create, Publish and consume a pipeline
The experiment above already runs inside a pipeline. It therefore does not have to be created again; the pipeline was already created in the steps above.

I updated the notebook [aml-pipelines-with-automated-machine-learning-step.ipynb](aml-pipelines-with-automated-machine-learning-step.ipynb) to use the same keys, URI, dataset, cluster, and model names we already created, but I also included all of its parts in this notebook so that everything can be run from a single notebook.


- Upload the Jupyter Notebook aml-pipelines-with-automated-machine-learning-step.ipynb to the Azure ML studio
- Update all the variables that are noted to match your environment
- Make sure a `config.json` has been downloaded and is available in the current working directory
- Run through the cells
- Verify the pipeline has been created and shows in Azure ML studio, in the *Pipelines* section
- Verify that the pipeline has been scheduled to run or is running


### Create Pipeline

In [None]:
# Set parameters for AutoMLConfig
# NOTE: DO NOT CHANGE THE experiment_timeout_minutes PARAMETER OR YOUR INSTANCE WILL TIME OUT.
# If you wish to run the experiment longer, you will need to run this notebook in your own
# Azure tenant, which will incur personal costs.

automl_config = AutoMLConfig(compute_target=cpu_cluster,
                             task = "classification",
                             training_data=ds_clean,
                             label_column_name="y",   
                             path = './pipeline-project',
                             enable_early_stopping= True,
                             featurization= 'auto',
                             debug_log = "automl_errors.log",
                             experiment_timeout_minutes = 20,
                             max_concurrent_iterations = 5,
                             # n_cross_validations=3,
                             primary_metric = "AUC_weighted" #  or "accuracy"
                            )


metrics_output_name = 'metrics_output'
metrics_data = PipelineData(name='metrics_data',
                           datastore=ws.get_default_datastore(),
                           pipeline_output_name=metrics_output_name,
                           training_output=TrainingOutput(type='Metrics'))

best_model_output_name = 'best_model_output'
best_model_data = PipelineData(name='model_data',
                           datastore=ws.get_default_datastore(),
                           pipeline_output_name=best_model_output_name,
                           training_output=TrainingOutput(type='Model'))

pipeline = Pipeline(
    description="pipeline_with_automlstep",
    workspace=ws,    
    steps=[AutoMLStep(
            name='automl_module',
            automl_config=automl_config,
            outputs=[metrics_data, best_model_data],
            allow_reuse=True)
          ])

# Submit the pipeline run
pipeline_run = exp.submit(pipeline) #TODO: compute_target = cpu_cluster #config=automl_config

# Show the run details widget
RunDetails(pipeline_run).show()

In [None]:
pipeline_run.wait_for_completion()

Screenshot showing that the experiment is shown as completed:
![experiment overview](images/experiments_overview.jpg)
![completed run](images/completed_run.jpg)
([picture of second try](images/experiments_overview_2.jpg))

### Retrieve best Model for the Pipeline Run
(Documentation for the [PipelineRun Class](https://docs.microsoft.com/en-us/python/api/azureml-pipeline-core/azureml.pipeline.core.pipelinerun?view=azure-ml-py))


In [None]:
#download pipeline output about metrics (of child runs) and examine them
metrics_portref = pipeline_run.get_pipeline_output(metrics_output_name)
num_file_downloaded = metrics_portref.download('.', show_progress=True)

with open(metrics_portref._path_on_datastore) as f:
    metrics = f.read()
    
pd.DataFrame(json.loads(metrics)).T.applymap(lambda x: np.round(x[0],8))
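The metrics file can also be inspected without pandas. The sketch below picks the child run with the best value of one metric. It assumes the layout implied by the cell above (run id, then metric name, then a one-element list), which is an inference from the `x[0]` indexing rather than a documented format. The function name `best_run` and the demo values are made up.

```python
def best_run(metrics, metric_name, higher_is_better=True):
    """Return (run_id, value) of the run with the best value for metric_name."""
    scored = {
        run_id: values[metric_name][0]
        for run_id, values in metrics.items()
        if metric_name in values
    }
    if not scored:
        raise KeyError(f"No run reports the metric {metric_name!r}")
    pick = max if higher_is_better else min
    best_id = pick(scored, key=scored.get)
    return best_id, scored[best_id]


# Made-up metrics in the assumed layout: {run_id: {metric_name: [value]}}
demo_metrics = {
    "run_0": {"AUC_weighted": [0.91], "accuracy": [0.90]},
    "run_1": {"AUC_weighted": [0.94], "accuracy": [0.89]},
}
print(best_run(demo_metrics, "AUC_weighted"))
```

With the downloaded file, `best_run(json.loads(metrics), "AUC_weighted")` would use the same metric as the pipeline's `primary_metric`.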

In [None]:
# download pipeline output about the best model and examine it
best_model_portref = pipeline_run.get_pipeline_output(best_model_output_name)
num_file_downloaded = best_model_portref.download('.', show_progress=True)

with open(best_model_portref._path_on_datastore, "rb" ) as f:
    best_model = pickle.load(f)

# show best model
best_model

In [None]:
best_model.steps

You can see the details of the best model, _VotingEnsemble_, above. The following is a screenshot of the list of AutoML models:
![automl model list](images/generated_models.jpg)
(Further screenshots of [best model](images/best_model.jpg), [best model steps](images/best_model_steps.jpg))

### Optional: Quick testing of the best model


In [None]:
# Load test data
ds_test = TabularDatasetFactory.from_delimited_files(path='https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/bankmarketing_test.csv')

x, y = clean_data(ds_test)

# Fix differently named columns in test dataset
x.columns = [c.replace(".","_") for c in x.columns]

df_test = x.join(y)
df_test = df_test[pd.notnull(df_test['y'])]

y_test = df_test['y']
X_test = df_test.drop(['y'], axis=1)

# predict
y_test_pred = best_model.predict(X_test)

# Visualize via confusion matrix
pd.DataFrame(confusion_matrix(y_test, y_test_pred)).style.background_gradient(cmap='Blues', low=0, high=0.9)
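Because only about 11% of the training labels are positive (see the mean of `y` in the `describe()` output), accuracy alone can look good even for a weak model. The sketch below derives accuracy, precision and recall from a 2x2 confusion matrix in scikit-learn's layout (rows are true labels, columns are predicted labels). The function name `summarize_confusion` and the demo counts are made up.

```python
def summarize_confusion(matrix):
    """Compute accuracy, precision and recall from [[tn, fp], [fn, tp]]."""
    (tn, fp), (fn, tp) = matrix
    total = tn + fp + fn + tp
    return {
        "accuracy": (tp + tn) / total,
        # Guard against division by zero when a class is never predicted or never present
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Made-up counts for illustration only
demo_matrix = [[50, 10], [5, 35]]
summary = summarize_confusion(demo_matrix)
print(summary)
```

For the real result, `summarize_confusion(confusion_matrix(y_test, y_test_pred).tolist())` would feed in the matrix computed above.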

### Optional: Save, Register and Deploy the Model of Pipeline

In [None]:
# save best model to disk
os.makedirs('outputs', exist_ok=True)
joblib.dump(best_model, 'outputs/best_model.joblib')

# register the model we just saved
registered_model = Model.register(workspace = ws,
                           model_name='reg-bankmarketing-model', 
                           model_path='outputs/best_model.joblib',
                           model_framework=Model.Framework.SCIKITLEARN,
                           description = "Best Model to classify the Bank Marketing Dataset",
                           model_framework_version=sklearn.__version__,
                           resource_configuration=ResourceConfiguration(cpu=1, memory_in_gb=0.5))

# deploy the model we just registered
# TODO: left out because nobody found a solution for the crashing container problem
# https://knowledge.udacity.com/questions/439802
# published_model = Model.deploy(ws, "pub-bankmarketing-model", [registered_model], deployment_config=LocalWebservice.deploy_configuration(port=8890))
# published_model.wait_for_deployment(show_output = True)
# print(published_model.state)

### Deploy Pipeline
Deploying the pipeline enables us to trigger a pipeline training run using an API call.
([Documentation](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-deploy-pipelines))


In [None]:
published_pipeline = pipeline_run.publish_pipeline(name="bankmarketing_pipeline", description="Training bankmarketing pipeline", version="1.0")
published_pipeline

Screenshot of the deployed pipeline:
![deployed pipeline endpoint](images/pipeline_endpoint.jpg)

### Consume Pipeline
Get the REST URL from the `endpoint` property of the published pipeline object. You can also find the REST URL in your workspace in the portal. Build an HTTP POST request to the endpoint, specifying your authentication header. Additionally, add a JSON payload object with the experiment name and, in the documentation's example, the batch size parameter. As a reminder, in that example the process_count_per_node is passed through to ParallelRunStep because it is defined as a PipelineParameter object in the step configuration; the request below only needs the experiment name.

Make the request to trigger the run. Access the Id key from the response dict to get the value of the run id.

In [90]:
rest_endpoint = published_pipeline.endpoint
response = requests.post(rest_endpoint, 
                         headers=auth_header, 
                         json={"ExperimentName": "pipeline-rest-endpoint"}
                        )
try:
    response.raise_for_status()
except Exception:    
    raise Exception("Received bad response from the endpoint: {}\n"
                    "Response Code: {}\n"
                    "Headers: {}\n"
                    "Content: {}".format(rest_endpoint, response.status_code, response.headers, response.content))

run_id = response.json().get('Id')
print('Submitted pipeline run: ', run_id)

# Use run id to monitor status of new run. This will take 10-15 min, looks similar to previous pipeline run, so you can skip watching full output.
published_pipeline_run = PipelineRun(ws.experiments["pipeline-rest-endpoint"], run_id)
RunDetails(published_pipeline_run).show()

Submitted pipeline run:  550c659e-bf03-4f36-a53b-bcb2b3645550


_PipelineWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', …

In [120]:
published_pipeline_run.wait_for_completion()

PipelineRunId: 550c659e-bf03-4f36-a53b-bcb2b3645550
Link to Azure Machine Learning Portal: https://ml.azure.com/experiments/pipeline-rest-endpoint/runs/550c659e-bf03-4f36-a53b-bcb2b3645550?wsid=/subscriptions/d7f39349-a66b-446e-aba6-0053c2cf1c11/resourcegroups/aml-quickstarts-132956/workspaces/quick-starts-ws-132956

PipelineRun Execution Summary
PipelineRun Status: Finished
{'runId': '550c659e-bf03-4f36-a53b-bcb2b3645550', 'status': 'Completed', 'startTimeUtc': '2021-01-02T16:41:21.826092Z', 'endTimeUtc': '2021-01-02T17:07:32.81689Z', 'properties': {'azureml.runsource': 'azureml.PipelineRun', 'runSource': 'Unavailable', 'runType': 'HTTP', 'azureml.parameters': '{}', 'azureml.pipelineid': 'b600d816-9f74-4421-9a8c-80c9c66b9fe1'}, 'inputDatasets': [], 'logFiles': {'logs/azureml/executionlogs.txt': 'https://mlstrg132956.blob.core.windows.net/azureml/ExperimentRun/dcid.550c659e-bf03-4f36-a53b-bcb2b3645550/logs/azureml/executionlogs.txt?sv=2019-02-02&sr=b&sig=sBioWumsSBWeIBSf1SlcmL9b8dnB5Oy

'Finished'
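Conceptually, waiting for a run means polling its status until it reaches a terminal state. The sketch below illustrates that idea with a generic status callback. It is not the SDK's implementation of `wait_for_completion`; the function name `wait_until_done` is hypothetical, and the terminal status names are an assumption based on the `'Finished'` output above.

```python
import time

# Assumed terminal states; only 'Finished' is confirmed by the output above
TERMINAL_STATES = {"Finished", "Failed", "Canceled"}


def wait_until_done(get_status, poll_seconds=30.0, timeout_seconds=3600.0, sleep=time.sleep):
    """Poll get_status() until it returns a terminal state or the timeout is exceeded."""
    waited = 0.0
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"Run still in state {status!r} after {waited} seconds")
        sleep(poll_seconds)
        waited += poll_seconds


# Demo with a fake status sequence and no real sleeping
fake_statuses = iter(["NotStarted", "Running", "Running", "Finished"])
final_state = wait_until_done(lambda: next(fake_statuses), sleep=lambda seconds: None)
print(final_state)
```

Passing `published_pipeline_run.get_status` as the callback would apply the same loop to the run above, with a timeout that the plain blocking call does not make explicit.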

Screenshot of a successful pipeline run:
![deployed pipeline run endpoint](images/pipeline_run_endpoint.jpg)

In [123]:
# model and metrics could now be downloaded using the same commands as in the initial training
#    published_pipeline_run.get_pipeline_output(metrics_output_name)
# training log etc. is in step run:
#    list(published_pipeline_run.get_steps())[0]

## 8. Documentation
### Screencast
In this project, you need to record a screencast that shows the entire process of the working ML application. The screencast should meet the following criteria: 1-5 min lenght, clear and understandable audio, at least full hd 16:9, readable text.

In this project, you need to record a screencast that shows the entire process of the working ML application. The screencast should meet the following criteria:
- Working deployed ML model endpoint
- Deployed pipeline
- Available AutoML model
- Successful API requests to the endpoint with a JSON payload

If you are unable to provide audio, you can include a written description of your script instead. Please include it in your README file.

### Screenshots
TODO: Please take the following screenshots to show your work:
- The pipelines section of Azure ML Studio, showing that the pipeline has been created
- The pipelines section of Azure ML Studio, showing the Pipeline Endpoint
- The Bankmarketing dataset with the AutoML module
- The “Published Pipeline overview”, showing a REST endpoint and a status of ACTIVE
- The Jupyter Notebook, showing that the “Use RunDetails Widget” step displays the step runs
- Azure ML Studio, showing the scheduled run

-------> insert link to youtube here

### Readme
An important part of your project submission is a README file that describes the project and documents the main steps. Please use the provided README.md template as a starting point. The README should include the following areas:

- Project overview
- Architectural diagram
- Short description of how to improve the project in the future
- All screenshots mentioned above, with short descriptions
- Link to the screencast video on YouTube (or similar)

-------> insert link to readme here [README.md](README.md)

## 9. Optional: Benchmarking
The following is an optional step to benchmark the endpoint using Apache Benchmark (`ab`). You will not be graded on it, but I encourage you to try it out.

1. Make sure you have the Apache Benchmark command-line tool installed and available in your path.
2. In `endpoint.py`, replace the key and URI again.
3. Run `endpoint.py`. A `data.json` file should appear.
4. Run the `benchmark.sh` file. The output should look similar to the text below.
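For orientation, the `data.json` file that `ab` posts via `-p` is simply the JSON request body written to disk. The following is a minimal sketch, assuming the endpoint accepts a body of the form `{"data": [...]}`. The two records and their fields are illustrative placeholders rather than the full Bankmarketing schema, and the actual `endpoint.py` may differ.

```python
import json

# Illustrative payload: the field names are a hypothetical subset of the
# dataset columns; the real request needs every feature the model expects.
payload = {
    "data": [
        {"age": 17, "job": "blue-collar"},
        {"age": 87, "job": "retired"},
    ]
}

# Write the request body to the file that `ab -p data.json` will send.
with open("data.json", "w") as f:
    json.dump(payload, f)

# Read it back to confirm the file contains valid JSON.
with open("data.json") as f:
    reloaded = json.load(f)

print(f"Wrote {len(reloaded['data'])} records to data.json")
```

Since `ab` sends the file contents unchanged on every request, all benchmark requests score the same records.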


In [None]:
# ab -n 10 -v 4 -p data.json -T 'application/json' -H 'Authorization: Bearer REPLACE_WITH_KEY' http://REPLACE_WITH_API_URL/score
# The header is passed as one quoted argument, the URL as the separate final argument (names come from earlier cells):
!ab -n 10 -v 4 -p data.json -T 'application/json' -H "Authorization: {auth_header['Authorization']}" {published_model.endpoint}

TODO: Take a screenshot showing that Apache Benchmark (ab) runs against the HTTP API using authentication keys to retrieve performance results

Running Apache Benchmark with 10 requests produces output similar to:

```
This is ApacheBench, Version 2.3 <$Revision: 1843412 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 8530a665-66f3-49c8-a953-b82a2d312917.eastus.azurecontainer.io (be patient)...INFO: POST header ==
---
POST /score HTTP/1.0
Content-length: 812
Content-type: application/json
Authorization: Bearer Agb3D23IygXXXXXXXXXXXXXXXXXXXXXXXXX
Host: 8530a665-66f3-49c8-a953-b82a2d312917.eastus.azurecontainer.io
User-Agent: ApacheBench/2.3
Accept: */*


---
LOG: header received:
HTTP/1.0 200 OK
Content-Length: 33
Content-Type: application/json
Date: Thu, 30 Jul 2020 12:33:34 GMT
Server: nginx/1.10.3 (Ubuntu)
X-Ms-Request-Id: babfc511-a0f0-4ecb-a243-b3010a76b8b9
X-Ms-Run-Function-Failed: False

"{\"result\": [\"yes\", \"no\"]}"
LOG: Response code = 200

..done

Server Software:        nginx/1.10.3
Server Hostname:        8530a665-66f3-49c8-a953-b82a2d312917.eastus.azurecontainer.io
Server Port:            80

Document Path:          /score
Document Length:        33 bytes

Concurrency Level:      1
Time taken for tests:   1.599 seconds
Complete requests:      10
Failed requests:        0
Total transferred:      2600 bytes
Total body sent:        10560
HTML transferred:       330 bytes
Requests per second:    6.25 [#/sec] (mean)
Time per request:       159.918 [ms] (mean)
Time per request:       159.918 [ms] (mean, across all concurrent requests)
Transfer rate:          1.59 [Kbytes/sec] received
                        6.45 kb/s sent
                        8.04 kb/s total

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:       21   23   0.8     23      24
Processing:    92  137  28.3    151     176
Waiting:       92  137  28.3    151     176
Total:        114  160  28.0    172     199
```
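To compare benchmark runs, the headline numbers can be pulled out of the `ab` report with a few regular expressions. This is a rough sketch rather than a full parser: `parse_ab_summary` is a hypothetical helper name, and it assumes the report layout shown above.

```python
import re


def parse_ab_summary(text):
    """Extract a few headline numbers from an ApacheBench report."""
    patterns = {
        "complete_requests": r"^Complete requests:\s+(\d+)",
        "failed_requests": r"^Failed requests:\s+(\d+)",
        "requests_per_second": r"^Requests per second:\s+([\d.]+)",
        # Only the per-request mean, not the "across all concurrent requests" line.
        "mean_time_per_request_ms": r"^Time per request:\s+([\d.]+) \[ms\] \(mean\)$",
    }
    summary = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, text, flags=re.MULTILINE)
        if match:
            summary[name] = float(match.group(1))
    return summary


# Lines copied from the sample report above.
sample = """\
Complete requests:      10
Failed requests:        0
Requests per second:    6.25 [#/sec] (mean)
Time per request:       159.918 [ms] (mean)
Time per request:       159.918 [ms] (mean, across all concurrent requests)
"""

summary = parse_ab_summary(sample)
print(summary)
```

As a sanity check on the sample report, 10 requests in 1.599 seconds gives roughly 6.25 requests per second, which matches the reported mean.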

## 10. Optional: Cleanup
Not required, but I think cleaning up compute resources is really important in production.

In [None]:
cpu_cluster.delete()