# Operationalizing Machine Learning
**Project 2**
[[View Rubric](https://review.udacity.com/#!/rubrics/2893/view)]

## Initialization



In [11]:
!python --version

Python 3.9.1


## Authentication

In [None]:
# Skipped granting rights because this notebook uses the Azure environment provided by Udacity



## Prepare Dataset

In [None]:
from azureml.data.dataset_factory import TabularDatasetFactory

# Create TabularDataset using TabularDatasetFactory
# https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.data.dataset_factory.tabulardatasetfactory?view=azure-ml-py
dataset_path = "https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/bankmarketing_train.csv"
# Only the _train.csv file is downloaded and imported, so no further splitting is necessary
ds = TabularDatasetFactory.from_delimited_files(path=dataset_path)
ds

In [None]:
from train import clean_data

# Use the clean_data function to clean your data.
x, y = clean_data(ds)
ds_clean = x.join(y)

TODO: Take a screenshot of “Registered Datasets” in ML Studio showing that the Bankmarketing dataset is available
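
The cells above load and clean the data but never register it, so it may not appear under “Registered Datasets” yet. A minimal sketch of one way to register it follows. `register_dataset` is a hypothetical helper, the name `"bankmarketing"` is an assumption, and the code assumes `workspace` is an `azureml.core.Workspace` and `dataset` is the `TabularDataset` created earlier (`ds`).

```python
def register_dataset(workspace, dataset, name="bankmarketing",
                     description="Bank marketing training data"):
    """Register `dataset` under `name` unless the workspace already has it.

    Hypothetical helper, not part of the Azure ML SDK. It relies on
    `workspace.datasets` (a name -> dataset mapping) and on
    `dataset.register(...)`, as exposed by azureml-core.
    """
    existing = workspace.datasets
    if name in existing:
        # Reuse the registered dataset instead of creating a duplicate.
        return existing[name]
    return dataset.register(workspace=workspace, name=name,
                            description=description)
```

In the notebook this would be called as `register_dataset(ws, ds)` once `ws` has been created.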

## Automated ML Experiment

<p>Run the experiment using <em>Classification</em> and ensure <em>Explain best model</em> is checked. <br> Under <em>Exit criterion</em>, reduce the default training time (3 hours) to 1 hour and reduce <em>Concurrency</em> from the default to 5 (this number should always be less than the number of nodes in the compute cluster). <br><br> Note: This process takes about 15 minutes in total, at about 5 minutes per iteration.</p>
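
If the experiment is configured from code rather than in the studio UI, the same settings could be collected as keyword arguments. The sketch below is only an illustration: `automl_ui_settings` is a hypothetical helper (not part of the Azure ML SDK), and it assumes the returned keys are forwarded to `AutoMLConfig` as keyword arguments.

```python
def automl_ui_settings(max_nodes, concurrency=5, timeout_hours=1):
    """Return AutoML settings mirroring the UI choices described above.

    Hypothetical helper. Raises ValueError when the requested concurrency
    is not below the number of cluster nodes.
    """
    if concurrency >= max_nodes:
        raise ValueError(
            f"concurrency ({concurrency}) must be less than "
            f"the number of cluster nodes ({max_nodes})"
        )
    return {
        "task": "classification",
        "experiment_timeout_hours": timeout_hours,   # exit criterion
        "max_concurrent_iterations": concurrency,    # concurrency
        "model_explainability": True,                # "Explain best model"
    }
```

For example, `AutoMLConfig(training_data=..., label_column_name="y", **automl_ui_settings(max_nodes=6))` would apply these values, while a 4-node cluster with a concurrency of 5 would be rejected by the check.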


In [None]:
# create an experiment using Automated ML
from azureml.core import Workspace, Experiment
from azureml.train.automl import AutoMLConfig

ws = Workspace.get(name="quick-starts-ws-128192") # UPDATE THIS LINE WITH EACH NEW VM INSTANCE!
exp = Experiment(workspace=ws, name="udacity-project")

print('Workspace name: ' + ws.name, 
      'Azure region: ' + ws.location, 
      'Subscription id: ' + ws.subscription_id, 
      'Resource group: ' + ws.resource_group, sep = '\n')

run = exp.start_logging()

# DON'T FORGET TO CLICK THE LOGIN LINK!

# Set parameters for AutoMLConfig
# NOTE: DO NOT CHANGE THE experiment_timeout_minutes PARAMETER OR YOUR INSTANCE WILL TIME OUT.
# If you wish to run the experiment longer, you will need to run this notebook in your own
# Azure tenant, which will incur personal costs.
automl_config = AutoMLConfig(
    experiment_timeout_minutes=30,
    task="classification",
    primary_metric="accuracy",
    training_data=ds_clean,
    label_column_name="y",
    n_cross_validations=3)


# configure a compute cluster

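# Create a compute cluster. The project instructions ask for "Standard_DS12_v2" with a minimum of 1 node;
# note that the code below provisions STANDARD_D2_V2 with max_nodes=4 instead.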
# https://docs.microsoft.com/en-us/azure/machine-learning/how-to-create-attach-compute-cluster?tabs=python

from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

# Choose a name for your CPU cluster
cpu_cluster_name = "cpucluster"

# Verify that cluster does not exist already
try:
    cpu_cluster = ComputeTarget(workspace=ws, name=cpu_cluster_name)
    print('Found existing cluster, use it.')
except ComputeTargetException:
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',
                                                           max_nodes=4)
    cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, compute_config)

cpu_cluster.wait_for_completion(show_output=True)


# use that cluster to run the experiment.

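# Submit the AutoML run and follow its progress in the notebook widget
from azureml.widgets import RunDetails

# TODO: run on the cluster by passing compute_target=cpu_cluster to AutoMLConfig
automl_run = exp.submit(config=automl_config)
RunDetails(automl_run).show()
automl_run.wait_for_completion()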


TODO: Take a screenshot showing that the experiment is shown as completed


## Deploy the best model
After the experiment run completes, a summary of all the models and their metrics is shown, including explanations. The Best Model is shown in the Details tab. In the Models tab, it appears first (at the top). Make sure you select the best model for deployment.

Deploying the Best Model lets you interact with the model through its HTTP API service by sending data in POST requests.

1. Select the <strong>best</strong> model for deployment
2. Deploy the model and enable "Authentication"
3. Deploy the model using Azure Container Instance (ACI)



In [None]:
# find best model
# Retrieve and save your best automl model.
# Get your best run and save the model from that run.
#best_run = hyperdrive_run.get_best_run_by_primary_metric()
best_automl_run, best_automl_model = automl_run.get_output()
best_automl_run_metrics = best_automl_run.get_metrics()

#parameter_values = best_automl_run.get_details()['runDefinition']['arguments']

#print('Best Run Id: ', best_automl_run.id)
print('Accuracy:', best_automl_run_metrics['accuracy'])
print('Metrics:', best_automl_run_metrics)
#print('Inverse of regularization strength:',parameter_values[1])
#print('Maximum number of iterations to converge:',parameter_values[3])
print("Model",best_automl_model)

# save best model
print("Files", best_automl_run.get_file_names())
# https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.run(class)?view=azure-ml-py#download-file-name--output-file-path-none---validate-checksum-false-
best_automl_run.download_file('outputs/model.pkl', output_file_path='best_automl_model.joblib')

# register best model
best_automl_model_reg = best_automl_run.register_model(model_name='best_automl_model', model_path='outputs/model.pkl')

TODO: Take a screenshot of the best model after the experiment completes

## Enable logging / Application Insights
Now that the Best Model is deployed, enable Application Insights and retrieve logs. Although this is configurable at deploy time with a check-box, it is useful to be able to run code that will enable it for you.


In [None]:
# Ensure az is installed, as well as the Python SDK for Azure
# Create a new virtual environment with Python3
# Write and run code to enable Application Insights
# Use the provided logs.py script to view the logs
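
A minimal sketch of what the "enable Application Insights and view logs" step could look like in code. `enable_insights_and_get_logs` is a hypothetical helper, and it assumes `service` is an `azureml.core.webservice.Webservice` object for the deployed endpoint; the provided `logs.py` may differ in its details.

```python
def enable_insights_and_get_logs(service):
    """Turn on Application Insights for `service` and return its log lines.

    Hypothetical helper. `service` is assumed to expose
    `update(enable_app_insights=...)` and `get_logs()`, as Azure ML
    web services do.
    """
    service.update(enable_app_insights=True)
    logs = service.get_logs()
    return logs.splitlines()


# Assumed usage inside the notebook (the endpoint name is a placeholder):
# from azureml.core.webservice import Webservice
# service = Webservice(workspace=ws, name="<your-endpoint-name>")
# for line in enable_insights_and_get_logs(service):
#     print(line)
```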

TODO: Take a screenshot showing that "Application Insights" is enabled in the Details tab of the endpoint.

TODO: Take a screenshot showing logs by running the provided <code>logs.py</code> script

## Swagger Documentation
In this step, you will consume the deployed model using Swagger.

Azure provides a Swagger JSON file for deployed models. Head to the Endpoints section and find your deployed model there; it should be the first one in the list.

A few things you need to pay attention to:

swagger.sh will download the latest Swagger container and run it on port 80. If you don't have permissions for port 80 on your computer, change the port in the script to a higher number (above 9000 is a good idea).

serve.py will start a Python server on port 8000. This script needs to be in the same directory as the downloaded swagger.json file. NOTE: this will not work if swagger.json is not in the same directory.
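
For orientation, here is a rough sketch of what a script like serve.py might do; the provided file is not reproduced here and may work differently. It serves the files in one directory over HTTP and adds a CORS header, on the assumption that the Swagger UI running on another port needs to fetch swagger.json across origins. `CORSRequestHandler` and `make_server` are illustrative names.

```python
import functools
import http.server


class CORSRequestHandler(http.server.SimpleHTTPRequestHandler):
    """Static file handler that allows cross-origin requests."""

    def end_headers(self):
        # Let a Swagger UI served from another port read swagger.json.
        self.send_header("Access-Control-Allow-Origin", "*")
        super().end_headers()


def make_server(directory=".", port=8000):
    """Build (but do not start) a server for the files in `directory`."""
    handler = functools.partial(CORSRequestHandler, directory=directory)
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)


# Assumed usage, run from the directory that contains swagger.json:
# make_server().serve_forever()
```

Because the handler only serves files from the chosen directory, swagger.json has to sit in that directory, which matches the note above.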



In [None]:
# Download the swagger.json file
# Run swagger.sh and serve.py
# Interact with the swagger instance running with the documentation for the HTTP API of the model.
# Display the contents of the API for the model

TODO: Take a screenshot showing that swagger runs on localhost showing the HTTP API methods and responses for the model


## Consume model endpoints
Once the model is deployed, use the endpoint.py script provided to interact with the trained model. In this step, you need to run the script, modifying both the scoring_uri and the key to match the key for your service and the URI that was generated after deployment.

Hint: This URI can be found in the Details tab, above the Swagger URI.
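
The request that a script such as endpoint.py sends can be sketched with the standard library alone. This is an illustration under assumptions, not the provided script: the `{"data": [...]}` payload shape and the helper names are assumed, while the `Bearer` authorization header, the JSON content type, and the doubly encoded `{"result": [...]}` response follow the benchmark output shown later in this notebook.

```python
import json
import urllib.request


def build_scoring_request(scoring_uri, key, records):
    """Build an authenticated POST request carrying `records` as JSON."""
    body = json.dumps({"data": records}).encode("utf-8")  # assumed payload shape
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {key}",
    }
    return urllib.request.Request(scoring_uri, data=body,
                                  headers=headers, method="POST")


def parse_response(raw):
    """Decode the response body; unwrap it if the JSON is itself a JSON string."""
    result = json.loads(raw)
    if isinstance(result, str):
        result = json.loads(result)
    return result


def score(scoring_uri, key, records):
    """Send the request and return the decoded predictions."""
    request = build_scoring_request(scoring_uri, key, records)
    with urllib.request.urlopen(request) as response:
        return parse_response(response.read())
```

Calling `score(scoring_uri, key, records)` with the URI and key from the Details tab and a list of input rows should then return a dictionary with a `result` list.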




In [None]:
# Modify both the scoring_uri and the key to match the key for your service and the URI that was generated after deployment

# Execute the endpoint.py file; the output should be similar to the following:
# {"result": ["yes", "no"]}



<p>Take a screenshot showing that the <code>endpoint.py</code> script runs against the API, producing JSON output from the model.</p>

## Optional: Benchmarking
The following is an optional step to benchmark the endpoint using Apache Benchmark (ab). You will not be graded on it, but I encourage you to try it out.



In [None]:
# Make sure you have the Apache Benchmark command-line tool installed and available in your path
# In endpoint.py, replace the key and URI again
# Run endpoint.py. A data.json file should appear
# Run the benchmark.sh file. The output should look similar to the text below

TODO: Take a screenshot showing that Apache Benchmark (ab) runs against the HTTP API using authentication keys to retrieve performance results

 ```
 This is ApacheBench, Version 2.3 <$Revision: 1843412 $>
 Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
 Licensed to The Apache Software Foundation, http://www.apache.org/

 Benchmarking 8530a665-66f3-49c8-a953-b82a2d312917.eastus.azurecontainer.io (be patient)...INFO: POST header ==
 ---
 POST /score HTTP/1.0
 Content-length: 812
 Content-type: application/json
 Authorization: Bearer Agb3D23IygXXXXXXXXXXXXXXXXXXXXXXXXX
 Host: 8530a665-66f3-49c8-a953-b82a2d312917.eastus.azurecontainer.io
 User-Agent: ApacheBench/2.3
 Accept: */*


 ---
 LOG: header received:
 HTTP/1.0 200 OK
 Content-Length: 33
 Content-Type: application/json
 Date: Thu, 30 Jul 2020 12:33:34 GMT
 Server: nginx/1.10.3 (Ubuntu)
 X-Ms-Request-Id: babfc511-a0f0-4ecb-a243-b3010a76b8b9
 X-Ms-Run-Function-Failed: False

 "{\"result\": [\"yes\", \"no\"]}"
 LOG: Response code = 200
 LOG: header received:
 HTTP/1.0 200 OK
 Content-Length: 33
 Content-Type: application/json
 Date: Thu, 30 Jul 2020 12:33:34 GMT
 Server: nginx/1.10.3 (Ubuntu)
 X-Ms-Request-Id: b48dd8da-0b4e-44fd-a1e5-04043bfa77f1
 X-Ms-Run-Function-Failed: False
```
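
The provided benchmark.sh is not shown here, but an `ab` invocation that would produce output like the above can be sketched as follows. `ab_command` is a hypothetical helper, and the request count and verbosity level are assumptions; the POST file, content type, and `Bearer` header match the request headers in the output.

```python
import shlex


def ab_command(scoring_uri, key, post_file="data.json", requests=10):
    """Return the Apache Benchmark command line as an argument list."""
    return [
        "ab",
        "-n", str(requests),               # total number of requests (assumed)
        "-v", "4",                         # verbose: print headers (assumed level)
        "-p", post_file,                   # file holding the POST body
        "-T", "application/json",          # Content-Type of the POST body
        "-H", f"Authorization: Bearer {key}",
        scoring_uri,
    ]


# Assumed usage; replace the placeholders with your own URI and key:
# import subprocess
# cmd = ab_command("<scoring-uri>", "<key>")
# print(shlex.join(cmd))
# subprocess.run(cmd, check=True)
```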

## Create and publish a pipeline
For this part of the project, you will use the Jupyter Notebook provided in the starter files. You must make sure to update the notebook to have the same keys, URI, dataset, cluster, and model names already created.

In [None]:
# Upload the Jupyter Notebook aml-pipelines-with-automated-machine-learning-step.ipynb to the Azure ML studio

# Update all the variables that are noted to match your environment

# Make sure a config.json has been downloaded and is available in the current working directory

# Run through the cells

# Verify the pipeline has been created and shows in Azure ML studio, in the Pipelines section

# Verify that the pipeline has been scheduled to run or is running
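
Once the pipeline is published, its REST endpoint can be triggered with an authenticated POST request. The sketch below only builds that request with the standard library. `build_pipeline_trigger` is a hypothetical helper, and it assumes that `rest_endpoint` is the published pipeline's endpoint URL, that `auth_header` is an authentication header dictionary obtained from the Azure ML SDK, and that the body carries an `ExperimentName` field.

```python
import json
import urllib.request


def build_pipeline_trigger(rest_endpoint, auth_header, experiment_name):
    """Build a POST request that asks the published pipeline to start a run."""
    headers = dict(auth_header)            # e.g. {"Authorization": "Bearer ..."}
    headers["Content-Type"] = "application/json"
    body = json.dumps({"ExperimentName": experiment_name}).encode("utf-8")
    return urllib.request.Request(rest_endpoint, data=body,
                                  headers=headers, method="POST")


# Assumed usage in the notebook, after publishing the pipeline:
# request = build_pipeline_trigger(published_pipeline.endpoint, auth_header,
#                                  "pipeline-rest-endpoint")
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read()))
```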


TODO: Please take the following screenshots to show your work:
- The pipeline section of Azure ML studio, showing that the pipeline has been created
- The pipelines section in Azure ML Studio, showing the Pipeline Endpoint
- The Bankmarketing dataset with the AutoML module
- The “Published Pipeline overview”, showing a REST endpoint and a status of ACTIVE
- In Jupyter Notebook, showing that the “Use RunDetails Widget” shows the step runs
- In ML studio showing the scheduled run


## Documentation

### Screencast
In this project, you need to record a screencast that shows the entire process of the working ML application. The screencast should be 1-5 minutes long, with clear and understandable audio, at least Full HD resolution (16:9), and readable text.

The screencast should demonstrate the following:
- Working deployed ML model endpoint
- Deployed pipeline
- Available AutoML model
- Successful API requests to the endpoint with a JSON payload

If you are unable to provide audio, you can include a written description of your script in your README file instead.


In [None]:
# insert link to youtube here

## Readme
An important part of your project submission is a README file that describes the project and documents the main steps. Please use the README.md template provided to you as a start. The README should include the following areas:

- project overview
- architectural diagram
- short description of how to improve the project in the future
- all screenshots mentioned above with short descriptions
- link to the screencast video on youtube (or similar)

In [None]:
# insert link to readme here

## Cleanup
Not required, but included because I think it's important.

In [None]:
cpu_cluster.delete()