# Automated ML

TODO: Import Dependencies. In the cell below, import all the dependencies that you will need to complete the project.

[Wine Quality Data Set](https://archive.ics.uci.edu/ml/datasets/Wine+Quality)

In [None]:
from azureml.core.workspace import Workspace
from azureml.core.datastore import Datastore
from azureml.core.compute import ComputeTarget
from azureml.core.compute.amlcompute import AmlCompute
from azureml.exceptions import ComputeTargetException
from azureml.core.experiment import Experiment
from azureml.core.run import Run
from azureml.core.dataset import Dataset
from azureml.core.model import Model

from azureml.core import Environment
from azureml.core.model import InferenceConfig
from azureml.core.webservice import AciWebservice


from azureml.core.webservice import Webservice
from azureml.core.authentication import InteractiveLoginAuthentication

import pandas as pd

from azureml.pipeline.core.pipeline import Pipeline
from azureml.pipeline.core import PipelineData
from azureml.pipeline.core import TrainingOutput
from azureml.pipeline.core.run import PipelineRun
from azureml.pipeline.steps.automl_step import AutoMLStep

from azureml.train.automl.automlconfig import AutoMLConfig
from azureml.data import TabularDataset
from azureml.widgets.run_details import RunDetails

from azureml.automl.core.shared import constants

import json
import os
import pickle
import requests

from pprint import pprint

import logging
import joblib

from train import clean_data, get_dataset
import capstone_constants as c_constants



## Dataset

### Overview
TODO: In this markdown cell, give an overview of the dataset you are using. Also mention the task you will be performing.

This project predicts the quality of white wine.
The task is to determine whether the wine quality is "good" (1) or "not good" (0).
More information about the dataset is provided in the README for this Capstone Project.


TODO: Get data. In the cell below, write code to access the data you will be using in this project. Remember that the dataset needs to be external.

The dataset is external and the URI as defined in capstone_constants.py is:
TABULAR_WINE_DATA_URI = 'https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv'

Note that the original data above rates quality as a score between 1 and 10. However, this Capstone project maps quality > 7 to "good" (1) and everything else to "not good" (0). The project is thus framed as a binary classification problem.
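
The thresholding described above can be sketched as follows. This is a minimal illustration, not the project's `clean_data` implementation (which is not shown in this notebook); the helper name `to_binary_label` and its `threshold` parameter are hypothetical.

```python
def to_binary_label(quality: int, threshold: int = 7) -> int:
    """Map a raw quality score to 1 ("good") or 0 ("not good")."""
    # Strictly greater than the threshold counts as "good", as stated in the text.
    return 1 if quality > threshold else 0


# Made-up raw scores, used only to show the mapping.
raw_scores = [5, 6, 7, 8, 9]
labels = [to_binary_label(q) for q in raw_scores]
print(labels)
```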

In [None]:
ws = Workspace.from_config()

# choose a name for experiment
experiment=Experiment(ws, c_constants.AUTOML_EXPERIMENT_NAME)

In [None]:
# Use the compute cluster if it already exists; otherwise create it.

# Try to access the compute cluster by name. If it exists, we get the compute object.
# If it does not exist, a ComputeTargetException is raised and the cluster is created.
try:
    cc = ComputeTarget(workspace=ws, name=c_constants.COMPUTE_CLUSTER_AUTOML)
    print('Compute cluster target exists and we have a handle to it')
except ComputeTargetException:
    # Failed to obtain the compute cluster object
    # In all likelihood, a compute cluster of that name has not been created
    # Attempt to create the compute cluster
    # First set up the configuration

    # Specify the configuration of the compute cluster
    cc_cfg = AmlCompute.provisioning_configuration(vm_size='Standard_DS12_v2', min_nodes=1, max_nodes=6)
    cc = ComputeTarget.create(workspace=ws, name=c_constants.COMPUTE_CLUSTER_AUTOML, provisioning_configuration=cc_cfg)

# At this point we have access to the compute cluster object. Wait for the compute target to complete provisioning
cc.wait_for_completion(show_output=True)

InProgress....
SucceededProvisioning operation finished, operation "Succeeded"
Succeeded......................
AmlCompute wait for completion finished

Minimum number of nodes requested have been provisioned


In [None]:
# grab the data and create a dataset
train_ds = get_dataset(ws)

# Take a peek at the data by converting the same to a Pandas dataframe
proj_df = train_ds.to_pandas_dataframe()

# print the data
proj_df

Uploading an estimated of 2 files
Uploading train_normalized_data/normaliztion_parameters.csv
Uploaded train_normalized_data/normaliztion_parameters.csv, 1 files out of an estimated total of 2
Uploading train_normalized_data/train_normalized.csv
Uploaded train_normalized_data/train_normalized.csv, 2 files out of an estimated total of 2
Uploaded 2 files


Unnamed: 0,fixed acidity,volatile acidity,citric acid,residual sugar,chlorides,free sulfur dioxide,total sulfur dioxide,density,pH,sulphates,alcohol,quality
0,-0.775936,0.215874,-0.117266,-0.924953,-0.630372,-0.312109,-0.196730,-0.978759,0.607503,-0.699639,0.557225,1
1,0.764589,0.414297,1.287463,-0.924953,-0.676143,0.511075,-0.690870,-1.145932,-0.783220,-0.261526,1.613603,0
2,-0.775936,-1.173090,-0.117266,-1.043248,-0.447289,2.275040,1.073916,-1.025567,0.210154,-0.699639,0.394706,0
3,2.542118,1.505626,-0.282528,-0.964385,0.193503,-1.664483,0.862141,0.358628,-0.518320,-0.349149,-0.661672,0
4,-0.301928,-0.280185,1.452725,0.317146,0.056190,1.510655,1.191568,0.672914,0.210154,0.001342,-1.067971,0
...,...,...,...,...,...,...,...,...,...,...,...,...
3913,0.409083,1.108779,2.444298,0.908622,-0.081122,1.157862,0.415062,1.127626,-0.650770,-0.436771,-1.474270,0
3914,-1.960955,0.315085,-1.108839,-0.786942,-0.859227,-0.488506,-0.502627,-1.727696,2.461801,-0.349149,2.019902,1
3915,-0.420430,0.811144,-0.199897,1.411377,0.330815,-0.900098,-0.596749,0.472306,0.077704,0.614700,0.394706,0
3916,-0.894438,3.192225,-1.769888,-0.905237,0.193503,-1.429287,-1.467376,-0.450491,1.534652,-0.086281,-0.092853,0


In [None]:
proj_df.describe()

Unnamed: 0,fixed acidity,volatile acidity,citric acid,residual sugar,chlorides,free sulfur dioxide,total sulfur dioxide,density,pH,sulphates,alcohol,quality
count,3918.0,3918.0,3918.0,3918.0,3918.0,3918.0,3918.0,3918.0,3918.0,3918.0,3918.0,3918.0
mean,-0.001076,0.004447,0.001914,-0.004852,-0.005363,0.003518,-0.000805,-0.005278,0.003332,0.013083,0.005939,0.213629
std,0.992282,0.991335,1.003364,0.987326,0.998503,0.971193,0.99282,0.976292,1.009029,0.998059,0.999789,0.40992
min,-3.619982,-1.966784,-2.761461,-1.141827,-1.683102,-1.958477,-3.020388,-2.312802,-3.101091,-2.364468,-2.043089,0.0
25%,-0.657434,-0.677032,-0.530422,-0.924953,-0.447289,-0.664902,-0.69087,-0.76812,-0.65077,-0.699639,-0.824192,0.0
50%,-0.064924,-0.180973,-0.117266,-0.234898,-0.126893,-0.076914,-0.102608,-0.109457,-0.054746,-0.086281,-0.092853,0.0
75%,0.527585,0.414297,0.461152,0.68189,0.193503,0.628672,0.673898,0.692975,0.607503,0.527077,0.719745,0.0
max,5.860171,8.152811,10.955302,4.97009,13.741673,6.537956,5.368229,5.440699,4.183648,5.171074,2.99502,1.0
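
The summary above shows every feature with a mean near 0 and a standard deviation near 1, which is what z-score standardization produces. The sketch below illustrates that transformation in plain Python, assuming (this notebook does not confirm it) that `get_dataset` applies something equivalent. `standardize` is a hypothetical helper and the sample values are made up.

```python
import statistics


def standardize(values):
    """Return z-scores: subtract the mean, divide by the standard deviation."""
    mean = statistics.fmean(values)
    # Sample standard deviation (ddof=1), the convention pandas' describe() reports.
    std = statistics.stdev(values)
    return [(v - mean) / std for v in values]


# Made-up alcohol percentages, for illustration only.
alcohol = [8.8, 9.5, 10.1, 11.0, 12.6]
scaled = standardize(alcohol)
print([round(v, 3) for v in scaled])
```

After this transformation the column has mean 0 and sample standard deviation 1, so features measured on different scales become comparable.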


## AutoML Configuration

TODO: Explain why you chose the automl settings and configuration you used below.

This project is a classification problem. More specifically, it is a binary classification problem, as the outcome is whether the wine is of good quality or not.

AUC_weighted is an appropriate metric to target for binary classification.
[Set up AutoML training with Python](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-configure-auto-train)

It is generally recommended to enable early stopping, since after a while no further improvement in the model may be feasible.

There is generally limited to no benefit in using a large number of cross-validation folds. In this instance, we have set it to 3.
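
The `quality` column in the summary above has a mean of about 0.21, so roughly one wine in five is labelled good. With that imbalance, a ranking metric such as AUC is harder to score well on by favouring the majority class than plain accuracy is. As a rough illustration of what AUC measures for a binary label, the sketch below computes it as the fraction of (positive, negative) pairs that the scores rank correctly. The function `binary_auc` and the sample scores are hypothetical, and this is not how AutoML computes the metric internally.

```python
def binary_auc(labels, scores):
    """AUC as the share of positive/negative pairs ranked in the right order."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    # A correctly ordered pair counts 1, a tie counts 0.5, a wrong order counts 0.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels and predicted scores, for illustration only.
labels = [1, 0, 0, 1, 0]
scores = [0.9, 0.3, 0.6, 0.5, 0.1]
print(binary_auc(labels, scores))
```

A model that gives every example the same score gets 0.5 from this function regardless of the class balance, which is why AUC is a useful complement to accuracy here.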
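
To make the cross-validation setting concrete, the sketch below splits row indices into 3 folds so that every row is used for validation exactly once. It is a simplified illustration: `kfold_indices` is a hypothetical helper, and AutoML's own splitting (for example any shuffling or stratification) may differ.

```python
def kfold_indices(n_rows, n_folds):
    """Yield (train_idx, valid_idx) pairs for contiguous, near-equal folds."""
    fold_sizes = [
        n_rows // n_folds + (1 if i < n_rows % n_folds else 0)
        for i in range(n_folds)
    ]
    start = 0
    for size in fold_sizes:
        valid = list(range(start, start + size))
        # Everything outside the validation slice is used for training.
        train = list(range(0, start)) + list(range(start + size, n_rows))
        yield train, valid
        start += size


# 3918 is the row count reported by describe() earlier in this notebook.
for train, valid in kfold_indices(3918, 3):
    print(len(train), len(valid))
```

With 3 folds, each model is trained three times on two thirds of the rows; more folds multiply the training cost without adding rows.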

In [None]:
# TODO: Put your automl settings here

automl_settings = {
    "iterations" : 20,
    "experiment_timeout_minutes" : 30,
    "enable_early_stopping" : True,
    "iteration_timeout_minutes" : 5,
    "max_concurrent_iterations" : 5,
    "max_cores_per_iteration" : -1,
    "n_cross_validations" : 3,
    "primary_metric" : 'AUC_weighted',
    "verbosity" : logging.INFO,
}

# Provide the remainder of the settings/configuration.
# Note that no separate validation dataset is provided; because n_cross_validations
# is set, AutoML validates using cross-validation on the training data.


# TODO: Put your automl config here
automl_config = AutoMLConfig(
    compute_target = cc,
    task='classification',
    training_data=train_ds,
    label_column_name=c_constants.LABEL_COLUMN_NAME,
    featurization='auto',
    model_explainability=True,
    debug_log=c_constants.DEBUG_LOG,
    **automl_settings)
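
As a rough sanity check on these settings, the sketch below estimates a worst-case wall-clock time if every iteration ran to its timeout and iterations ran in full waves of `max_concurrent_iterations`. This is back-of-the-envelope arithmetic under that assumption only; it ignores cluster start-up, featurization, and whatever scheduling AutoML actually performs.

```python
import math

# Values copied from automl_settings above.
iterations = 20
iteration_timeout_minutes = 5
max_concurrent_iterations = 5
experiment_timeout_minutes = 30

# Assumption: iterations run in full waves, each wave lasting one iteration timeout.
waves = math.ceil(iterations / max_concurrent_iterations)
worst_case_minutes = waves * iteration_timeout_minutes

print(waves, worst_case_minutes, worst_case_minutes <= experiment_timeout_minutes)
```

Under this simplified model the iteration budget fits inside the experiment timeout, so the timeout acts as a safety net rather than the limiting factor.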

In [None]:
# TODO: Submit your experiment
automl_run = experiment.submit(automl_config)

Submitting remote run.


Experiment,Id,Type,Status,Details Page,Docs Page
exp-capstone-automl,AutoML_85bbf589-3fd5-4fb0-987e-c1d7e4e7134d,automl,NotStarted,Link to Azure Machine Learning studio,Link to Documentation


## Run Details

OPTIONAL: Write about the different models trained and their performance. Why do you think some models did better than others?

TODO: In the cell below, use the `RunDetails` widget to show the different experiments.

In [None]:
RunDetails(automl_run).show()

_AutoMLWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', 's…

In [None]:
automl_run.wait_for_completion()

{'runId': 'AutoML_85bbf589-3fd5-4fb0-987e-c1d7e4e7134d',
 'target': 'CPU-CC-AUTOML',
 'status': 'Completed',
 'startTimeUtc': '2021-10-28T14:13:57.48449Z',
 'endTimeUtc': '2021-10-28T14:24:45.17487Z',
 'services': {},
 'properties': {'num_iterations': '20',
  'training_type': 'TrainFull',
  'acquisition_function': 'EI',
  'primary_metric': 'AUC_weighted',
  'train_split': '0',
  'acquisition_parameter': '0',
  'num_cross_validation': '3',
  'target': 'CPU-CC-AUTOML',
  'DataPrepJsonString': '{\\"training_data\\": {\\"datasetId\\": \\"1f2a27ce-3403-473d-9b97-2a362d10318e\\"}, \\"datasets\\": 0}',
  'EnableSubsampling': 'False',
  'runTemplate': 'AutoML',
  'azureml.runsource': 'automl',
  'display_task_type': 'classification',
  'dependencies_versions': '{"azureml-widgets": "1.34.0", "azureml-train": "1.34.0", "azureml-train-restclients-hyperdrive": "1.34.0", "azureml-train-core": "1.34.0", "azureml-train-automl": "1.34.0", "azureml-train-automl-runtime": "1.34.0", "azureml-train-automl

## Best Model

TODO: In the cell below, get the best model from the automl experiments and display all the properties of the model.



In [None]:
def print_model(model, prefix=""):
    for step in model.steps:
        print(prefix + step[0])
        if hasattr(step[1], 'estimators') and hasattr(step[1], 'weights'):
            pprint({'estimators': list(e[0] for e in step[1].estimators), 'weights': step[1].weights})
            print()
            for estimator in step[1].estimators:
                print_model(estimator[1], estimator[0]+ ' - ')
        elif hasattr(step[1], '_base_learners') and hasattr(step[1], '_meta_learner'):
            print("\nMeta Learner")
            pprint(step[1]._meta_learner)
            print()
            for estimator in step[1]._base_learners:
                print_model(estimator[1], estimator[0]+ ' - ')
        else:
            pprint(step[1].get_params())
            print()

In [None]:
automl_best_run, automl_best_model = automl_run.get_output()

automl_best_run_metrics = automl_best_run.get_metrics()

print(f'********** Best AutoML accuracy: {automl_best_run_metrics.get("accuracy")}')
print(f'********** printing Best AutoML run:\n{automl_best_run}\n\nPrinting model:')

print_model(automl_best_model)

********** Best AutoML accuracy: 0.8631955079122001
********** printing Best AutoML run:
Run(Experiment: exp-capstone-automl,
Id: AutoML_85bbf589-3fd5-4fb0-987e-c1d7e4e7134d_18,
Type: azureml.scriptrun,
Status: Completed)

Printing model:
datatransformer
{'enable_dnn': False,
 'enable_feature_sweeping': True,
 'feature_sweeping_config': {},
 'feature_sweeping_timeout': 86400,
 'featurization_config': None,
 'force_text_dnn': False,
 'is_cross_validation': True,
 'is_onnx_compatible': False,
 'observer': None,
 'task': 'classification',
 'working_dir': '/mnt/batch/tasks/shared/LS_root/mounts/clusters/nb-compute/code/Users/odl_user_162292'}

prefittedsoftvotingclassifier
{'estimators': ['8', '5', '0', '13', '7', '12', '11', '10'],
 'weights': [0.2727272727272727,
             0.09090909090909091,
             0.09090909090909091,
             0.18181818181818182,
             0.09090909090909091,
             0.09090909090909091,
             0.09090909090909091,
             0.090909090
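
The best model printed above is a soft-voting ensemble with per-estimator weights. As a minimal sketch of the general idea (not the AutoML `prefittedsoftvotingclassifier` implementation), weighted soft voting averages each estimator's class probabilities using the weights and picks the class with the highest combined probability. The function `soft_vote` and the three-model probabilities and weights below are hypothetical.

```python
def soft_vote(probabilities, weights):
    """Combine per-model class probabilities with a weighted average."""
    total = sum(weights)
    n_classes = len(probabilities[0])
    combined = [
        sum(w * p[c] for p, w in zip(probabilities, weights)) / total
        for c in range(n_classes)
    ]
    # The predicted class is the one with the largest combined probability.
    predicted = max(range(n_classes), key=lambda c: combined[c])
    return combined, predicted


# Made-up [P(not good), P(good)] outputs from three models for one wine.
probs = [[0.2, 0.8], [0.6, 0.4], [0.7, 0.3]]
weights = [0.5, 0.25, 0.25]
combined, predicted = soft_vote(probs, weights)
print(combined, predicted)
```

Here two of the three models lean towards class 0, but the most heavily weighted model is confident in class 1, so the weighted average selects class 1.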

In [None]:
print(automl_run.get_metrics())

{'experiment_status': ['DatasetEvaluation', 'FeaturesGeneration', 'DatasetFeaturization', 'DatasetFeaturizationCompleted', 'DatasetCrossValidationSplit', 'ModelSelection'], 'experiment_status_description': ['Gathering dataset statistics.', 'Generating features for the dataset.', 'Beginning to fit featurizers and featurize the dataset.', 'Completed fit featurizers and featurizing the dataset.', 'Generating individually featurized CV splits.', 'Beginning model selection.'], 'average_precision_score_micro': 0.9419537078584689, 'precision_score_weighted': 0.8554471865513738, 'AUC_micro': 0.9405027411084976, 'precision_score_macro': 0.8232033042909573, 'precision_score_micro': 0.8631955079122001, 'balanced_accuracy': 0.7369832782165603, 'matthews_correlation': 0.5534908238635298, 'average_precision_score_macro': 0.857478041601255, 'recall_score_micro': 0.8631955079122001, 'recall_score_macro': 0.7369832782165603, 'f1_score_micro': 0.8631955079122001, 'norm_macro_recall': 0.4739665564331205,

In [None]:
# Create the output directory if it does not already exist
os.makedirs('output', exist_ok=True)

In [None]:
#TODO: Save the best model
joblib.dump(automl_best_model, os.path.join('output','best_automl.pkl'))

['output/best_automl.pkl']

In [None]:
print(f'{constants.CONDA_ENV_FILE_PATH}')

outputs/conda_env_v_1_0_0.yml


In [None]:
automl_best_run.download_file(constants.CONDA_ENV_FILE_PATH, os.path.join('output', c_constants.BEST_RUN_ENV))

In [None]:
automl_best_run.download_file(constants.SCORING_FILE_PATH, os.path.join('output', c_constants.INFERENCE_SCORING_SCRIPT))

In [None]:
automl_best_run.download_file(constants.SCORING_FILE_V2_PATH, os.path.join('output', 'score_v2.py'))

In [None]:
constants.SCORING_FILE_V2_PATH

'outputs/scoring_file_v_2_0_0.py'

In [None]:
constants.MODEL_FILENAME

'model.pkl'

In [None]:
constants.MODEL_PATH

'outputs/model.pkl'

In [None]:
dir(constants)

['API',
 'ARTIFACT_TAG',
 'AcquisitionFunction',
 'AggregationFunctions',
 'AutoMLDefaultTimeouts',
 'AutoMLJson',
 'AutoMLValidation',
 'CHILD_RUNS_SUMMARY_PATH',
 'CONDA_ENV_FILE_PATH',
 'CheckImbalance',
 'ClientErrors',
 'DEFAULT_LOGGING_APP_NAME',
 'DEPENDENCIES_PATH',
 'DatetimeDtype',
 'Defaults',
 'EARLY_STOPPING_NUM_LANDMARKS',
 'EnsembleConstants',
 'EnsembleMethod',
 'Enum',
 'ErrorLinks',
 'FeatureSweeping',
 'FitPipelineComponentName',
 'HyperparameterSweepingConstants',
 'IterationTimeout',
 'LOCAL_CHILD_RUNS_SUMMARY_PATH',
 'LOCAL_CONDA_ENV_FILE_PATH',
 'LOCAL_DEPENDENCIES_PATH',
 'LOCAL_MODEL_PATH',
 'LOCAL_MODEL_PATH_ONNX',
 'LOCAL_MODEL_PATH_TRAIN',
 'LOCAL_MODEL_RESOURCE_PATH_ONNX',
 'LOCAL_OUTPUT_PATH',
 'LOCAL_PIPELINE_GRAPH_PATH',
 'LOCAL_SCORING_FILE_PATH',
 'LOCAL_SCORING_FILE_V2_PATH',
 'LOCAL_VERIFIER_RESULTS_PATH',
 'LOW_MEMORY_THRESHOLD',
 'LegacyModelNames',
 'List',
 'MAX_ITERATIONS',
 'MAX_SAMPLES_AUTOBLOCK',
 'MAX_SAMPLES_AUTOBLOCKED_ALGOS',
 'MLFLOW_OUT

## Model Deployment

Remember you have to deploy only one of the two models you trained but you still need to register both the models. Perform the steps in the rest of this notebook only if you wish to deploy this model.

TODO: In the cell below, register the model, create an inference config and deploy the model as a web service.

In [None]:
# Refer - https://docs.microsoft.com/en-us/azure/machine-learning/how-to-deploy-and-where?tabs=python

# Tutorial: Deploy an image classification model in Azure Container Instances -
# https://docs.microsoft.com/en-us/azure/machine-learning/tutorial-deploy-models-with-aml

# Register the model
# registered_model = automl_run.register_model(model_name='wine-taste-automl', description=c_constants.DEPLOYED_AUTOML_MODEL_DESCRIPTION)
registered_model = automl_best_run.register_model(model_path=constants.MODEL_PATH, 
                                                model_name='wine-taste-automl-2', 
                                                description=c_constants.DEPLOYED_AUTOML_MODEL_DESCRIPTION,
                                                tags={'Method of execution':'AutoML'},
                                                properties={'Accuracy':automl_best_run_metrics['accuracy']})
print(f'{automl_run.model_id}')
print(f'{registered_model.name}  {registered_model.id}  {registered_model.version}')


wine-taste-automl
wine-taste-automl-2  wine-taste-automl-2:1  1


In [None]:
print(f'{automl_best_run.model_id}')

AttributeError: 'Run' object has no attribute 'model_id'

In [None]:
type(automl_run)

azureml.train.automl.run.AutoMLRun

In [None]:
#Model(workspace, name=None, id=None, tags=None, properties=None, version=None, run_id=None, model_framework=None, expand=True, **kwargs)
retrieved_model = Model(workspace=ws, name='wine-taste-automl-2')

In [None]:
curated_env = Environment.get(workspace=ws, name=c_constants.CURATED_ENV_NAME)

# Save the curated environment
curated_env.save_to_directory(path=c_constants.ENV_DIR, overwrite=True)


In [None]:
# Create an inference config

inference_config = InferenceConfig(
    environment=Environment.from_conda_specification(name='myenv', file_path=os.path.join('output', 'best_run_env.yml')),
    source_directory='output',
    entry_script=c_constants.INFERENCE_SCORING_SCRIPT,
)

aci_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1)


In [None]:

service = Model.deploy(workspace=ws,
                       name=c_constants.DEPLOYED_SERVICE,
                       models=[retrieved_model],
                       inference_config=inference_config,
                       deployment_config=aci_config,
                       overwrite=True)
service.wait_for_deployment(show_output=True)



Tips: You can try get_logs(): https://aka.ms/debugimage#dockerlog or local deployment: https://aka.ms/debugimage#debug-locally to debug if deployment takes longer than 10 minutes.
Running
2021-10-28 15:33:13+00:00 Creating Container Registry if not exists.

In [None]:
logs = service.get_logs()

for line in logs.split('\n'):
    print(line)


TODO: In the cell below, send a request to the web service you deployed to test it.

In [None]:
# To enable ApplicationInsights on the service (webservice), 
# * first access the endpoint using the name assigned at the time of deployment
# * next update webservice parameters such as enabling application insights (enable_app_insights)

webservice = Webservice(
    workspace = ws,
    name=c_constants.DEPLOYED_SERVICE
)

webservice.update(
    enable_app_insights=True
)

# At this point Application Insights logging is enabled and can be
# checked in Azure Machine Learning studio

In [None]:
# URL for the web service, should be similar to:
# 'http://8530a665-66f3-49c8-a953-b82a2d312917.eastus.azurecontainer.io/score'

# From the tail end of the code at
# https://docs.microsoft.com/en-us/azure/machine-learning/how-to-deploy-and-where?tabs=python
# - Deploy machine learning models to Azure


scoring_uri = webservice.scoring_uri

# If the service is authenticated, set the key or token
key, _ = webservice.get_keys()

# Set the appropriate headers
headers = {"Content-Type": "application/json"}
headers["Authorization"] = f"Bearer {key}"



# fixed ac	   volatile ac	citric acid	  residual sugar	chlorides	  free sulfurdi	total sulfurdi	density	       pH	        sulphates	    alcohol	quality		
# 0.883090875	0.3150853064	-0.5304215055	-0.1166025484	-0.447289012	-0.7237011554	-0.6908704601	-0.01249670459	1.004852702	0.4394546089	0.3947056997	0		
# 0.7645889612	1.307202455	-0.8609459206	1.657825186	0.3765862299	-0.4297069397	0.8386109571	1.655893566	-0.05474573919	0.001341709573	-0.6616718988	0		




# Two sets of data to score, so we get two results back
# data = {"data":
#         [
#           {
#             "fixed acidity": 0.883090875,
#             "volatile acidity": "0.3150853064",
#             "citric acid": "-0.5304215055",
#             "residual sugar": "-0.1166025484",
#             "chlorides": "-0.447289012",
#             "free sulfur dioxide": "-0.7237011554",
#             "total sulfur dioxide": "-0.6908704601",
#             "density": "-0.01249670459",
#             "pH": "1.004852702",
#             "sulphates": "0.4394546089",
#             "alcohol": 0.3947056997,
#           },
#           {
#             "fixed acidity": 0.7645889612,
#             "volatile acidity": "1.307202455",
#             "citric acid": "-0.8609459206",
#             "residual sugar": "1.657825186",
#             "chlorides": "0.3765862299",
#             "free sulfur dioxide": "-0.4297069397",
#             "total sulfur dioxide": "0.8386109571",
#             "density": "1.655893566",
#             "pH": "-0.05474573919",
#             "sulphates": "0.001341709573",
#             "alcohol": 0.3947056997,
#           },
#       ]
#     }


data = {"data":
        [
          [
           0.883090875,
           0.3150853064,
          -0.5304215055,
          -0.1166025484,
          -0.447289012,
          -0.7237011554,
          -0.6908704601,
          -0.01249670459,
          1.004852702,
          0.4394546089,
          0.3947056997
          ],
          [
          0.7645889612,
          1.307202455,
          -0.8609459206,
          1.657825186,
          0.3765862299,
          -0.4297069397,
          0.8386109571,
          1.655893566,
          -0.05474573919,
          0.001341709573,
          -0.6616718988
          ]
        ]
    }

# Convert to JSON string
input_data = json.dumps(data)

# The headers (content type and authorization) were already set above

# Make the request and display the response
resp = requests.post(scoring_uri, input_data, headers=headers)
print(resp.json())
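
The request body above is a list of rows, each holding the 11 feature values in column order. A small helper like the one below can guard against sending rows of the wrong length before calling the endpoint. `build_payload` and `FEATURE_NAMES` are hypothetical names introduced for illustration; the feature order is taken from the dataframe columns shown earlier, and whether the deployed scoring script expects exactly this shape depends on the generated scoring file.

```python
import json

# Feature order assumed from the dataframe columns printed earlier in this notebook.
FEATURE_NAMES = [
    "fixed acidity", "volatile acidity", "citric acid", "residual sugar",
    "chlorides", "free sulfur dioxide", "total sulfur dioxide", "density",
    "pH", "sulphates", "alcohol",
]


def build_payload(rows):
    """Validate row lengths and return the JSON string to POST."""
    for i, row in enumerate(rows):
        if len(row) != len(FEATURE_NAMES):
            raise ValueError(
                f"row {i} has {len(row)} values, expected {len(FEATURE_NAMES)}"
            )
    return json.dumps({"data": [[float(v) for v in row] for row in rows]})


# First sample row from the request cell above.
row = [0.883090875, 0.3150853064, -0.5304215055, -0.1166025484, -0.447289012,
       -0.7237011554, -0.6908704601, -0.01249670459, 1.004852702,
       0.4394546089, 0.3947056997]
payload = build_payload([row])
print(payload)
```

The returned string can be passed to `requests.post` in place of `input_data`.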

TODO: In the cell below, print the logs of the web service and delete the service

In [None]:
logs = webservice.get_logs()

for line in logs.split('\n'):
    print(line)



In [None]:
# Clean up any resources
# Delete the Webservice
# delete the compute cluster

webservice.delete()
cc.delete()

**Submission Checklist**
- I have registered the model.
- I have deployed the model with the best accuracy as a webservice.
- I have tested the webservice by sending a request to the model endpoint.
- I have deleted the webservice and shutdown all the computes that I have used.
- I have taken a screenshot showing the model endpoint as active.
- The project includes a file containing the environment details.
