# Automated ML

We start by importing all the dependencies needed to complete the project.

In [1]:
from azureml.core import Workspace, Experiment
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException
from azureml.data.dataset_factory import TabularDatasetFactory
import pandas as pd
from azureml.train.automl import AutoMLConfig
from azureml.widgets import RunDetails
from azureml.core.model import Model
from azureml.core.model import InferenceConfig
from azureml.core import Environment
from azureml.core.webservice import LocalWebservice 
import json
import joblib


## Dataset

### Overview
We first connect to the workspace and create the experiment 'Parkinson-classification-AutoML'.

We then load the dataset from an external link (hosted on GitHub).

Next, we prepare the data: read it with `TabularDatasetFactory`, convert it to a DataFrame, save it as a CSV file, and upload it to the default datastore so that it can be used in the AutoML configuration.

Finally, we create the compute cluster.

The dataset used in this notebook is composed of a range of biomedical voice measurements from 31 people, 23 of whom have Parkinson's disease (PD). Each column is a particular voice measure, and each row corresponds to one of the 195 voice recordings from these individuals (identified by the "name" column).
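Because several recordings come from the same person, it can be useful to know which rows belong together. The sketch below assumes, based only on the sample rows displayed in this notebook (for example `phon_R01_S01_1`), that the text after the last underscore is the recording number and everything before it identifies the subject. `subject_id` and `recordings_per_subject` are hypothetical helper names, not part of the notebook.

```python
from collections import Counter


def subject_id(recording_name: str) -> str:
    """Drop the trailing recording number, e.g. 'phon_R01_S01_1' -> 'phon_R01_S01'."""
    # Assumption: the text after the last underscore is the recording index.
    return recording_name.rsplit("_", 1)[0]


def recordings_per_subject(names):
    """Count how many recordings each subject contributed."""
    return Counter(subject_id(name) for name in names)


# Three of the names shown in the dataset preview
names = ["phon_R01_S01_1", "phon_R01_S01_2", "phon_R01_S01_3"]
counts = recordings_per_subject(names)
print(counts)
```

If the naming assumption holds for the whole file, applying `recordings_per_subject` to the full "name" column should yield one key per person.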

In [2]:
#Connection to the workspace and definition of the experiment 
ws = Workspace.from_config()
experiment_name = 'Parkinson-classification-AutoML'

experiment=Experiment(workspace=ws, name=experiment_name)

print('Workspace name: ' + ws.name, 
      'Azure region: ' + ws.location, 
      'Subscription id: ' + ws.subscription_id, 
      'Resource group: ' + ws.resource_group, sep = '\n')

run = experiment.start_logging()

Workspace name: quick-starts-ws-132648
Azure region: southcentralus
Subscription id: a0a76bad-11a1-4a2d-9887-97a29122c8ed
Resource group: aml-quickstarts-132648


In [3]:
#Get the data from github
path_url = 'https://raw.githubusercontent.com/hananeouhammouch/Parkinsons-detection/master/parkinsons.data'
ds = TabularDatasetFactory.from_delimited_files(path = path_url)
ds.to_pandas_dataframe().head()

Unnamed: 0,name,MDVP:Fo(Hz),MDVP:Fhi(Hz),MDVP:Flo(Hz),MDVP:Jitter(%),MDVP:Jitter(Abs),MDVP:RAP,MDVP:PPQ,Jitter:DDP,MDVP:Shimmer,...,Shimmer:DDA,NHR,HNR,status,RPDE,DFA,spread1,spread2,D2,PPE
0,phon_R01_S01_1,119.992,157.302,74.997,0.00784,7e-05,0.0037,0.00554,0.01109,0.04374,...,0.06545,0.02211,21.033,1,0.414783,0.815285,-4.813031,0.266482,2.301442,0.284654
1,phon_R01_S01_2,122.4,148.65,113.819,0.00968,8e-05,0.00465,0.00696,0.01394,0.06134,...,0.09403,0.01929,19.085,1,0.458359,0.819521,-4.075192,0.33559,2.486855,0.368674
2,phon_R01_S01_3,116.682,131.111,111.555,0.0105,9e-05,0.00544,0.00781,0.01633,0.05233,...,0.0827,0.01309,20.651,1,0.429895,0.825288,-4.443179,0.311173,2.342259,0.332634
3,phon_R01_S01_4,116.676,137.871,111.366,0.00997,9e-05,0.00502,0.00698,0.01505,0.05492,...,0.08771,0.01353,20.644,1,0.434969,0.819235,-4.117501,0.334147,2.405554,0.368975
4,phon_R01_S01_5,116.014,141.781,110.655,0.01284,0.00011,0.00655,0.00908,0.01966,0.06425,...,0.1047,0.01767,19.649,1,0.417356,0.823484,-3.747787,0.234513,2.33218,0.410335


In [4]:
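# Prepare the data and send it to the datastore
# index=False keeps the DataFrame index from being written as an extra column
ds.to_pandas_dataframe().to_csv("./training_dataset.csv", index=False)
datastore = ws.get_default_datastore()
# Note: src_dir="./" uploads every file in the working directory, not only the CSV
datastore.upload(src_dir = "./", target_path = "data/")
training_data = TabularDatasetFactory.from_delimited_files(path = [(datastore, ("data/training_dataset.csv"))])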


Uploading an estimated of 14 files
Target already exists. Skipping upload for data/automl.ipynb
Target already exists. Skipping upload for data/automl.log
Target already exists. Skipping upload for data/azureml_automl.log
Target already exists. Skipping upload for data/best-trained-model-pakinson.pkl
Target already exists. Skipping upload for data/hyperparam_tuning.ipynb
Target already exists. Skipping upload for data/score.py
Target already exists. Skipping upload for data/train.py
Target already exists. Skipping upload for data/training_dataset.csv
Target already exists. Skipping upload for data/.ipynb_aml_checkpoints/automl-checkpoint2020-11-31-14-12-36.ipynb
Target already exists. Skipping upload for data/.ipynb_aml_checkpoints/hyperparam_tuning-checkpoint2020-11-31-14-10-14.ipynb
Target already exists. Skipping upload for data/.ipynb_checkpoints/automl-checkpoint.ipynb
Target already exists. Skipping upload for data/.ipynb_checkpoints/hyperparam_tuning-checkpoint.ipynb
Uploading .

In [5]:
#Creation of the compute-cluster
amlcompute_cluster_name = "cpu-cluster"

try:
    aml_compute = ComputeTarget(workspace=ws, name=amlcompute_cluster_name)
    print('Found existing cluster, use it.')
except ComputeTargetException:
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_DS3_V2',
                                                           max_nodes=5)
    aml_compute = ComputeTarget.create(ws, amlcompute_cluster_name, compute_config)

aml_compute.wait_for_completion(show_output=True , min_node_count = 1, timeout_in_minutes = 2)

Creating
Succeeded....................
AmlCompute wait for completion finished

Wait timeout has been reached
Current provisioning state of AmlCompute is "Succeeded" and current node count is "0"


## AutoML Configuration

In this section we configure the AutoML settings and submit the experiment.

The table below describes each setting and the reason for the value chosen.

|Setting|Reason|
|-|-|
|**experiment_timeout_minutes**|Maximum amount of time, in minutes, that all iterations combined can take before the experiment terminates (15 minutes, because the dataset contains only 195 rows)|
|**max_concurrent_iterations**|Maximum number of child runs executed in parallel. We use a dedicated cluster for the experiment and set this value (4) to the maximum number of nodes in the cluster minus one (5 - 1)|
|**n_cross_validations**|Number of cross-validation splits (5), used to reduce the risk of overfitting|
|**primary_metric**|The metric that we want to optimize (accuracy)|
|**task**|The type of task to run (classification)|
|**compute_target**|The compute cluster used to run the experiment|
|**training_data**|The training dataset stored in the datastore|
|**label_column_name**|The dependent variable that we are trying to predict (`status`)|
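To make the `n_cross_validations` setting concrete, here is a minimal sketch of how 195 rows divide into 5 folds. It uses plain contiguous folds; AutoML's own splitting may shuffle or stratify the rows, so this only illustrates the fold sizes. `k_fold_indices` is a hypothetical helper.

```python
def k_fold_indices(n_rows: int, k: int):
    """Yield (train_idx, valid_idx) index lists for k contiguous folds."""
    base, extra = divmod(n_rows, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds take one additional row when n_rows % k != 0.
        size = base + (1 if fold < extra else 0)
        valid = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_rows))
        yield train, valid
        start += size


folds = list(k_fold_indices(195, 5))
for i, (train, valid) in enumerate(folds):
    print(f"fold {i}: train={len(train)} rows, validation={len(valid)} rows")
```

Each row is used for validation exactly once and for training in the other four folds, so the reported metric is an average over five held-out subsets.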

In [6]:
# Automl settings 
automl_settings = {
    "experiment_timeout_minutes" :15,
    "max_concurrent_iterations": 4,
    "n_cross_validations": 5,
    "primary_metric": 'accuracy',
}

# Automl config 
automl_config = AutoMLConfig(
    task="classification",
    compute_target=aml_compute,
    training_data=training_data,
    label_column_name="status",
    **automl_settings
)

In [7]:
# Submit experiment
auto_ml_run = experiment.submit(config = automl_config, show_output = True)


Running on remote.
No run_configuration provided, running on cpu-cluster with default configuration
Running on remote compute: cpu-cluster
Parent Run ID: AutoML_89608153-f6ff-4964-aa1a-de68679df954

Current status: FeaturesGeneration. Generating features for the dataset.
Current status: DatasetCrossValidationSplit. Generating individually featurized CV splits.
Current status: ModelSelection. Beginning model selection.

****************************************************************************************************
DATA GUARDRAILS: 

TYPE:         Class balancing detection
STATUS:       PASSED
DESCRIPTION:  Your inputs were analyzed, and all classes are balanced in your training data.
              Learn more about imbalanced data: https://aka.ms/AutomatedMLImbalancedData

****************************************************************************************************

TYPE:         Missing feature values imputation
STATUS:       PASSED
DESCRIPTION:  No feature missing values we

## Run Details

The `RunDetails` widget is used here to monitor the progress of the AutoML run and its child runs.

In [8]:
RunDetails(auto_ml_run).show()
auto_ml_run.wait_for_completion(show_output=True)




****************************************************************************************************
DATA GUARDRAILS: 

TYPE:         Class balancing detection
STATUS:       PASSED
DESCRIPTION:  Your inputs were analyzed, and all classes are balanced in your training data.
              Learn more about imbalanced data: https://aka.ms/AutomatedMLImbalancedData

****************************************************************************************************

TYPE:         Missing feature values imputation
STATUS:       PASSED
DESCRIPTION:  No feature missing values were detected in the training data.
              Learn more about missing value imputation: https://aka.ms/AutomatedMLFeaturization

****************************************************************************************************

TYPE:         High cardinality feature detection
STATUS:       PASSED
DESCRIPTION:  Your inputs were analyzed, and no high cardinality features were detected.
              Learn more abo

{'runId': 'AutoML_89608153-f6ff-4964-aa1a-de68679df954',
 'target': 'cpu-cluster',
 'status': 'Completed',
 'startTimeUtc': '2020-12-31T16:55:54.287588Z',
 'endTimeUtc': '2020-12-31T17:20:25.463675Z',
 'properties': {'num_iterations': '1000',
  'training_type': 'TrainFull',
  'acquisition_function': 'EI',
  'primary_metric': 'accuracy',
  'train_split': '0',
  'acquisition_parameter': '0',
  'num_cross_validation': '5',
  'target': 'cpu-cluster',
  'DataPrepJsonString': '{\\"training_data\\": \\"{\\\\\\"blocks\\\\\\": [{\\\\\\"id\\\\\\": \\\\\\"3a3d6301-e043-4c9c-895d-d683224b45d1\\\\\\", \\\\\\"type\\\\\\": \\\\\\"Microsoft.DPrep.GetDatastoreFilesBlock\\\\\\", \\\\\\"arguments\\\\\\": {\\\\\\"datastores\\\\\\": [{\\\\\\"datastoreName\\\\\\": \\\\\\"workspaceblobstore\\\\\\", \\\\\\"path\\\\\\": \\\\\\"data/training_dataset.csv\\\\\\", \\\\\\"resourceGroup\\\\\\": \\\\\\"aml-quickstarts-132648\\\\\\", \\\\\\"subscription\\\\\\": \\\\\\"a0a76bad-11a1-4a2d-9887-97a29122c8ed\\\\\\", \\\\\
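
The `startTimeUtc` and `endTimeUtc` fields in the summary above give the wall-clock time of the parent run, which may include setup and featurization in addition to the training iterations. A small sketch for computing it, with `run_duration_minutes` as a hypothetical helper:

```python
from datetime import datetime

# Matches the timestamp format printed in the run summary
FMT = "%Y-%m-%dT%H:%M:%S.%fZ"


def run_duration_minutes(start_utc: str, end_utc: str) -> float:
    """Return the elapsed time between two UTC timestamps, in minutes."""
    start = datetime.strptime(start_utc, FMT)
    end = datetime.strptime(end_utc, FMT)
    return (end - start).total_seconds() / 60


minutes = run_duration_minutes("2020-12-31T16:55:54.287588Z",
                               "2020-12-31T17:20:25.463675Z")
print(f"{minutes:.1f} minutes")
```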

## Best Model

The best model from the AutoML experiment is retrieved, its properties are displayed, and the model is registered.



In [9]:
# Retrieve the best automl model.
best_run, fitted_model = auto_ml_run.get_output()
print(best_run)
print(fitted_model)

#Register the model
description ='Parkinson detection classification '
model_name='Parkinson-detection-automl'
model_path='./'
tags = None
model = auto_ml_run.register_model(model_name = model_name, description = description , tags = tags)
print(auto_ml_run.model_id)

Run(Experiment: Parkinson-classification-AutoML,
Id: AutoML_89608153-f6ff-4964-aa1a-de68679df954_5,
Type: azureml.scriptrun,
Status: Completed)
Pipeline(memory=None,
         steps=[('datatransformer',
                 DataTransformer(enable_dnn=None, enable_feature_sweeping=None,
                                 feature_sweeping_config=None,
                                 feature_sweeping_timeout=None,
                                 featurization_config=None, force_text_dnn=None,
                                 is_cross_validation=None,
                                 is_onnx_compatible=None, logger=None,
                                 observer=None, task=None, working_dir=None)),
                ('StandardScalerWrapper',
                 <azureml...
                                   colsample_bytree=0.7, eta=0.3, gamma=0,
                                   learning_rate=0.1, max_delta_step=0,
                                   max_depth=6, max_leaves=31,
                    

## Model Deployment


In the cell below we first save and register the model, then define the environment, and finally create the inference configuration and deploy the model as a local web service.
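
The entry script `score.py` passed to `InferenceConfig` is not shown in this notebook. As a rough sketch of the general shape such a script takes (an `init()` that loads the model once and a `run()` that handles each request), the example below uses a stand-in model so that it runs without Azure. The `_StandInModel` class, the `"data"` payload key, and the `"result"` response key are all assumptions for illustration, not the contents of the actual script.

```python
import json


class _StandInModel:
    """Placeholder for the fitted AutoML pipeline; always predicts class 1."""

    def predict(self, rows):
        return [1 for _ in rows]


model = None


def init():
    """Called once when the service starts: load the model into memory."""
    global model
    # Assumption: a real script would load the registered .pkl file here
    # (for example with joblib); a stand-in keeps the sketch runnable.
    model = _StandInModel()


def run(raw_data: str) -> str:
    """Called for each request: parse the JSON body, predict, return JSON."""
    try:
        rows = json.loads(raw_data)["data"]  # hypothetical payload key
        return json.dumps({"result": model.predict(rows)})
    except Exception as exc:
        # Errors are reported back to the caller instead of being raised.
        return json.dumps({"error": str(exc)})


init()
print(run(json.dumps({"data": [{"MDVP:Fo(Hz)": 125.64}]})))
```

Returning errors as JSON, as sketched here, means a malformed request shows up in the response body rather than as a failed HTTP call.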

In [10]:



# Save the best model for deployment

joblib.dump(value=fitted_model, filename="best-trained-model-pakinson.pkl")

model = Model.register(model_path="best-trained-model-pakinson.pkl",
                       model_name="best-trained-model-pakinson",
                       workspace = ws)


service_name = 'automl-parkinson-v0'

env = Environment.get(workspace=ws, name="AzureML-AutoML")


inference_config = InferenceConfig(entry_script='score.py', environment=env)
deployment_conf = LocalWebservice.deploy_configuration(port=8088)


service_local = Model.deploy(workspace=ws,
                             name=service_name,
                             models=[model],
                             inference_config=inference_config,
                             deployment_config=deployment_conf)
                       
service_local.wait_for_deployment(show_output=True)


print(service_local.state)
print("scoring URI: " + service_local.scoring_uri)

Registering model best-trained-model-pakinson
Downloading model best-trained-model-pakinson:10 to /tmp/azureml_qq_y24qw/best-trained-model-pakinson/10
Generating Docker build context.
Package creation Succeeded
Logging into Docker registry viennaglobal.azurecr.io
Logging into Docker registry viennaglobal.azurecr.io
Building Docker image from Dockerfile...
Step 1/5 : FROM viennaglobal.azurecr.io/azureml/azureml_4f3cee89203e005745d1830c04fe722a
 ---> c354c32ff2d1
Step 2/5 : COPY azureml-app /var/azureml-app
 ---> fcd1e1835ac9
Step 3/5 : RUN mkdir -p '/var/azureml-app' && echo eyJhY2NvdW50Q29udGV4dCI6eyJzdWJzY3JpcHRpb25JZCI6ImEwYTc2YmFkLTExYTEtNGEyZC05ODg3LTk3YTI5MTIyYzhlZCIsInJlc291cmNlR3JvdXBOYW1lIjoiYW1sLXF1aWNrc3RhcnRzLTEzMjY0OCIsImFjY291bnROYW1lIjoicXVpY2stc3RhcnRzLXdzLTEzMjY0OCIsIndvcmtzcGFjZUlkIjoiNjZkZGU0YjUtMTZhYS00MzgxLTliZmMtNmUyOTE4YmY5ZGY4In0sIm1vZGVscyI6e30sIm1vZGVsc0luZm8iOnt9fQ== | base64 --decode > /var/azureml-app/model_config_map.json
 ---> Running in 6e4a4501a5cb
 --->

We send a request to the deployed web service to test it.

In the cells below, the logs of the web service are printed and the service is then deleted.

In [52]:

# Choose a row to test the service with

data = {1: {"MDVP:Fo(Hz)": 125.64,
            "MDVP:Fhi(Hz)": 141.07,
            "MDVP:Flo(Hz)": 116.35,
            "MDVP:Jitter(%)": 0.03,
            "MDVP:Jitter(Abs)": 0.00,
            "MDVP:RAP": 0.02,
            "MDVP:PPQ": 0.02,
            "Jitter:DDP": 0.06,
            "MDVP:Shimmer": 0.09,
            "MDVP:Shimmer(dB)": 0.89,
            "Shimmer:APQ3": 0.05,
            "Shimmer:APQ5": 0.05,
            "MDVP:APQ": 0.06,
            "Shimmer:DDA": 0.16,
            "NHR": 0.31,
            "HNR": 8.87,
            "RPDE": 0.67,
            "DFA": 0.66,
            "spread1": -3.70,
            "spread2": 0.26,
            "D2": 2.99,  # the dataset column is named "D2", not "D"
            "PPE": 0.37}}

# Expected label for this row
y_ds = 1

json.dumps(data)

'{"1": {"MDVP:Fo(Hz)": 125.64, "MDVP:Fhi(Hz)": 141.07, "MDVP:Flo(Hz)": 116.35, "MDVP:Jitter(%)": 0.03, "MDVP:Jitter(Abs)": 0.0, "MDVP:RAP": 0.02, "MDVP:PPQ": 0.02, "Jitter:DDP": 0.06, "MDVP:Shimmer": 0.09, "MDVP:Shimmer(dB)": 0.89, "Shimmer:APQ3": 0.05, "Shimmer:APQ5": 0.05, "MDVP:APQ": 0.06, "Shimmer:DDA": 0.16, "NHR": 0.31, "HNR": 8.87, "RPDE": 0.67, "DFA": 0.66, "spread1": -3.7, "spread2": 0.26, "D2": 2.99, "PPE": 0.37}}'

In [53]:
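# x_ds is not defined in the cells shown; as an assumption, build it from the `data` dict above
x_ds = pd.DataFrame.from_dict(data, orient="index")
input_parkinson_case = json.dumps(x_ds.values.tolist())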
output = service_local.run(input_parkinson_case)
print("Prediction for the parkinson case=>")
print(output)
print("Expectation for the parkinson case=>")
print(y_ds)

Prediction for the parkinson case=>
{"error": "list indices must be integers or slices, not str"}
Expectation for the parkinson case=>
1
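
The service returned an error instead of a prediction. The message is the one Python raises when a list is indexed with a string, which suggests (an assumption, since `score.py` is not shown here) that the entry script expects a JSON object rather than the bare list produced by `x_ds.values.tolist()`. The sketch below reproduces the error and shows one way to wrap the rows in an object. `build_payload` and the `"data"` key are hypothetical and would need to match whatever the real entry script reads.

```python
import json


def build_payload(records):
    """Wrap a list of feature dicts in a JSON object."""
    # Assumption: the entry script reads the rows from a top-level "data" key.
    return json.dumps({"data": list(records)})


# Two of the features from the test row, shortened for brevity
sample = {"MDVP:Fo(Hz)": 125.64, "MDVP:Fhi(Hz)": 141.07}

# A bare list, like the one sent above, cannot be indexed by a string key.
bare_list = json.loads(json.dumps([[125.64, 141.07]]))
try:
    bare_list["data"]
except TypeError as exc:
    print("bare list:", exc)

# Wrapped in an object, the same lookup succeeds.
payload = build_payload([sample])
print("wrapped:", json.loads(payload)["data"][0]["MDVP:Fo(Hz)"])
```

Sending named features (dicts) rather than positional lists also makes the request independent of column order, provided the entry script converts them back into the layout the model was trained on.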


In [49]:
print(service_local.get_logs())


2020-12-31T17:24:17,239749193+00:00 - rsyslog/run 
2020-12-31T17:24:17,241679036+00:00 - gunicorn/run 
2020-12-31T17:24:17,241600230+00:00 - iot-server/run 
2020-12-31T17:24:17,248303825+00:00 - nginx/run 
/usr/sbin/nginx: /azureml-envs/azureml_8eff28b157f42edcd2424a5aae6c8074/lib/libcrypto.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_8eff28b157f42edcd2424a5aae6c8074/lib/libcrypto.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_8eff28b157f42edcd2424a5aae6c8074/lib/libssl.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_8eff28b157f42edcd2424a5aae6c8074/lib/libssl.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_8eff28b157f42edcd2424a5aae6c8074/lib/libssl.so.1.0.0: no version information available (required by /usr/sbin/nginx)
rsyslogd

In [54]:
service_local.delete()
aml_compute.delete()

Container has been successfully cleaned up.
