# Automated ML

TODO: Import Dependencies. In the cell below, import all the dependencies that you will need to complete the project.

In [1]:
from azureml.core import Workspace, Experiment
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.widgets import RunDetails
import os
import joblib

from azureml.train.automl import AutoMLConfig
from pprint import pprint # Used in printing automl model parameters
from azureml.core import Model # Used to get model information

## Dataset

### Overview

TODO: Get data. In the cell below, write code to access the data you will be using in this project. Remember that the dataset needs to be external.

In [2]:
from azureml.data.dataset_factory import TabularDatasetFactory
ds = TabularDatasetFactory.from_delimited_files("https://raw.githubusercontent.com/BAderinto/capstone-project/main/fetal_health.csv")

In [3]:
ws = Workspace.from_config()

# choose a name for experiment
experiment_name = 'FetalHealthAutomlExp'

experiment=Experiment(ws, experiment_name)

Performing interactive authentication. Please follow the instructions on the terminal.
To sign in, use a web browser to open the page https://microsoft.com/devicelogin and enter the code RHLAUPXL9 to authenticate.
You have logged in. Now let us find all the subscriptions to which you have access...
Interactive authentication successfully completed.


In [4]:
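# Load the data into pandas and drop rows with missing values
data = ds.to_pandas_dataframe().dropna()
# Hold out 10 random rows (label column included) to test the deployed service later
x_test = data.sample(10)
# Use the remaining rows for training
x_train = data.drop(x_test.index)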
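
The split above draws its 10 holdout rows without a seed, so rerunning the cell selects a different test set each time. The following is a minimal sketch of a seeded holdout split using only the standard library; the helper name `holdout_indices`, the seed value, and the row count of 100 are assumptions for illustration and are not part of the notebook. With pandas, the same effect would come from passing `random_state` to `DataFrame.sample`.

```python
import random


def holdout_indices(n_rows, n_test=10, seed=42):
    """Return (train_idx, test_idx) for a seeded random holdout split."""
    rng = random.Random(seed)  # private generator, so global random state is untouched
    test_idx = sorted(rng.sample(range(n_rows), n_test))
    test_set = set(test_idx)
    train_idx = [i for i in range(n_rows) if i not in test_set]
    return train_idx, test_idx


# Hypothetical row count; the real value would be len(data)
train_idx, test_idx = holdout_indices(n_rows=100, n_test=10, seed=42)
print(len(train_idx), len(test_idx))
```

Calling the helper twice with the same seed returns the same indices, which makes the later comparison between predictions and labels repeatable.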

In [6]:
from azureml.core.dataset import Dataset

# Create the local data directory if it does not already exist
os.makedirs('data', exist_ok=True)

# Save the train data to a csv to be uploaded to the datastore
x_train.to_csv("data/train_data.csv", index=False)

# Use a separate name so the datastore does not shadow the dataset `ds` loaded earlier
datastore = ws.get_default_datastore()
datastore.upload(src_dir='./data', target_path='fh_training_dataset', overwrite=True, show_progress=True)

# Create a tabular dataset from the uploaded file so remote compute can read it during training
train_dataset = Dataset.Tabular.from_delimited_files(path=datastore.path('fh_training_dataset/train_data.csv'))

Uploading an estimated of 1 files
Uploading ./data/train_data.csv
Uploaded ./data/train_data.csv, 1 files out of an estimated total of 1
Uploaded 1 files


In [7]:
from azureml.core.compute import ComputeTarget, AmlCompute

compute_name = os.environ.get('UDACITY_AML_COMPUTE_CLUSTER_NAME', 'FHCapstoneCompute')
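# os.environ.get returns a string when the variable is set, so convert node counts to int
compute_min_nodes = int(os.environ.get('UDACITY_AML_COMPUTE_CLUSTER_MIN_NODES', 0))
compute_max_nodes = int(os.environ.get('UDACITY_AML_COMPUTE_CLUSTER_MAX_NODES', 4))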

vm_size = os.environ.get('UDACITY_AML_COMPUTE_CLUSTER_SKU', 'STANDARD_D2_V2')


if compute_name in ws.compute_targets:
    compute_target = ws.compute_targets[compute_name]
    print(compute_name+ ' already exist.')
else:
    provisioning_config = AmlCompute.provisioning_configuration(vm_size=vm_size,
                                                                min_nodes=compute_min_nodes, 
                                                                max_nodes=compute_max_nodes)

    compute_target = ComputeTarget.create(ws, compute_name, provisioning_config)

    print(compute_target.get_status().serialize())

FHCapstoneCompute already exist.
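
The cluster settings above are read from environment variables with numeric fallbacks. `os.environ.get` returns a string whenever the variable is set, so node counts read this way arrive as `"4"` rather than `4`. Below is a minimal sketch of a helper that always yields an integer; the name `env_int` and the `EXAMPLE_*` variable names are hypothetical.

```python
import os


def env_int(name, default):
    """Read an integer setting from the environment, falling back to default."""
    raw = os.environ.get(name)
    return default if raw is None else int(raw)


# Hypothetical variables used only to demonstrate both branches
os.environ["EXAMPLE_MAX_NODES"] = "4"
os.environ.pop("EXAMPLE_UNSET_VAR", None)

print(env_int("EXAMPLE_MAX_NODES", 1))  # value came from the environment
print(env_int("EXAMPLE_UNSET_VAR", 0))  # value came from the default
```

A non-numeric value raises `ValueError` at the point of reading, which is usually easier to diagnose than a type error raised later during provisioning.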


## AutoML Configuration

In [8]:
# TODO: Put your automl settings here
automl_settings = {
    "experiment_timeout_minutes": 20,
    "max_concurrent_iterations": 5,
    "primary_metric" : 'accuracy'
}

# TODO: Put your automl config here
automl_config = AutoMLConfig(
        task='classification',
        compute_target=compute_target,
        training_data=train_dataset,
        label_column_name='fetal_health',
        n_cross_validations=5,
        **automl_settings
)
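
The settings above request 5 concurrent iterations, while the cluster defined earlier defaults to a maximum of 4 nodes. The `AutoMLConfig` documentation describes `AmlCompute` clusters as running one iteration per node, with extra iterations queued until a node is free. Assuming that guidance applies to this cluster, a small check can keep the two values aligned. The helper `cap_concurrency` is hypothetical and touches only the plain settings dictionary.

```python
def cap_concurrency(settings, max_nodes):
    """Return a copy of settings with max_concurrent_iterations capped at max_nodes."""
    capped = dict(settings)  # leave the caller's dictionary unchanged
    requested = capped.get("max_concurrent_iterations", 1)
    capped["max_concurrent_iterations"] = min(requested, max_nodes)
    return capped


automl_settings = {
    "experiment_timeout_minutes": 20,
    "max_concurrent_iterations": 5,
    "primary_metric": "accuracy",
}

capped_settings = cap_concurrency(automl_settings, max_nodes=4)
print(capped_settings["max_concurrent_iterations"])  # 4
```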

In [9]:
# TODO: Submit your experiment
remote_run = experiment.submit(automl_config)

Running on remote.


## Run Details

TODO: In the cell below, use the `RunDetails` widget to show the different experiments.

In [10]:
RunDetails(remote_run).show()


In [11]:
remote_run.wait_for_completion(show_output=True)


Current status: FeaturesGeneration. Generating features for the dataset.
Current status: DatasetBalancing. Performing class balancing sweeping
Current status: ModelSelection. Beginning model selection.

****************************************************************************************************
DATA GUARDRAILS: 

TYPE:         Class balancing detection
STATUS:       ALERTED
DESCRIPTION:  To decrease model bias, please cancel the current run and fix balancing problem.
              Learn more about imbalanced data: https://aka.ms/AutomatedMLImbalancedData
DETAILS:      Imbalanced data can lead to a falsely perceived positive effect of a model's accuracy because the input data has bias towards one class.
+---------------------------------+---------------------------------+--------------------------------------+
|Size of the smallest class       |Name/Label of the smallest class |Number of samples in the training data|
|176                              |3.0                       

{'runId': 'AutoML_719dc484-cd77-4408-ba66-fb1b871b7adf',
 'target': 'FHCapstoneCompute',
 'status': 'Completed',
 'startTimeUtc': '2021-02-01T06:56:31.429357Z',
 'endTimeUtc': '2021-02-01T07:24:55.522481Z',
 'properties': {'num_iterations': '1000',
  'training_type': 'TrainFull',
  'acquisition_function': 'EI',
  'primary_metric': 'accuracy',
  'train_split': '0',
  'acquisition_parameter': '0',
  'num_cross_validation': '5',
  'target': 'FHCapstoneCompute',
  'DataPrepJsonString': '{\\"training_data\\": \\"{\\\\\\"blocks\\\\\\": [{\\\\\\"id\\\\\\": \\\\\\"668b51d2-3217-4f4e-a99f-76f2fbf34103\\\\\\", \\\\\\"type\\\\\\": \\\\\\"Microsoft.DPrep.GetDatastoreFilesBlock\\\\\\", \\\\\\"arguments\\\\\\": {\\\\\\"datastores\\\\\\": [{\\\\\\"datastoreName\\\\\\": \\\\\\"workspaceblobstore\\\\\\", \\\\\\"path\\\\\\": \\\\\\"fh_training_dataset/train_data.csv\\\\\\", \\\\\\"resourceGroup\\\\\\": \\\\\\"aml-quickstarts-136799\\\\\\", \\\\\\"subscription\\\\\\": \\\\\\"d4ad7261-832d-46b2-b093-22156
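
The data guardrail in the output above raises a class balancing alert and names class 3.0, with 176 samples, as the smallest class. One way to inspect such an imbalance before training is to count labels and derive the common "balanced" class weights, `n_samples / (n_classes * class_count)`. The sketch below uses a toy label list that is purely hypothetical and does not reflect the real class distribution; `class_summary` is an illustrative helper.

```python
from collections import Counter


def class_summary(labels):
    """Return count, share, and balanced weight for each class label."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {
        label: {
            "count": count,
            "share": count / n_samples,
            # Rarer classes receive proportionally larger weights
            "weight": n_samples / (n_classes * count),
        }
        for label, count in counts.items()
    }


# Hypothetical toy labels, NOT the fetal_health distribution
labels = [1.0] * 8 + [2.0] * 3 + [3.0] * 1
summary = class_summary(labels)
for label in sorted(summary):
    print(label, summary[label])
```

In the notebook, the labels would come from `x_train['fetal_health']`. Whether to reweight, resample, or switch the primary metric remains a modelling decision that the guardrail only prompts.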

## Best Model

TODO: In the cell below, get the best model from the automl experiments and display all the properties of the model.



In [12]:
# Get best run and model
best_run, fitted_model = remote_run.get_output()

# Print the best run
print(best_run)

# Get all metrics of the best run
best_run_metrics = best_run.get_metrics()

# Print all metrics of the best run
for metric_name in best_run_metrics:
    metric = best_run_metrics[metric_name]
    print(metric_name, metric)

Run(Experiment: FetalHealthAutomlExp,
Id: AutoML_719dc484-cd77-4408-ba66-fb1b871b7adf_55,
Type: azureml.scriptrun,
Status: Completed)
average_precision_score_weighted 0.9815976262219278
log_loss 0.1721237656562539
f1_score_micro 0.9527376332575047
AUC_weighted 0.9874571209516432
recall_score_micro 0.9527376332575047
matthews_correlation 0.8686743151126549
accuracy 0.9527376332575047
AUC_micro 0.9936655050120058
norm_macro_recall 0.8513632713742301
AUC_macro 0.9877827227457061
balanced_accuracy 0.90090884758282
precision_score_macro 0.9383426707637895
average_precision_score_micro 0.9883976937037154
precision_score_micro 0.9527376332575047
f1_score_weighted 0.9514295442283691
weighted_accuracy 0.9771936924343023
precision_score_weighted 0.9519586705306097
f1_score_macro 0.9174440105525207
recall_score_weighted 0.9527376332575047
recall_score_macro 0.90090884758282
average_precision_score_macro 0.9573238934268075
confusion_matrix aml://artifactId/ExperimentRun/dcid.AutoML_719dc484-cd77-4
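
In the metrics above, `accuracy` is about 0.953 while `balanced_accuracy` is about 0.901 and matches `recall_score_macro`. Balanced accuracy is the unweighted mean of per-class recall, so it drops when a minority class is classified less reliably even if overall accuracy stays high. The sketch below computes both from a confusion matrix; the matrix values are hypothetical and are not the run's actual confusion matrix.

```python
def accuracy(cm):
    """Overall accuracy: correct predictions divided by all predictions."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def balanced_accuracy(cm):
    """Unweighted mean of per-class recall (rows are true classes)."""
    recalls = [cm[i][i] / sum(cm[i]) for i in range(len(cm))]
    return sum(recalls) / len(recalls)


# Hypothetical 3-class confusion matrix: rows = true class, columns = predicted class
cm = [
    [90, 10, 0],
    [10, 30, 0],
    [5, 0, 5],
]

print(round(accuracy(cm), 4))
print(round(balanced_accuracy(cm), 4))
```

With an imbalanced target, as flagged by the guardrail earlier, the gap between the two numbers gives a quick indication of how much the majority class dominates the headline accuracy.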

In [14]:
#TODO: Save the best model
automl_best_Model = best_run.register_model(model_path='outputs/model.pkl', model_name='fh_best_automl_Model',
                        tags={'Training context':'Auto ML'},
                        properties={'Accuracy': best_run_metrics['accuracy']})

print(automl_best_Model)

Model(workspace=Workspace.create(name='quick-starts-ws-136799', subscription_id='d4ad7261-832d-46b2-b093-22156001df5b', resource_group='aml-quickstarts-136799'), name=fh_best_automl_Model, id=fh_best_automl_Model:1, version=1, tags={'Training context': 'Auto ML'}, properties={'Accuracy': '0.9527376332575047'})


In [15]:
# List registered models to verify that the model has been registered
for model in Model.list(ws):
    print(model.name, 'version:', model.version)
    for tag_name in model.tags:
        tag = model.tags[tag_name]
        print('\t', tag_name, ':', tag)
    for prop_name in model.properties:
        prop = model.properties[prop_name]
        print('\t', prop_name, ':', prop)
    print('\n')

fh_best_automl_Model version: 1
	 Training context : Auto ML
	 Accuracy : 0.9527376332575047




## Model Deployment

Remember that you only have to deploy one of the two models you trained. Perform the steps in the rest of this notebook only if you wish to deploy this model.

TODO: In the cell below, register the model, create an inference config and deploy the model as a web service.

In [16]:
# Download scoring file 
best_run.download_file('outputs/scoring_file_v_1_0_0.py', 'score.py')

# Download environment file
best_run.download_file('outputs/conda_env_v_1_0_0.yml', 'envFile.yml')

In [18]:
from azureml.core.environment import Environment
from azureml.core.model import InferenceConfig

inference_config = InferenceConfig(entry_script='score.py',
                                    environment=best_run.get_environment())

# deploy
from azureml.core.webservice import AciWebservice

deployment_config = AciWebservice.deploy_configuration(cpu_cores = 1, memory_gb = 1)
service = Model.deploy(ws, "myservice", [automl_best_Model], inference_config, deployment_config)
service.wait_for_deployment(show_output = True)
print(service.state)

print(service.scoring_uri)

print(service.swagger_uri)

Tips: You can try get_logs(): https://aka.ms/debugimage#dockerlog or local deployment: https://aka.ms/debugimage#debug-locally to debug if deployment takes longer than 10 minutes.
Running........................................
Succeeded
ACI service creation operation finished, operation "Succeeded"
Healthy
http://803091ac-684a-45d4-876e-a9276fc3e4b4.southcentralus.azurecontainer.io/score
http://803091ac-684a-45d4-876e-a9276fc3e4b4.southcentralus.azurecontainer.io/swagger.json


In [21]:
import json

label = x_test.pop('fetal_health')

testing_data = json.dumps({'data': x_test.to_dict(orient='records')})

print(testing_data)

{"data": [{"baseline value": 128.0, "accelerations": 0.0, "fetal_movement": 0.006, "uterine_contractions": 0.009, "light_decelerations": 0.01, "severe_decelerations": 0.0, "prolongued_decelerations": 0.0, "abnormal_short_term_variability": 63.0, "mean_value_of_short_term_variability": 2.7, "percentage_of_time_with_abnormal_long_term_variability": 0.0, "mean_value_of_long_term_variability": 1.3, "histogram_width": 96.0, "histogram_min": 64.0, "histogram_max": 160.0, "histogram_number_of_peaks": 6.0, "histogram_number_of_zeroes": 1.0, "histogram_mode": 129.0, "histogram_mean": 111.0, "histogram_median": 128.0, "histogram_variance": 21.0, "histogram_tendency": 0.0}, {"baseline value": 129.0, "accelerations": 0.003, "fetal_movement": 0.001, "uterine_contractions": 0.0, "light_decelerations": 0.0, "severe_decelerations": 0.0, "prolongued_decelerations": 0.0, "abnormal_short_term_variability": 59.0, "mean_value_of_short_term_variability": 0.9, "percentage_of_time_with_abnormal_long_term_vari
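
The request body printed above is a JSON object with a single `data` key holding a list of records, one dictionary of feature values per row. As a minimal sketch, the same structure can be built from plain dictionaries without pandas; `build_payload` is a hypothetical helper, and the record is cut down to two columns for illustration, so it would not be a valid request for the deployed model, which expects all the feature columns.

```python
import json


def build_payload(records):
    """Serialize a list of row dictionaries into the {"data": [...]} request body."""
    return json.dumps({"data": list(records)})


# Illustrative record with only two of the feature columns shown in the output above
records = [{"baseline value": 128.0, "accelerations": 0.0}]
payload = build_payload(records)
print(payload)
```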

TODO: In the cell below, send a request to the web service you deployed to test it.

In [22]:
import requests # Used for http post request

# Set the content type
headers = {'Content-type': 'application/json'}


response = requests.post(service.scoring_uri, testing_data, headers=headers)

In [23]:
# Print results from the inference
print(response.text)

"{\"result\": [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]}"


In [24]:
# Print original labels
print(label)

2045   1.00
763    1.00
1090   1.00
955    1.00
504    1.00
1166   1.00
949    1.00
716    1.00
532    1.00
502    1.00
Name: fetal_health, dtype: float64
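
The response text printed earlier is wrapped in an extra pair of quotes, which indicates the body is a JSON string that itself contains JSON, so it needs decoding twice before the predictions can be compared with the labels. The sketch below does that with the values copied from the outputs above; `decode_scoring_response` and `agreement` are hypothetical helper names.

```python
import json


def decode_scoring_response(text):
    """Decode a scoring response that may be JSON-encoded once or twice."""
    payload = json.loads(text)
    if isinstance(payload, str):  # the body was a JSON string wrapping more JSON
        payload = json.loads(payload)
    return payload["result"]


def agreement(predictions, labels):
    """Fraction of predictions that equal the true labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


# Response text and labels copied from the cell outputs above
raw = r'"{\"result\": [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]}"'
labels = [1.0] * 10

predictions = decode_scoring_response(raw)
print(agreement(predictions, labels))  # 1.0
```

All ten sampled rows belong to class 1.0, so this check confirms that the service responds correctly but says nothing about how the deployed model handles the two minority classes.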


TODO: In the cell below, print the logs of the web service and delete the service

In [25]:
print(service.get_logs())

/usr/sbin/nginx: /azureml-envs/azureml_265db83b0c6014ce472c5de2f0b97e04/lib/libcrypto.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_265db83b0c6014ce472c5de2f0b97e04/lib/libcrypto.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_265db83b0c6014ce472c5de2f0b97e04/lib/libssl.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_265db83b0c6014ce472c5de2f0b97e04/lib/libssl.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_265db83b0c6014ce472c5de2f0b97e04/lib/libssl.so.1.0.0: no version information available (required by /usr/sbin/nginx)
2021-02-01T07:32:35,063175422+00:00 - gunicorn/run 
2021-02-01T07:32:35,064703926+00:00 - iot-server/run 
2021-02-01T07:32:35,064904827+00:00 - nginx/run 
2021-02-01T07:32:35,065891629+00:00 - rsyslog/run 
rsyslogd

In [26]:
service.delete()