# Automated ML

Import Dependencies. In the cell below, import all the dependencies that you will need to complete the project.

In [None]:
# Importing dependencies
from azureml.core import Workspace, Experiment, Model
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException
from azureml.core.run import Run
from azureml.widgets import RunDetails
from azureml.train.automl import AutoMLConfig
from azureml.data.dataset_factory import TabularDatasetFactory
from azureml.core import Environment
from azureml.core.model import InferenceConfig
from azureml.core.webservice import AciWebservice


import os
import json
import joblib
import logging
import requests

## Dataset

### Overview
For this project, I'm using the Heart Failure Prediction dataset from Kaggle. It contains 12 clinical features that can be used to predict mortality from heart failure. I downloaded the data and stored it in my GitHub repository, and I use `TabularDatasetFactory` to load it in tabular form.

In [None]:
# Getting data
data = TabularDatasetFactory.from_delimited_files("heart_failure_clinical_records_dataset.csv")

In [None]:
# Creating workspace and experiment

ws = Workspace.from_config()

# Choose a name for the experiment
experiment_name = 'ClassifyHeartFailure-AutoML'
experiment = Experiment(ws, experiment_name)

In [None]:
# Check if compute cluster exists, if not create one
cluster_name = "Compute-Standard" #"compute-cluster"

try:
    cpu_cluster = ComputeTarget(workspace=ws, name=cluster_name)
    print('Found existing cluster, use it.')
except ComputeTargetException:
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_DS3_V2',min_nodes=1, max_nodes=4)
    cpu_cluster = ComputeTarget.create(ws, cluster_name, compute_config)
cpu_cluster.wait_for_completion(show_output=True)

# get status of the cluster
print(cpu_cluster.get_status().serialize())

## AutoML Configuration

The `AutoMLConfig` class is used to submit an automated ML experiment in Azure Machine Learning. The AutoML settings control how the experiment runs. Here the experiment times out after 30 minutes, runs at most 5 iterations in parallel, and uses 2-fold cross-validation. Of the many available metrics, I chose accuracy as the primary metric because it is a reasonable choice for a simple dataset. These settings are passed to `AutoMLConfig` along with the compute target, the training data, the task type, and the label column.
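
To make the two evaluation-related settings concrete, here is a minimal sketch of what 2-fold cross-validation and accuracy compute. It is plain Python, not Azure ML code. The helper names `k_fold_indices` and `accuracy` are hypothetical, and AutoML's own splitting strategy may differ (for example, it may shuffle or stratify the rows), so treat this only as an illustration of the idea.

```python
def k_fold_indices(n_samples, n_folds):
    """Yield (train_idx, valid_idx) pairs for a simple contiguous k-fold split.

    Hypothetical helper for illustration; not how AutoML necessarily splits data.
    """
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
                  for i in range(n_folds)]
    start = 0
    for size in fold_sizes:
        valid_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, valid_idx
        start += size


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# With n_folds=2, each half of the data is used once for validation.
folds = list(k_fold_indices(n_samples=6, n_folds=2))
for train_idx, valid_idx in folds:
    print("train:", train_idx, "validate:", valid_idx)
```

With `n_cross_validations` set to 2, each model is scored on two such validation folds, so every row is used for validation exactly once.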

In [None]:
# Automl settings
automl_settings = {"experiment_timeout_minutes": 30,
                   "max_concurrent_iterations": 5,
                   "n_cross_validations": 2,
                   "primary_metric": 'accuracy',
                   "verbosity": logging.INFO
                  }

# AutoML configuration; cpu_cluster is the compute target created above
automl_config = AutoMLConfig(task='classification',
                             compute_target=cpu_cluster,
                             training_data=data,
                             label_column_name='DEATH_EVENT',
                             **automl_settings)

In [None]:
# Submit experiment
remote_run = experiment.submit(automl_config)

## Run Details
Use the `RunDetails` widget to monitor the AutoML run and its iterations.

In [None]:
RunDetails(remote_run).show()
remote_run.wait_for_completion(show_output=True)

## Best Model

Get the best model from the AutoML experiment and display its properties.

In [None]:
# Finding best run and model
best_autoML_run, best_autoML_model = remote_run.get_output()

# print best run
print(best_autoML_run)

# print best model
print(best_autoML_model)

In [None]:
# print best model metrics

best_autoML_run_metrics = best_autoML_run.get_metrics()

for metric_name in best_autoML_run_metrics:
    metric = best_autoML_run_metrics[metric_name]
    print(metric_name, metric)

In [None]:
# Save the best model
best_automl_model = best_autoML_run.register_model(model_name='heart_failure_automl',
                                                   model_path='outputs/model.pkl',
                                                   tags={'Training context': 'Automated ML'},
                                                   properties={'Accuracy': best_autoML_run_metrics['accuracy']})

print(best_automl_model)

## Model Deployment

As part of the project, I trained both an AutoML model and a HyperDrive-based model. The better of the two is picked for deployment.

TODO: In the cell below, register the model, create an inference config and deploy the model as a web service.

In [None]:
# Printing the registered and saved models
for model in Model.list(ws):
    print(model.name, 'version:', model.version)
    for tag_name in model.tags:
        tag = model.tags[tag_name]
        print ('\t',tag_name, ':', tag)
    for prop_name in model.properties:
        prop = model.properties[prop_name]
        print ('\t',prop_name, ':', prop)
    print('\n')

As we can see, the AutoML model performed better than the HyperDrive model, so I will deploy the AutoML model.
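
The comparison above is done by reading the printed properties. As a rough sketch, the same choice could be made in code by picking the entry with the highest `Accuracy` property. Everything below is an assumption for illustration: `pick_best` is a hypothetical helper, the records are plain dictionaries standing in for registered models, `heart_failure_hyperdrive` is a made-up name, and the accuracy values are placeholders, not results from this project. Property values are converted with `float()` in case they are stored as strings.

```python
def pick_best(models, metric="Accuracy"):
    """Return the record with the highest value for `metric`.

    `models` is a list of dicts with 'name' and 'properties' keys
    (a stand-in for registered models, for illustration only).
    """
    scored = [m for m in models if metric in m["properties"]]
    if not scored:
        raise ValueError(f"no model has a '{metric}' property")
    # Convert to float because property values may be stored as strings.
    return max(scored, key=lambda m: float(m["properties"][metric]))


# Placeholder records; names and numbers are illustrative, not real results.
candidates = [
    {"name": "heart_failure_automl", "properties": {"Accuracy": "0.87"}},
    {"name": "heart_failure_hyperdrive", "properties": {"Accuracy": "0.80"}},
]

best = pick_best(candidates)
print(best["name"])
```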

In [None]:
# Downloading the environment and scoring files
best_autoML_run.download_file('outputs/conda_env_v_1_0_0.yml', 'envFile.yml')
best_autoML_run.download_file('outputs/scoring_file_v_1_0_0.py', 'scoreScript.py')

In [None]:
inference_config = InferenceConfig(entry_script='scoreScript.py',environment=best_autoML_run.get_environment())

# Deploying the model
deployment_config = AciWebservice.deploy_configuration(cpu_cores = 1, memory_gb = 1)
service = Model.deploy(ws, 
                       "heart_failure_classify_service", 
                       [best_automl_model], 
                       inference_config, 
                       deployment_config)

service.wait_for_deployment(show_output = True)
print(service.state)
print(service.scoring_uri)
print(service.swagger_uri)

In the cell below, send a request to the web service you deployed to test it.

In [None]:
# Creating Test data and labels 
test_df = data.to_pandas_dataframe().dropna().sample(2)
label_df = test_df.pop("DEATH_EVENT")

test_data = json.dumps({'data': test_df.to_dict(orient='records')})
print(test_data)

In [None]:
# Requesting webservice to get response

response = requests.post(service.scoring_uri, test_data, headers = {'Content-type': 'application/json'})
print(response.text)
print(label_df)
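
The cell above prints the raw response next to the true labels. A small sketch of how the two could be compared programmatically follows. It assumes the scoring script returns JSON with a `result` list, and that the body may be JSON-encoded twice (a string containing JSON). Both are assumptions about the generated scoring file, so check `scoreScript.py` before relying on them. The helper names `parse_predictions` and `count_matches` are hypothetical.

```python
import json


def parse_predictions(response_text):
    """Extract the list of predictions from a scoring response body.

    Assumes the payload has a 'result' key; handles the case where the
    body is a JSON string that itself contains JSON.
    """
    payload = json.loads(response_text)
    if isinstance(payload, str):
        payload = json.loads(payload)
    return payload["result"]


def count_matches(predictions, labels):
    """Count how many predictions equal the corresponding true label."""
    return sum(1 for p, y in zip(predictions, labels) if p == y)


# Illustrative response body and labels, not output from the real service.
example_body = json.dumps({"result": [1, 0]})
predictions = parse_predictions(example_body)
print(count_matches(predictions, [1, 1]), "of", len(predictions), "match")
```

In the notebook, `response.text` would take the place of `example_body` and `label_df.tolist()` the place of the label list.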

In the cell below, print the logs of the web service and delete the service.

In [None]:
print(service.get_logs())

In [None]:
service.delete()

**Submission Checklist**
- I have registered the model.
- I have deployed the model with the best accuracy as a webservice.
- I have tested the webservice by sending a request to the model endpoint.
- I have deleted the webservice and shut down all the compute resources that I have used.
- I have taken a screenshot showing the model endpoint as active.
- The project includes a file containing the environment details.
