# Automated ML

Importing Dependencies

In [1]:
from azureml.core.workspace import Workspace
from azureml.core.dataset import Dataset
from azureml.core.experiment import Experiment
from azureml.train.automl import AutoMLConfig
from azureml.widgets import RunDetails
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Initialize workspace

In [2]:
ws = Workspace.from_config()

# Optional: Creating an AmlCompute cluster

This step is optional. If you want to create the AutoML compute cluster using the Python SDK, you can run the following code. However, once a compute instance has been started within Designer, you don't have to execute this step.

`from azureml.core.compute import AmlCompute
from azureml.core.compute import ComputeTarget
from azureml.core.compute_target import ComputeTargetException`

### Update the cluster name to match the existing cluster
### Choose a name for your CPU cluster
`amlcompute_cluster_name = "Auto-ML-Compute"`

### Verify that cluster does not exist already
`try:
    compute_target = ComputeTarget(workspace=ws, name=amlcompute_cluster_name)
    print('Found existing cluster, use it.')
except ComputeTargetException:
    compute_config = AmlCompute.provisioning_configuration(
        vm_size='STANDARD_D2_V2',  # for GPU, use "STANDARD_NC6"
        # vm_priority='lowpriority',  # optional
        max_nodes=4)
    compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, compute_config)`

`compute_target.wait_for_completion(show_output=True, min_node_count=1, timeout_in_minutes=10)`

For a more detailed view of the current AmlCompute status, use `get_status()`.

## Dataset

### Overview

This experiment is run on custom Spotify data that contains both 'liked' and 'disliked' tracks. The Spotify API describes each track with the following features: "danceability", "energy", "key", "loudness", "mode", "speechiness", "acousticness", "instrumentalness", "liveness", "valence" and "tempo".

A `TabularDataset` is then created with the `from_delimited_files()` method of `TabularDatasetFactory`, which reads the CSV file into a data structure that Azure ML can work with.

The dataset is then registered in the Azure ML workspace.

In [3]:
from azureml.data.dataset_factory import TabularDatasetFactory

auto_ml_url_path='https://raw.githubusercontent.com/Mufumi/Udacity-Capstone-Project/main/Spotify_playlist/spotify_playlist.csv'
auto_ml_ds = TabularDatasetFactory.from_delimited_files(path=auto_ml_url_path)

auto_ml_ds = auto_ml_ds.register(workspace=ws,
                                 name='auto_ml_ds',
                                 description='auto_ml experiment training data')

auto_ml_df = auto_ml_ds.to_pandas_dataframe()

In [4]:
auto_ml_df.head()

Unnamed: 0,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,liked
0,0.273,0.163,7,-15.889,1,0.0306,0.853,1e-06,0.0835,0.202,68.994,1
1,0.784,0.75,7,-6.815,1,0.0459,0.233,0.21,0.107,0.358,130.024,0
2,0.572,0.209,8,-10.413,1,0.0313,0.765,0.0,0.356,0.446,80.069,1
3,0.583,0.625,1,-8.011,1,0.277,0.506,0.0,0.196,0.516,89.812,1
4,0.716,0.712,4,-6.247,1,0.128,0.218,0.0,0.112,0.562,84.978,0
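As a quick sanity check, the class balance of the label can be inspected before training. The sketch below uses only the standard library and the five preview rows shown above (with the index column omitted), so the counts say nothing about the full dataset; the variable names are illustrative.

```python
import csv
import io
from collections import Counter

# The five preview rows printed by auto_ml_df.head(), without the index column.
preview_csv = """danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,liked
0.273,0.163,7,-15.889,1,0.0306,0.853,1e-06,0.0835,0.202,68.994,1
0.784,0.75,7,-6.815,1,0.0459,0.233,0.21,0.107,0.358,130.024,0
0.572,0.209,8,-10.413,1,0.0313,0.765,0.0,0.356,0.446,80.069,1
0.583,0.625,1,-8.011,1,0.277,0.506,0.0,0.196,0.516,89.812,1
0.716,0.712,4,-6.247,1,0.128,0.218,0.0,0.112,0.562,84.978,0
"""

rows = list(csv.DictReader(io.StringIO(preview_csv)))

# Count how often each label value occurs in the preview.
label_counts = Counter(int(row["liked"]) for row in rows)

# Every column except the label is a feature.
feature_names = [name for name in rows[0] if name != "liked"]

print(label_counts)
print(feature_names)
```

On the full DataFrame, `auto_ml_df["liked"].value_counts()` gives the same kind of summary.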


### Splitting the Data into a training and test set

In [5]:
X_train, X_test = train_test_split(auto_ml_df, test_size=0.2, random_state=15)
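To make the effect of `test_size=0.2` and a fixed `random_state` concrete, here is a minimal pure-Python sketch of a shuffled hold-out split. It is not scikit-learn's implementation and will not reproduce its exact row assignment; the helper name `holdout_split` and the 100-row stand-in data are assumptions for illustration.

```python
import random

def holdout_split(rows, test_size=0.2, seed=15):
    """Shuffle row indices with a fixed seed and hold out a fraction for testing."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = int(round(len(rows) * test_size))
    test_idx = set(indices[:n_test])
    train = [row for i, row in enumerate(rows) if i not in test_idx]
    test = [row for i, row in enumerate(rows) if i in test_idx]
    return train, test

rows = list(range(100))  # hypothetical stand-in for the dataset rows
train, test = holdout_split(rows)
train_again, test_again = holdout_split(rows)

print(len(train), len(test))
print(test == test_again)  # the same seed yields the same split
```

Because the whole DataFrame is split, both halves still contain the `liked` column, which is why `X_train` can later be passed as `training_data` together with `label_column_name="liked"`.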

# Setting up the AutoML config

`enable_early_stopping` was set to `True` so that the experiment can end early when the score stops improving. The `primary_metric` was set to `AUC_weighted`, which is appropriate for classification problems. Featurization was set to `auto` so that AutoML handles preprocessing, such as scaling, automatically, and `n_cross_validations` was set to 8.

In [6]:
automl_settings = {
    "enable_early_stopping": True,
    "primary_metric": 'AUC_weighted',
    "featurization": 'auto',
    "n_cross_validations": 8,
}
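For intuition about the primary metric: in the binary case, AUC is the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes this by direct pairwise comparison in pure Python. It is an illustration only, not AutoML's implementation, and the labels and scores are made-up toy values. Per the Azure ML documentation, `AUC_weighted` averages the per-class AUC, weighted by the number of true instances of each class.

```python
def binary_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one example of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Toy values for illustration only (not results from the experiment).
labels = [1, 0, 1, 1, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.1]
auc = binary_auc(labels, scores)

print(round(auc, 3))
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect separation of the two classes.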

`experiment_timeout_minutes` was set to 30 in order to minimize run cost, and the `task` was set to `classification`. This is because the target variable `liked` takes one of two values: **1** (liked) or **0** (disliked).

In [7]:

automl_config = AutoMLConfig(
    experiment_timeout_minutes=30,
    task="classification",
    training_data=X_train,
    label_column_name="liked",
    **automl_settings)

In [8]:
# Creating the experiment

auto_ml_experiment=Experiment(ws,"auto_ml_experiment")

# Submitting the automl run
auto_ml_run=auto_ml_experiment.submit(config=automl_config,show_output=True)

No run_configuration provided, running on local with default configuration
Running in the active local environment.


Experiment,Id,Type,Status,Details Page,Docs Page
auto_ml_experiment,AutoML_808d8d01-7318-463f-accb-7df463e309cb,automl,Preparing,Link to Azure Machine Learning studio,Link to Documentation


Current status: DatasetEvaluation. Gathering dataset statistics.
Current status: FeaturesGeneration. Generating features for the dataset.
Current status: DatasetFeaturization. Beginning to fit featurizers and featurize the dataset.
Current status: DatasetFeaturizationCompleted. Completed fit featurizers and featurizing the dataset.
Current status: DatasetCrossValidationSplit. Generating individually featurized CV splits.

****************************************************************************************************
DATA GUARDRAILS: 

TYPE:         Class balancing detection
STATUS:       PASSED
DESCRIPTION:  Your inputs were analyzed, and all classes are balanced in your training data.
              Learn more about imbalanced data: https://aka.ms/AutomatedMLImbalancedData

****************************************************************************************************

TYPE:         Missing feature values imputation
STATUS:       PASSED
DESCRIPTION:  No feature missing values

## Run Details

The `RunDetails` widget shows the progress and metrics of the individual child runs of the experiment.

In [9]:
from azureml.widgets import RunDetails

RunDetails(auto_ml_run).show()

_AutoMLWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', 's…

## Best Model

The best run and its fitted model are retrieved from the AutoML run, checked against a sample track, and saved to disk.



In [11]:
# Retrieve the best model, test it on a sample track, and save it
import joblib
import pandas as pd

best_run,best_model = auto_ml_run.get_output()

test_data = {
    'danceability':0.273,
    'energy':0.163,
    'key':7,
    'loudness':-15.889,
    'mode':1,
    'speechiness':0.0306,
    'acousticness':0.853,
    'instrumentalness':1.01e-06,
    'liveness':0.0835,
    'valence':0.202,
    'tempo':68.994
}

# Convert data to dataframe and check if model is operational. The track specifications for the above track are ones I like. 
# Expecting model to output '1'
test_data_df=pd.DataFrame([test_data])
print("Based on the sampled data provided, the prediction is {}".format(best_model.predict(test_data_df)),'\n \n', 'The best run\'s details are: \n ',best_run, '\n','The best models\'s details are: \n \n',best_model)

joblib.dump(value=best_model,filename='auto-ml-best_run.pkl')

Based on the sampled data provided, the prediction is [1] 
 
 The best run's details are: 
  Run(Experiment: auto_ml_experiment,
Id: AutoML_808d8d01-7318-463f-accb-7df463e309cb_30,
Type: None,
Status: Completed) 
 The best models's details are: 
 
 Pipeline(memory=None,
         steps=[('datatransformer',
                 DataTransformer(enable_dnn=False, enable_feature_sweeping=True, feature_sweeping_config={}, feature_sweeping_timeout=86400, featurization_config=None, force_text_dnn=False, is_cross_validation=True, is_onnx_compatible=False, observer=None, task='classification', working_dir='/mnt/batch/tasks/shared/LS_root/mount...
), random_state=None))], verbose=False)), ('8', Pipeline(memory=None, steps=[('robustscaler', RobustScaler(copy=True, quantile_range=[25, 75], with_centering=False, with_scaling=True)), ('kneighborsclassifier', KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='manhattan', metric_params=None, n_jobs=1, n_neighbors=22, p=2, weights='distance'))], v

['auto-ml-best_run.pkl']

## Model Deployment

The first step in deployment requires us to register the model within the Azure Workspace.

In [12]:
# Register the model to deploy

model_name = best_run.properties['model_name']
description = 'AutoML model for predicting track recommendation'
tags={'area': "music", 'type': "classification"}
Auto_ML_model = auto_ml_run.register_model(model_name = model_name, 
                                  description = description, 
                                  tags = tags)

print("Model ID", auto_ml_run.model_id)

Model ID AutoML808d8d01730


# Set up environment

This step initiates the environment in which the model will be deployed. The best run's outputs include an environment file that lists the specific packages required to set up deployment.

In [13]:
from azureml.core.environment import Environment
from azureml.automl.core.shared import constants

best_run.download_file(constants.CONDA_ENV_FILE_PATH, 'AML_test_env.yml') # check if this is the correct file in the child run
model_deploy_env = Environment.from_conda_specification(name="AML_test_env", file_path="AML_test_env.yml")

Check that the model is registered in the Azure workspace by selecting the **Models** tab.

## Deploy model

In [14]:
from azureml.core.model import InferenceConfig, Model
from azureml.core import Environment
from azureml.core.webservice import AciWebservice, Webservice

## Instantiate the inference config

The `InferenceConfig` class requires an `entry_script`. We can obtain this file from the workspace by opening the experiment, selecting the most recent run, and then selecting the best child run. Under **Outputs + logs**, Azure places the trained model, the environment dependencies, the environment set-up file, and other useful files associated with the experiment.

In [15]:
best_run.download_file('outputs/scoring_file_v_1_0_0.py', 'inference/score.py')
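An Azure ML entry script exposes two functions: `init()`, called once when the container starts in order to load the model, and `run()`, called for every request. The sketch below shows only that general shape, using the standard library. It is not the generated `scoring_file_v_1_0_0.py`; the `_StandInModel` class and its threshold rule are hypothetical placeholders so that the example runs without a trained model.

```python
import json

FEATURES = ["danceability", "energy", "key", "loudness", "mode", "speechiness",
            "acousticness", "instrumentalness", "liveness", "valence", "tempo"]

class _StandInModel:
    """Hypothetical placeholder; a real script would load the registered model."""
    def predict(self, rows):
        # Arbitrary rule for demonstration only: "like" mostly acoustic tracks.
        idx = FEATURES.index("acousticness")
        return [1 if row[idx] > 0.5 else 0 for row in rows]

model = None

def init():
    """Called once at start-up; a real script deserializes the model file here."""
    global model
    model = _StandInModel()

def run(raw_data):
    """Called per request with the JSON body; returns a JSON string."""
    try:
        records = json.loads(raw_data)["data"]
        rows = [[record[name] for name in FEATURES] for record in records]
        return json.dumps({"result": model.predict(rows)})
    except Exception as exc:  # report the error to the caller instead of crashing
        return json.dumps({"error": str(exc)})

init()
# Sample track taken from the test_data dictionary used earlier in this notebook.
request_body = json.dumps({"data": [{
    "danceability": 0.273, "energy": 0.163, "key": 7, "loudness": -15.889,
    "mode": 1, "speechiness": 0.0306, "acousticness": 0.853,
    "instrumentalness": 1.01e-06, "liveness": 0.0835, "valence": 0.202,
    "tempo": 68.994,
}]})
print(run(request_body))
```

Returning errors as JSON rather than raising keeps the failure visible in the HTTP response, which can help when debugging a deployed service.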

In [16]:
# Combine scoring script & environment in Inference configuration
# Ensure that no duplicate models are registered in workspace

inference_config = InferenceConfig(entry_script="./inference/score.py",
                                   environment=model_deploy_env)

# Set deployment configuration
aci_config = AciWebservice.deploy_configuration(cpu_cores = 1, 
                                               memory_gb = 1, 
                                               description = 'Spotify tracklist classification service')

# Define the model, inference, & deployment configuration and web service name and location to deploy
service = Model.deploy(workspace = ws,
                       name = "my-web-service",
                       models = [Auto_ML_model],
                       inference_config = inference_config,
                       deployment_config = aci_config)

In [17]:
service.wait_for_deployment(show_output=True)

Tips: You can try get_logs(): https://aka.ms/debugimage#dockerlog or local deployment: https://aka.ms/debugimage#debug-locally to debug if deployment takes longer than 10 minutes.
Running
2021-10-06 13:42:34+00:00 Creating Container Registry if not exists..
2021-10-06 13:52:35+00:00 Registering the environment.
2021-10-06 13:52:36+00:00 Building image..
2021-10-06 14:03:53+00:00 Generating deployment configuration.
2021-10-06 14:03:54+00:00 Submitting deployment to compute..
2021-10-06 14:03:59+00:00 Checking the status of deployment my-web-service..
2021-10-06 14:07:42+00:00 Checking the status of inference endpoint my-web-service.
Succeeded
ACI service creation operation finished, operation "Succeeded"
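The timestamps in the deployment log indicate where the time went. The sketch below, using only the standard library and the log lines above, computes how long each step lasted before the next one was logged. It assumes each timestamp marks the start of a step; the variable names are illustrative.

```python
from datetime import datetime

log_lines = [
    "2021-10-06 13:42:34+00:00 Creating Container Registry if not exists..",
    "2021-10-06 13:52:35+00:00 Registering the environment.",
    "2021-10-06 13:52:36+00:00 Building image..",
    "2021-10-06 14:03:53+00:00 Generating deployment configuration.",
    "2021-10-06 14:03:54+00:00 Submitting deployment to compute..",
    "2021-10-06 14:03:59+00:00 Checking the status of deployment my-web-service..",
    "2021-10-06 14:07:42+00:00 Checking the status of inference endpoint my-web-service.",
]

# Split each line into its timestamp and its message.
events = []
for line in log_lines:
    date_part, time_part, message = line.split(" ", 2)
    events.append((datetime.fromisoformat(f"{date_part} {time_part}"), message))

# Duration of a step = time until the next log line appears.
durations = [
    (message, (next_ts - ts).total_seconds())
    for (ts, message), (next_ts, _) in zip(events, events[1:])
]
total_seconds = (events[-1][0] - events[0][0]).total_seconds()

for message, seconds in durations:
    print(f"{seconds:7.0f} s  {message}")
print(f"{total_seconds:7.0f} s  total")
```

In this run, most of the roughly 25 minutes was spent creating the container registry and building the image.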


In [18]:
print(service)

AciWebservice(workspace=Workspace.create(name='quick-starts-ws-160175', subscription_id='3e42d11f-d64d-4173-af9b-12ecaa1030b3', resource_group='aml-quickstarts-160175'), name=my-web-service, image_id=None, compute_type=None, state=ACI, scoring_uri=Healthy, tags=http://fa52c0d3-9bc5-424d-9adb-88cee0d4c9f4.southcentralus.azurecontainer.io/score, properties={}, created_by={'hasInferenceSchema': 'True', 'hasHttps': 'False'})


The deployed web service is tested by sending it a request with the features of a single track.

In [19]:
import requests
import json

uri = service.scoring_uri
key=''

data={
"data": [
    {
    "danceability":0.724,
    "energy":0.6,
    "key":1,
    "loudness":-6.25,
    "mode":0,
    "speechiness":0.087,
    "acousticness":0.28,
    "instrumentalness":6.83e-05,
    "liveness":0.108,
    "valence":0.201,
    "tempo":164.037
    }
  ],
    "method":"predict" 
}

input_data = json.dumps(data)

headers = {"Content-Type": "application/json"}
# Only send an Authorization header when a key is set (key is empty above)
if key:
    headers['Authorization'] = f'Bearer {key}'

response = requests.post(uri, data=input_data, headers=headers)

print(response.text)

"{\"result\": [0]}"

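The response body printed above is a JSON string that itself contains JSON, so it has to be decoded twice to reach the prediction. A minimal sketch using the exact text shown above, with `response_text` standing in for `response.text`:

```python
import json

# The body returned by the service, copied from the output above.
response_text = '"{\\"result\\": [0]}"'

inner_json = json.loads(response_text)  # first pass yields the inner JSON string
payload = json.loads(inner_json)        # second pass yields the dictionary
prediction = payload["result"][0]

print(prediction)
```

A prediction of 0 corresponds to a track that is not liked.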

Finally, the logs of the web service are printed and the service is deleted.

In [20]:
print(service.get_logs())

2021-10-06T14:07:34,400764600+00:00 - iot-server/run 
2021-10-06T14:07:34,413791800+00:00 - rsyslog/run 
2021-10-06T14:07:34,426981300+00:00 - nginx/run 
2021-10-06T14:07:34,425476300+00:00 - gunicorn/run 
Dynamic Python package installation is disabled.
Starting HTTP server
rsyslogd: /azureml-envs/azureml_09394784cb739a2b6e5ee3d577bc034d/lib/libuuid.so.1: no version information available (required by rsyslogd)
EdgeHubConnectionString and IOTEDGE_IOTHUBHOSTNAME are not set. Exiting...
2021-10-06T14:07:34,941420200+00:00 - iot-server/finish 1 0
2021-10-06T14:07:34,943406900+00:00 - Exit code 1 is normal. Not restarting iot-server.
Starting gunicorn 20.1.0
Listening at: http://127.0.0.1:31311 (68)
Using worker: sync
worker timeout is set to 300
Booting worker with pid: 99
SPARK_HOME not set. Skipping PySpark Initialization.
Generating new fontManager, this may take some time...
Initializing logger
2021-10-06 14:07:37,826 | root | INFO | Starting up app insights client
logging socket was 

In [22]:
service.delete()

No service with name my-web-service found to delete.
