# Automated ML

TODO: Import Dependencies. In the cell below, import all the dependencies that you will need to complete the project.

In [130]:
from azureml.core import Workspace,Experiment
import pandas as pd
import os
import sklearn

## Dataset

### Overview
TODO: In this markdown cell, give an overview of the dataset you are using. Also mention the task you will be performing.


TODO: Get data. In the cell below, write code to access the data you will be using in this project. Remember that the dataset needs to be external.

In [2]:
from azureml.data.dataset_factory import TabularDatasetFactory
from azureml.train.automl import AutoMLConfig

# This experiment is run on custom Spotify data that includes a list of 'liked' tracks as well as disliked tracks. The Spotify API featurizes tracks according to "danceability","energy","key","loudness","mode","speechiness","acousticness","instrumentalness","liveness","valence","tempo"
# A TabularDataset is then created using TabularDatasetFactory using the 'from_delimited_files()' method to pass a csv into a data structure Azure can work with

auto_ml_url_path='https://raw.githubusercontent.com/Mufumi/Udacity-Capstone-Project/main/Spotify_playlist/spotify_playlist.csv'
auto_ml_ds = TabularDatasetFactory.from_delimited_files(path=auto_ml_url_path)

In [3]:
# TODO: Put your automl settings here
# automl_settings = {}

# TODO: Put your automl config here

# Set parameters for AutoMLConfig
# NOTE: DO NOT CHANGE THE experiment_timeout_minutes PARAMETER OR YOUR INSTANCE WILL TIME OUT.
# If you wish to run the experiment longer, you will need to run this notebook in your own
# Azure tenant, which will incur personal costs.
automl_config = AutoMLConfig(
    experiment_timeout_minutes=30,
    task="classification",
    primary_metric="accuracy",
    training_data=auto_ml_ds,
    label_column_name="liked",
    n_cross_validations=8)

In [4]:
ws = Workspace.from_config()

# TODO: Submit your experiment
auto_ml_experiment=Experiment(ws,"auto_ml_experiment")

# Submit your automl run
auto_ml_run=auto_ml_experiment.submit(config=automl_config,show_output=True)

No run_configuration provided, running on local with default configuration
Running in the active local environment.


Experiment,Id,Type,Status,Details Page,Docs Page
auto_ml_experiment,AutoML_19b2c7bc-4f1a-47b9-8e97-63d12892f11f,automl,Preparing,Link to Azure Machine Learning studio,Link to Documentation


Current status: DatasetEvaluation. Gathering dataset statistics.
Current status: FeaturesGeneration. Generating features for the dataset.
Current status: DatasetFeaturization. Beginning to fit featurizers and featurize the dataset.
Current status: DatasetFeaturizationCompleted. Completed fit featurizers and featurizing the dataset.
Current status: DatasetCrossValidationSplit. Generating individually featurized CV splits.

****************************************************************************************************
DATA GUARDRAILS: 

TYPE:         Class balancing detection
STATUS:       PASSED
DESCRIPTION:  Your inputs were analyzed, and all classes are balanced in your training data.
              Learn more about imbalanced data: https://aka.ms/AutomatedMLImbalancedData

****************************************************************************************************

TYPE:         Missing feature values imputation
STATUS:       PASSED
DESCRIPTION:  No feature missing values

## AutoML Configuration

TODO: Explain why you chose the automl settings and cofiguration you used below.



## Run Details

OPTIONAL: Write about the different models trained and their performance. Why do you think some models did better than others?

TODO: In the cell below, use the `RunDetails` widget to show the different experiments.

In [5]:
from azureml.widgets import RunDetails

RunDetails(auto_ml_run).show()

_AutoMLWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', 's…

_UserRunWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', '…

_UserRunWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', '…

## Best Model

TODO: In the cell below, get the best model from the automl experiments and display all the properties of the model.



In [96]:
#TODO: Save the best model
import joblib

best_run,best_model = auto_ml_run.get_output()

test_data = {
    'danceability':0.273,
    'energy':0.163,
    'key':7,
    'loudness':-15.889,
    'mode':1,
    'speechiness':0.0306,
    'acousticness':0.853,
    'instrumentalness':1.01e-06,
    'liveness':0.0835,
    'valence':0.202,
    'tempo':68.994
}

# Convert data to dataframe and check if model is operational. The track specifications for the above track are ones I like. 
# Expecting model to output '1'
test_data_df=pd.DataFrame([test_data])
print(best_model.predict(test_data_df))

joblib.dump(value=best_model,filename='auto-ml-best_run.pkl')

[1]


['auto-ml-best_run.pkl']

## Model Deployment

Remember you have to deploy only one of the two models you trained.. Perform the steps in the rest of this notebook only if you wish to deploy this model.

TODO: In the cell below, register the model, create an inference config and deploy the model as a web service.

In [120]:
from azureml.core.model import InferenceConfig, Model
from azureml.core import Environment
from azureml.core.webservice import AciWebservice, Webservice
from azureml.core.webservice import LocalWebservice

# Register the model to deploy
model_name = best_run.properties['model_name']
description = 'AutoML model for predicting track recommendation'
tags={'area': "music", 'type': "classification"}
Auto_ML_model = Model.register(model_name = model_name, 
                                  description = description, 
                                  tags = tags,model_path='auto-ml-best_run.pkl',workspace=ws)

Registering model AutoML19b2c7bc439


INFO:interpret_community.common.explanation_utils:Using default datastore for uploads
INFO:interpret_community.common.explanation_utils:Using default datastore for uploads


In [121]:
# Managing directories and dependencies

source_directory = "source_directory"

os.makedirs(source_directory, exist_ok=True)
os.makedirs(os.path.join(source_directory, "dependencies/target"), exist_ok=True)
os.makedirs(os.path.join(source_directory, "env"), exist_ok=True)
os.makedirs(os.path.join(source_directory, "dockerstep"), exist_ok=True)

## Instantiate test data

Magic line method '%%writefile' must be first entry of cell

In [122]:
%%writefile source_directory/testdata.json

{
    "danceability":0.724,
    "energy":0.6,
    "key":1,
    "loudness":-6.25,
    "mode":0,
    "speechiness":0.087,
    "acousticness":0.28,
    "instrumentalness":6.83e-05,
    "liveness":0.108,
    "valence":0.201,
    "tempo":164.037
}

Overwriting source_directory/testdata.json


## Instantiate scoring script

Magic line method '%%writefile' must be first entry of cell

In [123]:
!pip install inference-schema



# Testing if saved model responds to predicting data

`import joblib
import json`

`model_path = './auto-ml-best_run.pkl
model = joblib.load(model_path)`

**Check if model is working**

`with open ('./source_directory/testdata.json','r') as f:
    loaded_data = json.loads(f.read())`

`loaded_data_df=pd.DataFrame([loaded_data])`

**Check if model can predict loaded data**
 
 `print(model.predict(loaded_data_df))`

In [137]:
%%writefile ./source_directory/dependencies/target/score.py
import joblib
import json
import numpy as np

from inference_schema.schema_decorators import input_schema, output_schema
from inference_schema.parameter_types.numpy_parameter_type import NumpyParameterType

def init():
    global model
    # AZUREML_MODEL_DIR is an environment variable created during deployment. Join this path with the filename of the model file.
    # It holds the path to the directory that contains the deployed model (./azureml-models/$MODEL_NAME/$VERSION)
    # If there are multiple models, this value is the path to the directory containing all deployed models (./azureml-models)
    model_path = './auto-ml-best_run.pkl'#os.path.join(os.getenv('AZUREML_MODEL_DIR'), 'best_run_hd.pkl')
    # Deserialize the model file back into a sklearn model.
    model = joblib.load(model_path)

    # Note here, the entire source directory from inference config gets added into image.
    # Below is an example of how you can use any extra files in image.
    with open('./source_directory/testdata.json') as json_file:
        loaded_data = json.load(json_file)

input_sample = {
    "danceability":0.724,
    "energy":0.6,
    "key":1,
    "loudness":-6.25,
    "mode":0,
    "speechiness":0.087,
    "acousticness":0.28,
    "instrumentalness":6.83e-05,
    "liveness":0.108,
    "valence":0.201,
    "tempo":164.037
}
output_sample = 0

@input_schema('data', NumpyParameterType(input_sample))
@output_schema(NumpyParameterType(output_sample))
def run(loaded_data):
    try:
        loaded_data_df=pd.DataFrame([loaded_data])
        result = model.predict(loaded_data_df)
        prediction = 'Disliked'
        if result==0:
            prediction=prediction
        else:
            prediction='Liked'
        # You can return any JSON-serializable object.
        return "The track has features that you " + prediction
    except Exception as e:
        error = str(e)
        return error

Writing ./source_directory/dependencies/target/score.py


In [127]:
print(type(Auto_ML_model))

<class 'azureml.core.model.Model'>


In [138]:
# Combine scoring script & environment in Inference configuration

AML_test_env = Environment(name="project-environment")

AML_test_env.python.conda_dependencies.add_pip_package("inference-schema[numpy-support]")
AML_test_env.python.conda_dependencies.add_pip_package("joblib")
AML_test_env.python.conda_dependencies.add_pip_package("scikit-learn=={}".format(sklearn.__version__))

# explicitly set base_image to None when setting base_dockerfile
AML_test_env.docker.base_image = None
AML_test_env.docker.base_dockerfile = "FROM mcr.microsoft.com/azureml/base:intelmpi2018.3-ubuntu16.04\nRUN echo \"this is test\""
AML_test_env.inferencing_stack_version = "latest"

inference_config = InferenceConfig(entry_script="./source_directory/dependencies/target/score.py",
                                   environment=AML_test_env)

# Set deployment configuration
deployment_config = LocalWebservice.deploy_configuration(port=6789)

# Define the model, inference, & deployment configuration and web service name and location to deploy
service = Model.deploy(workspace = ws,
                       name = "my-web-service",
                       models = [Auto_ML_model],
                       inference_config = inference_config,
                       deployment_config = deployment_config)

Downloading model AutoML19b2c7bc439:2 to /tmp/azureml_lgwqd7n9/AutoML19b2c7bc439/2
Generating Docker build context.


INFO:interpret_community.common.explanation_utils:Using default datastore for uploads
INFO:interpret_community.common.explanation_utils:Using default datastore for uploads


Package creation Succeeded
Logging into Docker registry 158808ea80d846478b54f6e9925056b8.azurecr.io
Logging into Docker registry 158808ea80d846478b54f6e9925056b8.azurecr.io
Building Docker image from Dockerfile...
Step 1/5 : FROM 158808ea80d846478b54f6e9925056b8.azurecr.io/azureml/azureml_3e92eb2a869925c317eb4b6ed4b5e2a4
 ---> 5b6aad60a70f
Step 2/5 : COPY azureml-app /var/azureml-app
 ---> dc2a81abb381
Step 3/5 : RUN mkdir -p '/var/azureml-app' && echo eyJhY2NvdW50Q29udGV4dCI6eyJzdWJzY3JpcHRpb25JZCI6IjlhNzUxMWI4LTE1MGYtNGE1OC04NTI4LTNlN2Q1MDIxNmMzMSIsInJlc291cmNlR3JvdXBOYW1lIjoiYW1sLXF1aWNrc3RhcnRzLTE1OTI4NSIsImFjY291bnROYW1lIjoicXVpY2stc3RhcnRzLXdzLTE1OTI4NSIsIndvcmtzcGFjZUlkIjoiMTU4ODA4ZWEtODBkOC00NjQ3LThiNTQtZjZlOTkyNTA1NmI4In0sIm1vZGVscyI6e30sIm1vZGVsc0luZm8iOnt9fQ== | base64 --decode > /var/azureml-app/model_config_map.json
 ---> Running in e37a789d70cb
 ---> 083282a7ab5d
Step 4/5 : RUN mv '/var/azureml-app/tmp3f_74o0w.py' /var/azureml-app/main.py
 ---> Running in 4f63fe492603
 --

In [139]:
print(service)

LocalWebservice(workspace=Workspace.create(name='quick-starts-ws-159285', subscription_id='9a7511b8-150f-4a58-8528-3e7d50216c31', resource_group='aml-quickstarts-159285'), name=my-web-service, image_id=None, compute_type=None, state=Local, scoring_uri=deploying, tags=http://localhost:6789/score, properties=None, created_by=None)


TODO: In the cell below, send a request to the web service you deployed to test it.

In [140]:
import requests
import json

uri = service.scoring_uri
requests.get("http://localhost:6789")
headers = {"Content-Type": "application/json"}
data = {
    
    "danceability":0.724,
    "energy":0.6,
    "key":1,
    "loudness":-6.25,
    "mode":0,
    "speechiness":0.087,
    "acousticness":0.28,
    "instrumentalness":6.83e-05,
    "liveness":0.108,
    "valence":0.201,
    "tempo":164.037
}
data = json.dumps(data)
response = requests.post(uri, data=data, headers=headers)

print(response.json())

ConnectionError: HTTPConnectionPool(host='localhost', port=6789): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f0ec7a90470>: Failed to establish a new connection: [Errno 111] Connection refused',))

INFO:interpret_community.common.explanation_utils:Using default datastore for uploads
INFO:interpret_community.common.explanation_utils:Using default datastore for uploads


TODO: In the cell below, print the logs of the web service and delete the service

In [21]:
print(service.get_logs())

2021-09-17T13:53:38,963495688+00:00 - gunicorn/run 
Dynamic Python package installation is disabled.
Starting HTTP server
2021-09-17T13:53:38,964705385+00:00 - rsyslog/run 
2021-09-17T13:53:38,967305078+00:00 - nginx/run 
2021-09-17T13:53:38,970254370+00:00 - iot-server/run 
EdgeHubConnectionString and IOTEDGE_IOTHUBHOSTNAME are not set. Exiting...
2021-09-17T13:53:39,065720959+00:00 - iot-server/finish 1 0
2021-09-17T13:53:39,067174461+00:00 - Exit code 1 is normal. Not restarting iot-server.
Starting gunicorn 20.1.0
Listening at: http://127.0.0.1:31311 (12)
Using worker: sync
worker timeout is set to 300
Booting worker with pid: 41
SPARK_HOME not set. Skipping PySpark Initialization.
Initializing logger
2021-09-17 13:53:39,438 | root | INFO | Starting up app insights client
logging socket was found. logging is available.
logging socket was found. logging is available.
2021-09-17 13:53:39,438 | root | INFO | Starting up request id generator
2021-09-17 13:53:39,438 | root | INFO | Star

In [None]:
service.delete()
model.delete()