# The "Azure ML SDK" for Email Spam Inference 

## Introduction

This notebook uses the Azure ML SDK to train an email spam classifier with AutoML, deploy the best model as a web service, and consume the resulting endpoint.


Steps:

1. Connect to an existing workspace and create an experiment in it.
2. Create a Compute cluster.
3. Load the dataset.
4. Configure AutoML using AutoMLConfig.
5. Run the AutoML experiment.
6. Explore the results and get the best model.
7. Register the best model.
8. Deploy the best model.
9. Consume the endpoint.

## Azure Machine Learning SDK-specific imports

In [1]:
from azureml.core import Workspace, Experiment
from azureml.core.compute import AmlCompute
from azureml.train.automl import AutoMLConfig
from azureml.widgets import RunDetails
from azureml.core.model import InferenceConfig, Model
from azureml.core.webservice import AciWebservice

## Initialize Workspace
Initialize a workspace object from the persisted configuration. Make sure the config file is present at `.\config.json`.

In [2]:
ws = Workspace.from_config()
print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\n')

nahmed30-azureml-workspace
epe-poc-nazeer
centralus
16bc73b5-82be-47f2-b5ab-f2373344794c
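
`Workspace.from_config()` reads the workspace coordinates from `config.json`. As a minimal sketch, assuming the file holds the three keys found in the version downloaded from the Azure portal (`subscription_id`, `resource_group`, `workspace_name`), the hypothetical helper `check_workspace_config` below reports which of them are missing before the SDK is called. The helper name and the placeholder values are illustrative only.

```python
import json
import os
import tempfile

# Keys found in the config.json downloaded from the Azure portal.
REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def check_workspace_config(path):
    """Return the required keys that are missing or empty in the config file."""
    with open(path, encoding="utf-8") as f:
        config = json.load(f)
    return [key for key in REQUIRED_KEYS if not config.get(key)]


# Demonstration with a throwaway file and placeholder values (not real IDs).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "config.json")
    with open(demo_path, "w", encoding="utf-8") as f:
        json.dump({"subscription_id": "<subscription-id>",
                   "resource_group": "<resource-group>"}, f)
    missing = check_workspace_config(demo_path)

print("Missing keys:", missing)
```

An empty list means the file has everything `Workspace.from_config()` needs to locate the workspace.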


## Create an Azure ML experiment

Let's create an experiment named 'emailspam-aml-experiment-v1' in the workspace we just initialized.

In [3]:
experiment_name = 'emailspam-aml-experiment-v1'
experiment = Experiment(ws, experiment_name)
experiment

Name,Workspace,Report Page,Docs Page
emailspam-aml-experiment-v1,nahmed30-azureml-workspace,Link to Azure Machine Learning studio,Link to Documentation


## Create a Compute Cluster
You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/concept-azure-machine-learning-architecture#compute-target) for your AutoML run.

In [4]:
from azureml.core.compute_target import ComputeTargetException

aml_name = "cpu-cluster"
try:
    # Reuse the cluster if it already exists in the workspace.
    aml_compute = AmlCompute(ws, aml_name)
    print('Found existing AML compute context.')
except ComputeTargetException:
    # Otherwise provision a new cluster and wait until it is ready.
    print('Creating new AML compute context.')
    aml_config = AmlCompute.provisioning_configuration(vm_size="Standard_D2_v2", min_nodes=1, max_nodes=3)
    aml_compute = AmlCompute.create(ws, name=aml_name, provisioning_configuration=aml_config)
    aml_compute.wait_for_completion(show_output=True)

cts = ws.compute_targets
compute_target = cts[aml_name]

Found existing AML compute context.


## Data
Make sure the dataset is registered in the Azure ML workspace and that `key` matches its registered name.

In [5]:
key = 'UdacityPrjEmailSpamDataSet'
dataset = ws.datasets[key]
df = dataset.to_pandas_dataframe()
df.describe()

Unnamed: 0,v1,v2,Column3,Column4,Column5
count,5572,5572,50,12,6
unique,2,5169,43,10,5
top,ham,"Sorry, I'll call later","bt not his girlfrnd... G o o d n i g h t . . .@""","MK17 92H. 450Ppw 16""","GNT:-)"""
freq,4825,30,3,2,2
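
The summary above shows two properties of the data that are worth keeping in mind: the label `v1` is imbalanced (`ham` appears 4825 times out of 5572), and `Column3` to `Column5` are almost entirely empty. A quick sketch of the arithmetic, using only the counts printed by `df.describe()`:

```python
# Counts copied from the df.describe() output above.
total_rows = 5572
ham_count = 4825
non_null = {"Column3": 50, "Column4": 12, "Column5": 6}

# Accuracy of a trivial model that always predicts the majority class.
majority_baseline = ham_count / total_rows

# Share of rows that actually carry a value in each extra column.
fill_ratio = {name: count / total_rows for name, count in non_null.items()}

print(f"Majority-class baseline accuracy: {majority_baseline:.3f}")
for name, ratio in fill_ratio.items():
    print(f"{name}: {ratio:.2%} of rows populated")
```

A classifier has to beat roughly 0.866 accuracy to do better than always answering `ham`, which is useful context when reading the AutoML scores.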


## AutoML Configuration

The available primary metrics are described at https://docs.microsoft.com/en-us/azure/machine-learning/how-to-configure-auto-train#primary-metric

In [6]:
automl_settings = {
    "experiment_timeout_minutes": 20,
    "max_concurrent_iterations": 3,
    "primary_metric" : 'accuracy'
}

automl_config = AutoMLConfig(compute_target=compute_target,
                             task = "classification",
                             training_data=dataset,
                             label_column_name="v1",
                             enable_early_stopping= True,
                             featurization= 'auto',
                             debug_log = "emailspam_automl_errors.log",
                             **automl_settings
                            )
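
The `primary_metric` setting decides which score AutoML optimizes. For classification tasks the SDK accepts a small fixed set of names. The sketch below, built around a hypothetical helper `validate_primary_metric`, checks a settings dictionary against that set before a run is submitted. The metric list reflects the documentation linked in this section and may differ between SDK versions, so treat it as an assumption.

```python
# Classification primary metrics listed in the AutoML documentation
# (assumed current for SDK v1; verify against your installed version).
CLASSIFICATION_PRIMARY_METRICS = {
    "accuracy",
    "AUC_weighted",
    "average_precision_score_weighted",
    "norm_macro_recall",
    "precision_score_weighted",
}


def validate_primary_metric(settings):
    """Raise ValueError if settings['primary_metric'] is not a known name."""
    metric = settings.get("primary_metric")
    if metric not in CLASSIFICATION_PRIMARY_METRICS:
        raise ValueError(
            f"Unknown primary_metric {metric!r}; expected one of "
            f"{sorted(CLASSIFICATION_PRIMARY_METRICS)}"
        )
    return metric


# Same values as automl_settings above, under a separate name.
example_settings = {
    "experiment_timeout_minutes": 20,
    "max_concurrent_iterations": 3,
    "primary_metric": "accuracy",
}
print(validate_primary_metric(example_settings))
```

Because the label is imbalanced, a metric such as `AUC_weighted` or `norm_macro_recall` may be worth trying in place of `accuracy`. That is a suggestion, not something this notebook evaluates.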

## AutoML Run

In [7]:
remote_run = experiment.submit(automl_config)

Submitting remote run.


Experiment,Id,Type,Status,Details Page,Docs Page
emailspam-aml-experiment-v1,AutoML_5caf5974-b58b-4f6b-9677-df37a90a5051,automl,NotStarted,Link to Azure Machine Learning studio,Link to Documentation


In [8]:
RunDetails(remote_run).show()
remote_run.wait_for_completion(show_output=True)



Current status: FeaturesGeneration. Generating features for the dataset.
Current status: DatasetCrossValidationSplit. Generating individually featurized CV splits.
Current status: ModelSelection. Beginning model selection.

********************************************************************************************
DATA GUARDRAILS: 

TYPE:         Cross validation
STATUS:       DONE
DESCRIPTION:  In order to accurately evaluate the model(s) trained by AutoML, we leverage a dataset that the model is not trained on. Hence, if the user doesn't provide an explicit validation dataset, a part of the training dataset is used to achieve this. For smaller datasets (fewer than 20,000 samples), cross-validation is leveraged, else a single hold-out set is split from the training data to serve as the validation dataset. Hence, for your input data we leverage cross-validation with 10 folds, if the number of training samples are fewer than 1000, and 3 folds in all other cases.
              Learn mo
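
The guardrail message above describes how AutoML picks a validation strategy when no validation data is supplied. The sketch below encodes only the thresholds quoted in that message. The function name `default_validation_strategy` is hypothetical, and this is not the SDK's own code.

```python
def default_validation_strategy(n_samples):
    """Mirror the rule quoted in the data guardrails message."""
    if n_samples >= 20_000:
        # Larger datasets: a single hold-out split is used.
        return "hold-out"
    if n_samples < 1_000:
        return "10-fold cross-validation"
    return "3-fold cross-validation"


# The email spam dataset has 5572 rows (see df.describe()).
print(default_validation_strategy(5572))
```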

{'runId': 'AutoML_5caf5974-b58b-4f6b-9677-df37a90a5051',
 'target': 'cpu-cluster',
 'status': 'Completed',
 'startTimeUtc': '2022-09-02T14:06:07.787795Z',
 'endTimeUtc': '2022-09-02T14:16:06.847105Z',
 'services': {},
   'message': 'No scores improved over last 10 iterations, so experiment stopped early. This early stopping behavior can be disabled by setting enable_early_stopping = False in AutoMLConfig for notebook/python SDK runs.'}],
 'properties': {'num_iterations': '1000',
  'training_type': 'TrainFull',
  'acquisition_function': 'EI',
  'primary_metric': 'accuracy',
  'train_split': '0',
  'acquisition_parameter': '0',
  'num_cross_validation': None,
  'target': 'cpu-cluster',
  'DataPrepJsonString': '{\\"training_data\\": {\\"datasetId\\": \\"3527a22f-75c2-4ae0-81f9-28549e60c632\\"}, \\"datasets\\": 0}',
  'EnableSubsampling': None,
  'runTemplate': 'AutoML',
  'azureml.runsource': 'automl',
  'display_task_type': 'classification',
  'dependencies_versions': '{"azureml-dataprep
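
The run details report that early stopping ended the experiment, so it is reasonable to check how much of the 20-minute `experiment_timeout_minutes` budget was used. A small sketch that computes the elapsed time from the `startTimeUtc` and `endTimeUtc` values printed above:

```python
from datetime import datetime, timedelta

# Timestamps copied from the run details above.
start_time_utc = "2022-09-02T14:06:07.787795Z"
end_time_utc = "2022-09-02T14:16:06.847105Z"

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"

elapsed = (datetime.strptime(end_time_utc, TIMESTAMP_FORMAT)
           - datetime.strptime(start_time_utc, TIMESTAMP_FORMAT))
budget = timedelta(minutes=20)  # experiment_timeout_minutes from automl_settings

print(f"Elapsed: {elapsed.total_seconds():.0f} s of a {budget.total_seconds():.0f} s budget")
```

The run finished in roughly ten minutes, about half of the configured budget.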

## Save the best model

In [9]:
best_run, fitted_model = remote_run.get_output()

Package:azureml-automl-runtime, training version:1.44.0, current version:1.41.0
Package:azureml-core, training version:1.44.0, current version:1.41.0
Package:azureml-dataprep, training version:4.2.2, current version:3.1.1
Package:azureml-dataprep-rslex, training version:2.8.1, current version:2.5.2
Package:azureml-dataset-runtime, training version:1.44.0, current version:1.41.0
Package:azureml-defaults, training version:1.44.0, current version:1.41.0
Package:azureml-inference-server-http, training version:0.7.4, current version:0.4.13
Package:azureml-interpret, training version:1.44.0, current version:1.41.0
Package:azureml-mlflow, training version:1.44.0, current version:1.41.0
Package:azureml-pipeline-core, training version:1.44.0, current version:1.41.0
Package:azureml-responsibleai, training version:1.44.0, current version:1.41.0
Package:azureml-telemetry, training version:1.44.0, current version:1.41.0
Package:azureml-train-automl-client, training version:1.44.0, current version:1
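
`get_output()` prints one line per package whose locally installed version differs from the version used during training. Loading `fitted_model` with older local packages can fail or behave differently, so it is worth listing the gaps. The sketch below parses lines of that format. The helper name `parse_version_warnings` is hypothetical, and the sample lines are copied from the output above.

```python
import re

WARNING_PATTERN = re.compile(
    r"Package:(?P<name>[\w.-]+), training version:(?P<training>[\w.]+), "
    r"current version:(?P<current>[\w.]+)"
)


def parse_version_warnings(text):
    """Return {package: (training_version, current_version)} for each line."""
    return {
        m.group("name"): (m.group("training"), m.group("current"))
        for m in WARNING_PATTERN.finditer(text)
    }


sample = """\
Package:azureml-core, training version:1.44.0, current version:1.41.0
Package:azureml-dataprep, training version:4.2.2, current version:3.1.1
"""

for package, (training, current) in parse_version_warnings(sample).items():
    print(f"{package}: trained with {training}, installed {current}")
```

Upgrading the local SDK to the training versions is one way to remove the mismatch.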

In [10]:
best_run.get_properties()

{'runTemplate': 'automl_child',
 'pipeline_id': '__AutoML_Stack_Ensemble__',
 'pipeline_spec': '{"pipeline_id":"__AutoML_Stack_Ensemble__","objects":[{"module":"azureml.train.automl.stack_ensemble","class_name":"StackEnsemble","spec_class":"sklearn","param_args":[],"param_kwargs":{"automl_settings":"{\'task_type\':\'classification\',\'primary_metric\':\'accuracy\',\'verbosity\':20,\'ensemble_iterations\':15,\'is_timeseries\':False,\'name\':\'emailspam-aml-experiment-v1\',\'compute_target\':\'cpu-cluster\',\'subscription_id\':\'16bc73b5-82be-47f2-b5ab-f2373344794c\',\'region\':\'centralus\',\'spark_service\':None}","ensemble_run_id":"AutoML_5caf5974-b58b-4f6b-9677-df37a90a5051_35","experiment_name":"emailspam-aml-experiment-v1","workspace_name":"nahmed30-azureml-workspace","subscription_id":"16bc73b5-82be-47f2-b5ab-f2373344794c","resource_group_name":"epe-poc-nazeer"}}]}',
 'training_percent': '100',
 'predicted_cost': None,
 'iteration': '35',
 '_aml_system_scenario_identification': 'R

In [11]:
for child_run in remote_run.get_children():
    print(child_run,"\n")

Run(Experiment: emailspam-aml-experiment-v1,
Id: AutoML_5caf5974-b58b-4f6b-9677-df37a90a5051_35,
Type: azureml.scriptrun,
Status: Completed) 

Run(Experiment: emailspam-aml-experiment-v1,
Id: AutoML_5caf5974-b58b-4f6b-9677-df37a90a5051_34,
Type: azureml.scriptrun,
Status: Completed) 

Run(Experiment: emailspam-aml-experiment-v1,
Id: AutoML_5caf5974-b58b-4f6b-9677-df37a90a5051_33,
Type: azureml.scriptrun,
Status: Canceled) 

Run(Experiment: emailspam-aml-experiment-v1,
Id: AutoML_5caf5974-b58b-4f6b-9677-df37a90a5051_32,
Type: azureml.scriptrun,
Status: Completed) 

Run(Experiment: emailspam-aml-experiment-v1,
Id: AutoML_5caf5974-b58b-4f6b-9677-df37a90a5051_31,
Type: azureml.scriptrun,
Status: Completed) 

Run(Experiment: emailspam-aml-experiment-v1,
Id: AutoML_5caf5974-b58b-4f6b-9677-df37a90a5051_30,
Type: azureml.scriptrun,
Status: Completed) 

Run(Experiment: emailspam-aml-experiment-v1,
Id: AutoML_5caf5974-b58b-4f6b-9677-df37a90a5051_29,
Type: azureml.scriptrun,
Status: Completed) 



In [12]:
import os

os.makedirs('./outputs',exist_ok=True)

In [13]:
model_name = best_run.properties['model_name']
script_file = "./outputs/score.py"  # local path for the scoring script downloaded below
description = "aml email spam project sdk"


In [14]:
model_name

'AutoML5caf5974b35'

In [15]:
# Save the best fitted model to the local outputs folder
import joblib
joblib.dump(fitted_model, filename="outputs/automl.joblib")

['outputs/automl.joblib']

In [16]:

# Register the best model from the AutoML run in the workspace
reg_model = remote_run.register_model(model_name=model_name, description=description)


In [18]:
from azureml.automl.core.shared import constants

# Download the generated scoring script and the Conda environment of the best run
env = best_run.get_environment()
best_run.download_file('outputs/scoring_file_v_1_0_0.py', script_file)
best_run.download_file(constants.CONDA_ENV_FILE_PATH, 'env.yml')

## Deploy the Best Model

Run the following code to deploy the registered model to Azure Container Instances (ACI). You can follow the state of the deployment in Azure Machine Learning studio. This step can take a few minutes.

In [19]:
script_file

'./outputs/score.py'

In [21]:
inference_config = InferenceConfig(entry_script=script_file, environment=best_run.get_environment())

aciconfig = AciWebservice.deploy_configuration(cpu_cores = 1,
                                               memory_gb = 1,
                                               tags = {'type': "automl-email-spam-prediction"},
                                               description = 'Sample service for AutoML Email Spam Prediction')

aci_service_name = 'automl-es-sdk-v1'
aci_service = Model.deploy(ws, aci_service_name, [reg_model], inference_config, aciconfig)
aci_service.wait_for_deployment(True)
print(aci_service.state)

Tips: You can try get_logs(): https://aka.ms/debugimage#dockerlog or local deployment: https://aka.ms/debugimage#debug-locally to debug if deployment takes longer than 10 minutes.
Running
2022-09-02 14:20:54+00:00 Creating Container Registry if not exists.
2022-09-02 14:20:54+00:00 Registering the environment.
2022-09-02 14:20:54+00:00 Use the existing image.
2022-09-02 14:20:55+00:00 Submitting deployment to compute.
2022-09-02 14:20:58+00:00 Checking the status of deployment automl-es-sdk-v1..
2022-09-02 14:23:47+00:00 Checking the status of inference endpoint automl-es-sdk-v1.
Succeeded
ACI service creation operation finished, operation "Succeeded"
Healthy


## Consume the Endpoint
The sample request below sends one message to the scoring endpoint. To score several messages at once, add more records to the `data` list.

In [22]:
scoring_uri = aci_service.scoring_uri
print(scoring_uri)

http://ef32fcb5-ec8b-46e6-9666-1192bcfe1090.centralus.azurecontainer.io/score


In [23]:
import requests
import json

 
data = {
  "data": [
    {
      "v2": "Click link below to collect $10000",
      "Column4": "example_value",
      "Column5": "example_value",
      "Column6": "example_value"
    }
  ],
  "method": "predict"
}
    
# Convert to JSON string
input_data = json.dumps(data)
with open("data.json", "w") as _f:
    _f.write(input_data)

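# Set the content type
headers = {'Content-Type': 'application/json'}
# If authentication is enabled, fetch the service key and set the authorization header
# (the variable `key` defined earlier is the dataset name, not an access key)
#primary_key, _ = aci_service.get_keys()
#headers['Authorization'] = f'Bearer {primary_key}'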

# Make the request and display the response
resp = requests.post(scoring_uri, input_data, headers=headers)
print("prediction is :" , resp.json())

prediction is : {"result": ["ham"]}
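
The printed prediction uses JSON double quotes, not the single quotes of a Python dictionary, which suggests that the scoring script returns a JSON-encoded string and that `resp.json()` yields a `str`. The sketch below uses two hypothetical helpers: `build_payload` assembles a request body in the same shape as the sample above, and `parse_prediction` decodes the body until it reaches the `result` list. The column names are those used in the request above. The filler value and the double-encoded sample response are assumptions.

```python
import json

# Feature columns as they appear in the sample request above.
FEATURE_COLUMNS = ("v2", "Column4", "Column5", "Column6")


def build_payload(messages, filler="example_value"):
    """Build the request body for a list of message texts."""
    rows = []
    for text in messages:
        row = {column: filler for column in FEATURE_COLUMNS}
        row["v2"] = text  # v2 holds the message text
        rows.append(row)
    return json.dumps({"data": rows, "method": "predict"})


def parse_prediction(body):
    """Return the list of labels from a raw response body."""
    decoded = body
    # The body may be JSON-encoded more than once; decode until it is a dict.
    while isinstance(decoded, str):
        decoded = json.loads(decoded)
    return decoded["result"]


payload = build_payload(["Click link below to collect $10000"])
# Assumed raw body: a JSON string that itself contains a JSON document.
labels = parse_prediction('"{\\"result\\": [\\"ham\\"]}"')
print(labels)
```

With the live service, `parse_prediction(resp.text)` would take the place of `resp.json()` and return the labels as a plain Python list.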
