# Azure ML SDK for SMS Spam Inference

## Introduction

This notebook uses the Azure ML SDK to train an SMS spam classifier with AutoML, deploy it as a web service, and consume the resulting endpoint.


Steps:

1. Initialize an existing workspace and create an experiment in it.
2. Create a Compute cluster.
3. Load the dataset.
4. Configure AutoML using AutoMLConfig.
5. Run the AutoML experiment.
6. Explore the results and get the best model.
7. Register the best model.
8. Deploy the best model.
9. Consume the endpoint.
10. Delete the service.

## Azure Machine Learning SDK-specific imports

In [1]:
from azureml.core import Workspace, Experiment
from azureml.core.compute import AmlCompute
from azureml.train.automl import AutoMLConfig
from azureml.widgets import RunDetails
from azureml.core.model import InferenceConfig, Model
from azureml.core.webservice import AciWebservice

## Initialize Workspace
Initialize a workspace object from the persisted configuration. Make sure the config file is present at `./config.json`.
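
For reference, the config file is a small JSON document with three keys: `subscription_id`, `resource_group`, and `workspace_name`. The sketch below writes and re-reads such a file using only the standard library. The values are placeholders, and the `load_workspace_config` helper is hypothetical (it is not part of the SDK); it only illustrates the expected shape of the file.

```python
import json
import tempfile
from pathlib import Path

REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def load_workspace_config(path):
    """Read a workspace config file and check that the expected keys are present."""
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError(f"config file is missing keys: {missing}")
    return config


# Placeholder values for illustration only.
example_config = {
    "subscription_id": "<your-subscription-id>",
    "resource_group": "<your-resource-group>",
    "workspace_name": "<your-workspace-name>",
}

with tempfile.TemporaryDirectory() as tmp_dir:
    config_path = Path(tmp_dir) / "config.json"
    config_path.write_text(json.dumps(example_config, indent=2), encoding="utf-8")
    loaded_config = load_workspace_config(config_path)

print(sorted(loaded_config))
```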

In [2]:
import os

from azureml.core.authentication import ServicePrincipalAuthentication

# Read the service principal secret from the environment; never hard-code it in the notebook.
svc_pr_password = os.environ.get("AZUREML_PASSWORD")

svc_pr = ServicePrincipalAuthentication(
    tenant_id="db05faca-c82a-4b9d-b9c5-0f64b6755421",
    service_principal_id="ed84bd8b-0d92-4b6d-be98-b0adcbd37cc0",
    service_principal_password=svc_pr_password)

# Load the subscription, resource group and workspace name from config.json,
# authenticating with the service principal instead of an interactive login.
ws = Workspace.from_config(auth=svc_pr)
print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep='\n')

In [3]:
ws

Workspace.create(name='nahmed30-azureml-workspace', subscription_id='16bc73b5-82be-47f2-b5ab-f2373344794c', resource_group='epe-poc-nazeer')
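
Cell 2 reads the service principal secret from the `AZUREML_PASSWORD` environment variable. `os.environ.get` returns `None` when the variable is unset, so a missing secret would only surface later as an authentication failure. A minimal sketch of a fail-fast alternative follows; the `require_env` helper and the `EXAMPLE_SECRET` variable are hypothetical names used for illustration.

```python
import os


def require_env(name):
    """Return the value of an environment variable, failing fast if it is unset or empty."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Demonstration with a throwaway variable name and value (both hypothetical).
os.environ["EXAMPLE_SECRET"] = "not-a-real-secret"
print(len(require_env("EXAMPLE_SECRET")))
```

In the notebook, the same helper could be called with `"AZUREML_PASSWORD"` before constructing the authentication object.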

## Create an Azure ML experiment

Let's create an experiment named 'SMSspam-aml-experiment-v2' in the workspace we just initialized.

In [4]:
experiment_name = 'SMSspam-aml-experiment-v2'
experiment = Experiment(ws, experiment_name)
experiment

Name,Workspace,Report Page,Docs Page
SMSspam-aml-experiment-v2,nahmed30-azureml-workspace,Link to Azure Machine Learning studio,Link to Documentation


## Create a Compute Cluster
You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/concept-azure-machine-learning-architecture#compute-target) for your AutoML run.

In [5]:
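from azureml.core.compute_target import ComputeTargetException

aml_name = "cpu-cluster"
try:
    # Reuse the cluster if it already exists in the workspace.
    aml_compute = AmlCompute(ws, aml_name)
    print('Found existing AML compute context.')
except ComputeTargetException:
    # Otherwise provision a new cluster that scales between 1 and 3 nodes.
    print('Creating new AML compute context.')
    aml_config = AmlCompute.provisioning_configuration(vm_size="Standard_D2_v2", min_nodes=1, max_nodes=3)
    aml_compute = AmlCompute.create(ws, name=aml_name, provisioning_configuration=aml_config)
    aml_compute.wait_for_completion(show_output=True)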

cts = ws.compute_targets
compute_target = cts[aml_name]

Found existing AML compute context.


## Data
Make sure the dataset is registered in the Azure ML workspace and that `key` matches its registered name.

In [6]:
key = 'UdacityPrjEmailSpamDataSet'
dataset = ws.datasets[key]
df = dataset.to_pandas_dataframe()
df.describe()

| | v1 | v2 | Column3 | Column4 | Column5 |
|---|---|---|---|---|---|
| count | 5572 | 5572 | 50 | 12 | 6 |
| unique | 2 | 5169 | 43 | 10 | 5 |
| top | ham | Sorry, I'll call later | bt not his girlfrnd... G o o d n i g h t . . .@" | GE | GNT:-)" |
| freq | 4825 | 30 | 3 | 2 | 2 |


In [7]:
df.head()

| | v1 | v2 | Column3 | Column4 | Column5 |
|---|---|---|---|---|---|
| 0 | ham | Go until jurong point, crazy.. Available only ... | | | |
| 1 | ham | Ok lar... Joking wif u oni... | | | |
| 2 | spam | Free entry in 2 a wkly comp to win FA Cup fina... | | | |
| 3 | ham | U dun say so early hor... U c already then say... | | | |
| 4 | ham | Nah I don't think he goes to usf, he lives aro... | | | |


In [8]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5572 entries, 0 to 5571
Data columns (total 5 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   v1       5572 non-null   object
 1   v2       5572 non-null   object
 2   Column3  50 non-null     object
 3   Column4  12 non-null     object
 4   Column5  6 non-null      object
dtypes: object(5)
memory usage: 217.8+ KB


In [66]:
df.shape

(5572, 5)


In [68]:
df['v1'].value_counts()

ham     4825
spam     747
Name: v1, dtype: int64
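
The counts show a skewed class distribution, which matters because the experiment later optimizes for accuracy: a classifier that always answers `ham` already scores well. A quick check of that majority-class baseline, using only the counts printed above:

```python
# Class counts taken from the value_counts() output above.
class_counts = {"ham": 4825, "spam": 747}

total = sum(class_counts.values())
majority_label = max(class_counts, key=class_counts.get)
baseline_accuracy = class_counts[majority_label] / total

print(f"{majority_label}: {baseline_accuracy:.3f}")
```

Any model trained on this data should be judged against that baseline, not against 50%.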

In [69]:
# Show only the rows that have a value in Column5.
df[df['Column5'].notna()]

Unnamed: 0,v1,v2,Column3,Column4,Column5
281,ham,\Wen u miss someone,the person is definitely special for u..... B...,why to miss them,"just Keep-in-touch\"" gdeve.."""
1038,ham,"Edison has rightly said, \A fool can ask more ...",GN,GE,"GNT:-)"""
2255,ham,I just lov this line: \Hurt me with the truth,I don't mind,i wil tolerat.bcs ur my someone..... But,"Never comfort me with a lie\"" gud ni8 and swe..."
3525,ham,\HEY BABE! FAR 2 SPUN-OUT 2 SPK AT DA MO... DE...,HAD A COOL NYTHO,TX 4 FONIN HON,"CALL 2MWEN IM BK FRMCLOUD 9! J X\"""""
4668,ham,"When I was born, GOD said, \Oh No! Another IDI...",GOD said,"\""OH No! COMPETITION\"". Who knew","one day these two will become FREINDS FOREVER!"""
5048,ham,"Edison has rightly said, \A fool can ask more ...",GN,GE,"GNT:-)"""
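
Only a handful of rows have values in `Column3` to `Column5`, and those values look like fragments of the message text. One plausible reading, which is an assumption and not something the notebook verifies, is that unescaped commas or quotes in the source CSV split a single message across several columns. Under that assumption, a sketch of rejoining the fragments could look as follows; the `merge_overflow` helper and the sample rows are hypothetical.

```python
def merge_overflow(row, text_key="v2", overflow_keys=("Column3", "Column4", "Column5")):
    """Join the message text with any non-empty overflow fragments, restoring the commas."""
    parts = [row[text_key]]
    parts += [row[key] for key in overflow_keys if row.get(key)]
    return ",".join(parts)


# Hypothetical rows shaped like the DataFrame above (None marks a missing value).
rows = [
    {"v1": "ham", "v2": "Ok lar", "Column3": None, "Column4": None, "Column5": None},
    {"v1": "ham", "v2": "first part", "Column3": " second part", "Column4": " third", "Column5": None},
]

for row in rows:
    print(merge_overflow(row))
```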


## AutoML Configuration

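The settings below cap the experiment at 20 minutes, run up to 3 iterations concurrently, and optimize for accuracy. See the [primary metric documentation](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-configure-auto-train#primary-metric) for the available metrics.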

In [12]:
automl_settings = {
    "experiment_timeout_minutes": 20,
    "max_concurrent_iterations": 3,
    "primary_metric" : 'accuracy'
}

automl_config = AutoMLConfig(compute_target=compute_target,
                             task = "classification",
                             training_data=dataset,
                             label_column_name="v1",
                             enable_early_stopping= True,
                             featurization= 'auto',
                             debug_log = "SMSspam_automl_errors.log",
                             enable_code_generation=True,
                             **automl_settings
                            )
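
`AutoMLConfig` receives the settings dictionary through `**automl_settings`, which Python expands into keyword arguments. A small standalone illustration of that unpacking follows; `describe_config` is a hypothetical stand-in function and is unrelated to the SDK.

```python
def describe_config(task, label_column_name, **settings):
    """Stand-in for a constructor that accepts extra keyword settings."""
    return {"task": task, "label_column_name": label_column_name, **settings}


automl_settings = {
    "experiment_timeout_minutes": 20,
    "max_concurrent_iterations": 3,
    "primary_metric": "accuracy",
}

# Equivalent to passing each dictionary entry as keyword=value.
config = describe_config(task="classification", label_column_name="v1", **automl_settings)
print(config["primary_metric"])
```

Keeping tunable settings in a dictionary makes it easy to reuse or log them separately from the fixed arguments.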

## AutoML Run

In [13]:
remote_run = experiment.submit(automl_config)

Submitting remote run.


Experiment,Id,Type,Status,Details Page,Docs Page
SMSspam-aml-experiment-v2,AutoML_345d81a4-23f3-44f9-a330-05bff90b8129,automl,NotStarted,Link to Azure Machine Learning studio,Link to Documentation


In [14]:
RunDetails(remote_run).show()
remote_run.wait_for_completion(show_output=True)



Current status: FeaturesGeneration. Generating features for the dataset.
Current status: DatasetBalancing. Performing class balancing sweeping
Current status: DatasetCrossValidationSplit. Generating individually featurized CV splits.
Current status: ModelSelection. Beginning model selection.

********************************************************************************************
DATA GUARDRAILS: 

TYPE:         Cross validation
STATUS:       DONE
DESCRIPTION:  In order to accurately evaluate the model(s) trained by AutoML, we leverage a dataset that the model is not trained on. Hence, if the user doesn't provide an explicit validation dataset, a part of the training dataset is used to achieve this. For smaller datasets (fewer than 20,000 samples), cross-validation is leveraged, else a single hold-out set is split from the training data to serve as the validation dataset. Hence, for your input data we leverage cross-validation with 10 folds, if the number of training samples are f

{'runId': 'AutoML_345d81a4-23f3-44f9-a330-05bff90b8129',
 'target': 'cpu-cluster',
 'status': 'Completed',
 'startTimeUtc': '2022-09-26T07:04:38.487685Z',
 'endTimeUtc': '2022-09-26T07:30:12.856393Z',
 'services': {},
   'message': 'Experiment timeout reached, hence experiment stopped. Current experiment timeout: 0 hour(s) 20 minute(s)'}],
 'properties': {'num_iterations': '1000',
  'training_type': 'TrainFull',
  'acquisition_function': 'EI',
  'primary_metric': 'accuracy',
  'train_split': '0',
  'acquisition_parameter': '0',
  'num_cross_validation': None,
  'target': 'cpu-cluster',
  'DataPrepJsonString': '{\\"training_data\\": {\\"datasetId\\": \\"3527a22f-75c2-4ae0-81f9-28549e60c632\\"}, \\"datasets\\": 0}',
  'EnableSubsampling': None,
  'runTemplate': 'AutoML',
  'azureml.runsource': 'automl',
  'display_task_type': 'classification',
  'dependencies_versions': '{"azureml-dataprep-native": "38.0.0", "azureml-dataprep": "3.1.1", "azureml-dataprep-rslex": "2.5.2", "azureml-mlflow"
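
The run details report that the 20-minute experiment timeout was reached, yet the span between `startTimeUtc` and `endTimeUtc` is longer. Presumably the extra time covers setup and finalization, although the output does not say so. A short sketch of computing the wall-clock duration from the two timestamps above (`parse_utc` is a hypothetical helper):

```python
from datetime import datetime


def parse_utc(timestamp):
    """Parse an ISO 8601 timestamp ending in 'Z' into a timezone-aware datetime."""
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


start = parse_utc("2022-09-26T07:04:38.487685Z")
end = parse_utc("2022-09-26T07:30:12.856393Z")

elapsed = end - start
minutes, seconds = divmod(elapsed.total_seconds(), 60)
print(f"{int(minutes)} min {seconds:.0f} s")
```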

## Save the best model

In [15]:
best_run, fitted_model = remote_run.get_output()

Package:azureml-automl-runtime, training version:1.45.0, current version:1.41.0
Package:azureml-core, training version:1.45.0, current version:1.41.0
Package:azureml-dataprep, training version:4.2.2, current version:3.1.1
Package:azureml-dataprep-rslex, training version:2.8.1, current version:2.5.2
Package:azureml-dataset-runtime, training version:1.45.0, current version:1.41.0
Package:azureml-defaults, training version:1.45.0, current version:1.41.0
Package:azureml-inference-server-http, training version:0.7.5, current version:0.4.13
Package:azureml-interpret, training version:1.45.0, current version:1.41.0
Package:azureml-mlflow, training version:1.45.0, current version:1.41.0
Package:azureml-pipeline-core, training version:1.45.0, current version:1.41.0
Package:azureml-responsibleai, training version:1.45.0, current version:1.41.0
Package:azureml-telemetry, training version:1.45.0, current version:1.41.0
Package:azureml-train-automl-client, training version:1.45.0, current version:1
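
The output above lists packages whose version in the training environment differs from the locally installed one. When the list is long, it can help to extract the mismatches programmatically. The sketch below parses lines of that shape with a regular expression; `find_mismatches` is a hypothetical helper, and the sample lines are copied from the output above.

```python
import re

LINE_PATTERN = re.compile(
    r"Package:(?P<name>[\w.-]+), "
    r"training version:(?P<training>[\w.]+), "
    r"current version:(?P<current>[\w.]+)"
)


def find_mismatches(log_text):
    """Return (package, training_version, current_version) for lines where the versions differ."""
    mismatches = []
    for match in LINE_PATTERN.finditer(log_text):
        if match["training"] != match["current"]:
            mismatches.append((match["name"], match["training"], match["current"]))
    return mismatches


# A few lines copied from the output above.
sample_log = """\
Package:azureml-core, training version:1.45.0, current version:1.41.0
Package:azureml-dataprep, training version:4.2.2, current version:3.1.1
Package:azureml-inference-server-http, training version:0.7.5, current version:0.4.13
"""

for name, training, current in find_mismatches(sample_log):
    print(f"{name}: trained with {training}, installed {current}")
```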

In [16]:
best_run.get_properties()

{'runTemplate': 'automl_child',
 'pipeline_id': '__AutoML_Stack_Ensemble__',
 'pipeline_spec': '{"pipeline_id":"__AutoML_Stack_Ensemble__","objects":[{"module":"azureml.train.automl.stack_ensemble","class_name":"StackEnsemble","spec_class":"sklearn","param_args":[],"param_kwargs":{"automl_settings":"{\'task_type\':\'classification\',\'primary_metric\':\'accuracy\',\'verbosity\':20,\'ensemble_iterations\':15,\'is_timeseries\':False,\'name\':\'SMSspam-aml-experiment-v2\',\'compute_target\':\'cpu-cluster\',\'subscription_id\':\'16bc73b5-82be-47f2-b5ab-f2373344794c\',\'region\':\'centralus\',\'spark_service\':None}","ensemble_run_id":"AutoML_345d81a4-23f3-44f9-a330-05bff90b8129_43","experiment_name":"SMSspam-aml-experiment-v2","workspace_name":"nahmed30-azureml-workspace","subscription_id":"16bc73b5-82be-47f2-b5ab-f2373344794c","resource_group_name":"epe-poc-nazeer"}}]}',
 'training_percent': '100',
 'predicted_cost': None,
 'iteration': '43',
 '_aml_system_scenario_identification': 'Remot

In [17]:
best_run.get_properties

<bound method Run.get_properties of Run(Experiment: SMSspam-aml-experiment-v2,
Id: AutoML_345d81a4-23f3-44f9-a330-05bff90b8129_43,
Type: azureml.scriptrun,
Status: Completed)>

In [18]:
fitted_model.get_params

<bound method PipelineWithYTransformations.get_params of PipelineWithYTransformations(Pipeline={'memory': None,
                                       'steps': [('datatransformer',
                                                  DataTransformer(enable_dnn=False, enable_feature_sweeping=True, is_cross_validation=True, working_dir='/mnt/batch/tasks/shared/LS_root/mounts/clusters/nahmed30-computeinstance/code/Users/nahmed30/1WIP/finalproject')),
                                                 ('stackensembleclassifier',
                                                  StackEnsembleClassifier(base....6327367346938775, class_weight='balanced', eta0=0.01, fit_intercept=False, l1_ratio=0.8571428571428571, learning_rate='constant', loss='hinge', max_iter=1000, penalty='none', power_t=0.2222222222222222, tol=0.001))]))], meta_learner=LogisticRegressionCV(scoring=Scorer(metric='accuracy'))))],
                                       'verbose': False},
                             y_transforme

In [19]:
print(fitted_model)

Pipeline(steps=[('datatransformer',
                 DataTransformer(enable_dnn=False, enable_feature_sweeping=True, is_cross_validation=True, working_dir='/mnt/batch/tasks/shared/LS_root/mounts/clusters/nahmed30-computeinstance/code/Users/nahmed30/1WIP/finalproject')),
                ('stackensembleclassifier',
                 StackEnsembleClassifier(base_learners=[('38', Pipeline(steps=[('maxabsscale...with_std=True)), ('sgdclassifierwrapper', SGDClassifierWrapper(alpha=1.6327367346938775, class_weight='balanced', eta0=0.01, fit_intercept=False, l1_ratio=0.8571428571428571, learning_rate='constant', loss='hinge', max_iter=1000, penalty='none', power_t=0.2222222222222222, tol=0.001))]))], meta_learner=LogisticRegressionCV(scoring=Scorer(metric='accuracy'))))])
Y_transformer(['LabelEncoder', LabelEncoder()])


In [20]:
for child_run in remote_run.get_children():
    print(child_run,"\n")

Run(Experiment: SMSspam-aml-experiment-v2,
Id: AutoML_345d81a4-23f3-44f9-a330-05bff90b8129_42,
Type: azureml.scriptrun,
Status: Completed) 

Run(Experiment: SMSspam-aml-experiment-v2,
Id: AutoML_345d81a4-23f3-44f9-a330-05bff90b8129_43,
Type: azureml.scriptrun,
Status: Completed) 

Run(Experiment: SMSspam-aml-experiment-v2,
Id: AutoML_345d81a4-23f3-44f9-a330-05bff90b8129_41,
Type: azureml.scriptrun,
Status: Canceled) 

Run(Experiment: SMSspam-aml-experiment-v2,
Id: AutoML_345d81a4-23f3-44f9-a330-05bff90b8129_40,
Type: azureml.scriptrun,
Status: Completed) 

Run(Experiment: SMSspam-aml-experiment-v2,
Id: AutoML_345d81a4-23f3-44f9-a330-05bff90b8129_39,
Type: azureml.scriptrun,
Status: Completed) 

Run(Experiment: SMSspam-aml-experiment-v2,
Id: AutoML_345d81a4-23f3-44f9-a330-05bff90b8129_38,
Type: azureml.scriptrun,
Status: Completed) 

Run(Experiment: SMSspam-aml-experiment-v2,
Id: AutoML_345d81a4-23f3-44f9-a330-05bff90b8129_37,
Type: azureml.scriptrun,
Status: Completed) 

Run(Experiment

In [21]:
import os

os.makedirs('./outputs',exist_ok=True)

In [22]:
model_name = best_run.properties['model_name']
script_file = "./outputs/score_v2.py"
description = "aml SMS spam project sdk"


In [23]:
model_name

'AutoML345d81a4243'

In [24]:

# Register the best model from the AutoML run in the workspace (Model is already imported in cell 1).
reg_model = remote_run.register_model(model_name=model_name, description=description)


In [25]:
from azureml.automl.core.shared import constants
env = best_run.get_environment()

best_run.download_file('outputs/scoring_file_v_1_0_0.py', script_file)
best_run.download_file(constants.CONDA_ENV_FILE_PATH, 'env.yml')
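
The downloaded entry script is what the web service runs. Azure ML entry scripts follow a two-function contract: `init()` runs once when the container starts, and `run()` handles each request. The sketch below mimics that contract with a trivial keyword rule standing in for the model. It is not the generated `scoring_file_v_1_0_0.py`; the rule, the `SPAM_HINTS` name, and the payload handling are assumptions made for illustration.

```python
import json

model = None
SPAM_HINTS = ("won", "free", "click")  # hypothetical stand-in for a trained model


def init():
    """Called once at startup; a real script would load the registered model here."""
    global model
    model = lambda text: "spam" if any(hint in text.lower() for hint in SPAM_HINTS) else "ham"


def run(raw_data):
    """Called per request with the JSON body; returns a JSON string of predictions."""
    try:
        records = json.loads(raw_data)["data"]
        predictions = [model(record["v2"]) for record in records]
        return json.dumps({"result": predictions})
    except Exception as error:
        # Report the failure to the caller instead of crashing the service.
        return json.dumps({"error": str(error)})


init()
print(run(json.dumps({"data": [{"v2": "You won $100. Click link below to collect"}]})))
```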

## Deploy the Best Model

Run the following code to deploy the best model to Azure Container Instances (ACI). You can follow the state of the deployment in Azure Machine Learning studio. This step can take a few minutes.

In [26]:
script_file

'./outputs/score_v2.py'

In [28]:
# Reuse the environment fetched from the best run in cell 25.
inference_config = InferenceConfig(entry_script=script_file, environment=env)

aciconfig = AciWebservice.deploy_configuration(cpu_cores = 1,
                                               memory_gb = 1,
                                               tags = {'type': "automl-SMS-spam-prediction"},
                                               description = 'Sample service for AutoML SMS Spam Prediction')

aci_service_name = 'automl-smss-sdk-v5'
aci_service = Model.deploy(ws, aci_service_name, [reg_model], inference_config, aciconfig)
aci_service.wait_for_deployment(show_output=True)
print(aci_service.state)

Tips: You can try get_logs(): https://aka.ms/debugimage#dockerlog or local deployment: https://aka.ms/debugimage#debug-locally to debug if deployment takes longer than 10 minutes.
Running
2022-09-26 07:33:55+00:00 Creating Container Registry if not exists.
2022-09-26 07:33:55+00:00 Registering the environment.
2022-09-26 07:33:56+00:00 Use the existing image.
2022-09-26 07:33:57+00:00 Submitting deployment to compute.
2022-09-26 07:34:00+00:00 Checking the status of deployment automl-smss-sdk-v5..
2022-09-26 07:36:18+00:00 Checking the status of inference endpoint automl-smss-sdk-v5.
Succeeded
ACI service creation operation finished, operation "Succeeded"
Healthy


## Consume the Endpoint
Send a JSON payload to the scoring endpoint. To score several messages in one request, add more records to the `data` list.

In [29]:
scoring_uri = aci_service.scoring_uri
print(scoring_uri)

http://79087020-93a4-48b9-a3d9-7150d5c354f2.centralus.azurecontainer.io/score


In [32]:
import requests
import json

 
data = {
  "data": [
    {
      "v2": "You won $100. Click link below to collect",
      "Column4": "example_value",
      "Column5": "example_value",
      "Column6": "example_value"
    }
  ],
  "method": "predict"
}
    
# Convert to JSON string
input_data = json.dumps(data)
with open("data.json", "w") as _f:
    _f.write(input_data)

# Set the content type
headers = {'Content-Type': 'application/json'}
# If authentication is enabled, set the authorization header.
# Here `key` must be the service key, not the dataset name stored in `key` earlier.
#headers['Authorization'] = f'Bearer {key}'

# Make the request and display the response
resp = requests.post(scoring_uri, input_data, headers=headers)
print("prediction is :" , resp.json())

prediction is : {"result": ["spam"]}


In [31]:
import requests
import json

 
data = {
  "data": [
    {
      "v2": "I'm waiting here see you soon",
      "Column4": "example_value",
      "Column5": "example_value",
      "Column6": "example_value"
    }
  ],
  "method": "predict"
}
    
# Convert to JSON string
input_data = json.dumps(data)
with open("data.json", "w") as _f:
    _f.write(input_data)

# Set the content type
headers = {'Content-Type': 'application/json'}
# If authentication is enabled, set the authorization header.
# Here `key` must be the service key, not the dataset name stored in `key` earlier.
#headers['Authorization'] = f'Bearer {key}'

# Make the request and display the response
resp = requests.post(scoring_uri, input_data, headers=headers)
print("prediction is :" , resp.json())

prediction is : {"result": ["ham"]}
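
The two request cells differ only in the message text, so the payload construction can be factored into a helper. The following is a minimal sketch that uses `urllib` from the standard library instead of `requests`. `build_payload` and `score` are hypothetical names, the column keys mirror the cells above, and `score` is defined but not called here because it needs a live endpoint.

```python
import json
import urllib.request


def build_payload(message, extra_columns=("Column4", "Column5", "Column6")):
    """Build the request body used in the cells above for a single SMS message."""
    record = {"v2": message}
    record.update({column: "example_value" for column in extra_columns})
    return {"data": [record], "method": "predict"}


def score(scoring_uri, message, timeout=30):
    """POST one message to the scoring endpoint and return the decoded JSON response."""
    body = json.dumps(build_payload(message)).encode("utf-8")
    request = urllib.request.Request(
        scoring_uri, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


payload = build_payload("I'm waiting here see you soon")
print(json.dumps(payload, indent=2))
```

With a running service, `score(scoring_uri, "some message")` would replace the repeated request code in both cells.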


## Delete Service

In [None]:
# Uncomment to delete the deployed web service once it is no longer needed.
# aci_service.delete()