# Automated ML

### Create workspace and experiment instances

In [2]:
from azureml.core import Workspace, Experiment
ws = Workspace.from_config()

# Choose a name for the experiment
experiment_name = 'capstone-spam-classification-experiment'

experiment = Experiment(ws, experiment_name)

### Dataset Overview
The dataset is a spam classification dataset obtained from Kaggle. We use it to train a multi-class text classifier that predicts the Category label of each message.

### Get data

In [3]:
from azureml.core import Dataset
training_dataset = Dataset.get_by_name(ws, name='capstone-spam-dataset')

### View the dataset

In [4]:
training_dataset.to_pandas_dataframe()

| Unnamed: 0 | Category | Message | Column3 |
|---|---|---|---|
| 0 | ham | Go until jurong point, crazy.. Available only ... | |
| 1 | ham | Ok lar... Joking wif u oni... | |
| 2 | spam | Free entry in 2 a wkly comp to win FA Cup fina... | |
| 3 | ham | U dun say so early hor... U c already then say... | |
| 4 | ham | Nah I don't think he goes to usf, he lives aro... | |
| ... | ... | ... | ... |
| 5569 | spam | This is the 2nd time we have tried 2 contact u... | |
| 5570 | ham | Will ü b going to esplanade fr home? | |
| 5571 | ham | Pity, * was in mood for that. So...any other s... | |
| 5572 | ham | The guy did some bitching but I acted like i'd... | |
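
The preview hints that ham rows outnumber spam rows. As a rough sketch, the label balance can be tallied with the standard library. The list below contains only the nine labels visible in the preview above, not the full dataset, so the percentages are illustrative.

```python
from collections import Counter

# Labels of the nine rows visible in the preview (not the full dataset).
preview_labels = ["ham", "ham", "spam", "ham", "ham", "spam", "ham", "ham", "ham"]

counts = Counter(preview_labels)
total = sum(counts.values())

# Print each label with its count and share of the previewed rows.
for label, n in counts.most_common():
    print(f"{label}: {n} ({n / total:.0%})")
```

On the full dataframe, `training_dataset.to_pandas_dataframe()['Category'].value_counts()` should give the same kind of summary for every row.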


### Connect to compute target

In [5]:
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

# Name of the compute cluster to reuse or create
cluster_name = "capstone-compute-cluster"

# Reuse the cluster if it already exists; otherwise create it
try:
    compute_target = ComputeTarget(workspace=ws, name=cluster_name)
    print('Found existing cluster, use it.')
except ComputeTargetException:
    # To use a different region for the compute, add a location='<region>' parameter
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2', max_nodes=4)
    compute_target = ComputeTarget.create(ws, cluster_name, compute_config)

compute_target.wait_for_completion(show_output=True)

InProgress.
SucceededProvisioning operation finished, operation "Succeeded"
Succeeded
AmlCompute wait for completion finished

Minimum number of nodes requested have been provisioned


### AutoML Configuration

- task: "classification", because we are performing multi-class classification
- label_column_name: "Category", the column we want to predict
- experiment_timeout_hours: 0.25, so the experiment runs for at most 15 minutes
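
Before submitting, it can help to sanity-check the settings against the cluster. The sketch below uses values copied from this notebook. `check_settings` is a hypothetical helper, not part of the Azure ML SDK, and the rule it applies (concurrent iterations should not exceed the cluster's node count) reflects my reading of the AutoMLConfig documentation.

```python
# Values copied from this notebook; check_settings is a hypothetical helper.
automl_settings = {
    "max_concurrent_iterations": 5,
    "experiment_timeout_hours": 0.25,
}
cluster_max_nodes = 4  # max_nodes passed when the cluster was provisioned


def check_settings(settings, max_nodes):
    """Return human-readable notes about the settings (empty list if none)."""
    notes = []
    if settings["max_concurrent_iterations"] > max_nodes:
        notes.append(
            f"max_concurrent_iterations={settings['max_concurrent_iterations']} "
            f"exceeds max_nodes={max_nodes}; extra iterations will likely queue"
        )
    return notes


# Convert the timeout from hours to minutes to confirm the intended budget.
timeout_minutes = automl_settings["experiment_timeout_hours"] * 60
print(f"Experiment budget: {timeout_minutes:.0f} minutes")
for note in check_settings(automl_settings, cluster_max_nodes):
    print("Note:", note)
```

With the values used here, the check reports that 5 concurrent iterations were requested on a cluster capped at 4 nodes.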

In [7]:
from azureml.train.automl import AutoMLConfig

automl_settings = {
    "n_cross_validations": 2,
    "primary_metric": 'accuracy',
    "enable_early_stopping": True,
    "max_concurrent_iterations": 5,
    "experiment_timeout_hours": 0.25,
    "featurization": 'auto',
}

automl_config = AutoMLConfig(
    task = 'classification',
    compute_target = compute_target,
    training_data = training_dataset,
    label_column_name = 'Category',
    **automl_settings
)

### Submit the experiment


In [8]:
remote_run = experiment.submit(automl_config)

Submitting remote run.


| Experiment | Id | Type | Status | Details Page | Docs Page |
|---|---|---|---|---|---|
| capstone-spam-classification-experiment | AutoML_39e48925-bae5-4099-b09d-e950f0036c93 | automl | NotStarted | Link to Azure Machine Learning studio | Link to Documentation |


### Run Details

In [9]:
from azureml.widgets import RunDetails
RunDetails(remote_run).show()

_AutoMLWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', 's…

In [10]:
remote_run.wait_for_completion()

{'runId': 'AutoML_39e48925-bae5-4099-b09d-e950f0036c93',
 'target': 'capstone-compute-cluster',
 'status': 'Completed',
 'startTimeUtc': '2022-09-26T06:32:38.09341Z',
 'endTimeUtc': '2022-09-26T06:52:00.06843Z',
 'services': {},
   'message': 'No scores improved over last 10 iterations, so experiment stopped early. This early stopping behavior can be disabled by setting enable_early_stopping = False in AutoMLConfig for notebook/python SDK runs.'}],
 'properties': {'num_iterations': '1000',
  'training_type': 'TrainFull',
  'acquisition_function': 'EI',
  'primary_metric': 'accuracy',
  'train_split': '0',
  'acquisition_parameter': '0',
  'num_cross_validation': '2',
  'target': 'capstone-compute-cluster',
  'DataPrepJsonString': '{\\"training_data\\": {\\"datasetId\\": \\"bf22fd68-4f43-4a95-a0c7-63c91e293b65\\"}, \\"datasets\\": 0}',
  'EnableSubsampling': None,
  'runTemplate': 'AutoML',
  'azureml.runsource': 'automl',
  'display_task_type': 'classification',
  'dependencies_version
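
The run's wall-clock duration can be worked out from the `startTimeUtc` and `endTimeUtc` values above. This is a small sketch using only the standard library. It comes to a little over 19 minutes, which is longer than the 15-minute experiment timeout. A plausible explanation, which is an assumption here, is that the run-level timestamps also cover setup and wrap-up work outside the training iterations.

```python
from datetime import datetime

# Timestamp layout used in the run details output.
FMT = "%Y-%m-%dT%H:%M:%S.%fZ"

start = datetime.strptime("2022-09-26T06:32:38.09341Z", FMT)
end = datetime.strptime("2022-09-26T06:52:00.06843Z", FMT)

duration = end - start
minutes, seconds = divmod(duration.total_seconds(), 60)
print(f"Run took {int(minutes)} min {seconds:.0f} s")
```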

### Get the best model and display its properties

In [11]:
best_run, fitted_model = remote_run.get_output()

# Get best_run metrics
best_run_metrics = best_run.get_metrics()
for name, value in best_run_metrics.items():
    print(f"{name}: {value}")

Package:azureml-automl-runtime, training version:1.45.0, current version:1.44.0
Package:azureml-core, training version:1.45.0, current version:1.44.0
Package:azureml-dataset-runtime, training version:1.45.0, current version:1.44.0
Package:azureml-defaults, training version:1.45.0, current version:1.44.0
Package:azureml-interpret, training version:1.45.0, current version:1.44.0
Package:azureml-mlflow, training version:1.45.0, current version:1.44.0
Package:azureml-pipeline-core, training version:1.45.0, current version:1.44.0
Package:azureml-responsibleai, training version:1.45.0, current version:1.44.0
Package:azureml-telemetry, training version:1.45.0, current version:1.44.0
Package:azureml-train-automl-client, training version:1.45.0, current version:1.44.0
Package:azureml-train-automl-runtime, training version:1.45.0, current version:1.44.0
Package:azureml-train-core, training version:1.45.0, current version:1.44.0
Package:azureml-train-restclients-hyperdrive, training version:1.45.

log_loss: 0.060339494012040806
matthews_correlation: 0.9474790732094922
precision_score_macro: 0.6597012677998131
AUC_weighted: 0.9938788405432769
f1_score_micro: 0.9879799067097237
precision_score_micro: 0.9879799067097237
AUC_micro: 0.9989869585613008
balanced_accuracy: 0.6398149182009907
average_precision_score_micro: 0.997237013286955
AUC_macro: 0.8296948996035933
norm_macro_recall: 0.45972237730148613
recall_score_weighted: 0.9879799067097237
recall_score_macro: 0.6398149182009907
accuracy: 0.9879799067097237
f1_score_weighted: 0.9876087940526146
average_precision_score_weighted: 0.9967764838390141
weighted_accuracy: 0.9969439478288387
precision_score_weighted: 0.9876886557153359
f1_score_macro: 0.6492553368057341
recall_score_micro: 0.9879799067097237
average_precision_score_macro: 0.6619634418564038
accuracy_table: aml://artifactId/ExperimentRun/dcid.AutoML_39e48925-bae5-4099-b09d-e950f0036c93_39/accuracy_table
confusion_matrix: aml://artifactId/ExperimentRun/dcid.AutoML_39e4892
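
Accuracy is about 0.988 while the macro-averaged scores sit near 0.64 to 0.66. Macro metrics weight every class equally, so one rare class with poor recall pulls them down even when almost all rows are classified correctly. The sketch below recomputes `norm_macro_recall` from `recall_score_macro`, assuming the definition (recall_macro - 1/C) / (1 - 1/C) for C classes as I understand it from the automated ML metric documentation. The reported value is reproduced with C = 3, which suggests (an inference, not something shown in the output) that Category contains a third distinct value besides ham and spam.

```python
import math

recall_macro = 0.6398149182009907     # recall_score_macro from the output above
reported_norm = 0.45972237730148613   # norm_macro_recall from the output above


def norm_macro_recall(recall_macro, n_classes):
    """Rescale macro recall so that chance level maps to 0 and perfect to 1."""
    chance = 1.0 / n_classes
    return (recall_macro - chance) / (1.0 - chance)


# Try a few class counts and flag the one that reproduces the reported metric.
for n_classes in (2, 3, 4):
    value = norm_macro_recall(recall_macro, n_classes)
    match = math.isclose(value, reported_norm, rel_tol=1e-9)
    print(n_classes, round(value, 6), "matches" if match else "")
```

Inspecting the distinct values of the Category column would confirm or rule out this reading.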

In [12]:
best_run.get_file_names()

['accuracy_table',
 'automl_driver.py',
 'confusion_matrix',
 'logs/azureml/azureml_automl.log',
 'outputs/conda_env_v_1_0_0.yml',
 'outputs/engineered_feature_names.json',
 'outputs/env_dependencies.json',
 'outputs/featurization_summary.json',
 'outputs/generated_code/conda_environment.yaml',
 'outputs/generated_code/script.py',
 'outputs/generated_code/script_run_notebook.ipynb',
 'outputs/internal_cross_validated_models.pkl',
 'outputs/model.pkl',
 'outputs/pipeline_graph.json',
 'outputs/run_id.txt',
 'outputs/scoring_file_pbi_v_1_0_0.py',
 'outputs/scoring_file_v_1_0_0.py',
 'outputs/scoring_file_v_2_0_0.py',
 'system_logs/cs_capability/cs-capability.log',
 'system_logs/hosttools_capability/hosttools-capability.log',
 'system_logs/lifecycler/execution-wrapper.log',
 'system_logs/lifecycler/lifecycler.log',
 'system_logs/metrics_capability/metrics-capability.log',
 'system_logs/snapshot_capability/snapshot-capability.log',
 'user_logs/std_log.txt']

### Save the model

In [13]:
import joblib
joblib.dump(fitted_model, 'best-automl-model.pkl')

['best-automl-model.pkl']

### Register the best model

In [14]:
from azureml.core import Model
model = Model.register(
    workspace=ws, 
    model_name='best-automl-model', 
    model_path='./best-automl-model.pkl'
)

Registering model best-automl-model


# Model Deployment

### Create an inference config

In [15]:
from azureml.core import Environment
from azureml.core.model import InferenceConfig

# Get the environment
from azureml.automl.core.shared import constants

best_run.download_file(constants.CONDA_ENV_FILE_PATH, 'conda_dependencies.yml')
env = Environment.from_conda_specification(name='deployment-env', file_path='conda_dependencies.yml')

inference_config = InferenceConfig(
    environment=env,
    source_directory=".",
    entry_script="./automl_score.py",
)
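
The entry script `automl_score.py` is not shown in this notebook. Azure ML entry scripts define an `init()` function that runs once at container start and a `run()` function that handles each request. The following is a dependency-free sketch of that contract only, not the author's file: `KeywordStubModel` is a hypothetical placeholder for the registered model, and the request shape mirrors the payload used later in this notebook.

```python
import json

model = None  # populated by init()


class KeywordStubModel:
    """Hypothetical placeholder standing in for the registered AutoML model."""

    def predict(self, rows):
        # Toy rule for illustration only; the real model is a fitted pipeline.
        return ["spam" if "free" in row.get("Message", "").lower() else "ham"
                for row in rows]


def init():
    """Called once when the service container starts."""
    global model
    # A real script would typically load the registered model file here
    # (for example with joblib.load); the stub keeps this sketch self-contained.
    model = KeywordStubModel()


def run(raw_data):
    """Called per request; raw_data is the request body as a JSON string."""
    try:
        rows = json.loads(raw_data)["Inputs"]["data"]
        return json.dumps({"result": model.predict(rows)})
    except Exception as exc:
        # Return the error as JSON so the caller sees what went wrong.
        return json.dumps({"error": str(exc)})
```

Calling `init()` and then `run('{"Inputs": {"data": [{"Message": "hello"}]}}')` locally is a quick way to exercise the script before building the image.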

### Define the deployment config: we deploy on Azure Container Instances (ACI)

In [16]:
from azureml.core.webservice import AciWebservice

deployment_config = AciWebservice.deploy_configuration(
    cpu_cores=1, memory_gb=2, auth_enabled=True, enable_app_insights=True
)

### Deploy the model as a web service

In [17]:
from azureml.core.model import Model
service = Model.deploy(
    ws,
    "automl-service",
    [model],
    inference_config,
    deployment_config,
    overwrite=True,
)
service.wait_for_deployment(show_output=True)

Tips: You can try get_logs(): https://aka.ms/debugimage#dockerlog or local deployment: https://aka.ms/debugimage#debug-locally to debug if deployment takes longer than 10 minutes.
Running
2022-09-26 06:52:34+00:00 Creating Container Registry if not exists..
2022-09-26 07:02:34+00:00 Registering the environment.
2022-09-26 07:02:35+00:00 Building image..
2022-09-26 07:22:51+00:00 Generating deployment configuration.
2022-09-26 07:22:51+00:00 Submitting deployment to compute..
2022-09-26 07:22:55+00:00 Checking the status of deployment automl-service.

### Test the deployed web service by sending a request

In [21]:
import requests
import json
from azureml.core import Webservice

service = Webservice(workspace=ws, name="automl-service")
scoring_uri = service.scoring_uri

# Key authentication is enabled (auth_enabled=True), so fetch the primary key
key, _ = service.get_keys()

# Set the appropriate headers
headers = {"Content-Type": "application/json"}
headers["Authorization"] = f"Bearer {key}"

# Build the request payload
data = {
  "Inputs": {
    "data": [
      {
        "Message": "visit Taj mahal for 100 Rs",
        "Column3": ""
      }
    ]
  }
}

# Send the request and display the response
input_data = json.dumps(data)
resp = requests.post(scoring_uri, data=input_data, headers=headers)
print(resp.json())

{"result": ["ham"]}
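
For scoring more than one message, the payload construction and response handling can be wrapped in small helpers. This is a sketch: `build_payload` and `parse_predictions` are hypothetical names, and the second decoding step assumes the service may return its result as a JSON-encoded string, which is what the printed output above suggests.

```python
import json


def build_payload(messages):
    """Wrap raw message strings in the request shape used in the cell above."""
    return {"Inputs": {"data": [{"Message": m, "Column3": ""} for m in messages]}}


def parse_predictions(body):
    """Return the list under "result", decoding once more if body is a JSON string."""
    if isinstance(body, str):
        body = json.loads(body)
    if "result" not in body:
        raise ValueError(f"Unexpected response: {body!r}")
    return body["result"]


payload = build_payload(["visit Taj mahal for 100 Rs"])
print(json.dumps(payload))
print(parse_predictions('{"result": ["ham"]}'))
print(parse_predictions({"result": ["ham"]}))
```

In the notebook this would be used as `parse_predictions(resp.json())`, ideally after `resp.raise_for_status()` so that HTTP errors surface before parsing.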


### Print the logs of the web service

In [22]:
print(service.get_logs())

/bin/bash: /azureml-envs/azureml_944df6c9e2b12a3bdcde13b5b8baccf0/lib/libtinfo.so.6: no version information available (required by /bin/bash)
/bin/bash: /azureml-envs/azureml_944df6c9e2b12a3bdcde13b5b8baccf0/lib/libtinfo.so.6: no version information available (required by /bin/bash)
/bin/bash: /azureml-envs/azureml_944df6c9e2b12a3bdcde13b5b8baccf0/lib/libtinfo.so.6: no version information available (required by /bin/bash)
/bin/bash: /azureml-envs/azureml_944df6c9e2b12a3bdcde13b5b8baccf0/lib/libtinfo.so.6: no version information available (required by /bin/bash)
2022-09-26T07:26:28,453535500+00:00 - iot-server/run 
2022-09-26T07:26:28,460346000+00:00 - rsyslog/run 
bash: /azureml-envs/azureml_944df6c9e2b12a3bdcde13b5b8baccf0/lib/libtinfo.so.6: no version information available (required by bash)
2022-09-26T07:26:28,488064600+00:00 - gunicorn/run 
2022-09-26T07:26:28,491386500+00:00 - nginx/run 
2022-09-26T07:26:28,506808600+00:00 | gunicorn/run | 
2022-09-26T07:26:28,521812500+00:00 | gu

### Delete the web service

In [23]:
service.delete()

### Shut down the compute cluster

In [26]:
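try:
    instance = ComputeTarget(workspace=ws, name=cluster_name)

    instance.delete()
    # Polls until the cluster is gone. As the output below shows, the SDK raises
    # ComputeTargetException once the cluster no longer exists, so the except
    # branch also runs after a successful deletion.
    instance.wait_for_completion(show_output=True)
    print('Deleted compute resource')

except ComputeTargetException:
    print('Already deleted!')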

InProgress.....Current provisioning state of AmlCompute is "Deleting"

..
SucceededProvisioning operation finished, operation "Succeeded"
ComputeTargetException:
	Message: ComputeTargetNotFound: Compute Target with name capstone-compute-cluster not found in provided workspace
	InnerException None
	ErrorResponse 
{
    "error": {
        "message": "ComputeTargetNotFound: Compute Target with name capstone-compute-cluster not found in provided workspace"
    }
}
Already deleted!


**Submission Checklist**
- I have registered the model.
- I have deployed the model with the best accuracy as a web service.
- I have tested the web service by sending a request to the model endpoint.
- I have deleted the web service and shut down all the compute resources that I have used.
- I have taken a screenshot showing the model endpoint as active.
- The project includes a file containing the environment details.
