# Automated ML

TODO: Import Dependencies. In the cell below, import all the dependencies that you will need to complete the project.

In [1]:
from azureml.core import Workspace, Experiment
from azureml.data.dataset_factory import TabularDatasetFactory
#from train import clean_data
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
import os
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException
from azureml.train.automl import AutoMLConfig
from azureml.widgets import RunDetails

## Dataset

### Overview
Abstract: Clinical features were observed or measured for 64 patients with breast cancer and 52 healthy controls.

There are 9 predictors, all quantitative, and a binary dependent variable (`Classification`) indicating the presence or absence of breast cancer.
The predictors are anthropometric data and parameters which can be gathered in routine blood analysis.
Prediction models based on these predictors, if accurate, can potentially be used as a biomarker of breast cancer.

The dataset can be accessed at the following URL:

https://archive.ics.uci.edu/ml/machine-learning-databases/00451/dataR2.csv
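
Before training, it can help to confirm the class counts quoted above. The sketch below counts labels in CSV text using only the standard library. The `class_counts` helper and the three inline rows are placeholders for illustration, not values from the dataset; in practice the text would come from the downloaded `dataR2.csv`.

```python
import csv
import io
from collections import Counter


def class_counts(csv_text, label_column="Classification"):
    """Count how many rows carry each label value in CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[label_column] for row in reader)


# Placeholder rows (NOT real dataset values), only to show the call shape.
sample_csv = "Age,BMI,Classification\n40,22.0,1\n50,27.5,2\n60,30.1,2\n"
counts = class_counts(sample_csv)
print(dict(counts))
```

Run against the full file, the two counts should add up to the 116 subjects (64 + 52) described in the overview.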


In [2]:
ws = Workspace.from_config()

# choose a name for experiment
experiment_name = 'Breastcancer_automl'

experiment=Experiment(ws, experiment_name)
print('Workspace name: ' + ws.name, 
      'Azure region: ' + ws.location, 
      'Subscription id: ' + ws.subscription_id, 
      'Resource group: ' + ws.resource_group, sep = '\n')

run = experiment.start_logging()

Performing interactive authentication. Please follow the instructions on the terminal.
To sign in, use a web browser to open the page https://microsoft.com/devicelogin and enter the code A43QTLBWG to authenticate.
You have logged in. Now let us find all the subscriptions to which you have access...
Interactive authentication successfully completed.
Workspace name: quick-starts-ws-142915
Azure region: southcentralus
Subscription id: aa7cf8e8-d23f-4bce-a7b9-1f0b4e0ac8ee
Resource group: aml-quickstarts-142915


In [5]:
ds = TabularDatasetFactory.from_delimited_files(path="https://archive.ics.uci.edu/ml/machine-learning-databases/00451/dataR2.csv")

## AutoML Configuration

We start by setting up the compute cluster on which the AutoML run will execute,
and then define the AutoML settings.

In [6]:
cpu_cluster_name = "cpucluster-aml"

try:
    cpu_cluster = ComputeTarget(workspace=ws, name=cpu_cluster_name)
    print('Found existing cluster, use it.')
except ComputeTargetException:
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',
                                                           max_nodes=4)
    cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, compute_config)

cpu_cluster.wait_for_completion(show_output=True)

Found existing cluster, use it.
Succeeded
AmlCompute wait for completion finished

Minimum number of nodes requested have been provisioned


In [7]:
#training_data, validation_data = d.random_split(percentage=0.8, seed=1)

In [8]:
# Shared AutoML settings, unpacked into AutoMLConfig below
automl_settings = {
    "experiment_timeout_minutes": 30,
    "max_concurrent_iterations": 4,
    "n_cross_validations": 5,
    "primary_metric": "average_precision_score_weighted",
}

# TODO: Put your automl config here
automl_config = AutoMLConfig(
    task="classification",
    compute_target=cpu_cluster,
    training_data=ds,
    label_column_name="Classification",
    max_cores_per_iteration=-1,
    enable_onnx_compatible_models=True,
    **automl_settings,
)
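
With `n_cross_validations` set to 5, each candidate model is scored on five train/validation splits instead of a single hold-out set, which matters for a dataset this small. The sketch below shows what the fold sizes look like for the 116 rows (64 + 52) described in the overview. It is a plain-Python illustration with contiguous folds; the `k_fold_indices` helper is hypothetical, and AutoML's own splitting may shuffle or stratify differently.

```python
def k_fold_indices(n_rows, n_folds):
    """Yield (train_indices, validation_indices) for contiguous k-fold splits."""
    base, extra = divmod(n_rows, n_folds)
    start = 0
    for fold in range(n_folds):
        # The first `extra` folds take one additional row each.
        size = base + (1 if fold < extra else 0)
        valid = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_rows))
        yield train, valid
        start += size


n_rows = 64 + 52  # patients + healthy controls, from the dataset overview
fold_sizes = [len(valid) for _, valid in k_fold_indices(n_rows, 5)]
print(fold_sizes)
```

Each validation fold holds only 23 or 24 rows, so per-fold metrics can vary noticeably; averaging over the five folds gives a steadier estimate.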

In [9]:
# TODO: Submit your experiment
remote_run = experiment.submit(config = automl_config, show_output = True)

Submitting remote run.
No run_configuration provided, running on cpucluster-aml with default configuration
Running on remote compute: cpucluster-aml


Experiment,Id,Type,Status,Details Page,Docs Page
Breastcancer_automl,AutoML_27fc52fe-52d5-4965-bac0-4c915d914966,automl,NotStarted,Link to Azure Machine Learning studio,Link to Documentation



Current status: FeaturesGeneration. Generating features for the dataset.
Current status: ModelSelection. Beginning model selection.

****************************************************************************************************
DATA GUARDRAILS: 

TYPE:         Class balancing detection
STATUS:       PASSED
DESCRIPTION:  Your inputs were analyzed, and all classes are balanced in your training data.
              Learn more about imbalanced data: https://aka.ms/AutomatedMLImbalancedData

****************************************************************************************************

TYPE:         Missing feature values imputation
STATUS:       PASSED
DESCRIPTION:  No feature missing values were detected in the training data.
              Learn more about missing value imputation: https://aka.ms/AutomatedMLFeaturization

****************************************************************************************************

TYPE:         High cardinality feature detection
STATUS

## Run Details

OPTIONAL: Write about the different models trained and their performance. Why do you think some models did better than others?

TODO: In the cell below, use the `RunDetails` widget to show the different experiments.

In [10]:
RunDetails(remote_run).show()

_AutoMLWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', 's…

## Best Model

TODO: In the cell below, get the best model from the automl experiments and display all the properties of the model.



In [48]:
best_run, fitted_model = remote_run.get_output()
print(best_run)
print(fitted_model)

model_ml = best_run.register_model(model_name='Breast_Cancer_Classitification_auto_ml', model_path='./')

Run(Experiment: Breastcancer_automl,
Id: AutoML_27fc52fe-52d5-4965-bac0-4c915d914966_28,
Type: azureml.scriptrun,
Status: Completed)
Pipeline(memory=None,
         steps=[('datatransformer',
                 DataTransformer(enable_dnn=None, enable_feature_sweeping=None,
                                 feature_sweeping_config=None,
                                 feature_sweeping_timeout=None,
                                 featurization_config=None, force_text_dnn=None,
                                 is_cross_validation=None,
                                 is_onnx_compatible=None, logger=None,
                                 observer=None, task=None, working_dir=None)),
                ('prefittedsoftvotingclassifier',...
                                                                                               objective='reg:logistic',
                                                                                               random_state=0,
                           

In [None]:
#TODO: Save the best model

## Model Deployment

Remember that you only have to deploy one of the two models you trained. Perform the steps in the rest of this notebook only if you wish to deploy this model.

TODO: In the cell below, register the model, create an inference config and deploy the model as a web service.

In [12]:
from azureml.core.model import Model
from azureml.core import Environment
from azureml.core.model import InferenceConfig
from azureml.core.webservice import AciWebservice

TODO: In the cell below, send a request to the web service you deployed to test it.

In [13]:
os.makedirs('./amlmodel', exist_ok=True)

# Run artifact names carry no leading slash (see the file names listed by the loop below)
best_run.download_file('outputs/model.pkl', os.path.join('./amlmodel', 'automl_best_model.pkl'))

for f in best_run.get_file_names():
    if f.startswith('outputs'):
        output_file_path = os.path.join('./amlmodel', f.split('/')[-1])
        print(f'Downloading from {f} to {output_file_path} ...')
        best_run.download_file(name=f, output_file_path=output_file_path)

Downloading from outputs/conda_env_v_1_0_0.yml to ./amlmodel/conda_env_v_1_0_0.yml ...
Downloading from outputs/env_dependencies.json to ./amlmodel/env_dependencies.json ...
Downloading from outputs/internal_cross_validated_models.pkl to ./amlmodel/internal_cross_validated_models.pkl ...
Downloading from outputs/model.onnx to ./amlmodel/model.onnx ...
Downloading from outputs/model.pkl to ./amlmodel/model.pkl ...
Downloading from outputs/model_onnx.json to ./amlmodel/model_onnx.json ...
Downloading from outputs/pipeline_graph.json to ./amlmodel/pipeline_graph.json ...
Downloading from outputs/scoring_file_v_1_0_0.py to ./amlmodel/scoring_file_v_1_0_0.py ...


In [21]:
model=best_run.register_model(
            model_name = 'automl-bestmodel-breast-cancer', 
            model_path = './outputs/model.pkl',
            model_framework=Model.Framework.SCIKITLEARN,
            description='Breast Cancer Prediction'
)

In [22]:
# Download the conda environment file and define the environment
best_run.download_file('outputs/conda_env_v_1_0_0.yml', 'conda_env.yml')
myenv = Environment.from_conda_specification(name = 'myenv',
                                             file_path = 'conda_env.yml')

In [23]:
# download the scoring file produced by AutoML
best_run.download_file('outputs/scoring_file_v_1_0_0.py', 'score_auto.py')

# set inference config
inference_config = InferenceConfig(entry_script= 'score_auto.py',
                                    environment=myenv)
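
An Azure ML entry script exposes two functions: `init()`, called once when the container starts, and `run()`, called for each request. The sketch below mirrors that shape with a placeholder model so that it runs without Azure. `_PlaceholderModel` and its constant prediction are hypothetical; the AutoML-generated `score_auto.py` loads the real `model.pkl` and will differ in detail.

```python
import json

model = None  # populated once by init()


class _PlaceholderModel:
    """Hypothetical stand-in for the fitted pipeline; always predicts class 1."""

    def predict(self, rows):
        return [1 for _ in rows]


def init():
    """Load the model once at start-up (a real script would unpickle model.pkl)."""
    global model
    model = _PlaceholderModel()


def run(raw_data):
    """Parse the JSON request body, predict, and return a JSON string."""
    try:
        rows = json.loads(raw_data)["data"]
        return json.dumps({"result": model.predict(rows)})
    except Exception as exc:  # report the failure to the caller instead of crashing
        return json.dumps({"error": str(exc)})


init()
print(run('{"data": [{"Age": 46}, {"Age": 58}]}'))
```

The request body uses a top-level `data` key holding a list of records, and the response uses a `result` key, matching the payload and output shown later in this notebook.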

TODO: In the cell below, print the logs of the web service and delete the service

In [24]:
# set Aci Webservice config
aci_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1, auth_enabled=True)

In [25]:
service = Model.deploy(workspace=ws, 
                       name='automl-bestmodel-breast-cancer', 
                       models=[model], 
                       inference_config=inference_config,
                       deployment_config=aci_config,
                       overwrite=True)

In [26]:
service

AciWebservice(workspace=Workspace.create(name='quick-starts-ws-142915', subscription_id='aa7cf8e8-d23f-4bce-a7b9-1f0b4e0ac8ee', resource_group='aml-quickstarts-142915'), name=automl-bestmodel-breast-cancer, image_id=None, compute_type=None, state=ACI, scoring_uri=Transitioning, tags=None, properties=None, created_by={'azureml.git.repository_uri': 'https://github.com/Yagna27/capstone-project.git', 'mlflow.source.git.repoURL': 'https://github.com/Yagna27/capstone-project.git', 'azureml.git.branch': 'main', 'mlflow.source.git.branch': 'main', 'azureml.git.commit': 'c84f2e1462a81161113e4d21ef9035cef7824b89', 'mlflow.source.git.commit': 'c84f2e1462a81161113e4d21ef9035cef7824b89', 'azureml.git.dirty': 'True'})

In [27]:
# wait for deployment to finish and display the scoring uri and swagger uri
service.wait_for_deployment(show_output=True)

print('Service state:')
print(service.state)

print('Scoring URI:')
print(service.scoring_uri)

print('Swagger URI:')
print(service.swagger_uri)

Tips: You can try get_logs(): https://aka.ms/debugimage#dockerlog or local deployment: https://aka.ms/debugimage#debug-locally to debug if deployment takes longer than 10 minutes.
Running
2021-04-17 09:52:24+00:00 Creating Container Registry if not exists.
2021-04-17 09:52:25+00:00 Registering the environment.
2021-04-17 09:52:26+00:00 Use the existing image.
2021-04-17 09:52:27+00:00 Generating deployment configuration.
2021-04-17 09:52:28+00:00 Submitting deployment to compute.
2021-04-17 09:52:32+00:00 Checking the status of deployment automl-bestmodel-breast-cancer..
2021-04-17 09:57:00+00:00 Checking the status of inference endpoint automl-bestmodel-breast-cancer.
Succeeded
ACI service creation operation finished, operation "Succeeded"
Service state:
Healthy
Scoring URI:
http://42e7f579-4b3d-4250-aee7-702d67e48ee5.southcentralus.azurecontainer.io/score
Swagger URI:
http://42e7f579-4b3d-4250-aee7-702d67e48ee5.southcentralus.azurecontainer.io/swagger.json


In [32]:
df=ds.to_pandas_dataframe()

In [45]:
import json

# select 3 random samples from the dataframe
x_ds = df.sample(3)
y_ds = x_ds.pop('Classification')

# convert the records to a JSON string for scoring
records = x_ds.to_dict(orient='records')

scoring_json = json.dumps({'data': records})
print(scoring_json)

{"data": [{"Age": 46, "BMI": 33.18, "Glucose": 92, "Insulin": 5.75, "HOMA": 1.304866667, "Leptin": 18.69, "Adiponectin": 9.16, "Resistin": 8.89, "MCP.1": 209.19}, {"Age": 58, "BMI": 29.15451895, "Glucose": 139, "Insulin": 16.582, "HOMA": 5.685415067, "Leptin": 22.8884, "Adiponectin": 10.26266, "Resistin": 13.97399, "MCP.1": 923.886}, {"Age": 44, "BMI": 27.88761707, "Glucose": 99, "Insulin": 9.208, "HOMA": 2.2485936, "Leptin": 12.6757, "Adiponectin": 5.47817, "Resistin": 23.03306, "MCP.1": 407.206}]}


In [46]:
!python3 endpoint.py
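
The contents of `endpoint.py` are not shown in this notebook. A minimal sketch of such a script, assuming it posts the JSON payload to the scoring URI over HTTP, might look like the following. Because the service was deployed with `auth_enabled=True`, the request carries a key in a `Bearer` authorization header. The function names, the example URI, and the key value are placeholders.

```python
import json
import urllib.request


def build_scoring_request(scoring_uri, payload, api_key=None):
    """Build a POST request carrying the JSON payload and optional key auth."""
    body = json.dumps(payload).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(scoring_uri, data=body, headers=headers, method="POST")


def score(scoring_uri, payload, api_key=None, timeout=30):
    """Send the request and return the raw response body as text."""
    request = build_scoring_request(scoring_uri, payload, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Placeholder URI and key; nothing is sent until score() is called.
request = build_scoring_request(
    "http://example.invalid/score",
    {"data": [{"Age": 46, "BMI": 33.18}]},
    api_key="placeholder-key",
)
print(request.get_method(), request.full_url)
```

In the notebook, the URI would be `service.scoring_uri` and the key could be taken from `service.get_keys()`, which returns the primary and secondary keys.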

In [47]:
output = service.run(scoring_json)
output

'{"result": [1, 2, 2]}'

In [43]:
y_ds

17    1
86    2
97    2
Name: Classification, dtype: int64
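
The service returns its predictions as a JSON string, so comparing them with the held-back labels takes one parsing step. The sketch below uses the output and label values printed above, assuming both refer to the same three sampled rows; `compare_predictions` is a hypothetical helper.

```python
import json


def compare_predictions(service_output, expected_labels):
    """Parse the service's JSON string and return (predictions, accuracy)."""
    predictions = json.loads(service_output)["result"]
    if len(predictions) != len(expected_labels):
        raise ValueError("prediction and label counts differ")
    matches = sum(p == e for p, e in zip(predictions, expected_labels))
    return predictions, matches / len(expected_labels)


output = '{"result": [1, 2, 2]}'  # string returned by service.run above
expected = [1, 2, 2]              # values of y_ds shown above
predictions, accuracy = compare_predictions(output, expected)
print(predictions, accuracy)
```

With a pandas Series such as `y_ds`, the labels can be passed as `y_ds.tolist()`. Three rows drawn from the training data are only a smoke test of the endpoint, not a measure of model quality.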

In [50]:
!python3 logs.py

2021-04-17T10:30:17,781705500+00:00 - gunicorn/run 
2021-04-17T10:30:17,780511500+00:00 - iot-server/run 
2021-04-17T10:30:17,781705500+00:00 - rsyslog/run 
2021-04-17T10:30:17,870728200+00:00 - nginx/run 
/usr/sbin/nginx: /azureml-envs/azureml_5a33d0b1846b3717ef969a5f35fc31b7/lib/libcrypto.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_5a33d0b1846b3717ef969a5f35fc31b7/lib/libcrypto.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_5a33d0b1846b3717ef969a5f35fc31b7/lib/libssl.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_5a33d0b1846b3717ef969a5f35fc31b7/lib/libssl.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_5a33d0b1846b3717ef969a5f35fc31b7/lib/libssl.so.1.0.0: no version information available (required by /usr/sbin/nginx)

In [49]:
# delete() must be called with parentheses; a bare `service.delete` only displays the bound method
service.delete()