# Automated ML

In the cell below, import all the dependencies that you will need to complete the project.

In [1]:
pip install azureml-train-automl-runtime==1.22.0

Note: you may need to restart the kernel to use updated packages.


In [2]:
pip install inference_schema

Collecting inference_schema
  Downloading inference_schema-1.1.0-py3-none-any.whl (19 kB)
Collecting wrapt==1.11.1
  Downloading wrapt-1.11.1.tar.gz (27 kB)
Building wheels for collected packages: wrapt
  Building wheel for wrapt (setup.py) ... done
  Created wheel for wrapt: filename=wrapt-1.11.1-cp36-cp36m-linux_x86_64.whl size=66557 sha256=8feef0b8dd2570852fada881caab4724b987bdbe79f7b2adc65a68ce080d8222
  Stored in directory: /home/azureuser/.cache/pip/wheels/94/0f/ec/66085641573800014bb0c8b657f3366eff641c42df79abbfe9
Successfully built wrapt
ERROR: tensorflow 2.1.0 has requirement scipy==1.4.1; python_version >= "3", but you'll have scipy 1.5.2 which is incompatible.
ERROR: tensorflow-gpu 2.1.0 has requirement scipy==1.4.1; python_version >= "3", but you'll have scipy 1.5.2 which is incompatible.
ERROR: autokeras 1.0.12 has requirement tensorflow>=2.3.0, but you'll have tensorflow 2.1.0 which is incompatible.
Installing collected p

In [3]:
import azureml.core
print("The version used is :", azureml.core.VERSION)

The version used is : 1.22.0


## Initialize Workspace and Create Experiment

Initialize a workspace object from the persisted configuration.  
Then create an Azure ML experiment named "auto-ml". The experiment is recorded under the Experiments section in Azure ML studio.

In [4]:
from azureml.core import Workspace, Experiment

ws = Workspace.from_config()

# choose a name for experiment
experiment_name = 'auto-ml'

experiment=Experiment(ws, experiment_name)


Performing interactive authentication. Please follow the instructions on the terminal.
To sign in, use a web browser to open the page https://microsoft.com/devicelogin and enter the code FAFLZLZHK to authenticate.
You have logged in. Now let us find all the subscriptions to which you have access...
Interactive authentication successfully completed.


In [39]:
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

# Create compute cluster
cpu_cluster_name = "cpucluster"

try:
    cpu_cluster = ComputeTarget(workspace=ws, name=cpu_cluster_name)
    print('Found existing cluster, use it.')
except ComputeTargetException:
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2', max_nodes=4)
    cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, compute_config)

cpu_cluster.wait_for_completion(show_output=True)

Found existing cluster, use it.
Succeeded
AmlCompute wait for completion finished

Minimum number of nodes requested have been provisioned


## Overview of Dataset
The dataset used in this notebook is __heart_failure_clinical_records_dataset.csv__, an external dataset available on Kaggle.  
It contains records for 299 patients with 12 features that are useful for predicting mortality from heart failure.
* Number of patient records: 299
* Input variables (features): age, anaemia, creatinine_phosphokinase, diabetes, ejection_fraction, high_blood_pressure, platelets, serum_creatinine, serum_sodium, sex, smoking, time
* Output (target) variable: DEATH_EVENT

In this project, we build a classification model that predicts DEATH_EVENT (the target variable), i.e. mortality caused by heart failure.

### Data Import
Create a tabular dataset from the heart_failure_clinical_records_dataset CSV file using Azure ML's **TabularDatasetFactory** class.

In [40]:
from azureml.data.dataset_factory import TabularDatasetFactory

csv_file = 'https://raw.githubusercontent.com/SwapnaKategaru/Project3/main/heart_failure_clinical_records_dataset.csv'

data = TabularDatasetFactory.from_delimited_files(csv_file)

## Clean Data
The clean_data function from the **train.py** script is used to clean and normalize the data.  

## Split Data

The data is split into train and test subsets using the **train_test_split** function, with a fixed random state (random_state=42) and a test set size of 33% (test_size=0.33).

In [41]:
from train import clean_data
from sklearn.model_selection import train_test_split
import pandas as pd
x, y = clean_data(data)

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.33, random_state=42)
train_data = pd.concat([x_train, y_train], axis = 1)
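
As a quick sanity check on the split above, the sketch below reproduces how scikit-learn sizes the two subsets when `test_size` is a fraction: the test set gets the ceiling of `test_size * n_samples` and the training set gets the remainder. It assumes `clean_data` keeps all 299 rows, which this notebook does not show. `split_sizes` is a hypothetical helper, not part of scikit-learn.

```python
import math


def split_sizes(n_samples, test_size):
    """Mirror the sizing rule for a fractional test_size (hypothetical helper)."""
    n_test = math.ceil(test_size * n_samples)  # test set is rounded up
    n_train = n_samples - n_test               # training set gets the rest
    return n_train, n_test


# Assumption: all 299 patient records survive cleaning.
n_train, n_test = split_sizes(299, 0.33)
print("train rows:", n_train, "test rows:", n_test)
```

Under that assumption, AutoML trains and cross-validates on roughly two thirds of the patients, which is worth keeping in mind when reading metrics from such a small dataset.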

## AutoML Configuration
TODO: Explain why you chose the automl settings and configuration you used below.

In [8]:
from azureml.train.automl import AutoMLConfig
import logging

automl_settings = {
    "enable_early_stopping" : True,
    "max_concurrent_iterations": 4,
    "primary_metric": 'AUC_weighted',
    "featurization": 'auto',
    "verbosity": logging.INFO,
}

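# Note: automl_settings defined above is not passed to AutoMLConfig here,
# so those values are not applied unless **automl_settings is added to this call.
automl_config = AutoMLConfig(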
    experiment_timeout_minutes=15,
    task='classification',
    training_data=train_data,
    label_column_name='DEATH_EVENT',
    n_cross_validations=2,
    blocked_models=['XGBoostClassifier']
)
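
The cell above defines `automl_settings`, but the `AutoMLConfig(...)` call never references it. The Azure ML SDK examples commonly pass such a dictionary with `**automl_settings`. The sketch below illustrates only the Python mechanics of that unpacking. `collect_config` is a hypothetical stand-in that returns the keyword arguments it receives; it is not the SDK constructor.

```python
import logging

automl_settings = {
    "enable_early_stopping": True,
    "max_concurrent_iterations": 4,
    "primary_metric": "AUC_weighted",
    "featurization": "auto",
    "verbosity": logging.INFO,
}


def collect_config(**kwargs):
    """Hypothetical stand-in for a config constructor: echoes its keyword arguments."""
    return kwargs


# Without unpacking, the settings dictionary never reaches the constructor.
without_settings = collect_config(task="classification", n_cross_validations=2)

# With ** unpacking, every key becomes a keyword argument.
with_settings = collect_config(
    task="classification", n_cross_validations=2, **automl_settings
)

print("primary_metric" in without_settings)
print(with_settings["primary_metric"])
```

Whether the run recorded here would behave the same with the settings applied is not something this notebook shows, so treat the pattern as a suggestion to verify.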

In [9]:
remote_run = experiment.submit(automl_config, show_output=True)

No run_configuration provided, running on local with default configuration
Running on local machine
Parent Run ID: AutoML_d2989d9f-2b73-485a-92d2-33e34c18b2b7

Current status: DatasetEvaluation. Gathering dataset statistics.
Current status: FeaturesGeneration. Generating features for the dataset.
Current status: DatasetFeaturization. Beginning to fit featurizers and featurize the dataset.
Current status: DatasetFeaturizationCompleted. Completed fit featurizers and featurizing the dataset.
Current status: DatasetCrossValidationSplit. Generating individually featurized CV splits.

****************************************************************************************************
DATA GUARDRAILS: 

TYPE:         Class balancing detection
STATUS:       PASSED
DESCRIPTION:  Your inputs were analyzed, and all classes are balanced in your training data.
              Learn more about imbalanced data: https://aka.ms/AutomatedMLImbalancedData

**************************************************

## Run Details
OPTIONAL: Write about the different models trained and their performance. Why do you think some models did better than others?

TODO: In the cell below, use the RunDetails widget to show the different experiments.

In [10]:
from azureml.widgets import RunDetails
RunDetails(remote_run).show()

_AutoMLWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', 's…

## Best Model

In [11]:
best_run, best_model = remote_run.get_output()

In [12]:
best_run.get_file_names()[-3]

'outputs/model.pkl'

In [42]:
import pickle
best_run.download_file('outputs/model.pkl', './outputs/automlmodel.pkl')

In [43]:
best_run.get_file_names()[-6]

'outputs/conda_env_v_1_0_0.yml'

In [15]:
best_run.download_file('outputs/conda_env_v_1_0_0.yml', './outputs/conda_dependencies.yml')
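
The cells above pick files by position (`[-3]`, `[-6]`) in the list returned by `get_file_names()`. A lookup by name is less sensitive to changes in that list. The sketch below shows one way to do it. `find_output` is a hypothetical helper, and the file listing is a hand-written sample rather than the actual output of this run.

```python
def find_output(file_names, suffix):
    """Return the single file name ending with suffix; raise if absent or ambiguous."""
    matches = [name for name in file_names if name.endswith(suffix)]
    if len(matches) != 1:
        raise LookupError(
            f"expected one file ending with {suffix!r}, found {len(matches)}"
        )
    return matches[0]


# Hypothetical listing; in the notebook this would be best_run.get_file_names().
file_names = [
    "logs/example.log",               # hypothetical entry
    "outputs/conda_env_v_1_0_0.yml",  # name taken from the notebook output
    "outputs/model.pkl",              # name taken from the notebook output
]

model_file = find_output(file_names, "/model.pkl")
conda_file = find_output(file_names, ".yml")
print(model_file, conda_file)
```

The returned names could then be passed to `best_run.download_file(...)` in place of the hard-coded paths.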

In [16]:
import joblib
joblib.load(filename='outputs/automlmodel.pkl')

Pipeline(memory=None,
         steps=[('datatransformer',
                 DataTransformer(enable_dnn=None, enable_feature_sweeping=None,
                                 feature_sweeping_config=None,
                                 feature_sweeping_timeout=None,
                                 featurization_config=None, force_text_dnn=None,
                                 is_cross_validation=None,
                                 is_onnx_compatible=None, logger=None,
                                 observer=None, task=None, working_dir=None)),
                ('prefittedsoftvotingclassifier',...
                                                                                                min_child_samples=18,
                                                                                                min_child_weight=2,
                                                                                                min_split_gain=0.8421052631578947,
                           

In [17]:
best_run

Experiment,Id,Type,Status,Details Page,Docs Page
auto-ml,AutoML_d2989d9f-2b73-485a-92d2-33e34c18b2b7_35,,Completed,Link to Azure Machine Learning studio,Link to Documentation


In [18]:
best_run.get_tags()

{'ensembled_iterations': '[2, 33, 6, 4, 31, 24, 27]',
 'ensembled_algorithms': "['RandomForest', 'RandomForest', 'GradientBoosting', 'RandomForest', 'LightGBM', 'LightGBM', 'LightGBM']",
 'ensemble_weights': '[0.3, 0.1, 0.1, 0.2, 0.1, 0.1, 0.1]',
 'best_individual_pipeline_score': '0.885',
 'best_individual_iteration': '2',
 'model_explanation': 'True'}
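
The tags above describe a soft-voting ensemble: seven fitted pipelines whose class probabilities are combined using the listed weights. Note that the tag values are strings, so they need parsing before use. The sketch below parses them with `ast.literal_eval` and computes a weighted average of class probabilities. The per-model probabilities are made-up illustration values, and `soft_vote` is a hypothetical helper, not the AutoML implementation.

```python
import ast

# Tag values copied from the output above (they are strings, not lists).
tags = {
    "ensembled_algorithms": "['RandomForest', 'RandomForest', 'GradientBoosting', "
                            "'RandomForest', 'LightGBM', 'LightGBM', 'LightGBM']",
    "ensemble_weights": "[0.3, 0.1, 0.1, 0.2, 0.1, 0.1, 0.1]",
}

algorithms = ast.literal_eval(tags["ensembled_algorithms"])
weights = ast.literal_eval(tags["ensemble_weights"])


def soft_vote(probabilities, weights):
    """Weighted average of per-model class probabilities, normalised by total weight."""
    total = sum(weights)
    n_classes = len(probabilities[0])
    return [
        sum(w * p[c] for w, p in zip(weights, probabilities)) / total
        for c in range(n_classes)
    ]


# Hypothetical [P(class 0), P(class 1)] outputs of the seven models for one patient.
probabilities = [
    [0.8, 0.2], [0.6, 0.4], [0.4, 0.6], [0.7, 0.3],
    [0.5, 0.5], [0.3, 0.7], [0.6, 0.4],
]

averaged = soft_vote(probabilities, weights)
predicted = max(range(len(averaged)), key=averaged.__getitem__)
print(averaged, predicted)
```

With these illustrative numbers the first model, which carries weight 0.3, dominates the vote, showing how a heavily weighted member can outweigh several lighter ones.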

In [19]:
print('Accuracy :', best_run.get_metrics()['accuracy'])
print('AUC_weighted :', best_run.get_metrics()['AUC_weighted'])


Accuracy : 0.9
AUC_weighted : 0.9343513032216286
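
To make the `AUC_weighted` figure easier to interpret: as described in the Azure ML documentation, it is the mean of each class's AUC, weighted by the number of true instances in that class. The sketch below computes it for a binary target using the pairwise-ranking definition of AUC. The labels and scores are made-up illustration values, and both functions are hypothetical helpers rather than the SDK's implementation.

```python
def binary_auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def weighted_auc(labels, scores):
    """One-vs-rest AUC per class, weighted by class support (binary case only)."""
    n = len(labels)
    n_pos = sum(labels)
    auc_pos = binary_auc(labels, scores)
    # For class 0, flip the labels and use 1 - score as the class-0 score.
    auc_neg = binary_auc([1 - y for y in labels], [1 - s for s in scores])
    return (n_pos * auc_pos + (n - n_pos) * auc_neg) / n


# Hypothetical DEATH_EVENT labels and predicted probabilities of class 1.
labels = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]

print(binary_auc(labels, scores), weighted_auc(labels, scores))
```

In the binary case the two per-class AUCs coincide, so the weighted value equals the ordinary AUC. The weighting only changes the result when there are more than two classes.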


In [20]:
from pprint import pprint

def print_model(best_model, prefix=""):
    for step in best_model.steps:
        print(prefix + step[0])
        if hasattr(step[1], 'estimators') and hasattr(step[1], 'weights'):
            pprint({'estimators': list(
                e[0] for e in step[1].estimators), 'weights': step[1].weights})
            print()
            for estimator in step[1].estimators:
                print_model(estimator[1], estimator[0] + ' - ')
        else:
            pprint(step[1].get_params())
            print()

print_model(best_model)

datatransformer
{'enable_dnn': None,
 'enable_feature_sweeping': None,
 'feature_sweeping_config': None,
 'feature_sweeping_timeout': None,
 'featurization_config': None,
 'force_text_dnn': None,
 'is_cross_validation': None,
 'is_onnx_compatible': None,
 'logger': None,
 'observer': None,
 'task': None,
 'working_dir': None}

prefittedsoftvotingclassifier
{'estimators': ['2', '33', '6', '4', '31', '24', '27'],
 'weights': [0.3, 0.1, 0.1, 0.2, 0.1, 0.1, 0.1]}

2 - minmaxscaler
{'copy': True, 'feature_range': (0, 1)}

2 - randomforestclassifier
{'bootstrap': True,
 'ccp_alpha': 0.0,
 'class_weight': None,
 'criterion': 'gini',
 'max_depth': None,
 'max_features': 'sqrt',
 'max_leaf_nodes': None,
 'max_samples': None,
 'min_impurity_decrease': 0.0,
 'min_impurity_split': None,
 'min_samples_leaf': 0.035789473684210524,
 'min_samples_split': 0.01,
 'min_weight_fraction_leaf': 0.0,
 'n_estimators': 10,
 'n_jobs': 1,
 'oob_score': True,
 'random_state': None,
 'verbose': 0,
 'warm_start': F

In [21]:
import sklearn
from azureml.core.environment import Environment

environment = Environment.get(workspace=ws, name="AzureML-AutoML")
environment.python.conda_dependencies.add_pip_package("inference-schema[numpy-support]")
environment.python.conda_dependencies.add_pip_package("joblib")
environment.python.conda_dependencies.add_pip_package("scikit-learn=={}".format(sklearn.__version__))

In [30]:
%%writefile score.py

import json
import numpy as np
import pandas as pd
import os
import joblib
import pickle

from inference_schema.schema_decorators import input_schema, output_schema
from inference_schema.parameter_types.numpy_parameter_type import NumpyParameterType
from inference_schema.parameter_types.pandas_parameter_type import PandasParameterType

input_sample = pd.DataFrame({"age": pd.Series([0.0], dtype="float64"), "anaemia": pd.Series([0], dtype="int64"), "creatinine_phosphokinase": pd.Series([0.0], dtype="float64"), "diabetes": pd.Series([0], dtype="int64"), "ejection_fraction": pd.Series([0.0], dtype="float64"), "high_blood_pressure": pd.Series([0], dtype="int64"), "platelets": pd.Series([0.0], dtype="float64"), "serum_creatinine": pd.Series([0.0], dtype="float64"), "serum_sodium": pd.Series([0.0], dtype="float64"), "sex": pd.Series([0], dtype="int64"), "smoking": pd.Series([0], dtype="int64"), "time": pd.Series([0.0], dtype="float64")})
output_sample = np.array([0])


def init():
    global model
    model_path = os.path.join(os.getenv('AZUREML_MODEL_DIR'), 'automlmodel.pkl')
    print("Found model:", os.path.isfile(model_path)) 
    model = joblib.load(model_path)

@input_schema('data', PandasParameterType(input_sample))
@output_schema(NumpyParameterType(output_sample))
def run(data):
    try:
        result = model.predict(data)
        return result.tolist()
    except Exception as e:
        result = str(e)
        return result  


Overwriting score.py
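
One consequence of the `try`/`except` in `run` is that a failure comes back as a plain string instead of raising, so a caller has to check the type of the result. The sketch below exercises that control flow locally with a stub in place of the real pipeline. `StubModel` and its prediction rule are hypothetical and exist only to drive the two code paths.

```python
from array import array


class StubModel:
    """Hypothetical stand-in for the deserialized AutoML pipeline."""

    def predict(self, rows):
        # Arbitrary rule for illustration only; array offers tolist() like a NumPy array.
        return array("i", [1 if row["ejection_fraction"] < 30 else 0 for row in rows])


model = StubModel()


def run(data):
    """Same control flow as run() in score.py, minus the schema decorators."""
    try:
        result = model.predict(data)
        return result.tolist()
    except Exception as e:
        return str(e)


ok = run([{"ejection_fraction": 20}, {"ejection_fraction": 38}])
bad = run([{"age": 75.0}])  # missing column triggers the except branch

print(type(ok).__name__, ok)
print(type(bad).__name__, bad)
```

A list means predictions, while a string means the entry script caught an exception.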


## Model Deployment


In [31]:
from azureml.core.model import Model
model = Model.register(model_name='automl-model', model_path='outputs/automlmodel.pkl', workspace=ws)

Registering model automl-model


In [32]:
Model(ws, 'automl-model')

Model(workspace=Workspace.create(name='quick-starts-ws-139939', subscription_id='9a7511b8-150f-4a58-8528-3e7d50216c31', resource_group='aml-quickstarts-139939'), name=automl-model, id=automl-model:2, version=2, tags={}, properties={})

In [44]:
from azureml.core.webservice import AciWebservice
from azureml.core.model import InferenceConfig

# ACI deployment configuration with key-based authentication enabled
deployment_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1, enable_app_insights=False, auth_enabled=True)

# Deploy the registered model with the scoring script and environment defined above
inference_config = InferenceConfig(entry_script='score.py', environment=environment)
service = Model.deploy(ws, "automl-deploy", [model], inference_config, deployment_config)
service.wait_for_deployment(show_output=True)

print(service.state)
print(service.scoring_uri)
print(service.swagger_uri)

Tips: You can try get_logs(): https://aka.ms/debugimage#dockerlog or local deployment: https://aka.ms/debugimage#debug-locally to debug if deployment takes longer than 10 minutes.
Running....................................
Succeeded
ACI service creation operation finished, operation "Succeeded"
Healthy
http://fc793b6b-35f1-406e-89e1-032ffafef2af.southcentralus.azurecontainer.io/score
http://fc793b6b-35f1-406e-89e1-032ffafef2af.southcentralus.azurecontainer.io/swagger.json


In [45]:
primaryKey, secondaryKey = service.get_keys()
print('Primary Key :', primaryKey)
scoringURI = service.scoring_uri
print('Scoring URI :', scoringURI)

Primary Key : HbBIN5R4F66Ay4enkbdioncmh3TNKhP0
Scoring URI : http://fc793b6b-35f1-406e-89e1-032ffafef2af.southcentralus.azurecontainer.io/score


In [46]:
import requests
import json

scoring_uri = scoringURI
key = primaryKey


# Sample data to send to the service
data = json.dumps(
    {"data":
            [
                {
                    'age': 75.0,
                    'anaemia': 0,
                    'creatinine_phosphokinase': 582,
                    'diabetes': 0,
                    'ejection_fraction': 20,
                    'high_blood_pressure': 1,
                    'platelets': 265000.0,
                    'serum_creatinine': 1.9,
                    'serum_sodium': 130,
                    'sex': 1,
                    'smoking': 0,
                    'time': 4
                },
                {
                    'age': 55.0,
                    'anaemia': 0,
                    'creatinine_phosphokinase': 7861,
                    'diabetes': 0,
                    'ejection_fraction': 38,
                    'high_blood_pressure': 0,
                    'platelets': 263358.03,
                    'serum_creatinine': 1.1,
                    'serum_sodium': 136,
                    'sex': 1,
                    'smoking': 0,
                    'time': 6
                }
            ] 
        }
)
                

headers = {'Content-Type': 'application/json'}

headers['Authorization'] = f'Bearer {key}'

resp = requests.post(scoring_uri, data, headers=headers)
print("Response result from input payload : ", resp.text)


Response result from input payload :  [0, 0]
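
The request above depends on each record carrying the 12 feature columns declared in `input_sample` in score.py, and on the response body being a JSON list. The sketch below checks a record against that column list before sending, and decodes a response body afterwards. `missing_or_extra` and `decode_predictions` are hypothetical helpers. The error handling assumes the service JSON-encodes whatever `run` returns, which this notebook does not confirm.

```python
import json

# Column names taken from input_sample in score.py.
FEATURES = [
    "age", "anaemia", "creatinine_phosphokinase", "diabetes",
    "ejection_fraction", "high_blood_pressure", "platelets",
    "serum_creatinine", "serum_sodium", "sex", "smoking", "time",
]


def missing_or_extra(record):
    """Return (missing, extra) column names for one payload record."""
    keys = set(record)
    return sorted(set(FEATURES) - keys), sorted(keys - set(FEATURES))


def decode_predictions(response_text):
    """Parse a response body; anything other than a list is treated as an error."""
    parsed = json.loads(response_text)
    if not isinstance(parsed, list):
        raise ValueError(f"service reported an error: {parsed}")
    return parsed


# First sample record from the request above.
record = {
    "age": 75.0, "anaemia": 0, "creatinine_phosphokinase": 582, "diabetes": 0,
    "ejection_fraction": 20, "high_blood_pressure": 1, "platelets": 265000.0,
    "serum_creatinine": 1.9, "serum_sodium": 130, "sex": 1, "smoking": 0, "time": 4,
}

print(missing_or_extra(record))
print(decode_predictions("[0, 0]"))  # body text as printed above
```

In the notebook, `decode_predictions(resp.text)` would replace printing the raw text.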


In [38]:
service.delete()

In [47]:
cpu_cluster.delete()