# ~ Capstone Project ~

# Azure Machine Learning Engineer 

## Completed by Audrey Tan
 ___
 
- [1. Project Overview](#intro)
- [2. Environment Setup](#env-setup)
    - [2.1 Import dependencies](#env1)
    - [2.2 Workspace and experiment setup](#env2)
    - [2.3 Create a compute cluster](#env3)
- [3. AutoML Experiment Submission](#aml-exp)
    - [3.1 Dataset preparation](#aml-ds)
    - [3.2 AutoML config setup](#aml-setup)
    - [3.3 AutoML run](#aml-run)
    - [3.4 Monitor AutoML run](#aml-watch)
    - [3.5 Examine the best AutoML model details](#aml-model)
    - [3.6 Save and register the best AutoML model](#aml-reg)
- [4. Model Deployment](#deploy)
    - [4.1 Deployment setup](#dply1)
    - [4.2 Deploy the model as a web service](#dply2)
    - [4.3 Testing the web service](#dply3)
    - [4.4 Enable Application Insights](#dply4)
    - [4.5 Printing the logs of the web service](#dply5)
    - [4.6 Active web service endpoint demo](#dply6)
- [5. Cleanup](#clean)
- [6. Citations](#cita)
 ___

## Part I - AutoML Model Training
#### This notebook contains the AutoML setup, training and deployment steps using the SDK. See the `hyperparameter_tuning` notebook for Part II - Custom Model Training with HyperDrive.
 ___

<a id='intro'></a>
## 1. Project Overview

> In this project, we use the Loan Prediction dataset from Kaggle to build a loan application classifier. The classification goal is to predict whether a loan application will be approved or denied, given the applicant's credit history and other socioeconomic and demographic data.
>
> We build two versions of the classifier: one with AutoML and one custom model. AutoML trains and selects the best model on its own, while the custom model relies on HyperDrive to tune its training hyperparameters. The better performing model from the AutoML and HyperDrive experiment runs is then selected for deployment. Scoring requests can be sent to the deployment endpoint to test the deployed model. The diagram below provides an overview of the workflow.

![png](assets/MLworkflow.png)


<a id='env-setup'></a>

## 2. Environment Setup

This entails the following tasks:
> * Import all dependencies required to complete the AutoML project
>
> * Initialize workspace and create a new Experiment
>
> * Create a compute target for training 
>

<a id='env1'></a>
### 2.1 Import dependencies
#### import all the packages needed for the project

In [1]:
import logging
import os
import csv
import json
import requests

from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
from sklearn import datasets
import pkg_resources

import azureml.core
from azureml.core.experiment import Experiment
from azureml.core.workspace import Workspace
from azureml.core.dataset import Dataset
from azureml.data.dataset_factory import TabularDatasetFactory

from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import LocalTarget, ComputeTargetException

from sklearn.model_selection import train_test_split
from train import clean_data

from azureml.train.automl import utilities
from azureml.train.automl import AutoMLConfig
from azureml.widgets import RunDetails

from azureml.core import Model
from azureml.core import Webservice

from azureml.core import Environment
from azureml.core.model import InferenceConfig
from azureml.core.webservice import AciWebservice

# Check core SDK version number
print("SDK version:", azureml.core.VERSION)

SDK version: 1.19.0


<a id='env2'></a>
### 2.2 Workspace and experiment setup
#### Display the workspace details and set up an experiment

#### Initialize workspace
Initialize a workspace object from persisted configuration. Make sure `config.json` is present as ./config.json
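
`Workspace.from_config()` reads the workspace coordinates from a small JSON file. As a minimal sketch, the snippet below writes and validates a file with the three keys such a config file normally carries (`subscription_id`, `resource_group`, `workspace_name`). The placeholder values and the helper name `check_workspace_config` are hypothetical; in practice the file is usually downloaded from the Azure portal rather than written by hand.

```python
import json
import os
import tempfile

# Keys normally present in an Azure ML workspace config.json
REQUIRED_KEYS = {"subscription_id", "resource_group", "workspace_name"}


def check_workspace_config(path):
    """Return the parsed config if it has all required keys, else raise ValueError."""
    with open(path) as fh:
        cfg = json.load(fh)
    missing = REQUIRED_KEYS - cfg.keys()
    if missing:
        raise ValueError(f"config.json is missing keys: {sorted(missing)}")
    return cfg


# Placeholder values for illustration only (hypothetical)
example = {
    "subscription_id": "<your-subscription-id>",
    "resource_group": "<your-resource-group>",
    "workspace_name": "<your-workspace-name>",
}

# Write the example to a temporary directory so nothing in the project is touched
tmp_dir = tempfile.mkdtemp()
cfg_path = os.path.join(tmp_dir, "config.json")
with open(cfg_path, "w") as fh:
    json.dump(example, fh, indent=2)

cfg = check_workspace_config(cfg_path)
print(sorted(cfg))
```

Running a check like this before calling `Workspace.from_config()` can give a clearer error message than a failed authentication attempt.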

In [2]:
ws = Workspace.from_config()
print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\n')

quick-starts-ws-130725
aml-quickstarts-130725
southcentralus
c463503f-66c4-48b5-9bb5-b66fec87c814


In [3]:
exp = Experiment(workspace=ws, name="capstone-automl-exp")

<a id='env3'></a>
### 2.3 Create a compute cluster

#### look for an available compute cluster in the workspace or create a new one to use

In [4]:
clist = ComputeTarget.list(workspace=ws)

In [5]:
clist

[{
   "id": "/subscriptions/c463503f-66c4-48b5-9bb5-b66fec87c814/resourceGroups/aml-quickstarts-130725/providers/Microsoft.MachineLearningServices/workspaces/quick-starts-ws-130725/computes/notebook130725",
   "name": "notebook130725",
   "location": "southcentralus",
   "tags": null,
   "properties": {
     "description": null,
     "computeType": "ComputeInstance",
     "computeLocation": "southcentralus",
     "resourceId": null,
     "provisioningErrors": null,
     "provisioningState": "Succeeded",
     "properties": {
       "vmSize": "STANDARD_DS3_V2",
       "applications": [
         {
           "displayName": "Jupyter",
           "endpointUri": "https://notebook130725.southcentralus.instances.azureml.ms"
         },
         {
           "displayName": "Jupyter Lab",
           "endpointUri": "https://notebook130725.southcentralus.instances.azureml.ms/lab"
         },
         {
           "displayName": "RStudio",
           "endpointUri": "https://notebook130725-8787.sout

In [6]:
clist[0]

Name,Workspace,State,Location,VmSize,Application URI,Docs
notebook130725,quick-starts-ws-130725,Running,southcentralus,STANDARD_DS3_V2,Jupyter JupyterLab RStudio,Doc


In [7]:
len(clist)

1

#### Create a new compute cluster, as only a notebook compute instance is available in the workspace.

In [8]:
cluster_name = 'std-ds3-v2' 

In [9]:
# Create compute cluster
# Use vm_size = "Standard_DS3_V2".
# max_nodes no greater than 4.

# Test cluster exists

try:
    compute_cluster = ComputeTarget(workspace=ws, name=cluster_name)
    print(f'compute cluster {cluster_name} already exists')
except ComputeTargetException:
    print(f'creating a new compute cluster {cluster_name} ...')
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_DS3_V2',
                                                           max_nodes=4)
    compute_cluster = ComputeTarget.create(ws, cluster_name, compute_config)

compute_cluster.wait_for_completion(show_output=True)

creating a new compute cluster std-ds3-v2 ...
Creating
Succeeded
AmlCompute wait for completion finished

Minimum number of nodes requested have been provisioned


<a id='aml-exp'></a>
## 3. AutoML Experiment Submission

This entails the following tasks:
> * Prepare training and validation datasets for the AutoML experiment run
>
> * Set up the AutoML config
>
> * Submit the AutoML experiment  
>
> * Monitor the AutoML run
>
> * Save the best AutoML model
>

<a id='aml-ds'></a>
### 3.1 Dataset preparation

#### Dataset overview
The **external** dataset is the `train_u6lujuX_CVtuZ9i.csv` of this [kaggle Loan Prediction Problem Dataset](https://www.kaggle.com/altruistdelhite04/loan-prediction-problem-dataset) which I downloaded and staged on this [Github Repo](https://raw.githubusercontent.com/atan4583/datasets/master/train.csv). 

The dataset has 614 records and 13 columns. The **classification goal is to predict if a loan will be approved**. The input variables are the columns carrying the credit history and other demographics of the applicants. The output variable, the `Loan_Status` column, indicates if a loan application is approved or denied, i.e. True (1) or False (0).


The code cells below perform these tasks:
> 1. Check whether the dataset is registered in the workspace; if not, download it from the [Github Repo](https://raw.githubusercontent.com/atan4583/datasets/master/train.csv) and register it
> 2. Load it into a pandas dataframe for a quick exploration of the data
> 3. Run the dataset through the `clean_data` function in `train.py` to generate `x` (features) and `y` (labels)
> 4. Check that all columns in `x` and `y` are numeric with no missing values
> 5. Call the sklearn `train_test_split` utility to split `x` and `y` into training and test sets
> 6. Create a training and a validation dataframe, and check that all columns are numeric with no missing values
> 7. Convert the training and validation dataframes to TabularDatasets on the AML default datastore for the AutoML run


#### check whether the dataset exists in the workspace and, if not, download and register it; then load it into a dataframe for a quick data exploration

In [10]:
key = "loan prediction dataset"
description_text = "loan prediction dataset for MLEMAND Capstone Project "

if key in ws.datasets:
    # Reuse the dataset already registered in the workspace
    dataset = ws.datasets[key]
else:
    # Create an AML Dataset from the staged csv and register it in the workspace
    example_data = 'https://raw.githubusercontent.com/atan4583/datasets/master/train.csv'
    dataset = Dataset.Tabular.from_delimited_files(example_data)
    dataset = dataset.register(workspace=ws,
                               name=key,
                               description=description_text)


df = dataset.to_pandas_dataframe()
df.describe()

Unnamed: 0,ApplicantIncome,CoapplicantIncome,LoanAmount,Loan_Amount_Term,Credit_History
count,614.0,612.0,592.0,600.0,564.0
mean,5403.459283,1624.906863,146.412162,342.0,0.842199
std,6109.041673,2930.199261,85.587325,65.12041,0.364878
min,150.0,0.0,9.0,12.0,0.0
25%,2877.5,0.0,100.0,360.0,1.0
50%,3812.5,1211.5,128.0,360.0,1.0
75%,5795.0,2303.0,168.0,360.0,1.0
max,81000.0,41667.0,700.0,480.0,1.0


#### inspect the column data types and missing-value counts of the raw data before cleaning

In [11]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 614 entries, 0 to 613
Data columns (total 13 columns):
Loan_ID              614 non-null object
Gender               601 non-null object
Married              611 non-null object
Dependents           599 non-null object
Education            614 non-null object
Self_Employed        582 non-null object
ApplicantIncome      614 non-null int64
CoapplicantIncome    612 non-null float64
LoanAmount           592 non-null float64
Loan_Amount_Term     600 non-null float64
Credit_History       564 non-null float64
Property_Area        614 non-null object
Loan_Status          614 non-null bool
dtypes: bool(1), float64(4), int64(1), object(7)
memory usage: 58.3+ KB


In [12]:
df.head()

Unnamed: 0,Loan_ID,Gender,Married,Dependents,Education,Self_Employed,ApplicantIncome,CoapplicantIncome,LoanAmount,Loan_Amount_Term,Credit_History,Property_Area,Loan_Status
0,LP001002,Male,False,0,Graduate,False,5849,0.0,,360.0,1.0,Urban,True
1,LP001003,Male,True,1,Graduate,False,4583,1508.0,128.0,360.0,1.0,Rural,False
2,LP001005,Male,True,0,Graduate,True,3000,0.0,66.0,360.0,1.0,Urban,True
3,LP001006,Male,True,0,Not Graduate,False,2583,2358.0,120.0,360.0,1.0,Urban,True
4,LP001008,Male,False,0,Graduate,False,6000,0.0,141.0,360.0,1.0,Urban,True


In [13]:
df.isnull().sum()

Loan_ID               0
Gender               13
Married               3
Dependents           15
Education             0
Self_Employed        32
ApplicantIncome       0
CoapplicantIncome     2
LoanAmount           22
Loan_Amount_Term     14
Credit_History       50
Property_Area         0
Loan_Status           0
dtype: int64

In [14]:
# clean the dataset
x, y = clean_data(dataset)

In [15]:
# check column data type is numeric
x.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 614 entries, 0 to 613
Data columns (total 11 columns):
Gender               614 non-null float64
Married              614 non-null float64
Dependents           614 non-null float64
Education            614 non-null int64
Self_Employed        614 non-null float64
ApplicantIncome      614 non-null int64
CoapplicantIncome    614 non-null float64
LoanAmount           614 non-null float64
Loan_Amount_Term     614 non-null float64
Credit_History       614 non-null float64
Property_Area        614 non-null int64
dtypes: float64(8), int64(3)
memory usage: 52.9 KB


In [16]:
#check column data type is numeric
y.to_frame().info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 614 entries, 0 to 613
Data columns (total 1 columns):
y    614 non-null int64
dtypes: int64(1)
memory usage: 4.9 KB


#### check no missing value in all columns

In [17]:
print(f'x null chk: \n{x.isnull().sum()}\n \ny null chk: \n{y.isnull().sum()}\n')

x null chk: 
Gender               0
Married              0
Dependents           0
Education            0
Self_Employed        0
ApplicantIncome      0
CoapplicantIncome    0
LoanAmount           0
Loan_Amount_Term     0
Credit_History       0
Property_Area        0
dtype: int64
 
y null chk: 
0



#### split `x` and `y` into train and test sets

In [18]:
x_train, x_test, y_train, y_test = train_test_split(x, y, stratify=y, random_state=42)
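
Because neither `test_size` nor `train_size` is passed, `train_test_split` falls back to its default 25% test share. The arithmetic below is a small sketch of how the split sizes reported in the cells that follow (460 and 154 rows) come out of the 614 cleaned rows, assuming scikit-learn rounds the test count up. The helper `split_sizes` is hypothetical and not part of scikit-learn.

```python
import math


def split_sizes(n_samples, test_size=0.25):
    """Return (n_train, n_test) for a fractional test_size, rounding the test count up."""
    n_test = math.ceil(n_samples * test_size)
    n_train = n_samples - n_test
    return n_train, n_test


n_train, n_test = split_sizes(614)
print(n_train, n_test)  # 460 154
```

The `stratify=y` argument additionally keeps the approved/denied ratio roughly the same in both parts, which matters for a dataset this small.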

#### validate no missing value in all columns 

In [19]:
print(f'x_train null chk: \n{x_train.isnull().sum()}\n \ny_train null chk: \n{y_train.isnull().sum()}\n')

x_train null chk: 
Gender               0
Married              0
Dependents           0
Education            0
Self_Employed        0
ApplicantIncome      0
CoapplicantIncome    0
LoanAmount           0
Loan_Amount_Term     0
Credit_History       0
Property_Area        0
dtype: int64
 
y_train null chk: 
0



In [20]:
print(f'x_test null chk: \n{x_test.isnull().sum()}\n \ny_test null chk: \n{y_test.isnull().sum()}\n')

x_test null chk: 
Gender               0
Married              0
Dependents           0
Education            0
Self_Employed        0
ApplicantIncome      0
CoapplicantIncome    0
LoanAmount           0
Loan_Amount_Term     0
Credit_History       0
Property_Area        0
dtype: int64
 
y_test null chk: 
0



In [21]:
x_train.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 460 entries, 1 to 27
Data columns (total 11 columns):
Gender               460 non-null float64
Married              460 non-null float64
Dependents           460 non-null float64
Education            460 non-null int64
Self_Employed        460 non-null float64
ApplicantIncome      460 non-null int64
CoapplicantIncome    460 non-null float64
LoanAmount           460 non-null float64
Loan_Amount_Term     460 non-null float64
Credit_History       460 non-null float64
Property_Area        460 non-null int64
dtypes: float64(8), int64(3)
memory usage: 43.1 KB


In [22]:
y_train.to_frame().info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 460 entries, 1 to 27
Data columns (total 1 columns):
y    460 non-null int64
dtypes: int64(1)
memory usage: 7.2 KB


In [23]:
x_test.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 154 entries, 194 to 557
Data columns (total 11 columns):
Gender               154 non-null float64
Married              154 non-null float64
Dependents           154 non-null float64
Education            154 non-null int64
Self_Employed        154 non-null float64
ApplicantIncome      154 non-null int64
CoapplicantIncome    154 non-null float64
LoanAmount           154 non-null float64
Loan_Amount_Term     154 non-null float64
Credit_History       154 non-null float64
Property_Area        154 non-null int64
dtypes: float64(8), int64(3)
memory usage: 14.4 KB


In [24]:
y_test.to_frame().info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 154 entries, 194 to 557
Data columns (total 1 columns):
y    154 non-null int64
dtypes: int64(1)
memory usage: 2.4 KB


#### combine `x_train` &  `y_train`, `x_test` & `y_test` into a training and validation dataframe respectively 

In [25]:
xt=pd.concat([x_train, y_train], axis=1)

In [26]:
xt.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 460 entries, 1 to 27
Data columns (total 12 columns):
Gender               460 non-null float64
Married              460 non-null float64
Dependents           460 non-null float64
Education            460 non-null int64
Self_Employed        460 non-null float64
ApplicantIncome      460 non-null int64
CoapplicantIncome    460 non-null float64
LoanAmount           460 non-null float64
Loan_Amount_Term     460 non-null float64
Credit_History       460 non-null float64
Property_Area        460 non-null int64
y                    460 non-null int64
dtypes: float64(8), int64(4)
memory usage: 46.7 KB


In [27]:
xt.head()

Unnamed: 0,Gender,Married,Dependents,Education,Self_Employed,ApplicantIncome,CoapplicantIncome,LoanAmount,Loan_Amount_Term,Credit_History,Property_Area,y
1,1.0,1.0,1.0,1,0.0,4583,1508.0,128.0,360.0,1.0,0,0
394,1.0,1.0,2.0,1,0.0,3100,1400.0,113.0,360.0,1.0,2,1
316,1.0,1.0,2.0,1,0.0,3717,0.0,120.0,360.0,1.0,1,1
62,1.0,1.0,0.0,0,1.0,2609,3449.0,165.0,180.0,0.0,0,0
158,1.0,0.0,0.0,1,0.0,2980,2083.0,120.0,360.0,1.0,0,1


In [28]:
xt.isnull().sum()

Gender               0
Married              0
Dependents           0
Education            0
Self_Employed        0
ApplicantIncome      0
CoapplicantIncome    0
LoanAmount           0
Loan_Amount_Term     0
Credit_History       0
Property_Area        0
y                    0
dtype: int64

In [29]:
xv=pd.concat([x_test, y_test], axis=1)

In [30]:
xv.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 154 entries, 194 to 557
Data columns (total 12 columns):
Gender               154 non-null float64
Married              154 non-null float64
Dependents           154 non-null float64
Education            154 non-null int64
Self_Employed        154 non-null float64
ApplicantIncome      154 non-null int64
CoapplicantIncome    154 non-null float64
LoanAmount           154 non-null float64
Loan_Amount_Term     154 non-null float64
Credit_History       154 non-null float64
Property_Area        154 non-null int64
y                    154 non-null int64
dtypes: float64(8), int64(4)
memory usage: 15.6 KB


In [31]:
xv.head()

Unnamed: 0,Gender,Married,Dependents,Education,Self_Employed,ApplicantIncome,CoapplicantIncome,LoanAmount,Loan_Amount_Term,Credit_History,Property_Area,y
194,1.0,0.0,0.0,1,0.0,4191,0.0,120.0,360.0,1.0,0,1
428,1.0,1.0,0.0,1,0.0,2920,0.0,87.0,360.0,1.0,0,1
444,1.0,1.0,0.0,1,0.0,7333,8333.0,175.0,300.0,0.0,0,1
34,1.0,0.0,3.0,1,0.0,12500,3000.0,320.0,360.0,1.0,0,0
164,1.0,1.0,0.0,1,0.0,9323,0.0,75.0,180.0,1.0,2,1


In [32]:
xv.isnull().sum()

Gender               0
Married              0
Dependents           0
Education            0
Self_Employed        0
ApplicantIncome      0
CoapplicantIncome    0
LoanAmount           0
Loan_Amount_Term     0
Credit_History       0
Property_Area        0
y                    0
dtype: int64

#### save training and validation dataframes as csv files

In [33]:
os.makedirs('./data', exist_ok=True)

In [34]:
xt.to_csv('data/automl-trn.csv', index = False)

In [35]:
xv.to_csv('data/automl-val.csv', index = False)

#### [create TabularDatasets from the csv files](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-create-register-datasets?WT.mc_id=AI-MVP-5003930#create-a-dataset-from-pandas-dataframe), upload them to the AML default datastore and register them. This step is required because an AutoML job running on a remote cluster cannot access dataframes held in memory on the notebook compute instance.

In [36]:
datastore = ws.get_default_datastore()

In [37]:
datastore.upload(src_dir='data', target_path='data')

Uploading an estimated of 2 files
Uploading data/automl-trn.csv
Uploaded data/automl-trn.csv, 1 files out of an estimated total of 2
Uploading data/automl-val.csv
Uploaded data/automl-val.csv, 2 files out of an estimated total of 2
Uploaded 2 files


$AZUREML_DATAREFERENCE_f2f175305d254b1a93e27d874350642a

In [38]:
dataset1 = Dataset.Tabular.from_delimited_files(path = [(datastore, ('data/automl-trn.csv'))])

In [39]:
train_ds = dataset1.register(workspace=ws,
                             name='loan-prediction-train-dataset',
                             description='loan prediction training data for MLEMAND Capstone Project')

In [40]:
dataset2 = Dataset.Tabular.from_delimited_files(path = [(datastore, ('data/automl-val.csv'))])

In [41]:
valid_ds = dataset2.register(workspace=ws,
                             name='loan-prediction-validate-dataset',
                             description='loan prediction validation data for MLEMAND Capstone Project')

#### Check available primary metrics for classification task type

In [42]:
utilities.get_primary_metrics('classification')

['precision_score_weighted',
 'accuracy',
 'average_precision_score_weighted',
 'norm_macro_recall',
 'AUC_weighted']

<a id='aml-setup'></a>
### 3.2 AutoML config setup
#### Configure AutoML config  

In [43]:
# Set parameters for AutoMLConfig
# NOTE: DO NOT CHANGE THE experiment_timeout_minutes PARAMETER OR YOUR INSTANCE WILL TIME OUT.
# If you wish to run the experiment longer, you will need to run this notebook in your own
# Azure tenant, which will incur personal costs.

automl_settings = {
    "experiment_timeout_minutes": 30,
    "max_concurrent_iterations": 4,
    "primary_metric" : 'accuracy',
}

automl_config = AutoMLConfig(
    task='classification',
    max_cores_per_iteration=-1,
    featurization='auto',
    iterations=30,
    enable_early_stopping=True,
    compute_target=compute_cluster,
    debug_log = 'automl_errors.log',
    training_data=train_ds,
    validation_data=valid_ds,
    label_column_name='y',
    **automl_settings)

<a id='aml-run'></a>
### 3.3 AutoML run
#### submit AutoML run 

In [44]:
automl_run=exp.submit(config=automl_config, show_output=True)

Running on remote.
No run_configuration provided, running on std-ds3-v2 with default configuration
Running on remote compute: std-ds3-v2
Parent Run ID: AutoML_34a117d3-35fa-4411-b74c-962bb92e06bc

Current status: FeaturesGeneration. Generating features for the dataset.
Current status: ModelSelection. Beginning model selection.

****************************************************************************************************
DATA GUARDRAILS: 

TYPE:         Class balancing detection
STATUS:       PASSED
DESCRIPTION:  Your inputs were analyzed, and all classes are balanced in your training data.
              Learn more about imbalanced data: https://aka.ms/AutomatedMLImbalancedData

****************************************************************************************************

TYPE:         Missing feature values imputation
STATUS:       PASSED
DESCRIPTION:  No feature missing values were detected in the training data.
              Learn more about missing value imputation: ht

<a id='aml-watch'></a>
### 3.4 Monitor AutoML run 
#### use  `RunDetails` widget to show the different experiments

In [45]:
RunDetails(automl_run).show()

_AutoMLWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', 's…

In [46]:
automl_run

Experiment,Id,Type,Status,Details Page,Docs Page
capstone-automl-exp,AutoML_34a117d3-35fa-4411-b74c-962bb92e06bc,automl,Completed,Link to Azure Machine Learning studio,Link to Documentation


In [47]:
# wait for AutoML run to complete before proceeding to the next step
automl_run.wait_for_completion(show_output=True)



****************************************************************************************************
DATA GUARDRAILS: 

TYPE:         Class balancing detection
STATUS:       PASSED
DESCRIPTION:  Your inputs were analyzed, and all classes are balanced in your training data.
              Learn more about imbalanced data: https://aka.ms/AutomatedMLImbalancedData

****************************************************************************************************

TYPE:         Missing feature values imputation
STATUS:       PASSED
DESCRIPTION:  No feature missing values were detected in the training data.
              Learn more about missing value imputation: https://aka.ms/AutomatedMLFeaturization

****************************************************************************************************

TYPE:         High cardinality feature detection
STATUS:       PASSED
DESCRIPTION:  Your inputs were analyzed, and no high cardinality features were detected.
              Learn more abo

{'runId': 'AutoML_34a117d3-35fa-4411-b74c-962bb92e06bc',
 'target': 'std-ds3-v2',
 'status': 'Completed',
 'startTimeUtc': '2020-12-16T15:05:25.507253Z',
 'endTimeUtc': '2020-12-16T15:25:34.142902Z',
 'properties': {'num_iterations': '30',
  'training_type': 'TrainFull',
  'acquisition_function': 'EI',
  'primary_metric': 'accuracy',
  'train_split': '0',
  'acquisition_parameter': '0',
  'num_cross_validation': None,
  'target': 'std-ds3-v2',
  'DataPrepJsonString': '{\\"training_data\\": \\"{\\\\\\"blocks\\\\\\": [{\\\\\\"id\\\\\\": \\\\\\"5f80d561-4ccf-49be-befa-02960ca67cde\\\\\\", \\\\\\"type\\\\\\": \\\\\\"Microsoft.DPrep.GetDatastoreFilesBlock\\\\\\", \\\\\\"arguments\\\\\\": {\\\\\\"datastores\\\\\\": [{\\\\\\"datastoreName\\\\\\": \\\\\\"workspaceblobstore\\\\\\", \\\\\\"path\\\\\\": \\\\\\"data/automl-trn.csv\\\\\\", \\\\\\"resourceGroup\\\\\\": \\\\\\"aml-quickstarts-130725\\\\\\", \\\\\\"subscription\\\\\\": \\\\\\"c463503f-66c4-48b5-9bb5-b66fec87c814\\\\\\", \\\\\\"workspa
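
The run details report start and end timestamps, which makes it easy to confirm that the experiment finished inside the 30-minute `experiment_timeout_minutes` budget. A small sketch, using the two timestamps copied from the output above:

```python
from datetime import datetime

# Timestamp layout used by 'startTimeUtc' / 'endTimeUtc' in the run details
FMT = "%Y-%m-%dT%H:%M:%S.%fZ"

start = datetime.strptime("2020-12-16T15:05:25.507253Z", FMT)
end = datetime.strptime("2020-12-16T15:25:34.142902Z", FMT)

elapsed = end - start
minutes = elapsed.total_seconds() / 60
print(f"elapsed: {minutes:.1f} minutes")
```

The parent run took a little over 20 minutes, so it stopped because the 30 configured iterations completed rather than because the timeout was reached.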

<a id='aml-model'></a>
### 3.5 Examine the best AutoML model details
#### retrieve the best AutoML model, print all the relevant properties and metrics 

In [48]:
best_amlrun, aml_model = automl_run.get_output()
print(f'best amlrun:\n{best_amlrun}\n')
print(f'aml model:\n{aml_model}\n')

best amlrun:
Run(Experiment: capstone-automl-exp,
Id: AutoML_34a117d3-35fa-4411-b74c-962bb92e06bc_28,
Type: azureml.scriptrun,
Status: Completed)

aml model:
Pipeline(memory=None,
         steps=[('datatransformer',
                 DataTransformer(enable_dnn=None, enable_feature_sweeping=None,
                                 feature_sweeping_config=None,
                                 feature_sweeping_timeout=None,
                                 featurization_config=None, force_text_dnn=None,
                                 is_cross_validation=None,
                                 is_onnx_compatible=None, logger=None,
                                 observer=None, task=None, working_dir=None)),
                ('prefittedsoftvotingclassifier',...
                                                                                        decision_function_shape='ovr',
                                                                                        degree=3,
                 

In [49]:
best_amlrun

Experiment,Id,Type,Status,Details Page,Docs Page
capstone-automl-exp,AutoML_34a117d3-35fa-4411-b74c-962bb92e06bc_28,azureml.scriptrun,Completed,Link to Azure Machine Learning studio,Link to Documentation


In [50]:
# print best_amlrun properties to see which property-values to print
# cross check the details with the Raw JSON of best model from web ui
best_amlrun.get_properties()

{'runTemplate': 'automl_child',
 'pipeline_id': '__AutoML_Ensemble__',
 'pipeline_spec': '{"pipeline_id":"__AutoML_Ensemble__","objects":[{"module":"azureml.train.automl.ensemble","class_name":"Ensemble","spec_class":"sklearn","param_args":[],"param_kwargs":{"automl_settings":"{\'task_type\':\'classification\',\'primary_metric\':\'accuracy\',\'verbosity\':20,\'ensemble_iterations\':15,\'is_timeseries\':False,\'name\':\'capstone-automl-exp\',\'compute_target\':\'std-ds3-v2\',\'subscription_id\':\'c463503f-66c4-48b5-9bb5-b66fec87c814\',\'region\':\'southcentralus\',\'spark_service\':None}","ensemble_run_id":"AutoML_34a117d3-35fa-4411-b74c-962bb92e06bc_28","experiment_name":"capstone-automl-exp","workspace_name":"quick-starts-ws-130725","subscription_id":"c463503f-66c4-48b5-9bb5-b66fec87c814","resource_group_name":"aml-quickstarts-130725"}}]}',
 'training_percent': '100',
 'predicted_cost': None,
 'iteration': '28',
 '_aml_system_scenario_identification': 'Remote.Child',
 '_azureml.Comput

In [51]:
# .get_tags() prints details of a VotingEnsemble model  
best_amlrun.get_tags()

{'_aml_system_azureml.automlComponent': 'AutoML',
 '_aml_system_ComputeTargetStatus': '{"AllocationState":"steady","PreparingNodeCount":0,"RunningNodeCount":0,"CurrentNodeCount":4}',
 'ensembled_iterations': '[23, 21, 16, 6, 3, 0, 7]',
 'ensembled_algorithms': "['ExtremeRandomTrees', 'RandomForest', 'RandomForest', 'XGBoostClassifier', 'RandomForest', 'LightGBM', 'SVM']",
 'ensemble_weights': '[0.14285714285714285, 0.14285714285714285, 0.14285714285714285, 0.14285714285714285, 0.14285714285714285, 0.14285714285714285, 0.14285714285714285]',
 'best_individual_pipeline_score': '0.8051948051948052',
 'best_individual_iteration': '23',
 '_aml_system_automl_is_child_run_end_telemetry_event_logged': 'True'}
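
The tags show seven pipelines combined with equal weights of 1/7 (0.142857...), and the fitted model is reported as a `PreFittedSoftVotingClassifier`. As a rough illustration of the soft-voting idea (not AutoML's actual implementation), the sketch below averages per-class probabilities with the given weights and picks the most probable class. The function `soft_vote` and the probabilities are made up for the example.

```python
def soft_vote(probabilities, weights):
    """Weighted average of per-model class probabilities.

    Returns (averaged_probabilities, index_of_predicted_class).
    """
    total = sum(weights)
    n_classes = len(probabilities[0])
    averaged = [
        sum(w * p[c] for w, p in zip(weights, probabilities)) / total
        for c in range(n_classes)
    ]
    predicted = max(range(n_classes), key=lambda c: averaged[c])
    return averaged, predicted


# Hypothetical [P(denied), P(approved)] outputs from seven models for one applicant
model_probs = [
    [0.30, 0.70], [0.45, 0.55], [0.60, 0.40], [0.20, 0.80],
    [0.55, 0.45], [0.35, 0.65], [0.40, 0.60],
]
weights = [1 / 7] * 7  # equal weights, as in the 'ensemble_weights' tag

averaged, predicted = soft_vote(model_probs, weights)
print([round(p, 4) for p in averaged], predicted)
```

With equal weights this reduces to a plain average, so a single confident model cannot outvote the rest on its own.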

In [52]:
# print best model metrics
best_amlrun.get_metrics()

{'f1_score_weighted': 0.808292290485254,
 'average_precision_score_micro': 0.8207599071180334,
 'confusion_matrix': 'aml://artifactId/ExperimentRun/dcid.AutoML_34a117d3-35fa-4411-b74c-962bb92e06bc_28/confusion_matrix',
 'AUC_weighted': 0.7942216981132074,
 'precision_score_weighted': 0.8153743315508021,
 'accuracy': 0.8181818181818182,
 'balanced_accuracy': 0.7482311320754718,
 'average_precision_score_macro': 0.7570950647871318,
 'recall_score_micro': 0.8181818181818182,
 'weighted_accuracy': 0.8707533234859673,
 'AUC_micro': 0.8431438691178951,
 'precision_score_micro': 0.8181818181818182,
 'precision_score_macro': 0.8095588235294118,
 'f1_score_micro': 0.8181818181818182,
 'recall_score_weighted': 0.8181818181818182,
 'average_precision_score_weighted': 0.79622245789027,
 'recall_score_macro': 0.7482311320754718,
 'AUC_macro': 0.7942216981132075,
 'matthews_correlation': 0.5544082871265799,
 'f1_score_macro': 0.7673213900280595,
 'norm_macro_recall': 0.4964622641509435,
 'log_loss':
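
Some of the metrics above are simple functions of one another. For a two-class problem, `balanced_accuracy` equals `recall_score_macro`, and `norm_macro_recall` rescales macro recall so that random guessing scores 0: (recall_macro - 1/C) / (1 - 1/C) with C = 2. The check below uses the values printed above. The formula follows the Azure ML metric definitions as I understand them, so treat it as a sketch rather than the SDK's own code.

```python
import math

recall_macro = 0.7482311320754718    # 'recall_score_macro' from get_metrics()
reported_norm = 0.4964622641509435   # 'norm_macro_recall' from get_metrics()
n_classes = 2                        # approved / denied

# Rescale so that chance-level recall (1 / n_classes) maps to 0 and perfect recall to 1
chance = 1 / n_classes
norm_macro_recall = (recall_macro - chance) / (1 - chance)
print(math.isclose(norm_macro_recall, reported_norm, rel_tol=1e-9))

# An accuracy of 0.8181... is consistent with 126 of the 154 validation rows being correct
accuracy = 126 / 154
print(accuracy)
```

The gap between `accuracy` (about 0.82) and `balanced_accuracy` (about 0.75) suggests the model does better on one class than the other, which is worth keeping in mind when comparing it with the HyperDrive model.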

In [53]:
# print best model details
best_amlrun.get_details()

{'runId': 'AutoML_34a117d3-35fa-4411-b74c-962bb92e06bc_28',
 'target': 'std-ds3-v2',
 'status': 'Completed',
 'startTimeUtc': '2020-12-16T15:23:54.477946Z',
 'endTimeUtc': '2020-12-16T15:25:16.754134Z',
 'properties': {'runTemplate': 'automl_child',
  'pipeline_id': '__AutoML_Ensemble__',
  'pipeline_spec': '{"pipeline_id":"__AutoML_Ensemble__","objects":[{"module":"azureml.train.automl.ensemble","class_name":"Ensemble","spec_class":"sklearn","param_args":[],"param_kwargs":{"automl_settings":"{\'task_type\':\'classification\',\'primary_metric\':\'accuracy\',\'verbosity\':20,\'ensemble_iterations\':15,\'is_timeseries\':False,\'name\':\'capstone-automl-exp\',\'compute_target\':\'std-ds3-v2\',\'subscription_id\':\'c463503f-66c4-48b5-9bb5-b66fec87c814\',\'region\':\'southcentralus\',\'spark_service\':None}","ensemble_run_id":"AutoML_34a117d3-35fa-4411-b74c-962bb92e06bc_28","experiment_name":"capstone-automl-exp","workspace_name":"quick-starts-ws-130725","subscription_id":"c463503f-66c4-48b

In [54]:
# print best model accuracy
best_amlrun.get_properties()['score']

'0.8181818181818182'
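
Note the quotes in the output: the `score` property comes back as a string, whereas `get_metrics()['accuracy']` is a float. A minimal sketch of converting it before any numeric comparison, using the two values shown in this notebook:

```python
score_property = '0.8181818181818182'   # value of get_properties()['score'] above
accuracy_metric = 0.8181818181818182    # value of get_metrics()['accuracy'] above

# Convert the string property to a float before comparing or formatting it
score = float(score_property)
print(score == accuracy_metric)
print(f"{score:.2%}")
```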

In [55]:
best_runid = best_amlrun.id
best_acc = best_amlrun.get_metrics()['accuracy']

In [56]:
print(f'best automl run job id: {best_runid}\n')
print(f'best automl run Accuracy: {best_acc}\n')

best automl run job id: AutoML_34a117d3-35fa-4411-b74c-962bb92e06bc_28

best automl run Accuracy: 0.8181818181818182



In [57]:
# print best automl model
aml_model

Pipeline(memory=None,
         steps=[('datatransformer',
                 DataTransformer(enable_dnn=None, enable_feature_sweeping=None,
                                 feature_sweeping_config=None,
                                 feature_sweeping_timeout=None,
                                 featurization_config=None, force_text_dnn=None,
                                 is_cross_validation=None,
                                 is_onnx_compatible=None, logger=None,
                                 observer=None, task=None, working_dir=None)),
                ('prefittedsoftvotingclassifier',...
                                                                                        decision_function_shape='ovr',
                                                                                        degree=3,
                                                                                        gamma='scale',
                                                                        

In [58]:
# Another way to print the same info about the best model
print(f'aml_model final estimator: \n{aml_model._final_estimator}\n')

aml_model final estimator: 
PreFittedSoftVotingClassifier(classification_labels=None,
                              estimators=[('23',
                                           Pipeline(memory=None,
                                                    steps=[('maxabsscaler',
                                                            MaxAbsScaler(copy=True)),
                                                           ('extratreesclassifier',
                                                            ExtraTreesClassifier(bootstrap=False,
                                                                                 ccp_alpha=0.0,
                                                                                 class_weight=None,
                                                                                 criterion='gini',
                                                                                 max_depth=None,
                                                                

In [59]:
#Example code from https://docs.microsoft.com/en-us/azure/machine-learning/how-to-configure-auto-features#scaling-and-normalization

from pprint import pprint

def print_model(model, prefix=""):
    for step in model.steps:
        print(prefix + step[0])
        if hasattr(step[1], 'estimators') and hasattr(step[1], 'weights'):
            pprint({'estimators': list(
                e[0] for e in step[1].estimators), 'weights': step[1].weights})
            print()
            for estimator in step[1].estimators:
                print_model(estimator[1], estimator[0] + ' - ')
        else:
            pprint(step[1].get_params())
            print()

print_model(aml_model)

datatransformer
{'enable_dnn': None,
 'enable_feature_sweeping': None,
 'feature_sweeping_config': None,
 'feature_sweeping_timeout': None,
 'featurization_config': None,
 'force_text_dnn': None,
 'is_cross_validation': None,
 'is_onnx_compatible': None,
 'logger': None,
 'observer': None,
 'task': None,
 'working_dir': None}

prefittedsoftvotingclassifier
{'estimators': ['23', '21', '16', '6', '3', '0', '7'],
 'weights': [0.14285714285714285,
             0.14285714285714285,
             0.14285714285714285,
             0.14285714285714285,
             0.14285714285714285,
             0.14285714285714285,
             0.14285714285714285]}

23 - maxabsscaler
{'copy': True}

23 - extratreesclassifier
{'bootstrap': False,
 'ccp_alpha': 0.0,
 'class_weight': None,
 'criterion': 'gini',
 'max_depth': None,
 'max_features': 0.7,
 'max_leaf_nodes': None,
 'max_samples': None,
 'min_impurity_decrease': 0.0,
 'min_impurity_split': None,
 'min_samples_leaf': 0.01,
 'min_samples_split': 0.3
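
The `prefittedsoftvotingclassifier` step above lists seven estimators with equal weights of 1/7. As a rough illustration of what soft voting does with such weights, the sketch below averages the per-class probabilities of several models and picks the class with the highest weighted mean. The function name `soft_vote` and the probability values are made up for illustration; this is not the AutoML implementation.

```python
from typing import List, Sequence


def soft_vote(probas: Sequence[Sequence[float]], weights: Sequence[float]) -> int:
    """Return the index of the class with the highest weighted mean probability.

    probas[i][c] is the probability that model i assigns to class c.
    """
    if len(probas) != len(weights):
        raise ValueError("need exactly one weight per model")
    n_classes = len(probas[0])
    total = sum(weights)
    averaged: List[float] = [
        sum(w * p[c] for p, w in zip(probas, weights)) / total
        for c in range(n_classes)
    ]
    return max(range(n_classes), key=averaged.__getitem__)


# Hypothetical probabilities from three models for classes [0, 1].
example_probas = [[0.40, 0.60], [0.55, 0.45], [0.20, 0.80]]
equal_weights = [1 / 3] * 3
print(soft_vote(example_probas, equal_weights))
```

With equal weights this reduces to a plain average, so a single confident model can still outvote two lukewarm ones.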

In [60]:
# print the best experiment run file paths and names
best_amlrun.get_file_names()

['accuracy_table',
 'automl_driver.py',
 'azureml-logs/55_azureml-execution-tvmps_749334143756ed5bdb7964ef69fa6de4d1f4298a9f12e3cddd5b58415b406a59_d.txt',
 'azureml-logs/65_job_prep-tvmps_749334143756ed5bdb7964ef69fa6de4d1f4298a9f12e3cddd5b58415b406a59_d.txt',
 'azureml-logs/70_driver_log.txt',
 'azureml-logs/75_job_post-tvmps_749334143756ed5bdb7964ef69fa6de4d1f4298a9f12e3cddd5b58415b406a59_d.txt',
 'azureml-logs/process_info.json',
 'azureml-logs/process_status.json',
 'confusion_matrix',
 'logs/azureml/103_azureml.log',
 'logs/azureml/azureml_automl.log',
 'logs/azureml/dataprep/python_span_31851814-f46f-4728-9fdb-ffad574da50c.jsonl',
 'logs/azureml/dataprep/python_span_a7e4c842-b4aa-45d9-8463-347857d06d0d.jsonl',
 'logs/azureml/job_prep_azureml.log',
 'logs/azureml/job_release_azureml.log',
 'outputs/conda_env_v_1_0_0.yml',
 'outputs/env_dependencies.json',
 'outputs/model.pkl',
 'outputs/pipeline_graph.json',
 'outputs/scoring_file_v_1_0_0.py']

<a id='aml-reg'></a>
### 3.6 Save and register the best AutoML model
#### Save the best model

In [61]:
os.makedirs('./amlmodel', exist_ok=True)

In [62]:
# download the pickled model; the run file name has no leading slash (see get_file_names() above)
best_amlrun.download_file('outputs/model.pkl', os.path.join('./amlmodel', 'automl_best_model.pkl'))

In [63]:
# Download every file under outputs/ from the best run into ./amlmodel

for f in best_amlrun.get_file_names():
    if f.startswith('outputs'):
        output_file_path = os.path.join('./amlmodel', f.split('/')[-1])
        print(f'Downloading from {f} to {output_file_path} ...')
        best_amlrun.download_file(name=f, output_file_path=output_file_path)


Downloading from outputs/conda_env_v_1_0_0.yml to ./amlmodel/conda_env_v_1_0_0.yml ...
Downloading from outputs/env_dependencies.json to ./amlmodel/env_dependencies.json ...
Downloading from outputs/model.pkl to ./amlmodel/model.pkl ...
Downloading from outputs/pipeline_graph.json to ./amlmodel/pipeline_graph.json ...
Downloading from outputs/scoring_file_v_1_0_0.py to ./amlmodel/scoring_file_v_1_0_0.py ...
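
The loop above keeps only the files under `outputs` and flattens them into `./amlmodel`. The same selection logic can be tried offline on a plain list of names, which is a cheap way to check for file-name collisions before downloading anything. `plan_downloads` is a hypothetical helper, not part of the Azure ML SDK, and it matches on the prefix `outputs/` rather than `outputs`.

```python
import posixpath
from typing import Dict, Iterable


def plan_downloads(file_names: Iterable[str], target_dir: str,
                   prefix: str = "outputs/") -> Dict[str, str]:
    """Map each run file under `prefix` to a flattened local path."""
    plan: Dict[str, str] = {}
    seen = set()
    for name in file_names:
        if not name.startswith(prefix):
            continue
        local = posixpath.join(target_dir, posixpath.basename(name))
        if local in seen:
            raise ValueError(f"two run files would overwrite {local}")
        seen.add(local)
        plan[name] = local
    return plan


sample_names = [
    "azureml-logs/70_driver_log.txt",
    "outputs/model.pkl",
    "outputs/scoring_file_v_1_0_0.py",
]
for src, dst in plan_downloads(sample_names, "./amlmodel").items():
    print(f"{src} -> {dst}")
```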


#### Register the best model

In [64]:
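# register the model file stored in the run's outputs folder
model = best_amlrun.register_model(
    model_name='automl_bestmodel',
    model_path='outputs/model.pkl',  # path relative to the run, as listed by get_file_names()
    model_framework=Model.Framework.SCIKITLEARN,
    tags={'accuracy': str(best_acc)},  # store the tag value as a string
    description='Loan Application Prediction'
)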

<a id='deploy'></a>
## 4. Model Deployment

This entails the following tasks:
> * Deployment setup  
>
> * Deploy the model as a web service
>
> * Testing the web service 
>
> * Enable Application Insights
>
> * Printing the logs of the web service
>


<a id='dply1'></a>
### 4.1 Deployment setup
#### download the conda environment yml and scoring files, set inference config and ACI config

#### use the conda environment and scoring files produced by AutoML for model deployment

In [65]:
# Download the conda environment file produced by AutoML 
best_amlrun.download_file('outputs/conda_env_v_1_0_0.yml', 'conda_env.yml')
myenv = Environment.from_conda_specification(name = 'myenv',
                                             file_path = 'conda_env.yml')

In [66]:
# inspect the new environment object 
myenv

{
    "databricks": {
        "eggLibraries": [],
        "jarLibraries": [],
        "mavenLibraries": [],
        "pypiLibraries": [],
        "rcranLibraries": []
    },
    "docker": {
        "arguments": [],
        "baseDockerfile": null,
        "baseImage": "mcr.microsoft.com/azureml/intelmpi2018.3-ubuntu16.04:20200821.v1",
        "baseImageRegistry": {
            "address": null,
            "password": null,
            "registryIdentity": null,
            "username": null
        },
        "enabled": false,
        "platform": {
            "architecture": "amd64",
            "os": "Linux"
        },
        "sharedVolumes": true,
        "shmSize": "2g"
    },
    "environmentVariables": {
        "EXAMPLE_ENV_VAR": "EXAMPLE_VALUE"
    },
    "inferencingStackVersion": null,
    "name": "myenv",
    "python": {
        "baseCondaEnvironment": null,
        "condaDependencies": {
            "channels": [
                "anaconda",
                "conda-forge"
      

#### configure inference config and aci webservice config

In [67]:
# download the scoring file produced by AutoML
best_amlrun.download_file('outputs/scoring_file_v_1_0_0.py', 'score.py')

# set inference config
inference_config = InferenceConfig(entry_script= 'score.py',
                                    environment=myenv)
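
The entry script passed to `InferenceConfig` is expected to expose an `init()` function, called once when the container starts, and a `run()` function, called for each request. The sketch below mimics that contract in its simplest form with a stand-in model so it can run locally. `DummyModel` and its rule are invented for illustration; the scoring file generated by AutoML loads the real `model.pkl` and may differ in its details.

```python
import json


class DummyModel:
    """Stand-in for the unpickled AutoML pipeline (illustration only)."""

    def predict(self, records):
        # Made-up rule: predict 1 when a credit history is present.
        return [1 if r.get("Credit_History") == 1.0 else 0 for r in records]


model = None


def init():
    """Called once at startup; a real script would load model.pkl here."""
    global model
    model = DummyModel()


def run(raw_data: str) -> str:
    """Called per request with the JSON request body as a string."""
    try:
        records = json.loads(raw_data)["data"]
        return json.dumps({"result": model.predict(records)})
    except Exception as exc:  # report errors to the caller instead of crashing
        return json.dumps({"error": str(exc)})


init()
print(run(json.dumps({"data": [{"Credit_History": 1.0}, {"Credit_History": 0.0}]})))
```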

In [68]:
# set ACI web service config: 1 CPU core, 1 GB memory, key-based authentication enabled
aci_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1, auth_enabled=True)

<a id='dply2'></a>
### 4.2 Deploy the model as a web service
#### start model deployment and wait for the deployment to finish

In [69]:
service = Model.deploy(workspace=ws, 
                       name='best-automl-model', 
                       models=[model], 
                       inference_config=inference_config,
                       deployment_config=aci_config,
                       overwrite=True)

In [70]:
service

AciWebservice(workspace=Workspace.create(name='quick-starts-ws-130725', subscription_id='c463503f-66c4-48b5-9bb5-b66fec87c814', resource_group='aml-quickstarts-130725'), name=best-automl-model, image_id=None, compute_type=None, state=ACI, scoring_uri=Transitioning, tags=None, properties={}, created_by={})

In [71]:
# wait for deployment to finish and print the scoring uri and swagger uri
service.wait_for_deployment(show_output=True)
print(f'\nservice state: {service.state}\n')

print(f'scoring URI: \n{service.scoring_uri}\n')
print(f'swagger URI: \n{service.swagger_uri}\n')

Tips: You can try get_logs(): https://aka.ms/debugimage#dockerlog or local deployment: https://aka.ms/debugimage#debug-locally to debug if deployment takes longer than 10 minutes.
Running..............................................................................................................................................................................
Succeeded
ACI service creation operation finished, operation "Succeeded"

service state: Healthy

scoring URI: 
http://8cb5942b-6d6b-4932-9f9e-b32cbf737a24.southcentralus.azurecontainer.io/score

swagger URI: 
http://8cb5942b-6d6b-4932-9f9e-b32cbf737a24.southcentralus.azurecontainer.io/swagger.json



In [72]:
# print the primary authentication key for the deployed webservice
pkey, skey = service.get_keys()
print(f'primary key: {pkey}')

primary key: vrsyLPJLCho8sNlOczCYAWeeNczheMY3


<a id='dply3'></a>
### 4.3 Testing the web service
#### randomly select 2 samples from the validation dataframe and send a request to the web service endpoint

In [73]:
# select 2 random samples from validation dataframe xv
scoring_sample = xv.sample(2)
y_label = scoring_sample.pop('y')

In [74]:
# convert the sample records to a JSON string
scoring_json = json.dumps({'data': scoring_sample.to_dict(orient='records')})
print(f'{scoring_json}')

{"data": [{"Gender": 1.0, "Married": 1.0, "Dependents": 0.0, "Education": 1, "Self_Employed": 1.0, "ApplicantIncome": 2479, "CoapplicantIncome": 3013.0, "LoanAmount": 188.0, "Loan_Amount_Term": 360.0, "Credit_History": 1.0, "Property_Area": 2}, {"Gender": 1.0, "Married": 1.0, "Dependents": 0.0, "Education": 1, "Self_Employed": 0.0, "ApplicantIncome": 5829, "CoapplicantIncome": 0.0, "LoanAmount": 138.0, "Loan_Amount_Term": 360.0, "Credit_History": 1.0, "Property_Area": 0}]}


In [75]:
# Set the content type 
headers = {"Content-Type": "application/json"}

In [76]:
# set the authorization header
headers["Authorization"] = f"Bearer {pkey}"

In [77]:
# post a request to the scoring uri
resp = requests.post(service.scoring_uri, scoring_json, headers=headers)

In [78]:
# print the scoring results
print(resp.json())

{"result": [1, 1]}


In [79]:
# compare the scoring results with the corresponding y label values
print(f'True Values: {y_label.values}')

True Values: [1 1]
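
The request assembled in the cells above consists of a JSON body with a `data` list and two headers. A small helper makes that shape explicit and lets the predictions be compared with the labels in one place. Both function names are hypothetical, and the key below is a placeholder rather than a real credential.

```python
import json
from typing import Dict, List, Sequence, Tuple


def build_scoring_request(records: List[dict], key: str) -> Tuple[str, Dict[str, str]]:
    """Return the JSON body and headers for a key-authenticated scoring call."""
    body = json.dumps({"data": records})
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {key}",
    }
    return body, headers


def match_rate(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """Fraction of predictions that equal the true labels."""
    if len(predicted) != len(actual):
        raise ValueError("predictions and labels differ in length")
    if not predicted:
        raise ValueError("nothing to compare")
    hits = sum(int(p == a) for p, a in zip(predicted, actual))
    return hits / len(predicted)


body, headers = build_scoring_request([{"Credit_History": 1.0}], "<primary-key>")
print(headers["Authorization"])
print(match_rate([1, 1], [1, 1]))
```

The returned pair could then be passed to `requests.post(service.scoring_uri, body, headers=headers)`, as in the cell above.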


#### another way to test the web service: call it through the SDK instead of building an authenticated HTTP request

In [80]:
print(f'Prediction: {service.run(scoring_json)}')

Prediction: {"result": [1, 1]}


<a id='dply4'></a>
### 4.4 Enable Application Insights
#### update web service to enable Application Insights and wait for the deployment to finish

In [81]:
service.update(enable_app_insights=True)

In [82]:
service.wait_for_deployment(show_output=True)
print(f'\nservice state: {service.state}\n')

Tips: You can try get_logs(): https://aka.ms/debugimage#dockerlog or local deployment: https://aka.ms/debugimage#debug-locally to debug if deployment takes longer than 10 minutes.
Running...................................................................
Succeeded
ACI service creation operation finished, operation "Succeeded"

service state: Healthy



<a id='dply5'></a>
### 4.5 Printing the logs of the web service
#### print the logs by calling the get_logs() function of the web service 

In [83]:
print(f'webservice logs: \n{service.get_logs()}\n')

webservice logs: 
2020-12-16T16:53:25,855541400+00:00 - iot-server/run 
2020-12-16T16:53:25,851769100+00:00 - gunicorn/run 
2020-12-16T16:53:25,886913300+00:00 - rsyslog/run 
rsyslogd: /azureml-envs/azureml_792c0ca57fa993dcf1707a372ebeae46/lib/libuuid.so.1: no version information available (required by rsyslogd)
2020-12-16T16:53:25,960418000+00:00 - nginx/run 
/usr/sbin/nginx: /azureml-envs/azureml_792c0ca57fa993dcf1707a372ebeae46/lib/libcrypto.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_792c0ca57fa993dcf1707a372ebeae46/lib/libcrypto.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_792c0ca57fa993dcf1707a372ebeae46/lib/libssl.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_792c0ca57fa993dcf1707a372ebeae46/lib/libssl.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sb
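
Most of the log lines above are repeated `no version information available` warnings from the shared-library loader, which tend to bury the service start-up messages. A simple filter, sketched below, keeps only the remaining lines. `drop_loader_noise` is a hypothetical helper, and the sample text is a shortened stand-in for the real output of `service.get_logs()`.

```python
from typing import List

NOISE_MARKER = "no version information available"


def drop_loader_noise(log_text: str) -> List[str]:
    """Return non-empty log lines that do not contain the loader warning."""
    return [
        line for line in log_text.splitlines()
        if line.strip() and NOISE_MARKER not in line
    ]


# Shortened sample; the library path is abbreviated for illustration.
sample_log = """2020-12-16T16:53:25,851769100+00:00 - gunicorn/run
rsyslogd: /azureml-envs/env/lib/libuuid.so.1: no version information available (required by rsyslogd)
2020-12-16T16:53:25,960418000+00:00 - nginx/run
"""
for line in drop_loader_noise(sample_log):
    print(line)
```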

<a id='dply6'></a>
### 4.6 Active web service endpoint demo
#### randomly select 3 samples from the validation dataframe, send a request to the web service endpoint

In [207]:
# select 3 random samples from the xv dataframe
scoring_sample = xv.sample(3)
y_label = scoring_sample.pop('y')

In [208]:
# convert the sample records to a JSON string
scoring_json = json.dumps({'data': scoring_sample.to_dict(orient='records')})
print(f'{scoring_json}')

{"data": [{"Gender": 1.0, "Married": 1.0, "Dependents": 2.0, "Education": 1, "Self_Employed": 0.0, "ApplicantIncome": 3159, "CoapplicantIncome": 461.0, "LoanAmount": 108.0, "Loan_Amount_Term": 84.0, "Credit_History": 1.0, "Property_Area": 2}, {"Gender": 0.0, "Married": 1.0, "Dependents": 1.0, "Education": 1, "Self_Employed": 0.0, "ApplicantIncome": 12000, "CoapplicantIncome": 0.0, "LoanAmount": 496.0, "Loan_Amount_Term": 360.0, "Credit_History": 1.0, "Property_Area": 1}, {"Gender": 1.0, "Married": 1.0, "Dependents": 3.0, "Education": 1, "Self_Employed": 1.0, "ApplicantIncome": 10139, "CoapplicantIncome": 0.0, "LoanAmount": 260.0, "Loan_Amount_Term": 360.0, "Credit_History": 1.0, "Property_Area": 1}]}


In [209]:
# send a request to the scoring uri
resp = requests.post(service.scoring_uri, scoring_json, headers=headers)

In [210]:
# print the scoring results
print(f'Prediction: {resp.json()}')

Prediction: {"result": [1, 1, 1]}


In [211]:
# compare the scoring results with the corresponding y label values
print(f'True Values: {y_label.values}')

True Values: [1 1 1]


In [212]:
# another way to test the web service: call it through the SDK instead of building an authenticated HTTP request
print(f'Prediction: {service.run(scoring_json)}')

Prediction: {"result": [1, 1, 1]}
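
The double quotes in the printed predictions suggest that the service returns its result as a JSON-encoded string, so `resp.json()` may yield a `str` rather than a `dict`. The sketch below normalizes both cases. `extract_predictions` is a hypothetical helper, and the behavior of the generated scoring file is an assumption based on the output shown.

```python
import json
from typing import Any, List


def extract_predictions(payload: Any) -> List[int]:
    """Return the `result` list from a dict or a JSON-encoded string."""
    if isinstance(payload, str):
        payload = json.loads(payload)
    if not isinstance(payload, dict) or "result" not in payload:
        raise ValueError(f"unexpected scoring response: {payload!r}")
    return list(payload["result"])


print(extract_predictions('{"result": [1, 1, 1]}'))
print(extract_predictions({"result": [1, 1, 1]}))
```

Having a plain list makes it straightforward to compare the predictions with `y_label.values` element by element.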


<a id='clean'></a>
## 5. Cleanup

### delete the web service

In [213]:
# clean up
print(f'delete service ... {service.delete()}')

delete service ... None


In [214]:
# check the service state after requesting deletion
try:
    print(f'service state: {service.state}')
except Exception:
    print('service not found')

service state: Deleting
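
The state still reads `Deleting` right after `delete()` returns, which suggests the deletion completes in the background. A generic polling loop can wait for such a state to change; the sketch below takes any zero-argument callable, so it runs without Azure. `wait_for_state_change` and the fake state source are hypothetical.

```python
import time
from typing import Callable, Optional


def wait_for_state_change(get_state: Callable[[], Optional[str]],
                          pending: str = "Deleting",
                          timeout_s: float = 60.0,
                          interval_s: float = 1.0) -> Optional[str]:
    """Poll get_state until it stops returning `pending` or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    state = get_state()
    while state == pending and time.monotonic() < deadline:
        time.sleep(interval_s)
        state = get_state()
    return state


# Fake state source: reports "Deleting" twice, then None (service gone).
_states = iter(["Deleting", "Deleting", None])
print(wait_for_state_change(lambda: next(_states), interval_s=0.01))
```

If the timeout expires first, the function returns the last state seen, so the caller can tell a finished deletion from one that is still pending.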


<a id='cita'></a>
## 6. Citations

### Project Starter Code
[Udacity Github Repo](https://github.com/udacity/nd00333-capstone/tree/master/starter_file)

### MLEMAND ND Using Azure Machine Learning 
[Lesson 6.8 - Exercise: AutoML and the SDK](https://youtu.be/KM8wYoxYeX0)

### MLEMAND ND Machine Learning Operations 
[Lesson 2.5 - Exercise: Enable Security and Authentication](https://youtu.be/rsECJolX2Ns)

[Lesson 2.10 - Exercise: Deploy an Azure Machine learning Model](https://youtu.be/_RKfF1D6W24)

[Lesson 2.15 - Exercise: Enable Application Insights](https://youtu.be/EXGfNMMTuMY)

### Azure Machine Learning Documentation and Example Code Snippets
[List all ComputeTarget objects within the workspace](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.computetarget?view=azure-ml-py#list-workspace-)

[Create a dataset from pandas dataframe](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-create-register-datasets#create-a-dataset-from-pandas-dataframe)

[Model Registration and Deployment](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/deployment/deploy-to-cloud/model-register-and-deploy.ipynb)

[Using environments](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/training/using-environments/using-environments.ipynb)

[AciWebservice Class](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.webservice.aciwebservice?view=azure-ml-py#deploy-configuration-cpu-cores-none--memory-gb-none--tags-none--properties-none--description-none--location-none--auth-enabled-none--ssl-enabled-none--enable-app-insights-none--ssl-cert-pem-file-none--ssl-key-pem-file-none--ssl-cname-none--dns-name-label-none--primary-key-none--secondary-key-none--collect-model-data-none--cmk-vault-base-url-none--cmk-key-name-none--cmk-key-version-none--vnet-name-none--subnet-name-none-)

[What is Application Insights?](https://docs.microsoft.com/en-us/azure/azure-monitor/app/app-insights-overview)

### External Dataset
[Kaggle Loan Prediction Dataset](https://www.kaggle.com/altruistdelhite04/loan-prediction-problem-dataset?select=train_u6lujuX_CVtuZ9i.csv)
