# Porto Seguro's Safe Driving Prediction (AutoML Local Compute)

Now let's use Azure Automated ML!

#### RELATED BUG:
https://msdata.visualstudio.com/Vienna/_workitems/edit/583733

## Import Needed Packages

Import the packages needed for this solution notebook. Among the most widely used Python packages for machine learning are [scikit-learn](https://scikit-learn.org/stable/), [pandas](https://pandas.pydata.org/docs/getting_started/index.html#getting-started), and [numpy](https://numpy.org/). Together they provide data structures for tabular and numeric data, as well as many clustering, regression and classification algorithms, which makes them a good choice for data mining and data analysis.

In [3]:
import os
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
import joblib
from sklearn import metrics

## Check Azure ML SDK version

In [4]:
import azureml.core
print("This notebook was created and tested using version 1.2.0 of the Azure ML SDK")
print("You are currently using version", azureml.core.VERSION, "of the Azure ML SDK")

This notebook was created and tested using version 1.2.0 of the Azure ML SDK
You are currently using version 1.2.0 of the Azure ML SDK
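
The cell above only prints both versions. As a minimal sketch of an explicit comparison, the snippet below parses dotted version strings and warns when the installed version is older than the tested one. `parse_version` and `is_at_least` are hypothetical helpers (not part of the Azure ML SDK), and the installed version is hard-coded here so the snippet runs without the SDK; in the notebook it would come from `azureml.core.VERSION`.

```python
import itertools


def parse_version(version):
    """Turn a dotted version string such as '1.2.0' into a tuple of ints."""
    parts = []
    for piece in version.split("."):
        # Keep only the leading digits so suffixes like 'rc1' do not break parsing
        leading = "".join(itertools.takewhile(str.isdigit, piece))
        parts.append(int(leading) if leading else 0)
    return tuple(parts)


def is_at_least(installed, tested):
    """Return True when `installed` is the same as or newer than `tested`."""
    a, b = parse_version(installed), parse_version(tested)
    width = max(len(a), len(b))
    # Pad with zeros so that '1.2' and '1.2.0' compare as equal
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


TESTED_VERSION = "1.2.0"
installed_version = "1.2.0"  # assumption: stands in for azureml.core.VERSION

if not is_at_least(installed_version, TESTED_VERSION):
    print("Warning: installed SDK is older than the version this notebook was tested with")
else:
    print("Installed SDK version is at least", TESTED_VERSION)
```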


## Get Azure ML Workspace to use

In [5]:
from azureml.core import Workspace, Dataset

# Get the Workspace defined in the default config.json file
ws = Workspace.from_config()

## Load data from file into regular Pandas DataFrame

In [6]:
## Directly load from file into Pandas DataFrame
# DATA_DIR = "../../data/"
# data_df = pd.read_csv(os.path.join(DATA_DIR, 'porto_seguro_safe_driver_prediction_train.csv'))

# print(data_df.shape)
# data_df.head(5)

### Submit dataset file into DataStore (Azure Blob under the covers)

Download the dataset file from the URL below and copy it into the **/data** folder of your local copy of the repo (the folder structure created by "git clone").

Dataset file: https://azmlworkshopdata.blob.core.windows.net/safedriverdata/porto_seguro_safe_driver_prediction_train.csv

In [8]:
datastore = ws.get_default_datastore()
datastore.upload(src_dir='../../data/', 
                 target_path='Datasets/porto_seguro_safe_driver_prediction', overwrite=True, show_progress=True)

Uploading an estimated of 2 files
Uploading ../../data/Place_here_the_Dataset_files.txt
Uploading ../../data/porto_seguro_safe_driver_prediction_train.csv
Uploaded ../../data/Place_here_the_Dataset_files.txt, 1 files out of an estimated total of 2
Uploaded ../../data/porto_seguro_safe_driver_prediction_train.csv, 2 files out of an estimated total of 2
Uploaded 2 files


$AZUREML_DATAREFERENCE_c5def51e6792437eab431b9dac9587ff

## Load data into Azure ML Dataset and Register into Workspace

In [10]:
# Try to load the dataset from the Workspace. Otherwise, create it from the file in the AML DataStore (or from the HTTP URL)
found = False
aml_dataset_name = "porto_seguro_safe_driver_prediction_train"

if aml_dataset_name in ws.datasets.keys():
    found = True
    aml_dataset = ws.datasets[aml_dataset_name]
    print("Dataset loaded from the Workspace")

if not found:
    # Create AML Dataset and register it into Workspace
    print("Dataset does not exist in the current Workspace. It will be imported and registered.")

    # Option A: Create AML Dataset from file in AML DataStore
    datastore = ws.get_default_datastore()
    aml_dataset = Dataset.Tabular.from_delimited_files(path=datastore.path('Datasets/porto_seguro_safe_driver_prediction/porto_seguro_safe_driver_prediction_train.csv'))
    data_origin_type = 'AMLDataStore'

    # Option B: Create AML Dataset from file in HTTP URL
    # data_url = 'https://azmlworkshopdata.blob.core.windows.net/safedriverdata/porto_seguro_safe_driver_prediction_train.csv'
    # aml_dataset = Dataset.Tabular.from_delimited_files(data_url)
    # data_origin_type = 'HttpUrl'

    print(aml_dataset)

    # Register Dataset in Workspace
    registration_method = 'SDK'  # or 'UI'
    aml_dataset = aml_dataset.register(workspace=ws,
                                       name=aml_dataset_name,
                                       description='Porto Seguro Safe Driver Prediction Train dataset file',
                                       tags={'Registration-Method': registration_method, 'Data-Origin-Type': data_origin_type},
                                       create_new_version=True)

    print("Dataset created from file and registered in the Workspace")

Dataset loaded from the Workspace


In [11]:
# Use a Pandas DataFrame just to take a sneak peek at the data and schema
data_df = aml_dataset.to_pandas_dataframe()
# print(data_df.describe())

print(data_df.shape)
data_df.head(5)

(595212, 59)


Unnamed: 0,id,target,ps_ind_01,ps_ind_02_cat,ps_ind_03,ps_ind_04_cat,ps_ind_05_cat,ps_ind_06_bin,ps_ind_07_bin,ps_ind_08_bin,...,ps_calc_11,ps_calc_12,ps_calc_13,ps_calc_14,ps_calc_15_bin,ps_calc_16_bin,ps_calc_17_bin,ps_calc_18_bin,ps_calc_19_bin,ps_calc_20_bin
0,7,0,2,2,5,1,0,0,1,0,...,9,1,5,8,0,1,1,0,0,1
1,9,0,1,1,7,0,0,0,0,1,...,3,1,1,9,0,1,1,0,1,0
2,13,0,5,4,9,1,0,0,0,1,...,4,2,7,7,0,1,1,0,1,0
3,16,0,0,1,2,0,0,1,0,0,...,2,2,4,9,0,0,0,0,0,0
4,17,0,0,2,0,1,0,1,0,0,...,3,1,1,3,0,0,0,1,1,0


In [12]:
# Split in train/test datasets (Test=10%, Train=90%)

train_dataset, test_dataset = aml_dataset.random_split(0.9, seed=0)

# Use Pandas DF only to check the data
train_df = train_dataset.to_pandas_dataframe()
test_df = test_dataset.to_pandas_dataframe()

print(train_df.shape)
print(test_df.shape)

train_df.describe()

(536096, 59)
(59116, 59)


Unnamed: 0,id,target,ps_ind_01,ps_ind_02_cat,ps_ind_03,ps_ind_04_cat,ps_ind_05_cat,ps_ind_06_bin,ps_ind_07_bin,ps_ind_08_bin,...,ps_calc_11,ps_calc_12,ps_calc_13,ps_calc_14,ps_calc_15_bin,ps_calc_16_bin,ps_calc_17_bin,ps_calc_18_bin,ps_calc_19_bin,ps_calc_20_bin
count,536096.0,536096.0,536096.0,536096.0,536096.0,536096.0,536096.0,536096.0,536096.0,536096.0,...,536096.0,536096.0,536096.0,536096.0,536096.0,536096.0,536096.0,536096.0,536096.0,536096.0
mean,743520.5,0.036404,1.899701,1.35846,4.425696,0.416701,0.404937,0.393836,0.257077,0.163702,...,5.439973,1.441108,2.872583,7.539911,0.122657,0.628026,0.554289,0.287154,0.348833,0.153282
std,429311.0,0.187293,1.983901,0.664517,2.701037,0.493316,1.349914,0.4886,0.437023,0.370005,...,2.332432,1.201962,1.694968,2.746033,0.328044,0.483332,0.497044,0.452434,0.476601,0.36026
min,7.0,0.0,0.0,-1.0,0.0,-1.0,-1.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,371726.0,0.0,0.0,1.0,2.0,0.0,0.0,0.0,0.0,0.0,...,4.0,1.0,2.0,6.0,0.0,0.0,0.0,0.0,0.0,0.0
50%,743168.0,0.0,1.0,1.0,4.0,0.0,0.0,0.0,0.0,0.0,...,5.0,1.0,3.0,7.0,0.0,1.0,1.0,0.0,0.0,0.0
75%,1115280.0,0.0,3.0,2.0,6.0,1.0,0.0,1.0,1.0,0.0,...,7.0,2.0,4.0,9.0,0.0,1.0,1.0,1.0,1.0,0.0
max,1488027.0,1.0,7.0,4.0,11.0,1.0,6.0,1.0,1.0,1.0,...,19.0,10.0,13.0,23.0,1.0,1.0,1.0,1.0,1.0,1.0


In [13]:
test_df.head(5)

Unnamed: 0,id,target,ps_ind_01,ps_ind_02_cat,ps_ind_03,ps_ind_04_cat,ps_ind_05_cat,ps_ind_06_bin,ps_ind_07_bin,ps_ind_08_bin,...,ps_calc_11,ps_calc_12,ps_calc_13,ps_calc_14,ps_calc_15_bin,ps_calc_16_bin,ps_calc_17_bin,ps_calc_18_bin,ps_calc_19_bin,ps_calc_20_bin
0,50,0,1,2,1,0,0,0,0,1,...,3,3,1,8,0,0,1,0,0,0
1,64,1,0,1,2,1,0,1,0,0,...,10,3,1,11,0,1,1,0,1,0
2,79,0,0,1,4,1,0,0,0,1,...,4,2,4,3,0,1,1,1,0,1
3,84,1,0,2,0,1,4,1,0,0,...,3,2,0,8,0,1,1,0,0,0
4,95,0,0,1,8,0,0,1,0,0,...,2,1,3,11,1,0,1,0,0,1


## Split Data into Train and Test Sets

Partitioning data into training, validation, and holdout/test sets lets you estimate how well a model will perform on data you collect in the future, not just on the data it was trained on.

In this notebook we hold out a test dataset so that, once the AutoML process has finished, the metrics can be calculated on data "not seen" by AutoML.
Apart from the test dataset, AutoML by default internally either splits a validation dataset from the train dataset or uses cross-validation, depending on the size of the data (more than 20k rows will use a validation split), unless the choice is explicitly specified in the AutoMLConfig class (validation split vs. cross-validation).
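
To illustrate the holdout idea without the SDK, here is a minimal pure-Python sketch of a stratified 90/10 split, where each class keeps roughly the same share in both parts. `stratified_holdout` is a hypothetical helper written for this illustration, and the toy labels (40 positives in 1000 rows) are an assumption that only loosely mimics the imbalance of this dataset.

```python
import random
from collections import defaultdict


def stratified_holdout(labels, test_fraction=0.1, seed=0):
    """Return (train_idx, test_idx) so each class keeps roughly its share in both parts."""
    rng = random.Random(seed)

    # Group row indices by class label
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        # Shuffle within the class, then cut off the test share
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy labels (assumption): 4% positives
labels = [1] * 40 + [0] * 960
train_idx, test_idx = stratified_holdout(labels, test_fraction=0.1, seed=0)

train_rate = sum(labels[i] for i in train_idx) / len(train_idx)
test_rate = sum(labels[i] for i in test_idx) / len(test_idx)
print(len(train_idx), len(test_idx))
print(train_rate, test_rate)
```

With a plain random split the positive rate in the small test set can drift from the rate in the full data; stratifying removes that source of variation, which matters most when one class is rare.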

In [14]:
# Only needed if stratifying ######
# Segregate y from x feature columns

# x_data_df = data_df.copy()

# if 'target' in x_data_df.columns:
#     y_data_df = x_data_df.pop('target')

# print(data_df.shape)
# print(x_data_df.shape)
# print(y_data_df.shape)

In [15]:
# Split in train/test datasets (Test=10%, Train=90%)

# Split in full sets, if not stratifying
# train_df, test_df = train_test_split(data_df, test_size=0.1, random_state=0)

# -----------------------------------------------------------------
# Stratify by 'y' (label column)

# x_train_df, x_test_df, y_train_df, y_test_df = train_test_split(
#                                                        x_data_df, 
#                                                        y_data_df,
#                                                        test_size=0.1,
#                                                        random_state=0,
#                                                        stratify=y_data_df)

# print(x_train_df.shape)
# print(x_test_df.shape)

# # Join y to recreate the whole train dataset but now stratified

# train_df = x_train_df.copy()
# train_df['target'] = y_train_df

# test_df = x_test_df.copy()
# test_df['target'] = y_test_df
# -----------------------------------------------------------------

print(train_df.shape)
print(test_df.shape)

train_df.describe()

(536096, 59)
(59116, 59)


Unnamed: 0,id,target,ps_ind_01,ps_ind_02_cat,ps_ind_03,ps_ind_04_cat,ps_ind_05_cat,ps_ind_06_bin,ps_ind_07_bin,ps_ind_08_bin,...,ps_calc_11,ps_calc_12,ps_calc_13,ps_calc_14,ps_calc_15_bin,ps_calc_16_bin,ps_calc_17_bin,ps_calc_18_bin,ps_calc_19_bin,ps_calc_20_bin
count,536096.0,536096.0,536096.0,536096.0,536096.0,536096.0,536096.0,536096.0,536096.0,536096.0,...,536096.0,536096.0,536096.0,536096.0,536096.0,536096.0,536096.0,536096.0,536096.0,536096.0
mean,743520.5,0.036404,1.899701,1.35846,4.425696,0.416701,0.404937,0.393836,0.257077,0.163702,...,5.439973,1.441108,2.872583,7.539911,0.122657,0.628026,0.554289,0.287154,0.348833,0.153282
std,429311.0,0.187293,1.983901,0.664517,2.701037,0.493316,1.349914,0.4886,0.437023,0.370005,...,2.332432,1.201962,1.694968,2.746033,0.328044,0.483332,0.497044,0.452434,0.476601,0.36026
min,7.0,0.0,0.0,-1.0,0.0,-1.0,-1.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,371726.0,0.0,0.0,1.0,2.0,0.0,0.0,0.0,0.0,0.0,...,4.0,1.0,2.0,6.0,0.0,0.0,0.0,0.0,0.0,0.0
50%,743168.0,0.0,1.0,1.0,4.0,0.0,0.0,0.0,0.0,0.0,...,5.0,1.0,3.0,7.0,0.0,1.0,1.0,0.0,0.0,0.0
75%,1115280.0,0.0,3.0,2.0,6.0,1.0,0.0,1.0,1.0,0.0,...,7.0,2.0,4.0,9.0,0.0,1.0,1.0,1.0,1.0,0.0
max,1488027.0,1.0,7.0,4.0,11.0,1.0,6.0,1.0,1.0,1.0,...,19.0,10.0,13.0,23.0,1.0,1.0,1.0,1.0,1.0,1.0


In [16]:
train_df.head(5)

Unnamed: 0,id,target,ps_ind_01,ps_ind_02_cat,ps_ind_03,ps_ind_04_cat,ps_ind_05_cat,ps_ind_06_bin,ps_ind_07_bin,ps_ind_08_bin,...,ps_calc_11,ps_calc_12,ps_calc_13,ps_calc_14,ps_calc_15_bin,ps_calc_16_bin,ps_calc_17_bin,ps_calc_18_bin,ps_calc_19_bin,ps_calc_20_bin
0,7,0,2,2,5,1,0,0,1,0,...,9,1,5,8,0,1,1,0,0,1
1,9,0,1,1,7,0,0,0,0,1,...,3,1,1,9,0,1,1,0,1,0
2,13,0,5,4,9,1,0,0,0,1,...,4,2,7,7,0,1,1,0,1,0
3,16,0,0,1,2,0,0,1,0,0,...,2,2,4,9,0,0,0,0,0,0
4,17,0,0,2,0,1,0,1,0,0,...,3,1,1,3,0,0,0,1,1,0


## Train with Azure AutoML automatically searching for the 'best model' (Best algorithms and best hyper-parameters)

### List and select primary metric to drive the AutoML classification problem

In [17]:
from azureml.train import automl
    
# Get a list of valid metrics for your given task
automl.utilities.get_primary_metrics('classification')

# List of possible primary metrics is here:
# https://docs.microsoft.com/en-us/azure/machine-learning/how-to-configure-auto-train#primary-metric

['average_precision_score_weighted',
 'norm_macro_recall',
 'accuracy',
 'AUC_weighted',
 'precision_score_weighted']

## Define AutoML Experiment settings

In [18]:
import logging

# You can provide additional settings as a **kwargs parameter for the AutoMLConfig object
automl_settings = {
      "blacklist_models":['LogisticRegression', 'ExtremeRandomTrees', 'RandomForest'], 
      # "whitelist_models": ['LightGBM'],
      # "n_cross_validations": 5,
      # "validation_data": test_df,  # Better to holdout the Test Dataset
      "experiment_exit_score": 0.7
}

from azureml.train.automl import AutoMLConfig

automl_config = AutoMLConfig(task='classification',
                             primary_metric='AUC_weighted',                           
                             training_data= train_df, #  train_dataset, # Pandas DataFrame
                             label_column_name="target",                                                    
                             enable_early_stopping= True,
                             iterations=5,
                             experiment_timeout_hours=1,  # Enforced: Cannot be less than 1h due to the size of this dataset (Cols*Rows)                         
                             featurization= 'off',        # (auto/off) All feature columns in this dataset are numbers, no need to featurize. 
                             debug_log='automated_ml_errors.log',
                             verbosity= logging.INFO,
                             model_explainability=True,
                             enable_onnx_compatible_models=False,
                             **automl_settings
                             )

# Explanation of Settings: https://docs.microsoft.com/en-us/azure/machine-learning/how-to-configure-auto-train#configure-your-experiment-settings

# AutoMLConfig info on: 
# https://docs.microsoft.com/en-us/python/api/azureml-train-automl-client/azureml.train.automl.automlconfig.automlconfig

## Run Experiment with multiple child runs under the covers

In [19]:
from azureml.core import Experiment

experiment_name = "SDK_local_porto_seguro_driver_pred"
print(experiment_name)

experiment = Experiment(workspace=ws, 
                        name=experiment_name)

import time
start_time = time.time()

run = experiment.submit(automl_config, show_output=True)

print('Manual run timing: --- %s minutes needed for running the whole Remote AutoML Experiment ---' % ((time.time() - start_time)/60))

SDK_local_porto_seguro_driver_pred
Running on local machine
Parent Run ID: AutoML_92135c22-dc6d-46e1-a79e-fed2432d2626

Current status: DatasetBalancing. Performing class balancing sweeping

****************************************************************************************************
DATA GUARDRAILS: 

TYPE:         Train-Test data split
STATUS:       DONE
DESCRIPTION:  Your input data has been split into a training dataset and a holdout test dataset for validation of the model. The test holdout dataset reflects the original distribution of your input data.
PARAMETERS:   Dataset : train, Row counts : 482486, Percentage : 89.99992538649794
              Dataset : test, Row counts : 53610, Percentage : 10.00007461350206
              
TYPE:         Class balancing detection
STATUS:       ALERTED
DESCRIPTION:  Classes in the training data are imbalanced.
PARAMETERS:   Size of the smallest class : 17564, number of samples in the training data : 482486
              
****************
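
The class balancing alert above can be put into numbers. The sketch below computes the share of the smallest class from the counts printed by the guardrail and derives per-class weights with the "balanced" heuristic `n_samples / (n_classes * class_count)`. This is an illustration only: `balanced_class_weights` is a hypothetical helper, it is assumed that the smallest class is label 1 and that there are exactly two classes, and nothing here claims to reproduce what AutoML's class balancing sweeping does internally.

```python
def balanced_class_weights(class_counts):
    """Weight each class by n_samples / (n_classes * class_count)."""
    n_samples = sum(class_counts.values())
    n_classes = len(class_counts)
    return {label: n_samples / (n_classes * count)
            for label, count in class_counts.items()}


# Counts from the guardrail output: 17564 rows in the smallest class
# out of 482486 training rows; the rest is assumed to be the other class.
total_rows = 482486
minority_rows = 17564
class_counts = {0: total_rows - minority_rows, 1: minority_rows}

minority_share = minority_rows / total_rows
weights = balanced_class_weights(class_counts)

print(f"minority share: {minority_share:.4f}")
print(weights)
```

With these weights both classes contribute the same total weight, so errors on the rare class are no longer drowned out by the majority class.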

## Explore results with Widget

In [20]:
# Explore the results of automatic training with a Jupyter widget: https://docs.microsoft.com/en-us/python/api/azureml-widgets/azureml.widgets?view=azure-ml-py
from azureml.widgets import RunDetails
RunDetails(run).show()

_AutoMLWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', 's…

### Measure Parent Run Time needed for the whole AutoML process 

In [21]:
import time
from datetime import datetime

run_details = run.get_details()

# Like: 2020-01-12T23:11:56.292703Z
end_time_utc_str = run_details['endTimeUtc'].split(".")[0]
start_time_utc_str = run_details['startTimeUtc'].split(".")[0]
# Subtract the parsed datetimes directly; time.mktime() would interpret these UTC values as local time
end_time_utc = datetime.strptime(end_time_utc_str, "%Y-%m-%dT%H:%M:%S")
start_time_utc = datetime.strptime(start_time_utc_str, "%Y-%m-%dT%H:%M:%S")

parent_run_time = (end_time_utc - start_time_utc).total_seconds()
print('Run Timing: --- %s minutes needed for running the whole Remote AutoML Experiment ---' % (parent_run_time/60))

Run Timing: --- 3.5166666666666666 minutes needed for running the whole Remote AutoML Experiment ---


## Retrieve the 'Best' Model

In [22]:
best_run, fitted_model = run.get_output()
print(best_run)
print('--------')
print(fitted_model)

Run(Experiment: SDK_local_porto_seguro_driver_pred,
Id: AutoML_92135c22-dc6d-46e1-a79e-fed2432d2626_4,
Type: None,
Status: Completed)
--------
Pipeline(memory=None,
     steps=[('stackensembleclassifier', StackEnsembleClassifier(base_learners=[('0', Pipeline(memory=None,
     steps=[('MaxAbsScaler', MaxAbsScaler(copy=True)), ('LightGBMClassifier', LightGBMClassifier(boosting_type='gbdt', class_weight=None,
          colsample_bytree=1.0, importance_type='split', lea...022230184A90>,
           solver='lbfgs', tol=0.0001, verbose=0),
            training_cv_folds=5))])


## Make Predictions and calculate metrics

### Prep Test Data: Extract X values (feature columns) from dataset and convert to NumPy array for predicting

In [23]:
import pandas as pd

x_test_df = test_df.copy()

if 'target' in x_test_df.columns:
    y_test_df = x_test_df.pop('target')

print(test_df.shape)
print(x_test_df.shape)
print(y_test_df.shape)


(59116, 59)
(59116, 58)
(59116,)


In [24]:
y_test_df.describe()

count   59116.00
mean        0.04
std         0.19
min         0.00
25%         0.00
50%         0.00
75%         0.00
max         1.00
Name: target, dtype: float64

### Make predictions in bulk

In [25]:
# Try the best model making predictions with the test dataset
y_predictions = fitted_model.predict(x_test_df)

print('30 predictions: ')
print(y_predictions[:30])

30 predictions: 
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]


### Get all the predictions' probabilities needed to calculate ROC AUC

In [26]:
class_probabilities = fitted_model.predict_proba(x_test_df)
print(class_probabilities.shape)

print('Some class probabilities...: ')
print(class_probabilities[:3])

print('Probabilities for class 1:')
print(class_probabilities[:,1])

print('Probabilities for class 0:')
print(class_probabilities[:,0])

(59116, 2)
Some class probabilities...: 
[[0.96358202 0.03641798]
 [0.96357294 0.03642706]
 [0.96356763 0.03643237]]
Probabilities for class 1:
[0.03641798 0.03642706 0.03643237 ... 0.03642523 0.03641572 0.03643125]
Probabilities for class 0:
[0.96358202 0.96357294 0.96356763 ... 0.96357477 0.96358428 0.96356875]
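
The all-zero predictions shown earlier follow from these probabilities if labels are derived by picking the more probable class, which for two classes amounts to a 0.5 threshold on the class-1 column. A minimal sketch of that step, using only the class-1 values printed above; `labels_from_probabilities` is a hypothetical helper, not a method of the fitted model.

```python
def labels_from_probabilities(positive_probabilities, threshold=0.5):
    """Assign label 1 when the class-1 probability reaches the threshold, else 0."""
    return [1 if p >= threshold else 0 for p in positive_probabilities]


# Class-1 probabilities copied from the output above
class_1_probabilities = [0.03641798, 0.03642706, 0.03643237,
                         0.03642523, 0.03641572, 0.03643125]

default_labels = labels_from_probabilities(class_1_probabilities)
print(default_labels)

# How much do the scores differ between rows?
spread = max(class_1_probabilities) - min(class_1_probabilities)
print(spread)
```

The values shown differ only in the fifth decimal place, so the model barely distinguishes one row from another, and no threshold choice would separate the classes well with scores like these.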


## Evaluate Model

Evaluating performance is an essential task in machine learning. Because this is a binary classification problem, the data scientist elected to use the AUC-ROC curve. The ROC (Receiver Operating Characteristic) curve plots the true positive rate against the false positive rate across all classification thresholds, and the AUC (Area Under the Curve) summarizes it in a single number: 0.5 corresponds to random ranking and 1.0 to perfect ranking. It is one of the most widely used evaluation metrics for checking a classification model’s performance.

<img src="https://www.researchgate.net/profile/Oxana_Trifonova/publication/276079439/figure/fig2/AS:614187332034565@1523445079168/An-example-of-ROC-curves-with-good-AUC-09-and-satisfactory-AUC-065-parameters.png"
     alt="Example of ROC curves with good (AUC = 0.9) and satisfactory (AUC = 0.65) parameters"
     style="float: left; margin-right: 12px; width: 320px; height: 239px;" />
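
ROC AUC can also be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counting as one half. The sketch below computes it that way in plain Python on toy data. `pairwise_auc` is a hypothetical helper written for illustration (it compares every positive with every negative, so it is only practical for small inputs); the notebook itself uses the scikit-learn functions in the following cells.

```python
def pairwise_auc(y_true, scores):
    """ROC AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data (assumption): two negatives and two positives
y_true = [0, 0, 1, 1]
informative_scores = [0.1, 0.4, 0.35, 0.8]
constant_scores = [0.036, 0.036, 0.036, 0.036]

auc_informative = pairwise_auc(y_true, informative_scores)
auc_constant = pairwise_auc(y_true, constant_scores)
print(auc_informative)
print(auc_constant)
```

A model that gives every row the same score ranks nothing, so every pair is a tie and the AUC is exactly 0.5, regardless of how imbalanced the classes are.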

### Calculate the ROC AUC with probabilities vs. the Test Dataset

In [27]:
print('ROC AUC *method 1*:')
fpr, tpr, thresholds = metrics.roc_curve(y_test_df, class_probabilities[:,1])
metrics.auc(fpr, tpr)


ROC AUC *method 1*:


0.5031060882649053

In [28]:
from sklearn.metrics import roc_auc_score

print('ROC AUC *method 2*:')
print(roc_auc_score(y_test_df, class_probabilities[:,1]))

print('ROC AUC Weighted:')
print(roc_auc_score(y_test_df, class_probabilities[:,1], average='weighted'))

# ********** THIS IS THE BUG when training with Pandas DataFrame ***********
# AUC should be around 0.63 instead of 0.49 or 0.5


ROC AUC *method 2*:
0.5031060882649053
ROC AUC Weighted:
0.5031060882649053


### Calculate the Accuracy with predictions vs. the Test Dataset

In [29]:
from sklearn.metrics import accuracy_score

print('Accuracy:')
print(accuracy_score(y_test_df, y_predictions))


Accuracy:
0.9631571824886663
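
An accuracy of about 0.96 next to an AUC of about 0.5 is not a contradiction. Only around 3.6% of the rows have `target` = 1, so a model that always predicts 0 is right about 96% of the time without finding a single positive. A minimal sketch with toy labels; the counts (36 positives in 1000 rows) are an assumption chosen to mirror that rate, and `accuracy` is a hypothetical helper equivalent in spirit to `accuracy_score`.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Toy labels (assumption): 3.6% positives
y_true = [1] * 36 + [0] * 964
always_zero = [0] * len(y_true)

baseline_accuracy = accuracy(y_true, always_zero)
positives_found = sum(1 for t, p in zip(y_true, always_zero) if t == 1 and p == 1)

print(baseline_accuracy)
print(positives_found)
```

For imbalanced problems like this one, ranking metrics such as AUC, or per-class metrics such as recall on the positive class, say more about model quality than accuracy does.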


## See files associated with the 'Best run'

In [30]:
print(best_run.get_file_names())

# best_run.download_file('azureml-logs/70_driver_log.txt') # To check errors

['accuracy_table', 'confusion_matrix', 'explanation/04035dfd-dcee-476e-9179-bc5fe05e807e/classes.interpret.json', 'explanation/04035dfd-dcee-476e-9179-bc5fe05e807e/eval_data_viz.interpret.json', 'explanation/04035dfd-dcee-476e-9179-bc5fe05e807e/expected_values.interpret.json', 'explanation/04035dfd-dcee-476e-9179-bc5fe05e807e/features.interpret.json', 'explanation/04035dfd-dcee-476e-9179-bc5fe05e807e/global_importance_names/0.interpret.json', 'explanation/04035dfd-dcee-476e-9179-bc5fe05e807e/global_importance_rank/0.interpret.json', 'explanation/04035dfd-dcee-476e-9179-bc5fe05e807e/global_importance_values/0.interpret.json', 'explanation/04035dfd-dcee-476e-9179-bc5fe05e807e/local_importance_values.interpret.json', 'explanation/04035dfd-dcee-476e-9179-bc5fe05e807e/per_class_names/0.interpret.json', 'explanation/04035dfd-dcee-476e-9179-bc5fe05e807e/per_class_rank/0.interpret.json', 'explanation/04035dfd-dcee-476e-9179-bc5fe05e807e/per_class_values/0.interpret.json', 'explanation/04035dfd

## Download model pickle file from the run

In [31]:
# Download the model .pkl file to local (Using the 'run' object)
best_run.download_file('outputs/model.pkl')

### Load model in memory from .pkl file

In [32]:
# Load the model into memory
import joblib
fitted_model = joblib.load('model.pkl')
print(fitted_model)

Pipeline(memory=None,
     steps=[('stackensembleclassifier', StackEnsembleClassifier(base_learners=[('0', Pipeline(memory=None,
     steps=[('MaxAbsScaler', MaxAbsScaler(copy=True)), ('LightGBMClassifier', LightGBMClassifier(boosting_type='gbdt', class_weight=None,
          colsample_bytree=1.0, importance_type='split', lea...022230184860>,
           solver='lbfgs', tol=0.0001, verbose=0),
            training_cv_folds=5))])
