# Binary classification with Scikit-Learn, trained remotely on Azure Machine Learning Compute
_**This notebook shows how to train a binary classification model with Scikit-Learn on an Azure Machine Learning Compute target (AmlCompute) and how to use the trained model for predictions.**_


## Setup and connect to AML Workspace

In [1]:
# Check core SDK version number
import azureml.core

print("SDK version:", azureml.core.VERSION)

from azureml.core import Workspace, Dataset

ws = Workspace.from_config()
print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep='\n')

SDK version: 1.0.76
cesardl-automl-northcentralus-ws
automlpmdemo
northcentralus
102a16c3-37d3-48a8-9237-4c9b1e8e80e0


## Create An Experiment

An **experiment** is a logical container in an Azure ML workspace. It holds run records, which can include the run metrics and output artifacts of your experiments.

In [2]:
from azureml.core import Experiment
experiment_name = 'classif-attrition-amlcompute'
experiment = Experiment(workspace=ws, name=experiment_name)

## Introduction to AmlCompute

Azure Machine Learning Compute is a managed compute infrastructure that lets you easily create single-node or multi-node compute clusters of the appropriate VM family. It is created **within your workspace region** and can be shared with other users in your workspace. By default, it autoscales up to `max_nodes` when a job is submitted, and it runs each job in a containerized environment that packages the dependencies you specify.

Because it is managed compute, job scheduling and cluster management are handled internally by the Azure Machine Learning service.

For more information on Azure Machine Learning Compute, please read [this article](https://docs.microsoft.com/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute).

**Note**: As with other Azure services, there are limits on certain resources (for example, AmlCompute quota) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota.

### Create project directory

Create a directory containing all the code from your local machine that the remote resource needs access to. This includes the training script and any additional files it depends on.

In [3]:
import os
import shutil

project_folder = './classif-attrition-amlcompute'
os.makedirs(project_folder, exist_ok=True)
shutil.copy('train.py', project_folder)

'./classif-attrition-amlcompute/train.py'
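If the training script depends on more than one local file, each of them has to be copied into the project folder as well. A minimal sketch of that step, assuming a hypothetical helper `stage_project` and hypothetical file names (`helpers.py` is only a placeholder), run in a throwaway temporary directory so it touches nothing on disk permanently:

```python
import os
import shutil
import tempfile


def stage_project(files, project_folder):
    """Copy each file in `files` into `project_folder` and return the destination paths."""
    os.makedirs(project_folder, exist_ok=True)
    copied = []
    for path in files:
        # shutil.copy returns the path of the newly created copy
        copied.append(shutil.copy(path, project_folder))
    return copied


# Demonstration with hypothetical placeholder files in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    sources = []
    for name in ('train.py', 'helpers.py'):
        src = os.path.join(tmp, name)
        with open(src, 'w') as f:
            f.write('# placeholder\n')
        sources.append(src)
    staged = stage_project(sources, os.path.join(tmp, 'project'))
    staged_names = sorted(os.path.basename(p) for p in staged)

print(staged_names)  # ['helpers.py', 'train.py']
```

Everything placed in the project folder is what the remote job sees as `source_directory`, so keeping it limited to the files the script needs keeps the upload small.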

### Fetch or create the compute target 

We are going to use the compute target you created earlier (make sure the name assigned to the `cluster_name` variable below matches that cluster's name).

In [4]:
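from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

# Name of the CPU cluster to use; must match the cluster you created earlier
cluster_name = "cesardl-cpu-clus"

try:
    # Reuse the existing cluster if it is already registered in the workspace
    cluster = ComputeTarget(workspace=ws, name=cluster_name)
except ComputeTargetException:
    # Otherwise provision a new one (VM size and node count are example values)
    config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2', max_nodes=4)
    cluster = ComputeTarget.create(ws, cluster_name, config)
    cluster.wait_for_completion(show_output=True)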

### Configure & Run

In [5]:
from azureml.train.estimator import Estimator

pip_packages = [
    'azureml-defaults', 'azureml-core', 'azureml-telemetry',
    'sklearn-pandas', 'azureml-dataprep', 'joblib'
]

estimator = Estimator(source_directory=project_folder, 
                      compute_target=cluster,
                      entry_script='train.py',
                      pip_packages=pip_packages,
                      conda_packages=['scikit-learn'],
                      inputs=[ws.datasets['IBM-Employee-Attrition'].as_named_input('attrition')])

run = experiment.submit(estimator)
run

| Experiment | Id | Type | Status | Details Page | Docs Page |
|---|---|---|---|---|---|
| classif-attrition-amlcompute | classif-attrition-amlcompute_1578453729_33872bdb | azureml.scriptrun | Starting | Link to Azure Machine Learning studio | Link to Documentation |
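The estimator above installs `scikit-learn` through `conda_packages` without a version pin, so the remote run may train with a different scikit-learn release than the one in your local kernel. Because a pickled model refers to its classes by module path, such a mismatch is one plausible (unconfirmed) cause of import errors when the model is loaded locally. A minimal sketch, assuming Python 3.8+ for the standard-library `importlib.metadata`, that reports locally installed versions and formats them as `name==version` pins; `local_versions` and `pinned_specs` are hypothetical helper names:

```python
from importlib import metadata


def local_versions(packages):
    """Return {package: installed version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def pinned_specs(versions):
    """Format installed packages as 'name==version' pins, skipping missing ones."""
    return [f"{name}=={ver}" for name, ver in versions.items() if ver is not None]


# Compare these with the versions installed in the remote run's environment
versions = local_versions(['scikit-learn', 'joblib'])
print(versions)
print(pinned_specs(versions))
```

Pins like these could then be added to the package lists passed to `Estimator`; whether to align the remote environment with the local kernel or the other way round is a judgment call.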


In [7]:
from azureml.widgets import RunDetails
RunDetails(run).show()


In [8]:
# Block until the remote run finishes so that its output files can be downloaded
run.wait_for_completion(show_output=True)

Note: if you need to cancel a run, you can follow [these instructions](https://aka.ms/aml-docs-cancel-run).

### Download model pickle file and load model in-memory

In [9]:
# Retrieve the trained model and the test set produced by the remote run
import joblib

# Download the serialized model from the run and load it into memory
run.download_file('original_model.pkl')
original_model = joblib.load('original_model.pkl')

# Alternative using the standard-library pickle module
# (pickle.load reads from an open binary file; pickle.loads expects bytes, not a file name):
# import pickle
# with open('original_model.pkl', 'rb') as f:
#     original_model = pickle.load(f)

# Download and load the test features
run.download_file('x_test_ibm.pkl')
x_test = joblib.load('x_test_ibm.pkl')

# joblib documentation:
# https://joblib.readthedocs.io/en/latest/installing.html
# https://joblib.readthedocs.io/en/latest/generated/joblib.load.html


ImportError: cannot import name 'LatentDirichletAllocation'
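For reference, a standalone pickle round trip goes through open binary file handles: `pickle.dump` and `pickle.load` take a file object, whereas `pickle.dumps` and `pickle.loads` work on in-memory bytes, so passing a file name string to `pickle.loads` does not load a file. The sketch below uses a plain dictionary as a hypothetical stand-in for the trained model; `save_model` and `load_model` are illustrative helper names. Because pickle records classes by module and name and imports them again at load time, unpickling a real scikit-learn model requires compatible scikit-learn modules to be importable in the local kernel. Only unpickle files from sources you trust.

```python
import os
import pickle
import tempfile


def save_model(model, path):
    # pickle.dump serializes the object into an open binary file handle
    with open(path, 'wb') as f:
        pickle.dump(model, f)


def load_model(path):
    # pickle.load deserializes from an open binary file handle
    with open(path, 'rb') as f:
        return pickle.load(f)


# Hypothetical stand-in "model": any picklable object works for this demonstration
model = {'threshold': 0.5, 'classes': [0, 1]}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'original_model.pkl')
    save_model(model, path)
    restored = load_model(path)

print(restored == model)  # True
```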

### Make Predictions

In [None]:
# Predict the label of the first row of the test set
# original_model.predict(x_test[0:1])

### Make Predictions and Calculate the Accuracy Metric

In [70]:
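from sklearn.metrics import accuracy_score

# Predict labels for the whole test set
# y_predictions = original_model.predict(x_test)

# Accuracy also needs the true test labels (y_test), which are not downloaded above
# print('Accuracy:')
# accuracy_score(y_test, y_predictions)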
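With its default `normalize=True`, `accuracy_score` returns the fraction of samples whose predicted label equals the true label. A plain-Python sketch of that computation, using a hypothetical `accuracy` helper and made-up labels for illustration only (not results from this model):

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("inputs must not be empty")
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)


# Hypothetical labels: 4 of the 5 predictions are correct
y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]
print(accuracy(y_true, y_pred))  # 0.8
```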



### Confusion Matrix

In [71]:
from sklearn.metrics import confusion_matrix
import matplotlib.pyplot as plt

# cm = confusion_matrix(y_test, y_predictions)

# print(cm)

# Plot the confusion matrix (rows: true label, columns: predicted label)
# plt.matshow(cm)
# plt.title('Confusion matrix')
# plt.colorbar()
# plt.ylabel('True label')
# plt.xlabel('Predicted label')
# plt.show()
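For a binary problem, scikit-learn's `confusion_matrix` puts true labels on the rows and predicted labels on the columns, with labels in sorted order, so for labels `0` and `1` the layout is `[[TN, FP], [FN, TP]]`. A plain-Python sketch of the same counting, assuming a hypothetical `binary_confusion_matrix` helper and made-up labels; pass the two label values actually used in your data through `labels`:

```python
def binary_confusion_matrix(y_true, y_pred, labels=(0, 1)):
    """2x2 counts: rows are true labels, columns are predicted labels, ordered by `labels`."""
    index = {label: i for i, label in enumerate(labels)}
    cm = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        # Increment the cell for (true label, predicted label)
        cm[index[t]][index[p]] += 1
    return cm


# Hypothetical labels for illustration
y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]
cm = binary_confusion_matrix(y_true, y_pred)
(tn, fp), (fn, tp) = cm
print(cm)  # [[2, 0], [1, 2]]
```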

In [72]:
# One prediction
# instance_num = 6
# Get the prediction for the test row at index instance_num (the seventh row)
# prediction_value = original_model.predict(x_test)[instance_num]

# print("One Prediction: ")
# print(prediction_value)

# print(y_predictions[:20])

# x_test.head(20)

In [73]:
# y_test.head(5)