
# Running your code on an Azure Virtual Machine using the Azure SDK

With so many cloud computing resources available, training your model in the cloud has become much easier.

Jupyter Notebooks provide an easy interface for writing code and seeing immediately how it performs. For deployment, however, plain `.py` files are usually the better fit, since they run as ordinary scripts without the notebook overhead.

So in this tutorial we will see how to combine the flexibility and ease of working in a Jupyter Notebook with the speed of `.py` files, on a Microsoft Azure virtual machine, with the help of the **Azure Machine Learning SDK**.

### Prerequisites

* The tutorial assumes that you already have a virtual machine instance ready and running. If you do not, please refer to this [step](https://github.com/amita-kapoor/ailabs/blob/master/Azure-Tutorial.md#creating-a-virtual-machine-for-deep-learning).

* On your local machine, install the Azure Machine Learning SDK with pip from the command line (_we recommend using an Anaconda environment_):


  `pip install azureml-sdk[notebooks]`
  
### Step 1: Verify that your local machine is properly configured

Open a command terminal on your local machine and run `jupyter notebook`. The command opens Jupyter in a browser.

![pic1](pic1.png)

Now create a new notebook (or open this notebook, `Using_Azure_SDK.ipynb`).

First we verify that the Azure SDK is correctly installed by importing it and printing its version.

In [0]:
import azureml.core
print("Azure ML SDK Version: ",azureml.core.VERSION)

Azure ML SDK Version:  1.0.23
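
If the import fails, the SDK is probably not installed in the environment that Jupyter is using. As a minimal sketch (the helper name `get_sdk_version` is ours, not part of the SDK), the check can be wrapped so that a missing install is reported instead of raising:

```python
def get_sdk_version():
    """Return the installed Azure ML SDK version, or None if it cannot be imported."""
    try:
        import azureml.core
    except ImportError:
        return None
    return azureml.core.VERSION


version = get_sdk_version()
if version is None:
    print("azureml-sdk not found; install it with: pip install azureml-sdk[notebooks]")
else:
    print("Azure ML SDK Version:", version)
```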


### Step 2: Connect to your virtual machine on Azure

We will use the `Workspace` class of the Azure ML module to connect to the virtual machine. Note that when you run this cell, a popup window will open and you will need to provide your Microsoft account username and password.

To ensure you connect to the right virtual machine, note down the following from the Azure dashboard:

* `subscription_id`
* `resource_group` (if you have created one)

![pic2](pic2.png)

To create the workspace we need a unique name. We set `create_resource_group` to `True` so that the resource group is created if it does not already exist. Only a few US locations supported Azure ML at the time of writing, so we choose the location `eastus2`.

In [0]:
from azureml.core import Workspace
ws = Workspace.create(name='AiLabs2019ver3',
                      subscription_id='11d739b0-c8e5-4754-836a-9488de187dfe',
                      resource_group='AILabs2_2019',
                      create_resource_group=True,
                      location='eastus2' # only a few US locations are supported for now
                     )

Deploying KeyVault with name ailabs20keyvaultkbcwpjqy.
Deploying StorageAccount with name ailabs20storagepajlhxfp.
Deploying AppInsights with name ailabs20insightsibmeguhx.
Deployed AppInsights with name ailabs20insightsibmeguhx.
Deploying ContainerRegistry with name ailabs20acrfcrzzlxk.
Deployed ContainerRegistry with name ailabs20acrfcrzzlxk.
Deployed KeyVault with name ailabs20keyvaultkbcwpjqy.
Deployed StorageAccount with name ailabs20storagepajlhxfp.
Deploying Workspace with name AiLabs2019ver3.
Deployed Workspace with name AiLabs2019ver3.
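
Hard-coding the subscription ID in a notebook makes it easy to leak when the notebook is shared. One possible alternative, sketched below, reads the values from environment variables. The variable names (`AZURE_SUBSCRIPTION_ID` and friends) and the helper `workspace_settings` are assumptions made for illustration; the SDK does not define them.

```python
import os


def workspace_settings(env=os.environ):
    """Collect the keyword arguments for Workspace.create from the environment."""
    subscription_id = env.get("AZURE_SUBSCRIPTION_ID")  # hypothetical variable name
    if not subscription_id:
        raise ValueError("Set AZURE_SUBSCRIPTION_ID before creating the workspace")
    return {
        "name": env.get("AZURE_WORKSPACE_NAME", "AiLabs2019ver3"),
        "subscription_id": subscription_id,
        "resource_group": env.get("AZURE_RESOURCE_GROUP", "AILabs2_2019"),
        "create_resource_group": True,
        "location": env.get("AZURE_LOCATION", "eastus2"),
    }


# Example with a placeholder ID (not a real subscription)
settings = workspace_settings({"AZURE_SUBSCRIPTION_ID": "00000000-0000-0000-0000-000000000000"})
print(settings)
```

The resulting dictionary could then be passed on as `Workspace.create(**settings)`.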


Now that the workspace is created, you can save its configuration for future reference.

In [0]:
# Let us save the configuration file for future
ws.write_config(file_name="ws_config.json")

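If you want to work with the same workspace later, you can load the config file using:

`ws = Workspace.from_config()`

By default `from_config` looks for a file named `config.json`, so with a custom file name such as `ws_config.json` you will likely need to pass its location through the `path` argument.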
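
For reference, the saved configuration is a small JSON file. The sketch below writes and reads a file of that shape using only the standard library. The three keys are the ones we understand the SDK to store; the subscription ID and the temporary path are placeholders.

```python
import json
import os
import tempfile

# Placeholder values; a real file holds your own subscription and workspace details
config = {
    "subscription_id": "00000000-0000-0000-0000-000000000000",
    "resource_group": "AILabs2_2019",
    "workspace_name": "AiLabs2019ver3",
}

path = os.path.join(tempfile.mkdtemp(), "ws_config.json")
with open(path, "w") as f:
    json.dump(config, f, indent=4)

with open(path) as f:
    loaded = json.load(f)

# Report any key that a workspace lookup would need but the file lacks
missing = {"subscription_id", "resource_group", "workspace_name"} - set(loaded)
print("missing keys:", sorted(missing))
```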

Let us see the details of our workspace.


In [0]:
import azureml
from azureml.core import Run
# Check workspace config
print(ws.name, ws.location, ws.resource_group, sep = '\t')

AiLabs2019ver3	eastus2	AILabs2_2019


Every script you run is tracked as an experiment, so we need to give it a name and start a new experiment on the virtual machine.

In [0]:
experiment_name = 'mnist_azure'
from azureml.core import Experiment
exp = Experiment(workspace=ws, name=experiment_name)
print(exp)

Experiment(Name: mnist_azure,
Workspace: AiLabs2019ver3)


### Step 3: Create Compute Target

To run the experiment we need to specify the environment in which our code will run. Since our machine has a GPU, we name the compute cluster `gpucluster` and also set the minimum and maximum number of nodes. When creating the virtual machine we chose a [Standard_NC6](https://github.com/amita-kapoor/ailabs/blob/master/images/vm5.png) machine, so we set the compute SKU to `STANDARD_NC6`. If you selected a CPU machine instead of a GPU one, set `AML_COMPUTE_CLUSTER_SKU` to `STANDARD_D2_V2`.

In [0]:
from azureml.core.compute import AmlCompute
from azureml.core.compute import ComputeTarget
import os

# choose a name for your cluster
compute_name = os.environ.get("AML_COMPUTE_CLUSTER_NAME", "gpucluster")
# environment variables are strings, so convert the node counts to int
compute_min_nodes = int(os.environ.get("AML_COMPUTE_CLUSTER_MIN_NODES", 0))
compute_max_nodes = int(os.environ.get("AML_COMPUTE_CLUSTER_MAX_NODES", 2))

# This example uses a GPU VM. To use a CPU VM, set the SKU to STANDARD_D2_V2
vm_size = os.environ.get("AML_COMPUTE_CLUSTER_SKU", "STANDARD_NC6")


if compute_name in ws.compute_targets:
    compute_target = ws.compute_targets[compute_name]
    if compute_target and type(compute_target) is AmlCompute:
        print('found compute target. just use it. ' + compute_name)
else:
    print('creating a new compute target...')
    provisioning_config = AmlCompute.provisioning_configuration(vm_size = vm_size,
                                                                min_nodes = compute_min_nodes, 
                                                                max_nodes = compute_max_nodes)

    # create the cluster
    compute_target = ComputeTarget.create(ws, compute_name, provisioning_config)

    # can poll for a minimum number of nodes and for a specific timeout. 
    # if no min node count is provided it will use the scale settings for the cluster
    compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)

    # For a more detailed view of current AmlCompute status, use the 'status' property
    print(compute_target.status.serialize())

creating a new compute target...
Creating
Succeeded
AmlCompute wait for completion finished
Minimum number of nodes requested have been provisioned
{'targetNodeCount': 0, 'provisioningState': 'Succeeded', 'creationTime': '2019-04-11T12:39:52.816882+00:00', 'scaleSettings': {'maxNodeCount': 2, 'nodeIdleTimeBeforeScaleDown': 'PT120S', 'minNodeCount': 0}, 'nodeStateCounts': {'idleNodeCount': 0, 'preparingNodeCount': 0, 'runningNodeCount': 0, 'preemptedNodeCount': 0, 'unusableNodeCount': 0, 'leavingNodeCount': 0}, 'modifiedTime': '2019-04-11T12:40:09.042870+00:00', 'vmSize': 'STANDARD_NC6', 'provisioningStateTransitionTime': None, 'currentNodeCount': 0, 'vmPriority': 'Dedicated', 'allocationState': 'Steady', 'allocationStateTransitionTime': '2019-04-11T12:40:08.197000+00:00', 'errors': None}
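
Values read with `os.environ.get` are strings whenever the variable is set, so node counts taken from the environment need converting before they are used as numbers. A small helper can do this in one place; the name `env_int` is hypothetical and the sketch uses only the standard library.

```python
import os


def env_int(name, default, env=os.environ):
    """Read an integer setting from the environment, falling back to a default."""
    raw = env.get(name)
    if raw is None or raw.strip() == "":
        return default
    try:
        return int(raw)
    except ValueError:
        raise ValueError(f"{name} must be an integer, got {raw!r}") from None


# A variable that is set arrives as text and is converted
print(env_int("AML_COMPUTE_CLUSTER_MAX_NODES", 2, env={"AML_COMPUTE_CLUSTER_MAX_NODES": "4"}))
# A variable that is not set falls back to the default
print(env_int("AML_COMPUTE_CLUSTER_MIN_NODES", 0, env={}))
```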


### Step 4: Building your model
Now that the machine is set up, you need to define the folder that will hold the source code for model training.


In [0]:
import os
script_folder = './keras-mnist'
os.makedirs(script_folder, exist_ok=True)

ds = ws.get_default_datastore()
print(ds.datastore_type, ds.account_name, ds.container_name)

AzureBlob ailabs20storagepajlhxfp azureml-blobstore-ef2dd6b5-05c5-4cf3-a049-d8afbe86a571


In the folder you specified above, use any text editor to create a Python file; we have named ours `train.py`. The file does the following:
* Imports all the modules needed for building the model and reading the data.
* Additionally imports the `azureml` module:
    ```
    import azureml
    from azureml.core import Workspace, Run
    ```
* Reads in the data, pre-processes it, and defines the model.
* Gets a handle on the current AzureML run, so that metrics can be logged against it, using `run = Run.get_submitted_run()`.
* Compiles the model and trains it on the training data, as you would normally do.
* Finally, saves the model so that you can use it in the future.
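
The pre-processing step amounts to two small transformations: scaling pixel values from 0-255 into the range [0, 1], and one-hot encoding the class labels. A plain-Python sketch of both, independent of Keras (the helper names are ours):

```python
def scale_pixels(row):
    """Map 8-bit pixel values (0-255) to floats in [0, 1]."""
    return [p / 255 for p in row]


def one_hot(label, num_classes=10):
    """Return a list with 1.0 at index `label` and 0.0 elsewhere."""
    if not 0 <= label < num_classes:
        raise ValueError(f"label {label} outside 0..{num_classes - 1}")
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


print(scale_pixels([0, 51, 255]))
print(one_hot(3))
```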

Below you can see our `train.py` code: 

In [0]:
%%writefile $script_folder/train.py

import os

import numpy as np
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Convolution2D, MaxPooling2D, Flatten, Dense, Dropout
from keras import utils, losses, optimizers
from keras import backend as K

import azureml
from azureml.core import Workspace, Run

# Define Hyper Parameters of the model:
num_classes = 10
batch_size = 128
epochs = 5  # reduced from 20 to save time

# input image dimensions
img_rows, img_cols = 28, 28

# data for training and testing
(x_train, y_train), (x_test, y_test) = mnist.load_data()

if K.image_data_format() == 'channels_first':
    x_train = x_train.reshape(x_train.shape[0], 1, img_rows, img_cols)
    x_test = x_test.reshape(x_test.shape[0], 1, img_rows, img_cols)
    input_shape = (1, img_rows, img_cols)
else:
    x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
    x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
    input_shape = (img_rows, img_cols, 1)

x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255

# One hot encode labels
y_train = utils.to_categorical(y_train,num_classes)
y_test = utils.to_categorical(y_test,num_classes)
num_classes = y_test.shape[1]


# Create model
model = Sequential()
model.add(Convolution2D(filters=32, kernel_size=3, padding='same', activation='relu', input_shape=input_shape))
model.add(Convolution2D(filters=64, kernel_size=4, padding='same', activation='relu'))
model.add(MaxPooling2D(pool_size=2))
model.add(Dropout(0.4))
model.add(Convolution2D(filters=128, kernel_size=4, padding='same', activation='relu'))
model.add(MaxPooling2D(pool_size=2))
model.add(Dropout(0.4))
model.add(Convolution2D(filters=256, kernel_size=4, padding='same', activation='relu'))
model.add(MaxPooling2D(pool_size=2))
model.add(Convolution2D(filters=256, kernel_size=4, padding='same', activation='relu'))
model.add(MaxPooling2D(pool_size=2))

model.add(Flatten())
model.add(Dense(50, activation='relu'))
model.add(Dense(num_classes, activation='softmax'))


# get hold of the current run
run = Run.get_submitted_run()

print('Train a deep learning model')
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

history = model.fit(x_train, y_train, validation_data=(x_test, y_test), 
                    epochs=epochs, batch_size=batch_size, verbose=2)


# evaluate the model on the test data
print('Predict the test set')
score = model.evaluate(x_test, y_test, verbose=0)
print('Test Loss: ', score[0])
print('Test Accuracy: ', score[1])

# log the test accuracy to the run so it appears in the experiment record
print('Accuracy is', score[1])

run.log('accuracy', float(score[1]))

os.makedirs('outputs', exist_ok=True)
# note file saved in the outputs folder is automatically uploaded into experiment record
model.save('outputs/model.h5')

Overwriting ./keras-mnist/train.py
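
To see what the `Flatten` layer receives, it helps to trace the spatial size through the network. With `padding='same'` the convolutions keep the width and height, and each `MaxPooling2D(pool_size=2)` shrinks them. The sketch below assumes the Keras defaults for pooling (stride equal to the pool size, `'valid'` padding); the helper `pool_output` is ours.

```python
def pool_output(size, pool=2, stride=None):
    """Spatial size after max pooling with 'valid' padding."""
    stride = pool if stride is None else stride
    return (size - pool) // stride + 1


sizes = [28]                 # MNIST images are 28 x 28
for _ in range(4):           # the model has four pooling layers
    sizes.append(pool_output(sizes[-1]))

flattened = sizes[-1] * sizes[-1] * 256   # the last convolution has 256 filters
print(sizes, flattened)
```

Under these assumptions the image shrinks 28, 14, 7, 3, 1, so the `Dense(50)` layer sees a vector of 256 values per image.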


### Step 5: Run the model on cloud compute machine

Now that all the basic steps are in place, we are ready to submit our code to the VM for execution.

In [0]:
from azureml.train.estimator import Estimator


est = Estimator(source_directory=script_folder,
                script_params=None,
                compute_target=compute_target,
                entry_script='train.py',
                conda_packages=['keras', 'scikit-learn'])
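
The estimator is created with `script_params=None`, so `train.py` keeps its hyperparameters hard-coded. `script_params` takes a dictionary of command-line arguments for the entry script, so one option is to have the script parse them with `argparse`. The flag names `--epochs` and `--batch-size` below are hypothetical choices and are not part of the original script.

```python
import argparse


def parse_args(argv=None):
    """Parse training hyperparameters passed on the command line."""
    parser = argparse.ArgumentParser(description="Train the MNIST model")
    parser.add_argument("--epochs", type=int, default=5)
    parser.add_argument("--batch-size", type=int, default=128)
    return parser.parse_args(argv)


# Simulate the arguments the script would receive
args = parse_args(["--epochs", "3"])
print(args.epochs, args.batch_size)
```

The estimator could then be given something like `script_params={'--epochs': 3, '--batch-size': 128}` in place of `None`.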

In [0]:
run = exp.submit(config=est)

In [0]:
from azureml.widgets import RunDetails
RunDetails(run).show()


In [0]:
run.wait_for_completion(show_output=True) # specify True for a verbose log

RunId: mnist_azure_1554986419_350ece78

Streaming azureml-logs/20_image_build_log.txt

2019/04/11 12:40:30 Using acb_vol_14ac6d7c-ddb3-4a75-ade9-1794e11ee1f8 as the home volume
2019/04/11 12:40:30 Creating Docker network: acb_default_network, driver: 'bridge'
2019/04/11 12:40:30 Successfully set up Docker network: acb_default_network
2019/04/11 12:40:30 Setting up Docker configuration...
2019/04/11 12:40:31 Successfully set up Docker configuration
2019/04/11 12:40:31 Logging in to registry: ailabs20acrfcrzzlxk.azurecr.io
2019/04/11 12:40:32 Successfully logged into ailabs20acrfcrzzlxk.azurecr.io
2019/04/11 12:40:32 Executing step ID: acb_step_0. Working directory: '', Network: 'acb_default_network'
2019/04/11 12:40:32 Obtaining source code and scanning for dependencies...
2019/04/11 12:40:33 Successfully obtained source code and scanned for dependencies
2019/04/11 12:40:33 Launching container with name: acb_step_0
Sending build context to Docker daemon  45.06kB

Step 1/14 : FROM mcr.mi


grpcio-1.14.1        | 1.0 MB    |            |   0%
grpcio-1.14.1        | 1.0 MB    | ########   |  80%
grpcio-1.14.1        | 1.0 MB    | ########## | 100%

openssl-1.0.2r       | 3.2 MB    |            |   0%
openssl-1.0.2r       | 3.2 MB    | #######6   |  76%
openssl-1.0.2r       | 3.2 MB    | #########5 |  96%
openssl-1.0.2r       | 3.2 MB    | ########## | 100%

absl-py-0.7.0        | 156 KB    |            |   0%
absl-py-0.7.0        | 156 KB    | ########## | 100%

hdf5-1.10.4          | 5.3 MB    |            |   0%
hdf5-1.10.4          | 5.3 MB    | #####7     |  57%
hdf5-1.10.4          | 5.3 MB    | #######8   |  79%
hdf5-1.10.4          | 5.3 MB    | #########7 |  97%
hdf5-1.10.4          | 5.3 MB    | ########## | 100%

tensorflow-1.13.1    | 4 KB      |            |   0%
tensorflow-1.13.1    | 4 KB      | ########## | 10

Preparing transaction: ...working... done
Verifying transaction: ...working... done
Executing transaction: ...working... done
Collecting azureml-defaults (from -r /azureml-setup/condaenv.531zq8t9.requirements.txt (line 1))
  Downloading https://files.pythonhosted.org/packages/4b/db/dd3480414f0ff182154d9b78053159815b6ae82a76336b0da5362952826e/azureml_defaults-1.0.23-py2.py3-none-any.whl
Collecting azureml-core==1.0.23.* (from azureml-defaults->-r /azureml-setup/condaenv.531zq8t9.requirements.txt (line 1))
  Downloading https://files.pythonhosted.org/packages/c0/44/422e114516ff2b4bafbaa3e53cd19e0bbbe36bf594981e5817610430e3a7/azureml_core-1.0.23-py2.py3-none-any.whl (811kB)
Collecting applicationinsights>=0.11.7 (from azureml-defaults->-r /azureml-setup/condaenv.531zq8t9.requirements.txt (line 1))
  Downloading https://files.pythonhosted.org/packages/de/bc/8e738cc3b74551c1a63889ff32c4456c22246ec89cfae3bf6a0a126a29c8/applicationinsights-0.11.8-py2.py3-none-any.whl (58kB)
Collecting pathspe

  Building wheel for pathspec (setup.py): started
  Building wheel for pathspec (setup.py): finished with status 'done'
  Stored in directory: /root/.cache/pip/wheels/45/cb/7e/ce6e6062c69446e39e328170524ca8213498bc66a74c6a210b
  Building wheel for pycparser (setup.py): started
  Building wheel for pycparser (setup.py): finished with status 'done'
  Stored in directory: /root/.cache/pip/wheels/f2/9a/90/de94f8556265ddc9d9c8b271b0f63e57b26fb1d67a45564511
Successfully built pathspec pycparser
Installing collected packages: pathspec, azure-common, contextlib2, python-dateutil, idna, chardet, urllib3, requests, PyJWT, asn1crypto, pycparser, cffi, cryptography, adal, oauthlib, requests-oauthlib, isodate, msrest, msrestazure, azure-mgmt-containerregistry, azure-mgmt-resource, jeepney, SecretStorage, azure-mgmt-storage, pytz, jsonpickle, pyopenssl, backports.weakref, backports.tempfile, azure-nspkg, azure-mgmt-nspkg, azure-mgmt-keyvault, docker-pycreds, websocket-client, docker, pyasn1, ndg-htt

OMP: Info #250: KMP_AFFINITY: pid 99 tid 133 thread 1 bound to OS proc set 1
OMP: Info #250: KMP_AFFINITY: pid 99 tid 151 thread 2 bound to OS proc set 2
OMP: Info #250: KMP_AFFINITY: pid 99 tid 152 thread 3 bound to OS proc set 3
OMP: Info #250: KMP_AFFINITY: pid 99 tid 153 thread 4 bound to OS proc set 4
OMP: Info #250: KMP_AFFINITY: pid 99 tid 154 thread 5 bound to OS proc set 5
OMP: Info #250: KMP_AFFINITY: pid 99 tid 155 thread 6 bound to OS proc set 0
OMP: Info #250: KMP_AFFINITY: pid 99 tid 134 thread 7 bound to OS proc set 1
OMP: Info #250: KMP_AFFINITY: pid 99 tid 157 thread 9 bound to OS proc set 3
OMP: Info #250: KMP_AFFINITY: pid 99 tid 158 thread 10 bound to OS proc set 4
OMP: Info #250: KMP_AFFINITY: pid 99 tid 160 thread 12 bound to OS proc set 0
OMP: Info #250: KMP_AFFINITY: pid 99 tid 159 thread 11 bound to OS proc set 5
OMP: Info #250: KMP_AFFINITY: pid 99 tid 156 thread 8 bound to OS proc set 2
 - 135s - loss: 0.2325 - acc: 0.9224 - val_loss: 0.0392 - val_acc: 0.9864

{'endTimeUtc': '2019-04-11T13:02:41.967398Z',
 'logFiles': {'azureml-logs/20_image_build_log.txt': 'https://ailabs20storagepajlhxfp.blob.core.windows.net/azureml/ExperimentRun/dcid.mnist_azure_1554986419_350ece78/azureml-logs/20_image_build_log.txt?sv=2018-03-28&sr=b&sig=0%2B9FFc1P4kVkA40KXcefh3LgQ4iZyFnJzOSRGubVw8c%3D&st=2019-04-11T12%3A52%3A43Z&se=2019-04-11T21%3A02%3A43Z&sp=r',
  'azureml-logs/55_batchai_execution.txt': 'https://ailabs20storagepajlhxfp.blob.core.windows.net/azureml/ExperimentRun/dcid.mnist_azure_1554986419_350ece78/azureml-logs/55_batchai_execution.txt?sv=2018-03-28&sr=b&sig=4gO3IPhMf03R8AuPXxzIsoZCxq883VTjZjOsai6hVmA%3D&st=2019-04-11T12%3A52%3A43Z&se=2019-04-11T21%3A02%3A43Z&sp=r',
  'azureml-logs/60_control_log.txt': 'https://ailabs20storagepajlhxfp.blob.core.windows.net/azureml/ExperimentRun/dcid.mnist_azure_1554986419_350ece78/azureml-logs/60_control_log.txt?sv=2018-03-28&sr=b&sig=aQCW0o8qZ9wQfqNozFJkUKxcGCBYS%2Ffk4rWN3Msl9rI%3D&st=2019-04-11T12%3A52%3A43Z&se=20
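
The streamed log carries the per-epoch Keras summary, such as the `loss`/`acc` line in the output above. If you want those numbers programmatically, a regular expression is enough for this format. The sketch below is our own helper and only handles `name: value` pairs like the ones shown.

```python
import re


def parse_epoch_line(line):
    """Extract 'name: value' metric pairs from a Keras verbose=2 epoch line."""
    pairs = re.findall(r"(\w+): (\d+\.\d+)", line)
    return {name: float(value) for name, value in pairs}


line = " - 135s - loss: 0.2325 - acc: 0.9224 - val_loss: 0.0392 - val_acc: 0.9864"
metrics = parse_epoch_line(line)
print(metrics)
```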

### Step 6: Evaluate the model
Finally, the model is trained, so let us evaluate it.

In [0]:
print(run.get_metrics())
print(run.get_file_names())

{'accuracy': 0.9931}
['azureml-logs/20_image_build_log.txt', 'azureml-logs/55_batchai_execution.txt', 'azureml-logs/60_control_log.txt', 'azureml-logs/80_driver_log.txt', 'azureml-logs/azureml.log', 'outputs/model.h5']
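
A quick sanity check on these two results can be scripted. The sketch below works on plain Python values copied from the output above rather than on a live `run` object, and the 0.99 threshold is an arbitrary example.

```python
metrics = {"accuracy": 0.9931}
file_names = [
    "azureml-logs/20_image_build_log.txt",
    "azureml-logs/55_batchai_execution.txt",
    "azureml-logs/60_control_log.txt",
    "azureml-logs/80_driver_log.txt",
    "azureml-logs/azureml.log",
    "outputs/model.h5",
]

# Files written under outputs/ are the artefacts produced by train.py
artefacts = [name for name in file_names if name.startswith("outputs/")]
model_saved = "outputs/model.h5" in artefacts
good_enough = metrics.get("accuracy", 0.0) >= 0.99   # arbitrary example threshold

print(artefacts, model_saved, good_enough)
```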


In [0]:
# Don't forget to delete all resources at the end
ws.delete(delete_dependent_resources=True)

### Next Steps

Now that the model is trained, we need to deploy it. The next tutorial covers how to deploy the saved model.