# Run an experiment on Azure Machine Learning Service
This notebook demonstrates how to train a model using Azure Machine Learning Service.
We will be working on a model that predicts the species of an iris flower from the sizes of different parts of the flower. This notebook is based on the open-source [iris flower dataset](https://archive.ics.uci.edu/ml/datasets/Iris).

## Define the model
In this step we're going to define a model that we want to train.
The input for the model is a vector with four features specifying the properties of each iris flower:
 
 - Sepal length
 - Sepal width
 - Petal length
 - Petal width
 
In order for the model to work we need to define its input as an `input_variable`. This variable should have the same size as the number of features that we want to use for making a prediction. In this case it should be 4, because we have 4 different features in our dataset.

The output layer of the model has three neurons, one for each species of flowers that we can predict.

In [1]:
from cntk import default_options, input_variable
from cntk.layers import Dense, Sequential
from cntk.ops import log_softmax, sigmoid

model = Sequential([
    Dense(4, activation=sigmoid),
    Dense(3, activation=log_softmax)
])

features = input_variable(4)
z = model(features)
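
To make the shapes concrete, the following is a minimal NumPy sketch of the same two-layer computation: four features in, four hidden units with a sigmoid, three log-softmax outputs. It is an illustration only, not CNTK's implementation. The weight names (`W1`, `b1`, `W2`, `b2`), the random initialisation, and the `forward` helper are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def log_softmax(x):
    # Subtract the row maximum first for numerical stability.
    shifted = x - x.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

# Illustrative random weights; CNTK chooses its own initial values.
W1, b1 = rng.normal(size=(4, 4)), np.zeros(4)  # hidden layer: 4 features -> 4 units
W2, b2 = rng.normal(size=(4, 3)), np.zeros(3)  # output layer: 4 units -> 3 species

def forward(x):
    hidden = sigmoid(x @ W1 + b1)
    return log_softmax(hidden @ W2 + b2)

# One example flower: sepal length, sepal width, petal length, petal width.
sample = np.array([[5.1, 3.5, 1.4, 0.2]])
log_probs = forward(sample)

print(log_probs.shape)  # (1, 3)
print(np.exp(log_probs).sum())  # approximately 1.0
```

Each row of the output holds one log-probability per species, which is why the output layer needs exactly three neurons.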

# Train the model and record it in the workspace
After we've created the model we can train it. We'll train the model and track it using the tracking logic provided by Azure Machine Learning Service.

## Loading the data
Before we can actually train the model, we need to load the data from disk. We will use pandas for this.
Pandas is a widely used Python library for working with data. It contains functions to load and process data
as well as a large number of functions to perform statistical operations.

In [2]:
import pandas as pd
import numpy as np

df_source = pd.read_csv('iris.csv', 
    names=['sepal_length', 'sepal_width','petal_length','petal_width', 'species'], 
    index_col=False)

X = df_source.iloc[:, :4].values
y = df_source['species'].values

Our model doesn't take strings as values. It needs floating point values to do its job. So we need to encode the strings into a floating point representation. We can do this by mapping the species names to a one-hot encoded version of the species.

In [3]:
label_mapping = {
    'Iris-setosa': 0,
    'Iris-versicolor': 1,
    'Iris-virginica': 2
}

def one_hot(index, length):
    result = np.zeros(length)
    result[index] = 1.
    
    return result
    
y = [one_hot(label_mapping[v], 3) for v in y]

CNTK is configured to use 32-bit floats by default. Right now, both the features and the one-hot encoded labels are stored as 64-bit floats. In order to help CNTK make sense of this, we will have to convert our data to 32-bit floats.
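
One possible way to do this conversion is sketched below on stand-in arrays. The names `X_demo`, `y_demo`, `X32`, and `y32` are illustrative and not part of the notebook. The sketch assumes the labels are a list of one-hot vectors, as produced by the `one_hot` function above.

```python
import numpy as np

# Stand-in data with the same layout as X and y in this notebook.
X_demo = np.array([[5.1, 3.5, 1.4, 0.2],
                   [7.0, 3.2, 4.7, 1.4]])  # float64 by default
y_demo = [np.array([1., 0., 0.]),
          np.array([0., 1., 0.])]  # list of float64 one-hot vectors

X32 = X_demo.astype(np.float32)
y32 = np.array(y_demo, dtype=np.float32)  # stacks the list into a (2, 3) array

print(X32.dtype, y32.dtype, y32.shape)  # float32 float32 (2, 3)
```

The same two calls could be applied to `X` and `y` before splitting the dataset, so that the arrays fed to the trainer already match the data type CNTK expects.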

One of the challenges with machine learning is that your model will try to memorize every bit of data it sees. This is called overfitting, and it is bad for your model because it can no longer predict the correct outcome for samples it hasn't seen before. We want our model to learn a set of rules that predict the correct class of flower.

In order to detect overfitting we need to split the dataset into a training set and a test set. This is done using a utility function from the scikit-learn Python package, which is included with a standard Anaconda installation.

In [4]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=0.2, stratify=y)
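
The `stratify=y` argument asks for a split that keeps the class proportions the same in both sets. The hand-rolled NumPy sketch below illustrates that idea. It is not scikit-learn's implementation, and the `stratified_split` helper is a hypothetical name introduced only for this example.

```python
import numpy as np

def stratified_split(labels, test_size, seed=0):
    """Return (train_idx, test_idx), splitting each class separately."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    train_idx, test_idx = [], []
    for cls in np.unique(labels):
        # Shuffle the indices of this class, then cut off the test share.
        idx = rng.permutation(np.flatnonzero(labels == cls))
        n_test = int(round(len(idx) * test_size))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return np.array(train_idx), np.array(test_idx)

# The iris dataset has 50 samples for each of the three species.
labels = np.repeat([0, 1, 2], 50)
train_idx, test_idx = stratified_split(labels, test_size=0.2)

print(len(train_idx), len(test_idx))  # 120 30
print(np.bincount(labels[test_idx]))  # [10 10 10]
```

Without stratification, a random 20% sample could by chance contain few flowers of one species, which would make the test metric less reliable.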

## Defining the target and loss
Let's define a target for our model and a loss function. The loss function measures the distance between the actual and predicted value. The loss is later used by the learner to optimize the parameters in the model.

In [5]:
from cntk.losses import cross_entropy_with_softmax
from cntk.metrics import classification_error
from cntk.learners import sgd
from cntk.train.trainer import Trainer

label = input_variable(3)

loss = cross_entropy_with_softmax(z, label)
error_rate = classification_error(z, label)

learner = sgd(z.parameters, 0.001)
trainer = Trainer(z, (loss, error_rate), [learner])
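
To show what the loss and the metric measure, here is a small NumPy sketch of a softmax cross-entropy and a classification error on two made-up predictions. The helper names (`softmax`, `softmax_cross_entropy`, `error_of`) and the example scores are assumptions for illustration. They mirror the usual textbook definitions rather than CNTK's internals.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def softmax_cross_entropy(z, label):
    # -sum(label * log(softmax(z))) for each sample
    return -(label * np.log(softmax(z))).sum(axis=-1)

def error_of(z, label):
    # 1.0 where the highest-scoring class differs from the true class
    return (z.argmax(axis=-1) != label.argmax(axis=-1)).astype(np.float32)

scores = np.array([[2.0, 0.5, 0.1],   # highest score on the true class
                   [0.2, 0.1, 1.5]])  # highest score on a wrong class
targets = np.array([[1., 0., 0.],
                    [0., 1., 0.]])

loss_values = softmax_cross_entropy(scores, targets)
error_values = error_of(scores, targets)
print(error_values)  # [0. 1.]

# The gradient of this loss with respect to the scores is softmax(scores) - targets.
# Moving the scores a small step against the gradient lowers the loss.
stepped = scores - 0.1 * (softmax(scores) - targets)
improved = softmax_cross_entropy(stepped, targets) < loss_values
print(improved.all())  # True
```

The learner applies the same idea to the model parameters instead of the scores: it follows the gradient of the loss downhill, scaled by the learning rate (0.001 in the cell above).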

# Train the model
We can train the model as normal. In order to track information about the model we need to connect to the Azure Machine Learning workspace that we've configured in the `config.json` file in the same folder as this notebook, and create an experiment in it. Please refer to chapter 7, Deploying models to production, to learn more about how to create this file.

In [6]:
from azureml.core import Workspace, Experiment

ws = Workspace.from_config()
experiment = Experiment(name='classify-flowers', workspace=ws)

Found the config file in: D:\projects\cntk-book\ch7\azure-ml-service\config.json


We can start tracking metrics by calling the `start_logging` method on the experiment. This starts a new run instance that has all the tracking logic that we need for our experiment. We can use `log` to track metrics. We can also use `upload_file` to store outputs generated by our run. Finally, we can register uploaded files as models in the model registry so we can deploy them to production.

In [7]:
import os 
from cntk import ModelFormat

os.makedirs('outputs', exist_ok=True)

with experiment.start_logging() as run:
    for _ in range(10):
        trainer.train_minibatch({ features: X_train, label: y_train })

        run.log('average_loss', trainer.previous_minibatch_loss_average)
        run.log('average_metric', trainer.previous_minibatch_evaluation_average)
        
    test_metric = trainer.test_minibatch( {features: X_test, label: y_test })
    
    run.log('test_metric', test_metric)
    
    z.save('outputs/model.onnx', ModelFormat.ONNX)
    run.upload_file('model.onnx', 'outputs/model.onnx')
    
    stored_model = run.register_model(model_name='classify_flowers', model_path='model.onnx')


## Deploy the model to production
Now that we have a trained model we can deploy it to production. We need to set up an image for this and deploy the image as a web service to the cloud. Let's start with the image.

In [21]:
from azureml.core.image import ContainerImage

image_config = ContainerImage.image_configuration(
    execution_script="score.py", 
    runtime="python", 
    conda_file="conda_env.yml")

Once we have the configuration for the image we can invoke `deploy_from_model` with a deployment configuration to deploy the model as an Azure Container Instance to the cloud.

In [23]:
from azureml.core.webservice import AciWebservice, Webservice

aciconfig = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1)

service = Webservice.deploy_from_model(workspace=ws,
                                       name='classify-flowers-svc',
                                       deployment_config=aciconfig,
                                       models=[stored_model],
                                       image_config=image_config)

Creating image
Image creation operation finished for image classify-flowers-svc-1:1, operation "Succeeded"
Creating service
