<img src="http://yacineyakoubi-blog.com/wp-content/uploads/2018/11/mlworkbench_icon-300x300.png"/>

# Working with Azure ML Service Workspace
Azure Machine Learning provides a cloud-based environment you can use to prep data, train, test, deploy, manage, and track machine learning models. 

<a href="https://github.com/Azure/MachineLearningNotebooks">This repository</a> contains example notebooks demonstrating the Azure Machine Learning Python SDK, which lets you build, train, deploy, and manage machine learning solutions on Azure. With the AML SDK you can use either local or cloud compute resources, while managing and maintaining the complete data science workflow from the cloud.

<img src="https://raw.githubusercontent.com/MicrosoftDocs/azure-docs/master/articles/machine-learning/service/media/concept-azure-machine-learning-architecture/workflow.png"/>

In this notebook, you will learn how to train several models and track their performance in a managed experiment.

## Library imports & configurations

In [None]:
# AZURE imports:
from azureml.core import Experiment, Workspace, Run
import azureml.core

# SKLEARN imports:
from sklearn.datasets import load_boston  # removed in scikit-learn 1.2; needs an older scikit-learn
import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23
from sklearn.linear_model import Ridge
from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Other imports:
import os
import json
import numpy as np
import pandas as pd

Check versions:

In [None]:
# Check core SDK version number
print("This notebook was created using SDK version 1.0.62;")
print("you are currently running version", azureml.core.VERSION)
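
The cell above only prints both versions side by side. If you want the notebook to test the version programmatically, comparing version strings directly is unreliable. The sketch below uses two hypothetical helpers, `parse_version` and `is_at_least`, that compare versions as integer tuples, assuming purely numeric dotted versions such as `1.0.62`. If the `packaging` library is installed, its `Version` class is a more robust alternative.

```python
# Hypothetical helpers: compare dotted version strings numerically.
# Assumes purely numeric components such as "1.0.62"; pre-release tags
# like "1.0rc1" are not handled.
def parse_version(version):
    """Turn '1.0.62' into the tuple (1, 0, 62) for numeric comparison."""
    return tuple(int(part) for part in version.split("."))


def is_at_least(current, reference):
    """True if `current` is the same as or newer than `reference`."""
    return parse_version(current) >= parse_version(reference)


# Plain string comparison gets this wrong: "1.0.9" > "1.0.62" character by character.
print("1.0.9" > "1.0.62")              # True (misleading)
print(is_at_least("1.0.9", "1.0.62"))  # False
```

In the notebook, `is_at_least(azureml.core.VERSION, "1.0.62")` would then give a yes/no answer, provided the installed version string is purely numeric.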

<br>
Next, load the Azure ML Service Workspace configuration from a JSON file. This avoids hard-coding workspace details and makes the notebook more generic.

In [None]:
# Change working directory (replace 'path' with the folder that contains ws_conf.json):
os.chdir('path')

# Load conf-file:
with open('ws_conf.json', 'r') as f:
    conf = json.load(f)
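
The notebook does not show `ws_conf.json` itself. The sketch below assumes it is a flat JSON object holding the three keys the next cell reads (`ws_name`, `subscription_id`, `resource_group`) and checks that they are all present before connecting. The helper `load_ws_conf` and the placeholder values are hypothetical. The SDK also provides `Workspace.from_config()`, which reads a `config.json` file in its own format, if you prefer the built-in route.

```python
import json

# Keys this notebook reads from ws_conf.json.
REQUIRED_KEYS = ("ws_name", "subscription_id", "resource_group")


def load_ws_conf(text):
    """Parse workspace configuration JSON and check the required keys exist."""
    parsed = json.loads(text)
    missing = [key for key in REQUIRED_KEYS if key not in parsed]
    if missing:
        raise KeyError("ws_conf.json is missing: " + ", ".join(missing))
    return parsed


# Hypothetical placeholder values; substitute your own workspace details.
example_text = """
{
    "ws_name": "my-workspace",
    "subscription_id": "00000000-0000-0000-0000-000000000000",
    "resource_group": "my-resource-group"
}
"""
example_conf = load_ws_conf(example_text)
print(example_conf["ws_name"])  # my-workspace
```

Failing early with a clear `KeyError` is easier to debug than a `Workspace.get` call that fails with a missing argument.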

<br>
Once the configuration is loaded, connect to the ML Workspace (this may pop up a new interactive window for login):

In [None]:
ws = Workspace.get(name = conf['ws_name'], 
                   subscription_id = conf['subscription_id'], 
                   resource_group = conf['resource_group'])

In [None]:
print('Workspace name: ' + ws.name, 
      'Azure region: ' + ws.location, 
      'Subscription id: ' + ws.subscription_id, 
      'Resource group: ' + ws.resource_group, sep='\n')

<br>

### Example data

Load the Boston house-prices example dataset from the `sklearn` built-in datasets:

In [None]:
boston = load_boston()
X = boston.data
y = boston.target
feature_names = boston.feature_names

print('Number of rows in data:', X.shape[0], '\n')
print('Feature names:',feature_names,sep='\n')

<br>
Get statistical summary of the data:

In [None]:
df = pd.DataFrame(X, columns = feature_names)
df.describe().round(2)

<br>
Next, split data into `train` and `test`:

In [None]:
X_train, X_test, y_train, y_test = \
train_test_split(X, y,
                 test_size = 0.197,  # about 20%: 100 of the 506 rows go to the test set
                 random_state = 0)

data = {"train": {"X": X_train, "y": y_train},
        "test": {"X": X_test, "y": y_test}}

# Get data sizes:
{k : v['X'].shape[0] for k,v in data.items()}
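
The unusual `test_size = 0.197` is presumably chosen to get exactly 100 test rows. As I understand scikit-learn's `train_test_split`, a float `test_size` is turned into a row count by rounding the test set up and the training set down. The hypothetical helper below reproduces that arithmetic so you can predict the split sizes without running the split.

```python
import math


def split_sizes(n_samples, test_size):
    """Train/test row counts for a float test_size: the test count is
    rounded up and the train count (1 - test_size) is rounded down."""
    n_test = math.ceil(test_size * n_samples)
    n_train = math.floor((1 - test_size) * n_samples)
    return n_train, n_test


# The Boston data has 506 rows.
print(split_sizes(506, 0.197))  # (406, 100)
print(split_sizes(506, 0.2))    # (404, 102)
```

With a plain `0.2`, the test set would have 102 rows instead of a round 100.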

<br>

### Prepare experiment
Here we define the objects that a run of the experiment will iterate over.

As an example, create a dictionary that contains several models:

In [None]:
estimators = {
    'linear_model' : Ridge(alpha = 0.2),
    'Random_Forest' : RandomForestRegressor(n_estimators = 10, random_state = 123),
    'kNN' : KNeighborsRegressor(n_neighbors = 5, leaf_size = 10, p = 1)
}
estimators
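
A dictionary of models works because every scikit-learn estimator exposes the same `fit`/`predict` interface, so one loop can train and score all of them without knowing which model it holds. The sketch below shows that pattern in isolation, using two hypothetical stand-in regressors and a hand-written mean squared error so it runs without scikit-learn.

```python
import statistics


# Hypothetical stand-in models with the same fit/predict interface as
# scikit-learn estimators; they only keep this sketch dependency-free.
class MeanRegressor:
    """Predicts the mean of the training targets for every row."""

    def fit(self, X, y):
        self.value_ = statistics.mean(y)
        return self

    def predict(self, X):
        return [self.value_ for _ in X]


class MedianRegressor:
    """Predicts the median of the training targets for every row."""

    def fit(self, X, y):
        self.value_ = statistics.median(y)
        return self

    def predict(self, X):
        return [self.value_ for _ in X]


def toy_mse(y_true, y_pred):
    """Mean squared error: the average of the squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def evaluate(models, X_train, y_train, X_test, y_test):
    """Fit every model in the dict and return {name: test MSE}."""
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = toy_mse(y_test, model.predict(X_test))
    return scores


toy_estimators = {"mean": MeanRegressor(), "median": MedianRegressor()}
toy_scores = evaluate(toy_estimators,
                      X_train=[[0], [1], [2], [3]], y_train=[1, 2, 3, 10],
                      X_test=[[4], [5]], y_test=[4, 4])
print(toy_scores)  # {'mean': 0.0, 'median': 2.25}
```

Adding another model to the experiment then only means adding another entry to the dictionary.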

<br>

### Run experiment

First, create a new experiment in your workspace, or connect to an existing one with the same name:

In [None]:
experiment = Experiment(workspace = ws, 
                        name = 'logging-api-test')
experiment

The table above includes a link to the experiment in the Azure portal.

Next, start a new run in the experiment:

In [None]:
# start logging for the run
run = experiment.start_logging()

# access the run id for use later
run_id = run.id

In [None]:
run

Now that the run is active, iterate over models and log their performance. 

Also, save the model objects for later use:

In [None]:
for algo, model in estimators.items():
    
    # Fit model:
    model.fit(data["train"]["X"], 
              data["train"]["y"])
    
    # Make predictions for testing data:
    preds = model.predict(data["test"]["X"])
    
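    # Logging (mean_squared_error expects y_true first, then y_pred):
    mse = mean_squared_error(data["test"]["y"], preds)
    print(algo, ', MSE: ', round(mse, 2), '\n', sep='')
    run.log('algorithm', algo)
    run.log('mse', mse)
    
    # Save the model to the outputs directory for capture
    # (joblib.dump does not create missing directories):
    os.makedirs('outputs', exist_ok=True)
    model_file_name = 'outputs/' + algo + '.pkl'

    joblib.dump(value = model, filename = model_file_name)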

    # upload the model file explicitly into artifacts :
    run.upload_file(name = model_file_name, 
                    path_or_stream = model_file_name)
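
Because the loop calls `run.log('algorithm', ...)` and `run.log('mse', ...)` once per model, each name is logged several times in the same run; as I understand the logging API, Azure ML stores such a metric as a series of values rather than overwriting it. The sketch below mimics that behaviour with a hypothetical in-memory class, `InMemoryRun`, and shows how to pair the two series back up by index. The model names and scores are placeholders, not results from this notebook.

```python
from collections import defaultdict


class InMemoryRun:
    """Hypothetical stand-in for an Azure ML Run: repeated log() calls under
    one name accumulate into a list instead of replacing the old value."""

    def __init__(self):
        self._metrics = defaultdict(list)

    def log(self, name, value):
        self._metrics[name].append(value)

    def get_metrics(self):
        # Return copies so callers can't mutate the internal state.
        return {name: list(values) for name, values in self._metrics.items()}


fake_run = InMemoryRun()
# Placeholder names and scores for illustration only.
for name, score in [("model_a", 30.0), ("model_b", 20.0), ("model_c", 40.0)]:
    fake_run.log("algorithm", name)
    fake_run.log("mse", score)

metrics = fake_run.get_metrics()
# Both series were appended in the same iteration, so index i lines up.
best_algo, best_mse = min(zip(metrics["algorithm"], metrics["mse"]),
                          key=lambda pair: pair[1])
print(best_algo, best_mse)  # model_b 20.0
```

On the real run, `run.get_metrics()` should return the logged series, and the same `zip` pairing recovers which MSE belongs to which algorithm.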


In [None]:
# End run:
run.complete()

We can check the current run by looking it up by id in the experiment's run history:

In [None]:
runs = list(experiment.get_runs())
[x for x in runs if x.id == run_id]
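
The list comprehension above returns a one-element list. If you want the run object itself, `next()` with a default returns the first match or `None`, and stops scanning as soon as it finds it. The sketch below uses a hypothetical `FakeRun` stand-in that carries only an `id` and a status; with the real SDK you could pass `experiment.get_runs()` directly and then call the `Run.get_status()` method on the result.

```python
from collections import namedtuple

# Hypothetical stand-in for Run objects; only the .id attribute matters here.
FakeRun = namedtuple("FakeRun", ["id", "status"])


def find_run(runs, run_id):
    """Return the first run whose id matches, or None if there is none."""
    return next((r for r in runs if r.id == run_id), None)


fake_runs = [FakeRun("run-a", "Completed"), FakeRun("run-b", "Running")]
print(find_run(fake_runs, "run-b"))  # FakeRun(id='run-b', status='Running')
print(find_run(fake_runs, "run-z"))  # None
```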

<br>
Now you can go to the Azure portal and view the run results.

<br>

<img src="https://cdn.thenewstack.io/media/2018/10/5ca8f804-az-ml-4-1024x393.png"/>