# IBM Watson Studio Deep Learning 
________
## Training Runs and Experiments with Notebooks and the `Watson Machine Learning python client`
Video Tutorial Series for Deep Learning 
_________
This notebook first shows you how to run an experiment with two training runs, each randomly sampling from a hyperparameter space. Then, we'll use hyperparameter optimization methods based on `RBFOpt` to find a solution more quickly.

## [I. Configure Watson Machine Learning](#wml)
* ### [Credentials and Authentication](#creds)

## [II. Write The Code](#code)
* ### [Training Definitions](#train_def)
* ### [Experiment Definition](#experiment)

## [III. Kick Off the Job](#kick)
* ### [Monitoring](#monitor)
________


<a name="cos"></a>
### Review the COS Data 

We see that the training data is in our bucket from earlier. We can load the data easily.

<a name="wml"></a>
## I. Configure Watson Machine Learning
We'll need to configure Watson Machine Learning in order to train, save, and deploy the model and experiment definition. This does not strictly need to be done first, but we prefer to set up everything we can ahead of time :)

* Associate Watson Machine Learning with your Project
* Update to the most recent version of the client
* Authenticate to the service


<a name="creds"></a>
**Associate the service** with your project and **add your credentials** now. 

In [4]:
# get your credentials
wml_credentials = {
  "url": "https://ibm-watson-ml.mybluemix.net",
  "username": "***",
  "password": "***",
  "instance_id": "***"
}
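
Hard-coding credentials in a notebook makes them easy to leak when the notebook is shared. As a minimal sketch, assuming you have exported the values as environment variables beforehand, the dictionary could be built like this. The `WML_*` variable names and the `load_wml_credentials` helper are hypothetical; they are not defined by the service or the client.

```python
import os


def load_wml_credentials(env=os.environ):
    """Build the credentials dict from environment variables.

    The WML_* names are illustrative assumptions; use whatever names you export.
    """
    required = ["WML_USERNAME", "WML_PASSWORD", "WML_INSTANCE_ID"]
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {
        # fall back to the service URL used in this notebook
        "url": env.get("WML_URL", "https://ibm-watson-ml.mybluemix.net"),
        "username": env["WML_USERNAME"],
        "password": env["WML_PASSWORD"],
        "instance_id": env["WML_INSTANCE_ID"],
    }
```

With the variables set, `wml_credentials = load_wml_credentials()` would produce the same structure as the cell above.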


**Update the client.** The package is available via `pip` on `PyPI`. 

The most recent docs can be found here: http://wml-api-pyclient.mybluemix.net/

In [2]:
!pip install --upgrade --quiet watson-machine-learning-client

#### Import `watson-machine-learning-client` and authenticate to service instance

In [5]:
from watson_machine_learning_client import WatsonMachineLearningAPIClient
client = WatsonMachineLearningAPIClient(wml_credentials)

# print the version
print(client.version)

1.0.109


<a name="code"></a>
## II. Write Your Code and Define Your Experiment

### PART ONE: Multiple Training Runs, Random Hypers
Fetch some example experiment code and configuration files. 
[IBM Watson Studio Deep Learning Coding Guidelines](https://dataplatform.ibm.com/docs/content/analyze-data/ml_dlaas_code_guidelines.html#reading-hyperparameters)

* Download the code from the GitHub repo. This code contains `.py` files with the model architecture and the helper functions we'll use for metrics
* Store your training definition
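
In Part One, each training run draws its own hyperparameters at random. The actual logic lives in `experiment.py` inside the zip, which this notebook does not show, so the following is only a sketch of what random sampling from a hyperparameter space can look like. The search space, the parameter names, and `sample_hyperparameters` are all hypothetical.

```python
import math
import random

# Hypothetical search space: lists are discrete choices,
# tuples are (low, high) ranges sampled log-uniformly.
SEARCH_SPACE = {
    "batch_size": [32, 64, 128],
    "dropout": [0.2, 0.3, 0.5],
    "learning_rate": (1e-4, 1e-2),
}


def sample_hyperparameters(space, rng=random):
    """Draw one random configuration from the search space."""
    sample = {}
    for name, spec in space.items():
        if isinstance(spec, tuple):
            low, high = spec
            # sample the exponent uniformly so every order of magnitude is equally likely
            exponent = rng.uniform(math.log10(low), math.log10(high))
            sample[name] = 10 ** exponent
        else:
            sample[name] = rng.choice(spec)
    return sample


rng = random.Random(42)  # seeded only to make the sketch reproducible
config = sample_hyperparameters(SEARCH_SPACE, rng)
print(config)
```

Because both training runs in this experiment use the same training definition, it is this kind of per-run sampling that makes their results differ.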

In [6]:
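# get the experiment zip from the project's storage and write it to the local filesystem
# (`project` is assumed to be the project_lib `Project` object created from your project token)
with open('experiment-random.zip', 'wb') as f:
    f.write(project.get_file('experiment.zip').read())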

In [7]:
# your COS credentials
cos_credentials = {
  "apikey": "***",
  "cos_hmac_keys": {
    "access_key_id": "***",
    "secret_access_key": "***"
  },
  "endpoints": "https://cos-service.bluemix.net/endpoints",
  "iam_apikey_description": "Auto generated apikey during resource-key operation for Instance - crn:v1:bluemix:public:cloud-object-storage:global:a/7d7bba8d3af690913ac4403733b01605:5cbade09-286a-47de-ab17-7fc51ba1a373::",
  "iam_apikey_name": "auto-generated-apikey-dd670e5e-3668-4fb3-804e-b21fe014b81e",
  "iam_role_crn": "crn:v1:bluemix:public:iam::::serviceRole:Manager",
  "iam_serviceid_crn": "***",
  "resource_instance_id": "***"
}


api_key = cos_credentials['apikey']
service_instance_id = cos_credentials['resource_instance_id']
auth_endpoint = 'https://iam.bluemix.net/oidc/token'
service_endpoint = 'https://s3-api.us-geo.objectstorage.softlayer.net'

# our bucket names
buckets = ['fashion-mnist-training-data-massachi-1', 'fashion-mnist-results-data-massachi-1']

In [8]:
import ibm_boto3
from ibm_botocore.client import Config
import os
import json
import warnings
import time
cos = ibm_boto3.resource('s3',
                         ibm_api_key_id=api_key,
                         ibm_service_instance_id=service_instance_id,
                         ibm_auth_endpoint=auth_endpoint,
                         config=Config(signature_version='oauth'),
                         endpoint_url=service_endpoint)
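
To confirm that the training data really is in the bucket, you can list its contents. This is a minimal sketch, assuming the `cos` object created above behaves like a boto3-style S3 resource; `list_bucket_keys` is a hypothetical helper name.

```python
def list_bucket_keys(cos_resource, bucket_name, limit=20):
    """Return up to `limit` (key, size) pairs from a bucket.

    Assumes `cos_resource` is a boto3-style S3 resource such as the
    `cos` object created in the previous cell.
    """
    keys = []
    for obj in cos_resource.Bucket(bucket_name).objects.all():
        keys.append((obj.key, obj.size))
        if len(keys) >= limit:
            break
    return keys


# Usage in the notebook (requires valid credentials and network access):
# for key, size in list_bucket_keys(cos, buckets[0]):
#     print(key, size)
```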

<a name="train_def"></a>
### Training definitions

Training runs are the organizing principle for using deep learning functions in IBM Watson Machine Learning. A typical scenario might consist of dozens to hundreds of training runs. Each run is defined individually and consists of two parts: the neural network, defined with one of the supported deep learning frameworks, and the configuration for how to run your training, including the number of GPUs and the location of the IBM Cloud Object Storage bucket that contains your data set.

More **docs on specifying the training definition** here: https://dataplatform.ibm.com/docs/content/analyze-data/ml_dlaas_working_with_new_models.html?context=analytics

**Coding guidelines** for DL Experiments: https://dataplatform.ibm.com/docs/content/analyze-data/ml_dlaas_code_guidelines.html

In [9]:
# the model metadata is used by the Watson Machine Learning Client to run your training to your specifications
model_definition_1_metadata = {
            client.repository.DefinitionMetaNames.NAME: "FASHION-MNIST-TEST",
            client.repository.DefinitionMetaNames.FRAMEWORK_NAME: "tensorflow",
            client.repository.DefinitionMetaNames.FRAMEWORK_VERSION: "1.5",
            client.repository.DefinitionMetaNames.RUNTIME_NAME: "python",
            client.repository.DefinitionMetaNames.RUNTIME_VERSION: "3.5",
            client.repository.DefinitionMetaNames.EXECUTION_COMMAND: "python experiment.py"
            }
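
The `EXECUTION_COMMAND` names a script that has to exist in the zip you upload. As a quick sanity check before storing the definition, the sketch below looks for that script in the archive. It assumes the command is run from the top level of the extracted archive, which this notebook does not confirm; `script_in_archive` is a hypothetical helper.

```python
import zipfile


def script_in_archive(zip_path, execution_command):
    """Check that the script named in the command sits at the top level of the zip.

    `zip_path` may be a filename or a file-like object.
    """
    # pick the first token ending in .py, e.g. "experiment.py" from "python experiment.py"
    scripts = [tok for tok in execution_command.split() if tok.endswith(".py")]
    if not scripts:
        return False
    with zipfile.ZipFile(zip_path) as archive:
        return scripts[0] in archive.namelist()


# Usage in the notebook:
# script_in_archive('experiment-random.zip', "python experiment.py")
```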

**Store the training definition**

We store the training definition for later use when we trigger training.

In [11]:
definition_1_details = client.repository.store_definition('experiment-random.zip', model_definition_1_metadata)

definition_1_url = client.repository.get_definition_url(definition_1_details)
definition_1_uid = client.repository.get_definition_uid(definition_1_details)
print(definition_1_url)

https://ibm-watson-ml.mybluemix.net/v3/ml_assets/training_definitions/226f65cd-dc0a-4af7-8215-f30a9f300320


**List your training definitions**

You can use the client to inspect training definitions that you've already saved.



In [12]:
client.repository.list_definitions()

------------------------------------  ------------------------------------  ------------------------  ----------
GUID                                  NAME                                  CREATED                   FRAMEWORK
125accbc-995d-4548-b921-8590c18a1ead  UdacityCollabRecSystem-Collaborators  2018-04-26T17:04:50.543Z  mllib
6c2d2cd5-f35e-4569-9379-8f15d85e7252  FASHION-MNIST                         2018-05-08T22:34:05.407Z  tensorflow
114ecf90-ef33-4d49-a46c-e844d1a16d0c  new_experiment_builder                2018-05-08T22:44:30.363Z  tensorflow
a1622bf6-36d9-4189-a3ed-5636d490c9fd  new_experiment                        2018-05-08T22:45:35.467Z  tensorflow
4d8fa5e5-6494-48af-8aae-7db3a49e10c2  EADef_NOHPO                           2018-05-08T22:50:59.476Z  tensorflow
74bea46a-88a3-41e2-8b29-215221445336  MASSACHI_NOPO                         2018-05-08T23:09:30.145Z  tensorflow
5a77c58c-8666-4daa-abeb-245c7acaf665  NEW_DEF                               2018-05-08T23:14:03.448Z  

<a name="experiment"></a>
### Experiment Definition
Define and save the experiment. 

There are a few configuration parameters we need to set. 

In [13]:
# show the experiment params
client.repository.ExperimentMetaNames.show()

--------------------------  ----  --------
META_PROP NAME              TYPE  REQUIRED
NAME                        str   Y
TAGS                        list  N
DESCRIPTION                 str   N
AUTHOR_NAME                 str   N
EVALUATION_METHOD           str   N
EVALUATION_METRICS          list  N
TRAINING_REFERENCES         list  Y
TRAINING_DATA_REFERENCE     dict  Y
TRAINING_RESULTS_REFERENCE  dict  Y
--------------------------  ----  --------


#### Experiment configuration dictionary
Create an experiment that will train models based on previously stored definitions.


First, we have `TRAINING_DATA_REFERENCE`, which specifies the location of the training data:

In [14]:
TRAINING_DATA_REFERENCE = {
                            "connection": {
                                "endpoint_url": service_endpoint,
                                "aws_access_key_id": cos_credentials['cos_hmac_keys']['access_key_id'],
                                "aws_secret_access_key": cos_credentials['cos_hmac_keys']['secret_access_key']
                            },
                            "source": {
                                "bucket": buckets[0],
                            },
                            "type": "s3"
}

Next, where to store the results. 

`TRAINING_RESULTS_REFERENCE` is the location of training results, including the logs, trained model, and any other outputs.

In [15]:
TRAINING_RESULTS_REFERENCE = {
                                "connection": {
                                    "endpoint_url": service_endpoint,
                                    "aws_access_key_id": cos_credentials['cos_hmac_keys']['access_key_id'],
                                    "aws_secret_access_key": cos_credentials['cos_hmac_keys']['secret_access_key']
                                },
                                "target": {
                                    "bucket": buckets[1],
                                },
                                "type": "s3"
}
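
The two reference dictionaries differ only in the bucket and in whether the bucket sits under `"source"` or `"target"`. A small helper can remove the duplication. This is a sketch; `make_cos_reference` is a hypothetical name and not part of the client.

```python
def make_cos_reference(endpoint_url, hmac_keys, bucket, role):
    """Build a data reference dict.

    `role` is "source" for training data or "target" for training results.
    `hmac_keys` is a dict with "access_key_id" and "secret_access_key".
    """
    if role not in ("source", "target"):
        raise ValueError("role must be 'source' or 'target'")
    return {
        "connection": {
            "endpoint_url": endpoint_url,
            "aws_access_key_id": hmac_keys["access_key_id"],
            "aws_secret_access_key": hmac_keys["secret_access_key"],
        },
        role: {"bucket": bucket},
        "type": "s3",
    }


# Usage in the notebook:
# TRAINING_DATA_REFERENCE = make_cos_reference(
#     service_endpoint, cos_credentials['cos_hmac_keys'], buckets[0], "source")
# TRAINING_RESULTS_REFERENCE = make_cos_reference(
#     service_endpoint, cos_credentials['cos_hmac_keys'], buckets[1], "target")
```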

**Configure your experiment.**

`TRAINING_REFERENCES` links previously stored training definitions and provides information about `compute_configuration` that will be used to run the training.

In [16]:
experiment_metadata = {
            client.repository.ExperimentMetaNames.NAME: "FASHION-MNIST-EXPERIMENT-RANDOM",
            client.repository.ExperimentMetaNames.DESCRIPTION: "Finding the best model for Fashion MNIST",
            client.repository.ExperimentMetaNames.EVALUATION_METHOD: "multiclass",
            client.repository.ExperimentMetaNames.EVALUATION_METRICS: ["val_acc"],
            client.repository.ExperimentMetaNames.TRAINING_DATA_REFERENCE: TRAINING_DATA_REFERENCE,
            client.repository.ExperimentMetaNames.TRAINING_RESULTS_REFERENCE: TRAINING_RESULTS_REFERENCE,
            client.repository.ExperimentMetaNames.TRAINING_REFERENCES: [
                        {
                            "name": "FASHION-MNIST-1",
                            "training_definition_url": definition_1_url,
                            "compute_configuration": {"name": "k80"}
                            
                            
                        },
                        {
                            "name": "FASHION-MNIST-2",
                            "training_definition_url": definition_1_url,
                            "compute_configuration": {"name": "k80"}
                            
                            
                        }
            
            ]
        }
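
Both entries in `TRAINING_REFERENCES` point at the same training definition and differ only in their name. If you want more than two runs, generating the list is less error-prone than copying entries by hand. A sketch, with `build_training_references` as a hypothetical helper:

```python
def build_training_references(definition_url, n_runs, prefix="FASHION-MNIST", compute="k80"):
    """Create `n_runs` training references that share one training definition."""
    return [
        {
            "name": "{}-{}".format(prefix, i),
            "training_definition_url": definition_url,
            "compute_configuration": {"name": compute},
        }
        for i in range(1, n_runs + 1)
    ]


# Usage in the notebook: equivalent to the two hand-written entries above
# training_references = build_training_references(definition_1_url, 2)
```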

Now that we've defined our experiment, we can **save the experiment** in our Watson Machine Learning Repository. 

In [17]:
experiment_details = client.repository.store_experiment(meta_props=experiment_metadata)

experiment_uid = client.repository.get_experiment_uid(experiment_details)
print(experiment_uid)

a5a54d95-25bb-4dec-afaf-b3c8da2ca1f3


In [18]:
# list all of our experiments
client.repository.list_experiments()

------------------------------------  -------------------------------  ------------------------
GUID                                  NAME                             CREATED
3575183e-b308-4445-b5e9-48679a90f45e  MNIST experiment                 2018-04-27T12:40:33.938Z
3da79b55-3aa8-4658-838e-254cf4ecab06  APCODE                           2018-05-09T23:35:28.077Z
93762f51-ef85-4dd4-b511-6c572be25d6c  FASHION-MNIST-EXPERIMENT-RANDOM  2018-05-10T00:03:12.213Z
a5a54d95-25bb-4dec-afaf-b3c8da2ca1f3  FASHION-MNIST-EXPERIMENT-RANDOM  2018-05-10T00:34:06.034Z
------------------------------------  -------------------------------  ------------------------


You can **update an experiment definition** by calling the method below.

In [19]:
# update an experiment if you'd like
updated_experiment_details = client.repository.update_experiment(experiment_uid, experiment_metadata)

You can also **get the details of** an experiment or **delete** it.

In [20]:
# delete -- not run
# client.repository.delete(experiment_uid)

# get details
details = client.repository.get_experiment_details(experiment_uid)

In [21]:
details

{'entity': {'settings': {'author': {},
   'description': 'Finding the best model for Fashion MNIST',
   'evaluation_definition': {'method': 'multiclass',
    'metrics': [{'name': 'val_acc'}]},
   'name': 'FASHION-MNIST-EXPERIMENT-RANDOM'},
  'training_data_reference': {'connection': {'aws_access_key_id': '***',
    'aws_secret_access_key': '***',
    'endpoint_url': 'https://s3-api.us-geo.objectstorage.softlayer.net'},
   'source': {'bucket': 'fashion-mnist-training-data-massachi-1'},
   'type': 's3'},
  'training_references': [{'command': 'python experiment.py',
    'compute_configuration': {'name': 'k80'},
    'name': 'FASHION-MNIST-1',
    'training_definition_url': 'https://ibm-watson-ml.mybluemix.net/v3/ml_assets/training_definitions/226f65cd-dc0a-4af7-8215-f30a9f300320'},
   {'command': 'python experiment.py',
    'compute_configuration': {'name': 'k80'},
    'name': 'FASHION-MNIST-2',
    'training_definit
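
The details returned by `get_experiment_details` include the `connection` block with the COS access keys, so it is worth masking them before displaying or sharing the output. A minimal sketch, where `mask_secrets` and its list of key markers are assumptions rather than part of the client:

```python
def mask_secrets(obj, markers=("secret", "access_key", "password", "apikey")):
    """Return a copy of a nested dict/list with sensitive values replaced by '***'.

    A value is masked when its key contains one of `markers` (case-insensitive).
    """
    if isinstance(obj, dict):
        return {
            key: "***" if any(m in str(key).lower() for m in markers)
            else mask_secrets(value, markers)
            for key, value in obj.items()
        }
    if isinstance(obj, list):
        return [mask_secrets(item, markers) for item in obj]
    return obj


# Usage in the notebook:
# mask_secrets(details)
```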

<a name="kick"></a>
## III. Kick Off the Job
Let's start the experiment run.

In [22]:
# start the run asynchronously so the call returns right away
experiment_run_details = client.experiments.run(experiment_uid, asynchronous=True)

In [23]:
# let's check the details 
experiment_run_details

{'entity': {'experiment_run_status': {'current_at': '2018-05-10T00:35:06Z',
   'current_iteration': 1,
   'state': 'pending',
   'submitted_at': '2018-05-10T00:35:06Z'},
  'training_statuses': []},
 'experiment': {'evaluation_definition': {'method': 'multiclass',
   'metrics': [{'name': 'val_acc'}]},
  'guid': 'a5a54d95-25bb-4dec-afaf-b3c8da2ca1f3',
  'url': '/v3/experiments/a5a54d95-25bb-4dec-afaf-b3c8da2ca1f3'},
 'metadata': {'created_at': '2018-05-10T00:35:06Z',
  'guid': 'c3b66a9e-42fb-4bf9-9029-b56990faaa8e',
  'modified_at': '2018-05-10T00:35:06Z',
  'url': '/v3/experiments/a5a54d95-25bb-4dec-afaf-b3c8da2ca1f3/runs/c3b66a9e-42fb-4bf9-9029-b56990faaa8e'}}

You can **list your experiment runs** and check their state.

In [24]:
client.experiments.list_runs()

------------------------------------  ------------------------------------  -------------------------------  ---------  --------------------
GUID (experiment)                     GUID (run)                            NAME (experiment)                STATE      CREATED
3575183e-b308-4445-b5e9-48679a90f45e  03533ec9-2846-4e5a-95ed-d2dd2d004fdd  MNIST experiment                 completed  2018-04-27T12:40:40Z
3da79b55-3aa8-4658-838e-254cf4ecab06  4bd67ea3-faba-45c9-817d-3431dba685a8  APCODE                           completed  2018-05-09T23:35:28Z
93762f51-ef85-4dd4-b511-6c572be25d6c  515f8134-548c-4ec9-9423-3817af68044f  FASHION-MNIST-EXPERIMENT-RANDOM  completed  2018-05-10T00:03:24Z
a5a54d95-25bb-4dec-afaf-b3c8da2ca1f3  c3b66a9e-42fb-4bf9-9029-b56990faaa8e  FASHION-MNIST-EXPERIMENT-RANDOM  pending    2018-05-10T00:35:06Z
------------------------------------  ------------------------------------  -------------------------------  ---------  --------------------


In [28]:
client.experiments.get_status('c3b66a9e-42fb-4bf9-9029-b56990faaa8e')

{'best_results': {'experiment_best_model': {'training_guid': 'training-mghlo4nig',
   'training_reference_name': 'FASHION-MNIST-2',
   'training_url': 'https://ibm-watson-ml.mybluemix.net/v3/ml_assets/training_definitions/226f65cd-dc0a-4af7-8215-f30a9f300320'},
  'training_reference_best_model': [{'training_guid': 'training-27h_TV7ig',
    'training_reference_name': 'FASHION-MNIST-1',
    'training_url': 'https://ibm-watson-ml.mybluemix.net/v3/ml_assets/training_definitions/226f65cd-dc0a-4af7-8215-f30a9f300320'},
   {'training_guid': 'training-mghlo4nig',
    'training_reference_name': 'FASHION-MNIST-2',
    'training_url': 'https://ibm-watson-ml.mybluemix.net/v3/ml_assets/training_definitions/226f65cd-dc0a-4af7-8215-f30a9f300320'}]},
 'current_at': '2018-05-10T00:35:06Z',
 'current_iteration': 1,
 'state': 'completed',
 'submitted_at': '2018-05-10T00:35:06Z'}
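
Because the run was started asynchronously, you have to poll for its state. The sketch below wraps that in a loop. It takes the status function as an argument so that it can be used with `client.experiments.get_status`, and it relies only on the `'state'` field shown in the output above. The states `"pending"` and `"completed"` appear in this notebook; the other terminal states in the default argument are assumptions, as is the helper name `wait_for_experiment_run`.

```python
import time


def wait_for_experiment_run(get_status, run_uid, poll_seconds=30, timeout_seconds=3600,
                            terminal_states=("completed", "error", "canceled")):
    """Poll `get_status(run_uid)['state']` until it reaches a terminal state.

    Only "completed" is confirmed by this notebook; "error" and "canceled"
    are assumed names for the other terminal states.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        state = get_status(run_uid)["state"]
        if state in terminal_states:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError("run {} still in state '{}'".format(run_uid, state))
        time.sleep(poll_seconds)


# Usage in the notebook, passing the run GUID:
# final_state = wait_for_experiment_run(client.experiments.get_status,
#                                       'c3b66a9e-42fb-4bf9-9029-b56990faaa8e')
```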

In [29]:
# get the experiment run uid
experiment_run_uid = client.experiments.get_run_uid(experiment_run_details)

# get the experiment details
experiment_details = client.experiments.get_details(experiment_uid)


Let's **check the status of the training runs in this experiment**.

In [30]:
# list all training runs
client.experiments.list_training_runs(experiment_run_uid)


------------------  ---------------  ---------  --------------------  --------------------  --------------------
GUID (training)     NAME             STATE      SUBMITTED             FINISHED              PERFORMANCE
training-27h_TV7ig  FASHION-MNIST-1  completed  2018-05-10T00:35:08Z  2018-05-10T00:38:38Z  test:acc=0.8392
                                                                                            test:loss=0.4454
                                                                                            test:val_acc=0.7992
                                                                                            test:val_loss=2.8896
training-mghlo4nig  FASHION-MNIST-2  completed  2018-05-10T00:35:08Z  2018-05-10T00:38:50Z  test:acc=0.8439
                                                                                            test:loss=0.4355
                                                                                            test:val_acc=0.8347
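
To compare runs programmatically, the `PERFORMANCE` strings can be parsed into numbers. The sketch below copies the values from the table above by hand, since this notebook does not show a client call that returns them in structured form. `parse_performance` is a hypothetical helper.

```python
def parse_performance(lines):
    """Turn strings like 'test:val_acc=0.8347' into {'val_acc': 0.8347}."""
    metrics = {}
    for line in lines:
        name, _, value = line.strip().partition("=")
        # drop the 'test:' prefix and keep only the metric name
        metrics[name.split(":", 1)[-1]] = float(value)
    return metrics


# Values copied from the table above (the second run's output is cut off after val_acc).
runs = {
    "FASHION-MNIST-1": ["test:acc=0.8392", "test:loss=0.4454",
                        "test:val_acc=0.7992", "test:val_loss=2.8896"],
    "FASHION-MNIST-2": ["test:acc=0.8439", "test:loss=0.4355",
                        "test:val_acc=0.8347"],
}

parsed = {name: parse_performance(lines) for name, lines in runs.items()}
best = max(parsed, key=lambda name: parsed[name]["val_acc"])
print(best)  # FASHION-MNIST-2
```

This agrees with the `experiment_best_model` entry reported by `get_status`, which also names `FASHION-MNIST-2`.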
            