# IX - Keras+TensorFlow Single GPU Training
In the previous notebook we used Microsoft's Cognitive Toolkit (CNTK) as our deep learning framework. Batch Shipyard, however, is not limited to CNTK and can be used with any deep learning framework. A very popular combination is TensorFlow and Keras. Keras is an easy-to-use, high-level API that runs on top of TensorFlow or Theano and simplifies building artificial neural networks. In this notebook we will briefly demonstrate how to train a convolutional neural network (CNN) using Keras on Batch Shipyard.

* [Setup](#section1)
* [Define our model](#section2)
* [Configure and create pool](#section3)
* [Configure job](#section4)
* [Submit job](#section5)
* [Delete job](#section6)

<a id='section1'></a>

## Setup

Create a simple alias for Batch Shipyard

In [1]:
%alias shipyard SHIPYARD_CONFIGDIR=config python $HOME/batch-shipyard/shipyard.py %l

Check that everything is working

In [2]:
shipyard --version

shipyard.py, version 2.8.0b1


Read in the account information we saved earlier

In [3]:
import json

def read_json(filename):
    with open(filename, 'r') as infile:
        return json.load(infile)
    
account_info = read_json('account_information.json')

STORAGE_ALIAS = account_info['STORAGE_ALIAS']

<a id='section2'></a>
## Define Our Model
The file below contains a simple CNN written in Keras. It loads the CIFAR-10 data, trains the model for a number of epochs, and evaluates it on the test set after each epoch.

In [4]:
%%writefile cifar10_cnn.py
'''Train a simple deep CNN on the CIFAR10 small images dataset.
'''

from __future__ import print_function
import keras
from keras.datasets import cifar10
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Conv2D, MaxPooling2D

batch_size = 32
num_classes = 10
epochs = 20
data_augmentation = True

# The data, shuffled and split between train and test sets:
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

# Convert class vectors to binary class matrices.
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

model = Sequential()

model.add(Conv2D(32, (3, 3), padding='same',
                 input_shape=x_train.shape[1:]))
model.add(Activation('relu'))
model.add(Conv2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Conv2D(64, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(Conv2D(64, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Flatten())
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes))
model.add(Activation('softmax'))

# initiate RMSprop optimizer
opt = keras.optimizers.rmsprop(lr=0.0001, decay=1e-6)

# Let's train the model using RMSprop
model.compile(loss='categorical_crossentropy',
              optimizer=opt,
              metrics=['accuracy'])

x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255

if not data_augmentation:
    print('Not using data augmentation.')
    model.fit(x_train, y_train,
              batch_size=batch_size,
              epochs=epochs,
              validation_data=(x_test, y_test),
              shuffle=True)
else:
    print('Using real-time data augmentation.')
    # This will do preprocessing and realtime data augmentation:
    datagen = ImageDataGenerator(
        featurewise_center=False,  # set input mean to 0 over the dataset
        samplewise_center=False,  # set each sample mean to 0
        featurewise_std_normalization=False,  # divide inputs by std of the dataset
        samplewise_std_normalization=False,  # divide each input by its std
        zca_whitening=False,  # apply ZCA whitening
        rotation_range=0,  # randomly rotate images in the range (degrees, 0 to 180)
        width_shift_range=0.1,  # randomly shift images horizontally (fraction of total width)
        height_shift_range=0.1,  # randomly shift images vertically (fraction of total height)
        horizontal_flip=True,  # randomly flip images horizontally
        vertical_flip=False)  # randomly flip images vertically

    # Compute quantities required for feature-wise normalization
    # (std, mean, and principal components if ZCA whitening is applied).
    datagen.fit(x_train)

    # Fit the model on the batches generated by datagen.flow().
    model.fit_generator(datagen.flow(x_train, y_train,
                                     batch_size=batch_size),
                        steps_per_epoch=x_train.shape[0] // batch_size,
                        epochs=epochs,
                        validation_data=(x_test, y_test))

Writing cifar10_cnn.py
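
To make the architecture easier to follow, the sketch below traces the tensor shapes and parameter counts of the model by hand, without importing Keras. It assumes stride-1 3x3 convolutions, 2x2 max pooling that rounds down, and the CIFAR-10 input shape of 32x32x3 reported in the training log; the helper names `conv_out` and `trace_cifar10_cnn` are hypothetical and not part of Keras. Dropout and activation layers are skipped because they neither change shapes nor add parameters.

```python
from __future__ import print_function


def conv_out(size, kernel, padding):
    """Spatial output size of a stride-1 convolution."""
    return size if padding == 'same' else size - kernel + 1


def trace_cifar10_cnn(height=32, width=32, channels=3, num_classes=10):
    """Return (flattened feature count, total trainable parameters)."""
    params = 0
    # (filters, padding) for the four Conv2D layers, in order
    conv_layers = [(32, 'same'), (32, 'valid'), (64, 'same'), (64, 'valid')]
    for index, (filters, padding) in enumerate(conv_layers):
        params += 3 * 3 * channels * filters + filters  # weights + biases
        height = conv_out(height, 3, padding)
        width = conv_out(width, 3, padding)
        channels = filters
        if index % 2 == 1:
            # a 2x2 max-pooling layer follows every second convolution
            height //= 2
            width //= 2
    flat = height * width * channels
    params += flat * 512 + 512                  # Dense(512)
    params += 512 * num_classes + num_classes   # Dense(num_classes)
    return flat, params


flat, params = trace_cifar10_cnn()
steps_per_epoch = 50000 // 32  # x_train.shape[0] // batch_size
print('flattened features:', flat)
print('trainable parameters:', params)
print('steps per epoch:', steps_per_epoch)
```

With the default arguments this gives 2304 flattened features and 1,250,858 trainable parameters, and 1562 steps per epoch for 50,000 training images at a batch size of 32.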


<a id='section3'></a>
## Configure and Create Pool
Here we use a pool with a single node, purely as a demonstration. On pool creation, Batch Shipyard uploads the model file written above to the blob container **input**; the node then copies the contents of that container to `$AZ_BATCH_NODE_SHARED_DIR/code`.

In [5]:
IMAGE_NAME = "masalvar/keras" # Keras image

In [6]:
INPUT_CONTAINER = 'input'
UPLOAD_DIR = 'dist_upload'

!rm -rf $UPLOAD_DIR
!mkdir -p $UPLOAD_DIR
!mv cifar10_cnn.py $UPLOAD_DIR
!ls -alF $UPLOAD_DIR

total 12
drwxr-xr-x  2 nbuser nbuser 4096 Jun 21 11:07 ./
drwx------ 15 nbuser nbuser 4096 Jun 21 11:07 ../
-rw-r--r--  1 nbuser nbuser 3445 Jun 21 11:07 cifar10_cnn.py


In [7]:
config = {
    "batch_shipyard": {
        "storage_account_settings": STORAGE_ALIAS
    },
    "global_resources": {
        "docker_images": [
            IMAGE_NAME
        ],
        "files": [
            {
                "source": {
                    "path": UPLOAD_DIR
                },
                "destination": {
                    "storage_account_settings": STORAGE_ALIAS,
                    "data_transfer": {
                        "container": INPUT_CONTAINER
                    }
                }
            }
        ]
    }
}

In [8]:
POOL_ID = 'gpupool-keras'

pool = {
    "pool_specification": {
        "id": POOL_ID,
        "vm_size": "STANDARD_NC6",
        "vm_count": {
            "dedicated": 1
        },
        "publisher": "Canonical",
        "offer": "UbuntuServer",
        "sku": "16.04-LTS",
        "ssh": {
            "username": "docker"
        },
        "reboot_on_start_task_failed": False,
        "block_until_all_global_resources_loaded": True,
        "input_data": {
            "azure_storage": [
                {
                    "storage_account_settings": STORAGE_ALIAS,
                    "container": INPUT_CONTAINER,
                    "destination": "$AZ_BATCH_NODE_SHARED_DIR/code"
                }
            ]
        },
        "transfer_files_on_pool_creation": True,
    }
}

In [9]:
import json
import os

def write_json_to_file(json_dict, filename):
    """ Simple function to write JSON dictionaries to files
    """
    with open(filename, 'w') as outfile:
        json.dump(json_dict, outfile)
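
The helper above assumes that the `config` directory already exists. As a hedged variation, the sketch below creates the parent directory when it is missing and reads the file back to confirm the round trip. The name `write_json_creating_dir` is hypothetical, and the example writes to a temporary directory so that it does not touch the real `config` folder.

```python
import json
import os
import tempfile


def write_json_creating_dir(json_dict, filename):
    """Write a dictionary as JSON, creating the parent directory if needed."""
    directory = os.path.dirname(filename)
    if directory and not os.path.isdir(directory):
        os.makedirs(directory)
    with open(filename, 'w') as outfile:
        json.dump(json_dict, outfile, indent=4, sort_keys=True)


# Demonstration in a throwaway directory (not the real config folder)
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, 'config', 'pool.json')
write_json_creating_dir({"pool_specification": {"id": "gpupool-keras"}}, demo_path)

with open(demo_path) as infile:
    round_trip = json.load(infile)
print(round_trip)
```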

In [10]:
write_json_to_file(config, os.path.join('config', 'config.json'))
write_json_to_file(pool, os.path.join('config', 'pool.json'))
print(json.dumps(config, indent=4, sort_keys=True))
print(json.dumps(pool, indent=4, sort_keys=True))

{
    "batch_shipyard": {
        "storage_account_settings": "mystorageaccount"
    }, 
    "global_resources": {
        "docker_images": [
            "masalvar/keras"
        ], 
        "files": [
            {
                "destination": {
                    "data_transfer": {
                        "container": "input"
                    }, 
                    "storage_account_settings": "mystorageaccount"
                }, 
                "source": {
                    "path": "dist_upload"
                }
            }
        ]
    }
}
{
    "pool_specification": {
        "block_until_all_global_resources_loaded": true, 
        "id": "gpupool-keras", 
        "input_data": {
            "azure_storage": [
                {
                    "container": "input", 
                    "destination": "$AZ_BATCH_NODE_SHARED_DIR/code", 
                    "storage_account_settings": "mystorageaccount"
                }
            ]
        }, 
        "offer": "U

Allocate the pool. This step can take several minutes, so please be patient.

In [11]:
shipyard pool add -y

2017-06-21 11:08:00,217 INFO - creating table: shipyardregistry
2017-06-21 11:08:00,487 INFO - creating container: shipyardremotefs
2017-06-21 11:08:00,710 INFO - creating table: shipyardgr
2017-06-21 11:08:00,755 INFO - creating table: shipyarddht
2017-06-21 11:08:00,800 INFO - creating table: shipyardimages
2017-06-21 11:08:00,846 INFO - creating queue: shipyardgr-batche2914cdbba-gpupool-keras
2017-06-21 11:08:00,957 INFO - creating container: shipyardtor-batche2914cdbba-gpupool-keras
2017-06-21 11:08:01,005 INFO - creating table: shipyardtorrentinfo
2017-06-21 11:08:01,049 INFO - creating container: shipyardrf-batche2914cdbba-gpupool-keras
2017-06-21 11:08:01,094 DEBUG - clearing table (pk=batche2914cdbba$gpupool-keras): shipyardregistry
2017-06-21 11:08:01,159 DEBUG - clearing table (pk=batche2914cdbba$gpupool-keras): shipyardgr
2017-06-21 11:08:01,202 DEBUG - clearing table (pk=batche2914cdbba$gpupool-keras): shipyardperf
2017-06-21 11:08:01,246 DEBUG - clearing table (pk=batche29

<a id='section4'></a>
## Configure Job
As before, the dictionary below defines the job we will execute.

In [12]:
TASK_ID = 'run_cifar10' # This should be changed per task

JOB_ID = 'keras-training-job'

COMMAND = 'bash -c "python -u $AZ_BATCH_NODE_SHARED_DIR/code/cifar10_cnn.py"'

jobs = {
    "job_specifications": [
        {
            "id": JOB_ID,
            "tasks": [
                {
                    "id": TASK_ID,
                    "image": IMAGE_NAME,
                    "remove_container_after_exit": True,
                    "command": COMMAND,
                    "gpu": True,
                }
            ],
        }
    ]
}

Write the jobs configuration to the `jobs.json` file:

In [13]:
write_json_to_file(jobs, os.path.join('config', 'jobs.json'))
print(json.dumps(jobs, indent=4, sort_keys=True))

{
    "job_specifications": [
        {
            "id": "keras-training-job", 
            "tasks": [
                {
                    "command": "bash -c \"python -u $AZ_BATCH_NODE_SHARED_DIR/code/cifar10_cnn.py\"", 
                    "gpu": true, 
                    "id": "run_cifar10", 
                    "image": "masalvar/keras", 
                    "remove_container_after_exit": true
                }
            ]
        }
    ]
}
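
Because each task needs its own ID, it can be convenient to build the task dictionaries with a small helper. The sketch below is one possible way to do that; it reuses only the keys shown in the configuration above, and the function names `make_task` and `make_jobs` are hypothetical rather than part of Batch Shipyard.

```python
import json


def make_task(task_id, script_name, image="masalvar/keras"):
    """Build one task entry that runs a script from the shared code directory."""
    command = 'bash -c "python -u $AZ_BATCH_NODE_SHARED_DIR/code/{}"'.format(script_name)
    return {
        "id": task_id,
        "image": image,
        "remove_container_after_exit": True,
        "command": command,
        "gpu": True,
    }


def make_jobs(job_id, tasks):
    """Wrap a list of task entries in a job specification."""
    task_ids = [task["id"] for task in tasks]
    if len(task_ids) != len(set(task_ids)):
        raise ValueError("task IDs must be unique within a job")
    return {"job_specifications": [{"id": job_id, "tasks": tasks}]}


example_jobs = make_jobs('keras-training-job',
                         [make_task('run_cifar10', 'cifar10_cnn.py')])
print(json.dumps(example_jobs, indent=4, sort_keys=True))
```

Called with the IDs used in this notebook, the helper reproduces the same structure as the `jobs` dictionary defined earlier.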


<a id='section5'></a>

## Submit Job
Check that the pool is healthy before submitting the job.

In [14]:
shipyard pool listnodes

2017-06-18 12:29:29,592 DEBUG - listing nodes for pool gpupool-keras
2017-06-18 12:29:29,860 INFO - node_id=tvm-1392786932_1-20170618t122009z [state=ComputeNodeState.idle start_task_exit_code=0 scheduling_state=SchedulingState.enabled ip_address=10.0.0.4 vm_size=standard_nc6 dedicated=True total_tasks_run=0 running_tasks_count=0 total_tasks_succeeded=0]
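
If you want to check the node state programmatically rather than by eye, one option is to parse the `key=value` pairs from a log line. The sketch below assumes the output format shown above, which may differ between Batch Shipyard versions; the helper names `parse_node_line` and `node_is_ready` are hypothetical.

```python
import re

# Sample line copied from the `shipyard pool listnodes` output above
line = ("2017-06-18 12:29:29,860 INFO - node_id=tvm-1392786932_1-20170618t122009z "
        "[state=ComputeNodeState.idle start_task_exit_code=0 "
        "scheduling_state=SchedulingState.enabled ip_address=10.0.0.4 "
        "vm_size=standard_nc6 dedicated=True total_tasks_run=0 "
        "running_tasks_count=0 total_tasks_succeeded=0]")


def parse_node_line(text):
    """Collect key=value pairs from one listnodes log line into a dict."""
    return dict(re.findall(r'(\w+)=([^\s\]]+)', text))


def node_is_ready(text):
    """True when the node is idle and its start task exited with code 0."""
    fields = parse_node_line(text)
    return (fields.get('state') == 'ComputeNodeState.idle'
            and fields.get('start_task_exit_code') == '0')


fields = parse_node_line(line)
print(fields['node_id'], fields['state'], node_is_ready(line))
```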


Now that we have confirmed the node is idle and ready, we can submit the job with the command below. The `--tail stdout.txt` switch streams the task's `stdout.txt` from the node.

In [15]:
shipyard jobs add --tail stdout.txt

2017-06-18 12:29:33,183 INFO - Adding job keras-training-job to pool gpupool-keras
2017-06-18 12:29:33,654 INFO - uploading file /tmp/tmpdaw6Ni as u'shipyardtaskrf-keras-training-job/run_cifar10.shipyard.envlist'
2017-06-18 12:29:33,692 DEBUG - submitting 1 tasks (0 -> 0) to job keras-training-job
2017-06-18 12:29:33,989 INFO - submitted all 1 tasks to job keras-training-job
2017-06-18 12:29:33,989 DEBUG - attempting to stream file stdout.txt from job=keras-training-job task=run_cifar10
Downloading data from http://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
x_train shape: (50000, 32, 32, 3)
50000 train samples
10000 test samples
Using real-time data augmentation.
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20



If something goes wrong, you can run the following command to stream the task's `stderr.txt` output.

In [14]:
shipyard data stream --filespec $JOB_ID,$TASK_ID,stderr.txt

2017-06-21 11:27:37,349 DEBUG - attempting to stream file stderr.txt from job=keras-training-job task=run_cifar10
Traceback (most recent call last):
  File "/home/nbuser/batch-shipyard/shipyard.py", line 1530, in <module>
    cli()
  File "/home/nbuser/anaconda2_410/lib/python2.7/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/home/nbuser/anaconda2_410/lib/python2.7/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/home/nbuser/anaconda2_410/lib/python2.7/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/nbuser/anaconda2_410/lib/python2.7/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/nbuser/anaconda2_410/lib/python2.7/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/nbuser/anaconda2_410/l

<a id='section6'></a>

## Delete job

To delete the job, use the command below. Be aware that this also removes all files created by the job and its tasks.

In [15]:
shipyard jobs del -y --termtasks --wait

2017-06-21 11:27:57,213 INFO - Deleting job: keras-training-job
2017-06-21 11:27:57,213 DEBUG - disabling job keras-training-job first due to task termination
2017-06-21 11:27:57,454 ERROR - keras-training-job job does not exist

Finally, delete the pool:


In [16]:
shipyard pool del -y --wait

2017-06-21 11:28:00,363 INFO - Deleting pool: gpupool-keras
2017-06-21 11:28:00,649 DEBUG - clearing table (pk=batche2914cdbba$gpupool-keras): shipyardregistry
2017-06-21 11:28:00,930 DEBUG - clearing table (pk=batche2914cdbba$gpupool-keras): shipyardgr
2017-06-21 11:28:01,023 DEBUG - clearing table (pk=batche2914cdbba$gpupool-keras): shipyardperf
2017-06-21 11:28:01,068 DEBUG - clearing table (pk=batche2914cdbba$gpupool-keras): shipyarddht
2017-06-21 11:28:01,114 DEBUG - clearing table (pk=batche2914cdbba$gpupool-keras): shipyardimages
2017-06-21 11:28:01,208 DEBUG - clearing table (pk=batche2914cdbba$gpupool-keras): shipyardtorrentinfo
2017-06-21 11:28:01,256 DEBUG - deleting queue: shipyardgr-batche2914cdbba-gpupool-keras
2017-06-21 11:28:01,503 DEBUG - deleting container: shipyardtor-batche2914cdbba-gpupool-keras
2017-06-21 11:28:01,745 DEBUG - deleting container: shipyardrf-batche2914cdbba-gpupool-keras
2017-06-21 11:28:01,795 DEBUG - waiting for pool gpupool-keras to delete


## Next Steps
You can proceed to the [Notebook: Clean Up](05_Clean_Up.ipynb) if you are done for now, or proceed to one of the following additional Notebooks:
* [Notebook: Automatic Model Selection](06_Advanced_Auto_Model_Selection.ipynb)
* [Notebook: Tensorboard Visualization](07_Advanced_Tensorboard.ipynb) - note this requires running this notebook on your own machine
* [Notebook: Parallel and Distributed](08_Advanced_Parallel_and_Distributed.ipynb)