# Training in the cloud

So far, we've looked at the `Donkey` library (the data produced, tools, training and the CNN itself). We saw that the training script is a simple Python application, [donkey2.py](https://github.com/wroscoe/donkey/blob/master/donkeycar/templates/donkey2.py), that creates and trains the CNN using the [Keras](https://keras.io) framework. Some of the hyperparameters are defined in a config file, [config_defaults.py](https://github.com/wroscoe/donkey/blob/master/donkeycar/templates/config_defaults.py), and are easily customized. The end result of a training run is a trained Keras/TensorFlow model, which can be used to drive the car.

But why does the model have to be created and trained using the `Donkey` library and the `Vehicle` abstraction? As long as the end result is the same (a Keras/TensorFlow model), it shouldn't matter where or how it was trained.

In this chapter, we'll look into creating and training the CNN using `SageMaker` libraries instead of the Keras/TensorFlow libraries used in `Donkey` (see [Conda dependencies](https://github.com/wroscoe/donkey/blob/master/envs/ubuntu.yml)). It turns out that this is not much different from the `cnn.py` file we created in the last chapter. Moving the training to the cloud makes it easier to focus on improving things such as:
- Training performance, e.g. newer framework versions, training acceleration (GPU(s)), training distribution (cluster of training instances), etc.
- Training visualization (from a learning perspective) using e.g. TensorBoard
- Network tuning
- Whatever you can come up with =)

## Porting Keras

The Keras library has been adopted by TensorFlow in newer versions of that library. By porting our code from Keras to the TensorFlow Keras API, we'll have more flexibility and one less dependency. Yay!

The first step is to use the code from `cnn.py` in this notebook. SageMaker Jupyter notebooks run a newer version of the TensorFlow library (see https://docs.aws.amazon.com/sagemaker/latest/dg/supported-versions.html), which gives us more options, but can also lead to incompatibility issues with the version running on the car.

FYI, print the TensorFlow version:

In [None]:
import tensorflow
print(tensorflow.__version__)
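
Once the version in the notebook is known, it can be compared with the one installed on the car. The snippet below is a minimal sketch of such a check. The helper names and the version strings are hypothetical, and matching major and minor numbers is only a rough heuristic, not a guarantee that a saved model will load on the car.

```python
def major_minor(version):
    """Return (major, minor) as integers from a version string such as '1.4.1'."""
    parts = version.split('.')
    return int(parts[0]), int(parts[1])

def same_release_line(notebook_version, car_version):
    """True when both versions share the same major and minor number."""
    return major_minor(notebook_version) == major_minor(car_version)

# Hypothetical version numbers, for illustration only.
print(same_release_line('1.4.1', '1.4.0'))
print(same_release_line('1.8.0', '1.4.0'))
```

In the notebook, `tensorflow.__version__` would be passed as the first argument, and the second would be whatever the same command prints on the car.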

---

First, we need to port the original Keras library to the Keras API implemented by newer versions of TensorFlow, see:
- https://www.tensorflow.org/api_docs/python/tf/keras

The Keras API is imported like:
```python
import tensorflow.python.keras
```

To port the code, just add `tensorflow.python.` to any import using `keras`. We'll also clean up some of the dependencies that weren't used, and rename the base class just for fun.
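
The renaming rule is mechanical, so it can be illustrated with a few lines of Python. This is only a sketch of the rule described above: `port_import` is a hypothetical helper, not part of Donkey or TensorFlow, and it only handles imports that start with `keras`.

```python
import re

# Matches "import keras..." or "from keras... import ..." at the start of a line.
_KERAS_IMPORT = re.compile(r'^(\s*)(from|import)\s+keras\b')

def port_import(line):
    """Prefix a plain Keras import with 'tensorflow.python.'; leave other lines alone."""
    return _KERAS_IMPORT.sub(r'\1\2 tensorflow.python.keras', line)

print(port_import('from keras.models import load_model'))
print(port_import('import numpy as np'))
```

The first line is rewritten to import from `tensorflow.python.keras.models`, while the second is returned unchanged because it does not import from `keras`.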

In [None]:
from tensorflow.python.keras.models import load_model
from tensorflow.python.keras.callbacks import ModelCheckpoint, EarlyStopping

class MyEstimator():
    '''
    The Estimator creates and trains the model.
    
    Renamed from MyPilot.
    '''
    def __init__(self):
        self.model = default_categorical()

    def train(self, train_gen, val_gen, saved_model_path,
              epochs=100, steps=100, train_split=0.8, verbose=1,
              min_delta=.0005, patience=5, use_early_stop=True):

        save_best = ModelCheckpoint(saved_model_path, 
                                    monitor='val_loss', 
                                    verbose=verbose, 
                                    save_best_only=True, 
                                    mode='min')
        
        early_stop = EarlyStopping(monitor='val_loss', 
                                   min_delta=min_delta, 
                                   patience=patience, 
                                   verbose=verbose, 
                                   mode='auto')

        callbacks_list = [save_best]

        if use_early_stop:
            callbacks_list.append(early_stop)
        
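        # Pass the caller's verbosity through, and make sure the number of
        # validation steps is a whole number of at least one batch.
        hist = self.model.fit_generator(
                        train_gen, 
                        steps_per_epoch=steps, 
                        epochs=epochs, 
                        verbose=verbose, 
                        validation_data=val_gen,
                        callbacks=callbacks_list, 
                        validation_steps=max(1, int(steps * (1.0 - train_split))))
        return hist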

def default_categorical(): 
    from tensorflow.python.keras.models import Model
    from tensorflow.python.keras.layers import Convolution2D
    from tensorflow.python.keras.layers import Input, Dropout, Flatten, Dense
    
    img_in = Input(shape=(120, 160, 3), name='img_in')
    x = img_in

    x = Convolution2D(24, (5,5), strides=(2,2), activation='relu')(x)
    x = Convolution2D(32, (5,5), strides=(2,2), activation='relu')(x)
    x = Convolution2D(64, (5,5), strides=(2,2), activation='relu')(x)
    x = Convolution2D(64, (3,3), strides=(2,2), activation='relu')(x)
    x = Convolution2D(64, (3,3), strides=(1,1), activation='relu')(x)

    x = Flatten(name='flattened')(x)
    x = Dense(100, activation='relu')(x)
    x = Dropout(.1)(x)
    x = Dense(50, activation='relu')(x)
    x = Dropout(.1)(x)

    angle_out = Dense(15, activation='softmax', name='angle_out')(x)
    throttle_out = Dense(1, activation='relu', name='throttle_out')(x)
    
    model = Model(inputs=[img_in], outputs=[angle_out, throttle_out])
    model.compile(optimizer='adam',
                  loss={'angle_out': 'categorical_crossentropy', 'throttle_out': 'mean_absolute_error'},
                  loss_weights={'angle_out': 0.9, 'throttle_out': .001})

    return model

Nice. This class has no `Donkey` dependencies, which makes it easy to port.
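
To get a feel for the network defined in `default_categorical()`, the output size of each convolution layer can be worked out by hand. The sketch below assumes the Keras default of unpadded (`'valid'`) convolutions, since the code does not set `padding`. The helper `conv_output` is introduced here for illustration only.

```python
def conv_output(size, kernel, stride):
    """Output length of one spatial dimension for a 'valid' (unpadded) convolution."""
    return (size - kernel) // stride + 1

# (filters, kernel size, stride) for each Convolution2D layer in default_categorical()
layers = [(24, 5, 2), (32, 5, 2), (64, 5, 2), (64, 3, 2), (64, 3, 1)]

height, width = 120, 160  # input image size, as given to Input(shape=...)
for filters, kernel, stride in layers:
    height = conv_output(height, kernel, stride)
    width = conv_output(width, kernel, stride)
    print('%d x %d x %d' % (height, width, filters))

# Size of the vector produced by the Flatten layer
flattened = height * width * filters
print('flattened:', flattened)
```

The 120 x 160 image shrinks to 3 x 6 with 64 channels, so the `flattened` layer feeds 1152 values into the first `Dense` layer.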

## Train the ported model

### Install Donkey (again)

To train the new model, we still need to read car data (Tubs), which is most easily done by installing the Donkey library and using its utility classes.

In [None]:
# List installed python libraries. tensorflow-gpu (GPU-optimized) should be installed.
!conda list | grep -i tensorflow
!conda list | grep -i donkey

# Clone donkey git
%cd ~/SageMaker
!rm -rf ~/SageMaker/donkey
!git clone https://github.com/wroscoe/donkey

# Donkey depends on tensorflow (non-GPU) and keras, neither of which we are interested in.
# Remove Keras and replace tensorflow with tensorflow-gpu
!sed -i -e '/keras==2.0.8/d' donkey/setup.py
!sed -i -e 's/tensorflow>=1.1/tensorflow-gpu>=1.4/g' donkey/setup.py

# Install Donkey
!pip uninstall donkeycar --yes
!pip install ./donkey
!pip show donkeycar

In [None]:
# Define some globals for now
BATCH_SIZE = 128
TEST_SPLIT = 0.8         # NOTE! Despite the name, this is the fraction of records used for training
EPOCHS = 5               # <---- NOTE! Using only 5 epochs for now, to speed up test-training...

import os
from donkeycar.parts.datastore import TubGroup
from donkeycar.utils import linear_bin

def train(tub_names, model_name):
    '''
    Convenience method for training using MyEstimator
    
    Requires the TubGroup class from Donkey to read Tub data.
    '''
    x_keys = ['cam/image_array']
    y_keys = ['user/angle', 'user/throttle']

    def rt(record):
        record['user/angle'] = linear_bin(record['user/angle'])
        return record

    tubgroup = TubGroup(tub_names)
    train_gen, val_gen = tubgroup.get_train_val_gen(x_keys,
                                                    y_keys,
                                                    record_transform=rt,
                                                    batch_size=BATCH_SIZE,
                                                    train_frac=TEST_SPLIT)

    model_path = os.path.expanduser(model_name)

    total_records = len(tubgroup.df)
    total_train = int(total_records * TEST_SPLIT)
    total_val = total_records - total_train
    print('train: %d, validation: %d' % (total_train, total_val))
    steps_per_epoch = total_train // BATCH_SIZE
    print('steps_per_epoch', steps_per_epoch)

    kl = MyEstimator()
    kl.train(train_gen,
             val_gen,
             saved_model_path=model_path,
             steps=steps_per_epoch,
             train_split=TEST_SPLIT,
             epochs=EPOCHS)
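
The `rt` transform above turns the steering angle into a category so that it matches the 15-unit softmax output `angle_out`. The sketch below shows the idea in plain Python. It is an assumption about what `linear_bin` does, written under the hypothetical name `linear_bin_sketch`; the actual Donkey implementation may differ in details such as the return type.

```python
def linear_bin_sketch(angle, bins=15):
    """One-hot encode an angle in [-1, 1] into `bins` evenly spaced classes."""
    # Shift the angle to [0, 2] and divide by the bin width to get a class index.
    index = int(round((angle + 1) / (2.0 / (bins - 1))))
    one_hot = [0] * bins
    one_hot[index] = 1
    return one_hot

# Full left, straight ahead and full right end up in the first, middle and last bin.
print(linear_bin_sketch(-1.0).index(1))
print(linear_bin_sketch(0.0).index(1))
print(linear_bin_sketch(1.0).index(1))
```

With 15 bins, the three calls select indices 0, 7 and 14 respectively.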

In [None]:
# Download Tub
sample_data_location = 's3://jayway-robocar-raw-data/samples'
!aws s3 cp {sample_data_location}/ore.zip /tmp/ore.zip
!mkdir -pv ~/SageMaker/data
!unzip /tmp/ore.zip -d ~/SageMaker/data/

In [None]:
# Invoke
!mkdir -pv ~/SageMaker/models
tub = '~/SageMaker/data/tub_8_18-02-09'
model = '~/SageMaker/models/my-cloud-model'

train(tub, model)
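
The numbers printed when `train` starts follow from simple arithmetic on the number of records in the Tub. The sketch below mirrors that bookkeeping so it can be checked before starting a long run. `training_plan` is a hypothetical helper and the record counts are made up.

```python
def training_plan(total_records, batch_size=128, train_frac=0.8):
    """Mirror the bookkeeping in train(): split sizes and batches per epoch."""
    total_train = int(total_records * train_frac)
    total_val = total_records - total_train
    steps_per_epoch = total_train // batch_size
    return total_train, total_val, steps_per_epoch

# Hypothetical Tub with 10,000 records.
print(training_plan(10000))

# A very small Tub does not even fill one batch.
print(training_plan(100))
```

With 10,000 records this gives 8000 training records, 2000 validation records and 62 steps per epoch. With only 100 records `steps_per_epoch` becomes 0, which is a sign that `BATCH_SIZE` needs to be lowered or more data collected.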

### Test the new model

Test the new model using either the car or simulator (see [donkey-train.ipynb](./donkey-train.ipynb#test-the-new-model)).

## Next

[Visualization](./donkey-board.ipynb)