# Using Script Mode to train any TensorFlow script from GitHub in SageMaker

In this tutorial, we show how simple it is to train a TensorFlow script in SageMaker using the new Script Mode TensorFlow container.

The example we chose is [Multi-layer Recurrent Neural Networks (LSTM, RNN) for character-level language models in Python using Tensorflow](https://github.com/sherjilozair/char-rnn-tensorflow), but the same technique can be applied to other scripts and repositories, including the [TensorFlow Model Zoo](https://github.com/tensorflow/models) and the [TensorFlow benchmark scripts](https://github.com/tensorflow/benchmarks/tree/master/scripts/tf_cnn_benchmarks).


## Setting up the environment
Let's start by creating a SageMaker session and specifying:
- The S3 bucket and prefix that you want to use for training and model data. These should be in the same region as the notebook instance, training, and hosting.
- The IAM role ARN used to give training and hosting access to your data. See the documentation [for how to create these](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-roles.html). Note: if more than one role is required for notebook instances, training, and/or hosting, replace ```sagemaker.get_execution_role()``` with the appropriate full IAM role ARN string(s).


In [None]:
import sagemaker

sagemaker_session = sagemaker.Session()

bucket = sagemaker_session.default_bucket()

role = sagemaker.get_execution_role()

### Clone the repository

In [None]:
!git clone https://github.com/sherjilozair/char-rnn-tensorflow > /dev/null 2>&1

This repository includes a README.md with an overview of the project, requirements, and basic usage:

In [None]:
from IPython.display import display, Markdown
# Render the repository README inline in the notebook.
display(Markdown(filename='char-rnn-tensorflow/README.md'))

### Getting the data

In [None]:
!mkdir -p sherlock
!wget https://sherlock-holm.es/stories/plain-text/cnus.txt --force-directories --output-document=sherlock/input.txt

### Upload the data for training

In [None]:
inputs = sagemaker_session.upload_data(path='sherlock', bucket=bucket, key_prefix='datasets/sherlock')
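
`upload_data` copies every file under the local `sherlock` directory to S3 under `key_prefix`, and `inputs` holds an S3 URI for that prefix. The sketch below only computes the keys such a directory upload would produce, without contacting S3. It assumes each file keeps its path relative to the uploaded directory and that the returned URI points at the prefix itself; the helper name `planned_s3_keys` and the bucket name `my-bucket` are hypothetical.

```python
import os


def planned_s3_keys(path, bucket, key_prefix):
    """Hypothetical helper: list the S3 keys a directory upload would create.

    Assumption: each file keeps its path relative to `path`, placed under
    `key_prefix`, and the returned URI points at the prefix itself.
    Nothing is uploaded.
    """
    keys = []
    for dirpath, _, filenames in os.walk(path):
        rel_dir = os.path.relpath(dirpath, start=path)
        for name in filenames:
            rel_path = name if rel_dir == '.' else os.path.join(rel_dir, name)
            # S3 keys always use forward slashes, whatever the local OS uses.
            keys.append(key_prefix + '/' + rel_path.replace(os.sep, '/'))
    uri = 's3://{}/{}'.format(bucket, key_prefix)
    return uri, sorted(keys)
```

For this notebook, `planned_s3_keys('sherlock', 'my-bucket', 'datasets/sherlock')` should list a single key ending in `input.txt`, which is the file `train.py` later reads from the **training** channel.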

## Testing locally


Script Mode is still in development, so we have to construct a custom Estimator to use it with the [SageMaker Python SDK](https://github.com/aws/sagemaker-python-sdk).

In [None]:
from sagemaker.estimator import Framework
from sagemaker.tensorflow import TensorFlow

class ScriptModeTensorFlow(Framework):
    """This class is temporary until the final version of Script Mode is released.
    """
    
    # Selects the beta Script Mode TensorFlow container image.
    __framework_name__ = "tensorflow-scriptmode-beta"
    
    # Reuse the regular TensorFlow estimator's model creation logic.
    create_model = TensorFlow.create_model
    
    def __init__(self, py_version='py3', **kwargs):
        super(ScriptModeTensorFlow, self).__init__(**kwargs)
        self.py_version = py_version
        # No custom image: the image URI is derived from __framework_name__,
        # framework_version and py_version.
        self.image_name = None
        self.framework_version = '1.10.0'


We can use [Local Mode](https://github.com/aws/sagemaker-python-sdk#local-mode) to simulate SageMaker locally before submitting a training job:

In [None]:
hyperparameters = {'num_epochs': 1, 
                   'data_dir': '/opt/ml/input/data/training',
                   'save_dir': '/opt/ml/model'}

estimator = ScriptModeTensorFlow(entry_point='train.py',
                                 source_dir='char-rnn-tensorflow',
                                 train_instance_type='local', 
                                 train_instance_count=1,
                                 hyperparameters=hyperparameters,
                                 role=role)

estimator.fit({'training': inputs})

## How does it work

The cell above downloaded a Python 3 CPU container locally and used it to simulate SageMaker training. When training starts, Script Mode invokes the following command inside the container:
```bash
python <entry_point> --<hyperparameter1> <value1> --<hyperparameter2> <value2> ...
```

The entry point script is invoked with each hyperparameter passed as a command-line argument. The command executed for the example above is:

```bash
python train.py --num_epochs 1 --data_dir /opt/ml/input/data/training --save_dir /opt/ml/model
```
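
The mapping from the `hyperparameters` dict to that command line, and how an entry point reads it back, can be sketched as follows. This is an illustration under an assumption: values are plain scalars rendered with `str()`, while the real container may serialize booleans or nested values differently. `to_cmd_args` is a hypothetical helper, and the argparse defaults are placeholders rather than the ones `train.py` actually declares.

```python
import argparse


def to_cmd_args(hyperparameters):
    """Sketch: turn {'name': value} into ['--name', 'value', ...].

    Assumption: values are scalars rendered with str(); the real container
    may serialize booleans or nested values differently.
    """
    args = []
    for name, value in hyperparameters.items():
        args.extend(['--' + name, str(value)])
    return args


hyperparameters = {'num_epochs': 1,
                   'data_dir': '/opt/ml/input/data/training',
                   'save_dir': '/opt/ml/model'}
argv = to_cmd_args(hyperparameters)
print('python train.py ' + ' '.join(argv))

# On the receiving side, the entry point reads the same flags with argparse.
# The defaults below are placeholders, not necessarily those of train.py.
parser = argparse.ArgumentParser()
parser.add_argument('--num_epochs', type=int, default=10)
parser.add_argument('--data_dir', type=str, default='data')
parser.add_argument('--save_dir', type=str, default='save')
args = parser.parse_args(argv)
```

Because every value arrives as a string, the script itself is responsible for converting types (here via `type=int`).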

**/opt/ml/input/data/training** is the directory inside the container where the training data is downloaded. The data is downloaded to this folder because **training** is the channel name defined in ```estimator.fit({'training': inputs})```. See [training data](https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-training-algo.html#your-algorithms-training-algo-running-container-trainingdata) for more information.

**/opt/ml/model** is the directory inside the container where the model should be saved. Everything saved in this folder is packaged and uploaded to the S3 bucket defined for training when the job finishes. See [model data](https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-training-algo.html#your-algorithms-training-algo-envvariables) for more information.
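
Because the channel name determines the input directory, a training script can derive both paths instead of hard-coding them. A minimal sketch, assuming the container follows the sagemaker-containers convention of exporting `SM_CHANNEL_<NAME>` and `SM_MODEL_DIR` environment variables; when they are absent, it falls back to the fixed paths described above. The helper names `channel_dir` and `model_dir` are hypothetical.

```python
import os


def channel_dir(channel, env=os.environ):
    """Directory where the files for an input channel are placed.

    Assumption: the container may expose it as SM_CHANNEL_<NAME>;
    otherwise fall back to the fixed /opt/ml/input/data/<channel> layout.
    """
    return env.get('SM_CHANNEL_' + channel.upper(),
                   os.path.join('/opt/ml/input/data', channel))


def model_dir(env=os.environ):
    """Directory whose contents are uploaded to S3 when training ends."""
    return env.get('SM_MODEL_DIR', '/opt/ml/model')
```

With these helpers, the `data_dir` and `save_dir` hyperparameters could default to `channel_dir('training')` and `model_dir()` inside the script.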

# Training in SageMaker

To train on SageMaker-managed instances instead of locally, change the estimator argument **train_instance_type** from ```'local'``` to any ML instance type available for SageMaker training. For example:

In [None]:
estimator = ScriptModeTensorFlow(entry_point='train.py',
                                 source_dir='char-rnn-tensorflow',
                                 train_instance_type='ml.c4.xlarge',
                                 train_instance_count=1,
                                 hyperparameters=hyperparameters,
                                 role=role)

estimator.fit({'training': inputs})
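
Since the local and SageMaker runs differ only in **train_instance_type**, one way to keep them in sync is to build the shared arguments in one place. The helper below, `estimator_kwargs`, is hypothetical (it is not part of the SageMaker SDK) and simply returns the keyword arguments used in this notebook as a dict.

```python
def estimator_kwargs(hyperparameters, role, instance_type='local'):
    """Hypothetical helper: estimator arguments shared by local and remote runs.

    Only train_instance_type changes between Local Mode ('local') and a
    SageMaker-managed instance such as 'ml.c4.xlarge'.
    """
    return dict(entry_point='train.py',
                source_dir='char-rnn-tensorflow',
                train_instance_type=instance_type,
                train_instance_count=1,
                hyperparameters=hyperparameters,
                role=role)
```

Usage: `ScriptModeTensorFlow(**estimator_kwargs(hyperparameters, role))` for a quick local check, then `ScriptModeTensorFlow(**estimator_kwargs(hyperparameters, role, 'ml.c4.xlarge'))` for the real training job.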

# Installing additional requirements

## Installing pip packages

Script Mode installs your source_dir in the container as a [Python package](https://github.com/aws/sagemaker-containers/blob/master/src/sagemaker_containers/_modules.py#L100). You can include a [requirements.txt file in the root folder of your source_dir to install any pip dependencies](https://github.com/aws/sagemaker-containers/blob/master/src/sagemaker_containers/_modules.py#L111). For example, you can install a newer version of TensorFlow than the 1.10.0 release the container ships with:

Contents of requirements.txt:
```
tensorflow==1.11.0
```
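
The only requirement is that the file sits at the root of source_dir. A minimal sketch of creating it from Python; `write_requirements` is a hypothetical helper, and in a notebook cell `%%writefile char-rnn-tensorflow/requirements.txt` achieves the same thing.

```python
from pathlib import Path


def write_requirements(source_dir, requirements):
    """Write requirements.txt at the root of source_dir, one pin per line."""
    path = Path(source_dir) / 'requirements.txt'
    path.write_text(''.join(line + '\n' for line in requirements))
    return path
```

For this example: `write_requirements('char-rnn-tensorflow', ['tensorflow==1.11.0'])`.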

## Installing apt-get packages and other dependencies
You can define a setup.py file in your source_dir to install other dependencies. In the example below, setup.py runs a shell script that installs [TensorFlow for C](https://www.tensorflow.org/install/lang_c) in the container.

In [None]:
!mkdir -p tf_c

In [None]:
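%%writefile tf_c/get-tf-c.sh
#!/bin/bash
# Stop at the first failing command so the error surfaces in the package build.
set -e

# Download the TensorFlow C library (up to 3 tries) and unpack it into /usr/local.
wget -q -t 3 https://storage.googleapis.com/tensorflow/libtensorflow/libtensorflow-cpu-linux-x86_64-1.11.0.tar.gz
tar -xzvf libtensorflow-cpu-linux-x86_64-1.11.0.tar.gz -C /usr/local

# Refresh the shared-library cache so libtensorflow.so is found at runtime.
ldconfig

# Compile the test program against the C API and put it on the PATH.
gcc -I/usr/local/include -L/usr/local/lib hello_tf.c -ltensorflow -o hello_tf
cp hello_tf /usr/bin/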

In [None]:
%%writefile tf_c/hello_tf.c

#include <stdio.h>
#include <tensorflow/c/c_api.h>

int main() {
  printf("Hello from TensorFlow C library version %s\n", TF_Version());
  return 0;
}

In [None]:
%%writefile tf_c/setup.py
import subprocess

from setuptools import setup
from setuptools.command.build_py import build_py as _build_py


class build_py(_build_py):
    """Runs get-tf-c.sh before the regular build step when the package is installed."""

    def run(self):
        # Install the TensorFlow C library and compile hello_tf (see get-tf-c.sh).
        subprocess.check_output(['bash', './get-tf-c.sh'])

        super(build_py, self).run()


setup(packages=[''],
      name="test",
      version='1.0.0',
      cmdclass={'build_py': build_py},
      include_package_data=True)

In [None]:
%%writefile tf_c/train_c.py

import subprocess

message = subprocess.check_output('hello_tf')
assert message == b'Hello from TensorFlow C library version 1.11.0\n'

In [None]:
estimator = ScriptModeTensorFlow(entry_point='train_c.py',
                                 source_dir='tf_c',
                                 train_instance_type='local', 
                                 train_instance_count=1,
                                 role=role)

estimator.fit({})