# ResNet CIFAR-10 with TensorBoard

This notebook shows how to use TensorBoard and how the training job writes checkpoints to an external S3 bucket.
The model used in this notebook is a ResNet model trained on the CIFAR-10 dataset.
See the following papers for more background:

[Deep Residual Learning for Image Recognition](https://arxiv.org/pdf/1512.03385.pdf) by Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, Dec 2015.

[Identity Mappings in Deep Residual Networks](https://arxiv.org/pdf/1603.05027.pdf) by Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, Jul 2016.

### Set up the environment

In [None]:
import os
import sagemaker
from sagemaker import get_execution_role

sagemaker_session = sagemaker.Session()

role = get_execution_role()

### Download the CIFAR-10 dataset
Downloading the test and training data will take around 5 minutes.

In [None]:
import utils

utils.cifar10_download()

### Upload the data to an S3 bucket

In [None]:
inputs = sagemaker_session.upload_data(path='/tmp/cifar10_data', key_prefix='data/cifar10')

**sagemaker_session.upload_data** uploads the CIFAR-10 dataset from your machine to a bucket named **sagemaker-{region}-{*your aws account number*}**. If you don't have this bucket yet, sagemaker_session will create it for you.
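To make the naming convention above concrete, here is a minimal sketch that builds the bucket name and the S3 location the data would end up under. Both helper functions, the region, and the account number are hypothetical placeholders for illustration; they are not part of the SageMaker SDK, and the assumption that the files sit directly under `key_prefix` is mine.

```python
def default_bucket_name(region, account_id):
    """Build a bucket name in the sagemaker-{region}-{account} format described above."""
    return "sagemaker-{}-{}".format(region, account_id)


def uploaded_prefix_uri(bucket, key_prefix):
    """Return the S3 URI of the prefix the files are assumed to be uploaded under."""
    return "s3://{}/{}".format(bucket, key_prefix.strip("/"))


# Placeholder region and account number, for illustration only.
bucket = default_bucket_name("us-west-2", "123456789012")
print(uploaded_prefix_uri(bucket, "data/cifar10"))
```

In the notebook itself, the value returned by `upload_data` and stored in `inputs` is what gets passed to `fit` later on.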

### Complete source code
- [source_dir/resnet_model.py](source_dir/resnet_model.py): ResNet model
- [source_dir/resnet_cifar_10.py](source_dir/resnet_cifar_10.py): main script used for training and hosting

## Create a training job using the sagemaker.TensorFlow estimator

In [None]:
from sagemaker.tensorflow import TensorFlow


source_dir = os.path.join(os.getcwd(), 'source_dir')
estimator = TensorFlow(entry_point='resnet_cifar_10.py',
                       source_dir=source_dir,
                       role=role,
                       hyperparameters={'min_eval_frequency': 10},
                       training_steps=1000, evaluation_steps=100,
                       train_instance_count=2, train_instance_type='ml.c4.xlarge', 
                       base_job_name='tensorboard-example')

estimator.fit(inputs, run_tensorboard_locally=True)

The **```fit```** method will create a training job named **```tensorboard-example-{unique identifier}```** on two **ml.c4.xlarge** instances. These instances will write checkpoints to the S3 bucket **```sagemaker-{region}-{your aws account number}```**.

If you don't have this bucket yet, **```sagemaker_session```** will create it for you. These checkpoints can be used to restore the training job and to analyze training job metrics using **TensorBoard**.

The parameter **```run_tensorboard_locally=True```** runs **TensorBoard** on the machine where this notebook is running. Every time the training job creates a new checkpoint in the S3 bucket, **```fit```** downloads the checkpoint to the temp folder that **TensorBoard** is pointing to.
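The download step described above can be pictured as a small "fetch only what is new" routine. The sketch below is not the SDK's implementation: `sync_new_checkpoints`, the in-memory `fake_bucket`, and the checkpoint file names are all hypothetical stand-ins, so the example runs without AWS access.

```python
import os
import tempfile


def sync_new_checkpoints(list_keys, download, seen, dest_dir):
    """Download every key that has not been seen yet; return the new keys."""
    new_keys = [key for key in list_keys() if key not in seen]
    for key in new_keys:
        target = os.path.join(dest_dir, os.path.basename(key))
        download(key, target)
        seen.add(key)
    return new_keys


# In-memory stand-in for the S3 bucket (hypothetical keys and contents).
fake_bucket = {"checkpoints/model.ckpt-0.index": b"step 0"}


def fake_list():
    return sorted(fake_bucket)


def fake_download(key, target):
    with open(target, "wb") as handle:
        handle.write(fake_bucket[key])


logdir = tempfile.mkdtemp()  # plays the role of TensorBoard's temp folder
seen = set()
print(sync_new_checkpoints(fake_list, fake_download, seen, logdir))

# The training job writes another checkpoint; only that one is fetched.
fake_bucket["checkpoints/model.ckpt-100.index"] = b"step 100"
print(sync_new_checkpoints(fake_list, fake_download, seen, logdir))
```

Passing the listing and download steps in as callables keeps the sketch independent of any particular S3 client.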

When the **```fit```** method starts the training, it will log the port that **TensorBoard** is using to display the metrics. The default port is **6006**, but another port can be chosen depending on availability: the port number is increased until an available port is found. The chosen port number is then printed to stdout.
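The port search described above can be sketched with the standard `socket` module. This is an illustration of the "start at 6006 and count up" idea under my own assumptions, not the SDK's code; `find_available_port` and its `max_tries` limit are hypothetical.

```python
import socket


def find_available_port(start=6006, max_tries=100, host="127.0.0.1"):
    """Return the first port at or above `start` that can be bound on `host`."""
    last = min(start + max_tries, 65536)  # never go past the valid port range
    for port in range(start, last):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                continue  # port is taken, try the next one
            return port
    raise RuntimeError("no free port found starting at {}".format(start))


port = find_available_port()
print("TensorBoard could listen on port", port)
```

The probe socket is closed before the function returns, so another process could still claim the port in the meantime; a real launcher would need to handle that case.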

It takes a few minutes to provision containers and start the training job. **TensorBoard** will start to display metrics shortly after that.

You can access **TensorBoard** locally at [http://localhost:6006](http://localhost:6006) or through your SageMaker notebook instance at [proxy/6006/](/proxy/6006/) (TensorBoard will not work if you forget the trailing slash, '/', at the end of the URL). If TensorBoard started on a different port, adjust these URLs to match.

This example uses the optional hyperparameter **```min_eval_frequency```** to generate training evaluations more often, so that **TensorBoard** scalar data can be visualized sooner. You can find the available optional hyperparameters [here](https://github.com/aws/sagemaker-python-sdk#optional-hyperparameters).
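If TensorBoard ends up on a port other than 6006, the two addresses mentioned above change accordingly. The helper below is a hypothetical convenience, not an SDK function; it only formats the local URL and the notebook proxy path, keeping the trailing slash the proxy requires.

```python
def tensorboard_urls(port=6006):
    """Return (local URL, notebook proxy path) for the given TensorBoard port."""
    local_url = "http://localhost:{}".format(port)
    proxy_path = "/proxy/{}/".format(port)  # the trailing slash is required
    return local_url, proxy_path


local_url, proxy_path = tensorboard_urls(6007)
print(local_url)
print(proxy_path)
```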

# Deploy the trained model to prepare for predictions

The deploy() method creates an endpoint that serves prediction requests in real time.

In [None]:
predictor = estimator.deploy(initial_instance_count=1, instance_type='ml.m4.xlarge')

# Cleaning up
To avoid incurring charges to your AWS account for the resources used in this tutorial, you need to delete the **SageMaker Endpoint**:

In [None]:
sagemaker.Session().delete_endpoint(predictor.endpoint)