[View in Colaboratory](https://colab.research.google.com/github/christianmerkwirth/colabs/blob/master/TraingCifar10FromColabAndInCloud.ipynb)

In [0]:
!git clone https://github.com/tensorflow/models.git

In [0]:
import os
os.chdir('models/tutorials/image/cifar10_estimator/')

First let's convert the training data to tfrecord format.

In [0]:
!python generate_cifar10_tfrecords.py --data-dir=${PWD}/cifar-10-data

In [0]:
!ls -lah ${PWD}/cifar-10-data
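The TFRecord files written above can be sanity-checked without TensorFlow. This is a minimal sketch that assumes the generator wrote files ending in `.tfrecords` into `cifar-10-data` (adjust the glob pattern to whatever `ls` shows). It walks the standard TFRecord framing (a little-endian uint64 length, a 4-byte CRC of the length, the payload, and a 4-byte CRC of the payload) and counts records; unlike TensorFlow's own reader, it does not verify the CRCs. The helper names are hypothetical.

```python
import glob
import os
import struct


def iter_tfrecord_payloads(path):
    """Yield the raw payload bytes of each record in a TFRecord file.

    Record framing (little-endian): uint64 length, uint32 masked CRC32C of
    the length, `length` payload bytes, uint32 masked CRC32C of the payload.
    The CRCs are read but NOT verified in this sketch.
    """
    with open(path, "rb") as f:
        while True:
            header = f.read(12)  # 8-byte length + 4-byte length CRC
            if not header:
                return  # clean end of file
            if len(header) < 12:
                raise ValueError("truncated record header in %s" % path)
            (length,) = struct.unpack("<Q", header[:8])
            payload = f.read(length)
            footer = f.read(4)  # payload CRC, skipped
            if len(payload) < length or len(footer) < 4:
                raise ValueError("truncated record in %s" % path)
            yield payload


def count_records(path):
    """Return the number of records in one TFRecord file."""
    return sum(1 for _ in iter_tfrecord_payloads(path))


# Assumption: the generator wrote *.tfrecords files into ./cifar-10-data.
data_dir = os.path.join(os.getcwd(), "cifar-10-data")
for path in sorted(glob.glob(os.path.join(data_dir, "*.tfrecords"))):
    print(os.path.basename(path), count_records(path))
```

A file that raises `ValueError` here was most likely cut short, for example by an interrupted download or conversion.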

In [0]:
LOG_DIR = "/tmp/cifar10"
!mkdir -p {LOG_DIR}

**Execute the cell below if you want to launch TensorBoard to monitor the training on Colab. The cell downloads the third-party ngrok binary and starts TensorBoard and ngrok as long-running background processes, exposing TensorBoard at a public URL; this might be unsafe. You can run all subsequent cells in this Colab without starting TensorBoard. Use at your own risk.**

In [0]:
# Launch TensorBoard to monitor the training on the Colab host instance.
import time

# First, download and unpack the ngrok binary.
!wget https://bin.equinox.io/c/4VmDzA7iaHb/ngrok-stable-linux-amd64.zip
!unzip -o ngrok-stable-linux-amd64.zip

print('Starting TensorBoard')
# Start TensorBoard in the background on the Colab host instance.
get_ipython().system_raw(
    'tensorboard --logdir {} --host 0.0.0.0 --port 6006 &'.format(LOG_DIR)
)

print('Starting ngrok')
# Use ngrok to tunnel a public URL to TensorBoard's local port 6006.
get_ipython().system_raw('./ngrok http 6006 &')

print('Waiting a moment for both processes to start up.')
time.sleep(5)

print('Retrieving TensorBoard URL')
# Finally, query ngrok's local API (port 4040) and print the public URL.
!curl -s http://localhost:4040/api/tunnels | python -c \
    "import sys, json; print(json.load(sys.stdin)['tunnels'][0]['public_url'])"
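The fixed `time.sleep(5)` above can be too short on a slow instance, in which case the `curl` query fails because ngrok is not listening yet. A minimal Python 3 sketch of a sturdier alternative polls the same ngrok API (`http://localhost:4040/api/tunnels`) until a tunnel appears. The helper names `pick_public_url` and `wait_for_tensorboard_url`, the 30-second timeout, and the preference for an `https://` tunnel are illustrative assumptions, not part of the original notebook.

```python
import json
import time
import urllib.request

# ngrok's local inspection API, the same endpoint the curl command queries.
NGROK_API = "http://localhost:4040/api/tunnels"


def pick_public_url(payload):
    """Pick a public URL from a decoded /api/tunnels response.

    Prefers an https:// tunnel, falls back to the first tunnel listed,
    and returns None when no tunnel is up yet.
    """
    urls = [t.get("public_url") for t in payload.get("tunnels", [])]
    urls = [u for u in urls if u]
    for url in urls:
        if url.startswith("https://"):
            return url
    return urls[0] if urls else None


def wait_for_tensorboard_url(api=NGROK_API, timeout_s=30.0, poll_s=1.0):
    """Poll the ngrok API until a tunnel appears or timeout_s elapses."""
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        try:
            with urllib.request.urlopen(api, timeout=poll_s) as resp:
                url = pick_public_url(json.load(resp))
            if url:
                return url
        except (OSError, ValueError):
            pass  # ngrok not listening yet, or not valid JSON yet; retry
        time.sleep(poll_s)
    return None
```

After starting both background processes, `print(wait_for_tensorboard_url())` prints the public URL, or `None` if no tunnel shows up before the timeout.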

**Train the model locally (on the Colab host instance) using the Python modules from the cloned repository.**

In [0]:
%set_env LOG_DIR={LOG_DIR}
!echo "Starting training in $LOG_DIR"
!python cifar10_main.py --job-dir=$LOG_DIR \
                        --data-dir=${PWD}/cifar-10-data  \
                        --num-gpus=1 \
                        --train-steps=6000
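To check how far the local run has progressed without TensorBoard, one can read the plain-text `checkpoint` index file that TensorFlow writes into the model directory. This sketch assumes that `cifar10_main.py` saves its Estimator checkpoints under `--job-dir` (here `/tmp/cifar10`, the `LOG_DIR` used above) with the usual `model.ckpt-<step>` names; `latest_checkpoint_step` is a hypothetical helper, not part of the repository.

```python
import os
import re


def latest_checkpoint_step(model_dir):
    """Return the global step of the newest checkpoint in model_dir, or None.

    Reads the plain-text `checkpoint` index file that TensorFlow writes next
    to the checkpoints, which contains a line such as
        model_checkpoint_path: "model.ckpt-6000"
    and extracts the trailing step number.
    """
    index = os.path.join(model_dir, "checkpoint")
    if not os.path.exists(index):
        return None  # no checkpoint written yet
    with open(index) as f:
        for line in f:
            # re.match anchors at the line start, so the
            # all_model_checkpoint_paths lines are not matched here.
            m = re.match(r'\s*model_checkpoint_path:\s*"(.*)"', line)
            if m:
                step = re.search(r"-(\d+)$", m.group(1))
                return int(step.group(1)) if step else None
    return None


# Assumption: same directory as LOG_DIR in this notebook.
print("latest step:", latest_checkpoint_step("/tmp/cifar10"))
```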

**Next, we move from the Colab host instance to Google Cloud Platform (GCP).**

Authenticate to enable access to GCP.

In [0]:
from google.colab import auth
auth.authenticate_user()

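Start the multi-GPU training on Cloud ML Engine. Replace `MYAWESOMEPROJECT` and `MYAWESOMEBUCKET` in the cell below with your own GCP project ID and Cloud Storage bucket before running it.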

**WARNING: This training runs on GCP and is billed to your GCP account.**

In [0]:
%%bash

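# Abort on the first error so that, e.g., a failed copy does not still submit a job.
set -e

gcloud config set compute/region us-central1
# Replace with your own GCP project ID.
gcloud config set project MYAWESOMEPROJECT

# Make sure that the bucket is in the same region as the compute
# (us-central1) to allow fast data transfer. Replace with your own bucket.
MY_BUCKET=gs://MYAWESOMEBUCKET
# Copy the training data from the notebook's local storage to the bucket.
gsutil cp -r ${PWD}/cifar-10-data $MY_BUCKET

# Move up one directory level: gcloud packages cifar10_estimator/ as a Python
# package, so the trainer module is referenced as cifar10_estimator.cifar10_main.
# Arguments after the bare "--" are passed through to cifar10_main.
cd ..
gcloud ml-engine jobs submit training cifarmultigpu_$(date +%s) \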
    --runtime-version 1.4 \
    --job-dir=$MY_BUCKET/model_dirs/cifarmultigpu \
    --config cifar10_estimator/cmle_config.yaml \
    --package-path cifar10_estimator/ \
    --module-name cifar10_estimator.cifar10_main \
    -- \
    --data-dir=$MY_BUCKET/cifar-10-data \
    --num-gpus=4 \
    --train-steps=10000
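The same submission can be assembled from Python, which makes the role of the bare `--` explicit: gcloud parses the flags before it, and everything after it is forwarded unchanged to the trainer module. Per the gcloud documentation, the `--job-dir` value is also handed to the trainer program as `--job-dir`. The helper `build_submit_cmd` below is a hypothetical sketch that only mirrors the flags used in the cell above; it builds and prints the command without running it.

```python
import shlex
import time


def build_submit_cmd(bucket, num_gpus=4, train_steps=10000, job_name=None):
    """Build the argv for the `gcloud ml-engine jobs submit training` call.

    Flags before the bare "--" are parsed by gcloud; flags after it are
    forwarded unchanged to cifar10_estimator.cifar10_main.
    """
    if job_name is None:
        # Same naming scheme as the shell cell: unique per submission.
        job_name = "cifarmultigpu_%d" % int(time.time())
    gcloud_args = [
        "gcloud", "ml-engine", "jobs", "submit", "training", job_name,
        "--runtime-version", "1.4",
        "--job-dir=%s/model_dirs/cifarmultigpu" % bucket,
        "--config", "cifar10_estimator/cmle_config.yaml",
        "--package-path", "cifar10_estimator/",
        "--module-name", "cifar10_estimator.cifar10_main",
    ]
    trainer_args = [
        "--data-dir=%s/cifar-10-data" % bucket,
        "--num-gpus=%d" % num_gpus,
        "--train-steps=%d" % train_steps,
    ]
    return gcloud_args + ["--"] + trainer_args


# Placeholder bucket, as in the shell cell; replace with your own.
cmd = build_submit_cmd("gs://MYAWESOMEBUCKET")
print(" ".join(shlex.quote(arg) for arg in cmd))
```

To actually submit, one would pass `cmd` to `subprocess.run(cmd, check=True, cwd=...)` from the parent directory of `cifar10_estimator/`, after authenticating and replacing the placeholders; like the shell cell, this is billed to your GCP account.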