# Flowers Image Classification with TPUs on Cloud ML Engine (ResNet)

This notebook demonstrates how to train an image classifier from scratch on a flowers dataset using TPUs and the ResNet trainer.

In [4]:
import os
PROJECT = 'cloud-training-demos' # REPLACE WITH YOUR PROJECT ID
BUCKET = 'cloud-training-demos-ml' # REPLACE WITH YOUR BUCKET NAME
REGION = 'us-central1' # REPLACE WITH YOUR BUCKET REGION e.g. us-central1

# do not change these
os.environ['PROJECT'] = PROJECT
os.environ['BUCKET'] = BUCKET
os.environ['REGION'] = REGION
os.environ['TFVERSION'] = '1.8'

In [5]:
%bash
gcloud config set project $PROJECT
gcloud config set compute/region $REGION

Updated property [core/project].
Updated property [compute/region].


## Convert JPEG images to TensorFlow Records

My dataset consists of JPEG images in Google Cloud Storage. I have two CSV files (one for training, one for evaluation) that list one image per line in this format:
   image-name,category

Instead of decoding the JPEG files every time they are read, we'll convert the image data once and store it as TensorFlow Records.


In [6]:
%bash
gsutil cat gs://cloud-ml-data/img/flower_photos/train_set.csv | head -5 > /tmp/input.csv
cat /tmp/input.csv

gs://cloud-ml-data/img/flower_photos/daisy/754296579_30a9ae018c_n.jpg,daisy
gs://cloud-ml-data/img/flower_photos/dandelion/18089878729_907ed2c7cd_m.jpg,dandelion
gs://cloud-ml-data/img/flower_photos/dandelion/284497199_93a01f48f6.jpg,dandelion
gs://cloud-ml-data/img/flower_photos/dandelion/3554992110_81d8c9b0bd_m.jpg,dandelion
gs://cloud-ml-data/img/flower_photos/daisy/4065883015_4bb6010cb7_n.jpg,daisy


In [7]:
%bash
gsutil cat gs://cloud-ml-data/img/flower_photos/train_set.csv  | sed 's/,/ /g' | awk '{print $2}' | sort | uniq > /tmp/labels.txt
cat /tmp/labels.txt

daisy
dandelion
roses
sunflowers
tulips
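The shell pipeline above keeps the unique values of the second CSV column. A minimal Python sketch of the same idea follows, assuming the two-column `image-name,category` layout shown earlier. The function name `labels_from_csv_lines` is hypothetical and is not part of the trainer package.

```python
import csv

def labels_from_csv_lines(lines):
    """Return the sorted unique labels from 'image-name,category' CSV lines."""
    labels = set()
    for row in csv.reader(lines):
        if len(row) >= 2:              # skip blank or malformed rows
            labels.add(row[1].strip())
    return sorted(labels)

# Three of the sample rows printed earlier in this notebook.
sample = [
    "gs://cloud-ml-data/img/flower_photos/daisy/754296579_30a9ae018c_n.jpg,daisy",
    "gs://cloud-ml-data/img/flower_photos/dandelion/18089878729_907ed2c7cd_m.jpg,dandelion",
    "gs://cloud-ml-data/img/flower_photos/daisy/4065883015_4bb6010cb7_n.jpg,daisy",
]
print(labels_from_csv_lines(sample))   # ['daisy', 'dandelion']
```

Unlike the `sed`/`awk` version, splitting with the `csv` module does not break if an image path happens to contain a space.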


## Enable TPU service account

Allow Cloud ML Engine to access the TPU and bill its usage to your project.

In [None]:
%bash
SVC_ACCOUNT=$(curl -H "Authorization: Bearer $(gcloud auth print-access-token)"  \
    https://ml.googleapis.com/v1/projects/${PROJECT}:getConfig \
              | grep tpuServiceAccount | tr '"' ' ' | awk '{print $3}' )
echo "Enabling TPU service account $SVC_ACCOUNT to act as Cloud ML Service Agent"
gcloud projects add-iam-policy-binding $PROJECT \
    --member serviceAccount:$SVC_ACCOUNT --role roles/ml.serviceAgent
echo "Done"
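The cell above pulls the account name out of the JSON response with `grep`, `tr` and `awk`, which depends on how the response happens to be laid out on each line. A hedged alternative is to parse the JSON and look the key up by name. The sketch below assumes only that the response contains a `tpuServiceAccount` key somewhere; the helper `find_key` and the sample response body are made up for illustration, and the real `getConfig` output may be structured differently.

```python
import json

def find_key(obj, key):
    """Depth-first search for `key` in nested dicts/lists; None if absent."""
    if isinstance(obj, dict):
        if key in obj:
            return obj[key]
        children = obj.values()
    elif isinstance(obj, list):
        children = obj
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None

# Made-up response body for illustration only (not real getConfig output).
response_text = '{"config": {"tpuServiceAccount": "service-123@example.invalid"}}'
print(find_key(json.loads(response_text), "tpuServiceAccount"))
```

Searching by key name rather than by position means the lookup keeps working if the field moves to a different nesting level.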

## Run preprocessing

First, try it out locally. Note that the inputs are all local paths.

In [4]:
%bash
export PYTHONPATH=${PYTHONPATH}:${PWD}/imgclass
  
rm -rf /tmp/out
python -m trainer.preprocess \
       --trainCsv /tmp/input.csv \
       --validationCsv /tmp/input.csv \
       --labelsFile /tmp/labels.txt \
       --projectId $PROJECT \
       --outputDir /tmp/out

Read in 5 labels, from daisy to tulips


  from ._conv import register_converters as _register_converters
2018-06-11 18:02:18.451070: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA


In [5]:
!ls -l /tmp/out

total 376
-rw-r--r-- 1 root root 192336 Jun 11 18:02 train-00000-of-00001.gz
-rw-r--r-- 1 root root 192361 Jun 11 18:02 validation-00000-of-00001.gz


In [7]:
!zcat /tmp/out/train-00000* | head

�l      �+��
��
0
image/filename

754296579_30a9ae018c_n.jpg

image/format

JPEG

gzip: stdout: Broken pipe
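The `zcat` output is mostly binary because each file is a gzip-compressed TFRecord file. For a quick sanity check without TensorFlow, the record framing can be walked in pure Python. The sketch below is illustrative rather than the trainer's own reader: the function name `iter_tfrecord_payloads` is hypothetical, it skips the CRC fields instead of verifying them, and it returns the raw payload bytes (which appear to be serialized examples) without decoding them.

```python
import gzip
import struct

def iter_tfrecord_payloads(path):
    """Yield raw record payloads from a gzip-compressed TFRecord file.

    Each record is framed as: uint64 length, uint32 CRC of the length,
    `length` payload bytes, uint32 CRC of the payload (all little-endian).
    The CRCs are skipped here, not verified.
    """
    with gzip.open(path, "rb") as f:
        while True:
            header = f.read(12)        # 8-byte length + 4-byte length CRC
            if len(header) < 12:
                return                 # end of file (or a truncated header)
            (length,) = struct.unpack("<Q", header[:8])
            payload = f.read(length)
            f.read(4)                  # skip the payload CRC
            yield payload
```

For example, `sum(1 for _ in iter_tfrecord_payloads('/tmp/out/train-00000-of-00001.gz'))` counts the records in the local training shard.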


Now run it over the full training and evaluation datasets. This job runs in Cloud Dataflow.

In [None]:
%bash
export PYTHONPATH=${PYTHONPATH}:${PWD}/imgclass
gsutil -m rm -rf gs://${BUCKET}/tpu/resnet/data
python -m trainer.preprocess \
       --trainCsv gs://cloud-ml-data/img/flower_photos/train_set.csv \
       --validationCsv gs://cloud-ml-data/img/flower_photos/eval_set.csv \
       --labelsFile /tmp/labels.txt \
       --projectId $PROJECT \
       --outputDir gs://${BUCKET}/tpu/resnet/data

The above preprocessing step will take <b>15-20 minutes</b>. Wait for the job to finish before you proceed. Navigate to the [Cloud Dataflow section of the GCP web console](https://console.cloud.google.com/dataflow) to monitor job progress. You will see something like this: <img src="dataflow.png" />

Alternatively, you can copy my already preprocessed files and proceed to the next step. The later cells read from <code>gs://${BUCKET}/tpu/resnet/data</code>, so copy the files there:
<pre>
gsutil -m cp gs://cloud-training-demos/tpu/resnet/data/* gs://${BUCKET}/tpu/resnet/data/
</pre>

In [8]:
%bash
gsutil ls gs://${BUCKET}/tpu/resnet/data

gs://cloud-training-demos-ml/tpu/resnet/data/train-00000-of-00010
gs://cloud-training-demos-ml/tpu/resnet/data/train-00001-of-00010
gs://cloud-training-demos-ml/tpu/resnet/data/train-00002-of-00010
gs://cloud-training-demos-ml/tpu/resnet/data/train-00003-of-00010
gs://cloud-training-demos-ml/tpu/resnet/data/train-00004-of-00010
gs://cloud-training-demos-ml/tpu/resnet/data/train-00005-of-00010
gs://cloud-training-demos-ml/tpu/resnet/data/train-00006-of-00010
gs://cloud-training-demos-ml/tpu/resnet/data/train-00007-of-00010
gs://cloud-training-demos-ml/tpu/resnet/data/train-00008-of-00010
gs://cloud-training-demos-ml/tpu/resnet/data/train-00009-of-00010
gs://cloud-training-demos-ml/tpu/resnet/data/validation-00000-of-00004
gs://cloud-training-demos-ml/tpu/resnet/data/validation-00001-of-00004
gs://cloud-training-demos-ml/tpu/resnet/data/validation-00002-of-00004
gs://cloud-training-demos-ml/tpu/resnet/data/validation-00003-of-00004
gs://cloud-training-demos-ml/tpu/resnet/data/tmp/


## Train on the Cloud

Get the ResNet code and package it up. This involves changing imports of the form:
<pre>
import resnet_model as model_lib
</pre>
to
<pre>
from . import resnet_model as model_lib
</pre>

Also, there are three hardcoded constants in the code for the model:
<pre>
NUM_TRAIN_IMAGES = 1281167
NUM_EVAL_IMAGES = 50000
LABEL_CLASSES = 1000
</pre>
We'll change them to match our dataset.
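The two rewrites described above (package-relative imports and dataset-specific constants) can be sketched in Python as follows. This is an illustration of the idea, not the code the notebook runs: the function name `patch_source` is hypothetical, the module list is an assumption about which sibling modules exist in the resnet directory, and, unlike an exact-value substitution, the pattern here replaces whatever integer the constant currently holds.

```python
import re

# Assumed names of sibling modules in the resnet model directory.
MODULES = ["resnet_model", "imagenet_input", "resnet_preprocessing"]

# Counts for the flowers dataset used in this notebook.
CONSTANTS = {"NUM_TRAIN_IMAGES": 3300, "NUM_EVAL_IMAGES": 370, "LABEL_CLASSES": 5}

def patch_source(text, modules=MODULES, constants=CONSTANTS):
    """Make sibling imports package-relative and override top-level constants."""
    for module in modules:
        # 'import foo ...' at the start of a line -> 'from . import foo ...'
        text = re.sub(r"^import %s\b" % re.escape(module),
                      "from . import %s" % module, text, flags=re.M)
    for name, value in constants.items():
        # Replace the integer assigned to a top-level constant.
        text = re.sub(r"^%s = \d+$" % re.escape(name),
                      "%s = %d" % (name, value), text, flags=re.M)
    return text

original = ("import resnet_model as model_lib\n"
            "NUM_TRAIN_IMAGES = 1281167\n"
            "LABEL_CLASSES = 1000\n")
print(patch_source(original))
```

Anchoring both patterns at the start of a line leaves indented imports and unrelated modules such as `os` untouched.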
<p>
Then, submit the training job to Cloud ML Engine.

In [18]:
%bash
echo "NUM_TRAIN_IMAGES = $(gsutil cat gs://cloud-ml-data/img/flower_photos/train_set.csv | wc -l)"
echo "NUM_EVAL_IMAGES = $(gsutil cat gs://cloud-ml-data/img/flower_photos/eval_set.csv | wc -l)"
echo "LABEL_CLASSES = $(cat /tmp/labels.txt | wc -l)"

NUM_TRAIN_IMAGES = 3300
NUM_EVAL_IMAGES = 370
LABEL_CLASSES = 5


In [14]:
%bash
rm -rf tpu
git clone https://github.com/tensorflow/tpu
#cd tpu
#git checkout r${TFVERSION}  # correct version
#cd ..

# copy over
MODELCODE=tpu/models/official/resnet
rm -rf tmp
mkdir -p tmp/trainer
touch tmp/trainer/__init__.py
for FILE in $(ls $MODELCODE); do
    CMD="cat $MODELCODE/$FILE "
    for f2 in $(ls $MODELCODE); do
        # strip the .py extension (the dot is escaped so it is not a regex wildcard)
        MODULE=$(echo $f2 | sed 's/\.py$//')
        CMD="$CMD | sed 's/^import ${MODULE}/from . import ${MODULE}/g' "
    done
    echo "WARNING! Hardcoding #train=3300 #eval=370 #labels=5 -- Change as needed"
    CMD="$CMD | sed 's/^NUM_TRAIN_IMAGES = 1281167/NUM_TRAIN_IMAGES = 3300/g' "
    CMD="$CMD | sed 's/^NUM_EVAL_IMAGES = 50000/NUM_EVAL_IMAGES = 370/g' "
    CMD="$CMD | sed 's/^LABEL_CLASSES = 1000/LABEL_CLASSES = 5/g' "
    CMD="$CMD > tmp/trainer/$FILE"
    eval $CMD
done
cp imgclass/setup.py tmp
find tmp

tmp
tmp/setup.py
tmp/trainer
tmp/trainer/benchmark
tmp/trainer/resnet_main.py
tmp/trainer/imagenet_input.py
tmp/trainer/__init__.py
tmp/trainer/resnet_model.py
tmp/trainer/README.md
tmp/trainer/resnet_preprocessing.py


Cloning into 'tpu'...
cat: tpu/models/official/resnet/benchmark: Is a directory


In [None]:
%bash
TOPDIR=gs://${BUCKET}/tpu/resnet
OUTDIR=${TOPDIR}/trained
JOBNAME=imgclass_$(date -u +%y%m%d_%H%M%S)
echo $OUTDIR $REGION $JOBNAME
gsutil -m rm -rf $OUTDIR  # Comment out this line to continue training from the last time
gcloud ml-engine jobs submit training $JOBNAME \
  --region=$REGION \
  --module-name=trainer.resnet_main \
  --package-path=$(pwd)/tmp/trainer \
  --job-dir=$OUTDIR \
  --staging-bucket=gs://$BUCKET \
  --scale-tier=BASIC_TPU \
  --runtime-version=$TFVERSION \
  -- \
  --data_dir=${TOPDIR}/data \
  --model_dir=${OUTDIR} \
  --resnet_depth=18 \
  --train_batch_size=128 --eval_batch_size=32 --skip_host_call=True \
  --train_steps=1000 \
  --export_dir=${OUTDIR}/export

The above training job will take about 12 minutes. 
Wait for the job to finish before you proceed. 
Navigate to [Cloud ML Engine section of GCP web console](https://console.cloud.google.com/mlengine) 
to monitor job progress.
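To relate `--train_steps` to passes over the data, the arithmetic below uses the image count printed earlier and the flags from the job submission. It is a rough sketch that assumes every training step consumes exactly one batch of `--train_batch_size` images.

```python
import math

NUM_TRAIN_IMAGES = 3300   # count printed earlier in this notebook
TRAIN_BATCH_SIZE = 128    # --train_batch_size
TRAIN_STEPS = 1000        # --train_steps

# Steps needed to see every training image once, rounded up.
steps_per_epoch = math.ceil(NUM_TRAIN_IMAGES / TRAIN_BATCH_SIZE)

# Total images consumed divided by the dataset size.
epochs = TRAIN_STEPS * TRAIN_BATCH_SIZE / NUM_TRAIN_IMAGES

print("steps per epoch (rounded up):", steps_per_epoch)                 # 26
print("approximate epochs in %d steps: %.1f" % (TRAIN_STEPS, epochs))   # 38.8
```

Under that assumption, 1000 steps correspond to roughly 39 passes over the 3300 training images.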

<b>Note: this job currently FAILS at the export step.</b>

In [None]:
%bash
gsutil ls -l gs://${BUCKET}/tpu/resnet/trained/

## Deploying and predicting with model [doesn't work yet]

Deploy the model:

In [25]:
%bash
MODEL_NAME="flowers"
MODEL_VERSION=amoeba
MODEL_LOCATION=gs://${BUCKET}/tpu/amoeba/trained/
echo "Deleting and deploying $MODEL_NAME $MODEL_VERSION from $MODEL_LOCATION ... this will take a few minutes"
#gcloud ml-engine versions delete --quiet ${MODEL_VERSION} --model ${MODEL_NAME}
#gcloud ml-engine models delete ${MODEL_NAME}
#gcloud ml-engine models create ${MODEL_NAME} --regions $REGION
gcloud ml-engine versions create ${MODEL_VERSION} --model ${MODEL_NAME} --origin ${MODEL_LOCATION}

Deleting and deploying flowers amoeba from gs://cloud-training-demos-ml/tpu/amoeba/trained/ ... this will take a few minutes


ERROR: (gcloud.ml-engine.versions.create) FAILED_PRECONDITION: Field: version.deployment_uri Error: Deployment directory gs://cloud-training-demos-ml/tpu/amoeba/trained/ is expected to contain exactly one of: [saved_model.pb, saved_model.pbtxt].
- '@type': type.googleapis.com/google.rpc.BadRequest
  fieldViolations:
  - description: 'Deployment directory gs://cloud-training-demos-ml/tpu/amoeba/trained/
      is expected to contain exactly one of: [saved_model.pb, saved_model.pbtxt].'
    field: version.deployment_uri


To predict with the model, let's take one of the example images that is available on Google Cloud Storage <img src="http://storage.googleapis.com/cloud-ml-data/img/flower_photos/sunflowers/1022552002_2b93faf9e7_n.jpg" />

In [None]:
%writefile test.json
{"imageurl": "gs://cloud-ml-data/img/flower_photos/sunflowers/1022552002_2b93faf9e7_n.jpg"}

Send it to the prediction service:

In [None]:
%bash
MODEL_VERSION=amoeba  # each %bash cell is a separate shell, so set the version name again
gcloud ml-engine predict --model=flowers --version=${MODEL_VERSION} --json-instances=./test.json

<pre>
# Copyright 2017 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
</pre>