# Flowers Image Classification with TPUs on Cloud ML Engine

This notebook demonstrates how to train an image classifier from scratch on a flowers dataset using TPUs on Cloud ML Engine.

In [64]:
import os
PROJECT = 'cloud-training-demos' # REPLACE WITH YOUR PROJECT ID
BUCKET = 'cloud-training-demos-ml' # REPLACE WITH YOUR BUCKET NAME
REGION = 'us-central1' # REPLACE WITH YOUR BUCKET REGION e.g. us-central1

# do not change these
os.environ['PROJECT'] = PROJECT
os.environ['BUCKET'] = BUCKET
os.environ['REGION'] = REGION
os.environ['TFVERSION'] = '1.8'

In [2]:
%bash
gcloud config set project $PROJECT
gcloud config set compute/region $REGION

Updated property [core/project].
Updated property [compute/region].


## Convert JPEG images to TensorFlow Records

My dataset consists of JPEG images in Google Cloud Storage. I have two CSV files (one for training, one for evaluation) that list the images one per line, formatted as follows:
   image-uri,category

Instead of decoding the JPEG files every time they are read, we'll convert the image data once and store it as TF Records.


In [26]:
%bash
gsutil cat gs://cloud-ml-data/img/flower_photos/train_set.csv | head -5 > /tmp/input.csv
cat /tmp/input.csv

gs://cloud-ml-data/img/flower_photos/daisy/754296579_30a9ae018c_n.jpg,daisy
gs://cloud-ml-data/img/flower_photos/dandelion/18089878729_907ed2c7cd_m.jpg,dandelion
gs://cloud-ml-data/img/flower_photos/dandelion/284497199_93a01f48f6.jpg,dandelion
gs://cloud-ml-data/img/flower_photos/dandelion/3554992110_81d8c9b0bd_m.jpg,dandelion
gs://cloud-ml-data/img/flower_photos/daisy/4065883015_4bb6010cb7_n.jpg,daisy


In [54]:
%bash
gsutil cat gs://cloud-ml-data/img/flower_photos/train_set.csv  | sed 's/,/ /g' | awk '{print $2}' | sort | uniq > /tmp/labels.txt
cat /tmp/labels.txt

daisy
dandelion
roses
sunflowers
tulips
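
The shell pipeline above can also be written in Python, which makes the next logical step (mapping each label to an integer class index) explicit. This is only a minimal sketch and not the code in `trainer.preprocess`: the helper names `read_rows` and `build_label_index` are hypothetical, and the sample rows are copied from the output shown earlier.

```python
import csv
import io

def read_rows(csv_text):
    """Parse 'image-uri,label' lines into (uri, label) tuples."""
    reader = csv.reader(io.StringIO(csv_text))
    return [(row[0].strip(), row[1].strip()) for row in reader if len(row) >= 2]

def build_label_index(rows):
    """Map sorted unique labels to integer ids (mirrors `sort | uniq`)."""
    labels = sorted({label for _, label in rows})
    return {label: i for i, label in enumerate(labels)}

# Three rows taken from the head of train_set.csv shown above.
SAMPLE = """\
gs://cloud-ml-data/img/flower_photos/daisy/754296579_30a9ae018c_n.jpg,daisy
gs://cloud-ml-data/img/flower_photos/dandelion/18089878729_907ed2c7cd_m.jpg,dandelion
gs://cloud-ml-data/img/flower_photos/daisy/4065883015_4bb6010cb7_n.jpg,daisy
"""

rows = read_rows(SAMPLE)
label_index = build_label_index(rows)
print(label_index)
```

Sorting the labels keeps the index assignment stable across runs, so the same class always gets the same id.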


## Enable TPU service account

Allow Cloud ML Engine to access the TPU and bill usage to your project.

In [None]:
%bash
SVC_ACCOUNT=$(curl -H "Authorization: Bearer $(gcloud auth print-access-token)"  \
    https://ml.googleapis.com/v1/projects/${PROJECT}:getConfig \
              | grep tpuServiceAccount | tr '"' ' ' | awk '{print $3}' )
echo "Enabling TPU service account $SVC_ACCOUNT to act as Cloud ML Service Agent"
gcloud projects add-iam-policy-binding $PROJECT \
    --member serviceAccount:$SVC_ACCOUNT --role roles/ml.serviceAgent
echo "Done"
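
The `grep | tr | awk` chain above pulls the `tpuServiceAccount` value out of the JSON response as text. A JSON parser does the same job without depending on how the response is laid out across lines. The sketch below is an assumption-laden illustration: the helper `find_key` is hypothetical, the sample response body and account names are made-up placeholders, and the exact nesting of the real `getConfig` response is not assumed (the search is recursive for that reason).

```python
import json

def find_key(obj, key):
    """Depth-first search for the first value stored under `key`."""
    if isinstance(obj, dict):
        if key in obj:
            return obj[key]
        children = list(obj.values())
    elif isinstance(obj, list):
        children = obj
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None

# Hypothetical response body; the account names are placeholders.
SAMPLE_RESPONSE = """
{
  "serviceAccount": "service-123@example.invalid",
  "config": {"tpuServiceAccount": "service-123@tpu.example.invalid"}
}
"""

svc_account = find_key(json.loads(SAMPLE_RESPONSE), "tpuServiceAccount")
print(svc_account)
```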

## Run preprocessing

First, try it out locally. Note that the CSV files, the labels file, and the output directory are all local paths.

In [70]:
%bash
export PYTHONPATH=${PYTHONPATH}:${PWD}/imgclass
python -m trainer.preprocess \
       --trainCsv /tmp/input.csv \
       --validationCsv /tmp/input.csv \
       --labelsFile /tmp/labels.txt \
       --projectId $PROJECT \
       --outputDir /tmp/out

Read in 5 labels, from daisy to tulips


  from ._conv import register_converters as _register_converters
2018-06-08 16:28:26.028328: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA


In [71]:
!ls -l /tmp/out

total 1096
-rw-r--r-- 1 root root 558761 Jun  8 16:28 train-00000-of-00001
-rw-r--r-- 1 root root 558761 Jun  8 16:28 validation-00000-of-00001
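
The output file names follow a `<prefix>-SSSSS-of-NNNNN` sharding pattern: a zero-padded shard index followed by the zero-padded total number of shards. With only five input images, each split fits in a single shard. The hypothetical helper `shard_name` below is just a sketch that reproduces the names seen in the listing; the real names are generated by the preprocessing pipeline itself.

```python
def shard_name(prefix, index, num_shards):
    """Format one shard file name as <prefix>-SSSSS-of-NNNNN."""
    if not 0 <= index < num_shards:
        raise ValueError("index must be in [0, num_shards)")
    return "{}-{:05d}-of-{:05d}".format(prefix, index, num_shards)

# Reproduce the two file names from the local run above.
for split in ("train", "validation"):
    print(shard_name(split, 0, 1))
```

The two files have the same size because the local run passes the same `/tmp/input.csv` as both the training and the validation CSV.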


Now run it over the full training and evaluation datasets. This job runs on Cloud Dataflow.

In [None]:
%bash
export PYTHONPATH=${PYTHONPATH}:${PWD}/imgclass

python -m trainer.preprocess \
       --trainCsv gs://cloud-ml-data/img/flower_photos/train_set.csv \
       --validationCsv gs://cloud-ml-data/img/flower_photos/eval_set.csv \
       --labelsFile /tmp/labels.txt \
       --projectId $PROJECT \
       --outputDir gs://${BUCKET}/tpu/imgclass/data

The above preprocessing step will take <b>15-20 minutes</b>. Wait for the job to finish before you proceed. Navigate to the [Cloud Dataflow section of the GCP web console](https://console.cloud.google.com/dataflow) to monitor job progress. You will see something like this: <img src="dataflow.png" />

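Alternatively, you can copy my already preprocessed files and proceed to the next step. Note that the cells below read from `gs://${BUCKET}/tpu/imgclass/data`, so if you copy into `copied_data` as shown, pass that directory as `--data_dir` when you submit the training job (or copy into `data` instead):
<pre>
gsutil -m cp gs://cloud-training-demos/tpu/imgclass/data/* gs://${BUCKET}/tpu/imgclass/copied_data
</pre>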

In [None]:
%bash
gsutil ls gs://${BUCKET}/tpu/imgclass/data

## Train on the Cloud

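Get the AmoebaNet code from the tensorflow/tpu repository and package it up as a `trainer` Python package. Because the files will live inside a package, imports between them must become package-relative. This involves changing imports of the form:
<pre>
import amoeba_net_model as model_lib
</pre>
to
<pre>
from . import amoeba_net_model as model_lib
</pre>
Then, submit the package to Cloud ML Engine.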

In [100]:
%bash
#git clone https://github.com/tensorflow/tpu   # uncomment on the first run to fetch the model code
MODELCODE=tpu/models/experimental/amoeba_net
rm -rf tmp
mkdir -p tmp/trainer
touch tmp/trainer/__init__.py
for FILE in $(ls $MODELCODE); do
    CMD="cat $MODELCODE/$FILE "
    for f2 in $(ls $MODELCODE); do
        MODULE=$(echo $f2 | sed 's/\.py$//')   # strip only a literal trailing ".py"
        CMD="$CMD | sed 's/^import ${MODULE}/from . import ${MODULE}/g' "
    done
    CMD="$CMD > tmp/trainer/$FILE"
    eval $CMD
done
cp imgclass/setup.py tmp
find tmp

tmp
tmp/setup.py
tmp/trainer
tmp/trainer/amoeba_net.py
tmp/trainer/model_builder.py
tmp/trainer/amoeba_net_model.py
tmp/trainer/__init__.py
tmp/trainer/README.md
tmp/trainer/network_utils_test.py
tmp/trainer/inception_preprocessing.py
tmp/trainer/model_specs.py
tmp/trainer/network_utils.py
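
The import rewrite performed by the nested `sed` loop can also be sketched with Python's `re` module. This is an illustrative alternative, not the notebook's method; the function name `rewrite_imports` is hypothetical. One deliberate difference: the word boundary `\b` makes it match whole module names only, whereas the `sed` pattern matches any import that starts with a module name.

```python
import re

def rewrite_imports(source, module_names):
    """Turn top-level `import <module>` lines into package-relative imports."""
    if not module_names:
        return source
    # Longest names first so that e.g. network_utils_test wins over network_utils.
    ordered = sorted(module_names, key=len, reverse=True)
    alternatives = "|".join(re.escape(name) for name in ordered)
    pattern = re.compile(r"^import (" + alternatives + r")\b", flags=re.MULTILINE)
    return pattern.sub(r"from . import \1", source)

SOURCE = (
    "import os\n"
    "import amoeba_net_model as model_lib\n"
    "import model_builder\n"
)

rewritten = rewrite_imports(SOURCE, ["amoeba_net_model", "model_builder"])
print(rewritten)
```

Imports of modules that are not part of the package, such as `import os`, are left untouched.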


In [None]:
%bash

OUTDIR=gs://${BUCKET}/tpu/imgclass/trained
JOBNAME=imgclass_$(date -u +%y%m%d_%H%M%S)
echo $OUTDIR $REGION $JOBNAME
gsutil -m rm -rf $OUTDIR  # Comment out this line to continue training from the last time
gcloud ml-engine jobs submit training $JOBNAME \
  --region=$REGION \
  --module-name=trainer.amoeba_net \
  --package-path=$(pwd)/tmp/trainer \
  --job-dir=$OUTDIR \
  --staging-bucket=gs://$BUCKET \
  --scale-tier=BASIC_TPU \
  --runtime-version=$TFVERSION \
  -- \
  --data_dir=gs://${BUCKET}/tpu/imgclass/data \
  --model_dir=${OUTDIR} \
  --num_epochs=3

The above training job will take 30-40 minutes.
Wait for the job to finish before you proceed.
Navigate to the [Cloud ML Engine section of the GCP web console](https://console.cloud.google.com/mlengine)
to monitor job progress.
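
The submission cell builds the job name from a fixed prefix and a UTC timestamp, so that each submission gets a distinct name. The same naming scheme in Python might look like the sketch below; `make_job_name` is a hypothetical helper, and the fixed timestamp is used only to make the example reproducible.

```python
from datetime import datetime, timezone

def make_job_name(prefix="imgclass", now=None):
    """Build <prefix>_<yymmdd>_<HHMMSS>, matching `date -u +%y%m%d_%H%M%S`."""
    if now is None:
        now = datetime.now(timezone.utc)
    return "{}_{}".format(prefix, now.strftime("%y%m%d_%H%M%S"))

# Fixed timestamp for a reproducible example.
example = make_job_name(now=datetime(2018, 6, 8, 18, 32, 59, tzinfo=timezone.utc))
print(example)
```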

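<b> CRASHED </b> The training job above did not run to completion. The listing below shows what it left in the output directory.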

In [96]:
%bash
gsutil ls -l gs://${BUCKET}/tpu/imgclass/trained

         0  2018-06-08T18:32:59Z  gs://cloud-training-demos-ml/tpu/imgclass/trained/
 762846055  2018-06-08T18:42:52Z  gs://cloud-training-demos-ml/tpu/imgclass/trained/events.out.tfevents.1528482791.cmle-training-15752888831013639612
        40  2018-06-08T18:34:32Z  gs://cloud-training-demos-ml/tpu/imgclass/trained/events.out.tfevents.1528482872.n-426ff60d-w-0.v2
        40  2018-06-08T18:37:59Z  gs://cloud-training-demos-ml/tpu/imgclass/trained/events.out.tfevents.1528483078.n-426ff60d-w-0.v2
        40  2018-06-08T18:41:30Z  gs://cloud-training-demos-ml/tpu/imgclass/trained/events.out.tfevents.1528483290.n-426ff60d-w-0.v2
 145156989  2018-06-08T18:42:24Z  gs://cloud-training-demos-ml/tpu/imgclass/trained/graph.pbtxt
TOTAL: 6 objects, 908003164 bytes (865.94 MiB)


## Deploying and predicting with model [doesn't work yet]

Deploy the model:

In [None]:
%bash
MODEL_NAME="flowers"
MODEL_VERSION=amoeba
MODEL_LOCATION=gs://${BUCKET}/tpu/imgclass/trained
echo "Deleting and deploying $MODEL_NAME $MODEL_VERSION from $MODEL_LOCATION ... this will take a few minutes"
#gcloud ml-engine versions delete --quiet ${MODEL_VERSION} --model ${MODEL_NAME}
#gcloud ml-engine models delete ${MODEL_NAME}
#gcloud ml-engine models create ${MODEL_NAME} --regions $REGION
gcloud ml-engine versions create ${MODEL_VERSION} --model ${MODEL_NAME} --origin ${MODEL_LOCATION}

To predict with the model, let's take one of the example images available on Google Cloud Storage: <img src="http://storage.googleapis.com/cloud-ml-data/img/flower_photos/sunflowers/1022552002_2b93faf9e7_n.jpg" />

In [None]:
%writefile test.json
{"imageurl": "gs://cloud-ml-data/img/flower_photos/sunflowers/1022552002_2b93faf9e7_n.jpg"}
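
The `--json-instances` option of `gcloud ml-engine predict` reads newline-delimited JSON, one instance per line. When there are several images, generating the file from Python avoids hand-editing it. The following is a minimal sketch: `to_json_lines` is a hypothetical helper, and whether the deployed model actually accepts an `imageurl` field depends on its serving input function, which this notebook does not show.

```python
import json
import os
import tempfile

def to_json_lines(instances):
    """Serialize each instance as one JSON object per line."""
    return "".join(json.dumps(instance) + "\n" for instance in instances)

instances = [
    {"imageurl": "gs://cloud-ml-data/img/flower_photos/sunflowers/1022552002_2b93faf9e7_n.jpg"},
]

payload = to_json_lines(instances)

# Written to a temporary directory here; the notebook uses ./test.json.
out_path = os.path.join(tempfile.mkdtemp(), "test.json")
with open(out_path, "w") as f:
    f.write(payload)
print(payload, end="")
```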

Send it to the prediction service.

In [None]:
%bash
gcloud ml-engine predict --model=flowers --version=amoeba --json-instances=./test.json

<pre>
# Copyright 2017 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
</pre>