# Flowers Image Classification with TensorFlow on Cloud ML Engine TPU

This notebook demonstrates how to do image classification from scratch on a flowers dataset using the Estimator API. Unlike [the flowers_fromscratch notebook](flowers_fromscratch.ipynb), here we train on a TPU.

Therefore, this will work only if you have quota for TPUs (Qwiklabs accounts do not). Trying it out will cost about $3.

In [None]:
%%bash
pip install apache-beam[gcp]

After the pip install completes, click Reset Session so that the Python environment picks up the new package.

In [155]:
import os
PROJECT = 'cloud-training-demos' # REPLACE WITH YOUR PROJECT ID
BUCKET = 'cloud-training-demos-ml' # REPLACE WITH YOUR BUCKET NAME
REGION = 'us-central1' # REPLACE WITH YOUR BUCKET REGION e.g. us-central1
MODEL_TYPE = 'tpu'

# do not change these
os.environ['PROJECT'] = PROJECT
os.environ['BUCKET'] = BUCKET
os.environ['REGION'] = REGION
os.environ['MODEL_TYPE'] = MODEL_TYPE
os.environ['TFVERSION'] = '1.8'  # TensorFlow version

In [127]:
%%bash
gcloud config set project $PROJECT
gcloud config set compute/region $REGION

Updated property [core/project].
Updated property [compute/region].


## Preprocess JPEG images to TF Records

With a GPU, it is okay to read the JPEGs directly in our input_fn. A TPU, however, consumes batches so quickly that it would spend much of its time waiting on I/O, which is wasteful. Therefore, we'll preprocess the JPEGs into TF Records.

This runs on Cloud Dataflow and will take <b>15-20 minutes</b>.

In [3]:
%%bash
gcloud storage cat gs://cloud-ml-data/img/flower_photos/train_set.csv  | sed 's/,/ /g' | awk '{print $2}' | sort | uniq > /tmp/labels.txt
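
The shell pipeline above collects the distinct values of the second CSV column into a labels file. As a rough Python equivalent, here is a minimal sketch that assumes each row has the form `image_path,label`, as the `awk '{print $2}'` step implies. The helper name `unique_labels` and the sample rows are hypothetical and used only for illustration.

```python
import csv
import io


def unique_labels(csv_text):
    """Return the sorted, de-duplicated values of the second CSV column."""
    reader = csv.reader(io.StringIO(csv_text))
    # Skip blank rows and rows that lack a label column.
    return sorted({row[1].strip() for row in reader if len(row) >= 2})


# Hypothetical rows in the assumed "image_path,label" layout.
sample = (
    "gs://example-bucket/a.jpg,daisy\n"
    "gs://example-bucket/b.jpg,tulips\n"
    "gs://example-bucket/c.jpg,daisy\n"
)
labels = unique_labels(sample)
print(labels)
```

Each label appears once in the result, which is what `sort | uniq` produces in the shell version.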

In [37]:
%%bash
gcloud storage cat gs://cloud-ml-data/img/flower_photos/train_set.csv | wc -l
gcloud storage cat gs://cloud-ml-data/img/flower_photos/eval_set.csv | wc -l

3300
370


In [None]:
%%bash
export PYTHONPATH=${PYTHONPATH}:${PWD}/flowersmodeltpu
gcloud storage rm --recursive --continue-on-error gs://${BUCKET}/tpu/flowers/data
python -m trainer.preprocess \
       --train_csv gs://cloud-ml-data/img/flower_photos/train_set.csv \
       --validation_csv gs://cloud-ml-data/img/flower_photos/eval_set.csv \
       --labels_file /tmp/labels.txt \
       --project_id $PROJECT \
       --output_dir gs://${BUCKET}/tpu/flowers/data

In [6]:
%%bash
gcloud storage ls gs://${BUCKET}/tpu/flowers/data/

gs://cloud-training-demos-ml/tpu/flowers/data/train-00000-of-00008
gs://cloud-training-demos-ml/tpu/flowers/data/train-00001-of-00008
gs://cloud-training-demos-ml/tpu/flowers/data/train-00002-of-00008
gs://cloud-training-demos-ml/tpu/flowers/data/train-00003-of-00008
gs://cloud-training-demos-ml/tpu/flowers/data/train-00004-of-00008
gs://cloud-training-demos-ml/tpu/flowers/data/train-00005-of-00008
gs://cloud-training-demos-ml/tpu/flowers/data/train-00006-of-00008
gs://cloud-training-demos-ml/tpu/flowers/data/train-00007-of-00008
gs://cloud-training-demos-ml/tpu/flowers/data/validation-00000-of-00003
gs://cloud-training-demos-ml/tpu/flowers/data/validation-00001-of-00003
gs://cloud-training-demos-ml/tpu/flowers/data/validation-00002-of-00003
gs://cloud-training-demos-ml/tpu/flowers/data/tmp/
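
The listing shows the output split into shards named `prefix-SSSSS-of-NNNNN`, and the training commands later select them with wildcards such as `train-*`. The sketch below reproduces that naming scheme and checks which names a wildcard would pick up. It uses Python's `fnmatch` purely as a local stand-in for wildcard expansion; how the trainer expands the pattern is not shown in this notebook, and `shard_names` is a hypothetical helper.

```python
import fnmatch


def shard_names(prefix, num_shards):
    """Build names in the prefix-SSSSS-of-NNNNN layout shown in the listing."""
    return ["{}-{:05d}-of-{:05d}".format(prefix, i, num_shards)
            for i in range(num_shards)]


train = shard_names("train", 8)
validation = shard_names("validation", 3)
everything = train + validation + ["tmp/"]

# A wildcard such as train-* selects only the training shards.
matched = fnmatch.filter(everything, "train-*")
print(matched[0], matched[-1], len(matched))
```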


## Run as a Python module

First, run locally without --use_tpu. Don't be concerned if the process gets killed for using too much memory.

In [None]:
%%bash
WITHOUT_TPU="--train_batch_size=2  --train_steps=5"
OUTDIR=./flowers_trained
rm -rf $OUTDIR
export PYTHONPATH=${PYTHONPATH}:${PWD}/flowersmodeltpu
python -m flowersmodeltpu.task \
   --output_dir=$OUTDIR \
   --num_train_images=3300 \
   --num_eval_images=370 \
   $WITHOUT_TPU \
   --learning_rate=0.01 \
   --project=${PROJECT} \
   --train_data_path=gs://${BUCKET}/tpu/flowers/data/train* \
   --eval_data_path=gs://${BUCKET}/tpu/flowers/data/validation*

Then, run it on Cloud ML Engine with --use_tpu:

In [153]:
%%bash
WITH_TPU="--train_batch_size=256  --train_steps=3000 --batch_norm --use_tpu"
WITHOUT_TPU="--train_batch_size=2  --train_steps=5"
OUTDIR=gs://${BUCKET}/flowers/trained_${MODEL_TYPE}_delete
JOBNAME=flowers_${MODEL_TYPE}_$(date -u +%y%m%d_%H%M%S)
echo $OUTDIR $REGION $JOBNAME
gcloud storage rm --recursive --continue-on-error $OUTDIR
gcloud ml-engine jobs submit training $JOBNAME \
   --region=$REGION \
   --module-name=flowersmodeltpu.task \
   --package-path=${PWD}/flowersmodeltpu \
   --job-dir=$OUTDIR \
   --staging-bucket=gs://$BUCKET \
   --scale-tier=BASIC_TPU \
   --runtime-version=$TFVERSION \
   -- \
   --output_dir=$OUTDIR \
   --num_train_images=3300 \
   --num_eval_images=370 \
   $WITH_TPU \
   --learning_rate=0.01 \
   --project=${PROJECT} \
   --train_data_path=gs://${BUCKET}/tpu/flowers/data/train-* \
   --eval_data_path=gs://${BUCKET}/tpu/flowers/data/validation-*

gs://cloud-training-demos-ml/flowers/trained_tpu_delete us-central1 flowers_tpu_180827_182402
jobId: flowers_tpu_180827_182402
state: QUEUED


CommandException: 1 files/objects could not be removed.
Job [flowers_tpu_180827_182402] submitted successfully.
Your job is still active. You may view the status of your job with the command

  $ gcloud ml-engine jobs describe flowers_tpu_180827_182402

or continue streaming the logs with the command

  $ gcloud ml-engine jobs stream-logs flowers_tpu_180827_182402
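
To get a feel for how much training the flags above request, the following back-of-the-envelope sketch converts batch size and step count into passes over the 3300 training images. It assumes each training step consumes exactly `train_batch_size` images in total, which this notebook does not confirm; `epochs_seen` is a hypothetical helper.

```python
def epochs_seen(batch_size, train_steps, num_train_images):
    """Passes over the training set implied by the flags (assumed semantics)."""
    return batch_size * train_steps / float(num_train_images)


# Values taken from the WITH_TPU and WITHOUT_TPU flag strings.
tpu_epochs = epochs_seen(256, 3000, 3300)
local_epochs = epochs_seen(2, 5, 3300)

print(round(tpu_epochs, 1))  # about 232.7 passes over the data
print(local_epochs)          # a tiny fraction of one pass
```

Under that assumption, the local run touches only 10 images and serves purely as a smoke test, while the TPU job makes a few hundred passes over the training set.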


In [154]:
%%bash
MODEL_LOCATION=$(gcloud storage ls gs://${BUCKET}/flowers/trained_${MODEL_TYPE}/export/exporter | tail -1)
saved_model_cli show --dir $MODEL_LOCATION --all


MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:

signature_def['classes']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['image_bytes'] tensor_info:
        dtype: DT_STRING
        shape: ()
        name: Placeholder:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['class'] tensor_info:
        dtype: DT_STRING
        shape: (1)
        name: GatherV2:0
    outputs['classid'] tensor_info:
        dtype: DT_INT32
        shape: (1)
        name: Cast_1:0
    outputs['probabilities'] tensor_info:
        dtype: DT_FLOAT
        shape: (1, 5)
        name: Softmax:0
  Method name is: tensorflow/serving/predict

signature_def['serving_default']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['image_bytes'] tensor_info:
        dtype: DT_STRING
        shape: ()
        name: Placeholder:0
  The given SavedModel SignatureDef contains the following output(s):
    


## Monitoring training with TensorBoard

Use this cell to launch TensorBoard:

In [None]:
from google.datalab.ml import TensorBoard
TensorBoard().start('gs://{}/flowers/trained_{}'.format(BUCKET, MODEL_TYPE))

In [None]:
for pid in TensorBoard.list()['pid']:
  TensorBoard().stop(pid)
  print('Stopped TensorBoard with pid {}'.format(pid))

## Deploying and predicting with model

Deploy the model:

In [158]:
%%bash
MODEL_NAME="flowers"
MODEL_VERSION=${MODEL_TYPE}
MODEL_LOCATION=$(gcloud storage ls gs://${BUCKET}/flowers/trained_${MODEL_TYPE}/export/exporter | tail -1)
echo "Deleting and deploying $MODEL_NAME $MODEL_VERSION from $MODEL_LOCATION ... this will take a few minutes"
#gcloud ml-engine versions delete --quiet ${MODEL_VERSION} --model ${MODEL_NAME}
#gcloud ml-engine models delete ${MODEL_NAME}
#gcloud ml-engine models create ${MODEL_NAME} --regions $REGION
gcloud alpha ml-engine versions create ${MODEL_VERSION} --machine-type mls1-c4-m4 --model ${MODEL_NAME} --origin ${MODEL_LOCATION} --runtime-version=$TFVERSION

Deleting and deploying flowers tpu from gs://cloud-training-demos-ml/flowers/trained_tpu/export/exporter/1534800612/ ... this will take a few minutes


Creating version (this might take a few minutes)......
........................................................................................................done.


To predict with the model, let's take one of the example images available on Google Cloud Storage: <img src="http://storage.googleapis.com/cloud-ml-data/img/flower_photos/sunflowers/1022552002_2b93faf9e7_n.jpg" />

In [None]:
%%bash
gcloud alpha ml-engine models list

The online prediction service expects images to be base64 encoded as described [here](https://cloud.google.com/ml-engine/docs/tensorflow/online-predict#binary_data_in_prediction_input).

In [None]:
%%bash
IMAGE_URL=gs://cloud-ml-data/img/flower_photos/sunflowers/1022552002_2b93faf9e7_n.jpg

# Copy the image to local disk.
gcloud storage cp $IMAGE_URL flower.jpg

# Base64 encode and create request message in json format.
# Redirect only stdout so that any warnings on stderr cannot corrupt the JSON.
python -c 'import base64, sys, json; img = base64.b64encode(open("flower.jpg", "rb").read()).decode(); print(json.dumps({"image_bytes":{"b64": img}}))' > request.json
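
The one-liner above is easier to follow when written out. This sketch builds the same `{"image_bytes": {"b64": ...}}` structure, where `image_bytes` matches the input name reported by `saved_model_cli`, and then decodes it again as a sanity check. The helper names and the stand-in bytes are hypothetical; a real request would use the contents of `flower.jpg`.

```python
import base64
import json


def make_request_line(image_bytes):
    """Return one JSON instance with the image wrapped as {"b64": ...}."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return json.dumps({"image_bytes": {"b64": encoded}})


def decode_request_line(line):
    """Recover the raw image bytes from a request line (for checking)."""
    return base64.b64decode(json.loads(line)["image_bytes"]["b64"])


# Stand-in bytes; a real request would read flower.jpg in binary mode.
fake_image = b"\xff\xd8\xff\xe0 not a real JPEG"
line = make_request_line(fake_image)
assert decode_request_line(line) == fake_image
```

Round-tripping the payload like this is a cheap way to confirm that the file holds valid JSON and intact image bytes before it is sent.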

Send it to the prediction service:

In [162]:
%%bash
gcloud ml-engine predict \
  --model=flowers \
  --version=${MODEL_TYPE} \
  --json-instances=./request.json

ERROR: (gcloud.ml-engine.predict) HTTP request failed. Response: {
  "error": {
    "code": 500,
    "message": "Internal error encountered.",
    "status": "INTERNAL",
    "details": [
      {
        "@type": "type.googleapis.com/google.rpc.DebugInfo",
        "detail": "[ORIGINAL ERROR] generic::internal: Prediction failed."
      }
    ]
  }
}



<pre>
# Copyright 2017 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
</pre>