tf.neuron.saved_model.compile fails with batch_size > 1 #60

Closed · tahouse opened this issue Jan 8, 2020 · 2 comments

@tahouse commented Jan 8, 2020

Following the tutorial here, compilation succeeds with batch_size=1, but when I change batch_size to 2 in the script below, compilation fails.

import tensorflow as tf


tf.keras.backend.set_learning_phase(0)
tf.keras.backend.set_image_data_format('channels_last')
model = tf.keras.applications.ResNet50(weights='imagenet')
sess = tf.keras.backend.get_session()
inputs = {'input': model.inputs[0]}
outputs = {'output': model.outputs[0]}

# save the model using tf.saved_model.simple_save
modeldir = "./resnet50/1"
tf.saved_model.simple_save(sess, modeldir, inputs, outputs)

# compile the model for Inferentia
neuron_modeldir = "./resnet50_inf2/1"
tf.neuron.saved_model.compile(modeldir, neuron_modeldir, batch_size=2)

Output from compilation...

$ python compile.py
WARNING:tensorflow:From /home/ubuntu/anaconda3/envs/aws_neuron_tensorflow_p36/lib/python3.6/site-packages/tensorflow_core/python/ops/resource_variable_ops.py:1630: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
2020-01-08 20:10:40.144936: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA
2020-01-08 20:10:40.150407: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3000005000 Hz
2020-01-08 20:10:40.151658: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x563e0cf6be80 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-01-08 20:10:40.151678: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
WARNING:tensorflow:From compile.py:7: The name tf.keras.backend.get_session is deprecated. Please use tf.compat.v1.keras.backend.get_session instead.

WARNING:tensorflow:From compile.py:14: simple_save (from tensorflow.python.saved_model.simple_save) is deprecated and will be removed in a future version.
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.simple_save.
WARNING:tensorflow:From /home/ubuntu/anaconda3/envs/aws_neuron_tensorflow_p36/lib/python3.6/site-packages/tensorflow_core/python/saved_model/signature_def_utils_impl.py:201: build_tensor_info (from tensorflow.python.saved_model.utils_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.utils.build_tensor_info or tf.compat.v1.saved_model.build_tensor_info.
WARNING:tensorflow:From /home/ubuntu/anaconda3/envs/aws_neuron_tensorflow_p36/lib/python3.6/site-packages/tensorflow_core/python/neuron/python/saved_model.py:136: load (from tensorflow.python.saved_model.loader_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.loader.load or tf.compat.v1.saved_model.load. There will be a new function for importing SavedModels in Tensorflow 2.0.
INFO:tensorflow:fusing subgraph neuron_op_d6f098c01c780733 with neuron-cc
WARNING:tensorflow:Failed to fuse subgraph neuron_op_d6f098c01c780733 with '/home/ubuntu/anaconda3/envs/aws_neuron_tensorflow_p36/bin/neuron-cc compile /tmp/tmpjj5xgykv/neuron_op_d6f098c01c780733/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmpjj5xgykv/neuron_op_d6f098c01c780733/graph_def.neff --io-config "{\"inputs\": {\"input_10/_0:0\": [[2, 224, 224, 3], \"float32\"]}, \"outputs\": [\"probs/Softmax:0\"]}"'
INFO:tensorflow:Number of operations in TensorFlow session: 4638
INFO:tensorflow:Number of operations after tf.neuron optimizations: 555
INFO:tensorflow:Number of operations placed on Neuron runtime: 0
INFO:tensorflow:Successfully converted ./resnet50/1 to ./resnet50_inf2/1

The following are the latest versions of the Neuron packages, running on a c5.9xlarge instance with DLAMI v26 (Ubuntu):

(aws_neuron_tensorflow_p36) ubuntu@ip-172-31-0-4:~$ pip list | grep neuron
neuron-cc                          1.0.5939.0+5849551057
tensorboard-neuron                 1.15.0.1.0.315.0
tensorflow-neuron                  1.15.0.1.0.803.0
You are using pip version 10.0.1, however version 19.3.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.

(aws_neuron_tensorflow_p36) ubuntu@ip-172-31-0-4:~$ apt list | grep neuron

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

aws-neuron-runtime/unknown,now 1.0.4751.0 amd64 [installed]
aws-neuron-runtime-base/unknown,now 1.0.4587.0 amd64 [installed]
aws-neuron-tools/unknown,now 1.0.4587.0 amd64 [installed]
tensorflow-model-server-neuron/unknown,now 1.15.0.1.0.803.0 all [installed]

I updated the Neuron packages following the DLAMI release notes:

#!/bin/bash

sudo apt-get update
sudo apt-get -y install aws-neuron-runtime-base
sudo apt-get -y install aws-neuron-runtime
sudo apt-get -y install aws-neuron-tools
sudo apt-get -y install tensorflow-model-server-neuron

source activate aws_neuron_tensorflow_p36
conda install numpy=1.17.2 --yes --quiet
conda update tensorflow-neuron

Is this the most up-to-date API for setting batch_size? https://github.com/aws/aws-neuron-sdk/blob/master/docs/tensorflow-neuron/api-compilation-python-api.md

@aws-taylor (Contributor) commented:

Hello Tyler,

We have been able to reproduce this issue on our end and have opened a ticket internally. In general, we support batch sizes up to 4; however, this particular network triggered an edge case in our memory allocator. We will work to fix this in an upcoming release.

Regards,
Taylor
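
For reference, the following is a minimal sketch of probing which batch sizes compile for this model, using only the tf.neuron.saved_model.compile call from the script above (the output paths are hypothetical):

import tensorflow as tf

# Try compiling the same SavedModel at the batch sizes mentioned above (1-4).
# As the log in this issue shows, a failed fuse is reported as a WARNING rather
# than an exception, so check the "Number of operations placed on Neuron runtime"
# line in each run's output to see whether that batch size actually compiled.
modeldir = "./resnet50/1"
for batch_size in (1, 2, 3, 4):
    neuron_modeldir = "./resnet50_inf_bs%d/1" % batch_size  # hypothetical output path
    tf.neuron.saved_model.compile(modeldir, neuron_modeldir, batch_size=batch_size)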

@jeffhataws (Contributor) commented:

Due to current compiler limitations, batching behaves differently for FP32 than for FP16, because FP32 takes more memory to store the input data; as a result, only ResNet50 FP32 batch=1 is compilable. Please see https://github.com/aws/aws-neuron-sdk/blob/master/docs/technotes/performance-tuning.md for more information on how to increase the batch size with an FP16 graph.
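
As a rough sketch of that suggestion (assuming the pretrained FP32 weights cast cleanly to FP16 when loaded; the linked technote may recommend a different casting method, and the FP16 paths below are hypothetical), one way to build and compile an FP16 ResNet50 with a larger batch size is:

import tensorflow as tf

tf.keras.backend.set_learning_phase(0)
tf.keras.backend.set_floatx('float16')                    # build the whole graph in FP16
tf.keras.backend.set_image_data_format('channels_last')

# The pretrained FP32 weights are cast to the graph dtype as they are loaded.
model = tf.keras.applications.ResNet50(weights='imagenet')
sess = tf.keras.backend.get_session()
inputs = {'input': model.inputs[0]}
outputs = {'output': model.outputs[0]}

# save the FP16 model, then compile it for Inferentia with a larger batch size
modeldir = "./resnet50_fp16/1"                            # hypothetical path
tf.saved_model.simple_save(sess, modeldir, inputs, outputs)

neuron_modeldir = "./resnet50_fp16_inf/1"                 # hypothetical path
tf.neuron.saved_model.compile(modeldir, neuron_modeldir, batch_size=4)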
