TF Feature Columns - embedding_column not supported? #105

jeisinge · 2020-04-24T22:57:26Z

We have a TensorFlow Estimator SavedModel. When it is compiled and run on the serving infrastructure, we get the following error:

2020-04-24 22:51:49.219277: W external/org_tensorflow/tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at lookup_table_op.cc:809 : Failed precondition: Table not initialized.

We believe this might be due to operations associated with https://www.tensorflow.org/versions/r1.15/api_docs/python/tf/feature_column/embedding_column or other feature columns. Are all TF Feature columns supported with Inf/Neuron?


mrnikwaws · 2020-04-25T02:23:44Z

Thanks for reporting this problem. We'll investigate and get back to you.

Does this problem occur when the model is run locally using TensorFlow 1.15, or only on the serving infrastructure?

jeisinge · 2020-04-25T03:17:10Z

The exported SavedModel works on TF-Serving. After compilation, it no longer serves on Neuron.

The compilation had the following warnings:

 2020-04-24 21:43:18.973556: I bazel-out/k8-opt/genfiles/tensorflow/python/neuron/convert/segment.cc:460] There are 364 ops of 27 different types in the graph that are not compiled by neuron-cc: Tile, Assert, ExpandDims, Switch, PlaceholderWithDefault, Range, ParseExample, Const, GatherV2, NoOp, OneHot, Placeholder, HashTableV2, SquaredDifference, LookupTableFindV2, AsString, LookupTableSizeV2, SparseFillEmptyRows, SelectV2, Merge, SparseReshape, StringToHashBucketFast, Where, ArgMax, Bucketize, SparseSegmentMean, Unique, (For more information see https://github.com/aws/aws-neuron-sdk/blob/master/release-notes/neuron-cc-ops/neuron-cc-ops-tensorflow.md).
...
INFO:tensorflow:Number of operations in TensorFlow session: 21918
INFO:tensorflow:Number of operations after tf.neuron optimizations: 11759
INFO:tensorflow:Number of operations placed on Neuron runtime: 202
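As a rough illustration of how to read the warning above, the following sketch pulls the op type names out of the message and filters for the lookup-table ops that stay off the Neuron device. The helper name uncompiled_ops and the shortened op list are assumptions made for this example; they are not part of the Neuron tooling.

```python
import re

# Abbreviated copy of the neuron-cc warning quoted above (op list shortened
# for readability; the real message lists 27 op types).
warning = (
    "There are 364 ops of 27 different types in the graph that are not "
    "compiled by neuron-cc: Tile, Assert, GatherV2, HashTableV2, "
    "LookupTableFindV2, AsString, LookupTableSizeV2, Unique, "
    "(For more information see https://github.com/aws/aws-neuron-sdk/blob/"
    "master/release-notes/neuron-cc-ops/neuron-cc-ops-tensorflow.md)."
)


def uncompiled_ops(message):
    """Return the op type names listed after 'not compiled by neuron-cc:'.

    Hypothetical helper written for this example only.
    """
    match = re.search(
        r"not compiled by neuron-cc:\s*(.*?),\s*\(For more", message)
    if match is None:
        return []
    return [name.strip() for name in match.group(1).split(",") if name.strip()]


ops = uncompiled_ops(warning)
# Ops whose type name mentions a table are the ones that need an initializer.
table_ops = [op for op in ops if "Table" in op]
print(table_ops)
```

If table ops such as HashTableV2 and LookupTableFindV2 appear in this list, they remain in the TensorFlow part of the graph, so their initializers still have to run there before the first lookup.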

mrnikwaws · 2020-04-27T16:56:07Z

Hi Jeisinge,

This looks like the same issue reported in this Stack Overflow post: https://stackoverflow.com/questions/44236090/how-to-keep-lookup-tables-initialized-for-prediction-and-not-just-training. Please refer to it for example code snippets.

Can you please try the following and let us know if the issue is resolved?

Add an initializer operation when you save the model, OR
If you are using the tf.estimator API, load the model in Python, add an initializer op, and re-save it as a new SavedModel.
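To make the role of the initializer op concrete, here is a minimal sketch, assuming a TensorFlow build that provides tensorflow.compat.v1 (1.15 or later). It only shows that a hash-table lookup fails until the table initializers have run; it does not reproduce the Estimator export or the Neuron compilation, and the function name run_lookup is made up for this example.

```python
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()  # graph-mode semantics, as in TensorFlow 1.15


def run_lookup(initialize):
    """Look up keys [1, 5] in a small table; return None if uninitialized.

    Hypothetical helper for illustration only.
    """
    with tf.Session(graph=tf.Graph()) as sess:
        table = tf.lookup.StaticHashTable(
            tf.lookup.KeyValueTensorInitializer(
                tf.constant([1, 2]), tf.constant([3.0, 4.0])),
            -1.0)
        lookup = table.lookup(tf.constant([1, 5]))
        if initialize:
            # Runs every op registered in the TABLE_INITIALIZERS collection.
            sess.run(tf.tables_initializer())
        try:
            return sess.run(lookup).tolist()
        except tf.errors.FailedPreconditionError:
            # Same failure class as "Table not initialized." in the report.
            return None


print(run_lookup(initialize=True))
print(run_lookup(initialize=False))
```

The first call returns the looked-up values (with the default for the missing key), while the second hits the failed precondition. Whatever serves the model has to trigger the equivalent of tf.tables_initializer() before the first request.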

jeisinge · 2020-04-27T18:41:23Z

I don't know if this is the same issue.

The poster there is not exporting the SavedModel correctly, so it does not run anywhere. Our SavedModel, by contrast, works well both in TensorFlow and on TensorFlow Serving. The high-level API also appears to be very different: that post uses tf.contrib.lookup in the model, while we use tf.feature_column.embedding_column in the Estimator. In particular, I believe Estimator handles the initialization for us, as the solution author notes:

NOTE: If you are using the high level libraries (such as tf.estimator) this should be the default

Why does our Estimator SavedModel work well on TensorFlow Serving, but fail once it is compiled for Neuron TF Serving?

mrnikwaws · 2020-04-28T21:28:47Z

Are we able to get some sample code (which does something minimal with the same problem), or can you share your code?

We currently run an optimization pass called convert_variables_to_constants, which may cancel some control edges. So your problem may be a combination of your code and our optimizations. However, without sample code we can only make best guesses.

The following script generates a model that contains a table lookup operator, and it works fine with Neuron. However, the reported error can be triggered by removing the with statement (with sess.graph.control_dependencies([table.initializer]):).

# table_lookup.py
import shutil
import tensorflow as tf
import tensorflow.neuron as tfn


with tf.Session(graph=tf.Graph()) as sess:
    keys_tensor = tf.constant([1, 2])
    vals_tensor = tf.constant([3.0, 4.0])
    input_tensor = tf.placeholder(tf.int32, [2])
    feed_dict = {input_tensor.name: [1, 5]}
    table = tf.lookup.StaticHashTable(
        tf.lookup.KeyValueTensorInitializer(keys_tensor, vals_tensor), -1.0)
    with sess.graph.control_dependencies([table.initializer]):
        lookup = table.lookup(input_tensor)
    tensor = lookup + 1.0
    out = tensor - 2.0

    print(sess.run(out, feed_dict))
    model_dir = './temp_model'
    shutil.rmtree(model_dir, ignore_errors=True)
    inputs = {input_tensor.name: input_tensor}
    outputs = {out.name: out}
    tf.saved_model.simple_save(sess, model_dir, inputs, outputs)

model_dir_neuron = './temp_model_neuron'
shutil.rmtree(model_dir_neuron, ignore_errors=True)
tfn.saved_model.compile(model_dir, model_dir_neuron)

with tf.Session(graph=tf.Graph()) as sess:
    meta_graph = tf.saved_model.loader.load(sess, ['serve'], model_dir_neuron)
    input_tensor = sess.graph.get_tensor_by_name(input_tensor.name)
    out = sess.graph.get_tensor_by_name(out.name)
    print(sess.run(out, feed_dict))

Regular output:

(newenv) [test@cdd examples]$ python table_lookup.py 
WARNING:tensorflow:From table_lookup.py:6: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

2020-04-28 20:25:44.972150: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2020-04-28 20:25:44.972190: E tensorflow/stream_executor/cuda/cuda_driver.cc:318] failed call to cuInit: UNKNOWN ERROR (303)
2020-04-28 20:25:44.972209: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (cdd): /proc/driver/nvidia/version does not exist
2020-04-28 20:25:44.972539: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-04-28 20:25:44.983396: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2300050000 Hz
2020-04-28 20:25:44.986656: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x36a8970 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-04-28 20:25:44.986683: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
WARNING:tensorflow:From table_lookup.py:9: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

[ 2. -2.]
WARNING:tensorflow:From table_lookup.py:23: simple_save (from tensorflow.python.saved_model.simple_save) is deprecated and will be removed in a future version.
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.simple_save.
WARNING:tensorflow:From /local/home/test/bert_neuron/newenv/lib64/python3.6/site-packages/tensorflow_core/python/saved_model/signature_def_utils_impl.py:201: build_tensor_info (from tensorflow.python.saved_model.utils_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.utils.build_tensor_info or tf.compat.v1.saved_model.build_tensor_info.
2020-04-28 20:25:45.703566: I bazel-out/k8-opt/genfiles/tensorflow/neuron/convert/segment.cc:460] There are 6 ops of 5 different types in the graph that are not compiled by neuron-cc: LookupTableImportV2, LookupTableFindV2, HashTableV2, NoOp, Placeholder, (For more information see https://github.com/aws/aws-neuron-sdk/blob/master/release-notes/neuron-cc-ops/neuron-cc-ops-tensorflow.md).
INFO:tensorflow:fusing subgraph neuron_op_794c60f0eaf84c4e with neuron-cc
INFO:tensorflow:Number of operations in TensorFlow session: 11
INFO:tensorflow:Number of operations after tf.neuron optimizations: 12
INFO:tensorflow:Number of operations placed on Neuron runtime: 4
INFO:tensorflow:Successfully converted ./temp_model to ./temp_model_neuron
WARNING:tensorflow:From table_lookup.py:30: load (from tensorflow.python.saved_model.loader_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.loader.load or tf.compat.v1.saved_model.load. There will be a new function for importing SavedModels in Tensorflow 2.0.
[ 2. -2.]

Output after taking out with statement:

2020-04-28 20:23:45.879770: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at lookup_table_op.cc:809 : Failed precondition: Table not initialized.
Traceback (most recent call last):
  File "/local/home/test/bert_neuron/newenv/lib64/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
    return fn(*args)
  File "/local/home/test/bert_neuron/newenv/lib64/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
    target_list, run_metadata)
  File "/local/home/test/bert_neuron/newenv/lib64/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.FailedPreconditionError: Table not initialized.
         [[{{node hash_table_Lookup/LookupTableFindV2}}]]

jeisinge · 2020-04-29T15:54:04Z

Unfortunately, our model is not easily extracted into sample code. If I get some time this weekend, I'll try to work up a minimal example.

Also, I noticed that this example doesn't use https://www.tensorflow.org/versions/r1.15/api_docs/python/tf/feature_column/embedding_column.

mrnikwaws · 2020-05-06T21:15:17Z

As discussed offline, a fix for this problem has been created and will appear in a future release. Please re-open this issue if you have concerns.

mrnikwaws closed this as completed May 6, 2020
