
TF Feature Columns - embedding_column not supported? #105

Closed
jeisinge opened this issue Apr 24, 2020 · 7 comments

Comments

@jeisinge

We have a TensorFlow Estimator SavedModel. When it is compiled and run on the serving infrastructure, we get the following error:

2020-04-24 22:51:49.219277: W external/org_tensorflow/tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at lookup_table_op.cc:809 : Failed precondition: Table not initialized.

We believe this might be caused by operations associated with https://www.tensorflow.org/versions/r1.15/api_docs/python/tf/feature_column/embedding_column or other feature columns. Are all TF feature columns supported on Inferentia/Neuron?
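To see why lookup-table ops show up in an embedding_column model at all, here is a rough pure-Python approximation (not TensorFlow code) of what a vocabulary-based categorical column wrapped in embedding_column computes for a single example. It assumes a column built with something like categorical_column_with_vocabulary_list with no OOV buckets, plus the default combiner='mean'; the function names are hypothetical. The vocabulary lookup corresponds roughly to the HashTableV2/LookupTableFindV2 ops, and the averaging to SparseSegmentMean, all of which appear in the unsupported-op list from the compilation log later in this thread.

```python
from typing import Dict, List


def vocab_lookup(table: Dict[str, int], tokens: List[str], default_id: int = -1) -> List[int]:
    # Rough analogue of HashTableV2 + LookupTableFindV2: unknown keys get default_id.
    return [table.get(t, default_id) for t in tokens]


def embed_mean(ids: List[int], embeddings: List[List[float]]) -> List[float]:
    # Rough analogue of the 'mean' combiner (SparseSegmentMean):
    # invalid ids (< 0) are dropped and the remaining embedding rows are averaged.
    dim = len(embeddings[0])
    valid = [i for i in ids if i >= 0]
    if not valid:
        # Assumption: an example with no valid ids yields a zero vector.
        return [0.0] * dim
    return [sum(embeddings[i][d] for i in valid) / len(valid) for d in range(dim)]


vocab = {"a": 0, "b": 1}
emb = [[1.0, 2.0], [3.0, 4.0]]
print(embed_mean(vocab_lookup(vocab, ["a", "b", "z"]), emb))  # [2.0, 3.0]
```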

@mrnikwaws
Contributor

Thanks for reporting this problem. We'll investigate and get back to you.

Does this problem occur when run locally using TensorFlow 1.15, or only on the serving infrastructure?

@jeisinge
Author

The exported SavedModel works on TF-Serving. After compilation, it no longer serves on Neuron.

The compilation had the following warnings:

 2020-04-24 21:43:18.973556: I bazel-out/k8-opt/genfiles/tensorflow/python/neuron/convert/segment.cc:460] There are 364 ops of 27 different types in the graph that are not compiled by neuron-cc: Tile, Assert, ExpandDims, Switch, PlaceholderWithDefault, Range, ParseExample, Const, GatherV2, NoOp, OneHot, Placeholder, HashTableV2, SquaredDifference, LookupTableFindV2, AsString, LookupTableSizeV2, SparseFillEmptyRows, SelectV2, Merge, SparseReshape, StringToHashBucketFast, Where, ArgMax, Bucketize, SparseSegmentMean, Unique, (For more information see https://github.com/aws/aws-neuron-sdk/blob/master/release-notes/neuron-cc-ops/neuron-cc-ops-tensorflow.md).
...
INFO:tensorflow:Number of operations in TensorFlow session: 21918
INFO:tensorflow:Number of operations after tf.neuron optimizations: 11759
INFO:tensorflow:Number of operations placed on Neuron runtime: 202
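A compile log like this can be triaged with a small script. The helper below is hypothetical (not part of tensorflow-neuron) and assumes the log format shown in the lines quoted above. It extracts the op types left off neuron-cc, flags the lookup-table ones, and computes the share of ops placed on the Neuron runtime. For the counts above, that share is 202 / 11759, or roughly 1.7%.

```python
import re
from typing import Dict, List

# Lookup-table op types seen in the logs in this thread.
LOOKUP_OPS = {"HashTableV2", "LookupTableFindV2", "LookupTableSizeV2", "LookupTableImportV2"}


def unsupported_ops(log_line: str) -> List[str]:
    # Grab the comma-separated op list between "neuron-cc:" and "(For more information".
    m = re.search(r"not compiled by neuron-cc:(.*?)\(For more information", log_line)
    if not m:
        return []
    return [op.strip() for op in m.group(1).split(",") if op.strip()]


def lookup_related(ops: List[str]) -> List[str]:
    # Keep only the ops that belong to hash-table lookups.
    return [op for op in ops if op in LOOKUP_OPS]


def op_counts(log: str) -> Dict[str, int]:
    # Parse the three "Number of operations ..." INFO lines into a dict.
    pattern = (r"Number of operations (in TensorFlow session|after tf.neuron optimizations"
               r"|placed on Neuron runtime): (\d+)")
    return {label: int(n) for label, n in re.findall(pattern, log)}


def neuron_fraction(counts: Dict[str, int]) -> float:
    # Share of the optimized graph that actually runs on the Neuron runtime.
    return counts["placed on Neuron runtime"] / counts["after tf.neuron optimizations"]
```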

@mrnikwaws
Contributor

Hi Jeisinge,

It looks like this is the same issue reported in this stack overflow post: https://stackoverflow.com/questions/44236090/how-to-keep-lookup-tables-initialized-for-prediction-and-not-just-training. Please refer there for example code snippets.

Can you please try the following and let us know if the issue is resolved?

  • Add an initializer operation when you save the model OR
  • If you are using the tf.estimator API then load the model in python, add an initializer op and re-save it as a new SavedModel
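As a conceptual sketch of the first suggestion: a table lookup fails until its initializer has run, and an init op stored with the model lets the loader run that initializer once, at load time, before any request. In TF 1.x this is typically done by, for example, passing tf.tables_initializer() as the main_op argument of SavedModelBuilder.add_meta_graph_and_variables. The ToyTable and load_model names below are hypothetical stand-ins, not TensorFlow APIs.

```python
from typing import Callable, Dict, Hashable, List, Optional, Sequence


class ToyTable:
    """Stand-in for a TF hash table: lookups fail until initialize() has run."""

    def __init__(self, keys: Sequence[Hashable], values: Sequence[float], default: float):
        self._pending = dict(zip(keys, values))
        self._data: Optional[Dict[Hashable, float]] = None
        self.default = default

    def initialize(self) -> None:
        # Analogue of the table initializer op.
        self._data = dict(self._pending)

    def lookup(self, keys: Sequence[Hashable]) -> List[float]:
        if self._data is None:
            raise RuntimeError("Table not initialized.")
        return [self._data.get(k, self.default) for k in keys]


def load_model(table: ToyTable, init_ops: List[Callable[[], None]]) -> Callable:
    # Mimics a loader that runs the saved init op(s) once, then returns the serving function.
    for op in init_ops:
        op()
    return table.lookup


table = ToyTable([1, 2], [3.0, 4.0], -1.0)
serve = load_model(table, [table.initialize])
print(serve([1, 5]))  # [3.0, -1.0]
```

Without the init op in the list, the first call to the serving function raises the same "Table not initialized." message seen in the Neuron logs.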

@jeisinge
Author

I don't know if this is the same issue.

The poster is not exporting the SavedModel correctly: it does not run anywhere. Our SavedModel, however, works well on TensorFlow Serving and in TensorFlow. The high-level API also appears to be very different: that post uses tf.contrib.lookup directly in the model, whereas we use tf.feature_column.embedding_column in an Estimator. In particular, I believe the Estimator handles this for us, as the solution author notes:

NOTE: If you are using the high level libraries (such as tf.estimator) this should be the default

Why does our Estimator SavedModel work well on TensorFlow Serving, but fail after compilation on Neuron TF Serving?

@mrnikwaws
Contributor

Could you provide some sample code (a minimal example that reproduces the same problem), or share your code?

We currently run an optimization pass called convert_variables_to_constants, which may drop some control edges. So your problem may be caused by a combination of your code and our optimizations. However, without sample code we can only make educated guesses.
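The control-edge point can be illustrated with a toy dependency graph. This is not how convert_variables_to_constants is implemented; drop_control_edges is a hypothetical stand-in for any rewrite that loses control dependencies, and the node names are made up. A session run executes every ancestor of the fetched node, so once the control edge from the lookup to the table initializer disappears, the initializer is never reached and the lookup would hit an uninitialized table.

```python
from typing import Dict, List, Set, Tuple

# node name -> (data inputs, control inputs)
Graph = Dict[str, Tuple[List[str], List[str]]]


def executed_nodes(graph: Graph, fetch: str) -> Set[str]:
    # Every ancestor of the fetched node (via data or control edges) gets executed.
    seen: Set[str] = set()
    stack = [fetch]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        data, control = graph[node]
        stack.extend(data + control)
    return seen


def drop_control_edges(graph: Graph) -> Graph:
    # Hypothetical rewrite that keeps data edges but discards control edges.
    return {name: (list(data), []) for name, (data, _) in graph.items()}


graph: Graph = {
    "table": ([], []),
    "table_init": (["table"], []),
    "input": ([], []),
    "lookup": (["table", "input"], ["table_init"]),  # control dep on the initializer
    "out": (["lookup"], []),
}

print("table_init" in executed_nodes(graph, "out"))                      # True
print("table_init" in executed_nodes(drop_control_edges(graph), "out"))  # False
```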

The following script generates a model that contains a table lookup operator, and it works fine with Neuron. Nonetheless, the reported error can be triggered if the statement with sess.graph.control_dependencies([table.initializer]): is removed.

# table_lookup.py
import shutil
import tensorflow as tf
import tensorflow.neuron as tfn


with tf.Session(graph=tf.Graph()) as sess:
    keys_tensor = tf.constant([1, 2])
    vals_tensor = tf.constant([3.0, 4.0])
    input_tensor = tf.placeholder(tf.int32, [2])
    feed_dict = {input_tensor.name: [1, 5]}
    table = tf.lookup.StaticHashTable(  # keys absent from the table map to -1.0
        tf.lookup.KeyValueTensorInitializer(keys_tensor, vals_tensor), -1.0)
    with sess.graph.control_dependencies([table.initializer]):  # run the initializer before the lookup
        lookup = table.lookup(input_tensor)
    tensor = lookup + 1.0
    out = tensor - 2.0

    print(sess.run(out, feed_dict))
    model_dir = './temp_model'
    shutil.rmtree(model_dir, ignore_errors=True)
    inputs = {input_tensor.name: input_tensor}
    outputs = {out.name: out}
    tf.saved_model.simple_save(sess, model_dir, inputs, outputs)

model_dir_neuron = './temp_model_neuron'
shutil.rmtree(model_dir_neuron, ignore_errors=True)
tfn.saved_model.compile(model_dir, model_dir_neuron)  # compile the saved model for Neuron

with tf.Session(graph=tf.Graph()) as sess:
    meta_graph = tf.saved_model.loader.load(sess, ['serve'], model_dir_neuron)
    input_tensor = sess.graph.get_tensor_by_name(input_tensor.name)
    out = sess.graph.get_tensor_by_name(out.name)
    print(sess.run(out, feed_dict))
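The expected result of the script can be checked by hand. The sketch below replays the same arithmetic in plain Python (the function name is hypothetical): key 1 maps to 3.0, the missing key 5 falls back to the default -1.0, and the two elementwise steps then add 1.0 and subtract 2.0, which matches the [ 2. -2.] printed in the output.

```python
from typing import List, Sequence


def table_lookup_script(inputs: Sequence[int],
                        keys: Sequence[int] = (1, 2),
                        vals: Sequence[float] = (3.0, 4.0),
                        default: float = -1.0) -> List[float]:
    table = dict(zip(keys, vals))
    lookup = [table.get(k, default) for k in inputs]  # table.lookup(input_tensor)
    tensor = [x + 1.0 for x in lookup]                # lookup + 1.0
    return [x - 2.0 for x in tensor]                  # tensor - 2.0


print(table_lookup_script([1, 5]))  # [2.0, -2.0]
```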

Regular output:

(newenv) [test@cdd examples]$ python table_lookup.py 
WARNING:tensorflow:From table_lookup.py:6: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

2020-04-28 20:25:44.972150: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2020-04-28 20:25:44.972190: E tensorflow/stream_executor/cuda/cuda_driver.cc:318] failed call to cuInit: UNKNOWN ERROR (303)
2020-04-28 20:25:44.972209: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (cdd): /proc/driver/nvidia/version does not exist
2020-04-28 20:25:44.972539: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-04-28 20:25:44.983396: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2300050000 Hz
2020-04-28 20:25:44.986656: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x36a8970 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-04-28 20:25:44.986683: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
WARNING:tensorflow:From table_lookup.py:9: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

[ 2. -2.]
WARNING:tensorflow:From table_lookup.py:23: simple_save (from tensorflow.python.saved_model.simple_save) is deprecated and will be removed in a future version.
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.simple_save.
WARNING:tensorflow:From /local/home/test/bert_neuron/newenv/lib64/python3.6/site-packages/tensorflow_core/python/saved_model/signature_def_utils_impl.py:201: build_tensor_info (from tensorflow.python.saved_model.utils_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.utils.build_tensor_info or tf.compat.v1.saved_model.build_tensor_info.
2020-04-28 20:25:45.703566: I bazel-out/k8-opt/genfiles/tensorflow/neuron/convert/segment.cc:460] There are 6 ops of 5 different types in the graph that are not compiled by neuron-cc: LookupTableImportV2, LookupTableFindV2, HashTableV2, NoOp, Placeholder, (For more information see https://github.com/aws/aws-neuron-sdk/blob/master/release-notes/neuron-cc-ops/neuron-cc-ops-tensorflow.md).
INFO:tensorflow:fusing subgraph neuron_op_794c60f0eaf84c4e with neuron-cc
INFO:tensorflow:Number of operations in TensorFlow session: 11
INFO:tensorflow:Number of operations after tf.neuron optimizations: 12
INFO:tensorflow:Number of operations placed on Neuron runtime: 4
INFO:tensorflow:Successfully converted ./temp_model to ./temp_model_neuron
WARNING:tensorflow:From table_lookup.py:30: load (from tensorflow.python.saved_model.loader_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.loader.load or tf.compat.v1.saved_model.load. There will be a new function for importing SavedModels in Tensorflow 2.0.
[ 2. -2.]

Output after removing the with statement:

2020-04-28 20:23:45.879770: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at lookup_table_op.cc:809 : Failed precondition: Table not initialized.
Traceback (most recent call last):
  File "/local/home/test/bert_neuron/newenv/lib64/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
    return fn(*args)
  File "/local/home/test/bert_neuron/newenv/lib64/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
    target_list, run_metadata)
  File "/local/home/test/bert_neuron/newenv/lib64/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.FailedPreconditionError: Table not initialized.
         [[{{node hash_table_Lookup/LookupTableFindV2}}]]

@jeisinge
Author

Unfortunately, our model is not easily extracted into sample code. If I get some time this weekend, I'll try to put together a minimal example.

Also, I noticed that this example doesn't use https://www.tensorflow.org/versions/r1.15/api_docs/python/tf/feature_column/embedding_column .

@mrnikwaws
Contributor

As discussed offline, a fix for this problem has been created and will appear in a future release. Please re-open this issue if you have concerns.
