[Auto Graph Fusion] An error occurred when Auto Graph Fusion enabled in modelzoo's DIEN. #144

Closed
Duyi-Wang opened this issue Apr 1, 2022 · 1 comment


Duyi-Wang commented Apr 1, 2022

An error occurs when Auto Graph Fusion is enabled in modelzoo's DIEN.

Reproduce the issue
The code and dataset are provided in a Docker image: docker pull cesg-prc-registry.cn-beijing.cr.aliyuncs.com/cesg-ali/deeprec-modelzoo:220401-dien-issue
The DeepRec installed in the image is built from commit f4368d6.
Run the following commands to reproduce the issue.

cd /root/modelzoo/DIEN
python train.py --steps 100 --no_eval --op_fusion True

Other info / logs

2022-04-01 02:58:35.554337: I ./tensorflow/core/graph/template_select_pruning_base.h:70] Found match op by select_pruning_else_const head/gradients/head/loss/xentropy/Select_grad/zeros_like
2022-04-01 02:58:35.554414: I ./tensorflow/core/graph/template_select_pruning_base.h:77] remove node: head/gradients/head/loss/xentropy/Select_grad/Select_1
2022-04-01 02:58:35.554462: I ./tensorflow/core/graph/template_select_pruning_base.h:77] remove node: head/gradients/head/loss/xentropy/Select_grad/tuple/control_dependency_1
2022-04-01 02:58:35.554552: I ./tensorflow/core/graph/template_select_pruning_base.h:70] Found match op by select_pruning_else_const head/gradients/attention_layer/Select_grad/zeros_like
2022-04-01 02:58:35.554612: I ./tensorflow/core/graph/template_select_pruning_base.h:77] remove node: head/gradients/attention_layer/Select_grad/Select_1
2022-04-01 02:58:35.554668: I ./tensorflow/core/graph/template_select_pruning_base.h:77] remove node: head/gradients/attention_layer/Select_grad/tuple/control_dependency_1
2022-04-01 02:58:35.554933: I ./tensorflow/core/graph/template_select_pruning_base.h:70] Found match op by select_pruning_then_const head/gradients/input_layer/input_layer/UID_embedding/UID_embedding_weights_grad/zeros_like
2022-04-01 02:58:35.554993: I ./tensorflow/core/graph/template_select_pruning_base.h:77] remove node: head/gradients/input_layer/input_layer/UID_embedding/UID_embedding_weights_grad/Select
2022-04-01 02:58:35.555030: I ./tensorflow/core/graph/template_select_pruning_base.h:77] remove node: head/gradients/input_layer/input_layer/UID_embedding/UID_embedding_weights_grad/tuple/control_dependency
2022-04-01 02:58:35.555062: I ./tensorflow/core/graph/template_select_pruning_base.h:70] Found match op by select_pruning_then_const head/gradients/input_layer/embedding_lookup_4_grad/zeros_like
2022-04-01 02:58:35.555117: I ./tensorflow/core/graph/template_select_pruning_base.h:77] remove node: head/gradients/input_layer/embedding_lookup_4_grad/Select
2022-04-01 02:58:35.555166: I ./tensorflow/core/graph/template_select_pruning_base.h:77] remove node: head/gradients/input_layer/embedding_lookup_4_grad/tuple/control_dependency
2022-04-01 02:58:35.555187: I ./tensorflow/core/graph/template_select_pruning_base.h:70] Found match op by select_pruning_then_const head/gradients/input_layer/embedding_lookup_5_grad/zeros_like
2022-04-01 02:58:35.555234: I ./tensorflow/core/graph/template_select_pruning_base.h:77] remove node: head/gradients/input_layer/embedding_lookup_5_grad/Select
2022-04-01 02:58:35.555279: I ./tensorflow/core/graph/template_select_pruning_base.h:77] remove node: head/gradients/input_layer/embedding_lookup_5_grad/tuple/control_dependency
2022-04-01 02:58:35.555318: I ./tensorflow/core/graph/template_select_pruning_base.h:70] Found match op by select_pruning_then_const head/gradients/rnn_1/gru1/while/Select_grad/zeros_like
2022-04-01 02:58:35.555383: I ./tensorflow/core/graph/template_select_pruning_base.h:77] remove node: head/gradients/rnn_1/gru1/while/Select_grad/Select
2022-04-01 02:58:35.555449: I ./tensorflow/core/graph/template_select_pruning_base.h:77] remove node: head/gradients/rnn_1/gru1/while/Select_grad/tuple/control_dependency
2022-04-01 02:58:35.555466: I ./tensorflow/core/graph/template_select_pruning_base.h:70] Found match op by select_pruning_then_const head/gradients/input_layer/embedding_lookup_grad/zeros_like
2022-04-01 02:58:35.555530: I ./tensorflow/core/graph/template_select_pruning_base.h:77] remove node: head/gradients/input_layer/embedding_lookup_grad/Select
2022-04-01 02:58:35.555594: I ./tensorflow/core/graph/template_select_pruning_base.h:77] remove node: head/gradients/input_layer/embedding_lookup_grad/tuple/control_dependency
2022-04-01 02:58:35.555610: I ./tensorflow/core/graph/template_select_pruning_base.h:70] Found match op by select_pruning_then_const head/gradients/input_layer/embedding_lookup_1_grad/zeros_like
2022-04-01 02:58:35.555673: I ./tensorflow/core/graph/template_select_pruning_base.h:77] remove node: head/gradients/input_layer/embedding_lookup_1_grad/Select
2022-04-01 02:58:35.555737: I ./tensorflow/core/graph/template_select_pruning_base.h:77] remove node: head/gradients/input_layer/embedding_lookup_1_grad/tuple/control_dependency
2022-04-01 02:58:35.555764: I ./tensorflow/core/graph/template_select_pruning_base.h:70] Found match op by select_pruning_then_const head/gradients/input_layer/embedding_lookup_2_grad/zeros_like
2022-04-01 02:58:35.555842: I ./tensorflow/core/graph/template_select_pruning_base.h:77] remove node: head/gradients/input_layer/embedding_lookup_2_grad/Select
2022-04-01 02:58:35.555920: I ./tensorflow/core/graph/template_select_pruning_base.h:77] remove node: head/gradients/input_layer/embedding_lookup_2_grad/tuple/control_dependency
2022-04-01 02:58:35.555937: I ./tensorflow/core/graph/template_select_pruning_base.h:70] Found match op by select_pruning_then_const head/gradients/input_layer/embedding_lookup_3_grad/zeros_like
2022-04-01 02:58:35.556015: I ./tensorflow/core/graph/template_select_pruning_base.h:77] remove node: head/gradients/input_layer/embedding_lookup_3_grad/Select
2022-04-01 02:58:35.556092: I ./tensorflow/core/graph/template_select_pruning_base.h:77] remove node: head/gradients/input_layer/embedding_lookup_3_grad/tuple/control_dependency
2022-04-01 02:58:35.556208: I ./tensorflow/core/graph/template_select_base.h:36] Fusion template[select_then_scalar] match op[input_layer/input_layer/UID_embedding/UID_embedding_weights]
2022-04-01 02:58:35.556260: I ./tensorflow/core/graph/template_select_base.h:36] Fusion template[select_then_scalar] match op[input_layer/embedding_lookup]
2022-04-01 02:58:35.556278: I ./tensorflow/core/graph/template_select_base.h:36] Fusion template[select_then_scalar] match op[input_layer/embedding_lookup_1]
2022-04-01 02:58:35.556294: I ./tensorflow/core/graph/template_select_base.h:36] Fusion template[select_then_scalar] match op[input_layer/embedding_lookup_2]
2022-04-01 02:58:35.556312: I ./tensorflow/core/graph/template_select_base.h:36] Fusion template[select_then_scalar] match op[input_layer/embedding_lookup_3]
2022-04-01 02:58:35.556330: I ./tensorflow/core/graph/template_select_base.h:36] Fusion template[select_then_scalar] match op[input_layer/embedding_lookup_4]
2022-04-01 02:58:35.556346: I ./tensorflow/core/graph/template_select_base.h:36] Fusion template[select_then_scalar] match op[input_layer/embedding_lookup_5]
2022-04-01 02:58:35.556676: I ./tensorflow/core/graph/template_select_base.h:36] Fusion template[select_else_scalar] match op[head/loss/xentropy/Select]
2022-04-01 02:58:35.556988: I ./tensorflow/core/graph/template_select_base.h:36] Fusion template[select_else_scalar_in_grad] match op[head/gradients/head/loss/xentropy/Select_grad/Select]
2022-04-01 02:58:35.557014: I ./tensorflow/core/graph/template_select_base.h:36] Fusion template[select_else_scalar_in_grad] match op[head/gradients/head/loss/xentropy/Select_1_grad/Select]
2022-04-01 02:58:35.557041: I ./tensorflow/core/graph/template_select_base.h:36] Fusion template[select_else_scalar_in_grad] match op[head/gradients/rnn_2/gru2/while/Select_1_grad/Select]
2022-04-01 02:58:35.557072: I ./tensorflow/core/graph/template_select_base.h:36] Fusion template[select_else_scalar_in_grad] match op[head/gradients/attention_layer/Select_grad/Select]
2022-04-01 02:58:35.557095: I ./tensorflow/core/graph/template_select_base.h:36] Fusion template[select_else_scalar_in_grad] match op[head/gradients/rnn_1/gru1/while/Select_1_grad/Select]
2022-04-01 02:58:35.557373: I ./tensorflow/core/graph/template_select_base.h:36] Fusion template[select_then_scalar_in_grad] match op[head/gradients/head/loss/xentropy/Select_1_grad/Select_1]
2022-04-01 02:58:35.557415: I ./tensorflow/core/graph/template_select_base.h:36] Fusion template[select_then_scalar_in_grad] match op[head/gradients/input_layer/input_layer/UID_embedding/UID_embedding_weights_grad/Select_1]
2022-04-01 02:58:35.557431: I ./tensorflow/core/graph/template_select_base.h:36] Fusion template[select_then_scalar_in_grad] match op[head/gradients/rnn_2/gru2/while/Select_1_grad/Select_1]
2022-04-01 02:58:35.557453: I ./tensorflow/core/graph/template_select_base.h:36] Fusion template[select_then_scalar_in_grad] match op[head/gradients/input_layer/embedding_lookup_4_grad/Select_1]
2022-04-01 02:58:35.557466: I ./tensorflow/core/graph/template_select_base.h:36] Fusion template[select_then_scalar_in_grad] match op[head/gradients/input_layer/embedding_lookup_5_grad/Select_1]
2022-04-01 02:58:35.557495: I ./tensorflow/core/graph/template_select_base.h:36] Fusion template[select_then_scalar_in_grad] match op[head/gradients/rnn_1/gru1/while/Select_1_grad/Select_1]
2022-04-01 02:58:35.557509: I ./tensorflow/core/graph/template_select_base.h:36] Fusion template[select_then_scalar_in_grad] match op[head/gradients/rnn_1/gru1/while/Select_grad/Select_1]
2022-04-01 02:58:35.557523: I ./tensorflow/core/graph/template_select_base.h:36] Fusion template[select_then_scalar_in_grad] match op[head/gradients/input_layer/embedding_lookup_grad/Select_1]
2022-04-01 02:58:35.557536: I ./tensorflow/core/graph/template_select_base.h:36] Fusion template[select_then_scalar_in_grad] match op[head/gradients/input_layer/embedding_lookup_1_grad/Select_1]
2022-04-01 02:58:35.557560: I ./tensorflow/core/graph/template_select_base.h:36] Fusion template[select_then_scalar_in_grad] match op[head/gradients/input_layer/embedding_lookup_2_grad/Select_1]
2022-04-01 02:58:35.557572: I ./tensorflow/core/graph/template_select_base.h:36] Fusion template[select_then_scalar_in_grad] match op[head/gradients/input_layer/embedding_lookup_3_grad/Select_1]
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
2022-04-01 02:58:37.395205: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:583] function_optimizer failed: Invalid argument: {{node head/gradients/rnn_2/gru2/while/add_1_grad/Reshape}} has inputs from different frames. The input {{node head/gradients/rnn_2/gru2/while/add_1_grad/BroadcastGradientArgs/StackPopV2}} is in frame 'head/gradients/rnn_2/gru2/while/while_context'. The input {{node head/gradients/rnn_2/gru2/while/add_1_grad/Sum}} is in frame ''.
2022-04-01 02:58:37.878245: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:583] function_optimizer failed: Invalid argument: {{node head/gradients/rnn_2/gru2/while/Switch_3_grad/b_switch}} has inputs from different frames. The input {{node head/gradients/rnn_2/gru2/while/Switch_3_grad_1/NextIteration}} is in frame ''. The input {{node head/gradients/rnn_2/gru2/while/Exit_3_grad/b_exit}} is in frame 'head/gradients/rnn_2/gru2/while/while_context'.
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
    target_list, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: {{node fused_op_3_select_else_scalar_in_grad}} has inputs from different frames. The input {{node head/gradients/rnn_2/gru2/while/Select_1_grad/Select/StackPopV2}} is in frame 'head/gradients/rnn_2/gru2/while/while_context'. The input {{node head/clip_by_norm_25/Greater/y}} is in frame ''.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train.py", line 1147, in <module>
    main()
  File "train.py", line 927, in main
    checkpoint_dir, tf_config, server)
  File "train.py", line 786, in train
    sess.run([model.loss, model.train_op])
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 804, in run
    run_metadata=run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 1309, in run
    run_metadata=run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 1410, in run
    raise six.reraise(*original_exc_info)
  File "/usr/local/lib/python3.6/dist-packages/six.py", line 719, in reraise
    raise value
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 1395, in run
    return self._sess.run(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 1468, in run
    run_metadata=run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 1226, in run
    return self._sess.run(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 956, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1180, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run
    run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: {{node fused_op_3_select_else_scalar_in_grad}} has inputs from different frames. The input node head/gradients/rnn_2/gru2/while/Select_1_grad/Select/StackPopV2 (defined at /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1748)  is in frame 'head/gradients/rnn_2/gru2/while/while_context'. The input node head/clip_by_norm_25/Greater/y (defined at /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1748)  is in frame ''.
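For readers unfamiliar with the "has inputs from different frames" message, here is a minimal sketch of the kind of consistency check that appears to be failing. It is a toy model, not TensorFlow's or DeepRec's implementation: the `Node` class, `check_frames`, `build_graph`, and the shortened node names are all hypothetical, and frames are assigned by hand instead of being inferred from Enter/Exit nodes. The assumption illustrated is that the fused node reads one input produced inside the while loop and one constant from the root frame directly, without that constant passing through an Enter node.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    name: str
    frame: str  # '' stands for the root frame (outside any while loop)
    inputs: List[str] = field(default_factory=list)


def check_frames(nodes: Dict[str, Node]) -> List[str]:
    """Return one message per node whose inputs live in more than one frame."""
    errors = []
    for node in nodes.values():
        frames = {nodes[name].frame for name in node.inputs}
        if len(frames) > 1:
            errors.append(
                f"{node.name} has inputs from different frames: {sorted(frames)}"
            )
    return errors


def build_graph(via_enter: bool) -> Dict[str, Node]:
    """Build a toy graph; names are shortened stand-ins for those in the log."""
    loop = "while_context"
    nodes = [
        Node("stack_pop", frame=loop),   # value produced inside the loop
        Node("greater_y", frame=""),     # scalar constant in the root frame
    ]
    if via_enter:
        # An Enter-like node carries the constant into the loop frame.
        nodes.append(Node("enter_y", frame=loop, inputs=["greater_y"]))
        fused_inputs = ["stack_pop", "enter_y"]
    else:
        # The fused node reads the root-frame constant directly.
        fused_inputs = ["stack_pop", "greater_y"]
    nodes.append(Node("fused_op", frame=loop, inputs=fused_inputs))
    return {n.name: n for n in nodes}


if __name__ == "__main__":
    for via_enter in (False, True):
        print(via_enter, check_frames(build_graph(via_enter)))
```

In this toy model only the graph built with `via_enter=False` is flagged, which mirrors the shape of the error above: one input in 'head/gradients/rnn_2/gru2/while/while_context' and one in frame ''.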
@liutongxuan
Member

Fixed by #153
