topological sort failed #9

Closed
YeTianJHU opened this issue Nov 5, 2018 · 6 comments

@YeTianJHU

YeTianJHU commented Nov 5, 2018

When I tried to generate GIFs from a pre-trained model on the BAIR dataset, I got this error:

evaluation samples from 0 to 8
2018-11-05 13:18:39.651172: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:675] Iteration = 0, topological sort failed with message: The graph couldn't be sorted in topological order.
2018-11-05 13:18:39.658586: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:675] Iteration = 1, topological sort failed with message: The graph couldn't be sorted in topological order.
2018-11-05 13:18:41.185743: E tensorflow/stream_executor/cuda/cuda_dnn.cc:352] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
Segmentation fault (core dumped)

I'm using TensorFlow 1.10.1, CUDA 9.0, and cuDNN 7.1 on Ubuntu 16.04. CUDA and cuDNN are installed properly. I have two NVIDIA Titan V GPUs. I also tried TensorFlow 1.6.0, with the same result. Do you have any idea what causes this error? Thanks in advance!

Update: I also tried cuDNN 7.0.5 but still have this problem. Thanks!

@Glooow1024

I ran into the same problem when trying to train a savp model. Have you solved this issue?

@YeTianJHU
Author

No. I think it may be related to a cuDNN version issue. Still working on it.

@alexlee-gk
Owner

The first two errors about topological order don't seem to be the issue, and you can probably ignore them.

As you said, the problem is likely cuDNN. According to here, you should use cuDNN SDK >= 7.2. Can you upgrade cuDNN?
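
For anyone checking an installation against the cuDNN >= 7.2 requirement mentioned above, here is a minimal sketch in Python. It assumes the installed version can be read from the `CUDNN_MAJOR`, `CUDNN_MINOR`, and `CUDNN_PATCHLEVEL` macros in the cuDNN header (often `cudnn.h` under the CUDA include directory, though the location varies by install). The helper names `parse_cudnn_version` and `meets_minimum` and the sample header text are hypothetical, made up for illustration; they are not part of this repository or of TensorFlow.

```python
import re

# Hypothetical sample of the version macros as they appear in a cuDNN header.
# On a real machine, read the text of your own header file instead.
SAMPLE_HEADER = """
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 1
#define CUDNN_PATCHLEVEL 4
"""


def parse_cudnn_version(header_text):
    """Return (major, minor, patch) parsed from cuDNN header text."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        pattern = r"^\s*#define\s+" + name + r"\s+(\d+)"
        match = re.search(pattern, header_text, re.MULTILINE)
        if match is None:
            raise ValueError("macro not found in header text: " + name)
        parts.append(int(match.group(1)))
    return tuple(parts)


def meets_minimum(version, minimum=(7, 2)):
    """Return True if a version tuple is at least the minimum (default 7.2)."""
    # Pad the minimum with zeros so (7, 2) compares as (7, 2, 0).
    padded = tuple(minimum) + (0,) * (len(version) - len(minimum))
    return tuple(version) >= padded


if __name__ == "__main__":
    found = parse_cudnn_version(SAMPLE_HEADER)
    print(found, "meets 7.2 requirement:", meets_minimum(found))
```

With the versions reported in this thread, 7.1 and 7.0.5 fail this check and 7.3.1 passes. Passing the check only says the version number is high enough; it does not confirm that TensorFlow was built against, or is loading, that cuDNN library.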

@YeTianJHU
Author

Problem solved. I needed to reinstall TensorFlow after upgrading cuDNN. Many thanks!!

@Glooow1024

Hi Alex,

I use TensorFlow 1.11.0, CUDA 9.0.176, and cuDNN 7.3.1 on Ubuntu 16.04. My GPUs are NVIDIA Titan Xp. When I tried to train a savp model with the command

CUDA_VISIBLE_DEVICES=0,1 python scripts/train.py --input_dir data/bair --dataset bair \
  --model savp --model_hparams_dict hparams/bair_action_free/ours_savp/model_hparams.json \
  --output_dir logs/bair_action_free/ours_savp \
  --gpu_mem_frac 0.7

the program seems to hang and never makes progress, as if stuck in an endless loop, after printing

2018-11-08 08:14:24.668709: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:666] Iteration = 0, topological sort failed with message: The graph couldn't be sorted in topological order.
2018-11-08 08:14:25.028003: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:666] Iteration = 1, topological sort failed with message: The graph couldn't be sorted in topological order.

What could be the reason for this?
More information is below:

2018-11-08 08:13:09.286148: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:666] Iteration = 0, topological sort failed with message: The graph couldn't be sorted in topological order.
2018-11-08 08:13:09.515356: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:666] Iteration = 1, topological sort failed with message: The graph couldn't be sorted in topological order.
session.run took 22.4s
recording summary
done
recording image summary
done
progress  global step 0  epoch 0  step 2560
discrim_video_sn_gan_loss (1.0238764, 1.0)
discrim_video_sn_vae_gan_loss (0.895689, 1.0)
gen_l1_loss (0.0804427, 100.0)
gen_video_sn_gan_loss (1.0158763, 1.0)
gen_video_sn_vae_gan_loss (0.8958128, 1.0)
gen_kl_loss (0.045274347, 0.0)
learning_rate 0.0002
saving model to logs/bair_action_free/ours_savp
done
2018-11-08 08:14:24.668709: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:666] Iteration = 0, topological sort failed with message: The graph couldn't be sorted in topological order.
2018-11-08 08:14:25.028003: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:666] Iteration = 1, topological sort failed with message: The graph couldn't be sorted in topological order.

@datianshi21

Same problem here, please help
2019-05-14 15:09:28.745521: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:704] Iteration = 0, topological sort failed with message: The graph couldn't be sorted in topological order.
2019-05-14 15:09:30.421876: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:704] Iteration = 1, topological sort failed with message: The graph couldn't be sorted in topological order.
2019-05-14 15:13:00.040767: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.9.2 locally
INFO:tensorflow:loss = 0.7136042, step = 0
2019-05-14 15:13:45.954118: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:704] Iteration = 0, topological sort failed with message: The graph couldn't be sorted in topological order.
2019-05-14 15:13:47.336729: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:704] Iteration = 1, topological sort failed with message: The graph couldn't be sorted in topological order.
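
Since the owner's reply earlier in this thread suggests the topological sort messages can probably be ignored, a small filter can help spot any other error lines in a long log. The following is a rough sketch in Python, assuming the log uses the timestamped `LEVEL file.cc:line] message` layout seen in the output pasted in this issue. The function names `split_entries` and `other_errors` are hypothetical, and the sample text is copied from the first comment.

```python
import re

# Start of a TensorFlow C++ log entry, e.g. "2018-11-05 13:18:39.651172: E ".
ENTRY_START = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+: [IWEF] ")

# Full entry: timestamp, level letter, source file, line number, message.
ENTRY = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+: "
    r"(?P<level>[IWEF]) (?P<source>\S+):\d+\] (?P<message>.*)$",
    re.DOTALL,
)

# Sample log text taken from the first comment in this thread.
SAMPLE_LOG = (
    "2018-11-05 13:18:39.651172: E tensorflow/core/grappler/optimizers/"
    "dependency_optimizer.cc:675] Iteration = 0, topological sort failed "
    "with message: The graph couldn't be sorted in topological order.\n"
    "2018-11-05 13:18:39.658586: E tensorflow/core/grappler/optimizers/"
    "dependency_optimizer.cc:675] Iteration = 1, topological sort failed "
    "with message: The graph couldn't be sorted in topological order.\n"
    "2018-11-05 13:18:41.185743: E tensorflow/stream_executor/cuda/"
    "cuda_dnn.cc:352] Could not create cudnn handle: "
    "CUDNN_STATUS_INTERNAL_ERROR\n"
)


def split_entries(text):
    """Split log text into entries, even if they were pasted on one line.

    Text before the first timestamp is dropped; untimestamped text after an
    entry stays attached to that entry.
    """
    starts = [match.start() for match in ENTRY_START.finditer(text)]
    bounds = starts + [len(text)]
    return [text[a:b].strip() for a, b in zip(bounds, bounds[1:])]


def other_errors(text):
    """Return E/F-level entries other than the topological sort message."""
    kept = []
    for entry in split_entries(text):
        match = ENTRY.match(entry)
        if match is None or match.group("level") not in ("E", "F"):
            continue
        is_topo_sort = (
            "dependency_optimizer.cc" in match.group("source")
            and "topological sort failed" in match.group("message")
        )
        if not is_topo_sort:
            kept.append(entry)
    return kept


if __name__ == "__main__":
    for line in other_errors(SAMPLE_LOG):
        print(line)
```

On the sample above, only the `cuda_dnn.cc` entry about the cuDNN handle remains after filtering. If nothing remains, as appears to be the case for the logs in the later comments here, the cause of a hang probably needs to be looked for somewhere other than these messages.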
