error in multi-gpu distributed training #12

Open
yppr opened this issue Aug 22, 2022 · 0 comments

yppr commented Aug 22, 2022

Hi, I can run the code on a single GPU. However, multi-GPU distributed training fails with the following error:
Traceback (most recent call last):
  File "train_spatial_query.py", line 538, in <module>
    train(args, loader, generator, discriminator, g_optim, d_optim, g_ema, device, tensorboard_writer, args.exp_name)
  File "train_spatial_query.py", line 235, in train
    fake_img, latents, mean_path_length
  File "train_spatial_query.py", line 97, in g_path_regularize
    grad, = autograd.grad(outputs=tmp, inputs=latents, create_graph=True)
  File "/home/nrr/.conda/envs/stylegan/lib/python3.7/site-packages/torch/autograd/__init__.py", line 236, in grad
    inputs, allow_unused, accumulate_grad=False)
RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.

I am training the model on four RTX 3080 GPUs with PyTorch 1.10.2. Could you kindly help me solve the problem? Thanks.
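
For anyone hitting the same message, the following is a minimal sketch of the mechanism behind it, independent of this repository. It makes no claim about why `latents` ends up disconnected in `train_spatial_query.py` under distributed training; the helper `grad_wrt_latents` and the toy tensors are hypothetical and exist only for illustration. It shows that `torch.autograd.grad` raises this `RuntimeError` when the requested input did not contribute to the output, and that `allow_unused=True` returns `None` for that input instead.

```python
import torch
from torch import autograd


def grad_wrt_latents(outputs, latents, allow_unused=False):
    """Hypothetical helper: gradient of `outputs` with respect to `latents`.

    Returns None when `latents` is not part of the graph and
    allow_unused=True; raises RuntimeError in that case otherwise.
    """
    (grad,) = autograd.grad(
        outputs=outputs,
        inputs=latents,
        create_graph=True,
        allow_unused=allow_unused,
    )
    return grad


# Toy tensors standing in for the real latents and generator weights.
latents = torch.randn(2, 4, requires_grad=True)
weight = torch.randn(4, 3, requires_grad=True)

# Case 1: the output depends on the latents, so a gradient exists.
grad_used = grad_wrt_latents((latents @ weight).sum(), latents)

# Case 2: the output never touches the latents, which triggers the
# error reported in this issue.
try:
    grad_wrt_latents(weight.sum(), latents)
    raised = False
except RuntimeError:
    raised = True

# Case 3: same disconnected output, but unused inputs are tolerated.
grad_unused = grad_wrt_latents(weight.sum(), latents, allow_unused=True)
```

Note that `allow_unused=True` only suppresses the exception. Any code that goes on to use the gradient still has to handle `None`, so it is usually more useful to find out why the tensor passed as `inputs` is not the one the output was computed from.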
