error in multi-gpu distributed training #12

yppr · 2022-08-22T08:00:53Z

Hi, I can run the code on a single GPU. However, multi-GPU distributed training fails with the following error:
Traceback (most recent call last):
  File "train_spatial_query.py", line 538, in <module>
    train(args, loader, generator, discriminator, g_optim, d_optim, g_ema, device, tensorboard_writer, args.exp_name)
  File "train_spatial_query.py", line 235, in train
    fake_img, latents, mean_path_length
  File "train_spatial_query.py", line 97, in g_path_regularize
    grad, = autograd.grad(outputs=tmp, inputs=latents, create_graph=True)
  File "/home/nrr/.conda/envs/stylegan/lib/python3.7/site-packages/torch/autograd/__init__.py", line 236, in grad
    inputs, allow_unused, accumulate_grad=False)
RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.

I am training on four RTX 3080 GPUs with PyTorch 1.10.2. Could you kindly help me solve this problem? Thanks.
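
For anyone hitting the same message, here is a minimal sketch of what this RuntimeError means in isolation. It does not reproduce the repository's training loop or diagnose why it only appears in the distributed run. The tensors `latents` and `used` below are hypothetical stand-ins, with `latents` deliberately left out of the computation of `tmp`.

```python
import torch
from torch import autograd

# Hypothetical stand-ins, not the repository's tensors: `used` contributes
# to `tmp`, while `latents` never enters the computation graph of `tmp`.
latents = torch.randn(4, 8, requires_grad=True)
used = torch.randn(4, 8, requires_grad=True)
tmp = (used * 2.0).sum()

# Requesting a gradient w.r.t. a tensor that is not part of the graph
# raises the RuntimeError shown in the traceback.
try:
    autograd.grad(outputs=tmp, inputs=latents, create_graph=True)
    message = ""
except RuntimeError as exc:
    message = str(exc)
print(message)

# With allow_unused=True the call succeeds, and the gradient for the
# unused input comes back as None instead of a tensor.
grad_used, grad_latents = autograd.grad(
    outputs=tmp,
    inputs=[used, latents],
    create_graph=True,
    allow_unused=True,
)
print(grad_latents)
```

Note that `allow_unused=True` only silences the error. The returned gradient is `None`, so any code that goes on to use it (as a path-length penalty presumably would) still has to handle that case. Whether that is the appropriate change for `g_path_regularize` is an open question here, not something the traceback settles.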
