
Training unbalance on different GPUs? #8

Open
Epiphqny opened this issue Dec 2, 2019 · 2 comments

Comments

Epiphqny commented Dec 2, 2019

I used 8 GPUs to train the model, but most of the memory is placed on the first GPU and I cannot fully utilize the other GPUs. Is there any solution? Thanks!
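
For reference, a minimal sketch of the kind of single-process nn.DataParallel loop being described (the model, data, and loss below are toy placeholders, not the actual training code). With DataParallel the per-GPU outputs are gathered back onto device_ids[0], so GPU 0 holds the gathered outputs and the loss on top of its own replica, which is what causes the imbalance:

import torch
import torch.nn as nn

# toy stand-ins so the snippet runs on its own; replace with the real model and data loader
model = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(), nn.Conv2d(64, 19, 1))
loader = [(torch.randn(16, 3, 128, 128), torch.randint(0, 19, (16, 128, 128))) for _ in range(2)]

device = torch.device("cuda:0")
model = nn.DataParallel(model.to(device), device_ids=[0, 1, 2, 3, 4, 5, 6, 7])
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for images, labels in loader:
    images = images.to(device)
    labels = labels.to(device)
    optimizer.zero_grad()
    outputs = model(images)            # forward pass runs on all 8 GPUs...
    loss = loss_fn(outputs, labels)    # ...but the gathered outputs and the loss live on cuda:0
    loss.backward()
    optimizer.step()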

Epiphqny changed the title from "training unbalanced on different GPUs?" to "Training unbalance on different GPUs?" on Dec 2, 2019
PingoLH (Owner) commented Dec 2, 2019

Hello, good question! I've also faced this problem before. You can try this one:
# distribute the model replicas over the first 7 GPUs (device is the default GPU, cuda:0)
model = torch.nn.DataParallel(model, device_ids=[0, 1, 2, 3, 4, 5, 6])
images = images.to(device)
# send the labels to the last GPU
labels = labels.to(device).cuda(7)
optimizer.zero_grad()
# move the gathered outputs to the last GPU as well, so the loss is computed there
outputs = model(images).cuda(7)
# after computing the loss, send the loss back to GPU 0 for backpropagation
loss = loss_fn(input=outputs, target=labels).cuda(0)
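
Expanded into a complete training step (again a sketch only, with toy placeholders for the model, loader, loss_fn, and optimizer), the idea is to keep the replicas on GPUs 0-6 and push the labels, the gathered outputs, and the loss computation onto the otherwise idle GPU 7:

import torch
import torch.nn as nn

# toy stand-ins so the snippet runs on its own; replace with the real model and data loader
model = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(), nn.Conv2d(64, 19, 1))
loader = [(torch.randn(14, 3, 128, 128), torch.randint(0, 19, (14, 128, 128))) for _ in range(2)]

device = torch.device("cuda:0")                     # DataParallel gathers outputs here by default
model = nn.DataParallel(model.to(device), device_ids=[0, 1, 2, 3, 4, 5, 6])  # replicas on GPUs 0-6 only
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for images, labels in loader:
    images = images.to(device)
    labels = labels.cuda(7)                         # targets go straight to the spare GPU
    optimizer.zero_grad()
    outputs = model(images).cuda(7)                 # move the gathered outputs from GPU 0 to GPU 7
    loss = loss_fn(input=outputs, target=labels)    # loss is computed on GPU 7
    loss = loss.cuda(0)                             # send the scalar loss back to GPU 0, as suggested above
    loss.backward()                                 # gradients still flow back through GPUs 0-6
    optimizer.step()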

Epiphqny (Author) commented Dec 3, 2019

I have tried this code, but the memory of GPU 7 still limits the batch size, and the memory of the other GPUs cannot be fully utilized, so there is not much point in using multiple GPUs...
