
Only extracting part of the intermediate feature with DataParallel #1

Open
wydwww opened this issue Mar 27, 2021 · 5 comments

wydwww commented Mar 27, 2021

Hi @antoinebrl,

I am using torch.nn.DataParallel on a 2-GPU machine with a batch size of N. Data parallel training splits the input batch into 2 pieces along the batch dimension and sends one piece to each GPU.

When using torchextractor to obtain the intermediate features, the input size and the output size are both N as expected, but the extracted feature size becomes N/2. Does this mean we only extract the features from one GPU? I'm not sure, because I couldn't find anything that exactly matches this case.

Can you please explain why this happens? Perhaps the expected behavior would be to return the features from all GPUs, or from a specified one?

A minimal example to reproduce:

import torch
import torchvision
import torchextractor as tx

model = torchvision.models.resnet18(pretrained=True)
model_gpu = torch.nn.DataParallel(torchvision.models.resnet18(pretrained=True))
model_gpu.cuda()

model = tx.Extractor(model, ["layer1"])
model_gpu = tx.Extractor(model_gpu, ["module.layer1"])
dummy_input = torch.rand(8, 3, 224, 224)
_, features = model(dummy_input)
_, features_gpu = model_gpu(dummy_input)
feature_shapes = {name: f.shape for name, f in features.items()}
print(feature_shapes)
feature_shapes_gpu = {name: f.shape for name, f in features_gpu.items()}
print(feature_shapes_gpu)

# {'layer1': torch.Size([8, 64, 56, 56])}
# {'module.layer1': torch.Size([4, 64, 56, 56])}
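
As a side note, my current understanding (please correct me if this is wrong) is that DataParallel replicates the wrapped module onto each device and runs the replicas in separate threads, each on half of the batch. The replicas share the forward hooks registered on the original submodules, so a hook that writes into a single shared dict fires once per replica and only the last write survives. A minimal sketch with a plain PyTorch hook, independent of torchextractor and assuming the same 2-GPU setup:

import torch
import torchvision

# Not torchextractor-specific: a plain forward hook shows the same behaviour.
model = torch.nn.DataParallel(torchvision.models.resnet18(pretrained=True))
model.cuda()

captured = {}

def hook(module, inputs, output):
    # Each replica calls this hook with its half of the batch; both threads
    # write to the same key, so only the last write remains.
    captured["layer1"] = output

model.module.layer1.register_forward_hook(hook)

model(torch.rand(8, 3, 224, 224))
print(captured["layer1"].shape)
# torch.Size([4, 64, 56, 56])  <- half the batch, from whichever replica wrote last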

wydwww commented Mar 29, 2021

I made a quick fix by changing

feature_maps[module_name] = output

to

feature_maps[str(input[0].device)][module_name] = output

and

self.feature_maps = {}

to

# nested dictionary
from collections import defaultdict
self.feature_maps = defaultdict(lambda: defaultdict(dict))

Now the test example will output the features from each device:

print(features_gpu['cuda:0']["module.layer1"].shape)
print(features_gpu['cuda:1']["module.layer1"].shape)

# torch.Size([4, 64, 56, 56])
# torch.Size([4, 64, 56, 56])

Can you please address this issue in torchextractor? Thanks.
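
For anyone who needs this before a fix lands in torchextractor, the same per-device idea also works with plain PyTorch forward hooks, without patching the library. A minimal sketch, again assuming a 2-GPU machine (make_hook is just an illustrative helper, not part of torchextractor):

from collections import defaultdict

import torch
import torchvision

model = torch.nn.DataParallel(torchvision.models.resnet18(pretrained=True))
model.cuda()

features = defaultdict(dict)  # {device: {layer_name: tensor}}

def make_hook(name):
    def hook(module, inputs, output):
        # Each replica runs on its own device, so the writes never collide.
        features[str(output.device)][name] = output
    return hook

model.module.layer1.register_forward_hook(make_hook("layer1"))
model(torch.rand(8, 3, 224, 224))

for device, maps in features.items():
    print(device, maps["layer1"].shape)
# cuda:0 torch.Size([4, 64, 56, 56])
# cuda:1 torch.Size([4, 64, 56, 56])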

antoinebrl (Owner) commented

Hi @wydwww!
Interesting issue. Thanks for reporting it with some code examples!
I will investigate and see how it behaves, and I will also check other distributed computation setups.

wydwww commented Mar 30, 2021

@antoinebrl Thanks. Please see my updated reply. I missed a line in my previous fix.

wydwww commented Jul 12, 2021

Gentle ping @antoinebrl.
Is there any update on distributed data parallel training? I read in the torch.nn.parallel.DistributedDataParallel documentation that

Forward and backward hooks defined on module and its submodules won’t be invoked anymore, unless the hooks are initialized in the forward() method.

I'm wondering whether torchextractor can work with DistributedDataParallel.

Thanks.
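
Not an answer on the torchextractor side, but for reference, one way to stay within the constraint quoted above is to set up the capture inside forward() and return the intermediate tensor together with the output; under DistributedDataParallel each process then gets the features for its own local shard of the batch. A minimal sketch (the WithFeatures wrapper, local_rank, and local_batch are placeholder names, not part of torchextractor):

import torch
import torchvision

class WithFeatures(torch.nn.Module):
    """Wraps a model and returns (output, intermediate feature of one layer)."""

    def __init__(self, model, layer_name):
        super().__init__()
        self.model = model
        self.layer_name = layer_name

    def forward(self, x):
        captured = {}
        layer = dict(self.model.named_modules())[self.layer_name]
        # The hook is created inside forward(), matching the DDP documentation note.
        handle = layer.register_forward_hook(
            lambda module, inputs, output: captured.update(feature=output)
        )
        try:
            out = self.model(x)
        finally:
            handle.remove()
        return out, captured["feature"]

# Typical per-process usage (local_rank and local_batch come from the launcher):
# torch.distributed.init_process_group("nccl")
# model = WithFeatures(torchvision.models.resnet18(pretrained=True), "layer1").cuda(local_rank)
# ddp_model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[local_rank])
# out, layer1_feature = ddp_model(local_batch)  # features cover only this rank's shard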
