
Only extracting part of the intermediate feature with DataParallel #1

Open
wydwww opened this issue Mar 27, 2021 · 5 comments

wydwww commented Mar 27, 2021

Hi @antoinebrl,

I am using torch.nn.DataParallel on a 2-GPU machine with a batch size of N. Data parallel training splits the input batch into 2 pieces along the batch dimension and sends one piece to each GPU.

When using torchextractor to obtain the intermediate features, the input size and the output size are both N as expected, but the extracted feature size becomes N/2. Does this mean we only extract the features from one GPU? I'm not sure, because I couldn't find anything that exactly matches this case.

Can you please explain why this happens? Perhaps the expected behavior would be to return the features from all GPUs, or from a specified one?

A minimal example to reproduce:

import torch
import torchvision
import torchextractor as tx

model = torchvision.models.resnet18(pretrained=True)
model_gpu = torch.nn.DataParallel(torchvision.models.resnet18(pretrained=True))
model_gpu.cuda()

model = tx.Extractor(model, ["layer1"])
model_gpu = tx.Extractor(model_gpu, ["module.layer1"])
dummy_input = torch.rand(8, 3, 224, 224)
_, features = model(dummy_input)
_, features_gpu = model_gpu(dummy_input)
feature_shapes = {name: f.shape for name, f in features.items()}
print(feature_shapes)
feature_shapes_gpu = {name: f.shape for name, f in features_gpu.items()}
print(feature_shapes_gpu)

# {'layer1': torch.Size([8, 64, 56, 56])}
# {'module.layer1': torch.Size([4, 64, 56, 56])}
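
As a side note, my current understanding (please correct me if this is wrong) is that DataParallel replicates the wrapped module onto each device and runs the replicas in separate threads, each on half of the batch. The replicas share the forward hooks registered on the original submodules, so a hook that writes into a single shared dict fires once per replica and only the last write survives. A minimal sketch with a plain PyTorch hook, independent of torchextractor and assuming the same 2-GPU setup:

import torch
import torchvision

# Not torchextractor-specific: a plain forward hook shows the same behaviour.
model = torch.nn.DataParallel(torchvision.models.resnet18(pretrained=True))
model.cuda()

captured = {}

def hook(module, inputs, output):
    # Each replica calls this hook with its half of the batch; both threads
    # write to the same key, so only the last write remains.
    captured["layer1"] = output

model.module.layer1.register_forward_hook(hook)

model(torch.rand(8, 3, 224, 224))
print(captured["layer1"].shape)
# torch.Size([4, 64, 56, 56])  <- half the batch, from whichever replica wrote last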

wydwww commented Mar 29, 2021

I made a quick fix by changing

feature_maps[module_name] = output

to

feature_maps[str(input[0].device)][module_name] = output

and

self.feature_maps = {}

to

# nested dictionary
from collections import defaultdict
self.feature_maps = defaultdict(lambda: defaultdict(dict))

Now the test example will output the features from each device:

print(features_gpu['cuda:0']["module.layer1"].shape)
print(features_gpu['cuda:1']["module.layer1"].shape)

# torch.Size([4, 64, 56, 56])
# torch.Size([4, 64, 56, 56])

Can you please address this issue in torchextractor? Thanks.
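
For anyone who needs this before a fix lands in torchextractor, the same per-device idea also works with plain PyTorch forward hooks, without patching the library. A minimal sketch, again assuming a 2-GPU machine (make_hook is just an illustrative helper, not part of torchextractor):

from collections import defaultdict

import torch
import torchvision

model = torch.nn.DataParallel(torchvision.models.resnet18(pretrained=True))
model.cuda()

features = defaultdict(dict)  # {device: {layer_name: tensor}}

def make_hook(name):
    def hook(module, inputs, output):
        # Each replica runs on its own device, so the writes never collide.
        features[str(output.device)][name] = output
    return hook

model.module.layer1.register_forward_hook(make_hook("layer1"))
model(torch.rand(8, 3, 224, 224))

for device, maps in features.items():
    print(device, maps["layer1"].shape)
# cuda:0 torch.Size([4, 64, 56, 56])
# cuda:1 torch.Size([4, 64, 56, 56])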

antoinebrl (Owner) commented

Hi @wydwww!
Interesting issue. Thanks for reporting it with some code examples!
I will investigate and see how it behaves, and I will also check other distributed computation setups.

wydwww commented Mar 30, 2021

@antoinebrl Thanks. Please see my updated reply. I missed a line in my previous fix.

wydwww commented Jul 12, 2021

Gentle ping @antoinebrl.
Is there any update on distributed data parallel training? I read in the torch.nn.parallel.DistributedDataParallel documentation that

Forward and backward hooks defined on module and its submodules won’t be invoked anymore, unless the hooks are initialized in the forward() method.

I'm wondering whether torchextractor can work with DistributedDataParallel.

Thanks.
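
Not an answer on the torchextractor side, but for reference, one way to stay within the constraint quoted above is to set up the capture inside forward() and return the intermediate tensor together with the output; under DistributedDataParallel each process then gets the features for its own local shard of the batch. A minimal sketch (the WithFeatures wrapper, local_rank, and local_batch are placeholder names, not part of torchextractor):

import torch
import torchvision

class WithFeatures(torch.nn.Module):
    """Wraps a model and returns (output, intermediate feature of one layer)."""

    def __init__(self, model, layer_name):
        super().__init__()
        self.model = model
        self.layer_name = layer_name

    def forward(self, x):
        captured = {}
        layer = dict(self.model.named_modules())[self.layer_name]
        # The hook is created inside forward(), matching the DDP documentation note.
        handle = layer.register_forward_hook(
            lambda module, inputs, output: captured.update(feature=output)
        )
        try:
            out = self.model(x)
        finally:
            handle.remove()
        return out, captured["feature"]

# Typical per-process usage (local_rank and local_batch come from the launcher):
# torch.distributed.init_process_group("nccl")
# model = WithFeatures(torchvision.models.resnet18(pretrained=True), "layer1").cuda(local_rank)
# ddp_model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[local_rank])
# out, layer1_feature = ddp_model(local_batch)  # features cover only this rank's shard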
