
Conversation

@SeanNaren
Owner

Fixes #475

utils.py Outdated
:param model: The training model
:return: The model without parallel wrapper
"""
return model.module if isinstance(model, torch.nn.parallel.DistributedDataParallel) else model
Contributor


torch.nn.parallel.DistributedDataParallel MUST be replaced with apex.parallel.DistributedDataParallel. Here is what happened when I executed the following print statements inside train.py:

print('isinstance(model, torch.nn.parallel.DistributedDataParallel): ', isinstance(model, torch.nn.parallel.DistributedDataParallel))
print('isinstance(model, apex.parallel.DistributedDataParallel): ', isinstance(model, apex.parallel.DistributedDataParallel))

The output that I got is:

isinstance(model, torch.nn.parallel.DistributedDataParallel):  False
isinstance(model, apex.parallel.DistributedDataParallel):  True
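
A minimal sketch of the suggested check, assuming the model is wrapped with apex.parallel.DistributedDataParallel (the function name below is illustrative only, not the actual name in utils.py):

from apex.parallel import DistributedDataParallel as ApexDDP

def remove_parallel_wrapper(model):
    # apex's wrapper stores the real model in .module, just like torch's DDP
    if isinstance(model, ApexDDP):
        return model.module
    return model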

@SeanNaren
Owner Author

I've seen a much cleaner solution to this, checking if the model has a module attribute. I'll implement this instead!
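
A minimal sketch of that approach; the function name is illustrative and the docstring mirrors the utils.py diff above:

def remove_parallel_wrapper(model):
    """
    :param model: The training model
    :return: The model without parallel wrapper
    """
    # Works for both torch.nn.parallel.DistributedDataParallel and
    # apex.parallel.DistributedDataParallel, since both expose the
    # underlying model as .module; a plain model is returned unchanged.
    return model.module if hasattr(model, 'module') else model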

@SeanNaren
Owner Author

Added the commit, will test it and merge. @farisalasmary, if you're able to test as well, that would be good!

@SeanNaren SeanNaren merged commit 450062b into master Apr 6, 2020
@SeanNaren SeanNaren deleted the feature/distributed_wrapper branch April 6, 2020 18:32

Successfully merging this pull request may close these issues.

AttributeError: 'DistributedDataParallel' object has no attribute 'version' at the time of model checkpointing