StopIteration Exception in Conformer Encoder because of next(self.parameters()) #4430
@grazder could you give an example script of what you're trying to do? Also, we typically use DDP for multi-GPU.
According to PyTorch Lightning's design, it is not advised to pass a device parameter around. Doing so also invites errors from tensors being created on a device that does not match the device of the parameters under DDP. Note that DDP is the only recommended way to perform distributed computing in PyTorch, and NeMo ASR does not support DP at all (most of our classes are unpicklable).
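For reference, a minimal configuration sketch of the recommended DDP setup via PyTorch Lightning (the Trainer arguments shown are illustrative; exact names vary across Lightning versions):

```python
import pytorch_lightning as pl

# Illustrative only: train with DistributedDataParallel, the
# recommended multi-GPU strategy, instead of DataParallel.
trainer = pl.Trainer(accelerator="gpu", devices=2, strategy="ddp")
# trainer.fit(model)  # model: any LightningModule, e.g. a NeMo ASR model
```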
Here is a repro script:

Output exception:
Yes, as stated, we don't encourage use of DP. Please use DDP.
Hi, I get a StopIteration exception here: NeMo/nemo/collections/asr/modules/conformer_encoder.py, line 231 in 41f27a5. I call this method from forward_for_export, here: NeMo/nemo/collections/asr/modules/conformer_encoder.py, line 249 in 41f27a5.

I found that self.parameters() has zero length. This happens when I use multiple GPUs in DP mode; when I run the code on a single GPU, self.parameters() is non-empty. Related issue: pytorch/pytorch#40457
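The failure mode can be reproduced without multiple GPUs by applying the same pattern to a module whose parameter list is empty, a hedged stand-in for the DataParallel replica described above (class and message are hypothetical, not NeMo code):

```python
import torch
import torch.nn as nn

class StatelessModule(nn.Module):
    """Parameterless module, standing in for a DataParallel replica
    whose parameter list is empty (see pytorch/pytorch#40457).
    Hypothetical example, not NeMo code."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Same pattern as the failing line in conformer_encoder.py:
        # infer the device from the first parameter. With an empty
        # parameter iterator, next() raises StopIteration.
        device = next(self.parameters()).device
        return x.to(device)

m = StatelessModule()
try:
    m(torch.zeros(2))
except StopIteration:
    print("StopIteration: module has no parameters to read a device from")
```

A defensive variant is `next(self.parameters(), None)` with an explicit fallback, though the suggestion in this issue is to pass the device in rather than infer it.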
Maybe it would be better if the device was passed into the method, like so:
It's strange to run into such problems just to get a device, when it could easily be passed in as a parameter.
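A sketch of what the suggested explicit-device signature might look like (the method name update_max_length and its signature are assumptions for illustration, not NeMo's actual API):

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Toy encoder illustrating the proposed explicit-device signature."""

    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(4, 4)
        self.max_len = 0

    # Hypothetical method: instead of deriving the device via
    # next(self.parameters()).device (which raises StopIteration on an
    # empty parameter list), the caller passes the device in.
    def update_max_length(self, seq_length: int, device: torch.device) -> None:
        self.max_len = seq_length
        # Buffers for the new length are created directly on the
        # caller-supplied device.
        self.pos_ids = torch.arange(seq_length, device=device)

    def forward_for_export(self, x: torch.Tensor) -> torch.Tensor:
        # The input tensor's device is always available, even on a
        # DataParallel replica that has no parameters of its own.
        if x.size(1) > self.max_len:
            self.update_max_length(x.size(1), x.device)
        return self.proj(x)

enc = Encoder()
out = enc.forward_for_export(torch.zeros(2, 5, 4))
```

The design point is that the input tensor carries the correct device on every replica, so no parameter lookup is needed.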