Fix a crash in NeMo 2.0 during module._apply(lambda t: t.cpu()) #1502
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
timmoon10 left a comment
Can you explain the race condition that this is fixing? From what I can tell, Float8Tensor.cpu should already synchronize the GPU:
TransformerEngine/transformer_engine/pytorch/tensor/quantized_tensor.py
Lines 321 to 323 in e70f913
The actual problem is later in Float8Tensor._set_data: we are passing a CPU tensor into the quantize kernel, and I don't think we ever move it to GPU. This doesn't explain why this PR fixes the IMA, so I could have missed something.
If my interpretation is the actual root cause, the quickest fix is to modify Float8Tensor._set_data with:
    self.data = self._quantizer.quantize(tensor.to(device=self.device))
More long-term fixes are to handle CPU tensors in the quantize function or to support CPU Float8Tensors.
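The failure mode and the suggested fix can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyTensor`, `ToyQuantizer`, `ToyQuantizedParam`, and `apply_to_param` are hypothetical pure-Python stand-ins, not TransformerEngine or PyTorch APIs, and the "illegal memory access" is simulated with an exception. It assumes that `module._apply(fn)` ends up assigning `fn(param)` back to `param.data`, which is how the CPU tensor would reach the quantizer.

```python
from dataclasses import dataclass


@dataclass
class ToyTensor:
    """Hypothetical stand-in for a tensor: values plus a device tag."""
    values: list
    device: str

    def to(self, device):
        # Return self if already on `device`, otherwise a copy tagged with it.
        if self.device == device:
            return self
        return ToyTensor(list(self.values), device)


class ToyQuantizer:
    """Hypothetical stand-in for a quantize kernel that only accepts GPU input."""

    def __init__(self, scale):
        self.scale = scale

    def quantize(self, tensor):
        # A real CUDA kernel would not check; here the bad access is simulated.
        if tensor.device != "cuda":
            raise RuntimeError("simulated illegal memory access: input on " + tensor.device)
        return ToyTensor([round(v * self.scale) for v in tensor.values], "cuda")


class ToyQuantizedParam:
    """Hypothetical stand-in for a quantized parameter that lives on the GPU."""

    def __init__(self, values, quantizer, move_before_quantize):
        self.device = "cuda"
        self._quantizer = quantizer
        self._move_before_quantize = move_before_quantize
        self._data = quantizer.quantize(ToyTensor(values, "cuda"))

    def cpu(self):
        # Dequantize into a plain tensor on the CPU.
        scale = self._quantizer.scale
        return ToyTensor([v / scale for v in self._data.values], "cpu")

    @property
    def data(self):
        return self._data

    @data.setter
    def data(self, tensor):
        if self._move_before_quantize:
            # The suggested fix: move the input to this parameter's device first.
            tensor = tensor.to(self.device)
        self._data = self._quantizer.quantize(tensor)


def apply_to_param(param, fn):
    # Assumed shape of the per-parameter step inside module._apply(fn).
    param.data = fn(param)
```

With `move_before_quantize=False`, `apply_to_param(param, lambda t: t.cpu())` raises the simulated error. With `True`, the assignment succeeds and the quantized data stays on the toy "cuda" device, which matches the intent of moving the tensor to `self.device` before quantizing.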
@timmoon10 I also don't fully understand the race condition; it just happened to work. I will dig into it and reply here with my findings.
Findings:
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
I decided to revert the
/te-ci pytorch
* Fix a crash with module._apply(lambda t: t.cpu())
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
* Add comments
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
* Make sure tensor is moved to dst device before quantizer quantizes
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
---------
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Description
In NeMo 2.0, during job exit, Lightning calls module._apply(lambda t: t.cpu()) on the GPT model, which triggers an illegal memory access error in the TE dequantize kernel. This PR fixes the issue.
Fixes # (issue)
Type of change
Changes
Please list the changes introduced in this PR:
Checklist: