Samyamr/largest-partitioned-params-calculation-fix #1150
Conversation
`largest_partitioned_param_numel` was being calculated incorrectly.
deepspeed/runtime/zero/stage3.py (Outdated)

```diff
 # Largest partitioned param
-largest_partitioned_param_numel = max(self.fp16_partitioned_groups_flat_numel)
+largest_partitioned_param_numel = max([max([tensor.numel() for tensor in fp16_partitioned_group]) for fp16_partitioned_group in self.fp16_partitioned_groups])
```
Does self.fp16_partitioned_groups_flat_numel need to be updated?
The change uses `tensor.numel()` to retrieve the sizes. However, if the model was built inside a `deepspeed.zero.Init()` context, `tensor.numel()` and `tensor.data` point only at a placeholder: after partitioning, the original tensor is offloaded into `ds_tensor` and its true size is recorded in `ds_numel`. In some cases this can cause problems, including during initialization of a DeepSpeed model engine after `deepspeed.zero.Init()`. Please check this situation and confirm the change is valid. The original value, `self.fp16_partitioned_groups_flat_numel`, appears to have been derived from `ds_numel`: https://github.com/microsoft/DeepSpeed/blob/7567c76c05626c5acd8b5700bedfc412c55d5354/deepspeed/runtime/zero/stage3.py#L1151
`largest_partitioned_param_numel` is calculated incorrectly, making it much larger than it needs to be. This PR fixes the calculation.
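To illustrate the bug, here is a minimal sketch using plain integers in place of tensors (the group sizes below are made up, not taken from DeepSpeed): each flattened group's numel is the *sum* of its tensors' sizes, so taking `max()` over the flat numels yields an entire group's size rather than the largest single partitioned parameter.

```python
# Stand-ins for tensor.numel() values in each fp16 partitioned group
# (hypothetical sizes, for illustration only).
fp16_partitioned_groups = [
    [1_000, 2_000, 3_000],  # group 0
    [500, 4_000],           # group 1
]

# Old (buggy) calculation: max over per-group totals --
# this returns the size of a whole flattened group.
flat_numels = [sum(group) for group in fp16_partitioned_groups]
old_result = max(flat_numels)

# Fixed calculation: max over individual tensor sizes.
new_result = max(max(group) for group in fp16_partitioned_groups)

print(old_result, new_result)  # prints "6000 4000"
```

The old formula reports 6000 (the total of group 0), while the largest individual partitioned parameter is only 4000, which is why the buffer sized from this value was much larger than necessary.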