
I can't train GPT with 3D parallelism and ZeRO stage 2 at the same time.
It seems that pipeline parallelism conflicts with ZeRO stage 2. I'm using the pipeline example here: https://github.com/microsoft/DeepSpeedExamples/tree/master/Megatron-LM-v1.1.5-3D_parallelism.
Looking forward to your reply.