DeepSpeed communication pattern clarification

Queries regarding communication calls in DeepSpeed for Stage 1 (ZeRO-1) and Stage 2 (ZeRO-2):

1.  As per the paper, Stage 1 (ZeRO-1) partitions the optimizer states, so on 2 GPUs (1-layer GPT-2 345M model) with NCCL we expected to see the following calls:
    1 call - Reduce_Scatter for gradients
    1 call - All_Gather for the optimizer
    
    According to an earlier DeepSpeed GitHub issue, reduce_scatter is not implemented for Stage 1, which is why the Stage 1 ZeRO optimizer calls all_reduce instead. Is that still the case?
	
    In the logs we see All_Reduce calls during the backward pass, plus Reduce_Scatter calls after the backward pass and before the optimizer step. Can you explain why reduce_scatter is needed here, given that an all_reduce for the gradients and an all_gather for the optimizer would seem to suffice?
	
    We also see an overflow at the end of the optimizer step, where we expected All_Gather calls that we do not observe. Can you please explain why the overflow occurs, as reported in the logs below:
	rank 0 detected overflow nan in tensor 0:1 shape torch.Size([50304, 1024])
	[deepspeed] OVERFLOW! Skipping step. Attempted loss scale: 4294967296, reducing to 4294967296
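
To make the second log line concrete, here is a minimal sketch of dynamic loss scaling with a hysteresis counter. It is a toy model, not DeepSpeed's actual class: the name `ToyDynamicLossScaler`, its parameters, and the assumption that the first overflow is tolerated without lowering the scale are all hypothetical. Under that assumption, a message like "reducing to 4294967296" with an unchanged value would be consistent with an overflow that skipped the step but did not yet halve the scale.

```python
import math


class ToyDynamicLossScaler:
    """Illustrative dynamic loss scaler (hypothetical; not DeepSpeed's implementation)."""

    def __init__(self, init_scale=2.0 ** 32, scale_factor=2.0, hysteresis=2, min_scale=1.0):
        self.scale = init_scale
        self.scale_factor = scale_factor
        self.remaining = hysteresis  # overflows tolerated before the scale is lowered
        self.min_scale = min_scale

    def update(self, grads):
        """Return True if the step must be skipped because of inf/nan gradients."""
        overflow = any(math.isnan(g) or math.isinf(g) for g in grads)
        if overflow:
            if self.remaining > 1:
                # Assumed behaviour: tolerate this overflow and keep the scale.
                self.remaining -= 1
            else:
                self.scale = max(self.scale / self.scale_factor, self.min_scale)
        return overflow


scaler = ToyDynamicLossScaler()
first_skip = scaler.update([float("nan")])   # step skipped, scale unchanged
scale_after_first = scaler.scale
second_skip = scaler.update([float("inf")])  # step skipped, scale halved
scale_after_second = scaler.scale
```

In this sketch a skipped step performs no parameter update, which may also be why no All_Gather appears after the skipped optimizer step; that link is our guess and is part of what we are asking about.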

    
2.  For Stage 2 (ZeRO-2), i.e. ZeRO with optimizer + gradient partitioning: during the backward pass each gradient should be reduced onto the GPU that owns that gradient chunk, followed by an all-gather for the optimizer (as per the paper). In the run logs we see:
    No calls during the backward pass, which we take to mean there is no communication/compute overlap in the backward pass because the fusion buffer is enabled. As a result, we see only 2 All_Reduce calls after the backward pass for the gradients. Please confirm whether our understanding is correct.
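
To state the comparison behind both questions precisely, the sketch below simulates the three collectives on plain Python lists, with one list per rank. It does not call NCCL or `torch.distributed`; the helper names are hypothetical, and it assumes the gradient length divides evenly by the number of ranks. It only shows that a reduce-scatter followed by an all-gather yields the same final values as a single all-reduce, while leaving each rank with just its own reduced chunk in between.

```python
def all_reduce(rank_grads):
    """Every rank ends up with the element-wise sum over all ranks."""
    total = [sum(vals) for vals in zip(*rank_grads)]
    return [list(total) for _ in rank_grads]


def reduce_scatter(rank_grads):
    """Rank r ends up with only the summed values of chunk r."""
    world = len(rank_grads)
    chunk = len(rank_grads[0]) // world  # assumes even divisibility
    out = []
    for r in range(world):
        lo, hi = r * chunk, (r + 1) * chunk
        out.append([sum(g[i] for g in rank_grads) for i in range(lo, hi)])
    return out


def all_gather(rank_chunks):
    """Every rank ends up with the concatenation of all ranks' chunks."""
    full = [x for chunk in rank_chunks for x in chunk]
    return [list(full) for _ in rank_chunks]


# Two ranks, four gradient elements each.
grads = [[1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]]

owned = reduce_scatter(grads)   # each rank holds its own reduced chunk
rebuilt = all_gather(owned)     # full vector restored on every rank
reference = all_reduce(grads)
```

Here `owned` is `[[11.0, 22.0], [33.0, 44.0]]` and `rebuilt` equals `reference`, so the two patterns differ in what each rank holds between the two calls, not in the final result.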


Deepspeed communication pattern clarification #264