
A request for clarity around 3D Parallelism in DeepSpeed #673

@stas00

Let's start by saying that, based on my reading of various papers, Model Parallelism (MP) is a very inconsistent term. One can slice a model vertically or horizontally. One can implement a naive, slow version or speed it up with pipelining, and almost none of these approaches is truly parallel. I tried to summarize and demo a few of the basic options here: huggingface/transformers#8771 (comment)
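To make the vertical/horizontal distinction concrete, here is a toy sketch in plain Python. It assumes "vertical" means placing whole layers on different devices and "horizontal" means splitting each layer's weights across devices; the two-layer model, the weights, and the function names are all hypothetical, and "devices" are simulated by ordinary function calls rather than real GPUs.

```python
# Toy illustration only: plain lists stand in for tensors, and the
# "devices" are just comments. Nothing here uses DeepSpeed or PyTorch.

def matvec(weight, x):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]

# A hypothetical two-layer linear model: x -> W1 x -> W2 (W1 x).
W1 = [[1, 2], [3, 4], [5, 6], [7, 8]]   # 4x2
W2 = [[1, 0, 1, 0], [0, 1, 0, 1]]       # 2x4
x = [1, 1]

def full_model(x):
    # Reference: the whole model on one device.
    return matvec(W2, matvec(W1, x))

def vertical_split(x):
    # Assumed meaning of "vertical": device 0 holds layer 1, device 1
    # holds layer 2. Device 1 has to wait for device 0's output, so
    # without pipelining the two devices never work at the same time.
    hidden = matvec(W1, x)       # would run on device 0
    return matvec(W2, hidden)    # would run on device 1

def horizontal_split(x):
    # Assumed meaning of "horizontal": every layer is split across devices.
    # Each device holds half the rows of W1 and the matching columns of W2.
    h0 = matvec(W1[:2], x)                        # device 0
    h1 = matvec(W1[2:], x)                        # device 1
    p0 = matvec([row[:2] for row in W2], h0)      # device 0
    p1 = matvec([row[2:] for row in W2], h1)      # device 1
    # Partial results are summed, standing in for an all-reduce.
    return [a + b for a, b in zip(p0, p1)]

print(full_model(x), vertical_split(x), horizontal_split(x))
```

Both splits reproduce the single-device result; the difference is only in which device holds which weights and when each device is busy.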

DeepSpeed talks a lot about 3D parallelism: blog posts like this one state multiple times that DeepSpeed uses 3D parallelism.

Then @samyam kindly reviewed the draft of the upcoming blog post about DeepSpeed integration in transformers and suggested that, no, DeepSpeed doesn't do 3D parallelism.

To me it does look like DeepSpeed implements all 3:

  1. DP (data parallelism) - yes
  2. PP (pipeline parallelism) - yes
  3. MP (model parallelism) - yes, the key innovation of sharding is a form of horizontal MP.

So please correct me if I'm wrong in thinking that DeepSpeed is already doing 3D.

Quotes from the blog post: https://www.microsoft.com/en-us/research/blog/deepspeed-extreme-scale-model-training-for-everyone/

Trillion parameter model training with 3D parallelism: DeepSpeed enables a flexible combination of three parallelism approaches—ZeRO-powered data parallelism, pipeline parallelism, and tensor-slicing model parallelism. 3D parallelism adapts to the varying needs of workload requirements to power extremely large models with over a trillion parameters while achieving near-perfect memory-scaling and throughput-scaling efficiency. In addition, its improved communication efficiency allows users to train multi-billion-parameter models 2–7x faster on regular clusters with limited network bandwidth.

and then later:

DeepSpeed has combined three powerful technologies to enable training trillion-scale models and to scale to thousands of GPUs: data parallel training, model parallel training, and pipeline parallel training.
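One way to picture the "three technologies" combined in the quotes above is as a 3D grid of GPUs, where each GPU has one coordinate per kind of parallelism. The sketch below is a hypothetical illustration of that idea only: the function name `rank_to_coords` and the axis ordering are assumptions made for this example, not DeepSpeed's actual API or rank layout.

```python
def rank_to_coords(rank, dp, pp, mp):
    """Map a flat GPU rank to (data, pipeline, model) grid coordinates.

    dp, pp, mp are the sizes of the data-, pipeline-, and model-parallel
    dimensions. The ordering (model fastest, data slowest) is an arbitrary
    choice for this sketch.
    """
    world_size = dp * pp * mp
    if not 0 <= rank < world_size:
        raise ValueError(f"rank {rank} is outside a world of size {world_size}")
    mp_idx = rank % mp                # position within a model-parallel group
    pp_idx = (rank // mp) % pp        # pipeline stage
    dp_idx = rank // (mp * pp)        # data-parallel replica
    return dp_idx, pp_idx, mp_idx

# Example: 8 GPUs arranged as 2 replicas x 2 pipeline stages x 2 model shards.
coords = [rank_to_coords(r, dp=2, pp=2, mp=2) for r in range(8)]
for rank, c in enumerate(coords):
    print(rank, c)
```

Under this picture, a setup only counts as "3D" when all three dimensions are larger than 1, which is the point the question above is trying to pin down.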
