[Disco] Loading-time sharding support#15826
Merged
junrushao merged 1 commit into apache:unity on Oct 3, 2023
Conversation
Member
CC: @jinhongyii
jinhongyii
reviewed
Sep 27, 2023
junrushao
requested changes
Sep 28, 2023
Member
junrushao
left a comment
Let's unify sharding and reordering into a single function - no need for shard3d any more :)
Contributor
Can we separate the reorganizing function and the sharding function again? I'm thinking of a situation where there can be 2 reorganizing functions (doing nothing, and combining QKV) and n different sharding functions (different tensor shapes and different sharding dimensions). In this case, we would need at most 2n functions in total in the IRModule. Another benefit of separating them is that DistIR will not need to handle merging the reorganizing function and the sharding function.
junrushao
approved these changes
Oct 1, 2023
jinhongyii
approved these changes
Oct 1, 2023
In our previous implementation, parameter sharding relied on pre-quantization weight processing, meaning each set of quantized weights corresponded strictly to a hardcoded constant `num_shards`, and re-quantization was required on every change in the number of GPUs, e.g. moving from a 4-GPU to an 8-GPU setting. This PR makes it possible to move parameter sharding to post-quantization loading time. During loading, we iterate over all parameters and apply the sharding operation based on the provided sharding information.
To make this happen, this PR enhances the existing `shard_info.json` to include the sharding function to be used at loading time. Each parameter is attached to a list of loading-time preprocessing methods that are applied serially to transform the parameter into the desired shape, as shown in the example below:
```python
shard_info = {
    "x_0": [  # name of the parameter
        [  # a list of preprocessing functions to be applied
            "tests.disco.shard_dim_1",          # name of the sharding function
            [(num_shards, 64, 64), "float16"],  # output shape/dtype of `tests.disco.shard_dim_1`
            num_shards,                         # extra inputs to `tests.disco.shard_dim_1`
        ],
    ],
    "x_1": [...],
}
```
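As a rough sketch (not the actual TVM loader code), the loading-time iteration described above could look like the following. Here `lookup_func` and the in-memory parameter dict are hypothetical stand-ins, and NumPy arrays replace TVM NDArrays:

```python
import numpy as np

def apply_shard_info(params, shard_info, lookup_func):
    """Apply each parameter's chain of preprocessing functions in order.

    params:      dict mapping parameter name -> array (stand-in for NDArray)
    shard_info:  dict in the format shown above
    lookup_func: callable mapping a function name -> the registered function
    """
    out = {}
    for name, funcs in shard_info.items():
        tensor = params[name]
        for func_name, (shape, dtype), *extra in funcs:
            # Pre-allocate the destination buffer with the declared shape/dtype,
            # then let the preprocessing function write into it.
            output = np.empty(shape, dtype=dtype)
            lookup_func(func_name)(tensor, *extra, output)
            tensor = output  # feed the result into the next function in the chain
        out[name] = tensor
    return out
```

Each entry drives one preprocessing step: the declared output shape/dtype tells the loader how much memory to allocate, and any extra inputs are forwarded to the function.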
For parameter `x_0`, this means we will call the method `tests.disco.shard_dim_1`, which has the signature:
```python
def shard_dim_1(
    input: NDArray,
    num_shards,       # extra input
    output: NDArray,  # its shape is (num_shards, 64, 64), and its dtype is "float16"
) -> None: ...
```
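For illustration only, here is a toy implementation matching that signature, with NumPy arrays standing in for NDArrays (the real `tests.disco.shard_dim_1` is a function registered with TVM, not this code):

```python
import numpy as np

def shard_dim_1(input: np.ndarray, num_shards: int, output: np.ndarray) -> None:
    # Split the tensor along dim 1 into `num_shards` equal pieces and stack
    # them into a new leading dimension, writing into the pre-allocated
    # output buffer in place.
    output[:] = np.stack(np.split(input, num_shards, axis=1))
```

With this sketch, a `(64, 128)` float16 weight and `num_shards = 2` produces a `(2, 64, 64)` output, matching the shape/dtype declared for `x_0` in `shard_info.json` above.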
This approach simplifies parameter sharding for users and ensures correctness.