Switch order of ds init_inference and pipeline construction to save m… #373
The order of `deepspeed.init_inference` and `hf.pipeline` matters. If the HF pipeline is created first, the model ends up fully replicated across ranks when `deepspeed.init_inference` runs (each rank tries to split its own copy via tensor parallelism, resulting in one sharded model per rank).
If `deepspeed.init_inference` happens first, the HF pipeline uses the model as-is, since it has already been fully loaded, and the memory footprint is as expected.
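A minimal sketch of the correct ordering (not runnable without a GPU and a DeepSpeed install; the model name and parallelism settings are illustrative, not from this PR):

```python
import os

import torch
import deepspeed
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_name = "gpt2"  # placeholder model for illustration
local_rank = int(os.environ.get("LOCAL_RANK", 0))
world_size = int(os.environ.get("WORLD_SIZE", 1))

model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# 1) Shard the model across ranks FIRST, so each rank holds only its slice.
model = deepspeed.init_inference(
    model,
    mp_size=world_size,           # tensor-parallel degree
    dtype=torch.float16,
    replace_with_kernel_inject=True,
)

# 2) Only then wrap it in the HF pipeline; the pipeline reuses the
#    already-sharded module instead of loading a full replica per rank.
generator = pipeline(
    "text-generation",
    model=model.module,
    tokenizer=tokenizer,
    device=local_rank,
)
```

Reversing steps 1 and 2 is what triggers the replication described above: `pipeline(...)` loads a full copy on every rank before DeepSpeed gets a chance to shard it.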
It's probably worth raising this with DeepSpeed-MII as well, since they create the HF pipeline first and then call `deepspeed.init_inference`: https://github.com/microsoft/DeepSpeed-MII/blob/main/mii/models/load_models.py