Switch order of ds init_inference and pipeline construction to save m… #373

Merged
merged 1 commit into from
Dec 8, 2022

@siddvenk siddvenk commented Dec 8, 2022

The order in which deepspeed.init_inference and the HF pipeline construction run matters.

If the HF pipeline is created first, deepspeed.init_inference ends up replicating the full model on every rank: each rank tries to split the model via tensor parallelism on its own, so we end up with as many sharded models as there are ranks.

If deepspeed.init_inference runs first, the HF pipeline uses the model as is, since it has already been fully loaded, and the memory footprint is as expected.
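
For illustration, here is a minimal sketch of the ordering described above. It is not the handler code from this PR. The function name `build_ds_pipeline`, the use of `AutoModelForCausalLM`, the fp16 dtype, and reading `LOCAL_RANK` / `WORLD_SIZE` from the environment are all assumptions made for the example. The `mp_size` argument is the one accepted by DeepSpeed releases from around the time of this PR; kernel injection and injection policy settings are model dependent and left out.

```python
import os


def build_ds_pipeline(model_id: str, task: str = "text-generation"):
    """Shard the model with DeepSpeed first, then wrap it in an HF pipeline.

    Hypothetical helper for illustration only.
    """
    # Imported lazily so this module can be imported on a machine
    # without GPUs, DeepSpeed, or transformers installed.
    import torch
    import deepspeed
    from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

    # Assumption: the launcher sets these, as the deepspeed launcher does.
    local_rank = int(os.getenv("LOCAL_RANK", "0"))
    world_size = int(os.getenv("WORLD_SIZE", "1"))

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)

    # Step 1: hand the raw model to DeepSpeed so it is partitioned across ranks.
    engine = deepspeed.init_inference(
        model,
        mp_size=world_size,  # tensor-parallel degree (assumed equal to world size)
        dtype=torch.float16,
    )

    # Step 2: only now build the pipeline, around the module DeepSpeed returned.
    return pipeline(task, model=engine.module, tokenizer=tokenizer, device=local_rank)
```

Launched with something like `deepspeed --num_gpus 2 script.py`, each rank would call `build_ds_pipeline(...)` once. The point of the sketch is only the call order: `deepspeed.init_inference` before `pipeline`.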

It's probably worth raising this with DeepSpeed-MII as well, since they create the HF pipeline first and then call deepspeed.init_inference: https://github.com/microsoft/DeepSpeed-MII/blob/main/mii/models/load_models.py

@lanking520 lanking520 merged commit a010c50 into deepjavalibrary:master Dec 8, 2022
@siddvenk siddvenk deleted the ds-handler branch December 9, 2022 00:04