
[WIP] Updating deepspeed handler for more models #359

Merged
merged 3 commits into from
Dec 7, 2022

Conversation

siddvenk
Contributor

@siddvenk siddvenk commented Dec 5, 2022

Description

This PR adds support for built in handlers to serve models using the DeepSpeed engine.

Two handlers are added/modified:

  • Stable Diffusion handler
  • Transformer/LLM handler

Stable Diffusion and most language models (except bloom) have been tested. Things left to be done:

  • Stable Diffusion is a separate handler, as I felt it was cleaner this way; it is different enough that implementing it within a generic deepspeed handler felt messy. Right now it's tied to DS, though, so we may want to modify it so it can work with either accelerate or deepspeed
    • Currently I have only tested the text2image task from HF. Others should be easy to add, but might require additional input/output processing
  • In the deepspeed handler
    • Support user supplied injection policy
    • Support user provided MPU
    • Better support for batch inference (right now it works, but the usage/interface could probably be improved)
    • DS config is written for 0.7.6. It can be modified to be compatible with 0.7.5 if that's the version we plan on including in the next release
    • Adding tests
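As a minimal sketch of how such a handler might assemble its DeepSpeed inference config from serving properties (the function name is hypothetical; the property keys and defaults mirror the snippets quoted in the review, but exact config keys should be checked against the installed DeepSpeed version):

```python
def build_ds_config(properties):
    # Hypothetical helper: turn string-valued serving properties into a
    # config dict that would be passed to deepspeed.init_inference.
    return {
        "tensor_parallel_degree": int(properties.get("tensor_parallel_degree", 1)),
        "dtype": properties.get("data_type", "fp32"),
        # DeepSpeed's own default for max output tokens is 1024
        "max_out_tokens": int(properties.get("max_tokens", 1024)),
        # Boolean properties arrive as strings ("true"/"false")
        "enable_cuda_graph": properties.get("enable_cuda_graph", "false").lower() == "true",
        "triangular_masking": properties.get("triangular_masking", "true").lower() == "true",
        "checkpoint": properties.get("checkpoint"),
        "base_dir": properties.get("base_dir"),
    }

config = build_ds_config({"max_tokens": "512", "enable_cuda_graph": "true"})
```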

@siddvenk siddvenk changed the title Updating deepspeed handler for more models [WIP] Updating deepspeed handler for more models Dec 5, 2022
```python
self.data_type = None
self.max_tokens = None
self.device = None
self.world_size = None
```
Contributor

world_size should be identical to tensor_parallel_degree
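That invariant can be sketched as a small check (the helper name is hypothetical, and the `WORLD_SIZE` env-var fallback is an assumption about the launcher, not something the PR specifies):

```python
import os

def resolve_world_size(tensor_parallel_degree):
    # world_size should be identical to tensor_parallel_degree; fall back
    # to the launcher-provided WORLD_SIZE env var if present (assumption).
    world_size = int(os.environ.get("WORLD_SIZE", tensor_parallel_degree))
    if world_size != tensor_parallel_degree:
        raise ValueError(
            f"world_size ({world_size}) must equal "
            f"tensor_parallel_degree ({tensor_parallel_degree})")
    return world_size
```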

```python
f"tensor_parallel_degree: {self.tensor_parallel_degree}")
self.initialized = True

def parse_properties(self, properties):
```
Contributor

Suggested change:

```diff
-def parse_properties(self, properties):
+def _parse_properties(self, properties):
```

Contributor Author

👍

```python
"max_out_tokens": self.max_tokens,
}

def validate_model_type_and_task(self):
```
Contributor

Suggested change:

```diff
-def validate_model_type_and_task(self):
+def _validate_model_type_and_task(self):
```

Contributor Author

👍

```python
self.model_id = properties.get("model_id")
self.task = properties.get("task")
self.data_type = properties.get("data_type", "fp32")
self.max_tokens = int(properties.get("max_tokens", 1024))
```
Contributor

What would be a proper default for max_tokens?

Contributor Author

DeepSpeed uses 1024 as the default, so I think keeping it the same makes sense.

Though, I think task-specific pipelines on the HF side might override that and set other defaults.
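That precedence could be sketched as follows (the helper and the per-request override are hypothetical; only the 1024 default comes from DeepSpeed):

```python
DEFAULT_MAX_TOKENS = 1024  # matches DeepSpeed's own default

def resolve_max_tokens(properties, request_params=None):
    # Model-level property wins over the library default; a per-request
    # override (hypothetical) wins over both.
    max_tokens = int(properties.get("max_tokens", DEFAULT_MAX_TOKENS))
    if request_params and "max_tokens" in request_params:
        max_tokens = int(request_params["max_tokens"])
    return max_tokens
```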

```python
"enable_cuda_graph": properties.get("enable_cuda_graph", "false").lower() == "true",
"triangular_masking": properties.get("triangular_masking", "true").lower() == "true",
"checkpoint": properties.get("checkpoint"),
"base_dir": properties.get("base_dir"),
```
Contributor

Why does the user need to set base_dir?

Contributor Author

If base_dir is not set, DS will look for the checkpoint in the current directory. If the checkpoint is in some child directory, base_dir helps to identify it (though it also works if you specify the full path to the checkpoint and leave base_dir empty). I'm not sure if we need to expose this, but users coming from the DS world might expect it.
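A minimal sketch of the lookup behavior described here (the helper name is hypothetical; DeepSpeed's own checkpoint resolution is the authoritative reference):

```python
import os

def resolve_checkpoint_path(checkpoint, base_dir=None):
    # An absolute checkpoint path is used as-is; otherwise base_dir
    # (defaulting to the current directory) anchors the relative path.
    if checkpoint is None:
        return None
    if os.path.isabs(checkpoint):
        return checkpoint
    return os.path.join(base_dir or os.getcwd(), checkpoint)
```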

```python
self.model_dir = None
self.model_id = None
self.data_type = None
self.max_tokens = None
```
Contributor

We might need min_tokens

@lanking520 lanking520 marked this pull request as ready for review December 7, 2022 16:59
@lanking520 lanking520 merged commit b0f7364 into deepjavalibrary:master Dec 7, 2022
@siddvenk siddvenk deleted the deepspeed branch December 8, 2022 17:11