[WIP] Updating deepspeed handler for more models #359
Conversation
self.data_type = None
self.max_tokens = None
self.device = None
self.world_size = None
world_size should be identical to tensor_parallel_degree
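The suggestion above can be sketched as follows. This is an illustrative stand-alone function, not the handler's actual code; the property names mirror the diff, and the assumption is that `world_size` is simply copied from `tensor_parallel_degree`:

```python
def parse_properties(properties):
    """Sketch: derive world_size directly from tensor_parallel_degree,
    so the two values can never drift apart."""
    tensor_parallel_degree = int(properties.get("tensor_parallel_degree", 1))
    # world_size should be identical to tensor_parallel_degree
    world_size = tensor_parallel_degree
    return {
        "tensor_parallel_degree": tensor_parallel_degree,
        "world_size": world_size,
    }

print(parse_properties({"tensor_parallel_degree": "4"}))
```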
f"tensor_parallel_degree: {self.tensor_parallel_degree}")
self.initialized = True

def parse_properties(self, properties):
Suggested change:
- def parse_properties(self, properties):
+ def _parse_properties(self, properties):
👍
"max_out_tokens": self.max_tokens,
}

def validate_model_type_and_task(self):
Suggested change:
- def validate_model_type_and_task(self):
+ def _validate_model_type_and_task(self):
👍
self.model_id = properties.get("model_id")
self.task = properties.get("task")
self.data_type = properties.get("data_type", "fp32")
self.max_tokens = int(properties.get("max_tokens", 1024))
What would be a proper default for max_tokens?
DeepSpeed uses 1024 as the default, so I think keeping it the same makes sense.
Though I think task-specific pipelines on the HF side might override that and set other defaults.
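The defaulting chain discussed here can be sketched as a small helper. This is illustrative only: `resolve_max_tokens` and its `pipeline_default` parameter are hypothetical names, not part of the handler, and the only grounded fact is DeepSpeed's 1024 default:

```python
def resolve_max_tokens(properties, pipeline_default=None):
    """Sketch: an explicit user setting wins, then a (hypothetical)
    task-pipeline override, then DeepSpeed's own default of 1024."""
    if "max_tokens" in properties:
        return int(properties["max_tokens"])
    if pipeline_default is not None:
        return int(pipeline_default)
    return 1024  # DeepSpeed's default

print(resolve_max_tokens({"max_tokens": "2048"}))  # user setting wins
```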
"enable_cuda_graph": properties.get("enable_cuda_graph", "false").lower() == "true",
"triangular_masking": properties.get("triangular_masking", "true").lower() == "true",
"checkpoint": properties.get("checkpoint"),
"base_dir": properties.get("base_dir"),
Why does the user need to set base_dir?
If base_dir is not set, DeepSpeed will look for the checkpoint in the current directory. If the checkpoint is in some child directory, base_dir helps to identify it (though it also works if you specify the full path to the checkpoint and leave base_dir empty). Not sure if we need to expose this, but users coming from the DeepSpeed world might expect it.
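The lookup behavior described above can be sketched as plain path logic. This is an illustrative helper (`resolve_checkpoint` is a made-up name, not DeepSpeed's actual code): a relative checkpoint path is resolved against base_dir when one is given, while an absolute path ignores it:

```python
import os


def resolve_checkpoint(checkpoint, base_dir=None):
    """Sketch of the lookup described above: join a relative checkpoint
    path with base_dir; a full (absolute) path works without base_dir."""
    if base_dir and not os.path.isabs(checkpoint):
        return os.path.join(base_dir, checkpoint)
    # Absolute path, or no base_dir: resolved as-is
    # (relative to the current directory if not absolute).
    return checkpoint


print(resolve_checkpoint("ds_inference_config.json", base_dir="/opt/ml/model"))
```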
self.model_dir = None
self.model_id = None
self.data_type = None
self.max_tokens = None
We might need min_tokens
Description
This PR adds support for built in handlers to serve models using the DeepSpeed engine.
Two handlers are added/modified:
Stable Diffusion and most language models (except BLOOM) have been tested. Things left to be done:
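For reference, the properties exercised in this review could be supplied together in a single configuration file. The keys mirror the diffs above; the `engine` key, the model ID, and all values are illustrative assumptions, not part of this PR:

```properties
# Illustrative configuration for the DeepSpeed handler (values are examples).
engine=DeepSpeed
task=text-generation
model_id=EleutherAI/gpt-j-6B
tensor_parallel_degree=2
data_type=fp16
max_tokens=1024
# Optional DeepSpeed checkpoint lookup, as discussed above:
# checkpoint=ds_inference_config.json
# base_dir=/opt/ml/model
```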