[doc] adding release notes for docs.djl.ai #2032
Conversation
- Improved our error handling for container responses for rolling batch. Check this [doc](https://github.com/deepjavalibrary/djl-serving/blob/e4d7e5da822a8c11b13e79eaeaec4101fe678b69/serving/docs/lmi/user_guides/lmi_input_output_schema.md#error-responses) to learn more
- New CX capability (see the client sketch after this list):
  - We introduce the OPTION_TGI_COMPAT env var, which enables you to get the same response format as TGI. [doc]()
  - We also now support the SSE text/event-stream data format.
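As an illustration of the two items above, here is a minimal client sketch. It assumes a DJL-Serving/LMI endpoint running locally at `http://localhost:8080/invocations` whose container was launched with `OPTION_TGI_COMPAT=true`; the endpoint path, payload fields, and `Accept` header are assumptions based on the LMI input/output schema, not details taken from this PR.

```python
import json
import requests

# Assumption: a DJL-Serving / LMI endpoint running locally, started with the
# env var OPTION_TGI_COMPAT=true so responses use the TGI-compatible format.
URL = "http://localhost:8080/invocations"

payload = {
    "inputs": "What is Deep Java Library?",
    "parameters": {"max_new_tokens": 64},
    "stream": True,
}

# Request SSE (text/event-stream) streaming output.
with requests.post(
    URL,
    json=payload,
    headers={"Accept": "text/event-stream"},
    stream=True,
) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines(decode_unicode=True):
        # SSE events arrive as lines prefixed with "data:"; blank lines
        # separate events.
        if line and line.startswith("data:"):
            event = json.loads(line[len("data:"):].strip())
            print(event)
```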
(SSE text/event-stream) Is there a way to enable this beyond the TGI compat env var? If so, we should probably link a doc; if not, I think we can remove it.
- TensorRT-LLM periodically crashes during model compilation
- TensorRT-LLM AWQ quantization currently crashes due to an internal error
I think we might need some more info here.
- Periodic crashes -> can this happen for any model? Does a retry usually succeed?
- AWQ quantization -> I think we can say "runtime" quantization specifically. Also, since this is an issue on our end, maybe we can say it will be fixed in an upcoming release?
Periodic crashes -> can this happen for any model? Does a retry usually succeed? - Yes, it can happen for any model. It occurs during model compilation, so we don't retry.
AWQ quantization -> I think we can say "runtime" quantization specifically. Also, since this is an issue on our end, maybe we can say it will be fixed in an upcoming release? - Done
- Inference CX for rolling batch (a client sketch follows this list):
  - Token id changed from a list to an integer in the rolling batch response.
  - Error handling: "finish_reason: error" during rolling batch inference
- The DeepSpeed container has been deprecated; its functionality is generally available in the LMI container now
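A minimal sketch of how a client might consume the changed rolling-batch response. The field names (`token.id`, `token.text`, `details.finish_reason`) follow the LMI output schema linked above, but the endpoint URL and payload here are placeholders, not values from this PR.

```python
import json
import requests

# Assumption: a DJL-Serving / LMI endpoint running locally with rolling batch
# enabled; the URL and payload fields are placeholders for illustration.
URL = "http://localhost:8080/invocations"
payload = {"inputs": "Tell me about DJL", "parameters": {"max_new_tokens": 32}}

with requests.post(URL, json=payload, stream=True) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines(decode_unicode=True):
        if not line:
            continue
        chunk = json.loads(line)
        token = chunk.get("token", {})
        # "id" is now a single integer per generated token (previously a list).
        token_id = token.get("id")
        print(f"{token_id}: {token.get('text', '')}")
        # With the improved error handling, server-side failures surface as
        # finish_reason == "error" rather than an abrupt connection drop.
        if chunk.get("details", {}).get("finish_reason") == "error":
            print("generation failed on the server side")
```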
"DeepSpeed support has been removed. Please follow the migration guide to transition to one of our other supported backends https://github.com/deepjavalibrary/djl-serving/blob/master/serving/docs/lmi/announcements/deepspeed-deprecation.md"
Updated with Rishabh's comment
### Release date: June 6, 2024
Check out our latest [Large Model Inference Containers](https://github.com/aws/deep-learning-containers/blob/master/available_images.md#large-model-inference-containers).
The URI should point to the 0.28.0-dlc branch.
Yes, all the URIs now point to the 0.28.0-dlc branch.
@siddvenk Addressed all the comments; the URIs now point to the 0.28.0-dlc branch.
Description
Adds release notes for docs.djl.ai covering the 0.28.0 release.