
[doc] adding release notes for docs.djl.ai #2032

Merged
merged 4 commits into deepjavalibrary:master on Jun 14, 2024

Conversation

sindhuvahinis
Contributor

Description

Brief description of what this PR is about

  • If this change is a backward incompatible change, why must this change be made?
  • Interesting edge cases to note here

@sindhuvahinis sindhuvahinis requested review from zachgk, frankfliu and a team as code owners June 6, 2024 22:34
serving/docs/lmi/user_guides/release_notes.md (resolved)
serving/docs/lmi/user_guides/release_notes.md (outdated, resolved)
serving/docs/lmi/user_guides/release_notes.md (outdated, resolved)
- Improved our error handling of container responses for rolling batch. Check this [doc](https://github.com/deepjavalibrary/djl-serving/blob/e4d7e5da822a8c11b13e79eaeaec4101fe678b69/serving/docs/lmi/user_guides/lmi_input_output_schema.md#error-responses) to learn more.
- New CX capabilities:
  - We introduce the OPTION_TGI_COMPAT environment variable, which enables you to get the same response format as TGI. [doc]()
  - We now also support the SSE text/event-stream data format.
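With SSE (text/event-stream) support, streamed tokens arrive as `data:` lines separated by blank lines. A minimal client-side sketch of parsing such a stream follows; the JSON field names (`token`, `id`, `text`) and the `[DONE]` terminator are illustrative assumptions modeled on TGI-style responses, not the authoritative LMI schema:

```python
import json

def parse_sse_lines(lines):
    """Parse Server-Sent Events lines into JSON payloads.

    SSE frames each event as one or more "data: ..." lines followed
    by a blank separator line; "[DONE]" is a common stream terminator.
    """
    events = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and comment lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        events.append(json.loads(payload))
    return events

# Example stream as it might look in a TGI-compatible response
# format (field names are illustrative only).
stream = [
    'data: {"token": {"id": 450, "text": "The"}}',
    "",
    'data: {"token": {"id": 1234, "text": " end"}}',
    "",
    "data: [DONE]",
]
tokens = parse_sse_lines(stream)
print("".join(e["token"]["text"] for e in tokens))  # The end
```

In a real client these lines would come from an HTTP response body with `Accept: text/event-stream` rather than a hard-coded list.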
@siddvenk (Contributor) commented on Jun 7, 2024:

(SSE text/event-stream) Is there a way to enable this beyond the TGI compat env var? If so, we should probably link a doc; if not, I think we can remove it.

Comment on lines 63 to 64
- TensorRT-LLM periodically crashes during model compilation
- TensorRT-LLM AWQ quantization currently crashes due to an internal error
Contributor:

I think we might need some more info here.

  • Periodic crashes: can this happen for any model? Does a retry usually succeed?
  • AWQ quantization: I think we can say "runtime" quantization specifically. Also, since this is an issue on our end, maybe we can say it will be fixed in an upcoming release?

@sindhuvahinis (Contributor, Author) replied:

> Periodic crashes: can this happen for any model? Does a retry usually succeed?

Yes, it can happen for any model. This is during model compilation, so we don't retry.

> AWQ quantization: I think we can say "runtime" quantization specifically. Also, since this is an issue on our end, maybe we can say it will be fixed in an upcoming release?

Done.

serving/docs/lmi/user_guides/release_notes.md (outdated, resolved)
- Inference CX for rolling batch:
  - Token id changed from a list to an integer in the rolling batch response.
  - Error handling: "finish_reason: error" during rolling batch inference.
- The DeepSpeed container has been deprecated; its functionality is now generally available in the LMI container.
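The two rolling batch CX changes above (token id as a plain integer, and mid-stream failures surfaced via `finish_reason`) affect how clients consume chunks. A hedged client-side sketch, where the chunk shape and field names are assumptions for illustration rather than the documented schema:

```python
def handle_chunk(chunk):
    """Handle one rolling batch response chunk.

    After this release the token id is a plain integer rather than a
    single-element list, and generation failures are reported with
    finish_reason == "error" in the response body.
    """
    token_id = chunk["token"]["id"]
    # Accept the old list form defensively for mixed-version clients.
    if isinstance(token_id, list):
        token_id = token_id[0]
    if chunk.get("details", {}).get("finish_reason") == "error":
        raise RuntimeError("generation failed mid-stream")
    return token_id, chunk["token"]["text"]

# Illustrative chunks (field names are assumptions, not the schema).
new_style = {"token": {"id": 450, "text": "The"}}
old_style = {"token": {"id": [450], "text": "The"}}
print(handle_chunk(new_style))  # (450, 'The')
print(handle_chunk(old_style))  # (450, 'The')
```

Checking `finish_reason` on every chunk matters because the HTTP status can already be 200 by the time a streaming request fails partway through.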
Contributor:

"DeepSpeed support has been removed. Please follow the migration guide to transition to one of our other supported backends https://github.com/deepjavalibrary/djl-serving/blob/master/serving/docs/lmi/announcements/deepspeed-deprecation.md"

@sindhuvahinis (Contributor, Author) replied:

Updated with Rishabh's comment


### Release date: June 6, 2024

Check out our latest [Large Model Inference Containers](https://github.com/aws/deep-learning-containers/blob/master/available_images.md#large-model-inference-containers).
Contributor:

The URI should point to the 0.28.0-dlc branch.

@sindhuvahinis (Contributor, Author) replied:

Yes, all the URIs now point to the 0.28.0-dlc branch.

@sindhuvahinis (Contributor, Author):

@siddvenk Addressed all the comments; the URIs now point to the 0.28.0-dlc branch.

@sindhuvahinis sindhuvahinis merged commit aaa788a into deepjavalibrary:master Jun 14, 2024
2 checks passed
@sindhuvahinis sindhuvahinis deleted the dock branch June 20, 2024 19:15
4 participants