[doc] adding release notes for docs.djl.ai #2032
Conversation
- Improved our error handling for container responses for rolling batch. Check this [doc](https://github.com/deepjavalibrary/djl-serving/blob/e4d7e5da822a8c11b13e79eaeaec4101fe678b69/serving/docs/lmi/user_guides/lmi_input_output_schema.md#error-responses) to learn more
- New CX capability (see the client sketch after this list):
  - We introduce the OPTION_TGI_COMPAT env var, which enables you to get the same response format as TGI. [doc]()
  - We also now support the SSE text/event-stream data format.
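As an illustration of the two items above, here is a minimal client sketch. It assumes a DJL-Serving/LMI endpoint running locally at `http://localhost:8080/invocations` whose container was launched with `OPTION_TGI_COMPAT=true`; the endpoint path, payload fields, and `Accept` header are assumptions based on the LMI input/output schema, not details taken from this PR.

```python
import json
import requests

# Assumption: a DJL-Serving / LMI endpoint running locally, started with the
# env var OPTION_TGI_COMPAT=true so responses use the TGI-compatible format.
URL = "http://localhost:8080/invocations"

payload = {
    "inputs": "What is Deep Java Library?",
    "parameters": {"max_new_tokens": 64},
    "stream": True,
}

# Request SSE (text/event-stream) streaming output.
with requests.post(
    URL,
    json=payload,
    headers={"Accept": "text/event-stream"},
    stream=True,
) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines(decode_unicode=True):
        # SSE events arrive as lines prefixed with "data:"; blank lines
        # separate events.
        if line and line.startswith("data:"):
            event = json.loads(line[len("data:"):].strip())
            print(event)
```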
(SSE text/event-stream) Is there a way to enable this beyond the TGI compat env var? If so, we should probably link a doc; if not, I think we can remove it.
- TensorRT-LLM periodically crashes during model compilation
- TensorRT-LLM AWQ quantization currently crashes due to an internal error
I think we might need some more info here.
- Periodic crashes -> can this happen for any model? Does a retry usually succeed?
- AWQ quantization -> I think we can say "runtime" quantization specifically. Also, since this is an issue on our end, maybe we can say it will be fixed in an upcoming release?
Periodic crashes -> can this happen for any model? Does a retry usually succeed? - Yes, it can happen for any model. It occurs during model compilation, so we don't retry.
AWQ quantization -> I think we can say "runtime" quantization specifically. Also, since this is an issue on our end, maybe we can say it will be fixed in an upcoming release? - Done
- Inference CX for rolling batch (a client sketch follows this list):
  - Token id changed from a list to an integer in the rolling batch response.
  - Error handling: "finish_reason: error" during rolling batch inference
- The DeepSpeed container has been deprecated; its functionality is generally available in the LMI container now
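A minimal sketch of how a client might consume the changed rolling-batch response. The field names (`token.id`, `token.text`, `details.finish_reason`) follow the LMI output schema linked above, but the endpoint URL and payload here are placeholders, not values from this PR.

```python
import json
import requests

# Assumption: a DJL-Serving / LMI endpoint running locally with rolling batch
# enabled; the URL and payload fields are placeholders for illustration.
URL = "http://localhost:8080/invocations"
payload = {"inputs": "Tell me about DJL", "parameters": {"max_new_tokens": 32}}

with requests.post(URL, json=payload, stream=True) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines(decode_unicode=True):
        if not line:
            continue
        chunk = json.loads(line)
        token = chunk.get("token", {})
        # "id" is now a single integer per generated token (previously a list).
        token_id = token.get("id")
        print(f"{token_id}: {token.get('text', '')}")
        # With the improved error handling, server-side failures surface as
        # finish_reason == "error" rather than an abrupt connection drop.
        if chunk.get("details", {}).get("finish_reason") == "error":
            print("generation failed on the server side")
```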
"DeepSpeed support has been removed. Please follow the migration guide to transition to one of our other supported backends https://github.com/deepjavalibrary/djl-serving/blob/master/serving/docs/lmi/announcements/deepspeed-deprecation.md"
Updated with Rishabh's comment
### Release date: June 6, 2024
Check out our latest [Large Model Inference Containers](https://github.com/aws/deep-learning-containers/blob/master/available_images.md#large-model-inference-containers).
The URI should point to the 0.28.0-dlc branch.
Yes, all the URIs now point to the 0.28.0-dlc branch.
@siddvenk Addressed all the comments; the URIs now point to the 0.28.0-dlc branch.
Description
Adds release notes for docs.djl.ai covering the 0.28.0 release.