
feat: Response streaming over gRPC #4170

Open

Bec-k opened this issue Jun 23, 2023 · 7 comments

Comments

@Bec-k

Bec-k commented Jun 23, 2023

Feature request

It would be nice to have a streaming feature for the generation API, so that the response streams token by token instead of waiting until the full response is generated. gRPC has built-in support for streaming responses, and proto code generation handles it as well. The only work required on your server side is to pipe tokens into the stream.
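
For illustration, a minimal sketch of gRPC server-side streaming with grpcio; the LLM service, its messages, and the token source here are hypothetical placeholders, not BentoML's actual API:

```python
# A minimal sketch of gRPC server-side streaming with grpcio. The LLM
# service, its messages, and the token source are hypothetical, not
# BentoML's actual API. The .proto would declare something like:
#
#   service LLM {
#     rpc Generate (GenerateRequest) returns (stream GenerateResponse);
#   }
#
# and `python -m grpc_tools.protoc` would generate llm_pb2 / llm_pb2_grpc.
from concurrent import futures

import grpc

import llm_pb2
import llm_pb2_grpc


def generate_tokens(prompt):
    # Placeholder for the model; a real server would yield tokens as the
    # LLM produces them.
    yield from prompt.split()


class LLMServicer(llm_pb2_grpc.LLMServicer):
    def Generate(self, request, context):
        # Yielding from the handler sends one message per token instead
        # of buffering the full completion.
        for token in generate_tokens(request.prompt):
            yield llm_pb2.GenerateResponse(token=token)


server = grpc.server(futures.ThreadPoolExecutor(max_workers=4))
llm_pb2_grpc.add_LLMServicer_to_server(LLMServicer(), server)
server.add_insecure_port("[::]:50051")
server.start()
server.wait_for_termination()
```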

Motivation

This feature would allow streaming the response while it is being generated, instead of waiting until it is fully generated.

Other

No response

@aarnphm
Member

aarnphm commented Jun 23, 2023

This would require BentoML's gRPC feature to support streaming, which it currently does not.

@aarnphm
Member

aarnphm commented Sep 6, 2023

Streaming is now supported via SSE. gRPC streaming will require streaming support for gRPC in BentoML. I'm going to transfer this to BentoML for now since SSE should be sufficient for most use cases.
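
For anyone landing here, a rough sketch of what consuming an SSE stream looks like from Python; the endpoint path and payload are placeholders, not the documented OpenLLM/BentoML API:

```python
# A rough sketch of reading an SSE stream with requests. The URL and
# JSON payload are placeholders, not the documented OpenLLM/BentoML API.
import requests

with requests.post(
    "http://localhost:3000/generate_stream",  # hypothetical endpoint
    json={"prompt": "Hello"},
    stream=True,
) as resp:
    for line in resp.iter_lines():
        # SSE frames arrive as "data: <payload>" lines separated by blanks.
        if line and line.startswith(b"data:"):
            print(line[len(b"data:"):].strip().decode(), flush=True)
```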

@aarnphm aarnphm closed this as completed Sep 6, 2023
@aarnphm aarnphm reopened this Sep 6, 2023
@aarnphm aarnphm transferred this issue from bentoml/OpenLLM Sep 6, 2023
@Bec-k
Author

Bec-k commented Sep 7, 2023

Is any documentation available for that?

@Bec-k
Author

Bec-k commented Sep 7, 2023

> Streaming is now supported via SSE. gRPC streaming will require streaming support for gRPC in BentoML. I'm going to transfer this to BentoML for now since SSE should be sufficient for most use cases.

Well, not really. There are a lot of AI pipelines running internally between servers, and they mostly use Kafka or gRPC streaming to communicate with each other.

@npuichigo

@aarnphm Is there any roadmap or plan to support gRPC streaming?

@parano
Member

parano commented Sep 19, 2023

Hi @npuichigo @Bec-k - I would love to connect and hear more about your use case regarding gRPC streaming support; this could really help the team and community prioritize it. Could you drop me a DM in our community Slack?
