
feat: Response streaming over gRPC #4170

Open

Bec-k opened this issue Jun 23, 2023 · 7 comments

Comments

@Bec-k

Bec-k commented Jun 23, 2023

Feature request

It would be nice to have a streaming feature for the generation API, so that the response streams token by token instead of waiting until the full response is generated. gRPC has built-in support for streaming responses, and proto code generation handles it as well. The only work required on your server side is to pipe tokens into the stream.
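
For illustration, a minimal sketch of gRPC server-side streaming with grpcio; the LLM service, its messages, and the token source here are hypothetical placeholders, not BentoML's actual API:

```python
# A minimal sketch of gRPC server-side streaming with grpcio. The LLM
# service, its messages, and the token source are hypothetical, not
# BentoML's actual API. The .proto would declare something like:
#
#   service LLM {
#     rpc Generate (GenerateRequest) returns (stream GenerateResponse);
#   }
#
# and `python -m grpc_tools.protoc` would generate llm_pb2 / llm_pb2_grpc.
from concurrent import futures

import grpc

import llm_pb2
import llm_pb2_grpc


def generate_tokens(prompt):
    # Placeholder for the model; a real server would yield tokens as the
    # LLM produces them.
    yield from prompt.split()


class LLMServicer(llm_pb2_grpc.LLMServicer):
    def Generate(self, request, context):
        # Yielding from the handler sends one message per token instead
        # of buffering the full completion.
        for token in generate_tokens(request.prompt):
            yield llm_pb2.GenerateResponse(token=token)


server = grpc.server(futures.ThreadPoolExecutor(max_workers=4))
llm_pb2_grpc.add_LLMServicer_to_server(LLMServicer(), server)
server.add_insecure_port("[::]:50051")
server.start()
server.wait_for_termination()
```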

Motivation

This feature would allow streaming the response while it is being generated, instead of waiting until it is fully generated.

Other

No response

@aarnphm
Member

aarnphm commented Jun 23, 2023

This would require BentoML's gRPC feature to support streaming, which it currently does not.

@aarnphm
Member

aarnphm commented Sep 6, 2023

Streaming is now supported via SSE. gRPC streaming will require streaming support for gRPC in BentoML. I'm going to transfer this to BentoML for now since SSE should be sufficient for most use cases.
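
For anyone landing here, a rough sketch of what consuming an SSE stream looks like from Python; the endpoint path and payload are placeholders, not the documented OpenLLM/BentoML API:

```python
# A rough sketch of reading an SSE stream with requests. The URL and
# JSON payload are placeholders, not the documented OpenLLM/BentoML API.
import requests

with requests.post(
    "http://localhost:3000/generate_stream",  # hypothetical endpoint
    json={"prompt": "Hello"},
    stream=True,
) as resp:
    for line in resp.iter_lines():
        # SSE frames arrive as "data: <payload>" lines separated by blanks.
        if line and line.startswith(b"data:"):
            print(line[len(b"data:"):].strip().decode(), flush=True)
```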

@aarnphm aarnphm closed this as completed Sep 6, 2023
@aarnphm aarnphm reopened this Sep 6, 2023
@aarnphm aarnphm transferred this issue from bentoml/OpenLLM Sep 6, 2023
@Bec-k
Author

Bec-k commented Sep 7, 2023

Is any documentation available for that?

@Bec-k
Author

Bec-k commented Sep 7, 2023

> Streaming is now supported via SSE. gRPC streaming will require streaming support for gRPC in BentoML. I'm going to transfer this to BentoML for now since SSE should be sufficient for most use cases.

Well, not really. There are a lot of AI pipelines running internally between servers, and they mostly use Kafka or gRPC streaming to communicate with each other.

@npuichigo

@aarnphm Is there any roadmap or plan to support gRPC streaming?

@parano
Member

parano commented Sep 19, 2023

Hi @npuichigo @Bec-k - I would love to connect and hear more about your use case regarding gRPC streaming support; this could really help the team and community prioritize it. Could you drop me a DM in our community Slack?
