
[Feature Request] Streaming Conversational Responses #1727

Open
Bajocode opened this issue May 13, 2024 · 2 comments
@Bajocode

Description

OpenAI and the majority of LLM APIs support response streaming.

To get responses sooner, you can 'stream' the completion as it's being generated. This allows you to start printing or processing the beginning of the completion before the full completion is finished.
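
For reference, this is roughly what consuming a streamed completion looks like with the OpenAI Python SDK (a minimal sketch; the model name is illustrative):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# stream=True makes the API return an iterator of chunks instead of a
# single final response, so output can be rendered token by token.
stream = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[{"role": "user", "content": "Explain streaming in one line."}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```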

Expected Behavior

Allow for conversation API response streaming (this is a built-in feature of the majority of LLM APIs).
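
As a sketch of how this could surface on the client side, here's what consuming the response might look like if the conversational search endpoint emitted server-sent events. The `stream` parameter and the SSE framing are purely hypothetical assumptions for illustration; only `q`, `query_by`, `conversation`, and `conversation_model_id` mirror the existing (non-streaming) API:

```python
import json
import requests

# Hypothetical sketch: Typesense 26.0 does not support this today.
# The stream=true parameter and the SSE "data:" framing are assumptions
# about what a streamed conversation response could look like.
resp = requests.get(
    "http://localhost:8108/collections/docs/documents/search",
    params={
        "q": "How do I enable analytics?",
        "query_by": "embedding",
        "conversation": "true",
        "conversation_model_id": "conv-model-1",  # placeholder id
        "stream": "true",  # hypothetical flag
    },
    headers={"X-TYPESENSE-API-KEY": "xyz"},
    stream=True,  # let requests yield the body incrementally
)

for line in resp.iter_lines():
    if line.startswith(b"data: "):
        payload = json.loads(line[len(b"data: "):])
        print(payload.get("message", ""), end="", flush=True)
```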

Actual Behavior

When you request a conversation response, the entire completion is generated before being sent back in a single response. If you're generating long completions, waiting for the response can take many seconds.

Metadata

Typesense Version: 26.0

@Bajocode Bajocode changed the title Streaming Conversational Responses [Feature Request] Streaming Conversational Responses May 18, 2024
@tommmyy

tommmyy commented May 30, 2024

Voting on the issue: For most use cases, showing just a loader while the response is being generated is not sufficient. It makes the conversational bot almost unsuitable for production use.

@lmatejka

Also voting for this issue. I presented our demo based on Typesense RAG last week and it was a little annoying for customers to wait for a long response. If they see a stream of tokens (which they're used to by now), it will look much better.
