Fix(components): conversational retrieval qa streaming regression#6089

Open
estebanjosse wants to merge 9 commits into FlowiseAI:main from estebanjosse:fix/conversational-retrieval-streaming-regression

Conversation

@estebanjosse
Contributor

Summary

This PR fixes a streaming regression in the Conversational Retrieval QA Chain introduced in 3.1.x, where responses were no longer streamed token-by-token and were instead returned only after completion.

In addition to restoring streaming, this PR refactors the streaming implementation to make it more maintainable, consistent with other chains, and significantly easier to test.


Root cause

The previous implementation relied on a manual JSON patch-based streaming mechanism using streamLog() and applyPatch to reconstruct streamed output.

This approach:

  • does not reliably propagate token-level streaming events
  • introduces complex state management
  • makes the streaming behavior difficult to reason about and test
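
To illustrate the fragility, here is a minimal sketch (not the actual Flowise code) of what consuming a `streamLog()`-style JSON patch stream looks like: every chunk is a patch operation whose path the consumer must recognise and apply against manually tracked state.

```typescript
// Sketch of JSON patch-based stream reconstruction. The op/path shapes
// are illustrative assumptions modelled on JSON Patch (RFC 6902).
type PatchOp = { op: "add" | "replace"; path: string; value: string };

// State that must be tracked manually across every patch chunk.
let finalOutput = "";

function applyStreamPatch(op: PatchOp): void {
  // Token appends arrive as "add" ops against a streamed-output path;
  // any other path must be recognised and routed separately.
  if (op.path.startsWith("/streamed_output")) {
    if (op.op === "add") finalOutput += op.value;
    else finalOutput = op.value;
  }
}

// Simulated patch stream: each op must be interpreted by the consumer.
const patches: PatchOp[] = [
  { op: "add", path: "/streamed_output/-", value: "Hello" },
  { op: "add", path: "/streamed_output/-", value: " world" },
];
patches.forEach(applyStreamPatch);

console.log(finalOutput); // "Hello world"
```

If a patch for an unexpected path arrives, or ops are reordered, the reconstructed output silently diverges, which is exactly the kind of state management this PR removes.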

Fix

This PR replaces the JSON patch-based streaming logic with a callback handler-based approach using CustomChainHandler, aligning the implementation with ConversationChain.

Key changes:

  • Remove streamLog() / JSON patch parsing (applyPatch)
  • Introduce callback-based token streaming via CustomChainHandler
  • Ensure proper propagation of streaming events to the frontend

This results in:

  • restored token-by-token streaming
  • simpler and more maintainable code
  • consistent streaming behavior across chains
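
The callback-handler pattern can be sketched as follows; the class and streamer shapes below are illustrative assumptions in the spirit of Flowise's `CustomChainHandler`, not the actual implementation.

```typescript
// A sink for streamed tokens (e.g. an SSE writer to the frontend).
interface TokenStreamer {
  streamToken(token: string): void;
}

// Minimal callback handler: the LLM invokes handleLLMNewToken for every
// generated token, so output reaches the client incrementally instead
// of only after the chain completes.
class SketchChainHandler {
  constructor(private streamer: TokenStreamer) {}

  handleLLMNewToken(token: string): void {
    this.streamer.streamToken(token);
  }
}

// Usage: collect streamed tokens to show token-by-token delivery.
const received: string[] = [];
const handler = new SketchChainHandler({
  streamToken: (t) => received.push(t),
});
["The", " answer", " is", " 42"].forEach((t) => handler.handleLLMNewToken(t));
console.log(received.join("")); // "The answer is 42"
```

The key difference from the patch-based approach is that there is no state to reconstruct: each token is forwarded the moment the model emits it.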

Refactoring

  • Simplified streaming logic by removing manual patch/state handling
  • Reworked chain composition using RunnableSequence and RunnableMap for better clarity and modularity
  • Cleaned up imports and removed unused code
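
The composition pattern mentioned above can be sketched with hypothetical stand-ins for LangChain's `RunnableSequence` and `RunnableMap` (these are not the library's actual signatures, just the shape of the idea):

```typescript
type Runnable<I, O> = (input: I) => Promise<O>;

// Run several runnables in parallel over the same input, collecting
// their outputs under named keys — the RunnableMap pattern.
function runnableMap<I>(
  steps: Record<string, Runnable<I, unknown>>
): Runnable<I, Record<string, unknown>> {
  return async (input: I) => {
    const entries = await Promise.all(
      Object.entries(steps).map(async ([k, fn]) => [k, await fn(input)] as const)
    );
    return Object.fromEntries(entries);
  };
}

// Pipe one runnable into the next — the RunnableSequence pattern.
function runnableSequence<A, B, C>(
  first: Runnable<A, B>,
  second: Runnable<B, C>
): Runnable<A, C> {
  return async (input: A) => second(await first(input));
}

// Example: fetch context and pass the question through in parallel,
// then format a prompt from both values.
const chain = runnableSequence(
  runnableMap({
    context: async (q: string) => `docs for: ${q}`,
    question: async (q: string) => q,
  }),
  async (vals: Record<string, unknown>) =>
    `Context: ${vals.context}\nQuestion: ${vals.question}`
);

chain("what is streaming?").then((prompt) => console.log(prompt));
```

Composing the chain this way keeps each step small and independently testable, which is what makes the new structure clearer than the previous monolithic streaming logic.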

Tests

  • ✅ Added a dedicated test harness (ConversationalRetrievalQAChain.test.ts)
  • ✅ Uses test doubles for:
    • streaming
    • memory
    • retriever
  • ✅ Verified streaming behavior with red → green test cycle
  • ✅ Covers:
    • token streaming
    • source document emission
    • correct output structure
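
A streaming test double in this style might look like the sketch below; the names are illustrative, not the actual harness in `ConversationalRetrievalQAChain.test.ts`.

```typescript
// Fake streamer that records everything sent to it, so tests can assert
// on observable streaming behavior instead of internal chain state.
class FakeStreamer {
  tokens: string[] = [];
  sourceDocuments: unknown[] = [];

  streamToken(token: string): void {
    this.tokens.push(token);
  }

  streamSourceDocuments(docs: unknown[]): void {
    this.sourceDocuments.push(...docs);
  }
}

// A test drives the chain with the fake in place of the real transport,
// then inspects what was streamed.
const streamer = new FakeStreamer();
streamer.streamToken("Hello");
streamer.streamToken(" world");
streamer.streamSourceDocuments([{ pageContent: "doc1" }]);

console.log(streamer.tokens.length, streamer.sourceDocuments.length); // 2 1
```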

Behavioral improvements

  • Ensures source documents are streamed only when appropriate
  • Prevents leakage of condensed question text in streamed tokens
  • Introduces a clear output contract via ConversationalRetrievalQAResult
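
One possible shape for such an output contract is sketched below; the exact fields of `ConversationalRetrievalQAResult` in this PR may differ.

```typescript
// Assumed contract: a final answer plus optional source documents that
// are only attached when the node is configured to return them.
interface ConversationalRetrievalQAResult {
  text: string;
  sourceDocuments?: unknown[];
}

function buildResult(
  text: string,
  docs?: unknown[]
): ConversationalRetrievalQAResult {
  // Omit the field entirely when there is nothing to attach, so
  // consumers can rely on its presence meaning "sources were returned".
  return docs && docs.length > 0 ? { text, sourceDocuments: docs } : { text };
}

const withDocs = buildResult("answer", [{ pageContent: "src" }]);
const withoutDocs = buildResult("answer");
console.log("sourceDocuments" in withDocs, "sourceDocuments" in withoutDocs); // true false
```

An explicit result type like this is what lets the tests assert on "correct output structure" rather than on incidental properties of the streaming path.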

Impact

  • Restores expected streaming behavior for Conversational Retrieval QA Chain
  • No impact on other chains or flows
  • Improves maintainability and testability of the node

Notes

This change removes reliance on streamLog() for streaming and aligns the implementation with the callback-based pattern already used in other parts of the codebase.

@estebanjosse
Contributor Author

Closes #6070


@gemini-code-assist (bot) left a comment


Code Review

This pull request refactors the ConversationalRetrievalQAChain to use answerChain.invoke instead of streamLog, implementing a CustomChainHandler for streaming and restructuring the internal chain to better handle source documents. A new test suite is also introduced to verify streaming and document retrieval. The reviewer identified a redundant call to serializeHistory when checking for chat history, suggesting a more efficient check using the history array's length.
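
The reviewer's suggestion can be sketched as follows; the helper names and message shape are assumptions for illustration, not the PR's actual code.

```typescript
type ChatMessage = { role: "user" | "assistant"; content: string };

// Hypothetical serializer, as might be used to build the prompt history.
function serializeHistory(history: ChatMessage[]): string {
  return history.map((m) => `${m.role}: ${m.content}`).join("\n");
}

// Before: serializes the entire array just to test for emptiness.
function hasChatHistoryOld(history: ChatMessage[]): boolean {
  return serializeHistory(history) !== "";
}

// After: a constant-time length check with the same semantics.
function hasChatHistory(history: ChatMessage[]): boolean {
  return history.length > 0;
}

console.log(
  hasChatHistoryOld([]),
  hasChatHistory([{ role: "user", content: "hi" }])
); // false true
```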

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>