
✨ Feat: Enhance final answer generation with streaming support #2873

Merged
Dallas98 merged 4 commits into develop from develop_fix_final_answer_stream
Apr 28, 2026

Conversation

Zhi-a (Contributor) commented Apr 27, 2026

✨ Feat: Enhance final answer generation with streaming support

  • Introduced a new method to build messages for final answer generation, incorporating task prompts and memory messages.
  • Updated the max steps handling to utilize streaming for real-time answer generation, improving user experience.
  • Enhanced error handling during final answer generation to provide fallback messages in case of failures.

When the maximum number of steps is reached, the final answer is streamed.
(Screenshot: the final answer streaming in the chat UI.)
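
For reference, a minimal sketch of what such a message builder might look like, assuming smolagents-style final_answer templates with pre_messages/post_messages keys and Jinja rendering. Apart from the function name and signature, which appear in the reviewed diff, everything below is an illustrative guess rather than the PR's exact implementation:

from jinja2 import StrictUndefined, Template

def _build_final_answer_messages(task, prompt_templates, memory_messages):
    # Render the pre/post prompts with the task injected. StrictUndefined
    # makes missing template variables raise instead of silently rendering
    # empty strings.
    pre = Template(
        prompt_templates["final_answer"]["pre_messages"],
        undefined=StrictUndefined,
    ).render(task=task)
    post = Template(
        prompt_templates["final_answer"]["post_messages"],
        undefined=StrictUndefined,
    ).render(task=task)
    # Wrap the agent's memory (minus its original first/system message)
    # between the rendered pre and post prompts.
    return (
        [{"role": "system", "content": pre}]
        + memory_messages[1:]
        + [{"role": "user", "content": post}]
    )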

Zhi-a added 2 commits April 27, 2026 15:49
- Introduced a new method to build messages for final answer generation, incorporating task prompts and memory messages.
- Updated the max steps handling to utilize streaming for real-time answer generation, improving user experience.
- Enhanced error handling during final answer generation to provide fallback messages in case of failures.
- Modified the condition for displaying the max steps warning to trigger when the message is complete and contains maxStepsInfo.
- Improved clarity in the code comments to better reflect the updated logic.
Copilot AI review requested due to automatic review settings April 27, 2026 08:34
@Zhi-a Zhi-a requested review from Dallas98 and WMC001 as code owners April 27, 2026 08:34

Copilot AI left a comment

Pull request overview

This PR enhances the “max steps reached” behavior by generating the final answer via an LLM streaming call so the UI can display real-time thinking tokens, and refines when the frontend shows the max-steps warning.

Changes:

  • Added a helper to build the message list used for “final answer” generation.
  • Overrode max-steps handling to call the model in streaming mode and record a final ActionStep (see the sketch after this list).
  • Updated the chat UI to only show the “max steps reached” warning after the assistant message is complete.
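
As a minimal, self-contained sketch of that streaming pattern: the model call streams internally and pushes tokens through observer callbacks rather than yielding chunks to the caller. All names below are illustrative stand-ins for the SDK's real classes, not the PR's code:

from dataclasses import dataclass, field

@dataclass
class Observer:
    # Stand-in for the SDK observer that forwards tokens to the UI.
    tokens: list = field(default_factory=list)

    def add_model_new_token(self, token: str) -> None:
        self.tokens.append(token)

def stream_final_answer(token_stream, observer: Observer) -> str:
    # Forward each token to the observer as it arrives so the frontend
    # can render it in real time, then return the assembled answer.
    parts = []
    for token in token_stream:
        observer.add_model_new_token(token)
        parts.append(token)
    return "".join(parts)

observer = Observer()
answer = stream_final_answer(iter(["All", " ", "done."]), observer)
assert answer == "All done."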

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

  • sdk/nexent/core/agents/core_agent.py: Adds the final-answer message builder and a streaming _handle_max_steps_reached implementation.
  • frontend/app/[locale]/chat/streaming/chatStreamFinalMessage.tsx: Shows the max-steps warning only when the assistant message is complete.


Comment on lines +661 to +662
# Fallback to error message if streaming fails
model_output = f"Error in generating final LLM output: {e}"

Copilot AI Apr 27, 2026

The fallback model_output = f"Error in generating final LLM output: {e}" is returned as the final answer, which can expose internal exception details to end users. Prefer returning a user-safe generic message (and keep the full exception detail in logs).

Suggested change
- # Fallback to error message if streaming fails
- model_output = f"Error in generating final LLM output: {e}"
+ # Return a user-safe fallback message if final answer generation fails.
+ # Keep the detailed exception information in logs for debugging.
+ model_output = "I’m sorry, but I couldn’t generate a final answer due to an internal error."

Comment on lines +667 to +672
final_memory_step.token_usage = TokenUsage(
    input_tokens=total_input_tokens,
    output_tokens=total_output_tokens
)
final_memory_step.action_output = model_output


Copilot AI Apr 27, 2026

final_memory_step.action_output = model_output looks inconsistent with the rest of this file, where LLM text is stored on model_output / model_output_message and action_output is used for executed code/tool outputs. Consider storing the generated final-answer text on the model-output fields so step serialization/monitoring/UI logic doesn’t treat it as an action/tool result.
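
To make the distinction concrete, a toy illustration with a stand-in Step class (ActionStep's real definition lives in the SDK; the field names follow the reviewer's description):

from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class Step:
    # model_output / model_output_message hold LLM-generated text;
    # action_output is reserved for executed code/tool results.
    model_output: Optional[str] = None
    model_output_message: Optional[Any] = None
    action_output: Optional[Any] = None

step = Step()
generated_text = "The final answer is 42."

# What the reviewer recommends: record LLM text on the model-output
# fields, so serialization, monitoring, and UI code keep treating
# action_output uniformly as a tool/code result.
step.model_output = generated_text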

Comment on lines 601 to 603
# _handle_max_steps_reached already yields the final step internally
# and sets action_step.error, so don't yield again to avoid duplicate error
final_answer = self._handle_max_steps_reached(task)

Copilot AI Apr 27, 2026

This comment appears outdated: _handle_max_steps_reached no longer yields a final step internally, and it doesn’t set action_step.error (it creates/appends a new ActionStep). Please update/remove this comment to match the current control flow to avoid misleading future maintainers.

Comment on lines +639 to +652
# Track accumulated content and token usage for streaming
accumulated_content = []
total_input_tokens = 0
total_output_tokens = 0
role = None

try:
    # Use streaming call (model.__call__) to generate final answer
    # This will trigger observer.add_model_new_token() and
    # observer.add_model_reasoning_content() in OpenAIModel
    chat_message: ChatMessage = self.model(messages)

    # Update role and content from the completed message
    role = chat_message.role

Copilot AI Apr 27, 2026

In _handle_max_steps_reached, MessageRole is imported but never used, and accumulated_content / role are declared but unused. Please remove these to avoid confusion and keep the method aligned with its actual streaming mechanism (observer callbacks).

Suggested change
- # Track accumulated content and token usage for streaming
- accumulated_content = []
- total_input_tokens = 0
- total_output_tokens = 0
- role = None
- try:
-     # Use streaming call (model.__call__) to generate final answer
-     # This will trigger observer.add_model_new_token() and
-     # observer.add_model_reasoning_content() in OpenAIModel
-     chat_message: ChatMessage = self.model(messages)
-     # Update role and content from the completed message
-     role = chat_message.role
+ # Track token usage for the final answer generation
+ total_input_tokens = 0
+ total_output_tokens = 0
+ try:
+     # Use model call to generate the final answer.
+     # Streaming updates are emitted through observer callbacks in the model implementation.
+     chat_message: ChatMessage = self.model(messages)
+     # Store the completed message content
Comment on lines +628 to +631
# Build messages for final answer generation
memory_messages = self.write_memory_to_messages()
messages = _build_final_answer_messages(task, self.prompt_templates, memory_messages)


Copilot AI Apr 27, 2026

_build_final_answer_messages(...) can raise (e.g., missing final_answer templates, missing keys inside templates, or Jinja rendering errors via StrictUndefined). Right now this happens before the try/except, so reaching max steps could crash the run instead of returning a best-effort final answer. Consider moving message construction into the try block and falling back to a minimal prompt when templates are unavailable.
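
One way to implement that suggestion, sketched as a standalone helper; it reuses the builder sketched near the top of this page, and the fallback prompt wording is illustrative:

import logging

logger = logging.getLogger(__name__)

def build_messages_with_fallback(agent, task):
    # Never let template problems crash the max-steps path: fall back to
    # a minimal prompt when final_answer templates are missing or fail
    # to render (e.g. StrictUndefined errors).
    try:
        memory_messages = agent.write_memory_to_messages()
        return _build_final_answer_messages(task, agent.prompt_templates, memory_messages)
    except Exception:
        logger.exception("Failed to build final-answer messages; using minimal prompt")
        return [{"role": "user", "content": f"Provide a final answer to this task: {task}"}]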

Zhi-a added 2 commits April 27, 2026 16:49
- Introduced a new test suite for the _build_final_answer_messages function, covering various scenarios including basic message structure, skipping the first memory message, handling empty memory, and template rendering with task variables.
- Enhanced the test setup by mocking necessary modules to ensure isolated testing of the function's behavior.
…into develop_fix_final_answer_stream

# Conflicts:
#	sdk/nexent/core/agents/core_agent.py
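
For a flavor of the test cases described in these commits, a minimal pytest-style sketch against the builder sketched near the top of this page (the template keys and message shapes are assumptions, so the PR's actual suite will differ in detail):

def test_skips_first_memory_message():
    templates = {"final_answer": {"pre_messages": "Task: {{ task }}",
                                  "post_messages": "Answer {{ task }} now."}}
    memory = [{"role": "system", "content": "old system prompt"},
              {"role": "assistant", "content": "step 1 output"}]

    messages = _build_final_answer_messages("demo task", templates, memory)

    # The first (system) memory message is dropped; pre/post wrap the rest.
    assert messages[0] == {"role": "system", "content": "Task: demo task"}
    assert {"role": "assistant", "content": "step 1 output"} in messages
    assert messages[-1] == {"role": "user", "content": "Answer demo task now."}
    assert all(m["content"] != "old system prompt" for m in messages)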
Dallas98 merged commit c998b4e into develop Apr 28, 2026
14 of 15 checks passed