✨ Feat: Enhance final answer generation with streaming support #2873
Conversation
- Introduced a new method to build messages for final answer generation, incorporating task prompts and memory messages.
- Updated the max steps handling to use streaming for real-time answer generation, improving user experience.
- Enhanced error handling during final answer generation to provide fallback messages in case of failures.
- Modified the condition for displaying the max steps warning to trigger only when the message is complete and contains maxStepsInfo.
- Improved clarity in code comments to better reflect the updated logic.
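Taken together, the SDK-side changes roughly follow the flow sketched below. This is a non-authoritative sketch assuming the smolagents-style agent API used elsewhere in the SDK; `logger` and the fallback wording are placeholders, not code from the PR:

```python
def _handle_max_steps_reached(self, task: str) -> str:
    # Build the prompt from the task plus the agent's memory (new helper in this PR).
    memory_messages = self.write_memory_to_messages()
    messages = _build_final_answer_messages(task, self.prompt_templates, memory_messages)

    try:
        # Streaming model call: tokens reach the UI through observer callbacks,
        # so the final answer renders in real time instead of appearing all at once.
        chat_message = self.model(messages)
        model_output = chat_message.content
    except Exception as exc:
        # Keep the details in logs; surface only a generic message to the user.
        logger.exception("Final answer generation failed: %s", exc)
        model_output = "Sorry, a final answer could not be generated."

    # The PR then records a final ActionStep (with token usage) in agent memory
    # and returns the generated text as the run's final answer.
    return model_output
```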
Pull request overview
This PR enhances the “max steps reached” behavior by generating the final answer via an LLM streaming call so the UI can display real-time thinking tokens, and refines when the frontend shows the max-steps warning.
Changes:
- Added a helper to build the message list used for “final answer” generation.
- Overrode max-steps handling to call the model in streaming mode and record a final ActionStep.
- Updated the chat UI to only show the “max steps reached” warning after the assistant message is complete.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| sdk/nexent/core/agents/core_agent.py | Adds final-answer message builder and a streaming _handle_max_steps_reached implementation. |
| frontend/app/[locale]/chat/streaming/chatStreamFinalMessage.tsx | Shows max-steps warning only when the assistant message is complete. |
```python
# Fallback to error message if streaming fails
model_output = f"Error in generating final LLM output: {e}"
```
The fallback `model_output = f"Error in generating final LLM output: {e}"` is returned as the final answer, which can expose internal exception details to end users. Prefer returning a user-safe generic message (and keep the full exception detail in the logs).
Suggested change:

```diff
-# Fallback to error message if streaming fails
-model_output = f"Error in generating final LLM output: {e}"
+# Return a user-safe fallback message if final answer generation fails.
+# Keep the detailed exception information in logs for debugging.
+model_output = "I’m sorry, but I couldn’t generate a final answer due to an internal error."
```
```python
final_memory_step.token_usage = TokenUsage(
    input_tokens=total_input_tokens,
    output_tokens=total_output_tokens
)
final_memory_step.action_output = model_output
```
`final_memory_step.action_output = model_output` looks inconsistent with the rest of this file, where LLM text is stored on `model_output` / `model_output_message` and `action_output` is used for executed code/tool outputs. Consider storing the generated final-answer text on the model-output fields so step serialization/monitoring/UI logic doesn’t treat it as an action/tool result.
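A sketch of what that could look like, reusing the names from this diff (whether `model_output_message` exists on this project's `ActionStep` is an assumption):

```python
final_memory_step.token_usage = TokenUsage(
    input_tokens=total_input_tokens,
    output_tokens=total_output_tokens
)
# Store the generated text on the model-output fields, as other steps do,
# and leave action_output for executed code / tool results.
final_memory_step.model_output_message = chat_message
final_memory_step.model_output = model_output
```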
```python
# _handle_max_steps_reached already yields the final step internally
# and sets action_step.error, so don't yield again to avoid duplicate error
final_answer = self._handle_max_steps_reached(task)
```
This comment appears outdated: `_handle_max_steps_reached` no longer yields a final step internally, and it doesn’t set `action_step.error` (it creates/appends a new `ActionStep`). Please update/remove this comment to match the current control flow to avoid misleading future maintainers.
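One possible wording for the updated comment, sketched against the behavior described in this review rather than taken from the diff:

```python
# _handle_max_steps_reached streams the final answer through the model's
# observer callbacks, appends its own final ActionStep to memory, and
# returns the answer text; nothing further needs to be yielded here.
final_answer = self._handle_max_steps_reached(task)
```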
```python
# Track accumulated content and token usage for streaming
accumulated_content = []
total_input_tokens = 0
total_output_tokens = 0
role = None

try:
    # Use streaming call (model.__call__) to generate final answer
    # This will trigger observer.add_model_new_token() and
    # observer.add_model_reasoning_content() in OpenAIModel
    chat_message: ChatMessage = self.model(messages)

    # Update role and content from the completed message
    role = chat_message.role
```
In `_handle_max_steps_reached`, `MessageRole` is imported but never used, and `accumulated_content` / `role` are declared but unused. Please remove these to avoid confusion and keep the method aligned with its actual streaming mechanism (observer callbacks).
Suggested change:

```diff
-# Track accumulated content and token usage for streaming
-accumulated_content = []
-total_input_tokens = 0
-total_output_tokens = 0
-role = None
-try:
-    # Use streaming call (model.__call__) to generate final answer
-    # This will trigger observer.add_model_new_token() and
-    # observer.add_model_reasoning_content() in OpenAIModel
-    chat_message: ChatMessage = self.model(messages)
-    # Update role and content from the completed message
-    role = chat_message.role
+# Track token usage for the final answer generation
+total_input_tokens = 0
+total_output_tokens = 0
+try:
+    # Use model call to generate the final answer.
+    # Streaming updates are emitted through observer callbacks in the model implementation.
+    chat_message: ChatMessage = self.model(messages)
+    # Store the completed message content
```
```python
# Build messages for final answer generation
memory_messages = self.write_memory_to_messages()
messages = _build_final_answer_messages(task, self.prompt_templates, memory_messages)
```
`_build_final_answer_messages(...)` can raise (e.g., missing `final_answer` templates, missing keys inside templates, or Jinja rendering errors via `StrictUndefined`). Right now this happens before the try/except, so reaching max steps could crash the run instead of returning a best-effort final answer. Consider moving message construction into the try block and falling back to a minimal prompt when templates are unavailable.
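A hedged sketch of that suggestion: build the messages inside a try block and fall back to a minimal prompt if the templates are missing or fail to render (the fallback message shape and `logger` are assumptions, not code from the PR):

```python
memory_messages = self.write_memory_to_messages()
try:
    messages = _build_final_answer_messages(task, self.prompt_templates, memory_messages)
except Exception as exc:
    # Missing final_answer templates or Jinja rendering errors should not
    # crash the run once max steps is reached; fall back to a bare prompt.
    logger.exception("Falling back to a minimal final-answer prompt: %s", exc)
    messages = memory_messages + [
        {"role": "user", "content": f"Please provide a final answer to the task: {task}"}
    ]
```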
- Introduced a new test suite for the _build_final_answer_messages function, covering various scenarios including basic message structure, skipping the first memory message, handling empty memory, and template rendering with task variables.
- Enhanced the test setup by mocking necessary modules to ensure isolated testing of the function's behavior.
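A sketch of the kind of test that description implies, assuming `_build_final_answer_messages` accepts (task, prompt_templates, memory_messages) as shown in the diff and returns role/content messages; the import path, template keys, and message shape are assumptions, not taken from the PR:

```python
from nexent.core.agents.core_agent import _build_final_answer_messages


def test_build_final_answer_messages_renders_task_variable():
    task = "Summarize the quarterly report"
    # Hypothetical template layout; the real keys may differ.
    prompt_templates = {
        "final_answer": {
            "pre_messages": "You ran out of steps for task: {{ task }}.",
            "post_messages": "Based on the above, provide a final answer to: {{ task }}.",
        }
    }
    memory_messages = [
        {"role": "system", "content": "system prompt"},
        {"role": "assistant", "content": "intermediate step"},
    ]

    messages = _build_final_answer_messages(task, prompt_templates, memory_messages)

    # The helper should return a non-empty message list with the task
    # rendered into at least one prompt.
    assert messages
    assert any(task in str(m) for m in messages)
```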
…into develop_fix_final_answer_stream
# Conflicts:
#   sdk/nexent/core/agents/core_agent.py
✨ Feat: Enhance final answer generation with streaming support
When max steps is reached, stream the final answer.
