✨ Feat: Enhance final answer generation with streaming support #2873
Conversation
- Introduced a new method to build messages for final answer generation, incorporating task prompts and memory messages.
- Updated the max steps handling to use streaming for real-time answer generation, improving user experience.
- Enhanced error handling during final answer generation to provide fallback messages in case of failures.
- Modified the condition for displaying the max steps warning to trigger only when the message is complete and contains maxStepsInfo.
- Improved clarity in code comments to better reflect the updated logic.
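Taken together, the SDK-side changes roughly follow the flow sketched below. This is a non-authoritative sketch assuming the smolagents-style agent API used elsewhere in the SDK; `logger` and the fallback wording are placeholders, not code from the PR:

```python
def _handle_max_steps_reached(self, task: str) -> str:
    # Build the prompt from the task plus the agent's memory (new helper in this PR).
    memory_messages = self.write_memory_to_messages()
    messages = _build_final_answer_messages(task, self.prompt_templates, memory_messages)

    try:
        # Streaming model call: tokens reach the UI through observer callbacks,
        # so the final answer renders in real time instead of appearing all at once.
        chat_message = self.model(messages)
        model_output = chat_message.content
    except Exception as exc:
        # Keep the details in logs; surface only a generic message to the user.
        logger.exception("Final answer generation failed: %s", exc)
        model_output = "Sorry, a final answer could not be generated."

    # The PR then records a final ActionStep (with token usage) in agent memory
    # and returns the generated text as the run's final answer.
    return model_output
```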
Pull request overview
This PR enhances the “max steps reached” behavior by generating the final answer via an LLM streaming call so the UI can display real-time thinking tokens, and refines when the frontend shows the max-steps warning.
Changes:
- Added a helper to build the message list used for “final answer” generation.
- Overrode max-steps handling to call the model in streaming mode and record a final ActionStep.
- Updated the chat UI to only show the “max steps reached” warning after the assistant message is complete.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| sdk/nexent/core/agents/core_agent.py | Adds final-answer message builder and a streaming _handle_max_steps_reached implementation. |
| frontend/app/[locale]/chat/streaming/chatStreamFinalMessage.tsx | Shows max-steps warning only when the assistant message is complete. |
```python
# Fallback to error message if streaming fails
model_output = f"Error in generating final LLM output: {e}"
```
The fallback `model_output = f"Error in generating final LLM output: {e}"` is returned as the final answer, which can expose internal exception details to end users. Prefer returning a user-safe generic message (and keep the full exception detail in the logs).
Suggested change:

```diff
-# Fallback to error message if streaming fails
-model_output = f"Error in generating final LLM output: {e}"
+# Return a user-safe fallback message if final answer generation fails.
+# Keep the detailed exception information in logs for debugging.
+model_output = "I’m sorry, but I couldn’t generate a final answer due to an internal error."
```
```python
final_memory_step.token_usage = TokenUsage(
    input_tokens=total_input_tokens,
    output_tokens=total_output_tokens
)
final_memory_step.action_output = model_output
```
`final_memory_step.action_output = model_output` looks inconsistent with the rest of this file, where LLM text is stored on `model_output` / `model_output_message` and `action_output` is used for executed code/tool outputs. Consider storing the generated final-answer text on the model-output fields so step serialization/monitoring/UI logic doesn’t treat it as an action/tool result.
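A sketch of what that could look like, reusing the names from this diff (whether `model_output_message` exists on this project's `ActionStep` is an assumption):

```python
final_memory_step.token_usage = TokenUsage(
    input_tokens=total_input_tokens,
    output_tokens=total_output_tokens
)
# Store the generated text on the model-output fields, as other steps do,
# and leave action_output for executed code / tool results.
final_memory_step.model_output_message = chat_message
final_memory_step.model_output = model_output
```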
```python
# _handle_max_steps_reached already yields the final step internally
# and sets action_step.error, so don't yield again to avoid duplicate error
final_answer = self._handle_max_steps_reached(task)
```
This comment appears outdated: `_handle_max_steps_reached` no longer yields a final step internally, and it doesn’t set `action_step.error` (it creates/appends a new `ActionStep`). Please update/remove this comment to match the current control flow to avoid misleading future maintainers.
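One possible wording for the updated comment, sketched against the behavior described in this review rather than taken from the diff:

```python
# _handle_max_steps_reached streams the final answer through the model's
# observer callbacks, appends its own final ActionStep to memory, and
# returns the answer text; nothing further needs to be yielded here.
final_answer = self._handle_max_steps_reached(task)
```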
```python
# Track accumulated content and token usage for streaming
accumulated_content = []
total_input_tokens = 0
total_output_tokens = 0
role = None

try:
    # Use streaming call (model.__call__) to generate final answer
    # This will trigger observer.add_model_new_token() and
    # observer.add_model_reasoning_content() in OpenAIModel
    chat_message: ChatMessage = self.model(messages)

    # Update role and content from the completed message
    role = chat_message.role
```
In `_handle_max_steps_reached`, `MessageRole` is imported but never used, and `accumulated_content` / `role` are declared but unused. Please remove these to avoid confusion and keep the method aligned with its actual streaming mechanism (observer callbacks).
Suggested change:

```diff
-# Track accumulated content and token usage for streaming
-accumulated_content = []
-total_input_tokens = 0
-total_output_tokens = 0
-role = None
-try:
-    # Use streaming call (model.__call__) to generate final answer
-    # This will trigger observer.add_model_new_token() and
-    # observer.add_model_reasoning_content() in OpenAIModel
-    chat_message: ChatMessage = self.model(messages)
-    # Update role and content from the completed message
-    role = chat_message.role
+# Track token usage for the final answer generation
+total_input_tokens = 0
+total_output_tokens = 0
+try:
+    # Use model call to generate the final answer.
+    # Streaming updates are emitted through observer callbacks in the model implementation.
+    chat_message: ChatMessage = self.model(messages)
+    # Store the completed message content
```
```python
# Build messages for final answer generation
memory_messages = self.write_memory_to_messages()
messages = _build_final_answer_messages(task, self.prompt_templates, memory_messages)
```
`_build_final_answer_messages(...)` can raise (e.g., missing `final_answer` templates, missing keys inside templates, or Jinja rendering errors via `StrictUndefined`). Right now this happens before the try/except, so reaching max steps could crash the run instead of returning a best-effort final answer. Consider moving message construction into the try block and falling back to a minimal prompt when templates are unavailable.
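A hedged sketch of that suggestion: build the messages inside a try block and fall back to a minimal prompt if the templates are missing or fail to render (the fallback message shape and `logger` are assumptions, not code from the PR):

```python
memory_messages = self.write_memory_to_messages()
try:
    messages = _build_final_answer_messages(task, self.prompt_templates, memory_messages)
except Exception as exc:
    # Missing final_answer templates or Jinja rendering errors should not
    # crash the run once max steps is reached; fall back to a bare prompt.
    logger.exception("Falling back to a minimal final-answer prompt: %s", exc)
    messages = memory_messages + [
        {"role": "user", "content": f"Please provide a final answer to the task: {task}"}
    ]
```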
- Introduced a new test suite for the _build_final_answer_messages function, covering various scenarios including basic message structure, skipping the first memory message, handling empty memory, and template rendering with task variables.
- Enhanced the test setup by mocking necessary modules to ensure isolated testing of the function's behavior.
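A sketch of the kind of test that description implies, assuming `_build_final_answer_messages` accepts (task, prompt_templates, memory_messages) as shown in the diff and returns role/content messages; the import path, template keys, and message shape are assumptions, not taken from the PR:

```python
from nexent.core.agents.core_agent import _build_final_answer_messages


def test_build_final_answer_messages_renders_task_variable():
    task = "Summarize the quarterly report"
    # Hypothetical template layout; the real keys may differ.
    prompt_templates = {
        "final_answer": {
            "pre_messages": "You ran out of steps for task: {{ task }}.",
            "post_messages": "Based on the above, provide a final answer to: {{ task }}.",
        }
    }
    memory_messages = [
        {"role": "system", "content": "system prompt"},
        {"role": "assistant", "content": "intermediate step"},
    ]

    messages = _build_final_answer_messages(task, prompt_templates, memory_messages)

    # The helper should return a non-empty message list with the task
    # rendered into at least one prompt.
    assert messages
    assert any(task in str(m) for m in messages)
```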
…into develop_fix_final_answer_stream
# Conflicts:
#   sdk/nexent/core/agents/core_agent.py
✨ Feat: Enhance final answer generation with streaming support
When max steps is reached, stream the final answer.
