[Tech Debt] Quickstart process_chat_response silently drops output on JSON parse failure #780

@Sxnan

Search before asking

  • I searched in the issues and found nothing similar.

Description

Found during react_agent doc verification (#741) while running an end-to-end
agent with a real LLM.

The quickstart sample process_chat_response action in
python/flink_agents/examples/quickstart/agents/review_analysis_agent.py:107-128
catches every exception, logs it, and emits no OutputEvent:

@action(ChatResponseEvent.EVENT_TYPE)
@staticmethod
def process_chat_response(event: Event, ctx: RunnerContext) -> None:
    chat_response = ChatResponseEvent.from_event(event)
    try:
        json_content = json.loads(chat_response.response.content)
        ctx.send_event(
            OutputEvent(
                output=ProductReviewAnalysisRes(
                    id=ctx.short_term_memory.get("id"),
                    score=json_content["score"],
                    reasons=json_content["reasons"],
                )
            )
        )
    except Exception:
        logging.exception(
            f"Error processing chat response {chat_response.response.content}"
        )
        # To fail the agent, you can raise an exception here.

In end-to-end testing against Tongyi qwen-plus, one of two reviews triggered
a malformed JSON response from the LLM, and the agent produced no output for
that key. The input was lost with no user-visible signal beyond a log line.

This is the recommended sample for users learning the workflow-style agent,
so production users are likely to copy this pattern. "Log and drop" is risky
in streaming pipelines because nothing (no DLQ event, error event, or metric)
lets downstream operators detect the missing record.

The same pattern is also present in:

  • examples/.../agents/ReviewAnalysisAgent.java
  • examples/.../agents/TableReviewAnalysisAgent.java

Suggested fix

At minimum, rewrite the trailing comment into a clear paragraph explaining
the choice and offering production alternatives:

  • emit an OutputEvent carrying an error sentinel value
  • emit a custom error event
  • raise to fail the input
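
As a rough illustration of the first and third options, here is a minimal sketch in plain Python. It does not use the Flink Agents API: ReviewResult is a hypothetical stand-in for ProductReviewAnalysisRes, and the error field, the -1 sentinel score, and the fail_fast flag are assumptions made for this example only.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional


# Hypothetical stand-in for ProductReviewAnalysisRes; the real class in the
# quickstart may have different fields or validation.
@dataclass
class ReviewResult:
    id: str
    score: int
    reasons: List[str] = field(default_factory=list)
    error: Optional[str] = None  # assumed extra field acting as the error sentinel


def parse_review(record_id: str, content: str, *, fail_fast: bool = False) -> ReviewResult:
    """Turn raw LLM text into a result without silently dropping the record."""
    try:
        data = json.loads(content)
        return ReviewResult(id=record_id, score=data["score"], reasons=data["reasons"])
    except (json.JSONDecodeError, KeyError, TypeError) as exc:
        if fail_fast:
            # "Raise to fail the input": surface the failure to the caller.
            raise ValueError(f"unparseable response for id={record_id}") from exc
        # "Error sentinel": still produce exactly one result per input.
        return ReviewResult(
            id=record_id, score=-1, error=f"{type(exc).__name__}: {exc}"
        )


ok = parse_review("r1", '{"score": 4, "reasons": ["good"]}')
bad = parse_review("r2", "not json")
print(ok.error is None, bad.score)
```

In the actual action, the returned result would presumably still be wrapped in an OutputEvent and sent with ctx.send_event, as the current sample does. That wiring is left out here because it depends on the real classes. Catching only the parse-related exceptions also keeps unrelated bugs from being swallowed.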

Better: change the sample to demonstrate one of the safer patterns above,
since copy-paste of except Exception: log() is a well-known anti-pattern
even outside Flink Agents.

How to reproduce

Run the YAML quickstart with a chat model that occasionally returns
malformed JSON (or send an input that confuses the model). The agent
produces fewer outputs than inputs without any user-visible failure.
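
The effect can also be approximated without a real LLM. The sketch below is not the quickstart itself: log_and_drop is a hypothetical helper that mirrors the try/except shape of process_chat_response, and the two response strings are made-up examples of a well-formed and a truncated model reply.

```python
import json
import logging


def log_and_drop(responses):
    """Mimic the sample handler: parse each response, log failures, emit nothing for them."""
    outputs = []
    for content in responses:
        try:
            data = json.loads(content)
            outputs.append({"score": data["score"], "reasons": data["reasons"]})
        except Exception:
            # Same shape as the sample: the only trace of the failure is a log line.
            logging.exception("Error processing chat response %s", content)
    return outputs


responses = [
    '{"score": 5, "reasons": ["fast shipping"]}',
    'Sure! Here is the JSON: {"score": 1',  # malformed reply, as an LLM might produce
]
outputs = log_and_drop(responses)
print(f"inputs={len(responses)} outputs={len(outputs)}")
```

The second input produces a traceback in the log but no output, so a downstream consumer sees one record where two were expected.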

Version and environment

Flink Agents 0.3.0 (main).

Are you willing to submit a PR?

  • I'm willing to submit a PR!

Labels

  • fixVersion/0.3.0: The feature or bug should be implemented/fixed in the 0.3.0 version.
  • priority/blocker: Indicates the PR or issue that should block the release until it gets resolved.
  • tech debt: [Issue Type] User-unaware issues, such as code refactor and infrastructure maintenance.
