Penalize no agent response by tara-servicenow · Pull Request #70 · ServiceNow/eva

tara-servicenow · 2026-04-21T00:44:22Z

Tested on a run with perturbations. Before this change, 9 of 50 records would rerun due to "inactivity timeout"; with this change, all 9 were identified as agent errors and no reruns were needed based on the conversation finished check. The diagnostic metric shows those 9 failed on agent turn response; all others succeeded. I looked at a few of the flagged examples and listened to the audio, and it was as expected: the agent did not respond to a user turn that I could clearly hear.

gabegma · 2026-04-23T16:44:00Z

+### Evaluation Methodology
+
+1. Compute `last_audio_speaker` as whichever side (`"user"` or `"assistant"`) has the latest audio end-timestamp across all turns. Returns `None` if neither side recorded audio.
+2. Flag the record as a missed turn iff `conversation_ended_reason == "inactivity_timeout"` **and** `last_audio_speaker == "user"`.


Suggested change

-2. Flag the record as a missed turn iff `conversation_ended_reason == "inactivity_timeout"` **and** `last_audio_speaker == "user"`.
+2. Flag the record as a missed turn if `conversation_ended_reason == "inactivity_timeout"` **and** `last_audio_speaker == "user"`.
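
The two methodology steps quoted in this thread can be sketched roughly as follows. This is a minimal illustration and not the code from the PR: the `(speaker, end_timestamp)` turn shape, the function names `last_audio_speaker` and `is_missed_turn`, and the tie-breaking behaviour (the earlier turn wins on equal timestamps) are all assumptions made for the example.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical shape: each turn is (speaker, audio_end_timestamp_in_seconds).
Turn = Tuple[str, float]


def last_audio_speaker(turns: Sequence[Turn]) -> Optional[str]:
    """Step 1: return the side with the latest audio end-timestamp, or None."""
    latest: Optional[Turn] = None
    for speaker, end_ts in turns:
        if speaker not in ("user", "assistant"):
            continue  # ignore anything that is not one of the two sides
        if latest is None or end_ts > latest[1]:
            latest = (speaker, end_ts)
    return None if latest is None else latest[0]


def is_missed_turn(conversation_ended_reason: str, turns: Sequence[Turn]) -> bool:
    """Step 2: inactivity timeout while the user was the last side heard."""
    return (
        conversation_ended_reason == "inactivity_timeout"
        and last_audio_speaker(turns) == "user"
    )
```

With this shape, a record that ends on an inactivity timeout after the assistant spoke last is not flagged, which matches the intent of penalizing only the case where the agent left a user turn unanswered.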

gabegma · 2026-04-23T16:56:01Z

+            if ctx.conversation_finished:  # type: ignore[attr-defined]
+                gate_passed.append(record_id)
+                continue
+            if is_agent_timeout_on_user_turn(


I had in mind that we would modify the conversation_valid_end definition to cover either the agent timeout or a properly made end tool call. Any reason for doing it manually here rather than in the metric directly?

gabegma · 2026-04-23T16:56:26Z


        config_data = json.loads(config_path.read_text())
+        # Backwards compat: remap any legacy metric names saved in an older config.json.
+        from eva.metrics.legacy_aliases import rename_metric_keys, rename_metric_list


Could we move the import to the top? I think we are importing 3 times now.
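
For illustration, a module-level layout for the legacy-name remapping might look like the sketch below. Everything here is a stand-in: the alias table, the `*_sketch` helpers, and the `metrics` / `metric_params` config keys are hypothetical, and the real `rename_metric_keys` and `rename_metric_list` in `eva.metrics.legacy_aliases` may have different signatures and behaviour.

```python
import json
from typing import Any, Dict, List

# Hypothetical alias table; the real mapping lives in eva.metrics.legacy_aliases.
LEGACY_ALIASES: Dict[str, str] = {"old_metric_name": "new_metric_name"}


def rename_keys_sketch(values: Dict[str, Any]) -> Dict[str, Any]:
    """Remap legacy metric names used as dictionary keys."""
    return {LEGACY_ALIASES.get(key, key): value for key, value in values.items()}


def rename_list_sketch(names: List[str]) -> List[str]:
    """Remap legacy metric names in a list, preserving order."""
    return [LEGACY_ALIASES.get(name, name) for name in names]


def load_config_sketch(raw: str) -> Dict[str, Any]:
    """Parse a saved config and remap legacy names in one place."""
    config_data = json.loads(raw)
    # "metrics" and "metric_params" are made-up keys for this example.
    if "metrics" in config_data:
        config_data["metrics"] = rename_list_sketch(config_data["metrics"])
    if "metric_params" in config_data:
        config_data["metric_params"] = rename_keys_sketch(config_data["metric_params"])
    return config_data
```

Keeping the helpers at module scope means every loader shares one import site, which is the point of the request above.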

gabegma · 2026-04-23T16:58:07Z

+            if ctx is None:
+                not_finished.append(record_id)
+                continue
+            if ctx.conversation_finished:  # type: ignore[attr-defined]


Could we rename this context variable to conversation_valid_end since I think this is what it now represents?

- Docs: 'iff' -> 'if' in conversation_correctly_finished.md
- Hoist legacy_aliases imports to module tops (eva.metrics.runner, eva.orchestrator.runner)
- ConversationValidEndMetric now scores 1.0 on agent_timeout_on_user_turn as well as goodbye
- Rename _ProcessorContext/MetricContext.conversation_finished -> conversation_valid_end and compute it as (goodbye OR agent_timeout_on_user_turn)
- Simplify ValidationRunner._classify to a single valid_end check; keep agent_timeout set for terminal flagging
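
The combined definition described in this commit could be sketched as below. This is an assumption-laden illustration, not the repository code: `RecordOutcome`, `valid_end_score`, and `classify` are hypothetical names, the 0.0 score for the failing case is assumed, and the bucket labels simply reuse the `gate_passed` / `not_finished` list names visible in the diff context above.

```python
from dataclasses import dataclass


@dataclass
class RecordOutcome:
    """Hypothetical per-record flags; the real context objects carry more fields."""

    goodbye: bool                     # conversation closed through the normal ending
    agent_timeout_on_user_turn: bool  # inactivity timeout with the user speaking last

    @property
    def conversation_valid_end(self) -> bool:
        # Either ending counts as a valid end of the conversation.
        return self.goodbye or self.agent_timeout_on_user_turn


def valid_end_score(outcome: RecordOutcome) -> float:
    """Score 1.0 for a valid end; 0.0 otherwise (the 0.0 is an assumption)."""
    return 1.0 if outcome.conversation_valid_end else 0.0


def classify(outcome: RecordOutcome) -> str:
    """Single valid-end gate: anything else remains a candidate for rerun."""
    return "gate_passed" if outcome.conversation_valid_end else "not_finished"
```

Folding the timeout case into `conversation_valid_end` means the validation gate needs only one check, while a separate agent-timeout flag can still be kept for reporting.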

gabegma

LGTM!! Minor detail but we could update the doc for conversation_valid_end now that it includes agent time-out failures.

tara-servicenow added 7 commits April 20, 2026 13:17

Initial implementation

e47bcd9

Change to diagnostic metric

c5b307a

Merge branch 'main' of github.com:ServiceNow/eva into pr/tara/no_resp…

344a5d2

…onse

Add tests for diagnostic metric and agent timeout

a093f6f

Rename

89b8f6f

Remove docs that shouldn't be pushed

f7f2722

Add metric doc for conversation correctly finished doc

a1563d5

gabegma reviewed Apr 23, 2026

tara-servicenow added 2 commits April 23, 2026 19:32

Merge branch 'main' into pr/tara/no_response

0868a51

gabegma approved these changes Apr 24, 2026

tara-servicenow wants to merge 9 commits into main from pr/tara/no_response

2 participants
