interleave metric and nonblock IO by raghavm243512 · Pull Request #76 · ServiceNow/eva

raghavm243512 · 2026-04-23T22:01:48Z

Changes:

Allow validation and metrics to run on successful conversations even if other conversations need a rerun. The results are saved and reused in the final pass, once all retries are done or every record has succeeded.

Much smaller changes:
Release the semaphore before doing expensive audio writes.

Don't block the main thread with IO. File reads are large enough that this should help somewhat, though probably not a lot.
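
A minimal sketch of the two smaller changes, assuming an `asyncio.Semaphore` bounds the number of concurrent conversations. `run_conversation`, `write_audio` and the file names are hypothetical and only illustrate the pattern; they are not taken from the PR.

```python
import asyncio
import tempfile
from pathlib import Path


def write_audio(path: Path, data: bytes) -> int:
    """Blocking write; stands in for the expensive audio save (hypothetical)."""
    return path.write_bytes(data)


async def run_conversation(sem: asyncio.Semaphore, out_dir: Path, idx: int) -> Path:
    # Hold the semaphore only while the conversation itself is running.
    async with sem:
        await asyncio.sleep(0.01)  # placeholder for the actual conversation
        audio = bytes(1024)
    # The slot is released here, so the next conversation can start
    # while this one is still writing its output.
    path = out_dir / f"audio_{idx}.pcm"
    # Run the blocking write in a worker thread to keep the event loop free.
    await asyncio.to_thread(write_audio, path, audio)
    return path


async def main(out_dir: Path) -> list[Path]:
    sem = asyncio.Semaphore(2)
    return await asyncio.gather(*(run_conversation(sem, out_dir, i) for i in range(4)))


def demo() -> list[int]:
    """Run four conversations and return the size of each written file."""
    with tempfile.TemporaryDirectory() as tmp:
        paths = asyncio.run(main(Path(tmp)))
        return [p.stat().st_size for p in paths]
```

Calling `demo()` runs four simulated conversations with at most two holding the semaphore at once; the writes happen outside the semaphore and off the event loop.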

JosephMarinier · 2026-04-25T17:15:26Z

+                *[_run_and_pipeline(output_id_to_record[oid], oid) for oid in pending_output_ids],
+                return_exceptions=True,


Since we already catch Exception inside _run_and_pipeline(), we don't expect it to raise anything. Can you make that intentional by removing return_exceptions=True?

Also, it's a tiny detail, but since we're here, we don't need to instantiate a list; we can simply use a generator (in parentheses).

Suggested change

-                *[_run_and_pipeline(output_id_to_record[oid], oid) for oid in pending_output_ids],
-                return_exceptions=True,
+                *(_run_and_pipeline(output_id_to_record[oid], oid) for oid in pending_output_ids)
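
To illustrate the point, here is a small sketch under the assumption that each coroutine reports failure through its return value. `run_one` is a hypothetical stand-in for `_run_and_pipeline()`, whose real signature and return type are not shown in this thread.

```python
import asyncio


async def run_one(oid: str) -> tuple[str, bool]:
    """Hypothetical stand-in for _run_and_pipeline(): it never raises."""
    try:
        if oid.endswith("bad"):
            raise RuntimeError("simulated failure")
        await asyncio.sleep(0)
        return oid, True
    except Exception:
        # Failures are reported in the return value, so gather() never sees them.
        return oid, False


async def run_all(pending: list[str]) -> list[tuple[str, bool]]:
    # A generator expression is unpacked directly; no intermediate list is needed,
    # and return_exceptions=True is unnecessary because run_one() cannot raise.
    return await asyncio.gather(*(run_one(oid) for oid in pending))


results = asyncio.run(run_all(["a", "b_bad", "c"]))
```

With this shape, an exception escaping `gather()` would indicate a real bug rather than an expected failure, which is arguably what dropping `return_exceptions=True` makes explicit.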

JosephMarinier · 2026-04-25T17:16:45Z

+                if isinstance(item, Exception):
+                    logger.error(f"Unexpected pipeline coroutine exception: {item}")
+                    continue


Again, that shouldn't happen, so I think we can clean that up. Also, if that ever did happen, the `continue` would skip adding the sample to either finished_ids or not_finished_ids, so it would silently disappear, which seems a bit broken. Does that make sense?
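
The concern can be sketched as follows. Only `finished_ids` and `not_finished_ids` come from the review; `partition` and the boolean result shape are assumptions made for illustration. If a defensive branch were kept at all, pairing each id with its result guarantees that every sample lands in exactly one set.

```python
def partition(pending_ids: list[str], results: list[object]) -> tuple[set[str], set[str]]:
    """Classify every pending id; nothing is skipped (hypothetical helper)."""
    finished_ids: set[str] = set()
    not_finished_ids: set[str] = set()
    for oid, item in zip(pending_ids, results):
        if item is True:
            finished_ids.add(oid)
        else:
            # Covers False as well as any unexpected value such as an exception,
            # so the sample is retried instead of being lost.
            not_finished_ids.add(oid)
    return finished_ids, not_finished_ids


finished, not_finished = partition(["a", "b", "c"], [True, RuntimeError("boom"), False])
```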

JosephMarinier · 2026-04-25T17:17:20Z

+        if not self._audio_buffer and self.user_audio_buffer and self.assistant_audio_buffer:
+            diff_bytes = abs(len(self.user_audio_buffer) - len(self.assistant_audio_buffer))
+            diff_ms = diff_bytes / (2 * self._audio_sample_rate) * 1000
+            if diff_ms > 500:
+                logger.warning(
+                    f"Audio buffer length mismatch: user={len(self.user_audio_buffer)} "
+                    f"assistant={len(self.assistant_audio_buffer)} "
+                    f"diff={diff_ms:.0f}ms — mixed recording may be temporally skewed"
+                )
+            from eva.assistant.audio_bridge import pcm16_mix  # lazy: avoids circular import at module load
+
+            self._audio_buffer = bytearray(pcm16_mix(bytes(self.user_audio_buffer), bytes(self.assistant_audio_buffer)))
+        elif not self._audio_buffer and self.user_audio_buffer:
+            self._audio_buffer = bytearray(self.user_audio_buffer)
+        elif not self._audio_buffer and self.assistant_audio_buffer:
+            self._audio_buffer = bytearray(self.assistant_audio_buffer)


Detail: I think a single if not self._audio_buffer would be clearer than repeating the check three times.

Suggested change

-        if not self._audio_buffer and self.user_audio_buffer and self.assistant_audio_buffer:
-            diff_bytes = abs(len(self.user_audio_buffer) - len(self.assistant_audio_buffer))
-            diff_ms = diff_bytes / (2 * self._audio_sample_rate) * 1000
-            if diff_ms > 500:
-                logger.warning(
-                    f"Audio buffer length mismatch: user={len(self.user_audio_buffer)} "
-                    f"assistant={len(self.assistant_audio_buffer)} "
-                    f"diff={diff_ms:.0f}ms — mixed recording may be temporally skewed"
-                )
-            from eva.assistant.audio_bridge import pcm16_mix  # lazy: avoids circular import at module load
-
-            self._audio_buffer = bytearray(pcm16_mix(bytes(self.user_audio_buffer), bytes(self.assistant_audio_buffer)))
-        elif not self._audio_buffer and self.user_audio_buffer:
-            self._audio_buffer = bytearray(self.user_audio_buffer)
-        elif not self._audio_buffer and self.assistant_audio_buffer:
-            self._audio_buffer = bytearray(self.assistant_audio_buffer)
+        if not self._audio_buffer:
+            if self.user_audio_buffer and self.assistant_audio_buffer:
+                diff_bytes = abs(len(self.user_audio_buffer) - len(self.assistant_audio_buffer))
+                diff_ms = diff_bytes / (2 * self._audio_sample_rate) * 1000
+                if diff_ms > 500:
+                    logger.warning(
+                        f"Audio buffer length mismatch: user={len(self.user_audio_buffer)} "
+                        f"assistant={len(self.assistant_audio_buffer)} "
+                        f"diff={diff_ms:.0f}ms — mixed recording may be temporally skewed"
+                    )
+                from eva.assistant.audio_bridge import pcm16_mix  # lazy: avoids circular import at module load
+
+                self._audio_buffer = bytearray(
+                    pcm16_mix(bytes(self.user_audio_buffer), bytes(self.assistant_audio_buffer))
+                )
+            elif self.user_audio_buffer:
+                self._audio_buffer = bytearray(self.user_audio_buffer)
+            elif self.assistant_audio_buffer:
+                self._audio_buffer = bytearray(self.assistant_audio_buffer)
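
For reference, the `diff_ms` arithmetic in the snippet can be checked in isolation. This sketch assumes mono 16-bit PCM (2 bytes per sample), which is what the `2 * self._audio_sample_rate` divisor implies; the 16 kHz sample rate and the buffer sizes are made-up values, and `buffer_skew_ms` is a hypothetical helper name.

```python
def buffer_skew_ms(user_len: int, assistant_len: int, sample_rate: int) -> float:
    """Length difference between two mono PCM16 buffers, in milliseconds."""
    bytes_per_second = 2 * sample_rate  # 2 bytes per 16-bit sample, one channel
    return abs(user_len - assistant_len) / bytes_per_second * 1000


# 24,000 bytes of difference at 32,000 bytes per second is 0.75 s.
skew = buffer_skew_ms(480_000, 456_000, 16_000)
```

Here `skew` is 750.0 ms, which is above the 500 ms threshold and would trigger the warning.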

JosephMarinier · 2026-04-25T17:25:18Z

+        if isinstance(self.pipeline_config, SpeechToSpeechConfig):
+            self.audit_log.save_transcript_jsonl(transcript_path)
+        elif not transcript_path.exists():


Suggested change

-        if isinstance(self.pipeline_config, SpeechToSpeechConfig):
-            self.audit_log.save_transcript_jsonl(transcript_path)
-        elif not transcript_path.exists():
+        if isinstance(self.pipeline_config, SpeechToSpeechConfig) or not transcript_path.exists():

JosephMarinier · 2026-04-25T17:27:33Z

                logger.error(f"Error saving agent perf stats: {e}", exc_info=True)

-        # Call base class to save audit_log, audio, scenario DBs, latencies
-        await super().save_outputs()


Duplicating most of super().save_outputs() instead of calling it worries me, because the two copies could drift out of sync. Have you considered refactoring so that you can keep calling super().save_outputs()?
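
One possible shape for such a refactor is a hook method that the subclass overrides, so the shared sequence stays in the base class. This is only a sketch: `BaseWorker`, `AssistantWorker` and `_save_audio` are hypothetical names, and the real `save_outputs()` does more than what is shown here.

```python
import asyncio


class BaseWorker:
    def __init__(self) -> None:
        self.saved: list[str] = []

    async def save_outputs(self) -> None:
        # The shared sequence lives in one place; subclasses customise single steps.
        self.saved.append("audit_log")
        await self._save_audio()
        self.saved.append("latencies")

    async def _save_audio(self) -> None:
        self.saved.append("audio")


class AssistantWorker(BaseWorker):
    async def save_outputs(self) -> None:
        self.saved.append("perf_stats")
        # Still delegates to the base class instead of duplicating its body.
        await super().save_outputs()

    async def _save_audio(self) -> None:
        # Only the step that needs different behaviour is overridden.
        self.saved.append("audio (offloaded)")


worker = AssistantWorker()
asyncio.run(worker.save_outputs())
```

After the call, `worker.saved` lists the steps in base-class order with only the audio step replaced, so a new step added to the base class is picked up automatically.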

JosephMarinier · 2026-04-25T17:30:00Z

+                    # Phase 6: Fire metrics immediately if passed
+                    if vr.passed and metrics_runner is not None:
+                        rdir = self.output_dir / "records" / output_id
+                        task = asyncio.create_task(metrics_runner._run_and_save_record(output_id, rdir))


Now that we call _run_and_save_record() from two places outside of running.py, should we remove the "private" underscore prefix?

JosephMarinier · 2026-04-25T17:49:53Z

-        # STEP 7: Run full metrics on successful records
+        # STEP 7: Await background metrics, then run final aggregation pass.
+        # Background tasks already wrote metrics.json for records validated during the loop.
+        # The final run() skips already-computed records and only does summary aggregation.


Since we're now mutating metrics_runner.record_ids, I think we need to be extra careful with the ordering of the steps. Would adding a comment like that make sense?

Suggested change

-        # The final run() skips already-computed records and only does summary aggregation.
+        # The final run() skips already-computed records and only does summary aggregation.
+        # IMPORTANT: Must await all background tasks BEFORE mutating metrics_runner.record_ids
+        # below — running tasks read record_ids to filter which records to process.
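
The ordering constraint can be sketched with a toy runner. `Runner` and `run_and_save_record` here are hypothetical stand-ins that only mimic the behaviour described in the comment (tasks read `record_ids` while running); they are not the project's actual classes.

```python
import asyncio


class Runner:
    """Hypothetical stand-in for the metrics runner."""

    def __init__(self, record_ids: list[str]) -> None:
        self.record_ids = record_ids
        self.processed: list[str] = []

    async def run_and_save_record(self, output_id: str) -> None:
        await asyncio.sleep(0)  # yield control, as real I/O would
        # record_ids is read after the await, so a concurrent mutation would be visible here.
        if output_id in self.record_ids:
            self.processed.append(output_id)


async def main() -> list[str]:
    runner = Runner(["r1", "r2"])
    tasks = [asyncio.create_task(runner.run_and_save_record(oid)) for oid in ("r1", "r2")]
    # Await every background task first ...
    await asyncio.gather(*tasks)
    # ... and only then narrow record_ids for the final aggregation pass.
    runner.record_ids = []
    return runner.processed


processed = asyncio.run(main())
```

Swapping the last two steps in `main()` would leave `processed` empty in this toy model, because the tasks would see the already-mutated `record_ids`.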

interleave metric and nonblock IO

a701647

raghavm243512 marked this pull request as ready for review April 23, 2026 22:01

raghavm243512 added 2 commits April 23, 2026 16:06

validation and semaphore release

2cc37d2

merge main

28b0c0f

raghavm243512 closed this Apr 24, 2026

raghavm243512 reopened this Apr 24, 2026

JosephMarinier reviewed Apr 25, 2026

JosephMarinier approved these changes Apr 25, 2026

raghavm243512 added 2 commits April 25, 2026 11:53

address comments

5e805fd

merge main

0a593bb

raghavm243512 force-pushed the pr/rm/speedup branch from 2399c98 to 0a593bb April 25, 2026 19:29

Apply pre-commit

c68fb12

raghavm243512 closed this Apr 25, 2026

raghavm243512 reopened this Apr 25, 2026

JosephMarinier approved these changes Apr 25, 2026

raghavm243512 added this pull request to the merge queue Apr 25, 2026

Merged via the queue into main with commit d4fabff Apr 25, 2026
1 check passed

raghavm243512 deleted the pr/rm/speedup branch April 25, 2026 20:20

raghavm243512 merged 6 commits into main from pr/rm/speedup
