Claude/perf test fixes response size by maximusunc · Pull Request #45 · TranslatorSRI/TestHarness

maximusunc · 2026-06-01T17:17:55Z

No description provided.

The performance summary outgrew Slack's 3000-char per-section text limit when the runner started emitting per-outcome stats, so the incoming webhook silently rejected the POST and the human-readable summary stopped appearing in Slack. Split each notification across multiple section blocks at line boundaries and log non-2xx webhook responses so the next time this happens it isn't invisible.
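The splitting and logging described above could look roughly like the sketch below. It is not the code from this PR: the function names, the logger name, and the `mrkdwn` text type are illustrative assumptions, and only the 3000-character figure comes from the commit message.

```python
import logging
from typing import Dict, List

logger = logging.getLogger("slack_notify")  # hypothetical logger name

SLACK_SECTION_LIMIT = 3000  # per-section text limit cited in the commit message


def split_at_line_boundaries(text: str, limit: int = SLACK_SECTION_LIMIT) -> List[str]:
    """Split text into chunks of at most `limit` characters, breaking between lines."""
    chunks: List[str] = []
    current = ""
    for line in text.splitlines():
        # A single line longer than the limit cannot fit anywhere, so hard-split it.
        while len(line) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(line[:limit])
            line = line[limit:]
        candidate = line if not current else current + "\n" + line
        if len(candidate) > limit:
            # Adding this line would overflow: close the chunk and start a new one.
            chunks.append(current)
            current = line
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def to_section_blocks(text: str, limit: int = SLACK_SECTION_LIMIT) -> List[Dict[str, object]]:
    """Wrap each chunk in its own Block Kit section block."""
    return [
        {"type": "section", "text": {"type": "mrkdwn", "text": chunk}}
        for chunk in split_at_line_boundaries(text, limit)
    ]


def webhook_accepted(status_code: int, body: str) -> bool:
    """Return True for 2xx responses; log anything else so a rejection is visible."""
    if 200 <= status_code < 300:
        return True
    logger.error("Slack webhook rejected the message: HTTP %s: %s", status_code, body)
    return False
```

The status check is kept separate from the HTTP call so it can be applied to whatever client posts to the webhook.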

get_ars_responses shared a single start_time across every child poll, so once one ARA exhausted MAX_ARA_TIME the next children entered their poll loop with the deadline already past and were recorded as timed out without ever being checked. Set start_time per child so each ARA is polled independently against the full MAX_ARA_TIME window.
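A minimal sketch of the per-child deadline, assuming a simplified polling loop. `poll_children`, `is_done`, the injectable `clock` and `sleep`, and the 300-second value are hypothetical stand-ins, not the actual `get_ars_responses` implementation.

```python
import time
from typing import Callable, Dict, Iterable

MAX_ARA_TIME = 300.0  # seconds; placeholder value, the real setting is not shown in this PR


def poll_children(
    children: Iterable[str],
    is_done: Callable[[str], bool],
    max_time: float = MAX_ARA_TIME,
    interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Poll each child against its own deadline rather than one shared deadline."""
    outcomes: Dict[str, str] = {}
    for child in children:
        # Reset the timer for every child; a single start_time taken before the
        # loop would leave later children with an already expired deadline.
        start_time = clock()
        outcomes[child] = "timed_out"
        while clock() - start_time < max_time:
            if is_done(child):
                outcomes[child] = "done"
                break
            sleep(interval)
    return outcomes
```

With the clock and sleep injected, the behaviour can be checked without waiting: a child that never finishes uses up its own window, and the next child still gets polled.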

A query can come back with a successful status but a payload that isn't a full TRAPI message (e.g., an error body), and the summary had no way to surface that. Track the byte size of each completed query's final response and report per-outcome min/max/avg plus a distinct-size count, with an explicit warning when responses with the same outcome came back at different sizes. For ARS, the /trace poll only carries status metadata, so fetch the merged_version PK on completion and use that response's content length as the recorded size. ARAs already return the final TRAPI directly, so no extra request is needed there. A custom QUERY event listener collects per-outcome sizes and passes them through the results into the result collector.
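The per-outcome aggregation could be sketched as follows. The `(outcome, size)` sample shape, the function names, and the warning wording are assumptions for illustration; the PR's listener and collector code is not reproduced here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def summarize_sizes(samples: Iterable[Tuple[str, int]]) -> Dict[str, Dict[str, float]]:
    """Group response sizes (in bytes) by outcome and compute summary statistics."""
    by_outcome: Dict[str, List[int]] = defaultdict(list)
    for outcome, size in samples:
        by_outcome[outcome].append(size)

    summary: Dict[str, Dict[str, float]] = {}
    for outcome, sizes in by_outcome.items():
        summary[outcome] = {
            "count": len(sizes),
            "min": min(sizes),
            "max": max(sizes),
            "avg": sum(sizes) / len(sizes),
            "distinct_sizes": len(set(sizes)),
        }
    return summary


def size_warnings(summary: Dict[str, Dict[str, float]]) -> List[str]:
    """Flag outcomes whose responses did not all have the same size."""
    return [
        f"WARNING: '{outcome}' responses came back at {stats['distinct_sizes']} different sizes"
        for outcome, stats in summary.items()
        if stats["distinct_sizes"] > 1
    ]
```

A distinct-size count above one for the same outcome is the signal that some "successful" responses may not be full TRAPI messages.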

The summary previously listed every distinct Locust failure row with its raw error message, which could push the message past Slack's 3000-char per-section limit on a noisy run. Print just the total occurrence count and the distinct-row count; the full failure bodies are already included in the uploaded performance_stats JSON.
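As an illustration, the condensed failures line might be produced like this. The rows are plain dicts with an `occurrences` key standing in for Locust's failure rows, and the function name and output wording are hypothetical.

```python
from typing import Dict, Iterable


def failures_line(failure_rows: Iterable[Dict[str, object]]) -> str:
    """Summarize failures as a total occurrence count plus a distinct-row count."""
    rows = list(failure_rows)
    total = sum(int(row["occurrences"]) for row in rows)
    # The full error messages are deliberately left out to keep the line short.
    return f"Failures: {total} occurrences across {len(rows)} distinct rows"
```

The line stays the same length no matter how noisy the run was, which is what keeps the summary under the section limit.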

The text summary in Slack didn't show the time-series shape of a run and (after moving failure details out of the summary) no longer captured distinct error types either. Locust already collects the data for both: env.runner.stats.history holds per-second snapshots, and locust.html.get_html_report renders the full UI page programmatically. For each performance target the runner now also returns the history list and the Locust HTML report. ResultCollector exposes a generator that yields (filename, bytes) pairs: a stacked-subplot PNG built with matplotlib (RPS+failures, p50/p95 response time, user count) and the Locust HTML report itself. main.py uploads each artifact via a new Slacker.upload_binary_file helper, isolating failures so one bad upload doesn't block the rest. The failures line in the Slack text summary now points to the HTML report, which contains the same failures table Locust shows.
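The upload isolation could be sketched as below, assuming the artifacts arrive as `(filename, bytes)` pairs. `upload_all` and the `upload` callable are hypothetical; the callable stands in for a helper such as the Slacker.upload_binary_file named above, whose signature is not shown in this PR.

```python
import logging
from typing import Callable, Iterable, List, Tuple

logger = logging.getLogger("artifact_upload")  # hypothetical logger name


def upload_all(
    artifacts: Iterable[Tuple[str, bytes]],
    upload: Callable[[str, bytes], None],
) -> List[str]:
    """Upload every artifact, logging failures instead of aborting the loop.

    Returns the filenames that could not be uploaded.
    """
    failed: List[str] = []
    for filename, content in artifacts:
        try:
            upload(filename, content)
        except Exception:
            # One bad upload must not block the remaining artifacts.
            logger.exception("Upload of %s failed; continuing with the rest", filename)
            failed.append(filename)
    return failed
```

Note that this only isolates errors raised by the upload call; an exception raised while the generator builds an artifact would still end the iteration.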

claude and others added 9 commits May 24, 2026 20:09

Run black

dca7011

Fix test

9b392eb

Run updated black

6093daa

Bump patch version

9bde55c

maximusunc merged commit 3d6771c into main Jun 1, 2026
2 checks passed

maximusunc deleted the claude/perf-test-fixes-response-size-NXu2O branch June 1, 2026 17:24

maximusunc merged 9 commits into main from claude/perf-test-fixes-response-size-NXu2O
