
Claude/perf test fixes response size #45

Merged
maximusunc merged 9 commits into main from claude/perf-test-fixes-response-size-NXu2O
Jun 1, 2026


@maximusunc
Collaborator

No description provided.

claude and others added 9 commits May 24, 2026 20:09
The performance summary outgrew Slack's 3000-char per-section text
limit when the runner started emitting per-outcome stats, so the
incoming webhook silently rejected the POST and the human-readable
summary stopped appearing in Slack. Split each notification across
multiple section blocks at line boundaries and log non-2xx webhook
responses so the next time this happens it isn't invisible.
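
A minimal sketch of the splitting described above, not the code from this PR: it packs whole lines into chunks of at most 3000 characters and wraps each chunk in a section block. The names `split_into_sections` and `webhook_succeeded` are hypothetical, as is the choice to hard-wrap any single line that exceeds the limit on its own.

```python
import logging
from typing import Dict, List

logger = logging.getLogger(__name__)

SLACK_SECTION_LIMIT = 3000  # per-section text limit cited in the commit message


def split_into_sections(text: str, limit: int = SLACK_SECTION_LIMIT) -> List[Dict]:
    """Split text at line boundaries into section blocks of at most `limit` chars."""
    chunks: List[str] = []
    current = ""
    for line in text.splitlines():
        # Assumption: a single over-long line is hard-wrapped at the limit.
        while len(line) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(line[:limit])
            line = line[limit:]
        candidate = f"{current}\n{line}" if current else line
        if len(candidate) > limit:
            # Adding this line would overflow, so close the current chunk.
            chunks.append(current)
            current = line
        else:
            current = candidate
    if current:
        chunks.append(current)
    return [
        {"type": "section", "text": {"type": "mrkdwn", "text": chunk}}
        for chunk in chunks
    ]


def webhook_succeeded(status_code: int, body: str) -> bool:
    """Log non-2xx webhook responses so a rejected POST is visible."""
    if 200 <= status_code < 300:
        return True
    logger.error("Slack webhook rejected the POST: %s %s", status_code, body)
    return False
```

The blocks returned by `split_into_sections` would go in the `blocks` field of the webhook payload, and the status code and body of the HTTP response would be passed to `webhook_succeeded`.
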

get_ars_responses shared a single start_time across every child poll, so
once one ARA exhausted MAX_ARA_TIME the next children entered their
poll loop with the deadline already past and were recorded as timed
out without ever being checked. Set start_time per child so each ARA
is polled independently against the full MAX_ARA_TIME window.
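
The per-child deadline can be sketched as follows. This is an illustration under assumptions, not the actual get_ars_responses: `poll_children`, `is_done`, the injectable `clock` and `sleep`, and the placeholder value of `MAX_ARA_TIME` are all hypothetical. The point is only that `start_time` is taken inside the loop over children.

```python
import time
from typing import Callable, Dict, Iterable

MAX_ARA_TIME = 5.0  # seconds; placeholder, the real value is not given in the PR


def poll_children(
    children: Iterable[str],
    is_done: Callable[[str], bool],
    max_time: float = MAX_ARA_TIME,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
    interval: float = 0.5,
) -> Dict[str, str]:
    """Poll each child against its own full max_time window."""
    outcomes: Dict[str, str] = {}
    for child in children:
        # Reset the deadline per child. With one shared start_time, a slow
        # child would use up the window for every child polled after it.
        start_time = clock()
        outcomes[child] = "timed_out"
        while clock() - start_time < max_time:
            if is_done(child):
                outcomes[child] = "done"
                break
            sleep(interval)
    return outcomes
```

Injecting `clock` and `sleep` is a design choice made here so the loop can be exercised with a fake clock instead of real waiting.
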

A query can come back with a successful status but a payload that
isn't a full TRAPI message (e.g. an error body), and the summary had no
way to surface that. Track the byte size of each completed query's
final response and report per-outcome min/max/avg plus a distinct-size
count, with an explicit warning when responses with the same outcome
came back at different sizes.

For ARS, the /trace poll only carries status metadata, so fetch the
merged_version PK on completion and use that response's content length
as the recorded size. ARAs already returned the final TRAPI directly,
so no extra request is needed there. A custom QUERY event listener
collects per-outcome sizes and pipes them through results into the
result collector.
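
A rough sketch of the per-outcome size report, assuming the collected sizes arrive as a mapping from outcome name to a list of byte counts. The function name `summarize_sizes`, the input shape, and the output wording are hypothetical; the PR does not show its own format.

```python
from statistics import mean
from typing import Dict, List


def summarize_sizes(sizes_by_outcome: Dict[str, List[int]]) -> List[str]:
    """Build per-outcome min/max/avg lines plus a distinct-size count."""
    lines: List[str] = []
    for outcome, sizes in sorted(sizes_by_outcome.items()):
        if not sizes:
            continue  # nothing completed with this outcome
        distinct = len(set(sizes))
        lines.append(
            f"{outcome}: n={len(sizes)} min={min(sizes)} max={max(sizes)} "
            f"avg={mean(sizes):.0f} distinct_sizes={distinct}"
        )
        if distinct > 1:
            # Same outcome, different payload sizes: flag it explicitly.
            lines.append(
                f"  WARNING: {outcome} responses came back at "
                f"{distinct} different sizes"
            )
    return lines
```

For the ARS path, the sizes fed into such a mapping would be the content length of the merged_version response rather than of the /trace poll.
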

The summary previously listed every distinct Locust failure row with
its raw error message, which could push the message past Slack's
3000-char per-section limit on a noisy run. Print just the total
occurrence count and the distinct-row count; the full failure bodies
are already included in the uploaded performance_stats JSON.
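
As an illustration, the condensed failures line might be computed like this. It assumes each failure row is a mapping with an `occurrences` count, loosely modeled on the error rows Locust reports; `failures_line` and the exact wording are hypothetical.

```python
from typing import Iterable, Mapping


def failures_line(error_rows: Iterable[Mapping[str, object]]) -> str:
    """Report only the total occurrence count and the distinct-row count."""
    rows = list(error_rows)
    total = sum(int(row["occurrences"]) for row in rows)
    return (
        f"Failures: {total} occurrences across {len(rows)} distinct rows "
        "(full details in the performance_stats JSON)"
    )
```

Because the line length no longer depends on the raw error messages, a noisy run cannot push it past the per-section limit.
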

The text summary in Slack didn't show the time-series shape of a run
and (after moving failure details out of the summary) no longer
captured distinct error types either. Locust already collects the data
for both: env.runner.stats.history holds per-second snapshots, and
locust.html.get_html_report renders the full UI page programmatically.

For each performance target the runner now also returns the history
list and the Locust HTML report. ResultCollector exposes a generator
that yields (filename, bytes) pairs: a stacked-subplot PNG built
with matplotlib (RPS+failures, p50/p95 response time, user count) and
the Locust HTML report itself. main.py uploads each artifact via a
new Slacker.upload_binary_file helper, isolating failures so one bad
upload doesn't block the rest.
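
The generator-plus-isolated-upload pattern described above could look roughly like this. It is a sketch under assumptions: `iter_artifacts`, `upload_all`, and the filenames are hypothetical, and the `upload` callable stands in for Slacker.upload_binary_file, whose real signature is not shown in this PR.

```python
import logging
from typing import Callable, Iterable, Iterator, List, Tuple

logger = logging.getLogger(__name__)


def iter_artifacts(target: str, png: bytes, html: str) -> Iterator[Tuple[str, bytes]]:
    """Yield (filename, bytes) pairs for one performance target."""
    yield f"{target}_history.png", png
    yield f"{target}_report.html", html.encode("utf-8")


def upload_all(
    artifacts: Iterable[Tuple[str, bytes]],
    upload: Callable[[str, bytes], None],
) -> List[str]:
    """Upload every artifact and return the filenames that failed."""
    failed: List[str] = []
    for filename, payload in artifacts:
        try:
            upload(filename, payload)
        except Exception:
            # Isolate the failure so the remaining artifacts still go out.
            logger.exception("Upload failed for %s", filename)
            failed.append(filename)
    return failed
```

Catching a broad `Exception` per artifact is deliberate in this sketch: the uploads are best-effort extras, so one failure is logged and the loop moves on.
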

The Slack text summary's failures line now points to the HTML report
for the full Locust-identical failures table.

@maximusunc maximusunc merged commit 3d6771c into main Jun 1, 2026
2 checks passed
@maximusunc maximusunc deleted the claude/perf-test-fixes-response-size-NXu2O branch June 1, 2026 17:24
2 participants