ci: integrate pr-agent-context with coverage XML and refresh flow by shaypal5 · Pull Request #2 · DataHackIL/SynthBanshee

shaypal5 · 2026-04-07T10:13:46Z

Summary

CI workflow: test jobs now upload raw .coverage.py{version} artifacts; a new coverage job combines them and produces coverage.xml; a new pr-agent-context job posts an AI-agent-readable context comment on every PR
Refresh workflow: new pr-agent-context-refresh.yml re-posts the comment (in append mode, hiding the previous one) whenever a check completes, a review is submitted, or a review comment changes
Template: .github/pr-agent-context-template.md with the standard section layout (preamble, opening instructions, copilot comments, review comments, failing checks, patch coverage)
Coverage config: [tool.coverage.*] sections added to pyproject.toml so coverage combine and coverage xml work correctly across matrix runs

How it works

PR opened / updated
  → CI runs (test × 2 Python versions, lint, coverage)
  → coverage job combines .coverage.py3.11 + .coverage.py3.12 → coverage.xml
  → pr-agent-context job posts context comment (patch coverage + review threads + failing checks)

review submitted / check completes / review comment changed
  → pr-agent-context-refresh triggers
  → fetches coverage.xml from the CI producer run (cross-run lookup)
  → posts a new context comment in append mode
  → hides the previous context comment

Pattern reference

Follows the implementation in DataHackIL/tfht_enforce_idx:

patch_coverage_source_mode: coverage_xml_artifact (combined XML, not raw artifacts)
coverage_source_workflows: CI for cross-run lookup in refresh
publish_mode: append with hide_previous_managed_comments_on_append: true (default)
wait_for_reviews_to_settle: true in refresh so resolved threads are not re-reported

Test plan

Open a PR → CI runs → pr-agent-context comment appears with coverage section
Submit a review → refresh workflow triggers → old comment hidden, new comment posted
Complete all checks with no issues → comment is deleted (delete_comment_when_empty default)

🤖 Generated with Claude Code

Follows the pattern from DataHackIL/tfht_enforce_idx: CI workflow (ci.yml): - Test job now runs with --cov avdp --cov-branch --cov-report= and uploads raw .coverage.py{version} data as pr-agent-context-coverage-py{version} artifacts (one per matrix entry) - New coverage job: downloads all raw artifacts, combines with coverage combine, generates coverage.xml, uploads as coverage-xml artifact - New pr-agent-context job: runs after lint/test/coverage on every PR using coverage_xml_artifact mode pointing at coverage-xml; uses template file Refresh workflow (pr-agent-context-refresh.yml): - Triggers on pull_request_review, pull_request_review_comment, check_run, status - Runs in append mode with hide_previous_managed_comments_on_append (default true) so old pr-agent-context comments are hidden when a new one is posted - wait_for_reviews_to_settle: true so resolved threads are not re-reported - enable_cross_run_coverage_lookup: true with coverage_source_workflows: CI so refresh runs can reuse coverage artifacts from the CI producer run Template (.github/pr-agent-context-template.md): - Standard section layout: preamble, opening instructions, copilot comments, review comments, failing checks, patch coverage pyproject.toml: - Add [tool.coverage.run/paths/report/xml] config so coverage combine and coverage xml work correctly across matrix runs and local development Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Copilot

Pull request overview

Integrates pr-agent-context into CI by generating a combined coverage.xml across Python matrix runs and adds a refresh workflow that re-posts/upgrades the PR context comment as reviews and checks change.

Changes:

Upload per-matrix raw coverage data, then combine into a single coverage.xml artifact in a dedicated coverage job.
Add a pr-agent-context CI job plus a new refresh workflow to append updated context comments on review/check activity.
Add coverage configuration to pyproject.toml and introduce a standardized PR-agent context prompt template.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

File	Description
`pyproject.toml`	Adds `[tool.coverage.*]` configuration to support `coverage combine` and `coverage xml`.
`.github/workflows/ci.yml`	Uploads raw coverage artifacts from matrix tests, combines them into `coverage.xml`, and posts the PR agent context comment.
`.github/workflows/pr-agent-context-refresh.yml`	New workflow to refresh/re-post the PR agent context comment when reviews/comments/checks change.
`.github/pr-agent-context-template.md`	Adds the prompt template layout used by `pr-agent-context`.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Replace hardcoded /home/runner/work/SynthBanshee/SynthBanshee/avdp with /home/runner/work/*/*/avdp so coverage path normalization works regardless of repo name, fork, or self-hosted runner workspace layout. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

github-actions · 2026-04-07T11:34:07Z

pr-agent-context report:

🚨 `pr-agent-context` failed while preparing PR context.

PR: #2
Error: CalledProcessError: Command '['git', '-C', '/home/runner/work/SynthBanshee/SynthBanshee/caller-repo', 'diff', '--unified=0', '0efa0f83fad69b448f3037f09c3642a5ea209007...edd0b1407fa77605d0e1b8dcf4ffcf6309b047e1']' returned non-zero exit status 128.
Run: https://github.com/DataHackIL/SynthBanshee/actions/runs/24079184993

The workflow continued gracefully so this failure does not block CI.
Check the job logs for the full traceback.

Run metadata:

Tool ref: v4
Tool version: 4.0.13
Trigger: status updated
Workflow run: 24079184993 attempt 1
Comment timestamp: 2026-04-07T11:34:06.450860+00:00
PR head commit: edd0b1407fa77605d0e1b8dcf4ffcf6309b047e1

Implements §4.2a of docs/audio_generation_v3_design.md. AGG speakers (AGG_M_30-45_001, BEN_M_40-55_003): - Align volume_delta_db to target table: I2→+2, I3→+5, I4→+9, I5→+13 - I5 was +10/+11; now +13 dB above I1 baseline (≥8 dB range required) VIC speakers (VIC_F_25-40_002, SW_F_30-45_001): - Invert pitch trajectory: was 0→+7 st (rises into child-voice range); now -4→-1 st (lower adult female baseline, hard cap at I4–I5) - Fix debug_run_1 defect #2: HilaNeural baseline ~215 Hz rising to 285–287 Hz; new offsets target ≤200 Hz at I1, capped at I4–I5 - Redesign volume: was 0/−1/−3/−2/+4; now 0/+1/+2/+4/+5 (positive escalation so VIC is audible under AGG without being child-loud) - Additional distress at I4–I5 to come from rate instability and timing compression (M6/M7), not further pitch elevation All values are initial calibration defaults from §4.2a — validate by listening test and F0/RMS measurement before scale generation. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…on (#24) * feat(m2a): SSML prosody defaults — VIC F0 cap + AGG loudness escalation Implements §4.2a of docs/audio_generation_v3_design.md. AGG speakers (AGG_M_30-45_001, BEN_M_40-55_003): - Align volume_delta_db to target table: I2→+2, I3→+5, I4→+9, I5→+13 - I5 was +10/+11; now +13 dB above I1 baseline (≥8 dB range required) VIC speakers (VIC_F_25-40_002, SW_F_30-45_001): - Invert pitch trajectory: was 0→+7 st (rises into child-voice range); now -4→-1 st (lower adult female baseline, hard cap at I4–I5) - Fix debug_run_1 defect #2: HilaNeural baseline ~215 Hz rising to 285–287 Hz; new offsets target ≤200 Hz at I1, capped at I4–I5 - Redesign volume: was 0/−1/−3/−2/+4; now 0/+1/+2/+4/+5 (positive escalation so VIC is audible under AGG without being child-loud) - Additional distress at I4–I5 to come from rate instability and timing compression (M6/M7), not further pitch elevation All values are initial calibration defaults from §4.2a — validate by listening test and F0/RMS measurement before scale generation. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat(m2a): measure-prosody CLI command + prosody_metrics module New module synthbanshee/labels/prosody_metrics.py: - measure_clip(wav_path): reads WAV + .jsonl, returns TurnMetrics per event - aggregate_metrics(): groups by (role, intensity), computes median F0 / mean RMS - run_threshold_checks(): evaluates §4.2a thresholds (VIC I1 ≤200 Hz, VIC I4/I5 <250 Hz, AGG I5−I1 ≥8 dB RMS); returns (label, passed, detail) New CLI command: synthbanshee measure-prosody <clip_dir> - Walks clip_dir recursively for .wav files - Prints Rich table of F0 median/std and RMS by role × intensity - Prints §4.2a pass/fail checks with measured values - --output/-o: export per-turn CSV for offline analysis - --roles: filter to specific speaker roles (default AGG,VIC) - Exit code 1 on threshold failure 29 unit tests covering _measure_segment, measure_clip, aggregate_metrics, and run_threshold_checks including edge cases (no data, exact thresholds, librosa unavailable, stereo WAV, missing JSONL). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat(cli): add --max-clips/-n to generate-batch for wet tests Truncates the selected scene config list after typology-proportional selection, so small smoke/wet-test runs don't require editing run configs. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat(scenes): add Tier A she_proves scene configs (SP_SV/IT/NEG/NEU_A_0001-0002) 8 Tier A scene configs covering all four typologies (SV, IT, NEG, NEU), mirroring the Tier B configs without the acoustic_scene augmentation block. Seeds in the 1100–1402 range to avoid collisions with Tier B (100–402). Required by: configs/run_configs/tier_a_500_she_proves.yaml, the M2a wet-test batch run. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(m2a): address Copilot review comments + patch coverage gaps COPILOT-1: Add test_vic_i4/i5_exactly_at_threshold_fails — explicit boundary tests confirming that exactly 250 Hz fails the strict < threshold for I4/I5. COPILOT-2: Update PR description to accurately reflect scope (config changes + measurement tooling + Tier A scene configs + CLI additions). COPILOT-3: Fix VIC I4/I5 threshold comparison: was <=, now strict < per the spec label "< 250 Hz". Add strict= parameter to _f0_check inner function. COPILOT-4: Rename RoleIntensityStats.f0_std_hz → f0_std_hz_mean and update docstring to "mean of per-turn F0 standard deviations" (not a true pooled std, which would require frame-level counts). Update CLI and test references. Coverage gaps: - prosody_metrics: add TestMeasureSegmentCoverage with int16/float32 RMS equivalence, successful librosa path (sys.modules mock), and all-unvoiced path; add TestMeasureClipStereo for explicit stereo→mono downmix coverage; add test_stats_with_none_f0_reports_no_data for the s.f0_median_hz is None branch in run_threshold_checks - cli: add TestMeasureProsodyCLI covering empty dir, missing JSONL, table output, threshold-failure exit code, CSV export, and --roles filter; add test_max_clips_truncates_selected_configs for the --max-clips branch Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(config): route generated clips to corpus repo via SYNTHBANSHEE_DATA_DIR Remove hardcoded output_dir: data/he from all scene/run/example YAML configs so the SYNTHBANSHEE_DATA_DIR env var default in SceneConfig and RunConfig takes effect. Add envvar="SYNTHBANSHEE_DATA_DIR" to --output-dir on both generate and generate-batch CLI commands. Add .env.example documenting all required env vars including: SYNTHBANSHEE_DATA_DIR=/path/to/avdp-synth-corpus/data/he With SYNTHBANSHEE_DATA_DIR set, clips land in the corpus repo rather than inside the SynthBanshee source tree. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(speakers): flatten AGG/BEN pitch trajectory — cap at +1 st (M2a) Azure normalizes WAV output so volume_delta_db has no effect; the only perceptible change across intensity levels was pitch, causing the aggressor to sound like helium at I4/I5 (+3/+4 st). Cap pitch_delta_st at 0 for I2/I3 and +1 for I4/I5 on both AGG_M_30-45_001 and BEN_M_40-55_003. Rate multiplier (1.0→1.25/1.30) now carries all urgency cues until M3 adds per-turn RMS gain. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(measure-prosody): address PR #24 Copilot review (COPILOT-1 through COPILOT-5) COPILOT-1: strengthen test_roles_filter_excludes_other_roles — add AGG I1/I5 clips with sufficient RMS delta so AGG checks pass, assert exit_code == 0 and "VIC" not in output. COPILOT-2 + COPILOT-5: rename CSV header f0_std_hz_mean → f0_std_hz to match the per-turn TurnMetrics field being exported (not an aggregate). COPILOT-3: wrap EventLabel.model_validate_json() per line in measure_clip() with try/except Exception; emit warnings.warn for malformed lines so the tool remains usable on partial datasets. Also restructure loop to skip empty lines explicitly (if not line: continue). COPILOT-4: add include_roles: set[str] | None = None param to run_threshold_checks(); gate VIC checks and AGG escalation check on role membership. Pass include_roles=include_roles from the CLI so --roles AGG does not emit failing VIC no-data checks. Coverage: add test_malformed_jsonl_line_skipped, test_empty_jsonl_lines_ skipped, test_threshold_failure_sets_exit_code_1, and three TestRunThresholdChecks tests for include_roles parameter. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * ci(pr-agent-context): pin to v4.0.18, enable include_outdated_review_threads Pin workflow ref and tool_ref to v4.0.18 (from v4 floating tag) and set include_outdated_review_threads: true, introduced in that release. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(generate-batch): shuffle selected configs before --max-clips truncation Without shuffling, selected[:max_clips] always picks the first N configs in typology-insertion order (e.g. all IT scenes for a Tier A run), making wet-test runs unrepresentative. Shuffle with random.Random(run_cfg.random_seed) before truncating so the sample is typology-balanced and deterministic. Update test assertion to match the new log line format. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * docs: clarify has_violence is a derived field, not a prohibited label The previous blanket "no binary Violence/Non-Violence labels" wording was being misread as a reason to drop has_violence from metadata and manifests. Clarify the actual policy across CLAUDE.md, AGENTS.md, README.md, docs/design_approaches.md, and docs/audio_violence_datasdet_project_docs.md: - has_violence is a DERIVED convenience field computed from the hierarchical taxonomy (violence_typology, tier1_category, max_intensity). It belongs in weak_label JSON and manifest.csv; AI teams need it for baseline models and stratified sampling. - What is prohibited is REPLACING the taxonomy with a binary flag as the sole or primary label. - The taxonomy columns remain the ground truth. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

…lence rule, measured-peak note) (#103) * docs(spec): #101 — corpus-handoff schema drift Three doc drifts surfaced from a corpus pre-handoff review of DataHackIL/avdp-synth-corpus PR #2 (the 8-clip M2a wet test): 1. The §2 directory diagram and §2.4 / §5.1 / §5.2 / §10 examples used uppercase paths, contradicting §2.5 ("all filenames are lowercase") and the on-disk reality produced by `cli.py:365–367` (`scene_id.lower()`, `speaker_id.lower()`). Consumer teams who try to reconstruct paths from `speakers[].speaker_id` (canonical uppercase) hit a directory that does not exist. 2. The §5.1 example JSON lacked the M11 `generation_metadata` block and `speakers[].voice_family`. Both are part of the current ClipMetadata schema; their absence in the spec left consumer teams guessing at the contract. 3. `weak_label.has_violence` was never pinned in the spec. The downstream corpus README invented a rule (`typology in {SV,IT,NEG} and max_intensity >= 3`) that disagrees with the actual code (`any(e.tier1_category != "NONE" for e in events)`) on every NEG row. Pinned the code rule with a source-file reference so external docs can mirror it. Changes: - §2.1: lowercase-path footnote pointing to §2.3 (no inline narrative) - §2.3: paragraph distinguishing uppercase `speaker_id` (canonical identifier) from the lowercase directory form, with the accurate `speakers[0]` rule and the cli.py source pointer - §2.4: lowercase clip-id examples + brief parenthetical - §5.1: example JSON now has lowercase paths, `-2.0` target, `generation_metadata`, `voice_family`; three short field notes; new sub-section pinning the `has_violence` rule - §5.2 / §10: lowercase `clip_id` to match §2.5 What this does NOT claim (corrections from the closed PR #100): - `generation_metadata` presence is by commit, not by `generator_version: 0.1.0` (the version was bumped before the field was wired — corpus on-disk clips disprove the prior claim) - The directory uses `scene.speakers[0]`, not necessarily AGG; for NEU/NEG scenes the order is whatever the YAML lists first Test plan: - `pytest tests/unit/` — 1688 passed - `ruff check synthbanshee/ tests/ docs/` — clean - Visual review: all uppercase `SP_IT_B_0023` / `_EVT_007` clip-id forms reconciled across §2.4, §5.1, §5.2, §10 Refs #101. Pairs with #102 (cli normalized_dbfs fix); merge order does not matter — they are independent. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * docs(spec): v2 — corpus-handoff schema drift Revises 4374460 after a self-review found that v1 still had the same documentation anti-patterns the review of closed PR #100 flagged: rule stated in two places, prose where a table belongs, a misleading `voice_family` example, internal-history milestone language, a technically wrong claim about clip_id in YAML, undocumented event_id casing, and missing `has_violence` boundary cases. Plus a critical correction: v1 (and #102 v1) described `normalized_dbfs` as the configured *target*, but `labels/schema.py:175` already documents the field as the **measured** peak — kept deliberately distinct from `generation_metadata.loudness_target_peak_dbfs` which is the target. Changes vs v1: - §2.1 footnote: shortened to a pointer ("see §2.5 for casing rules") - §2.3 paragraph: collapsed to one line pointing at §2.5's table - §2.4 parenthetical: rewritten — `clip_id` is *derived* from `scene_id` (lowercased + `_NN` segment); the on-disk form does NOT appear verbatim in any YAML - §2.5: new "Identifier casing (per surface)" table — single source of truth for which surfaces use UPPERCASE values vs lowercase paths - §5.1 example: `normalized_dbfs: -2.0` → `-2.013` (a realistic measured value, since the field is measured not target); `voice_family` example given DISTINCT values (`"Avri"` overrides `tts_voice_id`) for AGG, omitted for VIC to demonstrate the optional fallback - §5.1 field notes: four dense bullets replacing the previous bullets-plus-`####`-sub-section; M11 language removed (the spec describes the contract, not its git history); `has_violence` derivation rule moved into a bullet that ALSO documents boundary cases (empty events, typology/flag disagreement) - §5.2: one sentence explaining `event_id` format (literal `EVT` token + lowercase `clip_id` prefix — not a casing inconsistency) Test plan: - `pytest tests/unit/` — 1688 passed - `ruff check synthbanshee/ tests/ docs/` — clean Refs #101 (2/2). Pairs with #102 (`fix(cli):` measured-peak wiring) — either order is safe; both converge on the schema docstring's `normalized_dbfs = measured` contract. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * docs(spec): #101 — fix Tier-B scene_config path in §5.1 example The example clip is `sp_it_b_0023_00` — the `b` denotes Tier B — but the `scene_config` field showed `configs/scenes/she_proves/`. Tier B She-Proves scene configs actually live under `configs/scenes/she_proves_tier_b/`. Caught by Copilot review on ed65750; consumers reconstructing paths from this example would not find the file. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

This comment has been minimized.

Sign in to view

shaypal5 requested a review from Copilot April 7, 2026 11:21

shaypal5 self-assigned this Apr 7, 2026

shaypal5 added CI enhancement New feature or request labels Apr 7, 2026

Copilot started reviewing on behalf of shaypal5 April 7, 2026 11:22 View session

Copilot AI reviewed Apr 7, 2026

View reviewed changes

Comment thread pyproject.toml Outdated

Comment thread .github/workflows/pr-agent-context-refresh.yml

Comment thread .github/workflows/pr-agent-context-refresh.yml

Comment thread .github/workflows/ci.yml

Comment thread .github/workflows/pr-agent-context-refresh.yml

This comment has been minimized.

Sign in to view

shaypal5 merged commit 0e69ee8 into main Apr 7, 2026
6 checks passed

shaypal5 deleted the feat/pr-agent-context branch April 7, 2026 11:33

shaypal5 mentioned this pull request Apr 15, 2026

feat(m2a): SSML prosody defaults — VIC F0 cap + AGG loudness escalation #24

Merged

6 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ci: integrate pr-agent-context with coverage XML and refresh flow#2

ci: integrate pr-agent-context with coverage XML and refresh flow#2
shaypal5 merged 2 commits into
mainfrom
feat/pr-agent-context

shaypal5 commented Apr 7, 2026

Uh oh!

This comment has been minimized.

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

This comment has been minimized.

This comment has been minimized.

Uh oh!

github-actions Bot commented Apr 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

shaypal5 commented Apr 7, 2026

Summary

How it works

Pattern reference

Test plan

Uh oh!

This comment has been minimized.

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

This comment has been minimized.

This comment has been minimized.

Uh oh!

github-actions Bot commented Apr 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants