ci: integrate pr-agent-context with coverage XML and refresh flow#2
Merged
Conversation
Follows the pattern from DataHackIL/tfht_enforce_idx:
CI workflow (ci.yml):
- Test job now runs with --cov avdp --cov-branch --cov-report= and uploads
raw .coverage.py{version} data as pr-agent-context-coverage-py{version}
artifacts (one per matrix entry)
- New coverage job: downloads all raw artifacts, combines with coverage combine,
generates coverage.xml, uploads as coverage-xml artifact
- New pr-agent-context job: runs after lint/test/coverage on every PR using
coverage_xml_artifact mode pointing at coverage-xml; uses template file
Refresh workflow (pr-agent-context-refresh.yml):
- Triggers on pull_request_review, pull_request_review_comment, check_run, status
- Runs in append mode with hide_previous_managed_comments_on_append (default true)
so old pr-agent-context comments are hidden when a new one is posted
- wait_for_reviews_to_settle: true so resolved threads are not re-reported
- enable_cross_run_coverage_lookup: true with coverage_source_workflows: CI
so refresh runs can reuse coverage artifacts from the CI producer run
Template (.github/pr-agent-context-template.md):
- Standard section layout: preamble, opening instructions, copilot comments,
review comments, failing checks, patch coverage
pyproject.toml:
- Add [tool.coverage.run/paths/report/xml] config so coverage combine and
coverage xml work correctly across matrix runs and local development
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Pull request overview
Integrates pr-agent-context into CI by generating a combined coverage.xml across Python matrix runs and adds a refresh workflow that re-posts/upgrades the PR context comment as reviews and checks change.
Changes:
- Upload per-matrix raw coverage data, then combine into a single
coverage.xmlartifact in a dedicatedcoveragejob. - Add a
pr-agent-contextCI job plus a new refresh workflow to append updated context comments on review/check activity. - Add
coverageconfiguration topyproject.tomland introduce a standardized PR-agent context prompt template.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
pyproject.toml |
Adds [tool.coverage.*] configuration to support coverage combine and coverage xml. |
.github/workflows/ci.yml |
Uploads raw coverage artifacts from matrix tests, combines them into coverage.xml, and posts the PR agent context comment. |
.github/workflows/pr-agent-context-refresh.yml |
New workflow to refresh/re-post the PR agent context comment when reviews/comments/checks change. |
.github/pr-agent-context-template.md |
Adds the prompt template layout used by pr-agent-context. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
This comment has been minimized.
This comment has been minimized.
Replace hardcoded /home/runner/work/SynthBanshee/SynthBanshee/avdp with /home/runner/work/*/*/avdp so coverage path normalization works regardless of repo name, fork, or self-hosted runner workspace layout. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This comment has been minimized.
This comment has been minimized.
|
pr-agent-context report: 🚨 `pr-agent-context` failed while preparing PR context.
PR: #2
Error: CalledProcessError: Command '['git', '-C', '/home/runner/work/SynthBanshee/SynthBanshee/caller-repo', 'diff', '--unified=0', '0efa0f83fad69b448f3037f09c3642a5ea209007...edd0b1407fa77605d0e1b8dcf4ffcf6309b047e1']' returned non-zero exit status 128.
Run: https://github.com/DataHackIL/SynthBanshee/actions/runs/24079184993
The workflow continued gracefully so this failure does not block CI.
Check the job logs for the full traceback.Run metadata: |
shaypal5
added a commit
that referenced
this pull request
Apr 15, 2026
Implements §4.2a of docs/audio_generation_v3_design.md. AGG speakers (AGG_M_30-45_001, BEN_M_40-55_003): - Align volume_delta_db to target table: I2→+2, I3→+5, I4→+9, I5→+13 - I5 was +10/+11; now +13 dB above I1 baseline (≥8 dB range required) VIC speakers (VIC_F_25-40_002, SW_F_30-45_001): - Invert pitch trajectory: was 0→+7 st (rises into child-voice range); now -4→-1 st (lower adult female baseline, hard cap at I4–I5) - Fix debug_run_1 defect #2: HilaNeural baseline ~215 Hz rising to 285–287 Hz; new offsets target ≤200 Hz at I1, capped at I4–I5 - Redesign volume: was 0/−1/−3/−2/+4; now 0/+1/+2/+4/+5 (positive escalation so VIC is audible under AGG without being child-loud) - Additional distress at I4–I5 to come from rate instability and timing compression (M6/M7), not further pitch elevation All values are initial calibration defaults from §4.2a — validate by listening test and F0/RMS measurement before scale generation. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
6 tasks
shaypal5
added a commit
that referenced
this pull request
Apr 16, 2026
…on (#24) * feat(m2a): SSML prosody defaults — VIC F0 cap + AGG loudness escalation Implements §4.2a of docs/audio_generation_v3_design.md. AGG speakers (AGG_M_30-45_001, BEN_M_40-55_003): - Align volume_delta_db to target table: I2→+2, I3→+5, I4→+9, I5→+13 - I5 was +10/+11; now +13 dB above I1 baseline (≥8 dB range required) VIC speakers (VIC_F_25-40_002, SW_F_30-45_001): - Invert pitch trajectory: was 0→+7 st (rises into child-voice range); now -4→-1 st (lower adult female baseline, hard cap at I4–I5) - Fix debug_run_1 defect #2: HilaNeural baseline ~215 Hz rising to 285–287 Hz; new offsets target ≤200 Hz at I1, capped at I4–I5 - Redesign volume: was 0/−1/−3/−2/+4; now 0/+1/+2/+4/+5 (positive escalation so VIC is audible under AGG without being child-loud) - Additional distress at I4–I5 to come from rate instability and timing compression (M6/M7), not further pitch elevation All values are initial calibration defaults from §4.2a — validate by listening test and F0/RMS measurement before scale generation. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat(m2a): measure-prosody CLI command + prosody_metrics module New module synthbanshee/labels/prosody_metrics.py: - measure_clip(wav_path): reads WAV + .jsonl, returns TurnMetrics per event - aggregate_metrics(): groups by (role, intensity), computes median F0 / mean RMS - run_threshold_checks(): evaluates §4.2a thresholds (VIC I1 ≤200 Hz, VIC I4/I5 <250 Hz, AGG I5−I1 ≥8 dB RMS); returns (label, passed, detail) New CLI command: synthbanshee measure-prosody <clip_dir> - Walks clip_dir recursively for .wav files - Prints Rich table of F0 median/std and RMS by role × intensity - Prints §4.2a pass/fail checks with measured values - --output/-o: export per-turn CSV for offline analysis - --roles: filter to specific speaker roles (default AGG,VIC) - Exit code 1 on threshold failure 29 unit tests covering _measure_segment, measure_clip, aggregate_metrics, and run_threshold_checks including edge cases (no data, exact thresholds, librosa unavailable, stereo WAV, missing JSONL). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat(cli): add --max-clips/-n to generate-batch for wet tests Truncates the selected scene config list after typology-proportional selection, so small smoke/wet-test runs don't require editing run configs. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat(scenes): add Tier A she_proves scene configs (SP_SV/IT/NEG/NEU_A_0001-0002) 8 Tier A scene configs covering all four typologies (SV, IT, NEG, NEU), mirroring the Tier B configs without the acoustic_scene augmentation block. Seeds in the 1100–1402 range to avoid collisions with Tier B (100–402). Required by: configs/run_configs/tier_a_500_she_proves.yaml, the M2a wet-test batch run. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(m2a): address Copilot review comments + patch coverage gaps COPILOT-1: Add test_vic_i4/i5_exactly_at_threshold_fails — explicit boundary tests confirming that exactly 250 Hz fails the strict < threshold for I4/I5. COPILOT-2: Update PR description to accurately reflect scope (config changes + measurement tooling + Tier A scene configs + CLI additions). COPILOT-3: Fix VIC I4/I5 threshold comparison: was <=, now strict < per the spec label "< 250 Hz". Add strict= parameter to _f0_check inner function. COPILOT-4: Rename RoleIntensityStats.f0_std_hz → f0_std_hz_mean and update docstring to "mean of per-turn F0 standard deviations" (not a true pooled std, which would require frame-level counts). Update CLI and test references. Coverage gaps: - prosody_metrics: add TestMeasureSegmentCoverage with int16/float32 RMS equivalence, successful librosa path (sys.modules mock), and all-unvoiced path; add TestMeasureClipStereo for explicit stereo→mono downmix coverage; add test_stats_with_none_f0_reports_no_data for the s.f0_median_hz is None branch in run_threshold_checks - cli: add TestMeasureProsodyCLI covering empty dir, missing JSONL, table output, threshold-failure exit code, CSV export, and --roles filter; add test_max_clips_truncates_selected_configs for the --max-clips branch Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(config): route generated clips to corpus repo via SYNTHBANSHEE_DATA_DIR Remove hardcoded output_dir: data/he from all scene/run/example YAML configs so the SYNTHBANSHEE_DATA_DIR env var default in SceneConfig and RunConfig takes effect. Add envvar="SYNTHBANSHEE_DATA_DIR" to --output-dir on both generate and generate-batch CLI commands. Add .env.example documenting all required env vars including: SYNTHBANSHEE_DATA_DIR=/path/to/avdp-synth-corpus/data/he With SYNTHBANSHEE_DATA_DIR set, clips land in the corpus repo rather than inside the SynthBanshee source tree. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(speakers): flatten AGG/BEN pitch trajectory — cap at +1 st (M2a) Azure normalizes WAV output so volume_delta_db has no effect; the only perceptible change across intensity levels was pitch, causing the aggressor to sound like helium at I4/I5 (+3/+4 st). Cap pitch_delta_st at 0 for I2/I3 and +1 for I4/I5 on both AGG_M_30-45_001 and BEN_M_40-55_003. Rate multiplier (1.0→1.25/1.30) now carries all urgency cues until M3 adds per-turn RMS gain. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(measure-prosody): address PR #24 Copilot review (COPILOT-1 through COPILOT-5) COPILOT-1: strengthen test_roles_filter_excludes_other_roles — add AGG I1/I5 clips with sufficient RMS delta so AGG checks pass, assert exit_code == 0 and "VIC" not in output. COPILOT-2 + COPILOT-5: rename CSV header f0_std_hz_mean → f0_std_hz to match the per-turn TurnMetrics field being exported (not an aggregate). COPILOT-3: wrap EventLabel.model_validate_json() per line in measure_clip() with try/except Exception; emit warnings.warn for malformed lines so the tool remains usable on partial datasets. Also restructure loop to skip empty lines explicitly (if not line: continue). COPILOT-4: add include_roles: set[str] | None = None param to run_threshold_checks(); gate VIC checks and AGG escalation check on role membership. Pass include_roles=include_roles from the CLI so --roles AGG does not emit failing VIC no-data checks. Coverage: add test_malformed_jsonl_line_skipped, test_empty_jsonl_lines_ skipped, test_threshold_failure_sets_exit_code_1, and three TestRunThresholdChecks tests for include_roles parameter. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * ci(pr-agent-context): pin to v4.0.18, enable include_outdated_review_threads Pin workflow ref and tool_ref to v4.0.18 (from v4 floating tag) and set include_outdated_review_threads: true, introduced in that release. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(generate-batch): shuffle selected configs before --max-clips truncation Without shuffling, selected[:max_clips] always picks the first N configs in typology-insertion order (e.g. all IT scenes for a Tier A run), making wet-test runs unrepresentative. Shuffle with random.Random(run_cfg.random_seed) before truncating so the sample is typology-balanced and deterministic. Update test assertion to match the new log line format. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * docs: clarify has_violence is a derived field, not a prohibited label The previous blanket "no binary Violence/Non-Violence labels" wording was being misread as a reason to drop has_violence from metadata and manifests. Clarify the actual policy across CLAUDE.md, AGENTS.md, README.md, docs/design_approaches.md, and docs/audio_violence_datasdet_project_docs.md: - has_violence is a DERIVED convenience field computed from the hierarchical taxonomy (violence_typology, tier1_category, max_intensity). It belongs in weak_label JSON and manifest.csv; AI teams need it for baseline models and stratified sampling. - What is prohibited is REPLACING the taxonomy with a binary flag as the sole or primary label. - The taxonomy columns remain the ground truth. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
This was referenced May 7, 2026
Open
shaypal5
added a commit
that referenced
this pull request
May 11, 2026
…lence rule, measured-peak note) (#103) * docs(spec): #101 — corpus-handoff schema drift Three doc drifts surfaced from a corpus pre-handoff review of DataHackIL/avdp-synth-corpus PR #2 (the 8-clip M2a wet test): 1. The §2 directory diagram and §2.4 / §5.1 / §5.2 / §10 examples used uppercase paths, contradicting §2.5 ("all filenames are lowercase") and the on-disk reality produced by `cli.py:365–367` (`scene_id.lower()`, `speaker_id.lower()`). Consumer teams who try to reconstruct paths from `speakers[].speaker_id` (canonical uppercase) hit a directory that does not exist. 2. The §5.1 example JSON lacked the M11 `generation_metadata` block and `speakers[].voice_family`. Both are part of the current ClipMetadata schema; their absence in the spec left consumer teams guessing at the contract. 3. `weak_label.has_violence` was never pinned in the spec. The downstream corpus README invented a rule (`typology in {SV,IT,NEG} and max_intensity >= 3`) that disagrees with the actual code (`any(e.tier1_category != "NONE" for e in events)`) on every NEG row. Pinned the code rule with a source-file reference so external docs can mirror it. Changes: - §2.1: lowercase-path footnote pointing to §2.3 (no inline narrative) - §2.3: paragraph distinguishing uppercase `speaker_id` (canonical identifier) from the lowercase directory form, with the accurate `speakers[0]` rule and the cli.py source pointer - §2.4: lowercase clip-id examples + brief parenthetical - §5.1: example JSON now has lowercase paths, `-2.0` target, `generation_metadata`, `voice_family`; three short field notes; new sub-section pinning the `has_violence` rule - §5.2 / §10: lowercase `clip_id` to match §2.5 What this does NOT claim (corrections from the closed PR #100): - `generation_metadata` presence is by commit, not by `generator_version: 0.1.0` (the version was bumped before the field was wired — corpus on-disk clips disprove the prior claim) - The directory uses `scene.speakers[0]`, not necessarily AGG; for NEU/NEG scenes the order is whatever the YAML lists first Test plan: - `pytest tests/unit/` — 1688 passed - `ruff check synthbanshee/ tests/ docs/` — clean - Visual review: all uppercase `SP_IT_B_0023` / `_EVT_007` clip-id forms reconciled across §2.4, §5.1, §5.2, §10 Refs #101. Pairs with #102 (cli normalized_dbfs fix); merge order does not matter — they are independent. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * docs(spec): v2 — corpus-handoff schema drift Revises 4374460 after a self-review found that v1 still had the same documentation anti-patterns the review of closed PR #100 flagged: rule stated in two places, prose where a table belongs, a misleading `voice_family` example, internal-history milestone language, a technically wrong claim about clip_id in YAML, undocumented event_id casing, and missing `has_violence` boundary cases. Plus a critical correction: v1 (and #102 v1) described `normalized_dbfs` as the configured *target*, but `labels/schema.py:175` already documents the field as the **measured** peak — kept deliberately distinct from `generation_metadata.loudness_target_peak_dbfs` which is the target. Changes vs v1: - §2.1 footnote: shortened to a pointer ("see §2.5 for casing rules") - §2.3 paragraph: collapsed to one line pointing at §2.5's table - §2.4 parenthetical: rewritten — `clip_id` is *derived* from `scene_id` (lowercased + `_NN` segment); the on-disk form does NOT appear verbatim in any YAML - §2.5: new "Identifier casing (per surface)" table — single source of truth for which surfaces use UPPERCASE values vs lowercase paths - §5.1 example: `normalized_dbfs: -2.0` → `-2.013` (a realistic measured value, since the field is measured not target); `voice_family` example given DISTINCT values (`"Avri"` overrides `tts_voice_id`) for AGG, omitted for VIC to demonstrate the optional fallback - §5.1 field notes: four dense bullets replacing the previous bullets-plus-`####`-sub-section; M11 language removed (the spec describes the contract, not its git history); `has_violence` derivation rule moved into a bullet that ALSO documents boundary cases (empty events, typology/flag disagreement) - §5.2: one sentence explaining `event_id` format (literal `EVT` token + lowercase `clip_id` prefix — not a casing inconsistency) Test plan: - `pytest tests/unit/` — 1688 passed - `ruff check synthbanshee/ tests/ docs/` — clean Refs #101 (2/2). Pairs with #102 (`fix(cli):` measured-peak wiring) — either order is safe; both converge on the schema docstring's `normalized_dbfs = measured` contract. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * docs(spec): #101 — fix Tier-B scene_config path in §5.1 example The example clip is `sp_it_b_0023_00` — the `b` denotes Tier B — but the `scene_config` field showed `configs/scenes/she_proves/`. Tier B She-Proves scene configs actually live under `configs/scenes/she_proves_tier_b/`. Caught by Copilot review on ed65750; consumers reconstructing paths from this example would not find the file. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
.coverage.py{version}artifacts; a newcoveragejob combines them and producescoverage.xml; a newpr-agent-contextjob posts an AI-agent-readable context comment on every PRpr-agent-context-refresh.ymlre-posts the comment (in append mode, hiding the previous one) whenever a check completes, a review is submitted, or a review comment changes.github/pr-agent-context-template.mdwith the standard section layout (preamble, opening instructions, copilot comments, review comments, failing checks, patch coverage)[tool.coverage.*]sections added topyproject.tomlsocoverage combineandcoverage xmlwork correctly across matrix runsHow it works
Pattern reference
Follows the implementation in DataHackIL/tfht_enforce_idx:
patch_coverage_source_mode: coverage_xml_artifact(combined XML, not raw artifacts)coverage_source_workflows: CIfor cross-run lookup in refreshpublish_mode: appendwithhide_previous_managed_comments_on_append: true(default)wait_for_reviews_to_settle: truein refresh so resolved threads are not re-reportedTest plan
pr-agent-contextcomment appears with coverage section🤖 Generated with Claude Code