feat(security-issue-sync): pre-flight v2 — skill-marker detection#416
Merged
potiuk merged 1 commit intoMay 31, 2026
Merged
Conversation
…detection + relaxed rules A dry-run of apache#414's pre-flight against a real adopter tracker revealed the original rules misfired in two ways: - The "last comment author is a bot" check was structurally unreachable on single-operator private trackers where the sync skill writes rollup updates as the operator's personal GitHub user, not as a *[bot] account. - The 7-day updatedAt safety override caught most trackers because every tracker had been touched by the recent sync itself (rollup-comment writes, label flips) — conflating skill activity with substantive activity. Skip rate measured ~5% in this setup vs the predicted 30-50%. This tunes the classifier with two changes: 1. Skill-or-bot detection. Treat a comment as bot-equivalent when its body starts with the skill marker `<!-- apache-steward: ` (matches every status-rollup, release-manager hand-off, and wrap-up comment the framework writes). Falls back to the original `*[bot]` login check, plus an override-file hook for adopters with personal-account bots. Requires fetching body on the last comment — bumps query response size moderately (still cheaper than one subagent transcript), and the body field is what enables the skill-marker detection that drives most of the real-world skip rate. 2. Relaxed lifecycle skip rules. The original "idle > 14d" gates were a safety net for the broken bot-detection. With skill-or-bot detection working, the "all phases done; awaiting release" / "fix released; awaiting advisory" patterns are skip-eligible regardless of comment age — the skill marker itself is the "nothing new since last sync" signal. Re-running the dry-run on the same setup: skip rate ~5% → ~30%, and the skipped trackers were all correctly steady-state ones. Adds a new "fix released; awaiting advisory propagation" skip rule for the `cve allocated + fix released` label set — the single largest contributor to the new skip count.
This was referenced May 31, 2026
potiuk
added a commit
that referenced
this pull request
May 31, 2026
…ifier rule (#423) PR #418 shipped the preflight-audit CLI with a replay mode but no fixture exercising the full classifier end-to-end. This adds: - `tests/fixtures/synthetic_workspace_sweep.json` — 12-issue GraphQL response, one issue per rule path (Rule 1 dispatch, Rule 1-yields-then-Rule-7, Rule 2 dispatch-urgent, Rules 3-7 skip-noop, GitHub-App bot login, personal-bot needing override, fall-through dispatch, recently-closed dispatch). Each issue node carries a `_purpose` annotation documenting which rule it should land on. - `tests/test_eval_replay.py` — drives `classify_response` against the fixture with a pinned `now` (2026-06-01T12:00:00Z) and asserts: 1. The full per-decision bucket distribution (positional identifiers per bucket). 2. The same distribution under `extra_bot_logins` — one issue migrates from dispatch to skip-noop with the override. 3. Per-issue assertions with reason-substring matches, keeping the fixture's `_purpose` annotations in lock-step with the classifier behaviour. 4. A skip-rate floor (≥30%) matching the real-world target after #416's rule tuning. A rule change that alters the distribution fails one of the asserts; the diff in the failing assertion tells the reviewer how the rule affects coverage before they ever look at real adopter data. The eval is deterministic (no live `gh` calls, fixed `now`) so CI runs it in milliseconds. This closes the tune-then-verify loop one more rung up — PR #416 used a one-off `/tmp/` script, PR #418 promoted it to a CLI, and this PR locks the rule behaviour into the test suite.
potiuk
added a commit
to potiuk/airflow-steward
that referenced
this pull request
Jun 1, 2026
…ity-suite refactor patterns Adds `optimize-skill` (capability:setup) — the refactoring sibling of `write-skill`. It takes an existing framework skill (or sweeps a set) and applies the five restructuring patterns proven on the security suite, as behavior-preserving proposals gated by the validator (green-before / green-after): - split — slim an oversized SKILL.md into linked siblings (the apache#410 pattern; addresses the PRINCIPLES.md P14 cap) - config-lift — move concrete values into <project-config> (apache#386/apache#387/apache#388) - out-of-context — read/PATCH one field without loading the body (apache#412 github-body-field, apache#424 github-rollup) - fetch-upfront — batch per-item round-trips (apache#347) - preflight-classifier — skip obvious no-ops before LLM passes (apache#414/apache#416) SKILL.md is 297 lines; the pass catalogue (smell / exemplar PR / mechanics / behavior-preservation guarantee / validation) lives in the patterns.md sibling. Reads only framework-internal files, so no injection-guard / Privacy-LLM callouts. Ships a step-diagnose eval (5 auto-comparable cases incl. an injection-resistance case) so the skill is not released without an eval (P8). Wires the skill into the capability->skill map and the eval index. Generated-by: Claude Code (Opus 4.8)
potiuk
added a commit
to potiuk/airflow-steward
that referenced
this pull request
Jun 1, 2026
…ity-suite refactor patterns Adds `optimize-skill` (capability:setup) — the refactoring sibling of `write-skill`. It takes an existing framework skill (or sweeps a set) and applies the five restructuring patterns proven on the security suite, as behavior-preserving proposals gated by the validator (green-before / green-after): - split — slim an oversized SKILL.md into linked siblings (the apache#410 pattern; addresses the PRINCIPLES.md P14 cap) - config-lift — move concrete values into <project-config> (apache#386/apache#387/apache#388) - out-of-context — read/PATCH one field without loading the body (apache#412 github-body-field, apache#424 github-rollup) - fetch-upfront — batch per-item round-trips (apache#347) - preflight-classifier — skip obvious no-ops before LLM passes (apache#414/apache#416) SKILL.md is 297 lines; the pass catalogue (smell / exemplar PR / mechanics / behavior-preservation guarantee / validation) lives in the patterns.md sibling. Reads only framework-internal files, so no injection-guard / Privacy-LLM callouts. Ships a step-diagnose eval (5 auto-comparable cases incl. an injection-resistance case) so the skill is not released without an eval (P8). Wires the skill into the capability->skill map and the eval index. Generated-by: Claude Code (Opus 4.8)
potiuk
added a commit
that referenced
this pull request
Jun 1, 2026
…ity-suite refactor patterns (#427) Adds `optimize-skill` (capability:setup) — the refactoring sibling of `write-skill`. It takes an existing framework skill (or sweeps a set) and applies the five restructuring patterns proven on the security suite, as behavior-preserving proposals gated by the validator (green-before / green-after): - split — slim an oversized SKILL.md into linked siblings (the #410 pattern; addresses the PRINCIPLES.md P14 cap) - config-lift — move concrete values into <project-config> (#386/#387/#388) - out-of-context — read/PATCH one field without loading the body (#412 github-body-field, #424 github-rollup) - fetch-upfront — batch per-item round-trips (#347) - preflight-classifier — skip obvious no-ops before LLM passes (#414/#416) SKILL.md is 297 lines; the pass catalogue (smell / exemplar PR / mechanics / behavior-preservation guarantee / validation) lives in the patterns.md sibling. Reads only framework-internal files, so no injection-guard / Privacy-LLM callouts. Ships a step-diagnose eval (5 auto-comparable cases incl. an injection-resistance case) so the skill is not released without an eval (P8). Wires the skill into the capability->skill map and the eval index. Generated-by: Claude Code (Opus 4.8)
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
A dry-run of #414's pre-flight against a real adopter tracker surfaced two issues that kept the skip rate at ~5% instead of the predicted 30–50%:
*[bot]account — so the gate never fired.updatedAtsafety override caught most trackers because the sync itself bumpsupdatedAtevery time it writes a rollup append or flips a label. The override conflated skill activity with substantive activity.Changes
<!-- apache-steward:(matches every status-rollup / RM hand-off / wrap-up comment). Falls back to the original*[bot]login check plus an override-file hook for adopters with personal-account bots.cve allocated + fix released + skill-last→skip-noop(fix released; awaiting advisory propagation). Largest contributor to the new skip count in dry-run.bodyon the last comment (was justauthor { login }+createdAt). Response size grows moderately depending on rollup length, still cheaper than one subagent transcript.Measurement
Same set, same trackers, before/after on the rule change: skip rate ~5% → ~30%, and the new skips were all correctly identified steady-state trackers. The savings are recurring (every
sync allinvocation), so the per-sweep win compounds.Test plan
lycheeon the edited file — cleanskill-and-tool-validate— no new violationsprek(markdownlint, typos, format) — green🤖 Generated with Claude Code