
feat: run robustness estimator on empirical test data and persist metric #21

Merged
daedalus merged 1 commit into master from copilot/run-estimator-empirical-data
May 6, 2026

Conversation

Copilot AI (Contributor) commented May 6, 2026

Replaces the placeholder example numbers in the robustness evaluator docs with real values measured from the current test suite, and persists the full metric snapshot in impactguard.toml.

Empirical inputs

| Input | Value |
| --- | --- |
| Tests run | 1,054 (425 adversarial / 629 normal) |
| Adversarial passing | 424 / 425 |
| Normal passing | 629 / 629 |
| Coverage | 57% |
| α | 0.65 (security context) |

Results

| Metric | Value | Label |
| --- | --- | --- |
| Robustness Score R | 0.5691 | FAIR |
| Robustness + Diversity R_d | 0.5691 | |
| Fragility Index F | 0.0024 | ROBUST |
| Adversarial ratio | 40.3% | ✓ ≥ 25% |
| Diversity D | 1.000 | all categories covered |

Per-category (taxonomy): boundary 28/28, semantic 22/22, evasion 24/24, compositional 19/19.
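
As a sanity check, the headline figures are mutually consistent under one natural reading of the metric: the α-weighted adversarial/normal pass rates scaled by coverage reproduce R, and the adversarial failure share reproduces F. The sketch below is an inference from the published numbers, not the evaluator's confirmed implementation.

```python
# Inferred formulas only -- the evaluator's actual implementation is not
# shown in this PR; these expressions merely reproduce the published
# numbers to the printed precision.
n_total, n_adversarial, n_normal = 1054, 425, 629
passing_adv, passing_norm = 424, 629
coverage, alpha = 0.57, 0.65  # security context

p_adv = passing_adv / n_adversarial   # 0.9976
p_norm = passing_norm / n_normal      # 1.0000

r = coverage * (alpha * p_adv + (1 - alpha) * p_norm)  # 0.5691 -> FAIR
fragility = 1 - p_adv                                  # 0.0024 -> ROBUST
adv_ratio = n_adversarial / n_total                    # 0.403  -> meets the 25% gate
diversity = 4 / 4                                      # all taxonomy categories hit

print(f"R = {r:.4f}, F = {fragility:.4f}, ratio = {adv_ratio:.1%}, D = {diversity:.3f}")
```

With D = 1.0, R_d collapses to R, which is why both rows show 0.5691.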

Changes

  • impactguard.toml — new [impactguard.robustness] section persisting all metric values and the per-category breakdown; acts as the canonical measured baseline (a read-back sketch follows this list)
  • README.md — CLI example, Python API snippet, and "Example output" block replaced with the empirical numbers above
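
Since the snapshot lives in impactguard.toml, downstream tooling can read the baseline back with the standard library alone. A minimal read-back sketch, assuming the key names listed in the reviewer's guide further down (the merged file's exact keys may differ):

```python
# Read-back sketch -- key names are taken from the reviewer's guide below;
# the merged impactguard.toml may differ in detail.
import tomllib  # stdlib TOML parser, Python 3.11+

with open("impactguard.toml", "rb") as f:
    config = tomllib.load(f)

baseline = config["impactguard"]["robustness"]
print(baseline["robustness_score"], baseline["robustness_label"])  # 0.5691 FAIR
print(baseline["fragility_index"], baseline["fragility_label"])    # 0.0024 ROBUST
for name, stats in baseline["categories"].items():
    print(name, stats)  # per-category totals and passing counts
```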

Summary by Sourcery

Persist empirically measured robustness metrics from the current test suite and surface them in documentation as the canonical example outputs.

New Features:

  • Add an [impactguard.robustness] section to impactguard.toml that stores the latest robustness evaluation snapshot and per-category adversarial breakdown.

Enhancements:

  • Update README examples for the robustness evaluator CLI and Python API to use real metrics derived from the current test suite instead of placeholder values.


sourcery-ai Bot commented May 6, 2026

Reviewer's Guide

Update robustness evaluation documentation and configuration to use empirically measured metrics from the current test suite, and persist the full robustness metric snapshot (including per-category breakdown) in impactguard.toml as a canonical baseline.

File-Level Changes

README.md — Replace the README robustness evaluator examples with empirical metrics from the current test suite.
  • Update the Python API usage example to pass the empirical totals, adversarial/normal counts, coverage, alpha, and per-category CategoryStats matching the current taxonomy tests (a hypothetical sketch of this call follows below).
  • Update the CLI example invocation arguments (n-total, n-adversarial, passing-adv, passing-norm, coverage, categories JSON) to match the empirical test data.
  • Update the CLI JSON-output example command to use the same empirical inputs as the primary CLI example.
  • Refresh the sample human-readable CLI output block to show the computed metrics and per-category breakdown from the empirical run, including the updated robustness, fragility, and diversity values.
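
For orientation, here is a hypothetical shape of that API call using the empirical inputs above. CategoryStats is named in this PR, but evaluate_robustness, the import path, and the keyword names are illustrative guesses, not the project's confirmed signature:

```python
# Hypothetical sketch -- evaluate_robustness, the import path, and the
# keyword names are guesses; only CategoryStats and the input values
# appear in this PR.
from impactguard.robustness import CategoryStats, evaluate_robustness

categories = {
    "boundary": CategoryStats(total=28, passing=28),
    "semantic": CategoryStats(total=22, passing=22),
    "evasion": CategoryStats(total=24, passing=24),
    "compositional": CategoryStats(total=19, passing=19),
}

result = evaluate_robustness(
    n_total=1054,
    n_adversarial=425,
    passing_adv=424,
    passing_norm=629,
    coverage=0.57,
    alpha=0.65,  # security context
    categories=categories,
)
print(result.robustness_score)  # 0.5691 for the empirical run
```
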
impactguard.toml — Persist the latest empirically measured robustness metrics and test composition as a baseline.
  • Introduce a new [impactguard.robustness] section documenting the last measured robustness evaluation from empirical test runs.
  • Record the test-composition inputs (n_total, n_adversarial, n_normal, passing_adv, passing_norm, coverage, alpha) used for the robustness calculation.
  • Store the derived primary metrics (robustness_score, robustness_score_with_diversity, robustness_label) and adversarial-specific metrics (p_adversarial, p_normal, adversarial_ratio, fragility_index, fragility_label, diversity_score) with inline explanatory comments and thresholds.
  • Add a nested [impactguard.robustness.categories] table capturing per-category adversarial totals and passing counts aligned with test_adversarial_taxonomy.py (a sketch of the resulting section follows below).
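
Combining the guide's field names with the measured values, the persisted section plausibly resembles the sketch below; the comments, ordering, and exact thresholds in the merged file may differ.

```toml
# Sketch only -- key names come from the reviewer's guide above and values
# from the empirical run; the merged file's layout may differ.
[impactguard.robustness]
n_total = 1054
n_adversarial = 425
n_normal = 629
passing_adv = 424
passing_norm = 629
coverage = 0.57
alpha = 0.65                           # security context

robustness_score = 0.5691              # FAIR
robustness_score_with_diversity = 0.5691
robustness_label = "FAIR"
p_adversarial = 0.9976
p_normal = 1.0
adversarial_ratio = 0.403              # gate: >= 0.25
fragility_index = 0.0024
fragility_label = "ROBUST"
diversity_score = 1.0                  # all taxonomy categories covered

[impactguard.robustness.categories]
boundary = { total = 28, passing = 28 }
semantic = { total = 22, passing = 22 }
evasion = { total = 24, passing = 24 }
compositional = { total = 19, passing = 19 }
```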


@daedalus daedalus marked this pull request as ready for review May 6, 2026 20:30
@daedalus daedalus merged commit e9bd719 into master May 6, 2026
1 check was pending
@codacy-production

Up to standards ✅

🟢 Issues: 0

Results: 0 new issues

View in Codacy



sourcery-ai Bot left a comment


Hey - I've left some high-level feedback:

  • The empirical robustness metrics are now duplicated between README examples and impactguard.toml; consider centralizing these values (e.g., generating README snippets from the TOML or a single snapshot file) to avoid future drift when the baseline is updated.
  • Storing a specific empirical run’s metrics directly in impactguard.toml mixes configuration with measurement output; you might want to move the snapshot to a dedicated metrics/baseline file and keep the TOML focused on user-adjustable settings.

@daedalus daedalus deleted the copilot/run-estimator-empirical-data branch May 7, 2026 02:52
