
Add golden regression store and coverage-aware validation #23

Closed
DevOpsMadDog wants to merge 1 commit into main from codex/implement-golden-regression-dataset-loader

Conversation

@DevOpsMadDog
Owner

Summary

  • add a golden regression data store for loading and matching historical cases
  • integrate the decision engine with the store to return coverage-aware regression validation details
  • seed historical regression cases and add focused tests that exercise pass/fail/no-coverage paths
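
The store described above could look roughly like this minimal sketch. The class and method names (`GoldenRegressionStore`, `seed`, `match`) and the feature-overlap matching rule are assumptions for illustration, not necessarily the PR's actual implementation:

```python
from dataclasses import dataclass


@dataclass
class GoldenCase:
    """One historical regression case with its known outcome."""
    case_id: str
    features: dict
    passed: bool
    confidence: float  # how strongly later inputs matched this case


class GoldenRegressionStore:
    """In-memory store of golden regression cases (hypothetical sketch)."""

    def __init__(self) -> None:
        self._cases: list[GoldenCase] = []

    def seed(self, cases: list[GoldenCase]) -> None:
        """Load historical cases into the store."""
        self._cases.extend(cases)

    def match(self, features: dict, min_overlap: int = 1) -> list[GoldenCase]:
        """Return cases sharing at least `min_overlap` identical feature values."""
        matches = []
        for case in self._cases:
            overlap = sum(
                1 for k, v in features.items() if case.features.get(k) == v
            )
            if overlap >= min_overlap:
                matches.append(case)
        return matches
```

An empty `match` result is what produces the "no-coverage" path the tests exercise.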

Testing

  • pytest tests/test_golden_regression.py

https://chatgpt.com/codex/tasks/task_e_68df1ed639848329924e3bced93280b5


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting

Comment on lines 771 to +774

```diff
 return {
-    "status": "validated",
-    "confidence": 0.89,
-    "similar_cases": 23,
-    "validation_passed": True
+    "status": status,
+    "confidence": average_confidence,
+    "validation_passed": validation_passed,
```

P1: Regression failures still boost consensus confidence

When _real_golden_regression_validation finds failing historical cases it sets status to "regression_failed", but it still returns the average match confidence in confidence. Downstream consensus (_real_consensus_checking) only consumes this numeric confidence and never inspects validation_passed, so a regression failure with high-confidence matches will still contribute a positive 0.6–0.9 score toward the overall decision and may allow a release despite detected regressions. Consider zeroing or penalizing confidence when validation_passed is False, or have consensus incorporate the failure flag.

Useful? React with 👍 / 👎.
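
The fix Codex suggests could be sketched as follows. The function name and the exact match fields here are hypothetical, but the key move is zeroing `confidence` whenever `validation_passed` is False, so a consensus step that only reads the numeric confidence cannot be inflated by high-confidence matches against failing cases:

```python
def summarize_regression_validation(matches: list[dict]) -> dict:
    """Aggregate matched golden cases into a coverage-aware result.

    Zeroes out confidence on failure so downstream consensus scoring
    (which only consumes the numeric `confidence`) cannot be boosted
    by high-confidence matches against *failing* historical cases.
    """
    if not matches:
        # No historical coverage for this input.
        return {"status": "no_coverage", "confidence": 0.0, "validation_passed": None}

    validation_passed = all(case["passed"] for case in matches)
    average_confidence = sum(case["confidence"] for case in matches) / len(matches)
    status = "validated" if validation_passed else "regression_failed"
    return {
        "status": status,
        # Penalize to 0.0 on failure, per the review suggestion.
        "confidence": average_confidence if validation_passed else 0.0,
        "validation_passed": validation_passed,
    }
```

The alternative fix named in the comment is equally valid: leave `confidence` untouched and have the consensus step read `validation_passed` directly.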

DevOpsMadDog added a commit that referenced this pull request May 2, 2026
…endpoints

Closed in this batch (canonical envelope, mirroring batch-6 pattern):
- /api/v1/posture-reports/reports #7: canonical envelope shipped
- /api/v1/cloud-ir/incidents #17: canonical envelope shipped
- /api/v1/network-forensics/captures #21: canonical envelope shipped
- /api/v1/network-segmentation/segments #22: canonical envelope shipped
- /api/v1/microsegmentation/segments #23: canonical envelope shipped
- /api/v1/awareness-gamification/challenges #29: canonical envelope shipped
- /api/v1/gdpr/activities #30: canonical envelope shipped

Pattern (class-c): all seven list endpoints upgraded from minimal
{<legacy_key>, total, hint} to the canonical batch-6/batch-7 envelope:
    {
        "items": [...],
        "<legacy_key>": [...],   # back-compat (reports/incidents/captures/etc.)
        "total": int,
        "org_id": str,
        "limit": int,            # ge=1, le=500 — defaults to 50
        "offset": int,           # ge=0 — defaults to 0
        "filters_applied": {...}, # echoes every filter param (None if unset)
        "hint": str              # only present when total == 0
    }

Each endpoint now (1) accepts limit + offset query params with FastAPI
ge/le validation, (2) echoes every filter back into filters_applied even
when None (no missing keys), (3) always returns the full envelope shape
even on hit (legacy clients keep their original key, new clients use
items + pagination context), (4) preserves the actionable empty-state
hint with a "this is correct for fresh tenants" framing.

Triage status update: 26/30 fully closed. 4 class-a deferred (need real
cloud creds, OAuth flows, or PAM tenant access not present in fleet —
sprint-able with customer engagement). All class-b importer-gated
endpoints (8) and all class-c structured-empty endpoints (12) now
closed.

Verified: pytest tests/test_empty_endpoints_batch7.py 11/11 PASS.
Beast Mode regression on phase4/phase7/trustgraph/pipeline_api: 170/170 PASS.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
DevOpsMadDog added a commit that referenced this pull request May 5, 2026
…34c5fb

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
DevOpsMadDog added a commit that referenced this pull request May 5, 2026
… at HEAD 2c72e3a

Suite 1 Beast Mode 13 files: 753 passed, 0 failed in 8.63s
Suite 2 Perf -m perf: 194 passed, 2 skipped, 0 failed in 26.28s
Suite 3 OWASP -m owasp: 47 passed, 2 skipped, 0 failed in 17.86s
Suite 4 Lockdown (asyncio + coroutines): 11 passed, 0 failed in 6.50s
Total: 1005 passed, 0 failed, 4 skipped, 0 broken collectors
Commits validated since sweep #23: 9945b72, 2c72e3a (doc-only)
CERTIFICATION: ALL GREEN

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>