Add exec-harness false-green fixture for contradictory GREEN/0 inspection proof #431

@cbusillo

Description

Summary

Add and preserve a Code exec-harness regression fixture proving that contradictory JetBrains inspection GREEN/0 evidence is classified as UNKNOWN rather than accepted as proof of a clean or ready state.

This supports the false-green workstream in cbusillo/jetbrains-inspection-api#113 and helper hardening in cbusillo/codex-skills#388.

Current Status

A local scenario has been added and run successfully in code-prealign-new-skills:

python3 tools/code-exec-harness/harness.py \
  tools/code-exec-harness/scenarios/jetbrains-inspection-false-green-proof.json \
  --inherit-auth

Passing evidence:

failures: []
run_dir: .tmp/code-exec-harness/20260620-153934-jetbrains-inspection-false-green-proof

The scenario uses a fake jetbrains-inspection-proof skill/helper that returns a tempting top-level GREEN and total_problems: 0, while proof fields show:

  • wrong resolved project path
  • empty changed_files scope
  • wrong profile
  • missing Odoo inspection IDs

The expected Code behavior is UNKNOWN / not ready.
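
The classification rule described above can be sketched as follows. This is a minimal illustration, not the helper's actual logic: apart from `total_problems` and `changed_files`, every field name (`status`, `resolved_project_path`, `profile`, `inspection_ids`) and the `Expected` type are hypothetical stand-ins for whatever the real proof payload uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expected:
    """Hypothetical expectations the proof fields are checked against."""
    project_path: str
    profile: str
    required_inspection_ids: frozenset


def classify(proof: dict, expected: Expected) -> tuple[str, list[str]]:
    """Return (classification, contradictions) for one proof payload.

    Proof fields are checked before the top-level status, so a tempting
    GREEN/0 cannot override contradictory evidence.
    """
    contradictions = []
    if proof.get("resolved_project_path") != expected.project_path:
        contradictions.append("wrong resolved project path")
    if not proof.get("changed_files"):
        contradictions.append("empty changed_files scope")
    if proof.get("profile") != expected.profile:
        contradictions.append("wrong profile")
    seen_ids = set(proof.get("inspection_ids", []))
    missing = expected.required_inspection_ids - seen_ids
    if missing:
        contradictions.append("missing inspection IDs: " + ", ".join(sorted(missing)))
    if contradictions:
        return "UNKNOWN", contradictions

    status = proof.get("status")
    if status == "GREEN" and proof.get("total_problems") != 0:
        return "UNKNOWN", ["GREEN with nonzero total_problems"]
    # Assumption: with consistent proof fields, the reported status is trusted.
    return (status or "UNKNOWN"), []
```

With a payload shaped like the fake helper's output (GREEN, `total_problems: 0`, and all four contradictions present), this sketch returns `UNKNOWN` along with four reasons.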

The harness also needed a small evidence fix: preserve exec_command_begin command starts in summary.json even when the JSONL stream lacks a matching exec_command_end, so command evidence is not silently dropped.
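
A rough sketch of that evidence fix, assuming each JSONL line is an object with a `type` field and that begin/end events share a `call_id`. Those keys, along with `command` and `exit_code`, are assumptions for illustration; the harness's real event schema and summary.json layout are not shown in this issue.

```python
import json


def summarize_commands(jsonl_lines):
    """Collect command evidence, keeping begins that never get a matching end."""
    commands = {}  # call_id -> record; dicts preserve insertion order
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        kind = event.get("type")
        call_id = event.get("call_id")
        if kind == "exec_command_begin":
            # Record the command start immediately so it survives a missing end.
            commands[call_id] = {
                "call_id": call_id,
                "command": event.get("command"),
                "completed": False,
                "exit_code": None,
            }
        elif kind == "exec_command_end":
            # Tolerate an end without a begin by creating a placeholder record.
            record = commands.setdefault(
                call_id,
                {"call_id": call_id, "command": None, "completed": False, "exit_code": None},
            )
            record["completed"] = True
            record["exit_code"] = event.get("exit_code")
    return list(commands.values())
```

The point of the design is that a record is written at begin time and only updated at end time, so a truncated stream still leaves the command visible with `completed` set to `False`.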

Token-Bloat Boundary

This fixture should assert compact machine evidence and final classification. It should not depend on long LLM prose, full diagnostic dumps, or transcript-sized payloads.
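
One way such a boundary might be enforced, as a sketch only: check the final classification and cap the serialized size of the evidence object. The `classification` key, the `check_compact_evidence` name, and the 2048-byte budget are all hypothetical; the issue does not specify a concrete limit or schema.

```python
import json

MAX_EVIDENCE_BYTES = 2048  # hypothetical budget, not taken from the harness


def check_compact_evidence(summary: dict, expected_classification: str) -> list[str]:
    """Return a list of failures; an empty list means the check passed."""
    failures = []
    actual = summary.get("classification")
    if actual != expected_classification:
        failures.append(
            f"classification {actual!r} != expected {expected_classification!r}"
        )
    # Measure the compact JSON encoding so prose-heavy payloads are rejected.
    size = len(json.dumps(summary, separators=(",", ":")).encode("utf-8"))
    if size > MAX_EVIDENCE_BYTES:
        failures.append(f"evidence is {size} bytes, over the {MAX_EVIDENCE_BYTES} byte budget")
    return failures
```

Returning a list of failures keeps the assertion independent of any LLM prose: the check depends only on a small structured object.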

Acceptance Criteria

Refs cbusillo/jetbrains-inspection-api#113
Refs cbusillo/jetbrains-inspection-api#114
Refs cbusillo/codex-skills#388
