Add constraint engine and ledger to osint_core#29
Conversation
Introduces osint_core.constraints, a composable invariant evaluator that sits on top of policy.evaluate_modules. It surfaces authorization, forbidden capability, unknown-module, audit-payload, and correction-verb constraints as structured ConstraintResults, derives a recommended correction verb via the closed allowlist, and emits a redacted ledger entry suitable for persistence under runs/constraints/. Pins the central invariant of this branch with tests proving authorized-only modules remain blocked unless the caller asserts BOTH authorized_target=True and passive_only=False. Ledger writer routes through enforce_audit_payload so raw indicator fields can never reach disk. No behaviour change to the running Space; constraints are additive and not yet wired into app.py's run_enrichment. https://claude.ai/code/session_01EWC4Cw5H4ARyBSK62iotcB
|
Warning Rate limit exceeded
To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughA new constraint evaluation engine is added to Changes
Sequence DiagramsequenceDiagram
participant Client
participant Evaluator as Constraint<br/>Evaluator
participant Policy as Policy<br/>Module
participant Ledger as Ledger<br/>Writer
Client->>Evaluator: evaluate_constraints(context, payload)
activate Evaluator
Evaluator->>Policy: evaluate_modules(request)
activate Policy
Policy-->>Evaluator: PolicyEvaluation<br/>(allowed, blocked modules)
deactivate Policy
Evaluator->>Evaluator: Check authorization constraints<br/>Check capability constraints<br/>Check unknown modules<br/>Check audit-payload rules
Evaluator->>Policy: enforce_correction_verb(verb)<br/>enforce_audit_payload(payload)
activate Policy
Policy-->>Evaluator: validation result
deactivate Policy
Evaluator-->>Client: ConstraintReport<br/>(results, policy_eval, correction_verb)
deactivate Evaluator
Client->>Evaluator: build_ledger_entry(context, payload,<br/>report, policy_eval)
activate Evaluator
Evaluator->>Policy: enforce_audit_payload(payload)
activate Policy
Policy-->>Evaluator: validated payload
deactivate Policy
Evaluator-->>Client: audit-safe ledger entry
deactivate Evaluator
Client->>Ledger: write_constraint_ledger(run_id, entry)
activate Ledger
Ledger->>Ledger: Create constraints/ directory<br/>Persist to JSON
Ledger-->>Client: success
deactivate Ledger
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Review rate limit: 0/1 reviews remaining, refill in 51 minutes and 54 seconds.Comment |
There was a problem hiding this comment.
Code Review
This pull request introduces a constraint engine to the osint_core package, providing a framework for evaluating request invariants and generating audit-safe ledger entries. It includes built-in evaluators for authorization, forbidden capabilities, and payload safety, along with a comprehensive test suite. Review feedback highlights opportunities to clean up unused code, such as the PASSIVITY enum and the _POLICY_TO_CONSTRAINT_CODE mapping. Additionally, it is recommended to externalize timestamp generation to maintain the engine's documented purity and to reduce argument redundancy in the ledger entry functions.
|
|
||
| class ConstraintCode(str, Enum): | ||
| AUTHORIZATION = "authorization" | ||
| PASSIVITY = "passivity" |
There was a problem hiding this comment.
ConstraintCode.PASSIVITY is defined and mapped to a correction verb (line 140), but it is not utilized by any of the built-in constraint evaluators. Currently, _authorization_constraint handles both authorization and passivity-related policy violations using ConstraintCode.AUTHORIZATION. Consider either implementing a specific passivity constraint or removing this unused enum member to avoid confusion.
| _POLICY_TO_CONSTRAINT_CODE: dict[PolicyErrorCode, ConstraintCode] = { | ||
| PolicyErrorCode.AUTHORIZATION_REQUIRED: ConstraintCode.AUTHORIZATION, | ||
| PolicyErrorCode.FORBIDDEN_MODULE: ConstraintCode.FORBIDDEN_CAPABILITY, | ||
| PolicyErrorCode.UNKNOWN_MODULE: ConstraintCode.UNKNOWN_MODULE, | ||
| PolicyErrorCode.RAW_LOGGING_BLOCKED: ConstraintCode.AUDIT_PAYLOAD, | ||
| PolicyErrorCode.INVALID_CORRECTION_VERB: ConstraintCode.CORRECTION_VERB, | ||
| } |
| results=results, | ||
| policy_evaluation=policy_eval, | ||
| enforced_correction_verb=enforced_verb, | ||
| timestamp=datetime.now(timezone.utc).isoformat(), |
There was a problem hiding this comment.
The evaluate_constraints function is documented as a pure function (lines 15 and 352), but it calls datetime.now(timezone.utc), which is a side effect. This makes the function harder to test deterministically. Consider passing the timestamp as an optional argument or including it in the ConstraintContext to ensure the engine remains truly pure.
| def build_ledger_entry( | ||
| report: ConstraintReport, | ||
| *, | ||
| run_id: str, | ||
| indicator_hash: str, | ||
| indicator_type: str, | ||
| ) -> dict: |
There was a problem hiding this comment.
The build_ledger_entry function requires indicator_hash and indicator_type as separate arguments, even though these fields are already present in the ConstraintContext used to generate the ConstraintReport. This redundancy increases the risk of logging data that doesn't match the evaluation context. It would be more robust to include the context in the ConstraintReport or pass the context object directly to this function.
- Drop unused ConstraintCode.PASSIVITY and _POLICY_TO_CONSTRAINT_CODE map. The authorization constraint already covers passivity-related blocks via the same PolicyErrorCode, and the policy→constraint translation table had no callers. - Make evaluate_constraints pure when a `now` is supplied; default to a single datetime.now read at call time. Tests can now assert deterministic timestamps. - Attach ConstraintContext to ConstraintReport. build_ledger_entry and write_constraint_ledger now read indicator_hash and indicator_type from report.context, eliminating the risk of caller-supplied values diverging from what was actually evaluated. - Add tests pinning the new behaviours. https://claude.ai/code/session_01EWC4Cw5H4ARyBSK62iotcB
|
Analysis CompleteGenerated ECC bundle from 2 commits | Confidence: 50% View Pull Request #35Repository Profile
Changed Files (3)
Top hotspots
Top directories
Review Activity (1 reviews, 4 inline comments, 4 unresolved threads)
Top unresolved thread files
Latest reviewer states
Detected Workflows (2)
Generated Instincts (18)
After merging, import with: Files
|
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
tests/test_constraints.py (1)
328-338: ⚡ Quick win
write_constraint_ledgeris never called in this test — the audit-sink invariant isn't exercisedPer coding guidelines, tests should be added in the style of
test_audit_payload_blocks_raw_indicator_fieldswhen introducing a new audit sink.write_constraint_ledgeris the new sink, but the test namedtest_write_constraint_ledger_blocks_when_payload_would_leak_rawonly callsenforce_audit_payloaddirectly on a manually-mutated dict — it never toucheswrite_constraint_ledger. The actual sink is not guarded by this test.Consider replacing (or supplementing) with a test that patches
build_ledger_entryto return a poisoned entry and confirmswrite_constraint_ledgerpropagates thePolicyViolationException, e.g.:def test_write_constraint_ledger_rejects_raw_indicator_in_entry(tmp_path, monkeypatch): report = evaluate_constraints(_ctx(requested_modules=("Resource Links",))) def poisoned_entry(r, *, run_id): e = {"run_id": run_id, "raw_indicator": "example.com"} return e monkeypatch.setattr( "osint_core.constraints.build_ledger_entry", poisoned_entry ) with pytest.raises(PolicyViolationException): write_constraint_ledger(report, run_id="run_x", base_dir=tmp_path)As per coding guidelines: "Add tests in the style of
test_audit_payload_blocks_raw_indicator_fieldswhen introducing new audit sinks."🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_constraints.py` around lines 328 - 338, The test currently never calls write_constraint_ledger so the sink invariant isn't exercised; modify or add a test (e.g., test_write_constraint_ledger_rejects_raw_indicator_in_entry) that patches osint_core.constraints.build_ledger_entry to return a poisoned entry containing "raw_indicator", then call write_constraint_ledger(report, run_id="run_x", base_dir=tmp_path) and assert it raises PolicyViolationException (use pytest.raises), following the style of test_audit_payload_blocks_raw_indicator_fields and using the monkeypatch and tmp_path fixtures to patch build_ledger_entry and isolate filesystem effects.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@osint_core/constraints.py`:
- Around line 459-460: The code constructs a file path with an unsanitized
run_id (ledger_dir / f"{run_id}.json") which permits path traversal; fix by
normalizing or validating run_id before building the path — e.g. replace usage
of run_id with a sanitized basename like Path(run_id).name or apply an
allowlist/regex and reject invalid values, then use ledger_dir /
f"{sanitized_run_id}.json" when calling path.write_text to ensure files cannot
escape the ledger_dir.
In `@tests/test_constraints.py`:
- Around line 105-108: The test currently asserts order-sensitive equality on
report.policy_evaluation.allowed_modules which is fragile if evaluate_modules
changes ordering; update the assertion in tests/test_constraints.py to compare
as a set (or compare sorted lists) instead of direct list equality—locate the
block using _result_for(report, ConstraintCode.AUTHORIZATION) and the
report.policy_evaluation.allowed_modules reference and replace the final assert
with a set(report.policy_evaluation.allowed_modules) ==
{"http_headers","robots_txt"} (or sorted(...) comparisons) so the test only
checks membership not order.
---
Nitpick comments:
In `@tests/test_constraints.py`:
- Around line 328-338: The test currently never calls write_constraint_ledger so
the sink invariant isn't exercised; modify or add a test (e.g.,
test_write_constraint_ledger_rejects_raw_indicator_in_entry) that patches
osint_core.constraints.build_ledger_entry to return a poisoned entry containing
"raw_indicator", then call write_constraint_ledger(report, run_id="run_x",
base_dir=tmp_path) and assert it raises PolicyViolationException (use
pytest.raises), following the style of
test_audit_payload_blocks_raw_indicator_fields and using the monkeypatch and
tmp_path fixtures to patch build_ledger_entry and isolate filesystem effects.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: b46f45ef-06cd-4955-96bb-1e860a7436b9
📒 Files selected for processing (3)
osint_core/__init__.pyosint_core/constraints.pytests/test_constraints.py
| auth = _result_for(report, ConstraintCode.AUTHORIZATION) | ||
| assert report.decision == PolicyDecision.ALLOW | ||
| assert auth.status == ConstraintStatus.SATISFIED | ||
| assert report.policy_evaluation.allowed_modules == ["http_headers", "robots_txt"] |
There was a problem hiding this comment.
Order-sensitive list equality is fragile
If evaluate_modules ever changes its ordering, this assertion will fail silently. Use a set or sorted comparison:
🛡️ Proposed fix
- assert report.policy_evaluation.allowed_modules == ["http_headers", "robots_txt"]
+ assert set(report.policy_evaluation.allowed_modules) == {"http_headers", "robots_txt"}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| auth = _result_for(report, ConstraintCode.AUTHORIZATION) | |
| assert report.decision == PolicyDecision.ALLOW | |
| assert auth.status == ConstraintStatus.SATISFIED | |
| assert report.policy_evaluation.allowed_modules == ["http_headers", "robots_txt"] | |
| auth = _result_for(report, ConstraintCode.AUTHORIZATION) | |
| assert report.decision == PolicyDecision.ALLOW | |
| assert auth.status == ConstraintStatus.SATISFIED | |
| assert set(report.policy_evaluation.allowed_modules) == {"http_headers", "robots_txt"} |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/test_constraints.py` around lines 105 - 108, The test currently asserts
order-sensitive equality on report.policy_evaluation.allowed_modules which is
fragile if evaluate_modules changes ordering; update the assertion in
tests/test_constraints.py to compare as a set (or compare sorted lists) instead
of direct list equality—locate the block using _result_for(report,
ConstraintCode.AUTHORIZATION) and the report.policy_evaluation.allowed_modules
reference and replace the final assert with a
set(report.policy_evaluation.allowed_modules) == {"http_headers","robots_txt"}
(or sorted(...) comparisons) so the test only checks membership not order.
- Validate run_id against ^[A-Za-z0-9_-]{1,128}$ before interpolating it
into the file path. A caller-supplied "../" or absolute path now raises
ValueError instead of escaping the ledger directory.
- Re-call enforce_audit_payload at the sink as defence in depth, so a
caller (or test) that substitutes build_ledger_entry cannot bypass the
raw-indicator check.
- Add tests for both: path-traversal/empty/separator run_ids are rejected,
and a monkeypatched poisoned build_ledger_entry is caught at the sink
with no file written.
Declined the suggestion to switch allowed_modules list-equality to a set
comparison: policy.evaluate_modules contracts ordered output via
dedupe_preserve_order and existing test_policy.py asserts the same shape.
https://claude.ai/code/session_01EWC4Cw5H4ARyBSK62iotcB
|
|
||
|
|
||
| class ConstraintCode(str, Enum): | ||
| AUTHORIZATION = "authorization" |
There was a problem hiding this comment.
WARNING: Unimplemented constraint code
The PASSIVITY ConstraintCode is defined but no corresponding constraint evaluator is implemented, and it's not included in DEFAULT_CONSTRAINTS. Either implement the passivity constraint or remove this unused enum value.
Code Review SummaryStatus: 1 Issues Found | Recommendation: Address before merge Overview
Issue Details (click to expand)WARNING
The PASSIVITY ConstraintCode is defined but no corresponding constraint evaluator is implemented, and it's not included in DEFAULT_CONSTRAINTS. Either implement the passivity constraint or remove this unused enum value. Other Observations (not in diff)Issues found in unchanged code that cannot receive inline comments:
Files Reviewed (3 files)
Reviewed by nemotron-3-super-120b-a12b-20230311:free · 312,833 tokens |
Summary
First slice of the constraint-aware invention engine from issue #19. Adds a composable invariant evaluator on top of
policy.evaluate_modulesplus a redacted ledger writer. No behaviour change to the running Space — the engine is additive and not yet wired intoapp.py:run_enrichment.What's new
osint_core/constraints.py— pure constraint engine. Five built-in constraints: authorization, forbidden capability, unknown module, audit payload shape, correction verb. Each produces a structuredConstraintResultwith evidence. The engine derives a recommended correction verb from the closed allowlist via priority ordering (REVERT > CONSTRAIN > ADAPT > OBSERVE).build_ledger_entry+write_constraint_ledgerpersist a JSON record underruns/constraints/{run_id}.json. The writer routes throughpolicy.enforce_audit_payload, so raw indicator fields can never reach disk.tests/test_constraints.py— 22 tests pinning the central invariant: authorized-only modules stay blocked unless bothauthorized_target=Trueandpassive_only=False. Also covers forbidden modules, audit-payload shape, verb priority ordering, engine purity, and ledger redaction.osint_core/__init__.py.Out of scope (follow-up branches)
app.py:run_enrichmentand the Gradio UI tab.osint_core/drift.py(still pseudocode;tests/test_drift.pycontinues to fail as documented).Test plan
PYTHONPATH=. python -m pytest tests/test_constraints.py tests/test_policy.py -v→ 35 passedPYTHONPATH=. python -m pytest --ignore=tests/test_drift.py --ignore=tests/test_scheduler.py -q→ 81 passedfrom osint_core import evaluate_constraints, write_constraint_ledgerresolves and runs against a sample requestCloses part of #19.
https://claude.ai/code/session_01EWC4Cw5H4ARyBSK62iotcB
Generated by Claude Code
Summary by CodeRabbit