Build deterministic analysis core by Spbd1 · Pull Request #5 · Spbd1/Argument-Risk-Engine

Spbd1 · 2026-05-18T05:23:30Z

Motivation

Provide a deterministic analysis core that extracts claims, locates exact evidence spans, and retrieves taxonomy candidates before any LLM integration.
Reduce false positives by applying healthy-reasoning suppressors and conservative lexical matching rules so neutral prose does not produce many candidates.
Ensure components are fast and scale to large taxonomy packs so retrieval works with 1000+ rows for practical QA and testing.

Description

Implemented sentence-level claim extraction in engine/argument_risk_engine/extraction/claim_extractor.py with typed Claim objects, preserved character offsets, marker-based claim-type detection, precedence rules, and short-fragment filtering.
Added exact evidence handling in engine/argument_risk_engine/explanation/evidence.py via EvidenceSpan and find_evidence_spans that return exact substring matches and never fabricate evidence, plus a legacy-compatible evidence_span wrapper.
Built an in-memory indexed taxonomy in engine/argument_risk_engine/retrieval/inverted_index.py with tokenization, stopword/generic-term suppression, field-specific indexing (name, synonyms, signals, trigger_patterns, definitions), activation/deprecation checks, candidate-only flags, and healthy-suppressor detection.
Implemented conservative lexical retrieval in engine/argument_risk_engine/retrieval/lexical_retriever.py, producing RetrievedTaxonomyEntry objects that carry retrieval_score, matched terms and fields, retrieval_reason, false_positive_risk, healthy-pattern suppression penalties, and diagnostics, with index caching. Added candidate_filter and retrieval_diagnostics helpers for final filtering and observability.
Added/updated unit tests in tests/test_claim_extractor.py and tests/test_retriever.py to cover sentence splitting, offsets and claim types, exact evidence spans, large neutral retrieval behavior (1000 rows), deprecated exclusions, candidate-only rows, and healthy-reasoning suppressor behavior.
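
For reviewers, here is a minimal sketch of the offset-preserving sentence splitting and exact evidence matching described above. It is an illustration under assumptions, not the code in this PR: the EvidenceSpan fields (start, end, text), the find_evidence_spans(source, phrase) signature, and the split_sentences helper are hypothetical, since the description names the classes and functions but not their shapes.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class EvidenceSpan:
    # Hypothetical fields; the PR names the class but not its attributes.
    start: int
    end: int
    text: str


def find_evidence_spans(source: str, phrase: str) -> list[EvidenceSpan]:
    """Return every exact, non-overlapping occurrence of phrase in source.

    Nothing is normalized or fuzzy-matched, so a span can only ever
    contain text that literally appears in the source.
    """
    if not phrase:
        return []
    spans = []
    pos = source.find(phrase)
    while pos != -1:
        end = pos + len(phrase)
        spans.append(EvidenceSpan(pos, end, source[pos:end]))
        pos = source.find(phrase, end)
    return spans


def split_sentences(text: str) -> list[tuple[int, int, str]]:
    """Split on ., !, ? and keep (start, end, sentence) offsets into text."""
    sentences = []
    for match in re.finditer(r"[^.!?]+[.!?]*", text):
        raw = match.group()
        stripped = raw.strip()
        if stripped:
            # Skip leading whitespace so offsets point at the first character.
            start = match.start() + (len(raw) - len(raw.lstrip()))
            sentences.append((start, start + len(stripped), stripped))
    return sentences
```

In this sketch the invariant source[span.start:span.end] == span.text holds for every returned span, and a phrase that does not occur yields an empty list rather than an approximate match.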

Testing

Ran lint/format checks with auto-fix via python -m ruff check --fix, followed by a final python -m ruff check. The automated pass fixed the reported issues, and no ruff errors remain.
Ran targeted tests with python -m pytest tests/test_claim_extractor.py tests/test_retriever.py -q which passed, and then ran the full test suite with python -m pytest -q which completed successfully.
All automated checks passed: pytest reported 25 passed and 3 warnings, and the lint checks completed without errors.

Codex Task

Build deterministic analysis core

12a8905

Spbd1 added the codex label May 18, 2026 — with ChatGPT Codex Connector

Spbd1 wants to merge 1 commit into codex/add-model-settings-configuration-in-dashboard from codex/build-deterministic-analysis-core
