Build review, evaluation, and report workflows by Spbd1 · Pull Request #10 · Spbd1/Argument-Risk-Engine

Spbd1 · 2026-05-18T06:10:11Z

Motivation

Add user-facing review, benchmark evaluation, and report generation features so analysis outputs can be reviewed, evaluated, and exported from the dashboard and CLI.
Provide an append-only, locally persisted review store and validation to support human-in-the-loop corrections for MVP workflows.
Provide simple operational evaluation metrics and an evaluation runner to measure model behavior against a small benchmark without implying scientific validation.
Provide multi-format report generation (JSON/Markdown/HTML) with storage and download endpoints for sharing analysis results.

Description

Implemented review models and validation in engine/argument_risk_engine/review/models.py, plus append-only JSONL storage and helpers in engine/argument_risk_engine/review/store.py. The store persists to data/review/review_store.jsonl and includes a legacy-feedback adapter.
Added a review service (backend/app/services/review_service.py) and routes (backend/app/api/routes_review.py) that expose GET /review/items, POST /review/items, and GET /review/summary, and preserve the legacy POST /review/feedback adapter.
Implemented evaluation metrics and a runner in engine/argument_risk_engine/evaluation/metrics.py and engine/argument_risk_engine/evaluation/runner.py. They compute label precision/recall/F1, false-positive rate, evidence-span exact/partial match, human review rate, over-classification rate, and no-finding rate. The output also includes false-positive, false-negative, and evidence-span-miss error lists, along with a disclaimer that the results are not scientific validation.
Added an evaluation service and routes (POST /evaluation/run, GET /evaluation/summary, and GET /evaluation/errors), with results persisted under data/evaluation, plus a small JSONL mini-benchmark at data/benchmarks/mini_eval_set.jsonl.
Implemented report renderers in engine/argument_risk_engine/reports/{json_export,markdown,html}.py, plus a report service with local persistence and indexing in backend/app/services/report_service.py. The API endpoints POST /reports/from-analysis, GET /reports, GET /reports/{report_id}, and GET /reports/{report_id}/download are defined in backend/app/api/routes_reports.py.
Wired routers into the application for both root and /api prefixes and added a CLI wrapper scripts/run_evaluation.py to run the mini benchmark from the command line.
Added and updated tests (tests/test_evaluation.py, tests/test_review_reports_api.py) to cover metrics, runner output, review persistence, report generation, and downloads.
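For reviewers, the label-level metrics named above (precision, recall, F1, false-positive rate) can be sketched as follows. This is a minimal illustration of the standard definitions, not the code in metrics.py: LabelCounts, count_label, label_metrics, and the label names are hypothetical, and returning 0.0 on a zero denominator is an assumed convention.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class LabelCounts:
    """Confusion counts for one label (hypothetical container, not from the PR)."""
    tp: int
    fp: int
    fn: int
    tn: int


def safe_div(num: float, den: float) -> float:
    """Return num / den, or 0.0 when the denominator is zero (assumed convention)."""
    return num / den if den else 0.0


def count_label(expected: list[set[str]], predicted: list[set[str]], label: str) -> LabelCounts:
    """Count TP/FP/FN/TN for one label across aligned benchmark items."""
    if len(expected) != len(predicted):
        raise ValueError("expected and predicted must have the same length")
    tp = fp = fn = tn = 0
    for gold, pred in zip(expected, predicted):
        in_gold, in_pred = label in gold, label in pred
        if in_gold and in_pred:
            tp += 1
        elif in_pred:
            fp += 1
        elif in_gold:
            fn += 1
        else:
            tn += 1
    return LabelCounts(tp, fp, fn, tn)


def label_metrics(c: LabelCounts) -> dict[str, float]:
    """Standard precision, recall, F1, and false-positive rate from the counts."""
    precision = safe_div(c.tp, c.tp + c.fp)
    recall = safe_div(c.tp, c.tp + c.fn)
    return {
        "precision": precision,
        "recall": recall,
        "f1": safe_div(2 * precision * recall, precision + recall),
        "false_positive_rate": safe_div(c.fp, c.fp + c.tn),
    }


if __name__ == "__main__":
    # Hypothetical labels; each set holds the labels attached to one benchmark item.
    gold = [{"label_a"}, {"label_a"}, set(), {"label_b"}]
    pred = [{"label_a"}, set(), {"label_a"}, {"label_b"}]
    print(label_metrics(count_label(gold, pred, "label_a")))
```

On a benchmark this small, a single item can move every rate substantially, which is consistent with the PR's framing of these numbers as operational rather than scientific.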

Testing

Ran the mini-benchmark CLI via python scripts/run_evaluation.py, which executed the runner and printed the metrics and the disclaimer (smoke run succeeded).
Ran the full test suite via python -m pytest -q, which passed (42 passed, 4 warnings).
Ran the focused tests tests/test_evaluation.py and tests/test_review_reports_api.py, which passed (4 passed, 1 warning).
Ran static/style checks with python -m ruff check ... as CI-style linting and fixed the reported issues, so the lint check passes locally.
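To check review persistence by hand, the append-only JSONL pattern from the Description can be sketched as below. This is an illustration under stated assumptions, not the contents of store.py: append_review and load_reviews are hypothetical names, the record fields are made up, and the demo writes to a temporary directory rather than data/review/review_store.jsonl.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path
from typing import Any


def append_review(path: Path, record: dict[str, Any]) -> None:
    """Append one record as a single JSON line; existing lines are never rewritten."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")


def load_reviews(path: Path) -> list[dict[str, Any]]:
    """Read all records in write order; a missing file means an empty store."""
    if not path.exists():
        return []
    with path.open("r", encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = Path(tmp) / "review" / "review_store.jsonl"
        # A correction is recorded as a second line, not as an in-place update.
        append_review(store, {"item_id": "x1", "decision": "accept"})
        append_review(store, {"item_id": "x1", "decision": "reject"})
        print(len(load_reviews(store)))
```

Because corrections are appended rather than overwritten, the latest state of an item would have to be derived at read time, for example by taking the last record per item_id. Whether the PR's store does this is not stated.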

Codex Task

Build review evaluation and report workflows

b5e584d

Spbd1 added the codex label May 18, 2026 — with ChatGPT Codex Connector

Spbd1 wants to merge 1 commit into codex/build-react-dashboard-with-specified-features from codex/build-review,-evaluation,-and-report-workflows-wm0oc0
