
Update evals dataset to tag draft documents#462

Merged
ppinchuk merged 8 commits into main from evals-cleanup on Jun 3, 2026

Conversation

@rajeee
Collaborator

@rajeee rajeee commented Jun 3, 2026

Adds a document_type field (Final / Draft / Proposal) at the top level of each manifest entry. The date extraction eval now runs only on Final documents — date extraction is only meaningful for enacted ordinances, since drafts and proposals have no adoption date by definition.

Dev: 13 of 48 cases are tagged Draft/Proposal (per the prior failure analysis in unified-query-failure-analysis_v2.md), so the filter reduces the run to 35 cases. There are no real verdict regressions on the kept Final cases: every previously passing Final case still passes, and one (Greene, TN) improved due to LLM non-determinism.
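A minimal Python sketch of the Final-only filter described above. It assumes manifest entries have already been parsed from the JSON5 manifest into plain dicts (real entries carry more fields than shown), and `is_final` / `filter_final` are hypothetical names, not necessarily what evals/test_run_date_extraction_evals.py uses. As in the PR, an entry with no tag defaults to Final. It uses the `document_type` key as originally proposed; the key was renamed later in review.

```python
from typing import Any

# Hypothetical shape: each manifest entry parsed from manifest.json5 into a dict.
Entry = dict[str, Any]

# Entries with no tag are treated as Final, matching the PR's default.
DEFAULT_DOCUMENT_TYPE = "Final"


def is_final(entry: Entry) -> bool:
    """True if the entry should be scored by the date extraction eval."""
    return entry.get("document_type", DEFAULT_DOCUMENT_TYPE) == "Final"


def filter_final(entries: list[Entry]) -> list[Entry]:
    """Drop Draft and Proposal entries, which have no adoption date."""
    return [entry for entry in entries if is_final(entry)]


if __name__ == "__main__":
    # Made-up entries that only mirror the dev-split counts (48 total,
    # 13 non-Final); the tags are placeholders, not the real cases.
    entries = [{"document_type": "Final"}] * 35 + [{"document_type": "Draft"}] * 13
    kept = filter_final(entries)
    print(f"{len(kept)} of {len(entries)} cases kept")  # 35 of 48 cases kept
```

Defaulting an untagged entry to Final keeps older manifests working unchanged; the trade-off is that a forgotten tag silently includes a draft in the date eval.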

Held-out: every case was classified by reading the document text (using parallel subagents). All 22 are Final. One PDF (Carroll_County_Indiana.pdf) was replaced because the file on disk contained Carroll County, Maryland content instead of the Indiana ordinance the manifest pointed to. Held-out accuracy and verdicts are unchanged after the swap; the only difference is a small time/token increase in the baseline, caused by OCR overhead on the replacement PDF, which is scanned.

@rajeee rajeee requested a review from ppinchuk as a code owner June 3, 2026 14:00
Copilot AI review requested due to automatic review settings June 3, 2026 14:00
@rajeee rajeee requested a review from castelao as a code owner June 3, 2026 14:00
@rajeee rajeee changed the title from "Tag eval manifests with document_type; filter date eval to Final" to "Update evals dataset to tag draft documents" Jun 3, 2026
Contributor

Copilot AI left a comment


Pull request overview

This PR introduces a document_type tag (Final / Draft / Proposal) to eval manifest entries and updates the date-extraction eval to only run on Final documents, since adoption dates aren’t meaningful for drafts/proposals. It also updates stored dev/held-out eval result artifacts to reflect the new filtered dataset and reruns.

Changes:

  • Filter evals/test_run_date_extraction_evals.py cases to document_type == "Final" (defaulting missing document_type to Final).
  • Add document_type to dev + held-out solar manifests.
  • Refresh checked-in dev/held-out result JSON and dev breakdown CSV after rerunning evals with the new filtering.

Reviewed changes

Copilot reviewed 6 out of 7 changed files in this pull request and generated 1 comment.

File | Description
--- | ---
evals/test_run_date_extraction_evals.py | Filters date extraction eval cases to Final documents only.
evals/data/dev/solar/manifest.json5 | Adds document_type annotations per dev-case classification.
evals/data/held-out/solar/manifest.json5 | Adds document_type annotations (all Final).
evals/results/dev/date_extraction_evals.json | Updates dev aggregate metrics after filtering to Final only.
evals/results/dev/date_extraction_evals_breakdown.csv | Updates per-case breakdown after filtering/rerun.
evals/results/held_out/date_extraction_evals.json | Updates held-out aggregate metrics after rerun.

Comment thread on evals/test_run_date_extraction_evals.py (outdated)
ppinchuk
ppinchuk previously approved these changes Jun 3, 2026
Collaborator

@ppinchuk ppinchuk left a comment


I don't think you should hold this PR, but I would recommend changing the "document_type" key to "document_publish_status" or "document_draft_status" or even just "document_status" (or whatever else you like) in order to avoid confusion with actual document types such as pdf, text, doc, etc.

@rajeee
Collaborator Author

rajeee commented Jun 3, 2026

Good point. I changed it.
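A rename like this can be sketched as a one-off migration over parsed manifest entries. This is a minimal Python sketch under stated assumptions: the thread does not say which name was chosen, so `document_status` (one of the reviewer's suggestions) is an assumption, and `rename_status_key` is a hypothetical helper. It preserves key order so a re-serialized manifest diffs cleanly, and it refuses entries that already carry both keys.

```python
from typing import Any

OLD_KEY = "document_type"
# Assumption: one of the names suggested in review; the final choice is not stated.
NEW_KEY = "document_status"


def rename_status_key(entry: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of entry with OLD_KEY renamed to NEW_KEY, keeping key order."""
    if OLD_KEY in entry and NEW_KEY in entry:
        # Refuse to guess which value wins if both keys are present.
        raise ValueError(f"entry has both {OLD_KEY!r} and {NEW_KEY!r}")
    return {(NEW_KEY if key == OLD_KEY else key): value for key, value in entry.items()}


if __name__ == "__main__":
    # "file" is a hypothetical field name used only for illustration.
    entry = {"file": "Carroll_County_Indiana.pdf", "document_type": "Final"}
    print(rename_status_key(entry))
    # {'file': 'Carroll_County_Indiana.pdf', 'document_status': 'Final'}
```

Any code that reads the tag, such as the Final-only filter in the date extraction eval, has to switch to the new key in the same change.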

@ppinchuk ppinchuk merged commit 5995463 into main Jun 3, 2026
4 checks passed
@ppinchuk ppinchuk deleted the evals-cleanup branch June 3, 2026 18:02
4 participants