Adaptional/LossBench
# LossBench

A benchmark for evaluating LLM extraction accuracy on insurance loss run documents.

## Overview

LossBench contains 36 synthetic insurance loss run PDFs across 12 test sets. Each PDF combines one or more loss runs into a single file and is paired with verified correct extraction answers ("ground truth") in JSON. The documents range in format complexity from simple tables to multi-page horizontal layouts with multiple coverage types per claim. All claims are synthetically generated with randomized details and are designed to capture a wide variety of real-world edge cases and formats.

## Structure

```text
data/
├── lr-01/ ... lr-12/
│   ├── doc-{XXX}-{1,2,3}.pdf      # Test documents
│   └── ground-{XXX}-{1,2,3}.json  # Ground truth
results/
├── native.csv                      # Direct PDF extraction results
├── ocr.csv                         # OCR-based extraction results
└── hybrid.csv                      # Hybrid approach results
```

## Schema

Each ground truth JSON contains an array of `LossRunItem` objects with 17 fields:

| Field | Description |
|---|---|
| `policy_no` | Policy number |
| `claim_id` | Claim identifier |
| `insurer` | Insurance carrier |
| `insured` | Policyholder name |
| `report_date` | Valuation/report date |
| `date_of_loss` | Date of incident |
| `date_reported` | Date claim was reported |
| `closed_date` | Date claim was closed |
| `loss_summary_description` | Description of the loss |
| `claim_status` | Open/Closed |
| `claimant` | Claimant name |
| `claim_coverage_type` | Coverage type (BI, PD, etc.) |
| `loss_reserved` | Outstanding loss reserve |
| `loss_paid` | Loss amount paid |
| `expense_reserve` | Outstanding expense reserve |
| `expense_paid` | Expense amount paid |
| `loss_total_recovered` | Subrogation/recovery |
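For illustration, a single `LossRunItem` might look like the following sketch; all values here are invented, not drawn from the benchmark, and the exact value types (strings vs. numbers, null handling) should be checked against the actual ground truth files:

```json
[
  {
    "policy_no": "POL-0001",
    "claim_id": "CLM-1001",
    "insurer": "Example Mutual",
    "insured": "Acme Manufacturing LLC",
    "report_date": "2024-06-30",
    "date_of_loss": "2023-11-02",
    "date_reported": "2023-11-05",
    "closed_date": null,
    "loss_summary_description": "Forklift damaged loading dock door",
    "claim_status": "Open",
    "claimant": "John Doe",
    "claim_coverage_type": "PD",
    "loss_reserved": 12000.00,
    "loss_paid": 3500.00,
    "expense_reserve": 800.00,
    "expense_paid": 250.00,
    "loss_total_recovered": 0.00
  }
]
```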

## Test Sets

| Set | Challenge |
|---|---|
| lr-01, lr-02, lr-03 | Baseline formats |
| lr-04 | Multi-claimant blocks (34 claimants per claim) |
| lr-05 | Multi-coverage TPA format (7 coverage types per claim) |
| lr-06 | Workers' comp (3 coverage types per claim) |
| lr-07 - lr-12 | Various complex formats |

## Evaluation

Results CSVs contain per-run metrics:

- `f1`, `precision`, `recall` - Cell-level accuracy
- `expected_rows`, `extracted_rows` - Row counts
- `model`, `provider` - Model tested

Row matching uses a composite key: `claim_id | claimant | claim_coverage_type`

Pass threshold: 95% F1
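The repository does not publish its scoring code, but the metrics above can be sketched as follows: align extracted rows to ground truth by the composite key, then score cells. This is a minimal illustration under the assumption that rows are dicts keyed by the schema fields; the function names are hypothetical:

```python
def row_key(item):
    """Composite key used to align an extracted row with ground truth."""
    return (item.get("claim_id"), item.get("claimant"), item.get("claim_coverage_type"))


def cell_f1(truth, extracted, fields):
    """Cell-level precision/recall/F1 over composite-key-matched rows.

    A cell counts as correct when its row's key matches a ground truth
    row and the cell value equals the ground truth value exactly.
    """
    truth_by_key = {row_key(r): r for r in truth}
    expected = sum(1 for r in truth for f in fields if r.get(f) not in (None, ""))
    predicted = 0
    correct = 0
    for row in extracted:
        predicted += sum(1 for f in fields if row.get(f) not in (None, ""))
        gt = truth_by_key.get(row_key(row))
        if gt is None:
            continue  # no ground truth row with this key: all cells are misses
        correct += sum(
            1 for f in fields
            if row.get(f) not in (None, "") and row.get(f) == gt.get(f)
        )
    precision = correct / predicted if predicted else 0.0
    recall = correct / expected if expected else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1
```

A run would then pass when `f1 >= 0.95`. Real scoring may normalize values (dates, currency) before comparison; the exact matching rules should be taken from the results CSVs rather than this sketch.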

## License

MIT
