Japan OCR Mini Benchmark v0.2.0
This release adds the v0.2.0 synthetic Japanese receipt target run and publication payload.
Snapshot
- Public snapshot ZIP: japan_ocr_mini_benchmark_public_v0.2.0_snapshot.zip
- ZIP size: 8103466 bytes
- ZIP sha256: ea74b0b9b591e5ee4e1b1401031e8d5f724ad7dfa0a704c9defd04b1b0339b9b
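A downloaded snapshot can be checked against the size and digest listed above. The following is a minimal sketch using only the Python standard library; the helper names `sha256_of_file` and `verify_snapshot` are illustrative and not part of the benchmark, and the script assumes the ZIP sits in the current directory.

```python
import hashlib
from pathlib import Path

# Values copied from the Snapshot section of these release notes.
EXPECTED_NAME = "japan_ocr_mini_benchmark_public_v0.2.0_snapshot.zip"
EXPECTED_SIZE = 8103466
EXPECTED_SHA256 = "ea74b0b9b591e5ee4e1b1401031e8d5f724ad7dfa0a704c9defd04b1b0339b9b"


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large archives are not loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_snapshot(path, expected_size=EXPECTED_SIZE, expected_sha256=EXPECTED_SHA256):
    """Return True only if both the byte size and the SHA-256 digest match."""
    path = Path(path)
    size_ok = path.stat().st_size == expected_size
    hash_ok = sha256_of_file(path) == expected_sha256.lower()
    return size_ok and hash_ok


if __name__ == "__main__":
    local = Path(EXPECTED_NAME)  # assumed download location
    if local.exists():
        print("snapshot OK" if verify_snapshot(local) else "snapshot MISMATCH")
    else:
        print(f"{EXPECTED_NAME} not found in the current directory")
```

Checking the size first is cheap, but only the digest comparison gives real assurance that the archive is intact.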
Release Notes
Japan OCR Mini Benchmark v0.2.0 Release Notes
Status
- Release status: Release Candidate
- RC status: release_candidate_ready
- Target run ID: v020_target_20260613_221713
- Manual visual review: ok_by_user_step148
- Created at: 2026-06-13T22:42:40
Highlights
- Added a v0.2.0 synthetic Japanese receipt target run with 20 generated records.
- Added both clean and noisy rendered receipt images.
- Introduced hybrid item generation using a validated LLM-approved item pool plus deterministic item master data.
- Strengthened noisy image rendering with resolution loss, local print fading, stroke-level kasure (patchy, faded strokes), thermal banding, local blur patches, JPEG roundtrip compression, and safe shift-blend motion blur.
- Added nationwide randomized fictional store locations while excluding nearby Osaka/Kita-ku style local place names.
- Adjusted parking receipts to hide the tax breakdown, which better matches payment-machine style receipts.
- Added review gallery and shortlist review files for human visual inspection.
Generation Summary
- Requested records: 20
- Successful records: 20
- Failed records: 0
- Documents with LLM-approved items: 19
- LLM-approved item count: 56
- Item-master item count: 124
- LLM item mix ratio: 0.3111
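The reported mix ratio is consistent with dividing the LLM-approved item count by the total item count. That definition is an assumption inferred from the numbers above, not something these notes state, and the function name below is illustrative.

```python
def llm_item_mix_ratio(llm_items, master_items):
    """Share of all generated items that came from the LLM-approved pool (assumed definition)."""
    total = llm_items + master_items
    if total == 0:
        raise ValueError("no items were generated")
    return llm_items / total


# Counts copied from the Generation Summary.
ratio = llm_item_mix_ratio(llm_items=56, master_items=124)
print(round(ratio, 4))  # 56 / 180 -> 0.3111
```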
Validation Summary
- Validation status: warning
- Record count: 20
- Status counts: {'ok': 8, 'warning': 12}
- Issue code top counts: {'clean_noisy_size_large_difference': 12}
- Noisy profile counts: {'light': 3, 'hard': 8, 'medium': 9}
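The counts above can be cross-checked against the record count. The sketch below is only a sanity check on the published numbers: the dictionary keys are hypothetical (the actual schema of the summary JSON is not shown in these notes), and the rule that an issue code is counted at most once per record is an assumption.

```python
# Values copied from the Validation Summary; key names are hypothetical.
summary = {
    "record_count": 20,
    "status_counts": {"ok": 8, "warning": 12},
    "issue_code_counts": {"clean_noisy_size_large_difference": 12},
    "noisy_profile_counts": {"light": 3, "hard": 8, "medium": 9},
}


def find_inconsistencies(summary):
    """Return a list of human-readable problems; an empty list means consistent."""
    problems = []
    total = summary["record_count"]

    # Every record has exactly one status and one noisy profile,
    # so each breakdown should sum to the record count.
    for key in ("status_counts", "noisy_profile_counts"):
        counted = sum(summary[key].values())
        if counted != total:
            problems.append(f"{key} sums to {counted}, expected {total}")

    # Assumption: an issue code is counted at most once per record.
    for code, count in summary["issue_code_counts"].items():
        if count > total:
            problems.append(f"{code} counted {count} times for {total} records")

    return problems


print(find_inconsistencies(summary))  # [] for the numbers in this release
```

For this release both breakdowns sum to 20, and the 12 occurrences of the single issue code line up with the 12 records in warning status.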
Known Validation Warning
- clean_noisy_size_large_difference is expected for this release candidate.
- The warning appears because noisy images include stronger degradation, rotation, canvas margins, shadows, and camera-like framing.
- Human visual review was completed and accepted before freezing the release candidate.
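To illustrate why rotation and canvas margins alone can make a noisy image noticeably larger than its clean counterpart, the sketch below computes the bounding box of a rotated rectangle and adds a margin on every side. It is not the benchmark's validator: the function name, the example receipt size, the angle, and the margin are all hypothetical, and the rule and threshold that actually trigger the warning are not stated in these notes.

```python
import math


def rotated_canvas_size(width, height, angle_deg, margin_px=0):
    """Pixel size of the smallest upright canvas that holds a rotated image plus margins."""
    theta = math.radians(angle_deg)
    cos_t, sin_t = abs(math.cos(theta)), abs(math.sin(theta))
    # Axis-aligned bounding box of a width x height rectangle rotated by theta.
    new_w = round(width * cos_t + height * sin_t) + 2 * margin_px
    new_h = round(width * sin_t + height * cos_t) + 2 * margin_px
    return new_w, new_h


clean = (400, 1000)  # hypothetical clean receipt size in pixels
noisy = rotated_canvas_size(*clean, angle_deg=5, margin_px=20)
print(clean, "->", noisy)
```

Because receipts are tall and narrow, even a small rotation adds a sizeable fraction of the height to the canvas width, so a size-difference check can fire on images that look perfectly acceptable to a human reviewer.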
Files
- Target run directory: release_v0.2.0
- Release candidate summary: release_v0.2.0\release_candidate\v020_release_candidate_summary.json
- Release candidate checklist: release_v0.2.0\release_candidate\v020_release_candidate_checklist.md
- Release candidate file inventory: release_v0.2.0\release_candidate\v020_release_candidate_files.csv
- Full review HTML: release_v0.2.0\review_audit\v020_review_gallery_step147.html
- Shortlist review HTML: release_v0.2.0\review_audit\v020_review_shortlist_step147.html
Notes
- This release candidate uses synthetic fictional receipt data.
- Store names, branch names, addresses, product names, and transaction contents are artificial test data.
- The dataset is intended for OCR/VLM evaluation and workflow testing, not for representing real transactions.