Releases: K10124/japan-ocr-mini-benchmark-public
Japan OCR Mini Benchmark v0.2.0
This release adds the v0.2.0 synthetic Japanese receipt target run and publication payload.
Snapshot
- Public snapshot ZIP: japan_ocr_mini_benchmark_public_v0.2.0_snapshot.zip
- ZIP size: 8103466 bytes
- ZIP sha256: ea74b0b9b591e5ee4e1b1401031e8d5f724ad7dfa0a704c9defd04b1b0339b9b
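A downloaded snapshot can be checked against the size and SHA-256 digest listed above. The following is a minimal sketch using only the Python standard library; the function names `sha256_of_file` and `verify_snapshot` are illustrative and are not part of the repository.

```python
import hashlib
from pathlib import Path

# Values copied from the Snapshot section of these release notes.
EXPECTED_SIZE = 8103466
EXPECTED_SHA256 = "ea74b0b9b591e5ee4e1b1401031e8d5f724ad7dfa0a704c9defd04b1b0339b9b"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_snapshot(path, expected_size=EXPECTED_SIZE, expected_sha256=EXPECTED_SHA256):
    """Return True only if both the byte size and the digest match."""
    path = Path(path)
    if path.stat().st_size != expected_size:
        return False
    return sha256_of_file(path) == expected_sha256.lower()
```

Calling `verify_snapshot("japan_ocr_mini_benchmark_public_v0.2.0_snapshot.zip")` on the downloaded file should return `True` if the download is intact.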
Release Notes
Japan OCR Mini Benchmark v0.2.0 Release Notes
Status
- Release status: Release Candidate
- RC status: release_candidate_ready
- Target run ID: v020_target_20260613_221713
- Manual visual review: ok_by_user_step148
- Created at: 2026-06-13T22:42:40
Highlights
- Added a v0.2.0 synthetic Japanese receipt target run with 20 generated records.
- Added both clean and noisy rendered receipt images.
- Introduced hybrid item generation using a validated LLM-approved item pool plus deterministic item master data.
- Strengthened noisy image rendering with resolution loss, local print fading, stroke-level kasure, thermal banding, local blur patches, JPEG roundtrip compression, and safe shift-blend motion blur.
- Added nationwide randomized fictional store locations while excluding nearby Osaka/Kita-ku style local place names.
- Adjusted parking receipt behavior to hide tax breakdown for more natural payment-machine style receipts.
- Added review gallery and shortlist review files for human visual inspection.
Generation Summary
- Requested records: 20
- Successful records: 20
- Failed records: 0
- Documents with LLM-approved items: 19
- LLM-approved item count: 56
- Item-master item count: 124
- LLM item mix ratio: 0.3111
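The reported mix ratio is consistent with dividing the LLM-approved item count by the total item count (56 / (56 + 124)). The release notes do not state the formula, so the sketch below is an assumption that happens to reproduce the published figure; `llm_item_mix_ratio` is a hypothetical helper name.

```python
def llm_item_mix_ratio(llm_items, master_items):
    """Share of LLM-approved items among all generated items (assumed definition)."""
    total = llm_items + master_items
    if total == 0:
        raise ValueError("no items were generated")
    return llm_items / total


# Counts taken from the Generation Summary.
ratio = llm_item_mix_ratio(56, 124)
print(round(ratio, 4))  # 0.3111
```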
Validation Summary
- Validation status: warning
- Record count: 20
- Status counts: {'ok': 8, 'warning': 12}
- Issue code top counts: {'clean_noisy_size_large_difference': 12}
- Noisy profile counts: {'light': 3, 'hard': 8, 'medium': 9}
Known Validation Warning
- clean_noisy_size_large_difference is expected for this release candidate.
- The warning appears because noisy images include stronger degradation, rotation, canvas margins, shadows, and camera-like framing.
- Human visual review was completed and accepted before freezing the release candidate.
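To illustrate why rotation, margins, and framing trigger this warning, here is a rough sketch of a size-difference check. It assumes the check compares pixel dimensions of the clean and noisy images and uses a made-up 20% threshold; the actual validator's metric and threshold are not documented in these notes, and both function names are hypothetical.

```python
def relative_size_difference(clean_size, noisy_size):
    """Largest relative change in width or height between the two images."""
    clean_w, clean_h = clean_size
    noisy_w, noisy_h = noisy_size
    return max(abs(noisy_w - clean_w) / clean_w, abs(noisy_h - clean_h) / clean_h)


def size_issue_codes(clean_size, noisy_size, threshold=0.2):
    """Return the warning code when the difference exceeds the assumed threshold."""
    if relative_size_difference(clean_size, noisy_size) > threshold:
        return ["clean_noisy_size_large_difference"]
    return []


# A noisy render padded with canvas margins is noticeably larger than the clean one.
print(size_issue_codes((600, 1600), (900, 2000)))
```

Under this assumed rule, a noisy image rendered on a larger canvas is flagged even when the receipt content itself is unchanged, which matches the explanation given for the warning.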
Files
- Target run directory: release_v0.2.0
- Release candidate summary: release_v0.2.0\release_candidate\v020_release_candidate_summary.json
- Release candidate checklist: release_v0.2.0\release_candidate\v020_release_candidate_checklist.md
- Release candidate file inventory: release_v0.2.0\release_candidate\v020_release_candidate_files.csv
- Full review HTML: release_v0.2.0\review_audit\v020_review_gallery_step147.html
- Shortlist review HTML: release_v0.2.0\review_audit\v020_review_shortlist_step147.html
Notes
- This release candidate uses synthetic fictional receipt data.
- Store names, branch names, addresses, product names, and transaction contents are artificial test data.
- The dataset is intended for OCR/VLM evaluation and workflow testing, not for representing real transactions.
Japan OCR Mini Benchmark v0.1.1
Update release with InternVL3.5-14B comparison results.
Changes from v0.1.0:
- Added InternVL3.5-14B Q8_0 model output for receipt_005_noisy.png
- Added compare_custom_model_output.py for evaluating arbitrary model outputs
- Updated experiment_log.md with InternVL comparison results
- Updated failure_cases.md with InternVL failure cases
- Updated README with Qwen vs InternVL comparison
- Documented that Qwen3.6 35B A3B results were generated using a Q4_K_M GGUF quantized model in LM Studio
Model comparison summary:
- Qwen3.6 35B A3B Q4_K_M GGUF: mostly correct, with small tax target amount errors and dakuten/handakuten item-name errors
- InternVL3.5-14B Q8_0 GGUF: more significant structured extraction errors, including missing items and incorrect tax/discount fields
Japan OCR Mini Benchmark v0.1.0
Initial public sample release of Japan OCR Mini Benchmark.
This release includes:
- 5 synthetic noisy Japanese receipt images
- Ground-truth JSON files
- Qwen3.6 35B A3B model output JSON files
- Python evaluation script
- Experiment log
- Failure case notes
- License and synthetic data notice
The benchmark focuses on Japanese receipt OCR/VLM extraction, including item names, amounts, tax target fields, discounts, point usage, payment amount, cash received, and change.
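A field-level comparison between a ground-truth record and a model output might look like the sketch below. It is not the repository's evaluation script: the field names (`items`, `payment_amount`, `cash_received`, `change`) and the sample values are invented for illustration, and exact-match scoring is an assumed metric.

```python
def compare_fields(ground_truth, prediction):
    """Map each ground-truth field to True if the prediction matches it exactly."""
    return {key: prediction.get(key) == expected for key, expected in ground_truth.items()}


def field_accuracy(results):
    """Fraction of fields that matched; 0.0 when there are no fields."""
    return sum(results.values()) / len(results) if results else 0.0


# Made-up example: the prediction drops a handakuten in an item name (プ read as フ).
ground_truth = {"items": ["プリン"], "payment_amount": 540, "cash_received": 1000, "change": 460}
prediction = {"items": ["フリン"], "payment_amount": 540, "cash_received": 1000, "change": 460}

results = compare_fields(ground_truth, prediction)
print(results["items"], field_accuracy(results))  # False 0.75
```

Exact matching is deliberately strict here: a single dropped dakuten or handakuten mark counts as an item-name error, which is the kind of failure the benchmark is meant to surface.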
Known failure case:
Qwen3.6 35B A3B made measurable errors on receipt_005_noisy.png, including small tax target amount errors and dakuten/handakuten item-name errors.
Note: Qwen3.6 35B A3B results were generated using a Q4_K_M GGUF quantized model in LM Studio.