14 Jun 00:07

K10124

0fb8f33

Japan OCR Mini Benchmark v0.2.0

This release adds the v0.2.0 synthetic Japanese receipt target run and publication payload.

Snapshot

Public snapshot ZIP: japan_ocr_mini_benchmark_public_v0.2.0_snapshot.zip
ZIP size: 8103466 bytes
ZIP sha256: ea74b0b9b591e5ee4e1b1401031e8d5f724ad7dfa0a704c9defd04b1b0339b9b
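A downloaded snapshot can be checked against the size and digest listed above. The following is a minimal sketch using only the Python standard library; the function names `sha256_of_file` and `verify_snapshot` are illustrative and are not part of the release.

```python
import hashlib
from pathlib import Path

# Values copied from the Snapshot section of these release notes.
EXPECTED_SIZE = 8103466
EXPECTED_SHA256 = "ea74b0b9b591e5ee4e1b1401031e8d5f724ad7dfa0a704c9defd04b1b0339b9b"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_snapshot(path, expected_size=EXPECTED_SIZE, expected_sha256=EXPECTED_SHA256):
    """Return True only if both the byte size and the SHA-256 digest match."""
    path = Path(path)
    if path.stat().st_size != expected_size:
        return False
    return sha256_of_file(path) == expected_sha256
```

Calling `verify_snapshot("japan_ocr_mini_benchmark_public_v0.2.0_snapshot.zip")` should return True for an intact download, assuming the file sits in the current directory.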

Release Notes

Japan OCR Mini Benchmark v0.2.0 Release Notes

Status

Release status: Release Candidate
RC status: release_candidate_ready
Target run ID: v020_target_20260613_221713
Manual visual review: ok_by_user_step148
Created at: 2026-06-13T22:42:40

Highlights

Added a v0.2.0 synthetic Japanese receipt target run with 20 generated records.
Added both clean and noisy rendered receipt images.
Introduced hybrid item generation using a validated LLM-approved item pool plus deterministic item master data.
Strengthened noisy image rendering with resolution loss, local print fading, stroke-level kasure, thermal banding, local blur patches, JPEG roundtrip compression, and safe shift-blend motion blur.
Added randomized fictional store locations nationwide, excluding nearby Osaka/Kita-ku-style local place names.
Parking receipts now hide the tax breakdown so that they look more like natural payment-machine receipts.
Added review gallery and shortlist review files for human visual inspection.
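To illustrate one of the degradations named in the highlights, here is a rough sketch of what a shift-blend motion blur could look like: each pixel is blended with a horizontally shifted copy of its row, and the shifted index is clamped at the edge instead of wrapping around. This is an assumption about the general technique, not the release's renderer; the function name, the `shift` and `weight` parameters, and the reading of "safe" as edge clamping are all hypothetical.

```python
def shift_blend_blur(pixels, shift=2, weight=0.5):
    """Blend a grayscale image (list of rows of ints) with a shifted copy of itself.

    Hypothetical sketch: the shifted source index is clamped to the row
    start so that pixels never wrap around from the opposite edge.
    """
    blurred = []
    for row in pixels:
        new_row = []
        for x in range(len(row)):
            source_x = max(0, x - shift)  # clamp instead of wrapping
            value = (1 - weight) * row[x] + weight * row[source_x]
            new_row.append(int(round(value)))
        blurred.append(new_row)
    return blurred


# A single bright pixel is smeared one step to the right.
print(shift_blend_blur([[0, 0, 200, 0, 0]], shift=1, weight=0.5))
# [[0, 0, 100, 100, 0]]
```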

Generation Summary

Requested records: 20
Successful records: 20
Failed records: 0
Documents with LLM-approved items: 19
LLM-approved item count: 56
Item-master item count: 124
LLM item mix ratio: 0.3111
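The notes do not state how the mix ratio is computed. Assuming it is the share of LLM-approved items among all generated items, the reported counts reproduce the reported figure, as this small sketch shows (the function name is illustrative):

```python
def llm_item_mix_ratio(llm_item_count, master_item_count):
    """Share of LLM-approved items among all items (assumed definition)."""
    total = llm_item_count + master_item_count
    if total == 0:
        raise ValueError("no items generated")
    return llm_item_count / total


# Counts taken from the Generation Summary: 56 / (56 + 124) = 56 / 180.
print(round(llm_item_mix_ratio(56, 124), 4))  # 0.3111
```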

Validation Summary

Validation status: warning
Record count: 20
Status counts: {'ok': 8, 'warning': 12}
Issue code top counts: {'clean_noisy_size_large_difference': 12}
Noisy profile counts: {'light': 3, 'hard': 8, 'medium': 9}
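The summary figures are internally consistent: both the status counts and the noisy profile counts sum to the record count of 20. The sketch below checks that and derives an overall status as the most severe per-record status. That aggregation rule, the `error` level, and the function name are assumptions made for illustration, not a description of the project's validator.

```python
# Assumed severity ordering; "error" does not appear in this release's summary.
SEVERITY = {"ok": 0, "warning": 1, "error": 2}


def summarize_validation(record_count, status_counts, profile_counts):
    """Check that the counts add up and return the worst status present."""
    if sum(status_counts.values()) != record_count:
        raise ValueError("status counts do not sum to record count")
    if sum(profile_counts.values()) != record_count:
        raise ValueError("profile counts do not sum to record count")
    present = [status for status, count in status_counts.items() if count > 0]
    return max(present, key=lambda status: SEVERITY[status])


overall = summarize_validation(
    20,
    {"ok": 8, "warning": 12},
    {"light": 3, "hard": 8, "medium": 9},
)
print(overall)  # warning
```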

Known Validation Warning

clean_noisy_size_large_difference is expected for this release candidate.
The warning appears because noisy images include stronger degradation, rotation, canvas margins, shadows, and camera-like framing.
Human visual review was completed and accepted before freezing the release candidate.
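As a rough illustration of why this warning fires, the sketch below flags a clean/noisy pair when their pixel dimensions differ by more than a relative threshold. The notes do not say whether "size" means pixel dimensions or file size, nor what the threshold is, so the dimension-based comparison, the 0.2 threshold, and the function name are all hypothetical.

```python
ISSUE_CODE = "clean_noisy_size_large_difference"


def size_difference_issue(clean_size, noisy_size, max_relative_diff=0.2):
    """Return the issue code if width or height differs too much, else None.

    Sizes are (width, height) tuples in pixels. The threshold is an
    assumed value, not the one used by the release's validator.
    """
    clean_w, clean_h = clean_size
    noisy_w, noisy_h = noisy_size
    relative_w = abs(noisy_w - clean_w) / clean_w
    relative_h = abs(noisy_h - clean_h) / clean_h
    if max(relative_w, relative_h) > max_relative_diff:
        return ISSUE_CODE
    return None


# Added canvas margins and camera-like framing enlarge the noisy image.
print(size_difference_issue((600, 1400), (900, 1800)))
# clean_noisy_size_large_difference
```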

Files

Target run directory: release_v0.2.0
Release candidate summary: release_v0.2.0\release_candidate\v020_release_candidate_summary.json
Release candidate checklist: release_v0.2.0\release_candidate\v020_release_candidate_checklist.md
Release candidate file inventory: release_v0.2.0\release_candidate\v020_release_candidate_files.csv
Full review HTML: release_v0.2.0\review_audit\v020_review_gallery_step147.html
Shortlist review HTML: release_v0.2.0\review_audit\v020_review_shortlist_step147.html

Notes

This release candidate uses synthetic fictional receipt data.
Store names, branch names, addresses, product names, and transaction contents are artificial test data.
The dataset is intended for OCR/VLM evaluation and workflow testing, not for representing real transactions.


07 Jun 13:29

K10124

v0.1.1

0d1f709

Japan OCR Mini Benchmark v0.1.1

Update release with InternVL3.5-14B comparison results.

Changes from v0.1.0:

Added InternVL3.5-14B Q8_0 model output for receipt_005_noisy.png
Added compare_custom_model_output.py for evaluating arbitrary model outputs
Updated experiment_log.md with InternVL comparison results
Updated failure_cases.md with InternVL failure cases
Updated README with Qwen vs InternVL comparison
Documented that Qwen3.6 35B A3B results were generated using a Q4_K_M GGUF quantized model in LM Studio

Model comparison summary:

Qwen3.6 35B A3B Q4_K_M GGUF: mostly correct, with small tax target amount errors and dakuten/handakuten item-name errors
InternVL3.5-14B Q8_0 GGUF: more significant structured extraction errors, including missing items and incorrect tax/discount fields
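Dakuten/handakuten errors (for example reading ポ as ボ) can be separated from other item-name errors with Unicode normalization: under NFD, voiced kana decompose into a base kana plus a combining mark (U+3099 or U+309A). The sketch below uses only the standard `unicodedata` module; the helper names are illustrative and are not taken from the repository's scripts.

```python
import unicodedata

# Combining dakuten (U+3099) and combining handakuten (U+309A).
VOICING_MARKS = {"\u3099", "\u309A"}


def strip_voicing_marks(text):
    """Remove dakuten/handakuten by decomposing, filtering, and recomposing."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if ch not in VOICING_MARKS)
    return unicodedata.normalize("NFC", stripped)


def is_voicing_mark_only_error(expected, predicted):
    """True if the strings differ only in dakuten/handakuten marks."""
    expected = unicodedata.normalize("NFC", expected)
    predicted = unicodedata.normalize("NFC", predicted)
    if expected == predicted:
        return False
    return strip_voicing_marks(expected) == strip_voicing_marks(predicted)


print(is_voicing_mark_only_error("ポテトチップス", "ボテトチップス"))  # True
print(is_voicing_mark_only_error("ポテト", "トマト"))  # False
```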


06 Jun 14:44

K10124

v0.1.0

a69440b

Japan OCR Mini Benchmark v0.1.0

Initial public sample release of Japan OCR Mini Benchmark.

This release includes:

5 synthetic noisy Japanese receipt images
Ground-truth JSON files
Qwen3.6 35B A3B model output JSON files
Python evaluation script
Experiment log
Failure case notes
License and synthetic data notice

The benchmark focuses on Japanese receipt OCR/VLM extraction, including item names, amounts, tax target fields, discounts, point usage, payment amount, cash received, and change.
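A field-level comparison between a ground-truth record and a model output could look like the sketch below. It is not the repository's evaluation script; the JSON key names (`payment_amount`, `cash_received`, `change`) and the function names are hypothetical stand-ins for the fields listed above.

```python
# Hypothetical key names; the real ground-truth schema may differ.
FIELDS = ["payment_amount", "cash_received", "change"]


def compare_fields(ground_truth, prediction, fields=FIELDS):
    """Return a dict mapping each field to True if the prediction matches exactly."""
    return {name: ground_truth.get(name) == prediction.get(name) for name in fields}


def field_accuracy(results):
    """Fraction of compared fields that matched."""
    return sum(results.values()) / len(results)


truth = {"payment_amount": 1280, "cash_received": 2000, "change": 720}
output = {"payment_amount": 1280, "cash_received": 2000, "change": 702}

results = compare_fields(truth, output)
print(results)
# {'payment_amount': True, 'cash_received': True, 'change': False}
```

Exact matching is the simplest choice for numeric fields; item names may call for a more tolerant comparison.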

Known failure case:
Qwen3.6 35B A3B made measurable errors on receipt_005_noisy.png, including small tax target amount errors and dakuten/handakuten item-name errors.

Note: Qwen3.6 35B A3B results were generated using a Q4_K_M GGUF quantized model in LM Studio.

Releases: K10124/japan-ocr-mini-benchmark-public