Japan OCR Mini Benchmark v0.1.1

K10124 released this 07 Jun 13:29


0d1f709

Update release with InternVL3.5-14B comparison results.

Changes from v0.1.0:

Added InternVL3.5-14B Q8_0 model output for receipt_005_noisy.png
Added compare_custom_model_output.py for evaluating arbitrary model outputs
Updated experiment_log.md with InternVL comparison results
Updated failure_cases.md with InternVL failure cases
Updated README with Qwen vs InternVL comparison
Documented that Qwen3.6 35B A3B results were generated using a Q4_K_M GGUF quantized model in LM Studio
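The release notes don't describe how compare_custom_model_output.py scores an output. As a rough illustration of field-level comparison of a model's structured receipt extraction against ground truth, here is a minimal sketch. It assumes both sides are dicts with scalar fields (totals, tax, discount) plus an `items` list of dicts with a `name` key. The schema and the `compare_receipt` function are hypothetical and are not taken from the repository.

```python
def compare_receipt(expected: dict, predicted: dict) -> dict:
    """Field-level comparison of two receipt extractions.

    Hypothetical schema: scalar top-level fields plus an "items" list
    whose entries are dicts with a "name" key.
    """
    report = {"field_mismatches": {}, "missing_items": [], "extra_items": []}

    # Compare scalar fields (e.g. tax or discount amounts) key by key.
    for key, exp_value in expected.items():
        if key == "items":
            continue
        pred_value = predicted.get(key)  # None if the model omitted the field
        if pred_value != exp_value:
            report["field_mismatches"][key] = {
                "expected": exp_value,
                "predicted": pred_value,
            }

    # Compare line items by exact name; near-misses count as missing + extra.
    exp_names = [item["name"] for item in expected.get("items", [])]
    pred_names = [item["name"] for item in predicted.get("items", [])]
    report["missing_items"] = [n for n in exp_names if n not in pred_names]
    report["extra_items"] = [n for n in pred_names if n not in exp_names]
    return report
```

Matching items by exact name keeps the sketch simple. However, it means a single misread character turns one item into both a "missing" and an "extra" entry, so a real scorer would probably want fuzzier item alignment.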

Model comparison summary:

Qwen3.6 35B A3B Q4_K_M GGUF: mostly correct, with small errors in the tax target (taxable) amounts and dakuten/handakuten (゛/゜) errors in item names
InternVL3.5-14B Q8_0 GGUF: more serious structured extraction errors, including missing line items and incorrect tax and discount fields
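A dakuten/handakuten error is a kana read with or without the voiced mark (゛) or semi-voiced mark (゜), for example ギ read as キ or パ read as ハ. One way to flag such errors automatically is to decompose both strings with Unicode NFD, which splits a precomposed kana into its base character plus a combining mark (U+3099 or U+309A), and then compare the strings with those marks removed. This is a hypothetical helper, not part of the benchmark, and the release notes don't say how these errors were classified. It also does not handle halfwidth katakana marks (ﾞ/ﾟ), which NFD leaves unchanged.

```python
import unicodedata

# Combining voiced (dakuten) and semi-voiced (handakuten) sound marks.
DAKUTEN_MARKS = {"\u3099", "\u309a"}


def strip_dakuten(text: str) -> str:
    """Remove dakuten/handakuten from fullwidth kana.

    NFD splits precomposed kana such as "ガ" into "カ" + U+3099,
    so the mark can be dropped as a separate code point.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if ch not in DAKUTEN_MARKS)


def is_dakuten_only_error(expected: str, predicted: str) -> bool:
    """True when the strings differ, but only in dakuten/handakuten marks."""
    return expected != predicted and strip_dakuten(expected) == strip_dakuten(predicted)
```

For example, `is_dakuten_only_error("パン", "ハン")` is true, while `is_dakuten_only_error("パン", "ペン")` is false because the base kana itself differs.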
