Japan OCR Mini Benchmark v0.1.1
Update release adding InternVL3.5-14B comparison results.
Changes from v0.1.0:
- Added InternVL3.5-14B Q8_0 model output for receipt_005_noisy.png
- Added compare_custom_model_output.py for evaluating arbitrary model outputs
- Updated experiment_log.md with InternVL comparison results
- Updated failure_cases.md with InternVL failure cases
- Updated README with Qwen vs InternVL comparison
- Documented that Qwen3.6 35B A3B results were generated using a Q4_K_M GGUF quantized model in LM Studio
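The internals of compare_custom_model_output.py are not described in these notes. As a rough illustration only, and not a copy of that script, a field-level comparison of a model's structured output against a ground-truth record could look like the minimal sketch below. It assumes both records are flat dictionaries. The function name `compare_fields` and the field names used in the example are hypothetical.

```python
from typing import Any


def compare_fields(predicted: dict[str, Any], expected: dict[str, Any]) -> dict[str, str]:
    """Classify each ground-truth field as 'match', 'mismatch', or 'missing'.

    Hypothetical helper for illustration; not the API of
    compare_custom_model_output.py.
    """
    result: dict[str, str] = {}
    for key, value in expected.items():
        if key not in predicted:
            # The model did not emit this field at all.
            result[key] = "missing"
        elif predicted[key] == value:
            result[key] = "match"
        else:
            # The field is present but its value differs from ground truth.
            result[key] = "mismatch"
    return result


if __name__ == "__main__":
    # Hypothetical receipt fields, for illustration only.
    expected = {"total": 1000, "tax": 91, "discount": 50}
    predicted = {"total": 1000, "tax": 90}
    print(compare_fields(predicted, expected))
```

Keeping "missing" separate from "mismatch" follows the distinction the comparison summary draws between missing items and incorrect tax/discount fields.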
Model comparison summary:
- Qwen3.6 35B A3B Q4_K_M GGUF: mostly correct, with minor errors in the tax target (taxable) amount and dakuten/handakuten errors in item names
- InternVL3.5-14B Q8_0 GGUF: more significant structured extraction errors, including missing items and incorrect tax/discount fields
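Dakuten/handakuten item-name errors are cases where a model reads, for example, ガ as カ or パ as ハ. One way to flag them automatically is sketched below as an assumption, not as the benchmark's actual scoring. The sketch decomposes both strings with Unicode NFKD, which splits a voiced or semi-voiced kana into its base kana plus a combining mark (U+3099 or U+309A), and then checks whether the strings match once those marks are removed. The helper names `strip_sound_marks` and `is_sound_mark_error` are hypothetical.

```python
import unicodedata

# Combining voiced (dakuten) and semi-voiced (handakuten) sound marks.
DAKUTEN = "\u3099"
HANDAKUTEN = "\u309a"


def strip_sound_marks(text: str) -> str:
    """Remove dakuten/handakuten after NFKD decomposition (ガ -> カ + U+3099)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if ch not in (DAKUTEN, HANDAKUTEN))


def is_sound_mark_error(predicted: str, expected: str) -> bool:
    """Return True if the strings differ only in dakuten/handakuten marks."""
    if unicodedata.normalize("NFKC", predicted) == unicodedata.normalize("NFKC", expected):
        # Identical after normalization: not an error.
        return False
    return strip_sound_marks(predicted) == strip_sound_marks(expected)


if __name__ == "__main__":
    print(is_sound_mark_error("カム", "ガム"))  # dakuten dropped
    print(is_sound_mark_error("ハン", "パン"))  # handakuten dropped
    print(is_sound_mark_error("カム", "ケム"))  # a different kana, not a sound-mark error
```

Classifying mismatches this way separates sound-mark misreads from other character substitutions, which is useful when item-name errors are otherwise reported only as plain string mismatches.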