OCR: GLM-OCR model produces garbage output via MLXVLM #66

@kiki830621

Description

Problem

The macdoc ocr pipeline runs end-to-end without crashing, but the OCR output consists of meaningless token repetitions instead of recognized text.

Type

bug

Expected

Running macdoc ocr /tmp/ocr-test.png on an image containing "Hello OCR Test 你好世界" should produce readable Markdown text.

Actual

Output is repeated garbage tokens:

刷刷刷刷刷刷一闪ount刷刷刷刷刷一闪ount刷刷刷刷刷刷...

Or with longer max-tokens:

encyencyencyencyency...指指指指指...Span Span Span...

Context

  • Model: EZCon/GLM-OCR-8bit-mlx (8-bit quantized GLM-OCR for MLX)
  • Pipeline: OCRPipeline → VLMModelFactory.shared.loadContainer() → MLXLMCommon.generate()
  • The MLXVLM library ships a GlmOcr.swift implementation, so the model architecture itself should be supported
  • Possible causes:
    1. The quantized model (EZCon/GLM-OCR-8bit-mlx) itself may be broken
    2. Image preprocessing (CIImage conversion) may not match what the model expects
    3. The chat template / prompt format may be wrong for this model
  • To narrow it down, test with a known-working model (e.g., mlx-community/Qwen2.5-VL-3B-Instruct-4bit)

Impact

macdoc ocr is unusable until this is resolved — the entire OCR feature depends on correct VLM output.

Next Steps

  • Test with mlx-community/Qwen2.5-VL-3B-Instruct-4bit to verify the pipeline works with a different model
  • If Qwen works, the issue is GLM-OCR specific (model or config)
  • If Qwen also fails, the issue is in OCRPipeline's image/prompt handling
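A minimal sketch of that model-swap check, assuming the MLXLMCommon/MLXVLM APIs as published in mlx-swift-examples (VLMModelFactory.shared.loadContainer, ModelContainer.perform, UserInput, MLXLMCommon.generate); the prompt string and generation parameters here are placeholders, not OCRPipeline's actual values:

```swift
import Foundation
import MLXLMCommon
import MLXVLM

// Run one OCR prompt against an arbitrary model ID, so the same code
// path can be tried with GLM-OCR and with Qwen2.5-VL.
// NOTE: a sketch under assumed APIs, not OCRPipeline's real implementation.
func runOCR(modelID: String, imageURL: URL) async throws -> String {
    // Load the candidate model instead of EZCon/GLM-OCR-8bit-mlx.
    let container = try await VLMModelFactory.shared.loadContainer(
        configuration: ModelConfiguration(id: modelID))

    return try await container.perform { context in
        // Let the model's own processor apply its chat template and image
        // preprocessing, rather than hand-building the prompt — this
        // sidesteps causes 2 and 3 if OCRPipeline does either by hand.
        let input = try await context.processor.prepare(
            input: UserInput(prompt: "Transcribe all text in this image.",
                             images: [.url(imageURL)]))
        let result = try MLXLMCommon.generate(
            input: input,
            parameters: GenerateParameters(temperature: 0.0),
            context: context) { _ in .more }
        return result.output
    }
}
```

If `runOCR(modelID: "mlx-community/Qwen2.5-VL-3B-Instruct-4bit", imageURL: ...)` returns readable text while the GLM-OCR ID still emits repeated tokens, the fault is model/config specific; if both fail the same way, suspect the shared image/prompt handling.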
