Labels
bug (Something isn't working)
Problem
The macdoc ocr pipeline runs end-to-end without crashing, but the OCR output is meaningless token repetition instead of recognized text.
Type
bug
Expected
Running macdoc ocr /tmp/ocr-test.png on an image containing "Hello OCR Test 你好世界" should produce readable Markdown text.
Actual
Output is repeated garbage tokens:
刷刷刷刷刷刷一闪ount刷刷刷刷刷一闪ount刷刷刷刷刷刷...
Or with longer max-tokens:
encyencyencyencyency...指指指指指...Span Span Span...
Context
- Model: EZCon/GLM-OCR-8bit-mlx (8-bit quantized GLM-OCR for MLX)
- Pipeline: OCRPipeline → VLMModelFactory.shared.loadContainer() → MLXLMCommon.generate()
- The MLXVLM library has a GlmOcr.swift implementation, so the model architecture should be correct
- Possible causes:
  - The quantized model (EZCon/GLM-OCR-8bit-mlx) itself may be broken
  - Image preprocessing (CIImage conversion) may not match what the model expects
  - Chat template / prompt format may be wrong for this model
- Need to test with a known-working model (e.g., mlx-community/Qwen2.5-VL-3B-Instruct-4bit)
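One way to check the first possible cause (a broken quantized checkpoint) without touching the Swift pipeline at all would be to run the same model through the reference Python mlx-vlm package. This is only a sketch; flag names are from recent mlx-vlm versions and may differ, and the prompt string is a placeholder:

```shell
# Run the suspect checkpoint through the reference Python implementation.
# If this also emits repeated garbage tokens, the quantized model itself is
# likely broken; if it reads the image correctly, OCRPipeline is the suspect.
pip install mlx-vlm
python -m mlx_vlm.generate \
  --model EZCon/GLM-OCR-8bit-mlx \
  --image /tmp/ocr-test.png \
  --prompt "Transcribe the text in this image." \
  --max-tokens 256
```

If this reproduces the garbage output, the bug can be reported against the model repository rather than macdoc.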
Impact
macdoc ocr is unusable until this is resolved — the entire OCR feature depends on correct VLM output.
Next Steps
- Test with mlx-community/Qwen2.5-VL-3B-Instruct-4bit to verify the pipeline works with a different model
- If Qwen works, the issue is GLM-OCR specific (model or config)
- If Qwen also fails, the issue is in OCRPipeline's image/prompt handling
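The model-swap test above could look roughly like the following, using the same mlx-swift-examples entry points the pipeline already calls. This is a sketch, not the actual OCRPipeline code: the ModelConfiguration/UserInput/GenerateParameters names are taken from that library and may differ by version, and the prompt string is a placeholder:

```swift
import Foundation
import MLXLMCommon
import MLXVLM

// Sketch: run the same test image through a known-working VLM to isolate the bug.
// If Qwen2.5-VL produces readable text, the GLM-OCR model/config is suspect;
// if it also emits repeated tokens, OCRPipeline's image/prompt handling is.
func runIsolationTest() async throws {
    let config = ModelConfiguration(id: "mlx-community/Qwen2.5-VL-3B-Instruct-4bit")
    let container = try await VLMModelFactory.shared.loadContainer(configuration: config)

    let output = try await container.perform { context in
        // Same image the failing pipeline used.
        let input = try await context.processor.prepare(
            input: UserInput(
                prompt: "Transcribe the text in this image.",  // placeholder prompt
                images: [.url(URL(fileURLWithPath: "/tmp/ocr-test.png"))]))
        let result = try MLXLMCommon.generate(
            input: input,
            parameters: GenerateParameters(),
            context: context
        ) { _ in .more }
        return result.output
    }
    print(output)  // expect readable text, not repeated tokens
}
```

Running this and the GLM-OCR configuration side by side on /tmp/ocr-test.png should cleanly separate a model problem from a pipeline problem.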