MonDevHub/monocr

MonOCR

Linguistic Preservation Objectives

MonOCR is an open-source technical framework dedicated to the digital preservation of the Mon language (mnw). Mon, which UNESCO classifies as a vulnerable language, lacks standardized support in global OCR toolchains.

This project establishes a zero-leak privacy foundation for character recognition, enabling offline digitization of historical and community-sourced manuscripts.

Research Trajectory & Dataset Growth

The current inference engine (~6.6M parameters) is a V1 implementation optimized for low-latency edge execution. Because high-quality Mon-Burmese datasets have historically been scarce, the platform also serves as a data-collection channel: the integrated Feedback Service supports collecting and auditing community-sourced manuscripts, which will directly inform the training of future, higher-capacity recognition models.
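
As a rough sanity check on the figures quoted in this README, the ~6.6M parameter count can be related to the ~25MB (FP32) asset footprint listed under System Specifications. The sketch below assumes 4 bytes per FP32 weight and ignores graph metadata and any non-weight data in the model file; the function name is hypothetical and is not part of the MonOCR codebase.

```python
def fp32_footprint_bytes(param_count: int, bytes_per_param: int = 4) -> int:
    """Estimate raw weight storage, assuming every parameter is one FP32 value."""
    return param_count * bytes_per_param


# Assumption: the "~6.6M" figure is taken as exactly 6,600,000 parameters.
params = 6_600_000
size_bytes = fp32_footprint_bytes(params)

print(f"{size_bytes / 1e6:.1f} MB")     # decimal megabytes
print(f"{size_bytes / 2**20:.1f} MiB")  # binary mebibytes
```

Under these assumptions the estimate comes to roughly 26.4 MB (about 25.2 MiB), which is in line with the ~25MB footprint stated for the FP32 model.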


Platform Architecture

MonOCR uses the same model architecture on every target. The underlying model is unified, but it is delivered in a platform-specific serialization format so that each target can use its available hardware acceleration:

  • Web/Android: Standardized via universal ONNX weights.
  • iOS: Optimized for Apple Neural Engine via CoreML (.mlpackage).

Implementation Cross-Reference

| Concern | Principal Implementation | Architectural Rationale |
| --- | --- | --- |
| Model (Web/Android) | apps/android/.../monocr.onnx | Deterministic cross-platform benchmarks |
| Model (iOS) | apps/ios/.../monocr.mlpackage | ANE-optimized hardware utilization |
| Asset Sync | shared/locales/sync.mjs | Multi-target linguistic idempotency |
| Ingestion Auth | internal/auth/middleware.go | Perimeter security for asset ingestion |
| Native Execution | engine/MonOcrEngine.swift | Hardware-bound inference logic |

System Specifications

| Attribute | Specification | Rationale |
| --- | --- | --- |
| Model Architecture | MobileNetV3 + BiLSTM + CTC | Optimal accuracy-to-latency ratio for edge inference |
| Parameter Count | ~6.6M | Balanced for browser-bound execution limits |
| Asset Footprint | ~25MB (FP32) | Optimized for delivery via edge CDNs |
| Inference Precision | FP32 / ANE-Optimized | Maximizing character fidelity in low-resource contexts |
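
The CTC stage named in the Model Architecture row turns per-timestep class scores into text. The sketch below illustrates greedy (best-path) CTC decoding in general; it is not taken from the MonOCR source. The function name, the choice of index 0 as the blank label, and the toy Latin alphabet are all assumptions made for illustration, and the real engine may use a different blank index, character set, or decoding strategy.

```python
from typing import List, Sequence


def ctc_greedy_decode(
    scores: Sequence[Sequence[float]],
    alphabet: Sequence[str],
    blank: int = 0,
) -> str:
    """Greedy CTC decoding: argmax per frame, merge repeats, drop blanks.

    scores[t][k] is the score of class k at timestep t.
    alphabet[k] is the character for class k; alphabet[blank] is never emitted.
    """
    decoded: List[str] = []
    previous = blank
    for frame in scores:
        # Pick the highest-scoring class for this timestep.
        best = max(range(len(frame)), key=frame.__getitem__)
        # Emit only non-blank labels that differ from the previous frame's label.
        if best != blank and best != previous:
            decoded.append(alphabet[best])
        previous = best
    return "".join(decoded)


def one_hot(index: int, size: int = 3) -> List[float]:
    """Helper for the toy example: a frame whose argmax is `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]


# Toy alphabet (assumption): class 0 is the blank, classes 1 and 2 are characters.
toy_alphabet = ["<blank>", "a", "b"]
toy_frames = [one_hot(k) for k in (1, 1, 0, 1, 2, 2, 0)]
print(ctc_greedy_decode(toy_frames, toy_alphabet))  # aab
```

In the toy run, the repeated label 1 collapses to a single "a", and the blank between the two runs of label 1 is what allows a second "a" to be emitted. A production decoder would map class indices to Mon characters instead of the placeholder alphabet used here.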

Documentation Hub

All technical documentation, architectural decisions, and setup guides are centralized in the Documentation Hub.


Community and Support

  • Janakh Pon
  • Oung Seik Nyan
  • MonDevHub

Note

The Mon language is classified as a "vulnerable" language in UNESCO's Atlas of the World’s Languages in Danger.

About

The MonOCR Platform: Academic-grade OCR for the Mon language. High-performance, privacy-first ecosystem across Web (SvelteKit), iOS (SwiftUI), and Android (Kotlin).
