MonOCR

Linguistic Preservation Objectives

MonOCR is an open-source technical framework dedicated to the digital preservation of the Mon language (mnw). Mon is classified by UNESCO as a vulnerable language, and its script lacks standardized support in global OCR toolchains.

This project provides a privacy-first foundation for character recognition: it is designed to run offline, enabling digitization of historical and community-sourced manuscripts without sending them to a remote service for recognition.

Research Trajectory & Dataset Growth

The current inference engine (~6.6M parameters) is a V1 implementation optimized for low-latency edge execution. Because high-quality Mon-Burmese datasets have historically been scarce, the platform also serves as a data acquisition channel: the integrated Feedback Service collects and audits community-sourced manuscripts, which will directly inform the training of future, higher-capacity recognition models.

Live Access

Web: ocr.mondevhub.com
Android: Google Play Store
iOS: Apple App Store (Review Pending)

Platform Architecture

MonOCR uses the same model architecture on every target. The underlying mathematical model is unified, but it is serialized in a platform-specific format so that each target can use its own hardware acceleration:

Web/Android: Standardized via universal ONNX weights.
iOS: Optimized for Apple Neural Engine via CoreML (.mlpackage).
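
As an illustration of the split above, the mapping from target to model artifact could be sketched as follows. This is a hypothetical lookup, not code from the repository: the `Target` enum and the `artifact_for` function are assumptions made for the example, and only the file names `monocr.onnx` and `monocr.mlpackage` come from this README.

```python
from enum import Enum


class Target(Enum):
    """Delivery targets named in this README (enum itself is hypothetical)."""
    WEB = "web"
    ANDROID = "android"
    IOS = "ios"


# One model, two serializations. Only base file names are used here because
# the README elides the containing directories.
MODEL_ARTIFACT = {
    Target.WEB: "monocr.onnx",
    Target.ANDROID: "monocr.onnx",
    Target.IOS: "monocr.mlpackage",
}


def artifact_for(target: Target) -> str:
    """Return the serialized model file name for a given target."""
    return MODEL_ARTIFACT[target]


if __name__ == "__main__":
    for target in Target:
        print(target.value, "->", artifact_for(target))
```

Keeping the mapping in one table makes it obvious that Web and Android share the same ONNX weights, while iOS is the only target with a separate package.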

Implementation Cross-Reference

Concern	Principal Implementation	Architectural Rationale
Model (Web/Android)	`apps/android/.../monocr.onnx`	Deterministic cross-platform benchmarks
Model (iOS)	`apps/ios/.../monocr.mlpackage`	ANE-optimized hardware utilization
Asset Sync	`shared/locales/sync.mjs`	Multi-target linguistic idempotency
Ingestion Auth	`internal/auth/middleware.go`	Perimeter security for asset ingestion
Native Execution	`engine/MonOcrEngine.swift`	Hardware-bound inference logic

System Specifications

Attribute	Specification	Rationale
Model Architecture	MobileNetV3 + BiLSTM + CTC	Optimal accuracy-to-latency ratio for edge inference
Parameter Count	~6.6M	Balanced for browser-bound execution limits
Asset Footprint	~25 MB (FP32)	Optimized for delivery via edge CDNs
Inference Precision	FP32 / ANE-Optimized	Maximizing character fidelity in low-resource contexts
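
Two rows of the table can be made concrete with a short Python sketch. The first function shows how ~6.6M FP32 parameters arrive at roughly 25 MB, assuming 4 bytes per weight and ignoring file-format overhead. The second shows the standard greedy decoding rule for a CTC output layer: collapse repeated labels, then drop blanks. Both function names, the blank index of 0, and the sample label IDs are assumptions for illustration; this is not MonOCR's actual decoder.

```python
# Illustrative sketch only; these names are hypothetical, not MonOCR's API.

BYTES_PER_FP32 = 4  # one FP32 weight occupies 4 bytes


def fp32_footprint_mib(param_count: int) -> float:
    """Approximate weight size in MiB, ignoring file-format overhead."""
    return param_count * BYTES_PER_FP32 / (1024 ** 2)


def ctc_greedy_decode(frame_ids, blank=0):
    """Collapse consecutive repeats, then drop blanks (greedy CTC rule).

    frame_ids: the most likely label index at each time step.
    blank: index of the CTC blank label (assumed to be 0 here).
    """
    decoded = []
    previous = None
    for label in frame_ids:
        # Emit a label only when it differs from the previous frame
        # and is not the blank symbol.
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded


if __name__ == "__main__":
    print(round(fp32_footprint_mib(6_600_000), 1))
    print(ctc_greedy_decode([0, 3, 3, 0, 3, 5, 5, 0]))
```

The blank between the two runs of label 3 is what lets CTC emit the same character twice in a row; without it, the repeats would collapse into one.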

Web App: Browser-bound SvelteKit PWA.
Android App: Native Jetpack Compose (NNAPI).
iOS App: Native SwiftUI (Apple Neural Engine).
Feedback Service: Mobile-focused ingestion API (Go).
Core Assets: Shared assets and synchronization logic.

Documentation Hub

All technical documentation, architectural decisions, and setup guides are centralized in the Documentation Hub.

Architecture (ADRs): Logical decision records.
API Specifications: OpenAPI contracts.
Hugging Face Models (ONNX, CoreML, CKPT): Core inference assets.
NPM Package: Portable SDK.

Community and Support

Janakh Pon • Oung Seik Nyan • MonDevHub

Feedback: Report technical bugs via GitHub Issues.
Linguistic Assets: Audit our shared translation sheet.
Dataset Acquisition: Contribute script samples via our Android or iOS applications, or reach out directly.
Technical Standards: Review our Contributing Guide and Security Policy.

Note

The Mon language is classified as a "vulnerable" language in UNESCO's Atlas of the World’s Languages in Danger.

