nan.ai OCR open data initiative makes handwritten data publicly available for reuse in training OCR models.

Saphron-Asia/nan.ai-opendata-ocr

Nan.ai OCR Open Data

The Nan.ai OCR open data initiative makes handwritten data publicly available for reuse in training OCR models. The dataset is derived from forms submitted via Nan.ai and processed by our OCR ML service, which extracts information from photos of forms captured with readily available mobile devices. The data is extracted, anonymized, and annotated, and can be used to train OCR models for your own use case.

You can participate by (1) contributing handwritten data or (2) annotating existing datasets. We also welcome image processing experts who can help improve this repository's usability for various use cases.

To explore our datasets, you can use the existing image processing notebooks available here or import data by following the instructions here.

Description of the data

The handwritten image data is isolated from form images, extracted, and anonymized to form a generic dataset similar to the MNIST handwritten digit dataset. This repository is continually updated to reflect new inputs, such as annotations for improvement from the OCR ML service. Users are free to choose the ratio used to split the data into training, validation, and test sets.
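Since the split ratio is left to the user, a minimal sketch of a reproducible split may be helpful. It assumes the samples have already been loaded into a Python list (for example, a list of image file paths). The function name `split_dataset`, the default 80/10/10 ratio, and the fixed seed are illustrative assumptions, not part of this repository.

```python
import random


def split_dataset(items, ratios=(0.8, 0.1, 0.1), seed=0):
    """Shuffle items and split them into train, validation, and test lists.

    Hypothetical helper: the ratios and seed are illustrative defaults,
    not values prescribed by the dataset.
    """
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")

    shuffled = list(items)
    # A seeded generator keeps the split reproducible across runs.
    random.Random(seed).shuffle(shuffled)

    n_train = int(len(shuffled) * ratios[0])
    n_val = int(len(shuffled) * ratios[1])

    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    # The test set takes whatever remains, so no sample is dropped.
    test = shuffled[n_train + n_val:]
    return train, val, test
```

Calling `split_dataset(samples, ratios=(0.7, 0.15, 0.15))` would give a different ratio. Keeping the seed fixed makes experiments comparable across runs.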

We are also creating datasets derived from and annotated against the corpus data, such as use-case-specific dictionaries (e.g., possible handwritten values mapped to their counterparts in the standard naming of places in the Philippines).
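As a rough illustration of how such a dictionary could be used, the sketch below maps a noisy OCR reading to the closest standard place name with Python's standard `difflib` module. The list `STANDARD_PLACES`, the function `normalize_place`, and the similarity cutoff are hypothetical and are not taken from this repository's dictionaries.

```python
import difflib

# Hypothetical sample of standard place names; the repository's actual
# dictionaries may be structured differently.
STANDARD_PLACES = ["Quezon City", "Cebu City", "Davao City"]


def normalize_place(raw, choices=STANDARD_PLACES, cutoff=0.6):
    """Return the standard name closest to a raw OCR string, or None."""
    # Compare case-insensitively, but return the canonical spelling.
    lowered = {name.lower(): name for name in choices}
    matches = difflib.get_close_matches(
        raw.strip().lower(), list(lowered), n=1, cutoff=cutoff
    )
    return lowered[matches[0]] if matches else None
```

For example, `normalize_place("Quezon Cty")` resolves to `"Quezon City"`, while a string with no close match returns `None`. The cutoff is an assumption that would need tuning against real annotations.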

Alongside our open data initiative, we are also open sourcing a related machine learning service, Nan.ai OCR.

Navigate this project

Resources

License

Nan.ai-opendata-ocr is licensed under the Creative Commons Zero v1.0 Universal license.
