Nan.ai OCR Open Data

The Nan.ai OCR open data initiative makes handwritten data publicly available for reuse in training OCR models. The dataset is derived from forms submitted via Nan.ai and processed by our OCR ML service, which extracts information from photos of forms captured with readily available mobile devices. The data is extracted, anonymized, and annotated, and can be used to train OCR models for your own use case.

You can participate by (1) contributing handwritten data or (2) annotating existing datasets. We also welcome image processing experts who can help improve this repository's usability for various use cases.

To explore our datasets, you can use the existing image processing notebooks available here or import data by following the instructions here.

Description of the data

From form images, the image data is isolated, extracted, and anonymized to form a generic dataset similar to the MNIST handwritten dataset. This repository is updated continually to reflect new inputs, such as annotations for improvement from the OCR ML service. Users decide for themselves how to split the data into training, validation, and test sets.
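Because the split ratio is left to the user, a minimal sketch of one way to partition the samples may be useful. The `split_dataset` helper, the 80/10/10 default, and the `(image_path, label)` sample layout are assumptions made for illustration, not part of this repository.

```python
import random


def split_dataset(samples, ratios=(0.8, 0.1, 0.1), seed=42):
    """Shuffle samples and split them into train, validate, and test lists.

    `ratios` is (train, validate, test) and must sum to 1.
    """
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")

    shuffled = list(samples)               # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split

    n = len(shuffled)
    n_train = int(n * ratios[0])
    n_validate = int(n * ratios[1])

    train = shuffled[:n_train]
    validate = shuffled[n_train:n_train + n_validate]
    test = shuffled[n_train + n_validate:]  # remainder goes to the test set
    return train, validate, test


# Hypothetical (image_path, label) pairs standing in for real samples.
samples = [(f"img_{i:03d}.png", i % 10) for i in range(100)]
train, validate, test = split_dataset(samples)
print(len(train), len(validate), len(test))
```

Fixing the seed keeps the split reproducible across runs. Change `ratios` to suit your own use case.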

We are also creating datasets derived from and annotated against the corpus data, such as use-case-specific dictionaries (e.g. possible handwritten values and their counterparts in the standard naming of places in the Philippines).
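As a rough illustration of how such a dictionary might be used, the sketch below maps a transcribed handwritten value to a standard place name. The entries in `PLACE_VARIANTS` and the `normalize_place` helper are hypothetical examples and do not come from the actual datasets.

```python
# Hypothetical variant-to-standard-name entries, for illustration only.
PLACE_VARIANTS = {
    "quezon cty": "Quezon City",
    "qc": "Quezon City",
    "cebu cty": "Cebu City",
}


def normalize_place(raw):
    """Return the standard place name for a transcribed value, or None.

    The lookup ignores letter case and repeated or surrounding whitespace.
    """
    key = " ".join(raw.lower().split())
    return PLACE_VARIANTS.get(key)


print(normalize_place("  Quezon   Cty "))
print(normalize_place("Unknown Town"))
```

Returning `None` for unknown values leaves it to the caller to decide whether to keep the raw transcription or flag it for annotation.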

Alongside our open data initiative, we are also open sourcing a related machine learning service, Nan.ai OCR.

Navigate this project

Resources

License

Nan.ai-opendata-ocr is licensed under the Creative Commons Zero v1.0 Universal license.

Saphron-Asia/nan.ai-opendata-ocr