DHCD_Dataset

This repository contains the DHCD dataset, a dataset of Devnagari (Nepali) handwritten characters.

LPR

A License Plate Recognition (LPR) dataset is now also available at this link.

Description

The DHCD dataset contains 46 classes of Devnagari script: 36 character classes and 10 digit classes (क .. + १ .. ). Each class has 2,000 images, split into a training set of 1,700 images and a test set of 300 images, for 92,000 images in total. The dataset is therefore larger, in both number of samples and number of classes, than the well-known MNIST dataset, which was the initial inspiration for creating it.
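The split sizes follow directly from the per-class counts given above. The short sketch below works through that arithmetic; the function and constant names are illustrative only and are not part of this repository. The MNIST figures (60,000 training and 10,000 test images across 10 classes) are the commonly cited ones and are included only for comparison.

```python
# Per-class counts as stated in the description above.
NUM_CHARACTER_CLASSES = 36
NUM_DIGIT_CLASSES = 10
TRAIN_PER_CLASS = 1700
TEST_PER_CLASS = 300

# Commonly cited MNIST sizes, used only for comparison (not from this repo).
MNIST_CLASSES = 10
MNIST_TOTAL = 60_000 + 10_000


def dhcd_sizes():
    """Return class and sample counts derived from the per-class figures."""
    classes = NUM_CHARACTER_CLASSES + NUM_DIGIT_CLASSES
    train = classes * TRAIN_PER_CLASS
    test = classes * TEST_PER_CLASS
    return {"classes": classes, "train": train, "test": test, "total": train + test}


if __name__ == "__main__":
    sizes = dhcd_sizes()
    print(sizes)
    print("more classes than MNIST:", sizes["classes"] > MNIST_CLASSES)
    print("more samples than MNIST:", sizes["total"] > MNIST_TOTAL)
```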

This repo contains a dataloader for PyTorch, which can easily be ported to other libraries such as TensorFlow, Keras, or Caffe.
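To illustrate why such a loader ports easily, here is a minimal sketch of a map-style dataset written with the standard library only. It is not the code in this repository: the class name `FolderIndexDataset`, the `root/<split>/<class_name>/*.png` directory layout, and the `loader` parameter are all assumptions made for the example. The default loader returns raw file bytes; a real pipeline would pass a function that decodes each image into an array or tensor.

```python
from pathlib import Path


class FolderIndexDataset:
    """Hypothetical map-style dataset over an ASSUMED folder layout:
    root/<split>/<class_name>/*.png (the real repo layout may differ)."""

    def __init__(self, root, split="train", loader=None):
        split_dir = Path(root) / split
        # One sub-directory per class; sorted so label indices are stable.
        self.classes = sorted(p.name for p in split_dir.iterdir() if p.is_dir())
        self.class_to_idx = {name: i for i, name in enumerate(self.classes)}
        # Flat list of (image path, integer label) pairs.
        self.samples = [
            (path, self.class_to_idx[name])
            for name in self.classes
            for path in sorted((split_dir / name).glob("*.png"))
        ]
        # Default loader returns raw bytes; swap in an image decoder as needed.
        self.loader = loader if loader is not None else (lambda p: p.read_bytes())

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        path, label = self.samples[index]
        return self.loader(path), label
```

Because the class exposes only `__len__` and `__getitem__`, it follows the same map-style protocol that PyTorch's `torch.utils.data.DataLoader` expects, and the same index of `(path, label)` pairs could feed an input pipeline in another framework.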

Besides the general character classification task, the dataset can also be explored for other problems such as style transfer, disentanglement, and semi-supervised learning, since there is a lot of variation within each class.

Example

This work by Suvash Thapaliya is a recent example of work on this dataset. It uses ResNet-32 to obtain an error rate of 1.49%, which is the state of the art (SOTA) on this dataset for the classification task.

Contributors

Class 6 and 7 students (in 2015) of Mount Everest Higher Secondary School, Bhaktapur, Nepal contributed to this dataset by volunteering to write the characters, which were then scanned manually. Besides the manual scanning, other pre-processing steps were also performed; details can be found in the paper.

If you use this dataset in your work, please cite it as follows:

BibTeX

@inproceedings{acharya2015deep,
  title={Deep learning based large scale handwritten Devanagari character recognition},
  author={Acharya, Shailesh and Pant, Ashok Kumar and Gyawali, Prashnna Kumar},
  booktitle={Software, Knowledge, Information Management and Applications (SKIMA), 2015 9th International Conference on},
  pages={1--6},
  year={2015},
  organization={IEEE}
}