Handwritten Japanese Recognition

Create a model that recognizes handwritten Japanese characters, including Hiragana, Katakana, Kanji, and Kuzushiji, using TensorFlow.

Prerequisites and Installations

Python 3.7 (other Python 3 versions supported by TensorFlow 2.2.0 should also work)

TensorFlow 2.2.0

Keras 2.4.3 (note: this must be downgraded to 2.2.4 to convert the .h5 model to .mlmodel)

NumPy 1.16.4 (not the newest version, but it stops TensorFlow from printing a flood of FutureWarning messages)

matplotlib (newest version)

PIL, installed as Pillow (newest version)

skimage, installed as scikit-image (newest version)

sklearn, installed as scikit-learn (newest version)

coremltools 3.2
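To confirm that an environment matches the pinned versions above, a small check along these lines may help. This is a sketch, not part of the repository: the `PINNED` mapping simply mirrors the list above, and the helper names are hypothetical. It assumes each package exposes a `__version__` attribute, which TensorFlow, Keras, NumPy, and coremltools do.

```python
import importlib

# Versions pinned in this README; packages listed as "newest version" are not checked.
# Keras is pinned to 2.4.3 for training; use "2.2.4" when checking the environment
# used for the .h5 -> .mlmodel conversion.
PINNED = {
    "tensorflow": "2.2.0",
    "keras": "2.4.3",
    "numpy": "1.16.4",
    "coremltools": "3.2",
}


def installed_version(module_name):
    """Return module.__version__, or None if the module cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except Exception:  # a missing *or* broken install both count as "not available"
        return None
    return getattr(module, "__version__", None)


def check_versions(pinned, lookup=installed_version):
    """Return {name: (expected, found)} for every package whose version differs."""
    mismatches = {}
    for name, expected in pinned.items():
        found = lookup(name)
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


def report(pinned=PINNED):
    """Print one line per mismatched package; prints nothing if everything matches."""
    for name, (expected, found) in check_versions(pinned).items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

Calling `report()` inside the training environment prints a line for each package that is missing or at a different version. Note that `installed_version` actually imports each package, so the check takes a few seconds when TensorFlow is installed.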

Getting the Data / Training your model

The data for Hiragana, Katakana, and Kanji can be downloaded from the ETL database website. The data for Kuzushiji can be found in the KMNIST dataset. See References below.

read_{hira, kana, kanj}.py - saves all of the data into a single .npz file

modify_{hira, kana, kanj}.py - creates the train_images, train_labels, test_images, and test_labels files

{hiragana, katakana, kanji}_CNN.py - trains the CNN and saves it as an .h5 model

convert_{hiragana, katakana, kanji}.py - converts the .h5 model to a Core ML .mlmodel (see Additional Notes below to run this script without errors)
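The data-preparation code itself is not shown here. As a rough illustration of what a step like modify_{hira, kana, kanj}.py produces, the sketch below shuffles an image/label array pair and writes the four arrays named above into one compressed .npz file. The helper names `split_and_save` and `load_split`, the 80/20 split, and the single-file layout are assumptions for illustration only; the repository's scripts may store these arrays differently, and parsing the raw ETL files is not covered.

```python
import numpy as np


def split_and_save(images, labels, path, test_fraction=0.2, seed=0):
    """Shuffle (images, labels), split them, and save the four arrays to one .npz file.

    Hypothetical helper: the key names mirror train_images, train_labels,
    test_images and test_labels from the list above.
    """
    images = np.asarray(images)
    labels = np.asarray(labels)
    if len(images) != len(labels):
        raise ValueError("images and labels must have the same length")

    # RandomState (rather than default_rng) keeps this compatible with NumPy 1.16.
    rng = np.random.RandomState(seed)      # fixed seed -> reproducible split
    order = rng.permutation(len(images))   # one permutation keeps image/label pairs aligned
    n_test = int(round(len(images) * test_fraction))
    test_idx, train_idx = order[:n_test], order[n_test:]

    np.savez_compressed(
        path,
        train_images=images[train_idx],
        train_labels=labels[train_idx],
        test_images=images[test_idx],
        test_labels=labels[test_idx],
    )


def load_split(path):
    """Load the four arrays written by split_and_save."""
    with np.load(path) as data:
        return (data["train_images"], data["train_labels"],
                data["test_images"], data["test_labels"])
```

Using a single permutation for both arrays is the important design point: shuffling images and labels separately would silently scramble the pairing.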

Results

Model       Hiragana   Katakana   Kanji   Kuzushiji
Accuracy    98.4%      98.6%      98.6%   95.6%

Note: these are the results I obtained after running the CNN models; results may vary slightly from run to run.
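The accuracy figures above are classification accuracy: the fraction of test images whose predicted class matches the true label. Assuming the model ends in a softmax layer, so that it outputs one row of class probabilities per image, the computation looks roughly like this. `top1_accuracy` is a hypothetical helper, not a function from the repository. Random weight initialization and data shuffling differ between runs, which is one likely reason the numbers vary slightly.

```python
import numpy as np


def top1_accuracy(probabilities, labels):
    """Fraction of rows whose highest-probability class equals the integer label.

    probabilities: shape (n_samples, n_classes), e.g. the output of model.predict(x_test)
    labels:        shape (n_samples,) with integer class indices; if your labels are
                   one-hot encoded, convert them first with np.argmax(labels, axis=1)
    """
    probabilities = np.asarray(probabilities)
    labels = np.asarray(labels)
    predicted = np.argmax(probabilities, axis=1)  # most likely class for each image
    return float(np.mean(predicted == labels))
```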

Additional Notes

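If TensorFlow doesn't work at first on macOS, try running the certificate installer bundled with the python.org Python 3.7 installer (quote the path, since it contains spaces): "/Applications/Python 3.7/Install Certificates.command"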

When converting .h5 to .mlmodel, downgrade Keras from 2.4.3 to 2.2.4 (for example, pip install keras==2.2.4).

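Keras 2.2.4's TensorFlow backend was written against TensorFlow 1.x APIs, so to fix potential errors with TensorFlow 2.2, upgrade it with tf_upgrade_v2 (the path below is for a python.org install of Python 3.7 on macOS; adjust it to your own site-packages directory):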

tf_upgrade_v2 --infile /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py --outfile /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py
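The path in the command above only applies to a python.org install of Python 3.7 on macOS. A sketch like the one below locates Keras's tensorflow_backend.py wherever Keras is installed and runs the same tf_upgrade_v2 command. The names `keras_file`, `upgrade_command`, and `upgrade_keras_backend` are hypothetical, and the sketch assumes Keras 2.2.4 (which has keras/backend/tensorflow_backend.py) is installed in the active environment and that tf_upgrade_v2, which ships with TensorFlow 2.x, is on your PATH.

```python
import importlib.util
import os
import subprocess


def keras_file(*relative_parts):
    """Return the absolute path of a file inside the installed keras package, or None."""
    spec = importlib.util.find_spec("keras")  # locates the package without importing it
    if spec is None or not spec.submodule_search_locations:
        return None
    package_dir = list(spec.submodule_search_locations)[0]
    path = os.path.join(package_dir, *relative_parts)
    return path if os.path.isfile(path) else None


def upgrade_command(path):
    """Build the tf_upgrade_v2 call from this README, rewriting `path` in place."""
    return ["tf_upgrade_v2", "--infile", path, "--outfile", path]


def upgrade_keras_backend():
    """Run tf_upgrade_v2 on keras/backend/tensorflow_backend.py (Keras 2.2.4 layout)."""
    path = keras_file("backend", "tensorflow_backend.py")
    if path is None:
        raise FileNotFoundError("keras/backend/tensorflow_backend.py not found")
    subprocess.run(upgrade_command(path), check=True)  # raises if the tool fails
    return path
```

Because the file is rewritten in place, it may be worth copying it aside first so the change can be undone after the conversion.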

Also, open the optimizers.py file inside the keras package and change line 75 as follows (this lets Keras 2.2.4 accept the optimizer arguments stored in a model saved with Keras 2.4.3):

Original: allowed_kwargs = {'clipnorm', 'clipvalue'}

Changed: allowed_kwargs = {'clipnorm', 'clipvalue', 'name', 'learning_rate'}
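The manual edit to optimizers.py can also be scripted. The sketch below rewrites the allowed_kwargs line by matching its text instead of relying on the line number, which may differ between Keras builds. `patch_allowed_kwargs` and `patch_file` are hypothetical helpers; they assume the original line appears exactly as written above, and `patch_file` keeps a .bak copy before writing.

```python
import shutil

ORIGINAL = "allowed_kwargs = {'clipnorm', 'clipvalue'}"
PATCHED = "allowed_kwargs = {'clipnorm', 'clipvalue', 'name', 'learning_rate'}"


def patch_allowed_kwargs(source):
    """Return `source` with the allowed_kwargs line extended; raise if it is absent."""
    if PATCHED in source:
        return source  # already patched: leave unchanged
    if source.count(ORIGINAL) != 1:
        raise ValueError("expected exactly one original allowed_kwargs line")
    return source.replace(ORIGINAL, PATCHED)  # indentation around the line is preserved


def patch_file(path):
    """If `path` needs patching, back it up to `path + '.bak'` and write the fix."""
    with open(path, encoding="utf-8") as f:
        source = f.read()
    patched = patch_allowed_kwargs(source)
    if patched != source:
        shutil.copyfile(path, path + ".bak")
        with open(path, "w", encoding="utf-8") as f:
            f.write(patched)
    return path
```

Pass `patch_file` the path to keras/optimizers.py in your site-packages directory. Running it twice is harmless, since an already patched file is left untouched.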

References

Electrotechnical Laboratory, Japanese Technical Committee for Optical Character Recognition, ETL Character Database, 1973-1984.
Tarin Clanuwat, Mikel Bober-Irizar, Asanobu Kitamoto, Alex Lamb, Kazuaki Yamamoto, David Ha, "Deep Learning for Classical Japanese Literature", arXiv:1812.01718.
"KMNIST Dataset" (created by CODH), adapted from "Kuzushiji Dataset" (created by NIJL and others), doi:10.20676/00000341.

Repository: Nippon2019/Handwritten-Japanese-Recognition
