An OCR application that classifies almost 3000 Japanese kanji. The full list of characters can be printed by running print_all_characters.py.
Deployed on kanji.al3xbro.me
- TensorFlow, Keras: 2.12
- NumPy, OpenCV
Note: Follow the guide at https://www.tensorflow.org/guide/gpu to use your GPU for training
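
Before starting a long training run, it can help to confirm that TensorFlow actually sees your GPU. The snippet below is a minimal sketch, not part of this repository: `visible_gpus` is a hypothetical helper name, and it only relies on TensorFlow's `tf.config.list_physical_devices` call.

```python
def visible_gpus():
    """Return the names of the GPUs TensorFlow can see.

    Returns an empty list when no GPU is visible, or when TensorFlow
    is not installed in the current environment.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return []
    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    gpus = visible_gpus()
    if gpus:
        print("TensorFlow can use:", ", ".join(gpus))
    else:
        print("No GPU visible to TensorFlow; training would run on the CPU.")
```

If the list comes back empty even though a GPU is installed, the driver and CUDA setup steps in the guide linked above are the first thing to recheck.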
- 92% accuracy for both validation and training sets.
- 0.18 training loss and 0.21 validation loss.
- Download an image dataset of your choice.
- Modify the config.py file to contain the correct paths.
- Run the image_preprocessing.py script to process images.
- Run the delete_hiragana.py script to remove hiragana from the dataset.
- Run the model_training.py script to train your model. Uses data augmentation to help the model generalize.
- Run the predicting.py script to test your model.
- Try writing kanji in your own handwriting and testing your model on that. Have fun!
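
To illustrate the hiragana-removal step, here is a minimal sketch of how hiragana labels could be told apart from kanji. It is not the code in delete_hiragana.py; it assumes each class label is a single Unicode character, and the names `is_hiragana` and `drop_hiragana` are hypothetical. The check uses the Unicode Hiragana block (U+3040 to U+309F).

```python
# Bounds of the Unicode Hiragana block.
HIRAGANA_BLOCK_START = 0x3040
HIRAGANA_BLOCK_END = 0x309F


def is_hiragana(label: str) -> bool:
    """Return True if the label is a single character in the Hiragana block."""
    return len(label) == 1 and HIRAGANA_BLOCK_START <= ord(label) <= HIRAGANA_BLOCK_END


def drop_hiragana(labels):
    """Return the labels with every hiragana character removed, order preserved."""
    return [label for label in labels if not is_hiragana(label)]


if __name__ == "__main__":
    # Assumed example labels: two hiragana and two kanji.
    print(drop_hiragana(["あ", "漢", "の", "字"]))  # prints ['漢', '字']
```

A range check like this keeps kanji and katakana untouched, so it only fits datasets where hiragana are the sole classes to exclude.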
- Datasets ETL8G and ETL9G from etlcdb were used for training and validation.
- Used etlcdb-image-extractor to extract images from these datasets. Thank you!