I think any training for a Japanese OCR model should include all the jōyō kanji, as they are considered the most common and are needed by Japanese readers.
Any chance you could update the japan_dict.txt file and retrain the Japanese model? If needed, I can try to search for other missing kanji from the jōyō list, but I'm not very good at scripting, so it will take me some time to obtain the list.
PaddleOCR is one of the best OCR toolkits I've tested so far. However, the official Japanese model could be significantly improved if it could recognize all Jouyou kanji (regular-use kanji), as these are very common.
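    jouyou_kanji_file = "jouyou-kanji.csv"
    dict_file = "dict_japan.txt"
    output_file = "missing_kanji.txt"

    # Load the OCR dictionary: one character per line.
    with open(dict_file, "r", encoding="utf-8") as f:
        dict_chars = set(f.read().splitlines())

    # Take the first character of each non-empty CSV line as the kanji.
    with open(jouyou_kanji_file, "r", encoding="utf-8") as f:
        jouyou_kanji = set(line[0] for line in f.read().splitlines() if line)

    # Kanji on the jōyō list that are absent from the dictionary.
    missing_kanji = jouyou_kanji - dict_chars
    with open(output_file, "w", encoding="utf-8") as f:
        for char in sorted(missing_kanji):
            f.write(char + "\n")

    print(f"Missing kanji have been written to {output_file}")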
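Once the missing characters are known, they could be appended to a copy of the dictionary before any retraining. The following is only a minimal sketch, assuming the dictionary holds one character per line; the function name `merge_missing` and the sample data are hypothetical and not part of PaddleOCR. Appending at the end keeps the positions of the existing entries unchanged, though a model trained on the old dictionary would presumably still need retraining to recognize the new characters.

```python
def merge_missing(dict_lines, missing_chars):
    """Return dict_lines with any absent characters appended, keeping the existing order."""
    seen = set(dict_lines)
    merged = list(dict_lines)
    # Sort the new characters so the result is deterministic.
    for ch in sorted(missing_chars):
        if ch not in seen:
            merged.append(ch)
            seen.add(ch)
    return merged


# Hypothetical sample data: a tiny dictionary and a set of candidate kanji.
existing = ["日", "本", "語"]
merged = merge_missing(existing, {"喉", "渇", "本"})
print(merged)
```

Here "本" is already present, so only the two absent kanji are appended.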
madmalkav changed the title from "jōyō kanji not included in japanese model" to "25 jōyō kanji (regular-use kanji) not included in japanese model" on Nov 14, 2023
I have found that japan_dict.txt is missing at least these two kanji, both of which are on the jōyō kanji list:
喉
渇
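A quick way to confirm a report like this is to test individual characters against the dictionary and print their Unicode code points, since some kanji have look-alike variant forms with different code points. This is a minimal sketch with a hypothetical helper `report_missing` and made-up sample data, not the real contents of japan_dict.txt.

```python
def report_missing(chars, dict_lines):
    """Return (character, code point) pairs for characters not found in the dictionary."""
    known = set(dict_lines)
    return [(ch, f"U+{ord(ch):04X}") for ch in chars if ch not in known]


# Hypothetical sample dictionary that contains only one of the two kanji.
sample_dict = ["渇", "水"]
result = report_missing("喉渇", sample_dict)
for ch, code_point in result:
    print(ch, code_point)
```

To check the real file, `sample_dict` would be replaced by the lines read from the dictionary with `encoding="utf-8"`.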