An incomplete list of yomigana data for Vocaloid song titles, based on database dumps of Vocaloid Wiki on Fandom, 初音ミク Wiki (atwiki) and VocaDB, with 2171, 26290 and 82721 usable entries from these sources respectively.
Note: the source code used here is unorganised and could be improved.
extractor_*.py
: extractor script to get romaji/yomigana and song title from the data set.
sanitize_kana.py
: generate yomigana from romaji and verify it against that generated from the song title.
sort_non_ja
: try to filter out non-Japanese songs and songs with a mixed-language title.
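The verification and filtering steps described above can be illustrated with a small standard-library sketch. It is not the code from this repository: the function names `kata_to_hira`, `same_reading` and `classify_title` are hypothetical, and the character ranges are a rough heuristic (for example, a kanji-only title may be Chinese rather than Japanese). It assumes two readings count as equal when they match after Unicode normalisation and katakana-to-hiragana folding.

```python
import re
import unicodedata

# Rough character classes (assumption: these ranges are good enough for titles).
KANA = re.compile(r"[\u3041-\u3096\u30a1-\u30fa\u30fc]")  # hiragana, katakana, long-vowel mark
KANJI = re.compile(r"[\u4e00-\u9fff\u3005]")              # CJK ideographs and the repeat mark
LATIN = re.compile(r"[A-Za-z]")


def kata_to_hira(text):
    """Fold katakana U+30A1..U+30F6 to hiragana by a fixed code point offset."""
    return "".join(
        chr(ord(ch) - 0x60) if "\u30a1" <= ch <= "\u30f6" else ch
        for ch in text
    )


def same_reading(a, b):
    """Compare two kana readings, ignoring script, character width and spacing."""
    def norm(s):
        s = unicodedata.normalize("NFKC", s)  # half-width kana -> full-width
        s = kata_to_hira(s)
        return re.sub(r"\s+", "", s)
    return norm(a) == norm(b)


def classify_title(title):
    """Return 'japanese', 'mixed' or 'non_japanese' for a song title (heuristic)."""
    title = unicodedata.normalize("NFKC", title)  # full-width Latin -> ASCII
    has_ja = bool(KANA.search(title) or KANJI.search(title))
    has_latin = bool(LATIN.search(title))
    if has_ja and has_latin:
        return "mixed"
    if has_ja:
        return "japanese"
    return "non_japanese"
```

For example, `same_reading("センボンザクラ", "せんぼんざくら")` treats the katakana and hiragana spellings as the same reading, and `classify_title("メルト (Melt)")` reports a mixed-language title.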
Requirements:
- MeCab
- MeCab-ipadic-NEologd
pip3 install regex jaconv pyokaka mecab-python3
Search keywords: vocaloid song names yomigana romaji furigana hiragana katakana kana pronunciation database ボカロ ボーカロイド 曲名 読み方 読み仮名 ひらがな カタカナ 仮名 よみがな