Add furigana on subtitles #89
Comments
I'm aware that Voracious has this, but it will likely never get added to Memento. There are two reasons:
|
What about displaying the furigana as a second line above the normal subtitles, with some calculation of how long or big the furigana line should be so that it aligns with the normal line? |
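The calculation suggested above might look roughly like the following. This is only a sketch under strong assumptions: every base glyph has the same known width (`base_char_width`), furigana is drawn at a fixed fraction of that size (`ruby_scale`), and the subtitle has already been split into segments with known readings. `Segment` and `layout_furigana` are hypothetical names for illustration and are not part of Memento or Qt.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    base: str      # text shown on the normal subtitle line
    reading: str   # furigana for this segment, "" if it needs none


def layout_furigana(segments, base_char_width, ruby_scale=0.5):
    """Return (reading, x_offset, ruby_char_width) for each annotated segment.

    Assumes fixed-width glyphs; a real renderer would have to measure
    each glyph with the actual font instead.
    """
    placements = []
    x = 0.0  # left edge of the current segment on the base line
    for seg in segments:
        base_width = len(seg.base) * base_char_width
        if seg.reading:
            ruby_char_width = base_char_width * ruby_scale
            ruby_width = len(seg.reading) * ruby_char_width
            if ruby_width > base_width:
                # Reading is wider than its kanji: shrink it to fit.
                ruby_char_width = base_width / len(seg.reading)
                ruby_width = base_width
            # Center the reading over the base text it annotates.
            offset = x + (base_width - ruby_width) / 2
            placements.append((seg.reading, offset, ruby_char_width))
        x += base_width
    return placements


line = [Segment("ピアノを", ""), Segment("弾", "ひ"), Segment("く", "")]
print(layout_furigana(line, base_char_width=40))
```

The arithmetic is the easy part; the sketch takes for granted that glyph widths and positions are available to the caller.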
That assumes I have way more control over QTextEdits and fonts than I actually do.
If you're not discouraged, feel free to implement it yourself. Personally, I have no interest in hacking a QTextEdit to deal with this sort of thing. Not to mention that even after all this work is done, there is still the problem that kanji-to-furigana mappings are not one-to-one. |
From a non-technical point of view, I'd say generating furigana is a very hard problem to get right (elaborating on the point that "kanji can often be mapped to a bunch of different furigana").

As an example, ImmersionKit provides a huge trove of sentences with furigana, taken from existing decks (mostly Jo-Mako's) that probably had human oversight. If https://github.com/mathewthe2/immersion-kit-api/blob/3ec3a75f84fdc99ceb5967e345b009e19cf7d783/tokenizer/japanesetokenizer.py#L26 still reflects their process somewhat, you can see their NLP setup and the content-specific tweaks it needed. But there are still many issues.

For a trivial example, search for sentences with 山道. The generated furigana for every instance of 山道 is さんどう (the onyomi reading), but if you listen to the sentences you'll find that some of them use the kunyomi reading やまみち. Both readings are valid and have the same meaning, and the only way to tell which one is used is to listen to the original dialogue audio.

For examples where only one reading would be considered correct in the respective sentences, check out how 弾く (ひく, はじく), 堪える (こたえる, たえる, こらえる) and 惚ける (とぼける, ほうける, ほける) all get messed up.

I think some ML solution might come in, and speech recognition might help in Memento's case. But ultimately it's a hard problem that is influenced by literary sensibilities and artistic license (see the scholarly debate about whether 国境 should be read くにざかい or こっきょう at the beginning of Snow Country). Because even state |
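To make the one-to-many problem concrete, here is a minimal sketch using only the words and readings mentioned in the comment above. The `READINGS` table and both functions are hypothetical illustrations written by hand; this is not ImmersionKit's tokenizer, and real generators rely on a morphological analyzer rather than a lookup table.

```python
# Candidate readings taken from the examples in the discussion.
READINGS = {
    "山道": ["さんどう", "やまみち"],
    "弾く": ["ひく", "はじく"],
    "堪える": ["こたえる", "たえる", "こらえる"],
    "惚ける": ["とぼける", "ほうける", "ほける"],
}


def naive_furigana(word):
    """Pick the first listed reading, ignoring context entirely."""
    candidates = READINGS.get(word)
    return candidates[0] if candidates else None


def ambiguous_words(words):
    """Return the words for which more than one reading is known."""
    return [w for w in words if len(READINGS.get(w, [])) > 1]


print(naive_furigana("山道"))
print(ambiguous_words(["山道", "弾く", "猫"]))
```

A generator that always takes one fixed candidate behaves like `naive_furigana`: it returns the same reading for 山道 in every sentence, whatever is actually spoken. Flagging ambiguous words, as `ambiguous_words` does, at least shows where the text alone cannot settle the reading.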