This repository has been archived by the owner on Feb 25, 2023. It is now read-only.

Pitch accent for compound words #1542

Open
archiif opened this issue Mar 18, 2021 · 4 comments


@archiif
Contributor

archiif commented Mar 18, 2021

Currently, Yomichan doesn't display the pitch accent of compound words correctly (or perhaps the data from Kanjium is incomplete?).
For example, for 一子相伝 Yomichan displays this:
[screenshot]

But the word actually has two accent phrases: atamadaka for the first part of the compound and heiban for the second.
For reference, this is what the NHK pitch accent dictionary shows:
[screenshot]

My guess is that the two accents Yomichan shows above are really the two parts of a single compound accent, but Yomichan treats them as if they were two alternative accent variants. This is just a guess, though.

@toasted-nutbread
Collaborator

The issue is that the source data represents it as a single word, and Yomichan doesn't attempt to look up the individual parts of compound words, since there is no good way to do that reliably.

The source data for the term you listed is the following:

term      reading         accents
一子相伝  いっしそうでん  1,0

I don't believe the multiple comma-separated values generally represent the accents of the compound's parts, although the format of this file isn't really documented.
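To illustrate why the data can't be split per component, here is a minimal Python sketch that parses a line like the one above. It assumes three tab-separated columns and comma-separated downstep positions, which is a guess since the format is undocumented; `AccentEntry` and `parse_accent_line` are hypothetical names. The parsed values carry no information about which part of the reading each one applies to, so the only safe interpretation is as alternatives for the whole word.

```python
# Hypothetical sketch: parse one line of Kanjium-style accent data.
# Assumption: three tab-separated columns (term, reading, accents), with the
# accents given as comma-separated downstep positions. The real file format
# is not formally documented.
from dataclasses import dataclass


@dataclass
class AccentEntry:
    term: str
    reading: str
    # Nothing in the data says which mora range a value covers, so each
    # value can only be treated as an alternative accent for the whole word.
    positions: list[int]


def parse_accent_line(line: str) -> AccentEntry:
    term, reading, accents = line.rstrip("\n").split("\t")
    positions = [int(p) for p in accents.split(",") if p.strip()]
    return AccentEntry(term, reading, positions)


entry = parse_accent_line("一子相伝\tいっしそうでん\t1,0")
print(entry.positions)  # [1, 0]
```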

@archiif
Contributor Author

archiif commented Mar 22, 2021

I see. There don't seem to be any great solutions for automatically generating the pitch accent of compound words. For now I'll just edit the pitch accent data on my cards manually.

@redacted0

redacted0 commented Feb 17, 2022

@toasted-nutbread Yomichan doesn't seem to support this anyway, though. The JSON format assumes that a word contains only one pitch accent phrase.

I don't think it would be effective to have Yomichan look up each part separately, since those lookups could produce incorrect accents.

It would be best if I could simply add multiple phrases, like this:

[
    "一子相伝",
    "pitch",
    {
        "reading": "いっしそうでん",
        "pitches": [
            [{
                "pronunciation": "イッシ",
                "position": 1,
                "nasal": [],
                "devoice": []
            }, {
                "pronunciation": "ソーデン",
                "position": 0,
                "nasal": [],
                "devoice": []
            }]
        ]
    }
]

This would also have the added benefit of allowing a specific pronunciation to be given instead of the reading (which is currently used to match entries in other dictionaries), e.g. 通う (カヨウ) vs. 火曜 (カヨー).
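As a rough illustration of why per-phrase data helps, the Python sketch below renders the proposed `pitches` structure into a high/low (H/L) pattern per mora. The helper names (`split_morae`, `phrase_pattern`, `word_pattern`) are hypothetical and not part of Yomichan; the mora splitting is simplified (small kana attach to the preceding mora, while ッ, ン and ー count as their own mora), and a following particle is not modeled. Because each phrase carries its own `position`, the compound gets one downstep per phrase instead of two competing variants.

```python
# Hypothetical sketch: render the proposed multi-phrase "pitches" entry as a
# per-mora H/L pattern. Field names follow the JSON proposal; this is not an
# existing Yomichan API.

SMALL_KANA = set("ァィゥェォャュョヮ")  # these combine with the preceding mora


def split_morae(kana: str) -> list[str]:
    morae: list[str] = []
    for ch in kana:
        if ch in SMALL_KANA and morae:
            morae[-1] += ch
        else:
            morae.append(ch)  # ッ, ン and ー each count as their own mora
    return morae


def phrase_pattern(pronunciation: str, position: int) -> str:
    n = len(split_morae(pronunciation))
    if position == 0:  # heiban: low first mora, high to the end
        return "L" + "H" * (n - 1)
    if position == 1:  # atamadaka: high first mora, low afterwards
        return "H" + "L" * (n - 1)
    # nakadaka/odaka: low, high up to the downstep mora, low after it
    return "L" + "H" * (position - 1) + "L" * (n - position)


def word_pattern(phrases: list[dict]) -> str:
    # Each phrase is rendered independently, one accent phrase per element.
    return " ".join(phrase_pattern(p["pronunciation"], p["position"]) for p in phrases)


phrases = [
    {"pronunciation": "イッシ", "position": 1},
    {"pronunciation": "ソーデン", "position": 0},
]
print(word_pattern(phrases))  # HLL LHHH
```

Since the pattern is computed from `pronunciation` rather than from the reading, entries such as 通う and 火曜 could keep the shared reading かよう for matching while specifying カヨウ and カヨー respectively.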

@redacted0

Although I agree that this particular source data from Kanjium doesn't make use of multiple phrases, I still think supporting them would be a good idea, so that we can use sources that do.
