This repository has been archived by the owner on Feb 25, 2023. It is now read-only.

{part-of-speech} returns "Unknown" for nouns and na-adjectives #1924

Open
BernhardValenti opened this issue Sep 2, 2021 · 3 comments
Labels
dictionary format Issue is related to a dictionary formatting problem

Comments

@BernhardValenti

Description
I'm trying to add {part-of-speech} to my cards, but the template returns "Unknown" for nouns and na-adjectives. I-adjectives and verbs seem to return the proper values.

eg: and 元気

The dictionary is JMdict, and in the Yomichan browser it does show the n tag for nouns and adj-na for na-adjectives.

Browser version
Chrome 92

Yomichan version
21.7.31.2

@toasted-nutbread
Collaborator

This is because the JMdict dictionary data doesn't list the part-of-speech for nouns. It lists n and adj-na as tags, but this is not the same as part-of-speech.
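
As a rough illustration of the difference, here is a minimal Python sketch. It assumes the Yomichan term bank row layout of `[expression, reading, definition tags, rules, score, glossary, sequence, term tags]`. The two rows are invented for illustration and are not copied from the real JMdict export, and the helper names `word_classes` and `definition_tags` are hypothetical.

```python
# Invented sample rows in the assumed term bank layout:
# [expression, reading, definition_tags, rules, score, glossary, sequence, term_tags]
SAMPLE_ROWS = [
    ["元気", "げんき", "adj-na n", "", 0, ["lively", "healthy"], 1, ""],
    ["食べる", "たべる", "v1 vt", "v1", 0, ["to eat"], 2, ""],
]


def word_classes(row):
    """Rule identifiers (index 3); assumed to be what {part-of-speech} reads."""
    return row[3].split()


def definition_tags(row):
    """Space-separated definition tags (index 2), shown next to the glossary."""
    return row[2].split()


for row in SAMPLE_ROWS:
    print(row[0], "rules:", word_classes(row), "tags:", definition_tags(row))
```

Under these assumptions the noun row carries `n` and `adj-na` only as definition tags, so its rules list is empty and the template falls through to "Unknown".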

@toasted-nutbread toasted-nutbread added the dictionary format Issue is related to a dictionary formatting problem label Sep 4, 2021
@BernhardValenti
Author

I updated my templates to:

{{#*inline "part-of-speech-pretty"}}
    {{~#if (op "===" . "v1")~}}Ichidan verb
    {{~else if (op "===" . "v5")~}}Godan verb
    {{~else if (op "===" . "vk")~}}Kuru verb
    {{~else if (op "===" . "vs")~}}Suru verb
    {{~else if (op "===" . "vz")~}}Zuru verb
    {{~else if (op "===" . "adj-i")~}}I-adjective
    {{~else if (op "===" . "adj-na")~}}Na-adjective
    {{~else if (op "===" . "n")~}}Noun
    {{~else~}}{{.}}
    {{~/if~}}
{{/inline}}

{{#*inline "part-of-speech"}}
    {{~#scope~}}
        {{~#if (op "!==" definition.type "kanji")~}}
            {{~#set "first" true}}{{/set~}}
            {{~#each definition.expressions~}}
                {{~#each wordClasses~}}
                    {{~#unless (get (concat "used_" .))~}}
                        {{~#unless (get "first")}}, {{/unless~}}
                        {{~> part-of-speech-pretty . ~}}
                        {{~#set (concat "used_" .) true~}}{{~/set~}}
                        {{~#set "first" false~}}{{~/set~}}
                    {{~/unless~}}
                {{~/each~}}
            {{~/each~}}
            {{~#if (get "first")~}}
                {{~#each definition.definitionTags~}}
                    {{~#unless (get (concat "used_" name))~}}
                        {{~#unless (get "first")}}, {{/unless~}}
                        {{~> part-of-speech-pretty name ~}}
                        {{~#set (concat "used_" name) true~}}{{~/set~}}
                        {{~#set "first" false~}}{{~/set~}}
                    {{~/unless~}}
                {{~/each~}}
            {{~/if~}}
            {{~#if (get "first")~}}Unknown{{~/if~}}
        {{~/if~}}
    {{~/scope~}}
{{/inline}}
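
For anyone who finds the Handlebars hard to follow, here is a rough Python equivalent of what the template is meant to produce: de-duplicated word classes first, then the definition tag names as a fallback, then "Unknown". It is only a sketch. The dictionary shape passed to `part_of_speech` mimics the fields the template reads (`type`, `expressions[].wordClasses`, `definitionTags[].name`), and the function names are hypothetical.

```python
# Labels taken from the part-of-speech-pretty partial above.
LABELS = {
    "v1": "Ichidan verb",
    "v5": "Godan verb",
    "vk": "Kuru verb",
    "vs": "Suru verb",
    "vz": "Zuru verb",
    "adj-i": "I-adjective",
    "adj-na": "Na-adjective",
    "n": "Noun",
}


def pretty(tag):
    """Return a readable label, or the raw tag when no label is known."""
    return LABELS.get(tag, tag)


def part_of_speech(definition):
    """Mirror the intended template logic on a plain dict (assumed shape)."""
    if definition.get("type") == "kanji":
        return ""
    seen = []
    # First pass: word classes attached to each expression.
    for expression in definition.get("expressions", []):
        for word_class in expression.get("wordClasses", []):
            if word_class not in seen:
                seen.append(word_class)
    # Fallback: definition tag names, used only when no word class was found.
    if not seen:
        for tag in definition.get("definitionTags", []):
            if tag["name"] not in seen:
                seen.append(tag["name"])
    if not seen:
        return "Unknown"
    return ", ".join(pretty(tag) for tag in seen)


example = {
    "type": "term",
    "expressions": [{"wordClasses": []}],
    "definitionTags": [{"name": "adj-na"}, {"name": "n"}],
}
print(part_of_speech(example))
```

Note that the fallback passes every definition tag through, so tags that are not parts of speech would also appear in the output unchanged.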

@stephenmk
Contributor

> This is because the JMdict dictionary data doesn't list the part-of-speech for nouns. It lists n and adj-na as tags, but this is not the same as part-of-speech.

Just to be clear, the proper JMdict XML source data does distinguish part-of-speech tags from all other miscellaneous tags.

In the JMdict dictionary file produced for Yomichan by yomichan-import, the part-of-speech field contains only a limited and modified subset of those tags. From what I understand, these values are used behind the scenes to de-conjugate words into their dictionary forms so that Yomichan can look them up. Part-of-speech tags that are not used for de-conjugation are not added to this part-of-speech list. However, all of the part-of-speech information is still added properly to the definition tags of each term; it is just mixed in with all the other miscellaneous and usage-domain tags that are displayed to the user in the glossary.
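
To make the "limited and modified subset" idea concrete, here is a small Python sketch of how such a filter could work. The exact mapping is my assumption for illustration and is not copied from yomichan-import; the function name `deinflection_rules` is hypothetical.

```python
def deinflection_rules(pos_tags):
    """Reduce JMdict-style part-of-speech tags to the classes used for de-conjugation.

    Assumed mapping (illustrative only): a few tags pass through unchanged,
    v5* and vs* variants collapse to a single class, everything else is dropped.
    """
    rules = []
    for tag in pos_tags:
        if tag in ("adj-i", "v1", "vk", "vz"):
            rule = tag
        elif tag.startswith("v5"):
            rule = "v5"
        elif tag.startswith("vs"):
            rule = "vs"
        else:
            # Nouns, na-adjectives, prefixes and so on are not needed for
            # de-conjugation, so they never reach the part-of-speech field.
            continue
        if rule not in rules:
            rules.append(rule)
    return rules


print(deinflection_rules(["n", "adj-na"]))
print(deinflection_rules(["v5k", "vt"]))
```

With a filter like this, a noun or na-adjective ends up with an empty list, which matches the "Unknown" result reported above.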

Based on my understanding of how this works, I don't think this {part-of-speech} handlebar should even exist. All of this information already exists in a complete form within the {glossary} field. The part-of-speech of a given word can also vary depending on the sense in which it is used. For example, 亜 can be a prefix or a noun. The {part-of-speech} handlebar could be updated to return a list of all possible parts-of-speech for an expression, but why? That information isn't as useful and could even be confusing without the corresponding sense context (which exists in the glossary).

The part-of-speech tags for JMdict entries that are displayed in Yomichan are each contained within a <span> node with a data-category="partOfSpeech" attribute, so maybe something could be done with that if a user needs a way to extract or query the data.
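
As a sketch of what that could look like, the Python below pulls the text out of every `<span>` carrying `data-category="partOfSpeech"` using only the standard library. The sample markup, including the class names and the nested inner span, is made up for illustration; only the `data-category` attribute comes from what I described above.

```python
from html.parser import HTMLParser


class PartOfSpeechExtractor(HTMLParser):
    """Collect the text of <span data-category="partOfSpeech"> elements."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self._depth = 0    # > 0 while inside a matching span (counts nested spans)
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        if self._depth > 0:
            self._depth += 1
        elif dict(attrs).get("data-category") == "partOfSpeech":
            self._depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "span" and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                text = "".join(self._buffer).strip()
                if text:
                    self.tags.append(text)

    def handle_data(self, data):
        if self._depth > 0:
            self._buffer.append(data)


def extract_parts_of_speech(html):
    parser = PartOfSpeechExtractor()
    parser.feed(html)
    parser.close()
    return parser.tags


# Hypothetical glossary markup; real Yomichan output may be structured differently.
SAMPLE_HTML = (
    '<span class="tag" data-category="partOfSpeech"><span class="tag-inner">n</span></span>'
    '<span class="tag" data-category="partOfSpeech"><span class="tag-inner">adj-na</span></span>'
    '<span class="tag" data-category="misc"><span class="tag-inner">uk</span></span>'
)
print(extract_parts_of_speech(SAMPLE_HTML))
```

This only helps if the rendered glossary HTML is what gets exported to the card, and the result still loses the link between each tag and the sense it belongs to.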
