Add new structured content features: lists and the HTML lang attribute #2129
A full list of supported style types is documented here: https://developer.mozilla.org/en-US/docs/Web/CSS/list-style-type
There's nothing in this code preventing a term bank from assigning, for example, a `list-style-type` style to a `div` element, but it doesn't seem like browsers will complain about things like that.
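As a rough illustration of how a camel-cased style key such as `listStyleType` could end up as CSS on an element, here is a minimal sketch. The helper names `toCssPropertyName` and `styleToCssText` are hypothetical and are not Yomichan's actual implementation. Nothing in the sketch looks at which tag the style is attached to, which mirrors the point above.

```javascript
// Hypothetical helper: convert a camelCase style key to its CSS property name.
function toCssPropertyName(key) {
    return key.replace(/[A-Z]/g, (c) => `-${c.toLowerCase()}`);
}

// Hypothetical helper: serialize a structured-content style object to CSS text.
function styleToCssText(style) {
    return Object.entries(style)
        .map(([key, value]) => `${toCssPropertyName(key)}: ${value};`)
        .join(' ');
}

// Works the same whether the style belongs to a "ul" or a "div" node.
console.log(styleToCssText({listStyleType: "'📝 '"}));
```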
Support added for the following node types: "ruby", "rt", "rp", "table", "thead", "tbody", "tfoot", "tr", "td", "th", "span", "div", "ol", "ul", "li", "a".
I couldn't get it to work for the alt-hover text on "img" tags.
Tests are included in the file "test/data/dictionaries/valid-dictionary/term_bank_1.json".
},
"lang": {
    "type": "string",
    "description": "Defines the language of an element in the format defined by RFC 5646"
Nitpick: add period to the end of description value (other descriptions are punctuated).
Apply to other locations in the file also.
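For dictionary authors who want to sanity-check `lang` values before importing, one option is the standard `Intl.getCanonicalLocales` function, which throws a `RangeError` for tags that are not structurally valid. This is only an approximation of RFC 5646 (the `Intl` API follows the Unicode BCP 47 locale identifier rules, which differ in some edge cases), and the helper name `isWellFormedLanguageTag` is hypothetical. The schema in this PR only requires a string.

```javascript
// Hypothetical helper: returns true if `tag` is a structurally valid language tag.
// Approximation only: Intl follows Unicode BCP 47 locale identifier rules.
function isWellFormedLanguageTag(tag) {
    if (typeof tag !== 'string') { return false; }
    try {
        Intl.getCanonicalLocales(tag);
        return true;
    } catch (e) {
        if (e instanceof RangeError) { return false; }
        throw e;
    }
}

for (const tag of ['ja', 'en', 'zh-Hant', 'not a tag']) {
    console.log(tag, isWellFormedLanguageTag(tag));
}
```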
Looks good, just a few comments on CSS: The way that Anki cards have styling information generated for structured content is a special case which needs to be handled.
Out of scope for this PR, but this may be something that's worth updating at some point also, since this seems like an oversight on my part. (not saying this is something you should do)
And to this point, yes, I agree that the validation is unfortunately slow. This is partially due to how non-specific JSON schemas are technically allowed to be. I made a few optimizations a while back, but I should revisit this. It's also potentially a motivation for having a different way to represent the type of data mentioned in #1165. While structured content can accomplish it, it is by no means the optimal way of doing it. Compare something like:
Current:
[
"読む",
"よむ",
"5 v5m vt",
"v5",
1440001,
[
{
"content": [
{
"content": {
"content": "now mostly used in idioms",
"tag": "li"
},
"data": {
"content": "notes"
},
"lang": "ja",
"style": {
"listStyleType": "'📝 '"
},
"tag": "ul"
},
{
"content": [
{
"content": "to count",
"tag": "li"
},
{
"content": "to estimate",
"tag": "li"
}
],
"data": {
"content": "glossary"
},
"lang": "en",
"style": {
"listStyleType": "circle"
},
"tag": "ul"
},
{
"content": {
"content": [
"see: ",
{
"content": "さばを読む",
"href": "?query=さばを読む\u0026wildcards=off",
"lang": "ja",
"tag": "a"
},
{
"content": " to manipulate figures to one's advantage; to count wrongly on purpose; to inflate or deflate one's age",
"data": {
"content": "refGlosses"
},
"style": {
"fontSize": "x-small",
"verticalAlign": "middle"
},
"tag": "span"
}
],
"tag": "li"
},
"data": {
"content": "references"
},
"lang": "en",
"style": {
"listStyleType": "'➡ '"
},
"tag": "ul"
}
],
"type": "structured-content"
}
],
1456360,
"P ichi news6k"
]
Minimized:
[
"読む",
"よむ",
"5 v5m vt",
"v5",
1440001,
[
{
"content": "now mostly used in idioms",
"type": "note"
},
"to count",
"to estimate",
{
"content": "さばを読む",
"brief": "to manipulate figures to one's advantage; to count wrongly on purpose; to inflate or deflate one's age", // optional
"href": "?query=さばを読む\u0026wildcards=off", // optional
"type": "references"
}
],
1456360,
"P ichi news6k"
]
For comparison, I also ran the validation on the dictionary you provided and it took roughly 18 minutes.
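To make the Current vs. Minimized comparison more concrete, here is a sketch of how a renderer could expand the minimized entries back into the same kind of lists used in the current structured content. The `note` and `references` type names come from the hypothetical minimized format above; `makeList`, `expandMinimized`, and the grouping rules are assumptions for illustration, not existing Yomichan functions.

```javascript
// Hypothetical helper: build one <ul> structured-content node for a section.
function makeList(section, lang, listStyleType, items) {
    return {
        tag: 'ul',
        lang,
        style: {listStyleType},
        data: {content: section},
        content: items.map((content) => ({tag: 'li', content}))
    };
}

// Hypothetical: expand the minimized entries into a structured-content object.
function expandMinimized(entries) {
    const notes = [];
    const glosses = [];
    const references = [];
    for (const entry of entries) {
        if (typeof entry === 'string') {
            glosses.push(entry);
        } else if (entry.type === 'note') {
            notes.push(entry.content);
        } else if (entry.type === 'references') {
            // "href" and "brief" are optional in the minimized format.
            const target = entry.href ?
                {tag: 'a', href: entry.href, lang: 'ja', content: entry.content} :
                entry.content;
            const parts = ['see: ', target];
            if (entry.brief) {
                parts.push({tag: 'span', data: {content: 'refGlosses'}, content: ` ${entry.brief}`});
            }
            references.push(parts);
        }
    }
    const content = [];
    if (notes.length > 0) { content.push(makeList('notes', 'ja', "'📝 '", notes)); }
    if (glosses.length > 0) { content.push(makeList('glossary', 'en', 'circle', glosses)); }
    if (references.length > 0) { content.push(makeList('references', 'en', "'➡ '", references)); }
    return {type: 'structured-content', content};
}
```

With something like this, the formatting lives in the renderer rather than in every dictionary entry, which is the trade-off discussed in this thread.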
Thanks for the detailed feedback. The files have now been updated.
I added the |
.gloss-sc-ul {
    /* remove-property padding-left */
}
:root[data-glossary-layout-mode=compact] .gloss-sc-ul[data-sc-content=glossary] {
Group these together for simplicity:
:root[data-glossary-layout-mode=compact] .gloss-sc-ul[data-sc-content=glossary],
:root[data-glossary-layout-mode=compact] .gloss-sc-ul[data-sc-content=glossary] .gloss-sc-li,
:root[data-glossary-layout-mode=compact] .gloss-sc-ul[data-sc-content=glossary] .gloss-sc-li:not(:first-child)::before {
/* remove-rule */
}
Ha, that's actually what I tried first. But the tests do not like that. I think that only works if they are defined that way (as a group) in the ext/css/structured-content.css file. But the rules are all different there, so they can't be combined.
Running test-css-json.js...
Error: Could not find rule with matching selectors
at generateRules (/home/stephen/Code/yomichan/dev/css-to-json-util.js:139:48)
at main (/home/stephen/Code/yomichan/test/test-css-json.js:28:42)
at testMain (/home/stephen/Code/yomichan/dev/util.js:127:15)
at Object.<anonymous> (/home/stephen/Code/yomichan/test/test-css-json.js:35:5)
at Module._compile (node:internal/modules/cjs/loader:1105:14)
at Module._extensions..js (node:internal/modules/cjs/loader:1159:10)
at Module.load (node:internal/modules/cjs/loader:981:32)
at Module._load (node:internal/modules/cjs/loader:827:12)
at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:77:12)
at node:internal/main/run_main_module:17:47
Hmm, I thought I handled that differently, but I guess not; my bad! (I should really be more familiar with my own code)
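The failure above is consistent with override rules being matched against the source stylesheet by their full selector list. The sketch below is not the code in `dev/css-to-json-util.js`; `findRule`, the rule shape, and the placeholder declarations are assumptions, used only to illustrate why a grouped override would not match three separately declared rules.

```javascript
// Hypothetical rule shape: {selectors: string[], declarations: object}.
function findRule(rules, selectors) {
    const wanted = selectors.join(',');
    const rule = rules.find((candidate) => candidate.selectors.join(',') === wanted);
    if (typeof rule === 'undefined') {
        throw new Error('Could not find rule with matching selectors');
    }
    return rule;
}

const base = ':root[data-glossary-layout-mode=compact] .gloss-sc-ul[data-sc-content=glossary]';

// Stand-in for the source stylesheet: three separate rules (placeholder declarations).
const sourceRules = [
    {selectors: [base], declarations: {display: 'inline'}},
    {selectors: [`${base} .gloss-sc-li`], declarations: {display: 'inline'}},
    {selectors: [`${base} .gloss-sc-li:not(:first-child)::before`], declarations: {content: '" | "'}}
];

// Looking up each selector on its own succeeds.
findRule(sourceRules, [base]);

// Looking up the grouped selector list fails: no single source rule declares all three.
try {
    findRule(sourceRules, sourceRules.flatMap((rule) => rule.selectors));
} catch (e) {
    console.log(e.message);
}
```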
I think this is an interesting idea. It seems simple enough that even I could probably implement it, and it would probably cut back on a lot of validation time.
I think a major drawback to the idea (as I understand it) is that it would very tightly couple the dictionary's display logic with yomichan, which would require [1] yomichan to be updated whenever major changes occur in the dictionary format and [2] yomichan to maintain compatibility with all prior dictionary formats (to support users who do not upgrade to the newest version of the dictionary). If the new data type representations are sufficiently modular, perhaps this would not be an issue. However, I'm not sure we're going to be able to strike a good balance between modularity and conciseness with this JMdict data that would be worth the effort. Here's an example:
Minimized JMdict term bank entry with hypothetical data types:
[
"読む",
"よむ",
"5 v5m vt",
"v5",
1440001,
{
"type": "jmdictglossary",
"sources": [
{
"lang": "zh",
"language": "Chinese",
"wasei": false,
"content": "讀・dú",
"type": "partial"
}
],
"notes": [
"now mostly used in idioms"
],
"glosses": [
"to count",
"to estimate"
],
"infoGlosses": [
{
"type": "expl",
"content": "The Yomi in Yomichan comes from 読む"
},
{
"type": "tm",
"content": "FooSoft® 2022"
}
],
"references": [
{
"kanji": "さばを読む",
"brief": " to manipulate figures to one's advantage; to count wrongly on purpose; to inflate or deflate one's age",
"type": "reference"
},
{
"kanji": "欠",
"reading": "あくび",
"brief": " 欠 can be read as けつ, あくび, or かけ, so a reading has to be specified",
"type": "antonym"
}
]
},
1456360,
"P ichi news6k"
]
Note that the entire glossary is contained within a single
This hypothetical
I don't know anything about JSON schema validation, and I haven't looked into how Yomichan does it or what engine it uses. My naive hope was that maybe Yomichan could be updated to use a faster validator (ajv claims to be the fastest) and perhaps that would be enough to solve the problem.
Yeah, I agree overall that it's difficult to balance modularity, efficiency, and compatibility, which is why I haven't yet taken the initiative to implement something like what I mentioned in #2129 (comment).
A custom one I wrote, which supports a limited subset. I can look into doing a comparison vs ajv. The other downside of complex structured content vs native definitions is in the database storage overhead, since all of the formatting needs to be stored.
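One rough way to get a feel for the storage overhead is to compare the serialized size of a structured content glossary with the equivalent plain glosses. This is only a proxy: it measures UTF-8 JSON size, which is an assumption about what matters here, and the database does not necessarily store values as JSON text. `jsonByteLength` is a hypothetical helper.

```javascript
// Hypothetical helper: UTF-8 byte length of a value's JSON serialization.
function jsonByteLength(value) {
    return new TextEncoder().encode(JSON.stringify(value)).length;
}

// The "glossary" list from the structured content example in this thread.
const structured = {
    type: 'structured-content',
    content: [{
        tag: 'ul',
        lang: 'en',
        style: {listStyleType: 'circle'},
        data: {content: 'glossary'},
        content: [
            {tag: 'li', content: 'to count'},
            {tag: 'li', content: 'to estimate'}
        ]
    }]
};

// The same two glosses as native (plain string) definitions.
const native = ['to count', 'to estimate'];

console.log('structured bytes:', jsonByteLength(structured));
console.log('native bytes:', jsonByteLength(native));
```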
Structured Content `lang` Attribute
Authors of dictionaries for Yomichan may want to include content from various languages, so I think this is a valuable feature. JMdict, for example, includes Japanese loanword source-words from over 60 different languages.
Characters 直次茶冷 displayed in structured content glosses
⚠ The current version of Yomichan seems to apply `lang="ja"` attributes to standard glosses which contain Japanese characters, but not to structured content glosses. By default, most browsers will render the characters "直次茶冷" in simplified Chinese.
Structured Content Lists
The current way in which I have inserted supplemental information into JMdict glossaries (see: #1165) is a little awkward. When the "compact glossaries" option is enabled, all of the information is grouped and compacted together. It will be easier for the user to parse the information if the different types of information are broken into separate sections. My idea is to break them up into separate unordered lists, each with its own `list-style-type`.
読む
読む (compact glossaries)
元 ("compact glossaries" and "group related terms" mode)
アルバイト
欠席
ちりも積もれば山となる
I've included tests for these new structured content features in the file `test/data/dictionaries/valid-dictionary1/term_bank_1.json`.
Additionally, here is a new version of JMdict for Yomichan which uses the new features.
jmdict_english_info_glosses_2022_05_13.zip
This version takes over 30 minutes to validate during the import process, so it is probably not viable for distribution unless Yomichan's validation procedure is optimized. Glosses that do not contain supplemental information are not inserted into structured content containers (they are formatted identically to the current production version of the dictionary), so I don't think there are any additional optimizations I can make to the dictionary without cutting content.
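The "only wrap when needed" rule described above could look roughly like the following. This is a sketch under assumptions: `buildGlossary` is a hypothetical name, only notes are treated as supplemental information, and it is not the code in my yomichan-import branch.

```javascript
// Hypothetical: `glosses` and `notes` are arrays of strings for one entry.
function buildGlossary(glosses, notes) {
    if (notes.length === 0) {
        // No supplemental information: keep the plain-string format,
        // identical to the current production dictionary.
        return [...glosses];
    }
    const toItems = (items) => items.map((content) => ({tag: 'li', content}));
    // Supplemental information present: wrap everything in structured content.
    return [{
        type: 'structured-content',
        content: [
            {tag: 'ul', data: {content: 'notes'}, content: toItems(notes)},
            {tag: 'ul', data: {content: 'glossary'}, content: toItems(glosses)}
        ]
    }];
}

console.log(JSON.stringify(buildGlossary(['to count', 'to estimate'], [])));
```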
Here is a version of the new JMdict dictionary that does not contain external reference notes (i.e. notes that indicate when an entry is referenced by another entry). On my PC it takes about 10 minutes to validate.
jmdict_english_info_glosses_no_ext_xrefs_2022_05_13.zip
I'm open to suggestions on how to improve the appearance of the new JMdict dictionary. I still need to clean up the code in my branch of yomichan-import a bit, but I think I'm out of ideas for additional features to add.