In the 虎の巻 entry, Stephen added two additional surface forms to the kanji field, トラの巻 and とらの巻, quoting the ngram numbers:
| Form | Ngram hits |
|---|---|
| 虎の巻 | 257508 |
| トラの巻 | 71459 |
| とらの巻 | 5711 |
| とらのまき | 1565 |
| トラのまき | 633 |
| とらのマキ | no matches |
| トラのマキ | no matches |
I don't have an issue with トラの巻, which is obviously common, but とらの巻 only gets 1.7% of the total ngram hits. If it were a "true kanji form" (i.e. one with a different kanji, rather than kana replacing a kanji), we'd have tagged it [rK], and I think [rK] forms are only really worth adding when they appear in actual dictionaries. So I removed とらの巻, but Jim added it back, saying "With 5k in the ngrams I'd keep it."
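For reference, the 1.7% figure for とらの巻 follows from the counts above, assuming "total ngram hits" means the sum over all the listed forms (the two forms with no matches contribute nothing):

```math
\frac{5711}{257508 + 71459 + 5711 + 1565 + 633} = \frac{5711}{336876} \approx 0.0170 = 1.7\%
```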
I feel we need to sit down and work out exactly what thresholds these different forms should meet for inclusion. Personally, I don't think we should include a form that isn't in another dictionary if it would qualify as [rK] had it contained a distinct kanji, i.e. if it gets less than 2.5% of the total ngram hits. The ngrams of course aren't the be-all and end-all, and Twitter usage etc. can be useful indicators too, depending on the word, but I don't think we need to make any exceptions for absolute numbers. As I said in the 女の子 entry, "it doesn't make sense to just look at the raw numbers, or all our P-tagged entries would be horrible messes with lots of different and rare versions. There's a balance we need to strike between presenting easy-to-read entries and trying to include absolutely everything."
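To make the proposed rule concrete, here is a rough sketch of it applied to the 虎の巻 counts. It is only an illustration: the function names and the `in_other_dictionary` flag are hypothetical (nothing here is part of JMdictDB), "total hits" is assumed to be the sum over all listed forms, and for the demo neither form is treated as appearing in another dictionary. It covers only the ngram part of the rule, not other signals such as Twitter usage.

```python
# Hypothetical sketch of the proposed inclusion rule; names are illustrative
# and not part of JMdictDB.
NGRAM_COUNTS = {
    "虎の巻": 257508,
    "トラの巻": 71459,
    "とらの巻": 5711,
    "とらのまき": 1565,
    "トラのまき": 633,
    # とらのマキ and トラのマキ had no matches, so they are left out.
}

RK_THRESHOLD = 0.025  # proposed cut-off: 2.5% of total hits


def ngram_share(counts: dict[str, int], form: str) -> float:
    """Fraction of all listed ngram hits that belong to `form`."""
    total = sum(counts.values())
    return counts.get(form, 0) / total


def meets_proposed_threshold(
    counts: dict[str, int],
    form: str,
    in_other_dictionary: bool,
    threshold: float = RK_THRESHOLD,
) -> bool:
    """Keep a form if another dictionary lists it, or if its share of the
    total hits is at least the threshold. Absolute counts are ignored."""
    if in_other_dictionary:
        return True
    return ngram_share(counts, form) >= threshold


if __name__ == "__main__":
    # Assumption for the demo: neither form appears in another dictionary.
    for form in ("トラの巻", "とらの巻"):
        share = ngram_share(NGRAM_COUNTS, form)
        keep = meets_proposed_threshold(NGRAM_COUNTS, form, in_other_dictionary=False)
        print(f"{form}: {share:.1%} -> {'keep' if keep else 'drop'}")
```

Under these assumptions トラの巻 (about 21.2%) clears the 2.5% bar and とらの巻 (about 1.7%) does not, despite its 5k raw hits. The rule deliberately ignores absolute counts, which is the point of disagreement in this issue.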
The 虎の巻 entry: https://www.edrdg.org/jmdictdb/cgi-bin/entr.py?svc=jmdict&sid=&q=1267620.1
We discussed this previously in the 女の子 entry, regarding the addition of オンナノコ [nokanji] to the reading field (https://www.edrdg.org/jmdictdb/cgi-bin/entr.py?svc=jmdict&sid=&e=2181330). We ended up not including it, despite its rather high absolute count (100k).