Questionable yojijukugo #72

Closed
stephenmk opened this issue Jul 31, 2022 · 3 comments

@stephenmk

I've put together a list of JMdict entries which contain the [yoji] miscellaneous tag on one or more senses but do not contain any surface forms that can be found in jitenon's yoji dictionary.

The list: https://gist.github.com/stephenmk/0dcb1318ec60bb35045e75b062d74be4
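For readers who want to reproduce this kind of check, here is a minimal sketch of the comparison described above. It is not the script that produced the gist. The `Entry` and `Sense` classes, the function names, and the toy data are all assumptions made for illustration, and the reference set stands in for whatever headword list has been extracted from a yoji dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class Sense:
    # Miscellaneous tags on this sense, e.g. {"yoji"} (hypothetical model).
    misc: set = field(default_factory=set)


@dataclass
class Entry:
    seq: int             # JMdict sequence number
    surface_forms: list  # kanji (or kana) forms of the entry
    senses: list         # list of Sense objects


def has_yoji_tag(entry):
    """True if at least one sense carries the yoji tag."""
    return any("yoji" in sense.misc for sense in entry.senses)


def unmatched_yoji_entries(entries, reference_headwords):
    """Entries tagged yoji with no surface form in the reference set."""
    return [
        e for e in entries
        if has_yoji_tag(e)
        and not any(form in reference_headwords for form in e.surface_forms)
    ]


# Toy data only. 9000001 and 9000002 are placeholder sequence numbers,
# and the reference set is invented, not taken from any real dictionary.
entries = [
    Entry(1307250, ["四捨五入"], [Sense({"yoji"})]),
    Entry(9000001, ["一石二鳥"], [Sense({"yoji"})]),
    Entry(9000002, ["辞書"], [Sense()]),
]
reference = {"一石二鳥"}

flagged = unmatched_yoji_entries(entries, reference)
for e in flagged:
    print(e.seq, e.surface_forms[0])
```

An entry is flagged only when none of its surface forms match, so a single matching variant spelling is enough to keep it off the list.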

Goo Jisho also hosts content from Gakken- and Shinmeikai-branded yoji dictionaries. I have some data that was scraped from these online dictionaries about a year ago, and I was unable to find any of the above surface forms in either dataset. I can't say with certainty that these datasets are comprehensive, however.

The list is unfortunately pretty long. At least 1138 of our 2730 yoji entries (roughly 42%) do not contain any surface forms that can be found in the jitenon dictionary. I'm not sure we'd want to remove the yoji tag from all of them, but there are far too many to review individually. So there may not be much we can do with this information.

There are six entries in the list which have priority tags, so maybe we should at least consider removing those yoji tags:

| sequence | surface form |
|----------|--------------|
| 1232400  | 拒絶反応     |
| 1307250  | 四捨五入     |
| 1321540  | 実力行使     |
| 1595050  | 暑中見舞     |
| 1703710  | 専守防衛     |
| 2029860  | 意思疎通     |

@JMdictProject
Owner

I think the best thing is to remove the [yoji] tag from all 1138. I sampled a dozen or so, and as expected the tags were added because the terms were in Kanji Haitani's yoji list. That list turned out to be rather, er, flawed.
I can do the tag removal as a bulk-edit process, so I'll add the task to my "get a roundtuit" list.

@robinjmdict

robinjmdict commented Aug 1, 2022

Thanks for generating the list, Stephen. Good thinking to use the jitenon site.

> I think the best thing is to remove the [yoji] tag from all 1138.

I agree.

I was able to find a few that are included in other yojijukugo dictionaries or online lists (e.g. 才気縦横, 翻然大悟, 文武不岐) but it's clear that the vast majority should not have the tag.

@JMdictProject
Owner

Yes, many thanks to Stephen for that list. I have run the update now (after removing from the list the three entries Robin mentioned).
I'll close the issue for now.
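
For illustration, a bulk edit of this kind could look roughly like the sketch below. It is not the actual update process used for JMdict. The function name, the data layout (a mapping from sequence number to per-sense tag lists), and the placeholder sequence number 9000001 are assumptions.

```python
def strip_yoji_tags(misc_by_seq, flagged_seqs, keep_seqs):
    """Return a new mapping with "yoji" removed from every flagged entry,
    except entries listed in keep_seqs (those confirmed elsewhere)."""
    targets = set(flagged_seqs) - set(keep_seqs)
    updated = {}
    for seq, senses in misc_by_seq.items():
        if seq in targets:
            # Drop only the yoji tag; other tags on the sense are kept.
            updated[seq] = [[t for t in tags if t != "yoji"] for tags in senses]
        else:
            updated[seq] = [list(tags) for tags in senses]
    return updated


# Toy data: per-sense misc tags keyed by sequence number.
# 9000001 is a placeholder standing in for an entry kept as an exception.
misc_by_seq = {
    1232400: [["yoji"]],
    1307250: [["yoji"]],
    9000001: [["yoji"]],
}
flagged = [1232400, 1307250, 9000001]
exceptions = [9000001]

result = strip_yoji_tags(misc_by_seq, flagged, exceptions)
print(result)
```

The function returns a new mapping rather than editing in place, so the result can be compared against the original before anything is committed.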
