Urum basic lexicon #62
Comments
Maybe this is of interest for the Dictionaria project.
@LinguList and @xrotwang Should we keep the basic lexicon as an issue in Concepticon, or should I open a new issue on the Dictionaria GitHub page? By the way, the original link no longer works, but I found this one: http://projects.turkmas.uoa.gr/urum/download/docs/uum-lexicon.pdf
This is a dataset for one language in lexibank, or even more than one, given the glossing languages. One would need to see to what degree the concept list can be extracted from the data (using a tool like Adobe Pro). One may also think of contacting the authors to ask whether they are interested in sharing the concept list in the form of an Excel sheet.
But we should then ask them directly, maybe now?
It looks like the PDF that you've linked can be parsed relatively easily: the entries are all organized the same way (there are no optional notes), and each piece of information is preceded by its keyword, so it won't be that hard with PDFMiner. Unfortunately, I don't know the authors =(
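A rough sketch of what that keyword-based parsing could look like, once the text has been pulled out of the PDF. The field keywords below (`Urum`, `English`, `Russian`, `Turkish`) and the `keyword: value` line layout are assumptions for illustration only and would need to be checked against the actual PDF. The `pdf_to_text` helper is hypothetical and relies on `extract_text` from pdfminer.six.

```python
import re

# Hypothetical field keywords; the real labels in the PDF must be checked.
KEYWORDS = ("Urum", "English", "Russian", "Turkish")

# Matches lines of the assumed form "<keyword>: <value>".
FIELD_RE = re.compile(r"^(%s):\s*(.*)$" % "|".join(map(re.escape, KEYWORDS)))


def parse_entries(text, first_keyword=KEYWORDS[0]):
    """Group keyword-prefixed lines into one dict per lexicon entry.

    A new entry starts whenever `first_keyword` is seen; lines that do not
    start with a known keyword are skipped.
    """
    entries = []
    current = None
    for raw in text.splitlines():
        match = FIELD_RE.match(raw.strip())
        if not match:
            continue
        key, value = match.group(1), match.group(2).strip()
        if key == first_keyword:
            current = {}
            entries.append(current)
        if current is not None:
            current[key] = value
    return entries


def pdf_to_text(path):
    """Hypothetical helper: extract plain text with pdfminer.six."""
    from pdfminer.high_level import extract_text  # requires pdfminer.six
    return extract_text(path)
```

With the text in hand, `parse_entries(pdf_to_text("uum-lexicon.pdf"))` would give a list of dicts that can be written out as a table. Whether the extracted text really comes out one field per line depends on the PDF layout, so this is only a starting point.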
So we can already prepare the data with Adobe Pro (this works even better). I think @MacyL has it; otherwise I'll ask Nathan. Then we have the concept list, which is nice to have anyway. In the meantime, shall we ask the authors whether they are interested in submitting their data to Dictionaria?
http://urum.lili.uni-bielefeld.de/download/docs/uum-lexicon.pdf
This list draws from WOLD, adds 90 more concepts, and provides alternative categories. It is long, and it is a PDF, so there is no way to quickly extract a linking to the Concepticon. The semantic categories would be interesting, though this is probably a long-term rather than a short-term list-to-map.