Upload all of GAZ to wikidata #3

Open
cmungall opened this issue Mar 28, 2018 · 11 comments
Labels
question questions or discussion items (comnsider splitting)

Comments

@cmungall
Member

We don't have the resources to update gaz.obo. Unless we can find a volunteer, it may make the most sense to upload it to Wikidata and have people update it there (having a way to export a gazetteer in OBO or OWL from Wikidata will be easy).

If people are in favor, I can look into getting some tips on how best to do this.

@pbuttigieg
Member

Given the amount of country and regional data in the wikidata system, I think this is a viable option, especially if we can somehow link the existing GAZ IDs to the WD entries. It may hurt the semantic rigor of GAZ a bit, but perhaps this can be mitigated by linking the GAZ/WD instances to ENVO classes.

We should consider circulating a link to this issue via some general mailing lists, just to be sure we have input.

@cmungall
Member Author

cmungall commented Mar 29, 2018

Yes, we should definitely link the WD instances to their types. As a first step I got an ENVO ID property registered in WD. Next is to upload the core ENVO graph (facilitated by EnvironmentOntology/envo#600).

I think WD would be open to including a lot of the GAZ relationship types. There are already some cognates of the RO relations in there, e.g. tributary. For anything else that does not fit, we can maintain our own axiom layer.

mail lists: I contacted the obo-discuss list.

When we're ready we can engage Wikidata. A lot of the technical parts should be quite straightforward with the infrastructure Andrew Su's group has put into place, but we will need to make a case for inclusion in WD and show that what we have is trustworthy.

@lschriml
Collaborator

Have you contacted Michael Ashburner about GAZ? I developed it with him.
If he is not able, or does not wish to, I will volunteer to take care of GAZ,
and can coordinate other volunteers to work on it.

And I can work with Andrew Su/Wikidata for integration options, as we are already working together.

Cheers,
Lynn

@cmungall
Member Author

cmungall commented Apr 1, 2018

Unfortunately Michael isn't able to develop it any further.

It would be great to have you as the caretaker, and coordinate with Andrew and others on wikidata integration, thanks!

For short term management of the gaz edit file this old thread may be useful:

http://gmod.827538.n3.nabble.com/cv-relationships-td4039290i20.html#a4042423

@mjy

mjy commented Apr 2, 2018

We spent a lot of time integrating just a couple of gazetteers with GIS layers (GADM, Natural Earth, TDWG hierarchy) in a relational DB format. The "normalization" is seriously non-trivial, and it never ends.

I too question whether an ontology is the right solution for these data. Wikidata, assuming it can handle the shape data, may be a good solution, but even there it will likely fail unless the concepts of time and synonym at different levels are carefully worked out (e.g. same name [language specific], same time, different shape representation, same source of shape representation).

The most important issue to me is to represent your GIS data as shapes, so that you can compute. To my knowledge there is no means of reasoning across shapes in OWL, so again this suggests that GAZ is not particularly the best representation for maintaining these data.

@lschriml
Collaborator

lschriml commented Apr 2, 2018 via email

@Public-Health-Bioinformatics

Happy to hear GAZ could be synchronized with an updated resource like Wikidata.

I would put in a plug for linking countries like Yugoslavia as historic/archaic, perhaps using "instance of" "historical country" as Wikidata does (so I can avoid including them in selection lists). It would also be great to take in GeoNames IDs via Wikidata, as that is the other comprehensive open source resource I've seen.
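As a rough illustration of the selection-list idea, here is a minimal Python sketch. The record layout (`label`, `instance_of`) and the sample data are hypothetical and only stand in for whatever an export from Wikidata would actually provide.

```python
# Hypothetical records: each place carries the labels of its "instance of" types.
HISTORICAL_TYPE = "historical country"


def current_places(places, excluded_type=HISTORICAL_TYPE):
    """Return only the places that are not typed as `excluded_type`."""
    return [p for p in places if excluded_type not in p["instance_of"]]


# Illustrative sample data, not taken from GAZ or Wikidata.
places = [
    {"label": "Brazil", "instance_of": ["country"]},
    {"label": "Yugoslavia", "instance_of": ["historical country"]},
]

selection_list = [p["label"] for p in current_places(places)]
print(selection_list)
```

Filtering on a type rather than on a hand-kept exclusion list means newly typed historical entities would drop out of the list without extra curation.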

@mjy GAZ could perform a useful role in terms of reasoning via "located in", and "shares border with" type relations without expecting it to have reasoning power on GIS lat/long.
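To make the "located in" point concrete, here is a small Python sketch of the kind of inference meant: treating the relation as transitive and computing everything that contains a given place. The function name and the three sample assertions are illustrative assumptions, not GAZ content.

```python
from collections import defaultdict


def located_in_closure(direct):
    """Map each place to every place that contains it, directly or indirectly."""
    parents = defaultdict(set)
    for child, parent in direct:
        parents[child].add(parent)

    def ancestors(place, seen):
        # Depth-first walk up the "located in" assertions.
        for parent in parents.get(place, ()):
            if parent not in seen:
                seen.add(parent)
                ancestors(parent, seen)
        return seen

    all_places = {p for pair in direct for p in pair}
    return {place: ancestors(place, set()) for place in all_places}


# Illustrative (child, parent) assertions.
direct = [
    ("Salvador", "Bahia"),
    ("Bahia", "Brazil"),
    ("Brazil", "South America"),
]

closure = located_in_closure(direct)
print(sorted(closure["Salvador"]))
```

No coordinates are involved here; the answers are only as good as the curated assertions.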

Right now in GAZ it seems that municipality lists aren't complete, and I was hoping a Wikidata integration could resolve this. For example, I could find only GAZ "populated place, Brazil" (http://purl.obolibrary.org/obo/GAZ_00002831), versus the list at https://en.wikipedia.org/wiki/Municipalities_of_Brazil. I see that this info exists in Wikidata as instances of the "municipality of Brazil" class (https://www.wikidata.org/wiki/Q3184121). (We have some dynamic lookup trying to fetch the municipality for a user's biosample location.)

Lynn, much appreciated that you can organize the GAZ v2!

@mjy

mjy commented Apr 3, 2018

@Public-Health-Bioinformatics I agree that kind of reasoning might be useful, but your examples are getting exactly to my point.

All ontologies come with a certain learning curve: what are the concepts, how are they organized, are they complete, when were they updated, etc. While there is certainly a fairly steep learning curve behind the ways we can represent a GIS layer/shape, once we have a shape/point in place we can largely ignore all this baggage. Shapes will "just work" with respect to queries like intersection, nearness, containment, border sharing, etc., so I don't have to worry about whether someone curated a "located_in" assertion, or did a synchronization, etc.

So, playing the devil's advocate (in fact I actually wanted to use something like GAZ several years back when we came up with our GIS models): why try to replicate all of this functionality with human-made assertions that must be continually curated and understood, when you can depend on a parallel system that is specifically designed to address these questions (and address them very quickly, and with much higher precision)?
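For readers less familiar with the shape-based approach, here is a minimal Python sketch of a containment query computed from geometry alone, using a standard ray-casting test. It treats longitude/latitude as planar coordinates, and the rectangle is a made-up region rather than a real boundary; a real GIS stack would handle projections, holes, and multi-part shapes.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: is (lon, lat) inside the polygon [(lon, lat), ...]?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed
        # by a horizontal ray cast from the point.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


# Hypothetical rectangular "region" (not a real boundary).
region = [(-50.0, -25.0), (-40.0, -25.0), (-40.0, -15.0), (-50.0, -15.0)]

print(point_in_polygon(-45.0, -20.0, region))
print(point_in_polygon(-30.0, -20.0, region))
```

The containment answer comes straight from the coordinates, with no curated "located_in" statement needed.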

@Public-Health-Bioinformatics

I get that - I wouldn't look to OWL logic + GAZ to do what GIS queries do even if GAZ had lat/lon. For biosample descriptions though, it would be great to have comprehensive ontology identifiers for municipal and other levels of govt. such that they map over exactly to an updated GIS database of such things. For those users needing to enter lat/lon (e.g. for NCBI biosample data requirements), this could be looked up reliably - and immediately if in GAZ directly. So if GAZ can be updated with this information comprehensively via script from wikidata on a periodic basis, then I like that. Or perhaps sourced from GeoNames instead? In this vision, geo entity names and located_in relations are actually curated in wikidata or elsewhere. Only works if source database is satisfactory though.

@pbuttigieg pbuttigieg mentioned this issue Apr 28, 2019
@cmungall cmungall added the question questions or discussion items (comnsider splitting) label Apr 29, 2019
@cmungall
Member Author

cmungall commented Jul 3, 2019

Just an update on this: I have processed 4k of the 6k+ GAZ entries.

High confidence matches here:

https://github.com/cmungall/environments2wikidata/blob/master/matches/align-high-confidence-gaz.tsv

cmungall added a commit to INCATools/environments2wikidata that referenced this issue Aug 5, 2019
@cmungall
Member Author

cmungall commented Aug 5, 2019

All 6k entries are now processed!

Around 167k fairly high-confidence mappings. Note these can act as a seed to get more high-confidence ones.

https://github.com/cmungall/environments2wikidata/blob/master/matches/align-high-confidence-gaz.tsv

The complete subset of Wikidata in ttl plus all hypothetical matches are stored here:
https://osf.io/unga9/
(upload in progress)

Note we now also have a property in wikidata:
https://www.wikidata.org/wiki/Property:P6778
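
As a hedged sketch of how an export back to OBO might use this property: the Python below builds a SPARQL query for items carrying it and renders one result row as an OBO stanza. It assumes that P6778 is the GAZ identifier property, that its values are the numeric part of the GAZ ID, and that the `wdt:` and `rdfs:` prefixes are predefined by the query endpoint. The helper names and the sample row (including its placeholder GAZ value) are hypothetical, and nothing here contacts Wikidata.

```python
GAZ_PROPERTY = "P6778"  # assumption: the GAZ ID property linked above


def build_query(limit=100):
    """Build SPARQL selecting items that carry a GAZ ID, with English labels."""
    return (
        "SELECT ?item ?gaz ?label WHERE {\n"
        f"  ?item wdt:{GAZ_PROPERTY} ?gaz .\n"
        "  ?item rdfs:label ?label .\n"
        '  FILTER(LANG(?label) = "en")\n'
        "}\n"
        f"LIMIT {limit}"
    )


def to_obo_stanza(row):
    """Render one result row as an OBO [Term] stanza with a Wikidata xref."""
    qid = row["item"].rsplit("/", 1)[-1]
    return "\n".join([
        "[Term]",
        f"id: GAZ:{row['gaz']}",
        f"name: {row['label']}",
        f"xref: wikidata:{qid}",
    ])


# Hypothetical result row; the GAZ value is a placeholder, not a real ID.
row = {
    "item": "http://www.wikidata.org/entity/Q155",
    "gaz": "00000000",
    "label": "Brazil",
}

print(build_query(limit=10))
print(to_obo_stanza(row))
```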

4 participants