
Identify Wikidata reconciliation strategies for GAZ. #43

Open
andrawaag opened this issue Jan 6, 2023 · 3 comments

Comments

@andrawaag
Contributor

GAZ does not seem to have many mappings to external identifiers, if it has any at all. This makes aligning it with Wikidata particularly challenging.

To get all terms in GAZ covered in Wikidata, we would probably need to apply different strategies to check whether a term is already covered.

Where the label used in Wikidata exactly matches the term in GAZ, OpenRefine can be our friend. I used this tool, which is available in PAWS for example, to align GAZ countries with Wikidata.

I then continued with the GAZ terms for Suriname. So far, all of them exist in Wikidata, but most use a different spelling variant. I will try to add all GAZ terms for that country manually.

So far, two strategies have been applied:

  1. Where the terms match exactly in Wikidata, we can rely on OpenRefine.
  2. Where the terms exist but with differences in spelling, manual curation by a curator with local knowledge is required.
  3. ...
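The first two strategies could be combined into one pass: accept a GAZ term automatically only when its normalized label matches exactly one Wikidata label, and queue everything else for a curator. The Python below is a minimal sketch under assumptions: `normalize` and `reconcile` are hypothetical helpers, the inputs are plain label dictionaries (for example exported from GAZ and from a Wikidata query), the IDs in the example are placeholders, and `difflib` similarity merely stands in for whatever fuzzy matching OpenRefine or a curator would apply.

```python
from __future__ import annotations

import difflib
import unicodedata


def normalize(label: str) -> str:
    """Lower-case, strip accents and collapse whitespace so trivial variants compare equal."""
    decomposed = unicodedata.normalize("NFKD", label)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.lower().split())


def reconcile(gaz_labels: dict[str, str], wd_labels: dict[str, str], cutoff: float = 0.8):
    """Split GAZ terms into exact matches, candidates for review, and unmatched terms.

    gaz_labels maps GAZ ID -> label; wd_labels maps Wikidata QID -> label.
    """
    # Index Wikidata items by normalized label; one label may belong to several items.
    index: dict[str, list[str]] = {}
    for qid, label in wd_labels.items():
        index.setdefault(normalize(label), []).append(qid)

    exact: dict[str, str] = {}
    review: dict[str, list[str]] = {}
    unmatched: list[str] = []
    for gaz_id, label in gaz_labels.items():
        key = normalize(label)
        hits = index.get(key, [])
        if len(hits) == 1:
            # Strategy 1: a single exact label match can be accepted automatically.
            exact[gaz_id] = hits[0]
        elif hits:
            # Same label on several items (homonyms): a curator has to choose.
            review[gaz_id] = hits
        else:
            # Strategy 2: spelling variants become suggestions for manual curation.
            close = difflib.get_close_matches(key, list(index), n=3, cutoff=cutoff)
            if close:
                review[gaz_id] = [qid for match in close for qid in index[match]]
            else:
                unmatched.append(gaz_id)
    return exact, review, unmatched


# Hypothetical example data (placeholder IDs, not real GAZ or Wikidata identifiers).
gaz = {"GAZ:0001": "Paramaribo", "GAZ:0002": "Nieuw Nickerie", "GAZ:0003": "Xyzzy"}
wd = {"Q1": "Paramaribo", "Q2": "Nieuw-Nickerie", "Q3": "Marowijne"}
print(reconcile(gaz, wd))  # → ({'GAZ:0001': 'Q1'}, {'GAZ:0002': ['Q2']}, ['GAZ:0003'])
```

Ambiguous exact matches go to the review queue rather than being resolved automatically, in line with the point that spelling variants and homonyms need a curator with local knowledge.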
@cmungall
Member

cmungall commented Jan 6, 2023

I started a repo with some plans in it here:

https://github.com/INCATools/environments2wikidata

There are so many terms that manual curation will be hard, but we can use ontology axioms to help with disambiguation...

There is lots of old code; I will try to update it...
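One way the axioms could help, assuming each GAZ term has a located-in/part-of parent and that some ancestors (for example countries) are already aligned: walk up the GAZ hierarchy to the nearest mapped ancestor and keep only Wikidata candidates located within it. This is a sketch, not the approach in the environments2wikidata repo; the helpers are hypothetical, and the `candidates` input (candidate QID mapped to the set of QIDs it lies within, e.g. the transitive closure of Wikidata's "located in the administrative territorial entity", P131) is assumed to be fetched separately.

```python
from __future__ import annotations


def nearest_mapped_ancestor(term: str, parent_of: dict[str, str],
                            gaz_to_wd: dict[str, str]) -> str | None:
    """Follow GAZ parent links upwards until an ancestor that is already mapped to Wikidata."""
    seen: set[str] = set()
    current = parent_of.get(term)
    while current is not None and current not in seen:  # `seen` guards against cycles
        if current in gaz_to_wd:
            return gaz_to_wd[current]
        seen.add(current)
        current = parent_of.get(current)
    return None


def filter_candidates(term: str, parent_of: dict[str, str], gaz_to_wd: dict[str, str],
                      candidates: dict[str, set[str]]) -> list[str]:
    """Keep Wikidata candidates located within the term's nearest mapped GAZ ancestor.

    candidates maps a candidate QID to the set of QIDs it is (transitively) located in.
    """
    anchor = nearest_mapped_ancestor(term, parent_of, gaz_to_wd)
    if anchor is None:
        # No aligned ancestor yet, so the hierarchy cannot narrow anything down.
        return sorted(candidates)
    return sorted(qid for qid, ancestors in candidates.items() if anchor in ancestors)


# Hypothetical data: GAZ:A is part of GAZ:B, which is part of GAZ:C (already mapped to Q3).
parent_of = {"GAZ:A": "GAZ:B", "GAZ:B": "GAZ:C"}
gaz_to_wd = {"GAZ:C": "Q3"}
candidates = {"Q5": {"Q9", "Q3"}, "Q6": {"Q99"}}
print(filter_candidates("GAZ:A", parent_of, gaz_to_wd, candidates))  # → ['Q5']
```

Falling back to all candidates when no ancestor is aligned keeps the filter conservative: it only removes candidates when the hierarchy gives positive evidence against them.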

@andrawaag
Contributor Author

Today I tried to add as many GAZ identifiers for Suriname to Wikidata as possible (see: https://w.wiki/6CVW).


This was mainly a manual curation step, in which I searched for the names in Wikidata and added the respective GAZ identifiers.
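The name lookup part of this step could be partly scripted with the Wikidata `wbsearchentities` API, which returns candidate items for a search string. This sketch only collects candidates for a curator to review and writes nothing to Wikidata; the function names and the User-Agent string are hypothetical placeholders.

```python
import json
import urllib.parse
import urllib.request

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def search_url(name: str, language: str = "en", limit: int = 10) -> str:
    """Build a wbsearchentities request that looks up items by label or alias."""
    params = {
        "action": "wbsearchentities",
        "search": name,
        "language": language,
        "type": "item",
        "limit": limit,
        "format": "json",
    }
    return WIKIDATA_API + "?" + urllib.parse.urlencode(params)


def search_candidates(name: str, language: str = "en", limit: int = 10):
    """Return (QID, label, description) tuples for a curator to review."""
    request = urllib.request.Request(
        search_url(name, language, limit),
        # Wikimedia asks API clients to identify themselves; this string is a placeholder.
        headers={"User-Agent": "gaz-wikidata-sketch/0.1 (hypothetical contact)"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        data = json.load(response)
    return [
        (hit["id"], hit.get("label", ""), hit.get("description", ""))
        for hit in data.get("search", [])
    ]
```

For example, `search_candidates("Nieuw Nickerie", language="nl")` would list matching items; searching in Dutch may surface spellings that the English labels lack. The curator still decides which item, if any, receives the GAZ identifier.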

@lschriml
Collaborator

lschriml commented Jan 9, 2023

To edit GAZ, make a pull request that edits the GAZ_countries.owl file.
As the full GAZ is quite large, we are no longer editing the full file.

To edit:
First, I would check the gaz.owl file to make sure the locations you want are not already in it.
I would recommend using a new ID space:

GAZ:$sequence(8,33333333,44444444)
This will not conflict with ID spaces that are already in use.

Cheers,
Lynn
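Assuming `$sequence(8,33333333,44444444)` means 8-digit numeric IDs from 33333333 to 44444444 (an interpretation of the pattern, not a documented spec), the next free ID could be picked as below. `next_free_gaz_id` is a hypothetical helper; it accepts existing IDs in both CURIE (`GAZ:<digits>`) and OBO PURL (`.../GAZ_<digits>`) form, as they might be collected from gaz.owl and GAZ_countries.owl.

```python
import re

# Accepts "GAZ:33333333" as well as PURL forms such as ".../obo/GAZ_33333333".
GAZ_ID = re.compile(r"(?:.*[/#])?GAZ[:_](\d+)")


def next_free_gaz_id(existing_ids, start=33_333_333, end=44_444_444, digits=8):
    """Return the lowest unused GAZ ID in [start, end], zero-padded to `digits`."""
    used = set()
    for identifier in existing_ids:
        match = GAZ_ID.fullmatch(identifier)
        if match:
            used.add(int(match.group(1)))
    candidate = start
    while candidate in used:  # skip IDs that are already taken
        candidate += 1
    if candidate > end:
        raise ValueError(f"no free GAZ IDs left in {start}-{end}")
    return f"GAZ:{candidate:0{digits}d}"


existing = ["GAZ:33333333", "http://purl.obolibrary.org/obo/GAZ_33333334", "GAZ:00000001"]
print(next_free_gaz_id(existing))  # → GAZ:33333335
```

Picking the lowest unused number keeps new IDs inside the reserved range, so they cannot collide with the older GAZ IDs that sit outside it.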
