Author disambiguation #77

Open
3 of 12 tasks
eloiferrer opened this issue Jun 16, 2023 · 4 comments
Labels: enhancement (New feature or request)

Comments

@eloiferrer
Member

eloiferrer commented Jun 16, 2023

Issue description:
The current importers (CRAN, zbMath, polyDB) create entities for authors using ORCID ID, zbMath ID or no identifier.
For the cases in which an identifier exists, authors might have been created more than once by different importers.
Duplicate authors should be identified, merged and completed with information from Wikidata.
The dataset mentioned here (MaRDI4NFDI/portal-compose#344) can be useful for the task.

TODOS:

  • For each author with a zbMath ID check if the Wikidata QID can be found
  • For each author with an ORCID ID check if the Wikidata QID can be found
  • For each author with a Wikidata QID, import the zbMath ID, ORCID and arXiv author ID where available.
  • Try to get more ORCID IDs with the zbMath API; see "Add author names" (portal-compose#487)
  • Check arXiv author ID in e.g. https://arxiv.org/a/0000-0002-7970-7855.html
  • Merge duplicate entities.
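
The first two TODO items (finding a Wikidata QID from a zbMath ID or an ORCID) could be sketched roughly as below. This is only an illustration, not the importer code: the function names are hypothetical, and the Wikidata property IDs (P1556 for zbMATH author ID, P496 for ORCID iD, P4594 for arXiv author ID) are quoted from memory and should be verified before use. The sketch only builds the SPARQL query text and parses a response in the standard SPARQL JSON results format; sending the query to an endpoint is left out.

```python
import json

# Assumed Wikidata property IDs; double-check them on wikidata.org.
WIKIDATA_PROPS = {
    "zbmath": "P1556",  # zbMATH author ID
    "orcid": "P496",    # ORCID iD
    "arxiv": "P4594",   # arXiv author ID
}


def build_qid_lookup_query(id_kind, identifiers):
    """Build a SPARQL query that maps a batch of external IDs to Wikidata items."""
    prop = WIKIDATA_PROPS[id_kind]
    # json.dumps yields a double-quoted, escaped literal that SPARQL accepts.
    values = " ".join(json.dumps(identifier) for identifier in identifiers)
    return (
        "SELECT ?item ?id WHERE {\n"
        f"  VALUES ?id {{ {values} }}\n"
        f"  ?item wdt:{prop} ?id .\n"
        "}"
    )


def parse_bindings(result):
    """Turn a SPARQL JSON result into {external_id: [QID, ...]}."""
    qids_by_id = {}
    for row in result["results"]["bindings"]:
        # Item URIs look like http://www.wikidata.org/entity/Q42.
        qid = row["item"]["value"].rsplit("/", 1)[-1]
        qids_by_id.setdefault(row["id"]["value"], []).append(qid)
    return qids_by_id
```

Keeping a list of QIDs per identifier, rather than a single value, makes ambiguous matches visible instead of silently overwriting them.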

Acceptance-Criteria

Checklist for this issue:

  • Assignee has been set for this issue
  • All fields of the issue have been filled
  • Issue has been assigned to the main project
  • Code was merged
  • Feature branch has been deleted and issues were updated / closed
  • Issue is tracked by an epic, or the label 'non-epic' is set to the issue.
@eloiferrer
Member Author

The ORCIDs for all the zbMath authors in https://zenodo.org/records/7378860 have been inserted.

Current statistics in the KG:

  • Humans = 1178288

  • Humans with zbmath ID = 1117009

  • Humans with ORCID ID = 40109

  • Humans with zbmath ID and Wikidata = 40753

  • Humans with zbmath ID and ORCID = 32619

  • Humans with arXiv author ID = 127

Next step: get Wikidata QID for as many humans as possible:

  • given the zbMath ID & given the ORCID ID

eloiferrer changed the title from "Author disambiguator/synchronizer" to "Author disambiguation" on Feb 22, 2024
@eloiferrer
Member Author

eloiferrer commented Feb 26, 2024

Using the zbMath IDs, I have matched authors to items available in Wikidata. Only ~5% of the zbMath authors exist in Wikidata (with the zbMath identifier). For those where an ORCID was present, it has also been imported.

Current statistics:

  • Humans = 1178287
  • Humans with zbmath ID = 1117010
  • Humans with ORCID ID = 41815
  • Humans with zbmath ID and Wikidata = 49777
  • Humans with zbmath ID and ORCID = 34324
  • Humans with arXiv author ID = 139
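
As a quick check of the "~5%" figure, the coverage can be recomputed from the counts listed above. This is just arithmetic on the reported numbers; the helper name is made up for the example.

```python
def coverage(part, total):
    """Return part as a percentage of total."""
    return 100.0 * part / total


# Counts copied from the statistics in this comment.
zbmath_authors = 1117010
with_wikidata = 49777
with_orcid = 34324

print(f"zbMath authors with Wikidata QID: {coverage(with_wikidata, zbmath_authors):.1f}%")
print(f"zbMath authors with ORCID: {coverage(with_orcid, zbmath_authors):.1f}%")
```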

eloiferrer self-assigned this on Feb 26, 2024
@eloiferrer
Member Author

I've imported further Wikidata QIDs based on the ORCIDs currently in the KG.
I've also merged several authors that had the same ORCID ID.

Current statistics:

  • Humans = 1178923
  • Humans with zbmath ID = 1117010
  • Humans with ORCID ID = 41577
  • Humans with zbmath ID and Wikidata = 54884
  • Humans with zbmath ID and ORCID = 34331
  • Humans with arXiv author ID = 140
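
Finding the authors that share an ORCID, as a step before merging them, might look like the sketch below. It is not the code that was actually run: the record layout (`entity_id`, `orcid`) and the function names are assumptions for illustration, and the merge itself is not shown.

```python
from collections import defaultdict


def normalize_orcid(raw):
    """Reduce an ORCID given as a bare ID or as a URL to the bare ID form."""
    return raw.strip().rstrip("/").rsplit("/", 1)[-1].upper()


def group_by_orcid(authors):
    """Return {orcid: [entity_id, ...]} for ORCIDs used by more than one author."""
    groups = defaultdict(list)
    for author in authors:
        orcid = author.get("orcid")
        if orcid:  # skip authors without an ORCID
            groups[normalize_orcid(orcid)].append(author["entity_id"])
    return {orcid: ids for orcid, ids in groups.items() if len(ids) > 1}
```

Normalizing first matters because different importers may store the same ORCID as a bare ID or as an `https://orcid.org/...` URL, which would otherwise hide duplicates.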

@eloiferrer
Member Author

eloiferrer commented Feb 27, 2024

Wikidata has author items that contain two zbMath IDs. In most cases this is wrong, which leads to our knowledge graph having the same Wikidata QID for two different zbMath authors.
See cases here: http://tinyurl.com/27d65qov
This would require some manual disambiguation.
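
To build a review list for that manual disambiguation, the conflicting cases could be collected along these lines. This is a sketch under assumptions: the input is taken to be (zbMath ID, Wikidata QID) pairs exported from the knowledge graph, and the function name is hypothetical.

```python
from collections import defaultdict


def find_shared_qids(links):
    """Return {qid: [zbmath_id, ...]} for QIDs attached to more than one zbMath author."""
    zbmath_by_qid = defaultdict(set)
    for zbmath_id, qid in links:
        # A set ignores repeated pairs, so only distinct zbMath IDs count.
        zbmath_by_qid[qid].add(zbmath_id)
    return {
        qid: sorted(ids)
        for qid, ids in sorted(zbmath_by_qid.items())
        if len(ids) > 1
    }
```

Each entry in the result is one Wikidata item that needs a decision: either the two zbMath profiles really are the same person, or one of the IDs on the Wikidata item is wrong.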
