Backfill publication identifiers #1359

Open
peetucket opened this issue Jun 10, 2021 · 1 comment


peetucket commented Jun 10, 2021

We have many publications for which we have one identifier (e.g. a PMID) but not the other identifiers (e.g. a DOI) for the same publication. This can happen because the original harvest source did not return all known identifiers, or because the user entered the publication manually and didn't provide all identifiers.

The lack of identifiers can prevent us from pushing publications to ORCID, or can cause duplicates when they are pushed to ORCID (see https://docs.google.com/document/d/1ZfNmfBzPTYm7aJpwrWAx6nXHvVvt1AfkOmSceOxCoXo).

The lack of identifiers also makes the dataset potentially less useful for research intelligence purposes.

It would be beneficial to backfill publications with other identifiers where available. This would require an API or other data source that could be fed a known identifier from our database (e.g. a PMID) and return other known identifiers (e.g. a DOI) for the same publication. We would then augment our publication record with each new identifier (in the PublicationIdentifier table, and then denormalized into the pub_hash).
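The backfill step described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `Publication` struct, `missing_identifiers`, `backfill!` and `STUB_LOOKUP` are hypothetical names that are not in our codebase, and the lookup is stubbed with a lambda rather than calling a real API. The PMID/DOI pair is the one from the Links AMR example in the comment below.

```ruby
# Hypothetical, simplified stand-in for a publication record. In the real
# application the identifiers live in the PublicationIdentifier table.
Publication = Struct.new(:id, :identifiers)

# Returns the identifiers the lookup knows about but the publication lacks.
# `lookup` is any callable taking (type, value) and returning a Hash of
# identifier type => value, e.g. { 'doi' => '10.1118/1.598623' }.
def missing_identifiers(publication, lookup)
  publication.identifiers.each_with_object({}) do |(type, value), found|
    lookup.call(type, value).each do |other_type, other_value|
      next if other_value.nil? || other_value.to_s.empty?
      next if publication.identifiers.key?(other_type) # never overwrite existing data

      found[other_type] ||= other_value
    end
  end
end

# Adds the missing identifiers to the record and returns what was added.
def backfill!(publication, lookup)
  added = missing_identifiers(publication, lookup)
  publication.identifiers.merge!(added)
  added
end

# Stubbed lookup (assumption): a real implementation would call an external API.
STUB_LOOKUP = lambda do |type, value|
  known = { %w[pmid 10435530] => { 'doi' => '10.1118/1.598623', 'pmid' => '10435530' } }
  known.fetch([type, value], {})
end

pub = Publication.new(1, { 'pmid' => '10435530' })
added = backfill!(pub, STUB_LOOKUP)
# `added` holds only the newly found DOI; the existing PMID is left untouched.
```

Keeping the lookup as an injected callable means the same backfill logic could be pointed at whichever API is chosen, and existing identifiers are never overwritten by the remote source.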

Potential APIs to use:

@peetucket

Investigated the Clarivate Links AMR (Article Match Retrieval) client. In our code, we can currently look up any WOS ID and get back PMIDs and DOIs:

Clarivate::LinksClient.new.links(['WOS:A1976BW18000001'], fields: %w[doi pmid])
=> {"WOS:A1976BW18000001"=>{"doi"=>"10.5860/crl_37_03_205"}}

An experimental branch #1360 shows how you can also pass a DOI to get back other identifiers. It could be extended to accept a PMID and return other identifiers as well:

Clarivate::LinksClient.new.links_doi(['10.1118/1.598623'], fields: %w[issn pmid ut])
=> {"10.1118/1.598623"=>{"issn"=>"0094-2405", "pmid"=>"10435530", "ut"=>"000081515000015"}}

Other identifiers known to Links AMR are PMID, ISBN, ISSN. See http://wokinfo.com/directlinks/amrfaq/#
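A response shaped like the ones above would need some filtering before it is written back to our records. The sketch below assumes the hash shape shown in the `links_doi` example. `publication_identifiers` and `JOURNAL_LEVEL_FIELDS` are hypothetical names, not part of `Clarivate::LinksClient`. It drops the ISSN because an ISSN identifies the journal rather than the individual publication, and it drops blank values.

```ruby
# Fields that identify the journal rather than the article (assumption: only
# ISSN is requested here; extend the list if other such fields are added).
JOURNAL_LEVEL_FIELDS = %w[issn].freeze

# Takes a response of the form { queried_id => { field => value } } and keeps
# only the non-blank, publication-level identifiers for each queried id.
def publication_identifiers(response)
  response.each_with_object({}) do |(queried_id, fields), result|
    result[queried_id] = fields.reject do |name, value|
      JOURNAL_LEVEL_FIELDS.include?(name) || value.to_s.strip.empty?
    end
  end
end

# Response copied from the links_doi example above.
response = {
  '10.1118/1.598623' => { 'issn' => '0094-2405', 'pmid' => '10435530', 'ut' => '000081515000015' }
}
ids = publication_identifiers(response)
# `ids` maps the DOI to its pmid and ut values, with the issn removed.
```

Note that the `ut` value comes back without the `WOS:` prefix used in the first example, so it may need normalizing before it is compared with the WOS IDs we already store.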
