Invalid URLs and handles break JSON-LD #6542

Open
rdmpage opened this issue May 17, 2022 · 3 comments

rdmpage commented May 17, 2022

There are cases where ORCID URLs and Handles are not valid URIs, which breaks attempts to parse the JSON-LD as RDF. This happens in about 10-20 records in a sample of 5000 that I am working with. Not super common, but enough to break things.

URLs sometimes lack the http prefix, e.g. the personal page for https://orcid.org/0000-0003-1802-2649. This breaks the RDF, and also the ORCID web page: the personal page for Andrey I. Khalaim is given as https://orcid.org/www.zin.ru/labs/insects/hymenopt/personalia/khalaim/ instead of https://www.zin.ru/labs/insects/hymenopt/personalia/khalaim/

Ideally a simple regular expression to check that users have actually entered a URL would catch these.
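
A minimal sketch of such a check, in Python. The pattern and the function name `has_http_prefix` are illustrative assumptions, not anything ORCID actually runs; it only tests for an `http://` or `https://` prefix followed by a host, which is enough to reject the example above.

```python
import re

# Hypothetical pattern: an http(s) scheme, a non-empty host, then anything
# that contains no whitespace. Not a full URL validator.
URL_RE = re.compile(r"^https?://[^\s/?#]+\S*$", re.IGNORECASE)


def has_http_prefix(value: str) -> bool:
    """Return True if value looks like an absolute http(s) URL."""
    return URL_RE.match(value.strip()) is not None


# The value as entered, and the value as it should have been entered.
print(has_http_prefix("www.zin.ru/labs/insects/hymenopt/personalia/khalaim/"))
print(has_http_prefix("https://www.zin.ru/labs/insects/hymenopt/personalia/khalaim/"))
```

The first call returns False and the second True, so a form could reject the input or prompt the user to add the scheme.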

For Handles there are some very bad examples at https://orcid.org/0000-0003-2573-1371 such as:

2018 | Dissertation/Thesis
SOURCE-WORK-ID: cv-prod-id-513032
HANDLE: Cecchetti, Arianna. "Effects of tourism operations on the bahavioural patterns of dolphin populations off the Azores with particular emphasis on the common dolphin (Delphinus delphis)". 2018. 112 p.. (Dissertação de Mestrado em Biologia). Ponta Delgada: U
HANDLE: http://hdl.handle.net/10400.3/4982
OTHER-ID: 101606494
CONTRIBUTORS: Cecchetti, Arianna

Note that first Handle is http://hdl.handle.net/cecchetti,%20arianna.%20%22effects%20of%20tourism%20operations%20on%20the%20bahavioural%20patterns%20of%20dolphin%20populations%20off%20the%20azores%20with%20particular%20emphasis%20on%20the%20common%20dolphin%20(delphinus%20delphis)%22.%202018.%20112%20p..%20(disserta%C3%A7%C3%A3o%20de%20mestrado%20em%20biologia).%20ponta%20delgada:%20u

This is probably a trivial error in the user-supplied content, but ideally this would be caught on input. I realise that dealing with user-supplied content can be a bit of a nightmare.
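
As a rough illustration of catching this on input, the sketch below accepts a value only if it has the shape prefix/suffix after any `hdl.handle.net` resolver prefix is removed. It assumes a numeric, dot-separated prefix (as in `10400.3`), which covers the handles shown here but is stricter than the Handle System strictly requires; `looks_like_handle` is a hypothetical name.

```python
import re

# Assumption: the naming authority (prefix) is digits separated by dots,
# followed by "/" and a suffix that contains no whitespace.
HANDLE_RE = re.compile(r"^\d+(?:\.\d+)*/\S+$")
RESOLVER_RE = re.compile(r"^https?://hdl\.handle\.net/", re.IGNORECASE)


def looks_like_handle(value: str) -> bool:
    """Return True if value is a bare handle or a hdl.handle.net URL for one."""
    bare = RESOLVER_RE.sub("", value.strip())
    return HANDLE_RE.match(bare) is not None


print(looks_like_handle("http://hdl.handle.net/10400.3/4982"))
print(looks_like_handle('Cecchetti, Arianna. "Effects of tourism operations on the ...'))
```

The real handle passes and the citation text fails, because it does not start with a numeric prefix.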


rdmpage commented Oct 8, 2022

Further examples: for 0000-0003-2861-949X we have DOIs that are broken, e.g.:

[Image: Screenshot 2022-10-08 at 10 19 21]

Note the | in the middle. These DOIs break any attempt to parse JSON-LD from orcid.org.
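
The reason a `|` is fatal is that the Turtle and SPARQL grammars exclude it (along with whitespace, control characters and `<>"{}^` plus backtick and backslash) from IRIs. A small sketch of a pre-export check follows; the DOI used is a made-up placeholder, not the one in the screenshot, and `illegal_iri_chars` is a hypothetical helper.

```python
import re

# Characters that may not appear inside an IRI in Turtle/SPARQL:
# control characters and space (0x00-0x20), and < > " { } | ^ ` \
ILLEGAL_IRI_CHARS = re.compile(r'[\x00-\x20<>"{}|^`\\]')


def illegal_iri_chars(value: str) -> list[str]:
    """Return the distinct characters in value that would invalidate an IRI."""
    return sorted(set(ILLEGAL_IRI_CHARS.findall(value)))


# Placeholder DOIs for illustration only.
print(illegal_iri_chars("https://doi.org/10.1234/abc|10.1234/def"))  # ['|']
print(illegal_iri_chars("https://doi.org/10.1234/abc"))              # []
```

Any value for which this returns a non-empty list would need to be rejected, split, or percent-encoded before being emitted as an `@id`.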

@TomDemeranville
Contributor

That example has sadly been added by a member, and we see this behaviour from several of our clients. We do normalise many of our identifiers in API 3.0, but don't do this for everything. This one has probably got past our parser because it has two DOIs in it. Argh.


rdmpage commented Apr 21, 2023

Further to the list of woes with ORCID JSON-LD, note that sameAs should be a list of one or more URIs, but ORCID often includes simple strings such as numbers. These are not valid RDF.

Note that this may be slightly confusing because of the way JSON-LD is output: sameAs appears as a list of strings (e.g., "http://some.url"), but it is a list of URIs, not strings. If you look at the context at https://schema.org/docs/jsonldcontext.json you will see sameAs defined as:

"sameAs": {
  "@id": "schema:sameAs",
  "@type": "@id"
},

This may seem a small point, but it breaks any use of sameAs in SPARQL queries, because properly constructed queries expect the values of sameAs to be URIs, not literals.
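
To make the distinction concrete, here is a sketch that splits the `sameAs` values of an already-loaded JSON-LD document into those that could serve as URIs and those that are plain strings. The test is deliberately crude (it only checks for a leading URI scheme), the sample values are invented, and `split_same_as` is a hypothetical name.

```python
import re

# A URI scheme: a letter, then letters, digits, "+", "." or "-", then ":".
SCHEME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9+.\-]*:")


def split_same_as(doc: dict) -> tuple[list[str], list[str]]:
    """Split doc["sameAs"] into (values with a URI scheme, everything else)."""
    values = doc.get("sameAs", [])
    if isinstance(values, str):  # JSON-LD allows a single value instead of a list
        values = [values]
    uris = [v for v in values if SCHEME_RE.match(str(v))]
    others = [v for v in values if not SCHEME_RE.match(str(v))]
    return uris, others


# Invented example: one proper URI and one bare number.
uris, others = split_same_as({"sameAs": ["http://some.url", "12345"]})
print(uris)    # ['http://some.url']
print(others)  # ['12345']
```

Because the schema.org context types `sameAs` as `@id`, a JSON-LD processor treats every value as an IRI reference, so the values in the second list end up as malformed or unintended IRIs rather than as literals.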

It would be great if ORCID were to actually use the RDF it exports ("dog-fooding"), because if it did it would rapidly discover that its RDF output has problems. This is a pity because this is potentially a fabulous resource.
