improve robustness of CrossRef metadata import via DOI: trailing whitespace causes URISyntaxException #9384

Open
saschaszott opened this issue Feb 29, 2024 · 2 comments · May be fixed by #9385
Labels: bug; tools: import-sources (Related to "Live Import" Sources feature, allowing import of content via external APIs.)

Comments

@saschaszott
Contributor

Bug Description

The method SearchByIdCallable.call() cannot handle queries that contain trailing whitespace. For example, the search query "10.1111/jan.12345 " with a trailing space (URL /import-external?entity=Publication&sourceId=crossref&query=10.1111%2Fjan.12345%20) results in a java.net.URISyntaxException.

The exception message is (index 59 is the position of the trailing space after the DOI):

Illegal character in path at index 59: https://api.crossref.org/works/filter=doi:10.1111/jan.12345

This error is caused by the following line, which passes the untrimmed ID to URIBuilder:

URIBuilder uriBuilder = new URIBuilder(url + "/" + ID);
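
A minimal sketch of one possible fix, assuming it is acceptable to strip surrounding whitespace from the ID before the URI is built. It uses java.net.URI from the standard library instead of URIBuilder so that it runs without extra dependencies; the class name DoiQuerySketch, the helper buildWorksUri and the base URL are hypothetical and are not taken from the DSpace code base.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class DoiQuerySketch {

    // Hypothetical helper: strips leading and trailing whitespace from the ID
    // before appending it to the base URL. Requires Java 11+ for String.strip().
    static URI buildWorksUri(String baseUrl, String id) throws URISyntaxException {
        String cleaned = (id == null) ? "" : id.strip();
        return new URI(baseUrl + "/" + cleaned);
    }

    public static void main(String[] args) throws Exception {
        String base = "https://api.crossref.org/works"; // assumed base URL
        String raw = "10.1111/jan.12345 ";              // note the trailing space

        // Without stripping, the space is an illegal character in the path.
        try {
            new URI(base + "/" + raw);
        } catch (URISyntaxException e) {
            System.out.println("untrimmed: " + e.getMessage());
        }

        // With stripping, the URI parses.
        System.out.println("trimmed: " + buildWorksUri(base, raw));
    }
}
```

The same stripping step would also cover leading whitespace, which matters for the other external sources mentioned in the comments.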
@saschaszott saschaszott added bug needs triage New issue needs triage and/or scheduling labels Feb 29, 2024
@floriangantner
Contributor

Similar behaviour can also be observed with other external sources.

For example, when importing a publication from ORCID (sandbox) on the DSpace demo page:

Screenshot 2024-02-29 at 14-33-01 DSpace Repository Import metadata from an external source

with leading whitespace before the ORCID iD

Screenshot 2024-02-29 at 14-32-53 DSpace Repository Import metadata from an external source

or trailing whitespace after the ORCID iD

Screenshot 2024-02-29 at 14-32-43 DSpace Repository Import metadata from an external source

@tdonohue tdonohue added tools: import-sources Related to "Live Import" Sources feature, allowing import of content via external APIs. and removed needs triage New issue needs triage and/or scheduling labels Feb 29, 2024
@tdonohue tdonohue added this to the 7.6.2 milestone Feb 29, 2024
@saschaszott saschaszott changed the title improve robustness of CrossRef metadata import via DOI: trailing whitespace cause URISyntaxException improve robustness of CrossRef metadata import via DOI: trailing whitespace causes URISyntaxException Feb 29, 2024
@alanorth
Contributor

alanorth commented Apr 15, 2024

@saschaszott related to this (not sure if it needs its own issue): if you try to use a DOI as a URI (with the protocol and doi.org), the results are interesting—the query takes a long time and there are 6 million results!

Screenshot 2024-04-15 at 14-39-24 CGSpace Import metadata from an external source

Results work as expected with just the DOI component:

Screenshot 2024-04-15 at 14-39-40 CGSpace Import metadata from an external source
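
One way the DOI-as-URI case could be handled is to reduce the query to the bare DOI before it is sent to the external source. The following is only a sketch under that assumption: the class DoiNormalizerSketch, the method normalizeDoi and the list of prefixes are hypothetical and do not describe how DSpace or CrossRef treat such queries.

```java
public class DoiNormalizerSketch {

    // Assumed set of prefixes a user might paste in front of a DOI.
    private static final String[] PREFIXES = {
        "https://doi.org/", "http://doi.org/",
        "https://dx.doi.org/", "http://dx.doi.org/",
        "doi:"
    };

    // Hypothetical helper: strips surrounding whitespace and, if present,
    // one known resolver or scheme prefix (matched case-insensitively).
    static String normalizeDoi(String query) {
        String s = (query == null) ? "" : query.strip();
        for (String prefix : PREFIXES) {
            if (s.regionMatches(true, 0, prefix, 0, prefix.length())) {
                return s.substring(prefix.length()).strip();
            }
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(normalizeDoi(" https://doi.org/10.1111/jan.12345 "));
        System.out.println(normalizeDoi("10.1111/jan.12345"));
    }
}
```

A query that is already a bare DOI passes through unchanged apart from the whitespace stripping.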
