Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Metadata Customization : url fieldType validator #9739

Closed
luddaniel opened this issue Jul 26, 2023 · 2 comments · Fixed by #9750
Closed

Metadata Customization : url fieldType validator #9739

luddaniel opened this issue Jul 26, 2023 · 2 comments · Fixed by #9750
Labels
Feature: Metadata Type: Bug a defect User Role: Depositor Creates datasets, uploads data, etc.
Milestone

Comments

@luddaniel
Copy link
Contributor

luddaniel commented Jul 26, 2023

Hello there,
We are facing an issue regarding url fieldType validation while using uncommon url.
Here is such url : https://archive.softwareheritage.org/swh:1:dir:561bfe6698ca9e58b552b4eb4e56132cac41c6f9;origin=https://github.com/gem-pasteur/macsyfinder;visit=swh:1:snp:1bde3cb370766b10132c4e004c7cb377979928d1;anchor=swh:1:rev:868637fce184865d8e0436338af66a2648e8f6e1 (generated by the website as a permalink)

Deeper analysis reveals that the url validator is based on org.apache.commons.validator.routines.UrlValidator and will fail on validation due to the // near 6f9;origin=https://githu.

UrlValidator as capability to be customized with options such as ALLOW_2_SLASHES that Allow two slashes in the path component of the URL :

UrlValidator urlValidator = new UrlValidator(schemes, UrlValidator.ALLOW_2_SLASHES);

The question now is : is it reasonable to add this option "for everyone and every url fields" ?
What do you think ?

Depending on the answer a PR could be considered.

@qqmyers
Copy link
Member

qqmyers commented Jul 26, 2023

FWIW: My understanding is that // is potentially an issue on the server the URL is for, e.g. for https://archive.softwareheritage.org in your example, rather than a potential problem for Dataverse (since Dataverse isn't trying to parse/interpret a URL in metadata).
So I'd think that changing this would be OK. It allows users to add potentially unusable URLs as metadata (if the referenced server can't handle them), or potentially to create two URLs for the same item (i.e. adding // adds an empty path segment that could be ignored meaning the URL with/without the // point to the same thing) but that seems like a minor concern (as one can add URLs that return 404s, etc.) given the benefit for your use case.
Others may see an issue, but I'd say go ahead with a PR if you don't get further comments.

@pdurbin
Copy link
Member

pdurbin commented Jul 26, 2023

I mean, the URL you mentioned definitely works when I click it: https://archive.softwareheritage.org/swh:1:dir:561bfe6698ca9e58b552b4eb4e56132cac41c6f9;origin=https://github.com/gem-pasteur/macsyfinder;visit=swh:1:snp:1bde3cb370766b10132c4e004c7cb377979928d1;anchor=swh:1:rev:868637fce184865d8e0436338af66a2648e8f6e1

So, yes, I think we should allow two slashes in URLs. And @qqmyers seems to think so too. @luddaniel please feel free to make a pull request!

luddaniel added a commit to Recherche-Data-Gouv/dataverse that referenced this issue Aug 2, 2023
luddaniel added a commit to Recherche-Data-Gouv/dataverse that referenced this issue Aug 2, 2023
@pdurbin pdurbin added Feature: Metadata Type: Bug a defect User Role: Depositor Creates datasets, uploads data, etc. labels Oct 13, 2023
stevenwinship pushed a commit that referenced this issue May 21, 2024
…the URL (#9750)

* #9739 - URLValidator now allows two slashes in the path component of the URL

* #9750 Adding a release note

---------

Co-authored-by: jeromeroucou <jeromeroucou@users.noreply.github.com>
@pdurbin pdurbin added this to the 6.3 milestone May 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Feature: Metadata Type: Bug a defect User Role: Depositor Creates datasets, uploads data, etc.
Projects
Status: Done
3 participants