Do not capitalize identifier #2154

Closed
datadavev opened this issue Jul 4, 2023 · 2 comments

Comments

@datadavev

Describe the bug
In the DataONE web UI, the citation for a resource is listed at the top of the page, and the identifier in it appears to be forced to uppercase. This is a problem because some resource identifiers are case-sensitive, so the displayed identifier can't be copied and used. For example:

image

provides HTTPS://PASTA.LTERNET.EDU/PACKAGE/METADATA/EML/KNB-LTER-NTL/428/1 but that URL returns 404. It should instead be shown as https://pasta.lternet.edu/package/eml/knb-lter-ntl/428/1 which at least resolves to something.

The solution is simple: never use text-transform: uppercase; (or any text-transform, for that matter) on cited resources.

To Reproduce
Steps to reproduce the behavior:

  1. Go to 'search.dataone.org'
  2. Click on 'any resource'
  3. Scroll down to 'top of page'
  4. See error - forced case of identifier.

Expected behavior
Any identifier should be shown in the case in which it was retrieved.

Screenshots
See above

@datadavev datadavev added the bug label Jul 4, 2023
@mbjones
Member

mbjones commented Jul 6, 2023

Thanks for the report, Dave. That does seem to be a problem. That block of the citation is displaying the identifier, which is not necessarily a URI. But I agree with you that we should be cautious and avoid reformatting it. DOIs are explicitly case-insensitive, which is how we got here, I think. But as you point out for URIs, the scheme and host parts are case-insensitive, and interpretation of the rest of the URI is scheme-specific -- for HTTP URIs, the spec says they SHOULD be treated as case-sensitive on the client side (although IIS treats them as case-insensitive and is still spec compliant).
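
As a rough illustration of the URI point, here is a minimal Python sketch that compares two URIs while ignoring case only in the scheme and host. The function names are hypothetical, and this is not full URI normalization (percent-encoding, default ports, and dot segments are ignored); it only shows why uppercasing a whole URL is not a safe transformation.

```python
from urllib.parse import urlsplit


def uri_comparison_key(uri: str) -> tuple:
    """Build a key that ignores case in the scheme and host only (hypothetical helper)."""
    parts = urlsplit(uri)
    # Keep any userinfo untouched; only the host[:port] portion is lowercased.
    userinfo, _, hostport = parts.netloc.rpartition("@")
    return (
        parts.scheme.lower(),
        userinfo,
        hostport.lower(),
        parts.path,      # compared exactly: may be case-sensitive
        parts.query,     # compared exactly
        parts.fragment,  # compared exactly
    )


def equivalent_uris(a: str, b: str) -> bool:
    """True if the two URIs differ at most in the case of their scheme and host."""
    return uri_comparison_key(a) == uri_comparison_key(b)


original = "https://pasta.lternet.edu/package/eml/knb-lter-ntl/428/1"
host_only_upper = "HTTPS://PASTA.LTERNET.EDU/package/eml/knb-lter-ntl/428/1"

print(equivalent_uris(host_only_upper, original))   # scheme/host case differs only
print(equivalent_uris(original.upper(), original))  # path case differs too
```

With this key, a URI whose scheme and host were uppercased still matches the original, while a fully uppercased URI does not, because its path no longer matches.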

But the bottom line for me is that we can't tell any of this universally, and so the identifier is a string which we should reflect back in the citation in exactly the case as the original. Even something as simple as a trailing slash could be significant in some schemes, and so we should respect the identifier as an immutable string. And let's keep in mind that these strings are DataONE Identifier objects, which are explicitly defined as Unicode strings with no whitespace or non-printing characters and with a max length of 800. See https://dataoneorg.github.io/api-documentation/apis/Types.html#Types.Identifier. I'm not sure what we should do about how that rule conflicts with DOI rules (where 10.18739/AAABBB and 10.18739/AAAbbb are the same identifier -- we would track them as distinct identifiers in DataONE).
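
The Identifier constraints quoted above could be checked with something like the following Python sketch. It is not taken from any DataONE code base: the function name is hypothetical, the length is counted in Unicode code points, "non-printing" is approximated with `str.isprintable()`, and rejecting the empty string is an extra assumption.

```python
MAX_IDENTIFIER_LENGTH = 800  # limit quoted in the comment above


def looks_like_dataone_identifier(value: str) -> bool:
    """Approximate check: non-empty, at most 800 characters,
    and free of whitespace and non-printing characters."""
    # Assumption: length is measured in code points and empty strings are rejected.
    if not 0 < len(value) <= MAX_IDENTIFIER_LENGTH:
        return False
    # str.isprintable() treats the ASCII space as printable, so test isspace() too.
    return not any(ch.isspace() or not ch.isprintable() for ch in value)


print(looks_like_dataone_identifier("doi:10.18739/AAABBB"))
print(looks_like_dataone_identifier("has a space"))
```

Note that the check never changes the string: case, trailing slashes, and every other character are left exactly as supplied.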

@datadavev
Author

Good point on DOIs. ARKs are a bit odd too, with hyphens being essentially "non characters" removed from a normalized view.

I guess ideally there would be some normalization rules applied for incoming identifiers to avoid things like doi:10.18739/AAABBB != doi:10.18739/AAAbbb.

But at least preserving the provided case avoids inadvertently breaking some things.
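
One way to picture the normalization idea while still preserving the provided case is to keep the original string for display and derive a separate key for equality checks. The Python sketch below is only an assumption about how that might look: `comparison_key` is a hypothetical name, DOIs are folded to lowercase for ASCII letters only, ARK hyphens are dropped as described above, and every other identifier is left untouched.

```python
def comparison_key(identifier: str) -> str:
    """Return a key for equality checks; the original string is never modified."""
    scheme, sep, rest = identifier.partition(":")
    scheme_lower = scheme.lower()
    if sep and scheme_lower == "doi":
        # Assumption: fold only ASCII letters, since DOIs are compared case-insensitively.
        folded = "".join(ch.lower() if ch.isascii() else ch for ch in rest)
        return f"doi:{folded}"
    if sep and scheme_lower == "ark":
        # Assumption: hyphens carry no meaning in ARKs, so drop them for comparison.
        return f"ark:{rest.replace('-', '')}"
    # Anything else is treated as an opaque, case-sensitive string.
    return identifier


a = "doi:10.18739/AAABBB"
b = "doi:10.18739/AAAbbb"
print(a == b)                                  # distinct as raw strings
print(comparison_key(a) == comparison_key(b))  # same under the DOI rule
```

The displayed citation would still use the string exactly as provided; only the derived key is used to decide whether two identifiers refer to the same thing.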

@robyngit robyngit added this to the 2.26.0 milestone Jul 10, 2023