directoryLabel leading '.' char for hidden directories is ignored (removed) #8807

Open
jernsting opened this issue Jun 18, 2022 · 6 comments
@jernsting

What steps does it take to reproduce the issue?
Create a new file in a dataset with a directoryLabel starting with a '.' character, then save the file.

  • When does this issue occur?
    The issue occurs when adding files to hidden directories at the top level. Hidden directories inside other directories work.

  • Which page(s) does it occur on?
    Upload Dataverse files (API and web interface)

  • What happens?
    The leading '.' character is ignored / removed. For example, a file added as .hidden/file is saved as hidden/file.

  • To whom does it occur (all users, curators, superusers)?
    all users

  • What did you expect to happen?
    A file added as .hidden/file should be saved in a hidden directory named .hidden

Which version of Dataverse are you using?
v. 5.9 build 750-fb24c87

Any related open or closed issues to this bug report?
datalad/datalad-dataverse#83

Screenshots:

  • Adding a file in a hidden dir
    [Screenshot 2022-06-18 at 11 17 04]
  • The hidden dir is no longer hidden
    [Screenshot 2022-06-18 at 11 17 13]
@bpoldrack

For the record: the same thing appears to happen at demo.dataverse.org, which runs a slightly newer Dataverse.

@poikilotherm
Contributor

poikilotherm commented Jun 21, 2022

Hat tip to @qqmyers for pointing me in the right direction.

The problem is in FileMetadata.setDirectoryLabel() -> StringUtil.sanitizeFileDirectory():

// Strip every leading '/', '-', '.' or ' ' character, one at a time.
while (value.startsWith("/") || value.startsWith("-") || value.startsWith(".") || value.startsWith(" ")) {
    value = value.substring(1);
}

@landreev @sekmiller according to git history, you wrote the code - there is no statement about why you cut off a leading dot, but this has obviously been a design decision. Before creating a PR to exclude single starting dots, maybe you can shed some light on this (I know, code is from 2019...)? Thx!

There is also a parameterized test demonstrating that the observed behavior is as designed.

{".some/path/to/a/directory", "some/path/to/a/directory", false},
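To make the reported behavior easy to reproduce outside Dataverse, here is a minimal sketch that copies the loop quoted above into a standalone class. The class and method names (LeadingCharStripDemo, stripLeading) are hypothetical and are not the actual Dataverse StringUtil code; only the loop condition is taken from the snippet in this comment.

```java
// Hypothetical stand-in for the quoted loop; not the real Dataverse class.
class LeadingCharStripDemo {

    // Removes every leading '/', '-', '.' or ' ' character, one at a time,
    // exactly as the loop quoted in this thread does.
    static String stripLeading(String value) {
        while (value.startsWith("/") || value.startsWith("-")
                || value.startsWith(".") || value.startsWith(" ")) {
            value = value.substring(1);
        }
        return value;
    }

    public static void main(String[] args) {
        // Top-level dot directory: the leading dot is lost.
        System.out.println(stripLeading(".hidden/file"));
        // Dot directory below another folder: left untouched.
        System.out.println(stripLeading("subfolder/.hidden"));
        // The input used in the parameterized test case.
        System.out.println(stripLeading(".some/path/to/a/directory"));
    }
}
```

Because the loop only looks at the start of the whole string, a dot that begins a later path segment is never examined, which matches the observation that hidden directories inside other directories work.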

@pdurbin
Member

pdurbin commented Jun 23, 2022

Here's the release note from 4.12: https://github.com/IQSS/dataverse/releases/tag/v4.12


A note on folder names:

In this release users are given an option to edit the folder names in the file metadata. Strict validation rules for the folder names are also introduced. Only the following characters are allowed: the alphanumerics, '_', '-', '.' and ' ' (white space). Some datafiles in your Dataverse may already have folder names saved in the database (if they were extracted from uploaded zip archives with folder structure). The following sanitizing rules will be applied to all the existing folder names in the database: any invalid characters will be replaced by the '.' character. Any sequences of dots will be further replaced with a single dot. For example, the folder name data&info/code=@137 will be converted to data.info/code.137. This update will be automatically applied to the database the first time this release is deployed.


From the above it seems like period/dot/full stop/. should be allowed. That note was added in PR #5650 which closed #5565.
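The two sanitizing rules quoted from the release note (replace invalid characters with '.', then collapse runs of dots) can be sketched roughly as follows. This is an illustration only, not the code from PR #5650: the class and method names are hypothetical, "alphanumerics" is assumed to mean ASCII letters and digits, and '/' is assumed to be kept as the folder separator.

```java
// Hypothetical sketch of the release-note rules; not the Dataverse code.
class FolderNameRulesSketch {

    static String sanitize(String folder) {
        // Assumption: allowed characters are ASCII alphanumerics, '_', '-',
        // '.', ' ' and the '/' separator. Everything else becomes '.'.
        String replaced = folder.replaceAll("[^A-Za-z0-9_\\-. /]", ".");
        // Any sequence of dots is collapsed into a single dot.
        return replaced.replaceAll("\\.+", ".");
    }

    public static void main(String[] args) {
        // The example from the release note.
        System.out.println(sanitize("data&info/code=@137"));
        // A leading dot survives these two rules on their own.
        System.out.println(sanitize(".hidden/file"));
    }
}
```

Under these assumptions the release-note example data&info/code=@137 becomes data.info/code.137, and .hidden/file passes through unchanged, so the removal of the leading dot has to come from a separate step.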

As observed in this issue it seems like leading periods are disallowed. This is reflected not only in the code above but in the SQL update script in the pull request above:

-- get rid of any leading or trailing slashes, spaces, '-'s and '.'s: 
UPDATE filemetadata SET directoryLabel = regexp_replace(directoryLabel, '^[/ .\-]+', '', '');
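For comparison with the Java loop, the pattern in this SQL statement can be tried out with java.util.regex. The sketch below is an assumption-laden illustration (the class and method names are hypothetical); it only mirrors the '^[/ .\-]+' pattern shown above, which handles leading characters. The trailing case mentioned in the SQL comment is not part of the statement quoted here, so it is not reproduced.

```java
import java.util.regex.Pattern;

// Hypothetical sketch mirroring the SQL pattern; not the Dataverse code.
class LeadingPatternSketch {

    // Same character class as the SQL update: '/', ' ', '.' and '-',
    // anchored at the start of the string.
    private static final Pattern LEADING = Pattern.compile("^[/ .\\-]+");

    static String stripLeading(String directoryLabel) {
        return LEADING.matcher(directoryLabel).replaceFirst("");
    }

    public static void main(String[] args) {
        System.out.println(stripLeading(".hidden/file"));
        System.out.println(stripLeading("subfolder/.hidden"));
        System.out.println(stripLeading("/ -.data"));
    }
}
```

As with the loop, only the start of the whole label is matched, so a dot directory below another folder is unaffected.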

@pdurbin
Member

pdurbin commented Apr 9, 2024

If we supported dot directories like .datalad it would be easier for DataLad folks to find their datasets that are hosted on Dataverse:

Besides, it would be generally useful as well, of course! 😄

Reading my comment above, I'm wondering if dot directories are already supported. 🤔 We should re-test.

@landreev
Contributor

@pdurbin
Similarly, I cannot remember why we chose to implement it like that, stripping the initial dot... yet it definitely looks like we did so on purpose. 🤔 indeed.
Interestingly, it is only the initial dot that is stripped. I.e., if you enter .hidden it is auto-converted to hidden. But subfolder/.hidden is left intact.
We should probably just go ahead and stop stripping it - ?

@pdurbin
Member

pdurbin commented Apr 23, 2024

@landreev it sounds like you looked at the code. Great. Yes, we should stop removing the initial '.'. Interesting that they're allowed in subfolders!
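One possible shape for "stop removing the initial '.'" is simply to drop the dot check from the loop quoted earlier in this thread. The sketch below is a hypothetical illustration, not a proposed patch: the class and method names are made up, and it deliberately does not decide how labels such as "." or ".." should be treated, which would still need a decision from the maintainers.

```java
// Hypothetical sketch of the loop without the '.' check; not a real patch.
class KeepLeadingDotSketch {

    // Still strips leading '/', '-' and ' ' characters, but leaves a
    // leading '.' in place so that top-level dot directories survive.
    static String stripLeadingExceptDot(String value) {
        while (value.startsWith("/") || value.startsWith("-")
                || value.startsWith(" ")) {
            value = value.substring(1);
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(stripLeadingExceptDot(".hidden/file"));
        System.out.println(stripLeadingExceptDot("/.datalad"));
        System.out.println(stripLeadingExceptDot("subfolder/.hidden"));
    }
}
```

With this variant .hidden/file and subfolder/.hidden are both left as entered, so top-level and nested dot directories would behave the same way. The existing parameterized test case and any matching database cleanup would need to be revisited alongside such a change.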

Status: 🔍 Interest

No branches or pull requests

5 participants