Directory names conflicting with file names #12407

Open
jo-pol wants to merge 2 commits into IQSS:develop from DANS-KNAW-jp:directory-name-conflict

@jo-pol jo-pol commented May 26, 2026

What this PR does / why we need it:

Downloading all files of a dataset produces a zip that fails to unpack when the full path of one file equals a directory in the path of another file. Note that "directory" here means not only the directoryLabel itself, but also each of its parent directories.

  • EditDatafilesPage: message shows all conflicting files, not just the first
  • a file with a directory conflicting with an existing full path is rejected
  • files with a full path that conflicts with an existing directory will get a sequence number added
  • additional unit test
  • manual test script
  • scripts to detect (latest version of) datasets with conflicting directory paths
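
The two rules in the list above (reject versus sequence number) can be sketched roughly as follows. This is only an illustration in Python, not the Java code of this PR: `parent_dirs` and `check_new_file` are hypothetical helpers, and the `name-1` naming scheme for the sequence number is an assumption.

```python
from pathlib import PurePosixPath


def parent_dirs(path: str) -> set[str]:
    """Return all ancestor directories, e.g. 'a/b/c.txt' -> {'a', 'a/b'}."""
    return {str(p) for p in PurePosixPath(path).parents if str(p) != "."}


def check_new_file(existing: list[str], new_path: str) -> tuple[str, str]:
    """Classify new_path against the full paths already in the dataset.

    Hypothetical helper; returns (decision, resulting path).
    """
    existing_files = set(existing)
    existing_dirs = set().union(*(parent_dirs(p) for p in existing))

    # A (parent) directory of the new file already exists as a file: reject.
    if parent_dirs(new_path) & existing_files:
        return ("reject", new_path)

    # The full path of the new file already exists as a directory:
    # add a sequence number (naming scheme assumed for illustration).
    if new_path in existing_dirs:
        p = PurePosixPath(new_path)
        n = 1
        while True:
            candidate = str(p.with_name(f"{p.stem}-{n}{p.suffix}"))
            if candidate not in existing_files and candidate not in existing_dirs:
                return ("rename", candidate)
            n += 1

    return ("accept", new_path)
```

For example, with `foo/bar.txt` already in the dataset, a new file `foo` would be renamed, while a new file `readme.md/notes.txt` would be rejected if `readme.md` exists as a file.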

Which issue(s) this PR closes:

  • Closes #

Special notes for your reviewer:

Suggestions on how to test this:

  • Adjust the constants at the start of test-apis.py to match your system and an empty dataset. Output with dashed lines shows the expectations after deploy. See also the screenshots below.
  • Before this fix, download all files created by test-apis.py: unzip fails.
  • Deploy the fix, then try to add a non-conflicting file to the dataset with conflicts: saving the dataset fails.
  • Try to add files whose full path conflicts with an existing directory to a healthy dataset: they will get a sequence number.
  • Try to add a file with a (parent) directory that exists as a file: saving the dataset will fail.
  • Adding files to a directory that already exists should succeed.
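
The unzip failure from the second step can also be reproduced without a Dataverse instance. The sketch below uses Python's standard `zipfile` module rather than the `unzip` command, so the exact error differs, but the cause is the same: one entry name is both a file and a directory. `try_extract` is a hypothetical helper.

```python
import tempfile
import zipfile
from pathlib import Path


def try_extract(entry_names: list[str]) -> str:
    """Zip the given entry names, extract them again, report 'ok' or 'failed'."""
    with tempfile.TemporaryDirectory() as tmp:
        archive = Path(tmp) / "dataset.zip"
        with zipfile.ZipFile(archive, "w") as zf:
            for name in entry_names:
                zf.writestr(name, "content")

        target = Path(tmp) / "out"
        target.mkdir()
        try:
            with zipfile.ZipFile(archive) as zf:
                zf.extractall(target)
        except OSError:
            # 'foo' cannot exist as a file and as a directory at the same time.
            return "failed"
        return "ok"


print(try_extract(["foo", "foo/bar.txt"]))
print(try_extract(["foo/bar.txt", "foo/baz.txt"]))
```

The first call corresponds to a dataset with a conflict, the second to adding files to a directory that already exists.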

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

See the changed dataset-management.rst file. The other changed rst file only had a typo fixed.

Is there a release notes update needed for this change?:

Existing datasets with the new type of duplicate names should be identified, see scripts/issues/dirs-duplicating-files/find_duplicates.py. The reported datasets should be fixed manually, preferably before deploying this fix. Depending on your preferences and the size of your database, you might want a variation of the scripts.
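
For a variation of the scripts, the core check could look something like the sketch below. It is not the content of find_duplicates.py: `find_conflicts` is a hypothetical function, and it assumes you have already collected the (directoryLabel, label) pairs of the latest version of one dataset.

```python
def full_path(directory_label, label):
    """Join directoryLabel and file label; directoryLabel may be empty or None."""
    return f"{directory_label}/{label}" if directory_label else label


def find_conflicts(files):
    """Return the sorted paths that occur both as a file and as a directory.

    files: list of (directoryLabel, label) pairs of one dataset version.
    """
    files = list(files)
    file_paths = {full_path(d, name) for d, name in files}

    # Collect every directoryLabel together with all of its parents.
    dir_paths = set()
    for d, _ in files:
        parts = d.split("/") if d else []
        for i in range(1, len(parts) + 1):
            dir_paths.add("/".join(parts[:i]))

    return sorted(file_paths & dir_paths)


print(find_conflicts([("", "foo"), ("foo", "bar"), ("foo/bar", "baz.txt")]))
# prints ['foo', 'foo/bar']
```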

Additional documentation:

Screenshots of the results of test-apis.py before and after deploy. Before deploy we see conflicts on foo and foo/bar; after deploy some requests return 400 Bad Request.

jo-pol added 2 commits May 26, 2026 14:21
@jo-pol jo-pol changed the title squash of DANS PR242 up to e24bc721 Directory names conflicting with file names May 26, 2026
@jo-pol jo-pol marked this pull request as ready for review May 26, 2026 13:30
@pdurbin pdurbin moved this to Ready for Triage in IQSS Dataverse Project May 26, 2026
@pdurbin pdurbin moved this from Ready for Triage to Ready for Review ⏩ in IQSS Dataverse Project May 26, 2026
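# verify=False disables TLS certificate verification; only suitable against a local test instance
r = requests.post(url, headers=headers, json=json_content, verify=False)

# show the HTTP status and the response body of the request
print(r.status_code)
print(r.text)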