Issues/4101 wrong filename in output #4139

Merged

Conversation

gmloose
Contributor

@gmloose gmloose commented Jun 14, 2022

Changelog Entry

To be copied to the draft changelog by merger:

  • Fix wrong filename in output due to name conflict
    The logic that detects potential filename conflicts is flawed. If one of the outputs is a directory containing two or more files whose names are identical except for a "trailing underscore part", then, depending on the order in which the files are visited, the name with the trailing part may be treated as a duplicate of the name without it. For example, in my case there are files named table.f4 and table.f4_TSM0. If table.f4 has already been visited, table.f4_TSM0 is considered a duplicate of table.f4. This PR corrects that behavior.
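
To make the order dependence concrete, here is a minimal sketch of the kind of flaw described above. It is not the actual Toil code: the function names and the suffix-stripping rule are hypothetical, chosen only to reproduce the symptom with the two filenames from the example.

```python
from typing import Iterable, List, Set


def find_conflicts_flawed(names: Iterable[str]) -> List[str]:
    """Hypothetical flawed check: a name also counts as a duplicate when
    the part before its last underscore has already been seen."""
    seen: Set[str] = set()
    conflicts: List[str] = []
    for name in names:
        # Drops a trailing "_..." part, if there is one (assumed rule).
        base = name.rsplit("_", 1)[0]
        if name in seen or base in seen:
            conflicts.append(name)
        seen.add(name)
    return conflicts


def find_conflicts_exact(names: Iterable[str]) -> List[str]:
    """Hypothetical corrected check: only identical names conflict."""
    seen: Set[str] = set()
    conflicts: List[str] = []
    for name in names:
        if name in seen:
            conflicts.append(name)
        seen.add(name)
    return conflicts


# The flawed check reports a conflict only when table.f4 is visited first.
print(find_conflicts_flawed(["table.f4", "table.f4_TSM0"]))
print(find_conflicts_flawed(["table.f4_TSM0", "table.f4"]))
# The exact check reports no conflict in either order.
print(find_conflicts_exact(["table.f4", "table.f4_TSM0"]))
```

With the flawed variant the result depends on visiting order, which matches the behavior reported in the issue. Comparing full names removes that dependence.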

Reviewer Checklist

  • Make sure it is coming from issues/XXXX-fix-the-thing in the Toil repo, or from an external repo.
    • If it is coming from an external repo, make sure to pull it in for CI with:
      contrib/admin/test-pr otheruser theirbranchname issues/XXXX-fix-the-thing
      
    • If there is no associated issue, create one.
  • Read through the code changes. Make sure that it doesn't have:
    • Addition of trailing whitespace.
    • New variable or member names in camelCase that want to be in snake_case.
    • New functions without type hints.
    • New functions or classes without informative docstrings.
    • Changes to semantics not reflected in the relevant docstrings.
    • New or changed command line options for Toil workflows that are not reflected in docs/running/{cliOptions,cwl,wdl}.rst
    • New features without tests.
  • Comment on the lines of code where problems exist with a review comment. You can shift-click the line numbers in the diff to select multiple lines.
  • Finish the review with an overall description of your opinion.

Merger Checklist

  • Make sure the PR passes tests.
  • Make sure the PR has been reviewed since its last modification. If not, review it.
  • Merge with the Github "Squash and merge" feature.
    • If there are multiple authors' commits, add Co-authored-by to give credit to all contributing authors.
  • Copy its recommended changelog entry to the Draft Changelog.
  • Append the issue number in parentheses to the changelog entry.

The logic that determined a potential filename conflict contained a
flaw. This commit fixes that.
@mr-c
Contributor

mr-c commented Jun 14, 2022

Thanks!

I wonder how we can create a unit test for this...

@gmloose
Contributor Author

gmloose commented Jun 14, 2022

I'm still trying to reproduce it in a small example. It looks like I'm making some progress. So hold on ...

Member

@adamnovak adamnovak left a comment

This looks like it ought to work. I've pulled it in for testing.

It would be nice to have a test, but we shouldn't hold back the fix if we don't get one.

Member

@DailyDreaming DailyDreaming left a comment

LGTM too. Thanks for submitting the fix.

Added a test that verifies that the fix in DataBiosphere#4101 avoids incorrect filename conflict resolution.
…oose/toil into issues/4101-wrong-filename-in-output
@gmloose
Contributor Author

gmloose commented Jun 15, 2022

I added a test that seems to work properly. It fails on current master and passes on this branch. I tested it on two different systems, in a virtualenv using Python 3.8.

@mr-c
Contributor

mr-c commented Jun 15, 2022

> I added a test that seems to work properly. It fails on current master and passes on this branch. I tested it on two different systems, in a virtualenv using Python 3.8.

Thanks! I've pushed that fix up for testing.

@mr-c mr-c enabled auto-merge (squash) June 15, 2022 09:45
@mr-c mr-c merged commit 048a5e1 into DataBiosphere:master Jun 15, 2022