Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH+BF: get_parent_paths - make / into sep option and consistently use "/" as path separator #6963

Merged
merged 3 commits into from
Aug 20, 2022

Conversation

yarikoptic
Copy link
Member

Code review of this function came up while reviewing the
#6942 and where get_parent_paths,
which was pretty much "avoided" while resolving infinite recursion of
get_submodules_ -> get_content_info -> get_submodules_, was re-introduced
specifically within get_submodules_ .

BF: use consistently "sep" instead of / for check within the loop, and then
os.sep to check if points to up/down folders.

ENH:

  • sep became an option so we could use it also with os.sep if so desired
    on paths which are not POSIXed. By default it would remain "POSIX"
    (although checks were not checking for the POSIX endings before).

  • The test became parametric to test for various scenarios of using \ vs /
    vs "the default" (/ - POSIX).

…e / as path separator

Code review of this function came up while reviewing the
datalad#6942 and where get_parent_paths,
which was pretty much "avoided" while resolving infinite recursion of
`get_submodules_ -> get_content_info -> get_submodules_`, was re-introduced
specifically within get_submodules_ .

BF: use consistently "sep" instead of / for check within the loop, and then
os.sep to check if points to up/down folders.

ENH:

- sep became an option so we could use it also with os.sep if so desired
  on paths which are not POSIXed.  By default it would remain "POSIX"
  (although checks were not checking for the POSIX endings before).

- The test became parametric to test for various scenarios of using \ vs /
  vs "the default" (/ - POSIX).
@yarikoptic yarikoptic added the semver-patch Increment the patch version when merged label Aug 18, 2022
… get_parent_paths

get_submodules_ returns native paths within "path" which makes sense. But then
we take back relative path and not convert back to POSIX. Now we would!
@yarikoptic
Copy link
Member Author

hm, this

..\datalad\support\gitrepo.py:2738: in get_content_info
    path_strs = get_parent_paths(path_strs, submodules)
..\datalad\support\path.py:191: in get_parent_paths
    _get_parent_paths_check(path, sep, asep)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
path = 'lvl1\\file1', sep = '/', asep = '\\'
    def _get_parent_paths_check(path, sep, asep):
        """A little helper for get_parent_paths"""
        if isabs(path) or path.startswith(pardir + sep) or path.startswith(curdir + sep):
            raise ValueError("Expected relative within directory paths, got %r" % path)
        if sep+sep in path:
            raise ValueError(f"Expected normalized paths, got {path} containing '{sep+sep}'")
        if asep in path:
>           raise ValueError(f"Expected paths with {sep} as separator, got {path} containing '{asep}'")
E           ValueError: Expected paths with / as separator, got lvl1\file1 containing '\'
..\datalad\support\path.py:217: ValueError
============================== warnings summary ===============================

is odd since paths should be posix paths there, unless string a PurePosixPath produces native one... I will finally upgrade my windows VM tomorrow ;) interestingly that it errors test_copy_file_into_dshierarchy!

Apparently (now to me) pathlib.*PosixPath() does not do "conversion"
from native form to POSIX - it would happily preserve \ on Windows
within POSIX filename.  As a result we would still be passing through
Windows paths down.  PurePath.as_posix() is the one we need to do
such conversion, so use it for paths and submodules paths (come out
native from get_submodules_()).

Also do not overload paths and then create some path_strs -- place
into a more clearly named posix_paths variable.
@codecov
Copy link

codecov bot commented Aug 19, 2022

Codecov Report

Merging #6963 (b3d3bcb) into maint (4ecffe8) will increase coverage by 0.95%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##            maint    #6963      +/-   ##
==========================================
+ Coverage   89.97%   90.93%   +0.95%     
==========================================
  Files         354      354              
  Lines       46261    46517     +256     
==========================================
+ Hits        41624    42299     +675     
+ Misses       4637     4218     -419     
Impacted Files Coverage Δ
datalad/support/gitrepo.py 92.27% <100.00%> (ø)
datalad/support/path.py 95.40% <100.00%> (+0.28%) ⬆️
datalad/support/tests/test_path.py 98.24% <100.00%> (+0.80%) ⬆️
datalad/_version.py 45.68% <0.00%> (-0.28%) ⬇️
datalad/distributed/tests/test_drop.py 100.00% <0.00%> (ø)
datalad/distributed/drop.py 97.70% <0.00%> (+0.02%) ⬆️
datalad/distribution/tests/test_create_sibling.py 80.47% <0.00%> (+0.81%) ⬆️
datalad/tests/utils.py 65.10% <0.00%> (+11.03%) ⬆️
datalad/tests/test_tests_utils.py 92.34% <0.00%> (+92.34%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@yarikoptic yarikoptic requested a review from mih August 20, 2022 10:46
@yarikoptic
Copy link
Member Author

Woohoo, it is ready to please windows ppl

@mih
Copy link
Member

mih commented Aug 20, 2022

One thing upfront: This change should make the code no worse than it was before. It should be fine to merge if that is the goal.

In general I very much dislike the idea of fixing up any path processing helper in datalad that needs to take Path objects, turns them into to strings for processing. The way forward should be to unconditionally replace them with proper pathlib use.

We have been hit by countless "there is more to path-processing-than-replacing-slash-with-backslash" issues over the years. Also this code is unlikely to properly handle encoding and other handling of unsupported chars etc.

@yarikoptic
Copy link
Member Author

I hear you @mih and would welcome RF/BF which would introduce proper use/manipulation of pathlib objects instead of path strings. It was not the goal of this PR though which was initially to fix inconsistency thus making it work properly on Windows, then due to added extra checks detected additional issues with taken approach and those were fixed. It indeed should not make code worse from what I see, and it should make DataLad perform better on Windows.

Overall I will take your comment as blessing then and will proceed with the merge.

@yarikoptic yarikoptic merged commit 496e5e3 into datalad:maint Aug 20, 2022
@yarikoptic yarikoptic deleted the bf-parent-paths branch October 14, 2022 12:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
semver-patch Increment the patch version when merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants