-
Notifications
You must be signed in to change notification settings - Fork 110
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BF: Fix clone relpath #4026
BF: Fix clone relpath #4026
Conversation
don't revert the original fix as suggested by e8c5d70 (parent commit), but fix it. Keeping the revert commit, however, to include its added test
Codecov Report
@@ Coverage Diff @@
## master #4026 +/- ##
==========================================
- Coverage 89.74% 89.43% -0.31%
==========================================
Files 272 272
Lines 36476 36543 +67
==========================================
- Hits 32735 32682 -53
- Misses 3741 3861 +120
Continue to review full report at Codecov.
|
FTR: Codecov doesn't make sense really. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should work. Thx! I made to comments that need fixing to make things work on windows too. I am not sure why the tests don't fail. Maybe the functions are more clever than they should be, but I think it is worth taking care of this ourselves.
datalad/support/gitrepo.py
Outdated
if op.isabs(git_url): | ||
# ... and make it a relative one | ||
# Note: Using os.path here, since pathlib's relative_to isn't what you'd expect | ||
git_url = op.relpath(git_url, gr.path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will not work in general on windows. git_url
will always be posix, gr.path
will always be native. So we should like use
posixpath.relpath(git_url, gr.pathobj.as_posix())
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True. Thx for catching.
datalad/support/gitrepo.py
Outdated
# ... and make it a relative one | ||
# Note: Using os.path here, since pathlib's relative_to isn't what you'd expect | ||
git_url = op.relpath(git_url, gr.path) | ||
path = Path(git_url) | ||
# always in POSIX even on windows | ||
path = path.as_posix() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If my comment above is implemented, this conversion to and from Path is superfluous.
... even on Windows. Therefore, relpath would mix windows-paths with posix-paths. Fix this, which also renders later conversion to POSIX path superfluous.
FTR: Current appveyor failure is about a failed request for codecov. Tests themselves seem to have passed. |
Verdict in conference call was to go with #4025 to stop undesired behavior quickly, and think this through some more. |
Sorry about that. AFAIK I haven't changed anything on my side. Perhaps this is a change on github's end because I just saw a failure when tried to push to your branch for this pr:
patchFrom 3c3f7ad95cf6a209c80f07a2339b2c0de4c216d4 Mon Sep 17 00:00:00 2001
From: Kyle Meyer <kyle@kyleam.com>
Date: Wed, 15 Jan 2020 10:55:53 -0500
Subject: [PATCH] TST: clone: Reenable test_relative_submodule_url
This was marked as a known failure in e8c5d7069 (BF: clone: Revert
incorrect relative path adjustment to URLs, gh-4025), but the previous
two commits should make it pass.
---
datalad/core/distributed/tests/test_clone.py | 1 -
1 file changed, 1 deletion(-)
diff --git a/datalad/core/distributed/tests/test_clone.py b/datalad/core/distributed/tests/test_clone.py
index 9f86469ad..404bffd5e 100644
--- a/datalad/core/distributed/tests/test_clone.py
+++ b/datalad/core/distributed/tests/test_clone.py
@@ -487,7 +487,6 @@ def test_cfg_originorigin(path):
# test fix for gh-2601/gh-3538
-@known_failure
@with_tempfile()
def test_relative_submodule_url(path):
Dataset(op.join(path, 'origin')).create()
--
2.24.1
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bpoldrack Thanks for working on the alternative fix. Your approach---taking the configured URL relative to the repository (when a relative path was given by the caller)---makes sense to me, and I haven't come up with a way to break it.
I mentioned on the call (and in e8c5d70) that there are scenarios where sticking with the absolute path would mean that a repo could be relocated without breakage. But I doubt that's relied on much in practice, while we know that rewriting with relative paths would make it possible to move dataset hierarchies that are created with common/promoted approaches. So I think it's reasonable to give the relative path treatment priority, and callers that want the original behavior can get it by explicitly passing absolute paths.
@@ -487,15 +487,37 @@ def test_cfg_originorigin(path): | |||
|
|||
|
|||
# test fix for gh-2601/gh-3538 | |||
@known_failure |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With your changes, this @known_failure
should be dropped. (I tried to push that change, but like you I'm seeing permission issues when trying to push to other people's branches.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, indeed. Thx.
datalad/support/gitrepo.py
Outdated
@@ -937,14 +937,18 @@ def clone(cls, url, path, *args, **kwargs): | |||
# make sure that Git doesn't mangle relative path specification into | |||
# mildly obscure absolute paths | |||
# https://github.com/datalad/datalad/issues/3538 | |||
# Note, that POSIX is required even on windows, since it's an URL! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[ minor question about this comment, mostly for my own curiosity/understanding ]
Is it true that posix is required because git sees this as a URL? Or is it just because that this is how git would represent windows paths regardless? I have very limited understanding of how git handles windows paths, but from bits like
datalad/datalad/support/gitrepo.py
Lines 3166 to 3170 in a210266
# path matching will happen against what Git reports | |
# and Git always reports POSIX paths | |
# any incoming path has to be relative already, so we can simply | |
# convert unconditionally | |
paths = [ut.PurePosixPath(p) for p in paths] |
my impression was that git represented windows paths as posix paths underneath, converting at some more outer layer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should prob. reword that comment. AFAIK it's not technically "required" (I think git can deal with actual system paths), but the standard way of git to represent it. So, we need to expect posix path being reported by git and we want to write posix path to be consistent. Effectively that makes it a "requirement" for us from my point of view. Otherwise we tend to shoot our feet by mixing.
datalad/support/gitrepo.py
Outdated
# mildly obscure absolute paths | ||
# https://github.com/datalad/datalad/issues/3538 | ||
if isinstance(url_ri, PathRI): | ||
path = Path(url) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this is inherited from 02e2b4c (BF: Avoid relpath mangling for submodule url configuration, 2020-01-10), but I'm not a fan of using path
as the variable name. The two main parameters for this method are url
and path
, so I find it confusing to redefine path
with a value that is derived from url
. Perhaps url_path
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree. url_path
is fine.
# Note: Not sure, whether there are circumstances where this is relative already | ||
if op.isabs(git_url): | ||
# ... and make it a relative one | ||
# Note: Using posixpath here, since pathlib's relative_to isn't what you'd expect |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I mentioned on the call, I'm a bit worried that pathlib is avoiding relpath's more extensive behavior for some edge case where it can behave incorrectly. But I can't come up with what that would be, and looking at pathlib's docs and source doesn't reveal any hints, so that worry is probably unfounded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, there has to be a reason for pathlib to provide such limited functionality with relative_to
. But I can't come up with a scenario where os.path.relpath
wouldn't work here either. May be there's a way to break it, when the path to the repository (our starting point for the relative path) would contain symlinks. But that doesn't apply here, since GitRepo.path
is realpath'd already.
datalad/support/gitrepo.py
Outdated
if op.isabs(git_url): | ||
# ... and make it a relative one | ||
# Note: Using posixpath here, since pathlib's relative_to isn't what you'd expect | ||
git_url = posixpath.relpath(git_url, gr.pathobj.as_posix()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it'd be worth having a debug level message here reporting the remapping.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Will do.
datalad/support/gitrepo.py
Outdated
# ... and make it a relative one | ||
# Note: Using posixpath here, since pathlib's relative_to isn't what you'd expect | ||
git_url = posixpath.relpath(git_url, gr.pathobj.as_posix()) | ||
gr.config.set('remote.origin.url', git_url, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the op.isabs(git_url)
condition is false, the url hasn't changed from the configured one, so we might as well avoid writing it out again with .set
(i.e. move this call under the condition).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True.
- reenable test test_clone.py:test_relative_submodule_url - don't rewrite unchanged origin.url - add debug message - don't confuse 'path' parameter to clone function and variable to represent origin.url as a Path
Thanks much, @kyleam ! |
@kyleam: Yes, saw that. Not completely clear to me yet, what's going on. But I have an idea ... |
1659057
to
2b1c667
Compare
FTR: test in question passes on a native windows box:
Trying to figure what's different with that Github Action thingy. |
Ok. At least I found the problem. So, we need to find out how |
2b1c667
to
251dd51
Compare
251dd51
to
555764e
Compare
As of 367454e (NF: Auto-configure local origin of local origin to make annex available, 2019-12-29), we check whether origin's URL for a newly cloned annex dataset points to a local dataset. If it does, we add _that_ dataset's origin (if any) to the newly cloned dataset. And as of d2a25d0 (ENH: Make configuration of local origin siblings work recursively, 2020-01-05), we continue walking up the chain of local origin's until we hit an origin that is not a local path. However, we create the corresponding dataset instance with the wrong path when origin is a _relative_ path: it should be relative to the cloned repository, not the current working directory. The incorrect path means that there's no "origin of origin" to find, and an "origin-N" remote isn't added. Note that this bug isn't currently very accessible because `git clone` converts relative paths to absolute ones, and the first attempt to adjust these back to relative paths (02e2b4c and 0a80bb4) was reverted in e8c5d70 (BF: clone: Revert incorrect relative path adjustment to URLs, 2020-01-14). But relative paths will be a common case when the second attempt from dataladgh-4026 lands.
|
@bpoldrack This PR is still open. It feels as if that need not be the case. Can you please offer your assessment. Thx! |
6+ month of inactivity, no response from OP, closing. |
This change reincarnates parts of a changeset originally proposed in datalad#4026, that didn't make it in due to a failing Windows test that Ivnever saw (CI logs weren't preserved), but that I suspect to originate in datalad#7180 (Git stitching together posix and windows paths into a non-functional URL). Thus, this change sits on top of datalad#7181, which fixes these URLs to be fully posix. Based on those fixes, this change wrangles absolute URLs originating from relative paths back into relative paths, to keep them functional. This change also enables an old test for this problem, marked as a known failure. Fixes datalad#3538
Based on @kyleam 's #4025 (Couldn't push to your branch)
Don't actually revert, but (hopefully) fix the issue, by manipulating not the originally given URL, but the one git-clone created. Keeping the revert commit in for its added (and a fixed) test.