BF: Fix clone relpath #4026

Closed
wants to merge 5 commits into from

bpoldrack
Member

@bpoldrack bpoldrack commented Jan 15, 2020

Based on @kyleam 's #4025 (Couldn't push to your branch)

Don't actually revert, but (hopefully) fix the issue, by manipulating not the originally given URL, but the one git-clone created. Keeping the revert commit in for its added (and a fixed) test.

don't revert the original fix as suggested by e8c5d70 (parent commit), but fix it.
Keeping the revert commit, however, to include its added test
@codecov

codecov bot commented Jan 15, 2020

Codecov Report

Merging #4026 into master will decrease coverage by 0.3%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #4026      +/-   ##
==========================================
- Coverage   89.74%   89.43%   -0.31%     
==========================================
  Files         272      272              
  Lines       36476    36543      +67     
==========================================
- Hits        32735    32682      -53     
- Misses       3741     3861     +120
Impacted Files                                Coverage Δ
datalad/core/distributed/tests/test_clone.py  92.83% <ø> (+3.89%) ⬆️
datalad/support/gitrepo.py                    89.4% <100%> (-0.34%) ⬇️
datalad/customremotes/base.py                 69.25% <0%> (-14.89%) ⬇️
datalad/support/tests/test_cookies.py         85.71% <0%> (-14.29%) ⬇️
datalad/customremotes/tests/__init__.py       91.66% <0%> (-8.34%) ⬇️
datalad/log.py                                82.69% <0%> (-7.22%) ⬇️
datalad/support/keyring_.py                   84.44% <0%> (-6.67%) ⬇️
datalad/tests/test_base.py                    96.55% <0%> (-3.45%) ⬇️
datalad/customremotes/tests/test_archives.py  86.27% <0%> (-3.27%) ⬇️
datalad/downloaders/http.py                   72.11% <0%> (-2.79%) ⬇️
... and 23 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e8c5d70...555764e. Read the comment docs.

@bpoldrack bpoldrack requested a review from kyleam January 15, 2020 09:49
@bpoldrack
Member Author

FTR: Codecov doesn't make sense really.

@bpoldrack bpoldrack requested a review from mih January 15, 2020 09:50
Member

@mih mih left a comment

I think this should work. Thx! I made two comments that need fixing to make things work on Windows too. I am not sure why the tests don't fail. Maybe the functions are more clever than they should be, but I think it is worth taking care of this ourselves.

if op.isabs(git_url):
    # ... and make it a relative one
    # Note: Using os.path here, since pathlib's relative_to isn't what you'd expect
    git_url = op.relpath(git_url, gr.path)
Member

This will not work in general on Windows. git_url will always be POSIX, gr.path will always be native. So we should likely use

posixpath.relpath(git_url, gr.pathobj.as_posix())
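A minimal sketch of the suggestion above, assuming a native Windows repository path and a POSIX-style URL as recorded by Git. `PureWindowsPath` stands in for `gr.pathobj` so the snippet runs on any platform; the helper name `relative_origin_url` and the concrete paths are made up for illustration.

```python
import posixpath
from pathlib import PureWindowsPath


def relative_origin_url(git_url, repo_path):
    """Return git_url relative to repo_path, computed purely in POSIX notation.

    git_url   -- POSIX-style path as Git reports it (hypothetical example value)
    repo_path -- native Windows path of the clone (stand-in for gr.path)
    """
    # Convert the native path to forward slashes first, so both arguments
    # use the same separator before relpath compares their components.
    repo_posix = PureWindowsPath(repo_path).as_posix()
    return posixpath.relpath(git_url, repo_posix)


rel = relative_origin_url("C:/Users/me/origin", r"C:\Users\me\clone")
print(rel)
```

Because both inputs are normalized to forward slashes before the comparison, the result no longer depends on which `os.path` flavor the interpreter happens to use.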

Member Author

True. Thx for catching.

# ... and make it a relative one
# Note: Using os.path here, since pathlib's relative_to isn't what you'd expect
git_url = op.relpath(git_url, gr.path)
path = Path(git_url)
# always in POSIX even on windows
path = path.as_posix()
Member

If my comment above is implemented, this conversion to and from Path is superfluous.

... even on Windows. Therefore, relpath would mix windows-paths with posix-paths. Fix this, which also renders later conversion to POSIX path superfluous.
@bpoldrack
Member Author

FTR: Current appveyor failure is about a failed request for codecov. Tests themselves seem to have passed.

@mih
Member

mih commented Jan 15, 2020

Verdict in conference call was to go with #4025 to stop undesired behavior quickly, and think this through some more.

@kyleam
Contributor

kyleam commented Jan 15, 2020

(Couldn't push to your branch)

Sorry about that. AFAIK I haven't changed anything on my side. Perhaps this is a change on GitHub's end, because I just saw a failure when I tried to push to your branch for this PR:

1 git … push -v bpoldrack refs/heads/revert-clone-relpath\:refs/heads/revert-clone-relpath
Pushing to git@github.com:bpoldrack/datalad.git
Writing objects: 100% (7/7), 760 bytes | 253.00 KiB/s, done.
Total 7 (delta 4), reused 0 (delta 0)
remote: Resolving deltas: 100% (4/4), completed with 4 local objects.        
To github.com:bpoldrack/datalad.git
 ! [remote rejected]     revert-clone-relpath -> revert-clone-relpath (permission denied)
error: failed to push some refs to 'git@github.com:bpoldrack/datalad.git'
patch
From 3c3f7ad95cf6a209c80f07a2339b2c0de4c216d4 Mon Sep 17 00:00:00 2001
From: Kyle Meyer <kyle@kyleam.com>
Date: Wed, 15 Jan 2020 10:55:53 -0500
Subject: [PATCH] TST: clone: Reenable test_relative_submodule_url

This was marked as a known failure in e8c5d7069 (BF: clone: Revert
incorrect relative path adjustment to URLs, gh-4025), but the previous
two commits should make it pass.
---
 datalad/core/distributed/tests/test_clone.py | 1 -
 1 file changed, 1 deletion(-)

diff --git a/datalad/core/distributed/tests/test_clone.py b/datalad/core/distributed/tests/test_clone.py
index 9f86469ad..404bffd5e 100644
--- a/datalad/core/distributed/tests/test_clone.py
+++ b/datalad/core/distributed/tests/test_clone.py
@@ -487,7 +487,6 @@ def test_cfg_originorigin(path):
 
 
 # test fix for gh-2601/gh-3538
-@known_failure
 @with_tempfile()
 def test_relative_submodule_url(path):
     Dataset(op.join(path, 'origin')).create()
-- 
2.24.1

@kyleam kyleam mentioned this pull request Jan 15, 2020
Contributor

@kyleam kyleam left a comment

@bpoldrack Thanks for working on the alternative fix. Your approach---taking the configured URL relative to the repository (when a relative path was given by the caller)---makes sense to me, and I haven't come up with a way to break it.

I mentioned on the call (and in e8c5d70) that there are scenarios where sticking with the absolute path would mean that a repo could be relocated without breakage. But I doubt that's relied on much in practice, while we know that rewriting with relative paths would make it possible to move dataset hierarchies that are created with common/promoted approaches. So I think it's reasonable to give the relative path treatment priority, and callers that want the original behavior can get it by explicitly passing absolute paths.
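The trade-off can be sketched with plain path arithmetic. This is only an illustration under assumed directory names (`/old/super`, `/new/super`); `resolve_origin` is a hypothetical helper, not part of DataLad.

```python
import posixpath


def resolve_origin(repo_path, origin_url):
    """Resolve a configured origin URL against the repository location.

    posixpath.join discards repo_path when origin_url is absolute, so one
    helper covers both the absolute and the relative configuration.
    """
    return posixpath.normpath(posixpath.join(repo_path, origin_url))


# Hypothetical layout: clone and origin live side by side under /old/super.
absolute_url = "/old/super/origin"
relative_url = "../origin"

# After moving the whole hierarchy from /old/super to /new/super ...
moved_clone = "/new/super/clone"

# ... the relative URL follows the move, while the absolute one still
# points at the old location.
after_move_relative = resolve_origin(moved_clone, relative_url)
after_move_absolute = resolve_origin(moved_clone, absolute_url)
print(after_move_relative, after_move_absolute)
```

The opposite case is the one mentioned in e8c5d70: if only the clone is moved and the origin stays put, the absolute URL keeps working and the relative one does not.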

@@ -487,15 +487,37 @@ def test_cfg_originorigin(path):


# test fix for gh-2601/gh-3538
@known_failure
Contributor

With your changes, this @known_failure should be dropped. (I tried to push that change, but like you I'm seeing permission issues when trying to push to other people's branches.)

Member Author

True, indeed. Thx.

@@ -937,14 +937,18 @@ def clone(cls, url, path, *args, **kwargs):
# make sure that Git doesn't mangle relative path specification into
# mildly obscure absolute paths
# https://github.com/datalad/datalad/issues/3538
# Note, that POSIX is required even on windows, since it's an URL!
Contributor

[ minor question about this comment, mostly for my own curiosity/understanding ]

Is it true that POSIX is required because git sees this as a URL? Or is it just because this is how git would represent Windows paths regardless? I have a very limited understanding of how git handles Windows paths, but from bits like

# path matching will happen against what Git reports
# and Git always reports POSIX paths
# any incoming path has to be relative already, so we can simply
# convert unconditionally
paths = [ut.PurePosixPath(p) for p in paths]

my impression was that git represented windows paths as posix paths underneath, converting at some more outer layer.

Member Author

@bpoldrack bpoldrack Jan 16, 2020

I should prob. reword that comment. AFAIK it's not technically "required" (I think git can deal with actual system paths), but it is the standard way for git to represent it. So, we need to expect POSIX paths being reported by git, and we want to write POSIX paths to be consistent. Effectively that makes it a "requirement" for us from my point of view. Otherwise we tend to shoot ourselves in the foot by mixing.

# mildly obscure absolute paths
# https://github.com/datalad/datalad/issues/3538
if isinstance(url_ri, PathRI):
    path = Path(url)
Contributor

I know this is inherited from 02e2b4c (BF: Avoid relpath mangling for submodule url configuration, 2020-01-10), but I'm not a fan of using path as the variable name. The two main parameters for this method are url and path, so I find it confusing to redefine path with a value that is derived from url. Perhaps url_path?

Member Author

Agree. url_path is fine.

# Note: Not sure, whether there are circumstances where this is relative already
if op.isabs(git_url):
    # ... and make it a relative one
    # Note: Using posixpath here, since pathlib's relative_to isn't what you'd expect
Contributor

As I mentioned on the call, I'm a bit worried that pathlib is avoiding relpath's more extensive behavior for some edge case where it can behave incorrectly. But I can't come up with what that would be, and looking at pathlib's docs and source doesn't reveal any hints, so that worry is probably unfounded.

Member Author

Well, there has to be a reason for pathlib to provide such limited functionality with relative_to. But I can't come up with a scenario where os.path.relpath wouldn't work here either. Maybe there's a way to break it when the path to the repository (our starting point for the relative path) contains symlinks. But that doesn't apply here, since GitRepo.path is realpath'd already.
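For reference, the difference between the two functions can be shown in a few lines. This is a sketch with hypothetical paths; the helper names are made up. By default `relative_to` only succeeds when the target lies underneath the start directory, whereas `relpath` inserts `..` components as needed (newer Python versions, 3.12 and later, add a `walk_up` option to `relative_to`).

```python
import posixpath
from pathlib import PurePosixPath


def via_relative_to(target, start):
    """pathlib: only succeeds when target lies underneath start."""
    try:
        return PurePosixPath(target).relative_to(start).as_posix()
    except ValueError:
        # target is not a descendant of start
        return None


def via_relpath(target, start):
    """posixpath: walks up with '..' components as needed."""
    return posixpath.relpath(target, start)


repo = "/data/ds/sources"   # hypothetical clone location
origin = "/data/origin"     # hypothetical sibling it was cloned from

print(via_relative_to(origin, repo))
print(via_relpath(origin, repo))
```

Both functions operate on the path strings only; neither consults the filesystem, which is why symlinks in the start path are the case to watch for.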

if op.isabs(git_url):
    # ... and make it a relative one
    # Note: Using posixpath here, since pathlib's relative_to isn't what you'd expect
    git_url = posixpath.relpath(git_url, gr.pathobj.as_posix())
Contributor

I think it'd be worth having a debug level message here reporting the remapping.

Member Author

Yes. Will do.

# ... and make it a relative one
# Note: Using posixpath here, since pathlib's relative_to isn't what you'd expect
git_url = posixpath.relpath(git_url, gr.pathobj.as_posix())
gr.config.set('remote.origin.url', git_url,
Contributor

If the op.isabs(git_url) condition is false, the url hasn't changed from the configured one, so we might as well avoid writing it out again with .set (i.e. move this call under the condition).

Member Author

True.

- reenable test test_clone.py:test_relative_submodule_url
- don't rewrite unchanged origin.url
- add debug message
- don't confuse 'path' parameter to clone function and variable to represent origin.url as a Path
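
Taken together, the review points above might look roughly like the following. This is a standalone sketch, not the code from the PR: a plain dict stands in for the repository's Git config, the function name and logger name are hypothetical, and the check that the caller originally passed a relative path (the `PathRI` branch) is left out.

```python
import logging
import os.path as op
import posixpath

lgr = logging.getLogger("sketch.clone")


def relativize_origin(config, repo_posix_path):
    """Rewrite an absolute 'remote.origin.url' as a path relative to the repo.

    config          -- dict standing in for the repository's Git config
    repo_posix_path -- repository location in POSIX notation
    Returns True if the URL was rewritten, False if it was left alone.
    """
    git_url = config["remote.origin.url"]
    if not op.isabs(git_url):
        # Already relative: nothing changed, so don't write it out again.
        return False
    url_path = posixpath.relpath(git_url, repo_posix_path)
    lgr.debug("Remapping origin URL %s to relative path %s", git_url, url_path)
    config["remote.origin.url"] = url_path
    return True


cfg = {"remote.origin.url": "/data/origin"}
changed = relativize_origin(cfg, "/data/clone")
print(changed, cfg["remote.origin.url"])
```

Calling the helper a second time on the same config returns False and performs no write, which is the behavior asked for in the review.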
@bpoldrack
Member Author

Thanks much, @kyleam !
I completely agree and pushed respective changes.

@kyleam
Contributor

kyleam commented Jan 16, 2020

Thanks for the updates. I see that as of 109fd6c the new test still fails on the windows run. Once that's figured out, I think this is good to go.

@bpoldrack
Member Author

@kyleam: Yes, saw that. Not completely clear to me yet, what's going on. But I have an idea ...

@bpoldrack bpoldrack force-pushed the revert-clone-relpath branch 4 times, most recently from 1659057 to 2b1c667 Compare January 17, 2020 08:56
@bpoldrack
Member Author

FTR: test in question passes on a native windows box:

(base) C:\Users\mih\code\datalad>python -m nose -s -v datalad.core.distributed.tests.test_clone:test_relative_submodule_url
datalad.core.distributed.tests.test_clone.test_relative_submodule_url ...

XXXXXXXXXXX: submodule pathobj: C:/Users/mih/AppData/Local/Temp/datalad_temp_kbgqj8l1/ds/sources
ok
Versions: appdirs=1.4.3 boto=2.49.0 cmd:annex=7.20191107-g8ea269ef7 cmd:bundled-git=UNKNOWN cmd:git=2.23.0.windows.1 cmd:system-git=2.23.0.windows.1 cmd:system-ssh=8.0p1 git=3.0.2 gitdb=2.0.5 humanize=0.5.1 iso8601=0.1.12 keyring=19.1.0 keyrings.alt=3.1.1 msgpack=0.6.1 requests=2.22.0 tqdm=4.32.1 wrapt=1.11.2
Obscure filename: str=b' ;abc' repr=' ;abc'
Encodings: default='utf-8' filesystem='utf-8' locale.prefered='cp1252'
Environment: PATH='C:\\Users\\mih\\Miniconda3;C:\\Users\\mih\\Miniconda3\\Library\\mingw-w64\\bin;C:\\Users\\mih\\Miniconda3\\Library\\usr\\bin;C:\\Users\\mih\\Miniconda3\\Library\\bin;C:\\Users\\mih\\Miniconda3\\Scripts;C:\\Users\\mih\\Miniconda3\\bin;C:\\Users\\mih\\Miniconda3\\condabin;C:\\Program Files (x86)\\Intel\\Intel(R) Management Engine Components\\iCLS;C:\\Program Files\\Intel\\Intel(R) Management Engine Components\\iCLS;C:\\WINDOWS\\system32;C:\\WINDOWS;C:\\WINDOWS\\System32\\Wbem;C:\\WINDOWS\\System32\\WindowsPowerShell\\v1.0;C:\\WINDOWS\\System32\\OpenSSH;C:\\Program Files (x86)\\Intel\\Intel(R) Management Engine Components\\DAL;C:\\Program Files\\Intel\\Intel(R) Management Engine Components\\DAL;C:\\Program Files (x86)\\Intel\\Intel(R) Management Engine Components\\IPT;C:\\Program Files\\Intel\\Intel(R) Management Engine Components\\IPT;C:\\Program Files\\TortoiseGit\\bin;C:\\Program Files (x86)\\Intel\\Intel(R) Management Engine Components\\iCLS;C:\\Program Files\\Intel\\Intel(R) Management Engine Components\\iCLS;C:\\WINDOWS\\system32;C:\\WINDOWS;C:\\WINDOWS\\System32\\Wbem;C:\\WINDOWS\\System32\\WindowsPowerShell\\v1.0;C:\\WINDOWS\\System32\\OpenSSH;C:\\Program Files (x86)\\Intel\\Intel(R) Management Engine Components\\DAL;C:\\Program Files\\Intel\\Intel(R) Management Engine Components\\DAL;C:\\Program Files (x86)\\Intel\\Intel(R) Management Engine Components\\IPT;C:\\Program Files\\Intel\\Intel(R) Management Engine Components\\IPT;C:\\Program Files\\TortoiseGit\\bin;C:\\Users\\mih\\AppData\\Local\\Microsoft\\WindowsApps;C:\\Users\\mih\\Miniconda3\\Library\\mingw64\\bin;C:\\Program Files (x86)\\Midnight Commander;C:\\Program Files (x86)\\7-Zip;.'

----------------------------------------------------------------------
Ran 1 test in 14.796s

OK

Trying to figure out what's different with that GitHub Action thingy.

@bpoldrack
Member Author

bpoldrack commented Jan 17, 2020

Ok. At least I found the problem.
In the environment of that GitHub Action, GitRepo.pathobj.as_posix() reports C:/Users/RUNNER~1/AppData/Local/Temp/datalad_temp__ey2dkku/ds/sources as the submodule's root dir, whereas the path we are trying to make relative is represented with the long name runneradmin instead of RUNNER~1. Therefore relpath has to go up to C:/Users. While the resulting relative path is technically correct and a fetch still works, it is not actually a useful relative path.

So, we need to find out how GitRepo can be taught to use the long names (which should work under any remotely recent windows). This might well solve other issues as well.
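
The effect can be reproduced with string paths alone. In this sketch the repository root is the one quoted above; the origin location and the long-name spelling of the repository root are assumptions made for illustration.

```python
import posixpath

# Repository root as reported in the GitHub Action (8.3 short name).
repo_short = "C:/Users/RUNNER~1/AppData/Local/Temp/datalad_temp__ey2dkku/ds/sources"
# Same directory spelled with the long user name (assumed equivalent).
repo_long = "C:/Users/runneradmin/AppData/Local/Temp/datalad_temp__ey2dkku/ds/sources"
# Hypothetical origin location, spelled with the long name.
origin = "C:/Users/runneradmin/AppData/Local/Temp/datalad_temp__ey2dkku/origin"

# relpath compares components textually, so RUNNER~1 and runneradmin
# count as different directories even if they name the same one.
rel_mismatched = posixpath.relpath(origin, repo_short)
rel_matched = posixpath.relpath(origin, repo_long)

print(rel_mismatched)
print(rel_matched)
```

With mismatched spellings the result climbs all the way up to `C:/Users` before descending again; with consistent spellings it stays within the temporary directory.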

kyleam added a commit to kyleam/datalad that referenced this pull request Jan 17, 2020
As of 367454e (NF: Auto-configure local origin of local origin to
make annex available, 2019-12-29), we check whether origin's URL for a
newly cloned annex dataset points to a local dataset.  If it does, we
add _that_ dataset's origin (if any) to the newly cloned dataset.  And
as of d2a25d0 (ENH: Make configuration of local origin siblings work
recursively, 2020-01-05), we continue walking up the chain of local
origins until we hit an origin that is not a local path.

However, we create the corresponding dataset instance with the wrong
path when origin is a _relative_ path: it should be relative to the
cloned repository, not the current working directory.  The incorrect
path means that there's no "origin of origin" to find, and an
"origin-N" remote isn't added.

Note that this bug isn't currently very accessible because `git clone`
converts relative paths to absolute ones, and the first attempt to
adjust these back to relative paths (02e2b4c and 0a80bb4) was
reverted in e8c5d70 (BF: clone: Revert incorrect relative path
adjustment to URLs, 2020-01-14).  But relative paths will be a common
case when the second attempt from dataladgh-4026 lands.
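
The bug described in this commit message comes down to which directory a relative origin URL is joined to. A minimal sketch, with hypothetical paths and a hypothetical helper name:

```python
import posixpath


def origin_dataset_path(origin_url, repo_path):
    """Locate the dataset a relative origin URL refers to.

    A relative URL in 'remote.origin.url' is meaningful only with respect
    to the repository that holds the config, so it is joined to repo_path.
    """
    return posixpath.normpath(posixpath.join(repo_path, origin_url))


repo_path = "/data/super/clone"   # hypothetical clone location
cwd = "/home/user/work"           # hypothetical working directory
origin_url = "../origin"

# Joining to the working directory lands somewhere unrelated ...
wrong = posixpath.normpath(posixpath.join(cwd, origin_url))
# ... while joining to the cloned repository finds the actual origin.
right = origin_dataset_path(origin_url, repo_path)

print(wrong, right)
```

The two results coincide only when the command happens to be run from inside the cloned repository's root.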
@mih
Member

mih commented Jan 20, 2020

So, we need to find out how GitRepo can be taught to use the long names (which should work under any remotely recent windows). This might well solve other issues as well.

#4054 (comment)

@yarikoptic yarikoptic modified the milestones: 0.12.3, 0.12.x Feb 20, 2020
@mih
Member

mih commented May 16, 2020

@bpoldrack This PR is still open. It feels as if that need not be the case. Can you please offer your assessment. Thx!

@mih
Member

mih commented Oct 4, 2020

6+ months of inactivity, no response from OP, closing.

@mih mih closed this Oct 4, 2020
adswa added a commit to adswa/datalad that referenced this pull request Nov 17, 2022
This change reincarnates parts of a changeset originally proposed in
datalad#4026, that didn't make it in due
to a failing Windows test that I never saw (CI logs weren't preserved),
but that I suspect to originate
in datalad#7180 (Git stitching together posix and windows paths into a
non-functional URL). Thus, this change sits on top of datalad#7181, which fixes
these URLs to be fully posix.
Based on those fixes, this change wrangles absolute URLs originating
from relative paths back into relative paths, to keep them functional.
This change also enables an old test for this problem, marked as a known
failure.

Fixes datalad#3538

4 participants