-
Notifications
You must be signed in to change notification settings - Fork 111
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BF: Ensure that datalad urls are posix #7183
Conversation
Codecov ReportBase: 88.70% // Head: 88.70% // No change to project coverage 👍
Additional details and impacted files@@ Coverage Diff @@
## maint #7183 +/- ##
=======================================
Coverage 88.70% 88.70%
=======================================
Files 355 355
Lines 46672 46672
Branches 6352 6352
=======================================
Hits 41400 41400
Misses 5257 5257
Partials 15 15 Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report at Codecov. |
datalad/core/distributed/clone.py
Outdated
@@ -385,6 +386,8 @@ def __call__( | |||
# from git default behavior WRT a submodule's name vs | |||
# its path when we made this a new subdataset. | |||
subds_name = path.relative_to(ds.pathobj) | |||
# ensure posix url, relative Windows paths would be butchered | |||
source = Path(source).as_posix() if not source.startswith('ria') else source |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bpoldrack I would be interested to learn if you know a more elegant way to do this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't even think ria
is an exclusion criterion here. Also: You are assuming source
to always be a path (what is Path('https://somewhere.org/blob')
supposed to do?)
I think the issue is with source
already. If this was the spot to fix it, we would have cloned from something with \
already. Which may work, but is not the intention really. So, in my view, source needs to be fixed before cloning if it happens to be a path. This may already happen somewhere in clone_dataset
. There should be something that determines the giturl
which is what is actually used and it's part of some dict (something_props
?). I guess, this is what is supposed to be used here and possibly fixed for windows paths before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also: You are assuming source to always be a path (what is Path('https://somewhere.org/blob') supposed to do?)
I believe pathlib actually works well for non-paths - at least the cases I could think of. E.g., consider
>>>from pathlib import Path
>>>Path('https://somewhere.org/blob').as_posix()
'https:/somewhere.org/blob'
>>>Path('git@github.com:datalad/datalad.git').as_posix()
'git@github.com:datalad/datalad.git'
>>>Path('adina@juseless.inm7.de:/home/scratch/weee').as_posix()
'adina@juseless.inm7.de:/home/scratch/weee'
What else is a potential clone source beyond those URLs, local paths, and ria URLs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe pathlib actually works well for non-paths
Well, yes - I think b/c at least on unix such URLs are actually valid paths. Did you try that on Windows? May be it works, too, in that Path
would become a PurePosixPath
or something, but I'm not sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re ria
: The check with startswith
is fine, the question is: What to do then? I think, the same should apply, since everything after the ria prefix is just a URL:
if source.startswith('ria'):
source = 'ria+' + Path(source[4:]).as_posix()
else:
source = Path(source).as_posix()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you try that on Windows? May be it works, too, in that
Path
would become aPurePosixPath
or something, but I'm not sure.
This is on Windows 11 with Python 3.10
In [1]: from pathlib import Path
In [2]: Path('https://somewhere.org/blob').as_posix()
Out[2]: 'https:/somewhere.org/blob'
In [3]: Path('git@github.com:datalad/datalad.git').as_posix()
Out[3]: 'git@github.com:datalad/datalad.git'
In [4]: Path('adina@juseless.inm7.de:/home/scratch/weee').as_posix()
Out[4]: 'adina@juseless.inm7.de:/home/scratch/weee'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just adding a bit of trial-and-error on Windows: It seems that in order to be able to create a RIA store on Windows, the URL has to be Posix (and can never contain the drive (e.g., C:
)
C:\Users\mih\TMP\someds>datalad create-sibling-ria -s new --new-store-ok ria+file:/Users/mih/TMP/somestore
[INFO] create siblings 'new' and 'new-storage' ...
[INFO] Fetching updates for Dataset("C:\Users\mih\TMP\someds")
update(ok): . (dataset)
update(ok): . (dataset)
[INFO] Configure additional publication dependency on "new-storage"
configure-sibling(ok): . (sibling)
create-sibling-ria(ok): C:\Users\mih\TMP\someds (dataset)
action summary:
configure-sibling (ok: 1)
create-sibling-ria (ok: 1)
update (ok: 1)
0.00 [00:04, ?/s]
C:\Users\mih\TMP\someds>git remote -v
new \Users\mih\TMP\somestore\7b9\5bb8c-750f-4e2e-8e91-a083af8b2b13 (fetch)
new \Users\mih\TMP\somestore\7b9\5bb8c-750f-4e2e-8e91-a083af8b2b13 (push)
new-storage
store \Users\mih\TMP\RIA\7b9\5bb8c-750f-4e2e-8e91-a083af8b2b13 (fetch)
store \Users\mih\TMP\RIA\7b9\5bb8c-750f-4e2e-8e91-a083af8b2b13 (push)
store-storage
C:\Users\mih\TMP\someds>datalad create-sibling-ria -s new2 --new-store-ok ria+file:\Users\mih\TMP\someotherstore
[INFO] create siblings 'new2' and 'new2-storage' ...
create-sibling-ria(error): C:\Users\mih\TMP\someds (dataset) [initremote failed.
stdout: initremote new2-storage
failed
stderr: git-annex: Non-absolute object tree base path configuration: \Users\mih\TMP\someotherstore
initremote: 1 failed
]
0.00 [00:03, ?/s]
Cloning from a RIA store (into a superdataset) works with Posix-like paths and also with non posix-like paths., and requires the drive name.
C:\Users\mih\TMP>datalad clone ria+file:///C:/Users/mih/TMP/somestore#7b95bb8c-750f-4e2e-8e91-a083af8b2b13 riaclone
[INFO] Detected a filesystem without fifo support.
[INFO] Disabling ssh connection caching.
[INFO] Detected a crippled filesystem.
[INFO] Disabling core.symlinks.
[INFO] Entering an adjusted branch where files are unlocked as this filesystem does not support locked files.
[INFO] Switched to branch 'adjusted/master(unlocked)'
install(ok): C:\Users\mih\TMP\riaclone (dataset)
C:\Users\mih\TMP>datalad clone ria+file:///C:\Users\mih\TMP\somestore#7b95bb8c-750f-4e2e-8e91-a083af8b2b13 riaclone2
[INFO] Detected a filesystem without fifo support.
[INFO] Disabling ssh connection caching.
[INFO] Detected a crippled filesystem.
[INFO] Disabling core.symlinks.
[INFO] Entering an adjusted branch where files are unlocked as this filesystem does not support locked files.
[INFO] Switched to branch 'adjusted/master(unlocked)'
install(ok): C:\Users\mih\TMP\riaclone2 (dataset)
In almost akl cases, the recorded URLs are quite butchered:
[submodule "riasubds2"]
path = riasubds2
url = file:///C%3A%5CUsers%5Cmih%5CTMP%5Csomestore/7b9/5bb8c-750f-4e2e-8e91-a083af8b2b13
datalad-id = 7b95bb8c-750f-4e2e-8e91-a083af8b2b13
datalad-url = "ria+file:///C:\\Users\\mih\\TMP\\somestore#7b95bb8c-750f-4e2e-8e91-a083af8b2b13"
[submodule "riaclone3"]
path = riaclone3
url = file:///C%3A/Users/mih/TMP/somestore/7b9/5bb8c-750f-4e2e-8e91-a083af8b2b13
datalad-id = 7b95bb8c-750f-4e2e-8e91-a083af8b2b13
datalad-url = "ria+file:///C:/Users/mih/TMP/somestore#7b95bb8c-750f-4e2e-8e91-a083af8b2b13"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FTR: Using file:
URLs on windows is a story on its own. That's unrelated to what's going on here. If you want to try create-sibling-ria
with such a URL, it should be generated by datalad.support.network.get_local_file_url(.., compatibility='git')
and then prefixed with ria+
. That's not nice (and may still be buggy), but I don't see exactly how create-sibling-ria
is supposed to fix a given URL that is incorrect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, so I reconsidered this entirely. Looking into the issue, the purpose of datalad-url and the commit introducing it (6c8669e), I figure:
It's supposed to store the original URL given in order for later clone calls (via get) to be able to trigger the same treatment (considering ria, url rewrites, etc.). The value is not used by git-clone (that's what url
is for), but by datalad get
.
This means, I think, that URLs should remain unmodified (If there's a problem with a file:
URL especially on windows, that's a different story needing a different solution).
For paths, things are arguably different. Obviously, what's currently stored is invalid and would need fixing indeed. But: Is making it POSIX-y the correct thing to do? It's then not even recognizable as a windows path anymore. Meaning: If there was a clone routine (possibly via patch) that would actually need to be triggered again, should it then be deprived of the information that this was a windows path (which may be the very info relevant)? So, technically correct would likely be to store a valid windows path rather than a POSIX path.
On the other hand, there's nothing in clone I am aware of that would actually trigger any sort of rewrite with a local path as source. In addition, somewhat "secretly" committing local paths is a questionable thing in the first place.
In conclusion, I am currently thinking of changing this patch to simply not store anything in case source
is a path rather than a URL.
WDYT, @datalad/developers?
Thanks for looking into it more.
I don't feel like I know enough about the code that may use this URL, but POSIX-y is at least git clone's own convention, regardless of operating system: https://www.git-scm.com/docs/git-clone#_git_urls
Then maybe pathlibs
I'm not sure I reach the same conclusion. I remember trouble with old datasets without a |
Yes, but this is the irrelevant part here.
It's not currently the case (only actual use in However, I am not entirely sure not storing it is the best solution, but I also don't see how storing something that deviates from what was given to So, to me it's either
|
Verdict from call: Do go with storing the POSIX path. I'll push a minor change and go with it. |
Slight change and squashed. Are you OK with this, @adswa ? |
Late to the party, but nevertheless wanted to mention |
Doesn't work with relative paths, though, @mih. And storing absolute ones seems wrong to me. |
PR released in |
Fixes #7182
If the source in a clone command is a windows path, the resulting
datalad_url
in the .gitmodules file of the superdataset contains it verbatim, including escaped backslashes. Ensuring that it is posix makes it functional, and compliant to git's url format.