NF: datalad-url in .gitmodules #5346
Record the original URL when creating a subdataset by cloning, and let `get` consider it before the submodule's url. This makes a difference if the datalad URL contains pieces to be interpreted by datalad rather than git, such as a "ria+" prefix, which is supposed to trigger post-clone routines. (Closes datalad#5256)
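To make the intended ordering concrete, here is a minimal, self-contained sketch of how a recorded datalad URL could be ranked ahead of the plain submodule URL. It is not the code from this PR: the function `candidate_sources`, the dict layout, and the example URLs are hypothetical. The key name `datalad-url` is taken from the PR title and the cost of 590 from the discussion further down; the cost of 600 for the plain git URL is an assumption made only so the ordering is visible.

```python
from typing import Dict, List

# 590 is the cost mentioned in the discussion; 600 for the plain
# submodule URL is an assumption made for this illustration only.
DATALAD_URL_COST = 590
GIT_URL_COST = 600


def candidate_sources(props: Dict[str, str]) -> List[dict]:
    """Return clone candidates for one submodule record, cheapest first.

    `props` is a hypothetical mapping of the properties stored for a
    submodule in .gitmodules (e.g. 'url' and 'datalad-url').
    """
    candidates = []
    if props.get("datalad-url"):
        candidates.append(
            {"cost": DATALAD_URL_COST, "name": "datalad-url",
             "url": props["datalad-url"]}
        )
    if props.get("url"):
        candidates.append(
            {"cost": GIT_URL_COST, "name": "git-url", "url": props["url"]}
        )
    # Lower cost means the candidate is tried earlier.
    return sorted(candidates, key=lambda c: c["cost"])


if __name__ == "__main__":
    # Made-up record: the git URL is a resolved path, while the datalad
    # URL keeps the "ria+" prefix that datalad needs to interpret.
    record = {
        "url": "/tmp/ria/abc/def",
        "datalad-url": "ria+file:///tmp/ria#some-dataset-id",
    }
    for cand in candidate_sources(record):
        print(cand["cost"], cand["name"], cand["url"])
```

With both properties present, the `ria+` URL is listed first, so a clone attempt would see it before the resolved git URL; a record without `datalad-url` falls back to the git URL alone.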
Code changes look good to me, and things seemed to be working as expected when I stepped through the (perhaps convoluted) script below.
script
set -eu
cd "$(mktemp -d "${TMPDIR:-/tmp}"/dl-XXXXXXX)"
ria_path="$(pwd)"/ria
datalad create a
(
    cd a
    datalad create-sibling-ria --no-storage-sibling -s ria \
        ria+file:"$ria_path"
    datalad push --to=ria --data=nothing
)
ds_id=$(git config --file a/.datalad/config datalad.dataset.id)
datalad create b
(
    cd b
    datalad clone -d. ria+file:"$ria_path#$ds_id" sub
    datalad uninstall sub
    datalad get sub
)
Thx, @kyleam ! I'm not entirely sure yet. Main issue is priority, I think. I set it to 590 (cost) to be just in front of anything that would consider the "git-url" in |
@kyleam :
The final |
I wasn't trying to construct a case that wouldn't work without these changes. I was using the script to step through the new code and see whether it worked as I expected.
It did, but that doesn't mean that, with this PR, the new source isn't used. It is:
diff --git a/datalad/core/distributed/clone.py b/datalad/core/distributed/clone.py
index 927a4e5e3b..60ee8d5a04 100644
--- a/datalad/core/distributed/clone.py
+++ b/datalad/core/distributed/clone.py
@@ -440,7 +440,9 @@ def clone_dataset(
unit=' Candidate locations',
)
error_msgs = OrderedDict() # accumulate all error messages formatted per each url
+ from pprint import pprint
for cand in candidate_sources:
+ pprint(cand)
log_progress(
lgr.info,
'cloneds',
Codecov Report
@@ Coverage Diff @@
## master #5346 +/- ##
===========================================
+ Coverage 59.64% 89.88% +30.24%
===========================================
Files 129 300 +171
Lines 19045 42081 +23036
===========================================
+ Hits 11360 37826 +26466
+ Misses 7685 4255 -3430
This LGTM too. I also checked whether it resolves my original issue, and it does. Thx!