Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NF: datalad-url in .gitmodules #5346

Merged
merged 2 commits into from Jan 21, 2021
Merged

NF: datalad-url in .gitmodules #5346

merged 2 commits into from Jan 21, 2021

Conversation

bpoldrack
Copy link
Member

Record original URL when creating a subdataset by cloning and let get
consider it before submodule's url. This makes a difference, if the
datalad URL contains pieces to be interpreted by datalad rather than git
like a "ria+" prefix, which is supposed to trigger post clone routines.

Closes #5256

Record original URL when creating a subdataset by cloning and let `get`
consider it before submodule's url. This makes a difference, if the
datalad URL contains pieces to be interpreted by datalad rather than git
like a "ria+" prefix, which is supposed to trigger post clone routines.

(Closes datalad#5256)
Copy link
Contributor

@kyleam kyleam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code changes look good to me, and things seemed to be working as expected when I stepped through the (perhaps convoluted) script below.

script
set -eu

cd "$(mktemp -d "${TMPDIR:-/tmp}"/dl-XXXXXXX)"

ria_path="$(pwd)"/ria
datalad create a
(
    cd a
    datalad create-sibling-ria --no-storage-sibling -s ria \
            ria+file:"$ria_path"
    datalad push --to=ria --data=nothing
)
ds_id=$(git config --file a/.datalad/config datalad.dataset.id)

datalad create b
(
    cd b
    datalad clone -d. ria+file:"$ria_path#$ds_id" sub
    datalad uninstall sub
    datalad get sub
)

@bpoldrack
Copy link
Member Author

Thx, @kyleam !

I'm not entirely sure yet. Main issue is priority, I think. I set it to 590 (cost) to be just in front of anything that would consider the "git-url" in .gitmodules. However, that might be insufficient as _get_flexible_source_candidates_for_submodule in get checks the superdataset's remotes first and could consider relative paths to those with higher priority. This would defy the purpose of triggering postclone-configs etc. based on the datalad URL, since the base URL in those cases is just the URL of that git remote. _get_flexible_source_candidates_for_submodule might need some more thoughts on that.
Do you have thoughts on that, @mih ?

@bpoldrack
Copy link
Member Author

bpoldrack commented Jan 20, 2021

@kyleam :

the (perhaps convoluted) script below.

The final get sub in your script is actually insufficient to prove the point. That should have worked before, since there is a valid URL in .gitmodules to clone from. But since that one is the resolved one, it lacks the ria+ and wouldn't trigger a possible re-configuration of an associated special remote.

@kyleam
Copy link
Contributor

kyleam commented Jan 20, 2021

The final get sub in your script is actually insufficient to prove the point.

I wasn't trying to construct a case that wouldn't work without these changes. I was using the script to step through the new code and see whether it worked as I expected.

That should have worked before [...]

It did, but that doesn't mean that, with this PR, the new source isn't used. It is:

diff --git a/datalad/core/distributed/clone.py b/datalad/core/distributed/clone.py
index 927a4e5e3b..60ee8d5a04 100644
--- a/datalad/core/distributed/clone.py
+++ b/datalad/core/distributed/clone.py
@@ -440,7 +440,9 @@ def clone_dataset(
         unit=' Candidate locations',
     )
     error_msgs = OrderedDict()  # accumulate all error messages formatted per each url
+    from pprint import pprint
     for cand in candidate_sources:
+        pprint(cand)
         log_progress(
             lgr.info,
             'cloneds',
[...]
[INFO] Cloning dataset to Dataset(/tmp/dl-xlCQKFW/b/sub) 
[INFO] Attempting to clone from file:///tmp/dl-xlCQKFW/ria/66d/4876a-21a3-4495-8b52-21f04aedc7a2 to /tmp/dl-xlCQKFW/b/sub 
[INFO] Completed clone attempts for Dataset(/tmp/dl-xlCQKFW/b/sub) 
[INFO] Scanning for unlocked files (this may take some time) 
[INFO] warning: Not setting branch master as its own upstream. 
{'default_destpath': '66d4876a-21a3-4495-8b52-21f04aedc7a2',
 'giturl': 'file:///tmp/dl-xlCQKFW/ria/66d/4876a-21a3-4495-8b52-21f04aedc7a2',
 'source': 'ria+file:/tmp/dl-xlCQKFW/ria#66d4876a-21a3-4495-8b52-21f04aedc7a2',
 'type': 'ria',
 'version': None}
install(ok): /tmp/dl-xlCQKFW/b/sub (dataset) [Installed subdataset in order to get /tmp/dl-xlCQKFW/b/sub]

@codecov
Copy link

codecov bot commented Jan 20, 2021

Codecov Report

Merging #5346 (8b1c01d) into master (58ae70a) will increase coverage by 30.24%.
The diff coverage is 100.00%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master    #5346       +/-   ##
===========================================
+ Coverage   59.64%   89.88%   +30.24%     
===========================================
  Files         129      300      +171     
  Lines       19045    42081    +23036     
===========================================
+ Hits        11360    37826    +26466     
+ Misses       7685     4255     -3430     
Impacted Files Coverage Δ
datalad/local/subdatasets.py 96.52% <ø> (+7.82%) ⬆️
datalad/core/distributed/clone.py 89.51% <100.00%> (+3.53%) ⬆️
datalad/core/distributed/tests/test_clone.py 96.81% <100.00%> (ø)
datalad/distribution/get.py 97.90% <100.00%> (+11.04%) ⬆️
datalad/support/digests.py 100.00% <0.00%> (ø)
datalad/support/exceptions.py 82.67% <0.00%> (ø)
...ad/distributed/tests/test_create_sibling_gitlab.py 100.00% <0.00%> (ø)
datalad/interface/tests/test_rerun_merges.py 100.00% <0.00%> (ø)
datalad/distribution/tests/test_install.py 99.79% <0.00%> (ø)
tools/coverage-bin/git-annex-remote-datalad 100.00% <0.00%> (ø)
... and 262 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 58ae70a...8b1c01d. Read the comment docs.

Copy link
Member

@mih mih left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM too. I also checked whether it resolves my original issue, and it does. Thx!

@mih mih merged commit a94cf41 into datalad:master Jan 21, 2021
@mih mih deleted the nf-datalad-url branch January 21, 2021 09:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

get of a subdataset originally clones from ria+ url is differently configured than originally
3 participants