Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix RIA/ORA publication dependencies #5415

Merged
merged 1 commit into from Mar 3, 2021
Merged

Conversation

bpoldrack
Copy link
Member

@bpoldrack bpoldrack commented Feb 9, 2021

Setting the publication dependency for 'origin' on an ORA remote in
postclone routine didn't properly account for multiple accessible ORA
remotes or the fact that none of them may be associated with the RIA
store we are cloning from.

Now, first check for an ORA remote with the same store URL registered
and if there is none check whether the store has a remote's uuid stored
in the .git/config of the dataset. If we have that remote enabled, set
the dependency on it.

@bpoldrack bpoldrack marked this pull request as draft February 9, 2021 13:43
Copy link
Member

@mih mih left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My superficial comment is re performance: Can it happen that we clone from store X, and we have an ORA URL pointing to the same X, but we would make a mistake when we configure a publication dependency. I will like the additional query for a UUID makes things slower, but not more reliable.

@bpoldrack
Copy link
Member Author

I will like the additional query for a UUID makes things slower, but not more reliable.

I think you misread things (or I made a mistake). Intention is to only check the uuid if we have no URL match.

@codecov
Copy link

codecov bot commented Feb 9, 2021

Codecov Report

Merging #5415 (888c948) into master (365ef82) will increase coverage by 0.03%.
The diff coverage is 95.65%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #5415      +/-   ##
==========================================
+ Coverage   89.48%   89.51%   +0.03%     
==========================================
  Files         296      296              
  Lines       41800    41907     +107     
==========================================
+ Hits        37404    37513     +109     
+ Misses       4396     4394       -2     
Impacted Files Coverage Δ
datalad/core/distributed/clone.py 90.93% <92.00%> (+1.09%) ⬆️
datalad/core/distributed/tests/test_clone.py 96.95% <100.00%> (+0.07%) ⬆️
datalad/downloaders/tests/test_http.py 88.59% <0.00%> (-1.91%) ⬇️
datalad/downloaders/providers.py 93.83% <0.00%> (-0.39%) ⬇️
datalad/ui/progressbars.py 75.42% <0.00%> (-0.30%) ⬇️
datalad/tests/utils.py 86.46% <0.00%> (-0.25%) ⬇️
datalad/downloaders/base.py 77.19% <0.00%> (-0.20%) ⬇️
datalad/interface/tests/test_run_procedure.py 100.00% <0.00%> (ø)
datalad/core/local/run.py 99.11% <0.00%> (+<0.01%) ⬆️
datalad/interface/tests/test_rerun.py 98.49% <0.00%> (+0.03%) ⬆️
... and 7 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 365ef82...888c948. Read the comment docs.

@bpoldrack bpoldrack force-pushed the fix-5396 branch 2 times, most recently from fe51741 to 21ced60 Compare February 19, 2021 08:13
@bpoldrack bpoldrack marked this pull request as ready for review February 19, 2021 08:17
if r['annex-uuid'] == org_uuid]
if uuid_matching_remotes:
# we can't have multiple
assert len(uuid_matching_remotes) == 1
Copy link
Contributor

@kyleam kyleam Feb 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Challenge accepted :)

assertion trigger

This snippet depends on having datalad's test ssh target as datalad-test in your ssh_config.

set -eu

cd "$(mktemp -d /tmp/dl-sameas-XXXXXXX)"
mkdir store
store="$(pwd)/store"

datalad create a
(
    cd a
    datalad create-sibling-ria -s s1 ria+ssh://datalad-test"$store"
    git annex initremote s1-store-local --sameas=s1-storage \
        type=external externaltype=ora autoenable=true \
        url=ria+file://"$store"
    datalad push --to=s1
)

dsid=$(git -C a config --file=.datalad/config datalad.dataset.id)
datalad clone ria+file://"$store#$dsid" b || :
(
    cd b
    echo
    datalad -f json siblings query | \
        jq 'select(."annex-externaltype" == "ora") |
            {name, "annex-externaltype", "annex-uuid"}'
    echo
    git config --get-regexp remote.s1
)
[...]
[ERROR] AssertionError() [clone.py:postclonecfg_ria:888] (AssertionError) 

{
  "name": "s1-store-local",
  "annex-externaltype": "ora",
  "annex-uuid": "beee8cb5-090a-4f4c-b121-e4342f89ea73"
}
{
  "name": "s1-storage",
  "annex-externaltype": "ora",
  "annex-uuid": "beee8cb5-090a-4f4c-b121-e4342f89ea73"
}

remote.s1-store-local.annex-externaltype ora
remote.s1-store-local.annex-uuid beee8cb5-090a-4f4c-b121-e4342f89ea73
remote.s1-store-local.annex-config-uuid a2e7a24e-8fe1-42d6-9358-6be571f813fc
remote.s1-store-local.annex-cost 100.0
remote.s1-store-local.annex-availability GloballyAvailable
remote.s1-storage.annex-externaltype ora
remote.s1-storage.annex-uuid beee8cb5-090a-4f4c-b121-e4342f89ea73
remote.s1-storage.annex-cost 200.0
remote.s1-storage.annex-availability GloballyAvailable
remote.s1-storage.annex-ignore false
remote.s1-storage.skipfetchall true

--sameas is probably pretty rare and even rarer with ORA special remotes, but it's still probably worth filtering out sameas remotes here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You win! I agree.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amend and force-pushed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

The PR in general looks reasonable to me, though given how little I know about ria/ora, that doesn't count for much.

Setting the publication dependency for 'origin' on an ORA remote in
postclone routine didn't properly account for multiple accessible ORA
remotes or the fact that none of them may be associated with the RIA
store we are cloning from.

Now, first check for an ORA remote with the same store URL registered
and if there is none check whether the store has a remote's uuid stored
in the .git/config of the dataset. If we have that remote enabled, set
the dependency on it.

(Closes datalad#5396)
Copy link
Member

@mih mih left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for taking so long. I now had the chance to battle-test this in a problematic scenario, different from my original issue, and also there it works as advertised!

Code itself LGTM too.

Thx much!

@mih mih merged commit 415016a into datalad:master Mar 3, 2021
@mih mih deleted the fix-5396 branch March 3, 2021 10:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants