Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Support dataset aliases in RIA stores #4459

Merged
merged 2 commits into from Apr 30, 2020
Merged

ENH: Support dataset aliases in RIA stores #4459

merged 2 commits into from Apr 30, 2020

Conversation

mih
Copy link
Member

@mih mih commented Apr 29, 2020

clone() can now handle URLs of the format

ria+<protocol>://<storelocation>#~<aliasname>

in addition to those containing UUIDs (examples inside).

A future addition should likely equip create-sibling-ria() with the ability
to establish those aliases too. But for now it is not clear to me how to
do that best (in particular how to perform properly on reconfigure).
Given time-pressure this needs to wait for another time.

Fixes gh-4424

`clone()` can now handle URLs of the format

   'ria+<protocol>://<storelocation>#~<aliasname>'

in addition to those containing UUIDs.

A future addition should likely equip `create-sibling-ria()` with the ability
to establish those aliases too. But for now it is not clear to me how to
do that best (in particular how to perform properly on reconfigure).
Given time-pressure this needs to wait for another time.

Fixes dataladgh-4424
@mih mih added the UX label Apr 29, 2020
@mih mih added this to the 0.13.0 milestone Apr 29, 2020
@bpoldrack
Copy link
Member

bpoldrack commented Apr 29, 2020

LGTM. Somehow I like the tilde.

trace = '{}/{}'.format(dsid[:3], dsid[3:])
default_destpath = dsid
elif dsid.startswith('~'):
trace = 'alias/{}'.format(dsid[1:])
Copy link
Member

@yarikoptic yarikoptic Apr 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since this is more of an internal purpose, why not to have it as .alias/ or even .ria/aliases/ or alike? then you could still have an alias dataset (happen ria allow it at some point)

Copy link
Member Author

@mih mih Apr 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is fine to not have further obfuscation. Having a dataset stored in a directory named like this would represent a fundamental concept change. I doubt that we would keep the name or URL scheme at that point.

@yarikoptic
Copy link
Member

yarikoptic commented Apr 29, 2020

sorry if I am missing something.
Why bother with explicit ~ handling and not just rely on symlink based URL like ria+<protocol>://<storelocation>/<aliasname>? where that aliasname is a symlink? I just symlinked abide as abide-link and git clone http://datasets.datalad.org/abide-link/.git worked just fine having symlinks forwarded.

@mih
Copy link
Member Author

mih commented Apr 29, 2020

All ria URLs require explicit handling. None are meant to expose the underlying store organization. That organization is versioned and may change. URLs must not change in that case.

@yarikoptic
Copy link
Member

yarikoptic commented Apr 29, 2020

That organization is versioned and may change.

haven't spotted any reflection of any possible versioning of the "organization" in the code ATM... but it is ok with me if you say that no direct symlinks would work.

@kyleam
Copy link
Collaborator

kyleam commented Apr 29, 2020

This looks nice implementation-wise, and I don't have any issue with using ~ in the URL or alias/ in the store, but I'm confused by this statement:

All ria URLs require explicit handling. None are meant to expose the underlying store organization.

It seems to me that there is freedom to restructure only in terms of future cloners. Wouldn't it break all existing clones that already point directly to the resolved location? An example clone from a ria test:

$ git remote -v
origin	http://127.0.0.1:43805/850/9d199-8a4f-11ea-9dee-28c63f920482 (fetch)
origin	http://127.0.0.1:43805/850/9d199-8a4f-11ea-9dee-28c63f920482 (push)

@codecov
Copy link

codecov bot commented Apr 29, 2020

Codecov Report

Merging #4459 into master will increase coverage by 39.28%.
The diff coverage is 94.11%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master    #4459       +/-   ##
===========================================
+ Coverage   49.62%   88.90%   +39.28%     
===========================================
  Files         285      285               
  Lines       37844    37900       +56     
===========================================
+ Hits        18779    33696    +14917     
+ Misses      19065     4204    -14861     
Impacted Files Coverage Δ
datalad/distributed/create_sibling_ria.py 76.05% <ø> (+52.81%) ⬆️
datalad/core/distributed/tests/test_clone.py 92.88% <90.00%> (+86.28%) ⬆️
datalad/core/distributed/clone.py 92.44% <100.00%> (+38.96%) ⬆️
datalad/core/local/tests/test_create.py 100.00% <0.00%> (+0.44%) ⬆️
datalad/core/local/create.py 93.00% <0.00%> (+0.69%) ⬆️
datalad/local/subdatasets.py 95.65% <0.00%> (+0.86%) ⬆️
datalad/core/local/diff.py 95.34% <0.00%> (+1.16%) ⬆️
datalad/dochelpers.py 87.40% <0.00%> (+1.48%) ⬆️
datalad/support/tests/test_network.py 100.00% <0.00%> (+1.83%) ⬆️
datalad/support/external_versions.py 96.07% <0.00%> (+1.91%) ⬆️
... and 176 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1c872e1...125b87e. Read the comment docs.

@mih
Copy link
Member Author

mih commented Apr 30, 2020

This looks nice implementation-wise, and I don't have any issue with using ~ in the URL or alias/ in the store, but I'm confused by this statement:

All ria URLs require explicit handling. None are meant to expose the underlying store organization.

It seems to me that there is freedom to restructure only in terms of future cloners. Wouldn't it break all existing clones that already point directly to the resolved location? An example clone from a ria test:

$ git remote -v
origin	http://127.0.0.1:43805/850/9d199-8a4f-11ea-9dee-28c63f920482 (fetch)
origin	http://127.0.0.1:43805/850/9d199-8a4f-11ea-9dee-28c63f920482 (push)

Yes, correct. The explicitness is there, after the ria URLs get resolved. That is no change from before. I think, if persistence like that is desired/required than it should either be achieved at the server side, or with a dedicated remote helper (I tried one of those already https://github.com/datalad/git-remote-rclone)

re @yarikoptic You are correct that there is no trace of versioning in the code, because we are still at version 1, but the stores are versioned: http://store.datalad.org/ria-layout-version

@mih
Copy link
Member Author

mih commented Apr 30, 2020

Given the carefully positive feedback, I will merge this now. thx!

@mih mih merged commit 6b6d671 into datalad:master Apr 30, 2020
11 of 12 checks passed
@mih mih deleted the enh-4424 branch Apr 30, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
UX
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants