Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NF: downloaders: Add support for shub:// URLs #4816

Merged
merged 2 commits into from Aug 22, 2020

Conversation

kyleam
Copy link
Contributor

@kyleam kyleam commented Aug 11, 2020

datalad-container resolves shub:// URLs to HTTP URLs by querying
https://singularity-hub.org/api/container/. It then feeds the
result to git annex addurl. That works to download the file, but
the registered URL is short-lived, so it's not useful for retrieving
the image later with {datalad,git annex} get.

Add a downloader that resolves shub:// URLs. When the datalad special
remote is enabled, this allows the shub:// URL, rather than the
resolved one, to be registered with git-annex.

In order for -container to make use of this, containers-add will need
to drop its custom handling when a recent enough datalad is detected.
It should also probably ensure that the datalad special remote is
enabled.

Re: datalad/datalad-container#96


demo
$ datalad create
[INFO   ] Creating a new annex repo at /tmp/dl-UUl9rGd
[INFO   ] Scanning for unlocked files (this may take some time)
create(ok): /tmp/dl-UUl9rGd (dataset)
$ git annex initremote datalad type=external externaltype=datalad encryption=none
$ datalad download-url -O reproin.simg shub://ReproNim/reproin:0.6.0
[INFO   ] Downloading 'shub://ReproNim/reproin:0.6.0' into '/tmp/dl-UUl9rGd/reproin.simg'
[INFO   ] Fetching 'https://singularity-hub.org/api/container/ReproNim/reproin:0.6.0'
download_url(ok): /tmp/dl-UUl9rGd/reproin.simg (file)
add(ok): /tmp/dl-UUl9rGd/reproin.simg (file)
save(ok): /tmp/dl-UUl9rGd (dataset)
action summary:
  add (ok: 1)
  download_url (ok: 1)
  save (ok: 1)
$ git annex whereis reproin.simg
whereis reproin.simg (2 copies)
        9b4511dd-02ea-4184-82d5-159576295911 -- [datalad]
        e733bbc3-e00d-4c3a-b698-e1f43734683d -- kyle@hylob:/tmp/dl-UUl9rGd [here]

  datalad: shub://ReproNim/reproin:0.6.0
ok
$ git annex drop reproin.simg
drop reproin.simg ok
(recording state in git...)
$ datalad get reproin.simg
reproin.simg:  14%|████▋                            | 47.2M/334M
[00:03<00:22, 12.7MB/s]
^C[WARNING] Still have 1 active progress bars when stopping
ERROR:
Interrupted by user while doing magic: KeyboardInterrupt() [cmd.py:_process_one_line:764]

Copy link
Member

@yarikoptic yarikoptic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, but I don't think we should bother caching resolved URLs

datalad/downloaders/shub.py Outdated Show resolved Hide resolved
ok_file_has_content(target, "foo")

other_target = str(path / "other-target")
downloader.download("shub://org/repo", other_target)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should get some minimal (BusyBox?) real container on the hub and test its download?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We still have the shub:// tests disabled in the -container repo, so I don't think we'd want to hit singularity hub here (and in general I don't think it is a good idea to hit it each time our test suite runs). Anyway, I think what is here is a good enough start, and someone interested in adding more can do so later.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. I guess we could just reenable in -containers whenever we detect that datalad comes with this downloader

@kyleam
Copy link
Contributor Author

kyleam commented Aug 12, 2020

but I don't think we should bother caching resolved URLs

Thanks for the review. Updated.

@kyleam
Copy link
Contributor Author

kyleam commented Aug 12, 2020

There are datalad.customremotes failures.

@yarikoptic
Copy link
Member

yeap, filed #4818 first thinking it is just flaky, but then started questioning that... not sure how this PR could have affected it, but also do not have a clue on what could have potentially started to cause those failures?!

@kyleam
Copy link
Contributor Author

kyleam commented Aug 12, 2020

I think it is the changes in this PR. I'm looking into it.

@yarikoptic
Copy link
Member

yeap, probably -- and due to:

            ('TRANSFER RETRIEVE somekey somefile', 'GETURLS somekey http:'),
            ('VALUE', 'GETURLS somekey https:'),
            ('VALUE', 'GETURLS somekey s3:'),

so I guess there is no shub: as well... test better doesn't hard code but goes through that tuple with protocols?

test_datalad:test_interactions has hard-coded scenarios based off the
schemes in DataladAnnexCustomRemote.SUPPORTED_SCHEMES.  If a scheme is
added (as will be done in the next commit), the test needs to be
updated.

Instead use SUPPORTED_SCHEMES to generate the scenarios.
datalad-container resolves shub:// URLs to HTTP URLs by querying
<https://singularity-hub.org/api/container/>.  It then feeds the
result to `git annex addurl`.  That works to download the file, but
the registered URL is short-lived, so it's not useful for retrieving
the image later with `{datalad,git annex} get`.

Add a downloader that resolves shub:// URLs.  When the datalad special
remote is enabled, this allows the shub:// URL, rather than the
resolved one, to be registered with git-annex.

In order for -container to make use of this, containers-add will need
to drop its custom handling when a recent enough datalad is detected.
It should also probably ensure that the datalad special remote is
enabled.

Re: datalad/datalad-container#96
@kyleam kyleam marked this pull request as ready for review August 12, 2020 15:29
@codecov
Copy link

codecov bot commented Aug 12, 2020

Codecov Report

Merging #4816 into master will decrease coverage by 0.02%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #4816      +/-   ##
==========================================
- Coverage   89.68%   89.65%   -0.03%     
==========================================
  Files         287      289       +2     
  Lines       40378    40439      +61     
==========================================
+ Hits        36213    36256      +43     
- Misses       4165     4183      +18     
Impacted Files Coverage Δ
datalad/__init__.py 90.78% <ø> (-0.32%) ⬇️
datalad/customremotes/datalad.py 40.29% <100.00%> (ø)
datalad/customremotes/tests/test_datalad.py 96.07% <100.00%> (+0.24%) ⬆️
datalad/downloaders/providers.py 95.15% <100.00%> (+0.02%) ⬆️
datalad/downloaders/shub.py 100.00% <100.00%> (ø)
datalad/downloaders/tests/test_shub.py 100.00% <100.00%> (ø)
datalad/version.py 40.90% <100.00%> (ø)
datalad/downloaders/http.py 81.85% <0.00%> (-2.71%) ⬇️
datalad/downloaders/tests/test_http.py 87.71% <0.00%> (-2.22%) ⬇️
datalad/distribution/create_sibling.py 85.02% <0.00%> (-0.57%) ⬇️
... and 3 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3e3361f...3342916. Read the comment docs.

Copy link
Member

@yarikoptic yarikoptic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me! I will let other @datalad/developers to chime in or it would be ok with me to proceed with the merge

@yarikoptic yarikoptic added the merge-if-ok OP considers this work done, and requests review/merge label Aug 20, 2020
@yarikoptic
Copy link
Member

@mih @bpoldrack any comments or objections? If not, let's merge.

@yarikoptic yarikoptic merged commit fa9c72e into datalad:master Aug 22, 2020
@kyleam kyleam deleted the shub-downloader branch August 24, 2020 13:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merge-if-ok OP considers this work done, and requests review/merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants