Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BF: fix authentication + download from S3 for NDA #4824

Merged
merged 9 commits into from Sep 4, 2020

Conversation

yarikoptic
Copy link
Member

@yarikoptic yarikoptic commented Aug 15, 2020

  • Uses new version of NDA authenticator to obtain S3 credentials for download
  • Makes it possible to download from NDA bucket(s) which are quite "locked down"
    • default boto3 access to a bucket with verification would fail, so upon that specific error we would try to connect without validation
    • getting a key for which versionId is known is not possible while specifying that versionId. And default boto behavior is to do so. In this PR I change behavior to "kill" version_id from the key we are requesting if original url requested did not have versionId.

Although it is bug fix in nature, I am ok to reposition it against master if so deemed better.

@codecov
Copy link

codecov bot commented Aug 15, 2020

Codecov Report

Merging #4824 into maint will decrease coverage by 0.21%.
The diff coverage is 19.35%.

Impacted file tree graph

@@            Coverage Diff             @@
##            maint    #4824      +/-   ##
==========================================
- Coverage   89.68%   89.46%   -0.22%     
==========================================
  Files         288      288              
  Lines       40362    40378      +16     
==========================================
- Hits        36197    36123      -74     
- Misses       4165     4255      +90     
Impacted Files Coverage Δ
datalad/downloaders/s3.py 57.66% <0.00%> (-0.86%) ⬇️
datalad/interface/common_cfg.py 100.00% <ø> (ø)
datalad/downloaders/credentials.py 81.87% <8.33%> (-4.80%) ⬇️
datalad/support/s3.py 22.98% <16.66%> (-0.69%) ⬇️
datalad/downloaders/tests/test_s3.py 72.09% <50.00%> (-1.08%) ⬇️
datalad/support/third/nda_aws_token_generator.py 29.76% <100.00%> (ø)
datalad/customremotes/base.py 69.35% <0.00%> (-14.52%) ⬇️
datalad/customremotes/tests/__init__.py 91.66% <0.00%> (-8.34%) ⬇️
datalad/support/keyring_.py 85.10% <0.00%> (-6.39%) ⬇️
datalad/tests/test_base.py 96.87% <0.00%> (-3.13%) ⬇️
... and 17 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 663c5c6...c954269. Read the comment docs.

@yarikoptic yarikoptic changed the title BF: fix authentication to NDA BF: fix authentication + download from S3 for NDA Aug 28, 2020
@yarikoptic yarikoptic marked this pull request as ready for review August 28, 2020 01:07
@yarikoptic yarikoptic added the merge-if-ok OP considers this work done, and requests review/merge label Aug 28, 2020
@yarikoptic
Copy link
Member Author

failed travis debug run is unrelated test_ria_basics fail I think we saw
ERROR: datalad.distributed.tests.test_ria_basics.test_binary_data('datalad-test',)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/nose/case.py", line 198, in runTest
    self.test(*self.arg)
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/datalad/tests/utils.py", line 271, in _wrap_skip_ssh
    return func(*args, **kwargs)
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/datalad/tests/utils.py", line 731, in _wrap_with_tempfile
    return t(*(arg + (filename,)), **kw)
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/datalad/tests/utils.py", line 731, in _wrap_with_tempfile
    return t(*(arg + (filename,)), **kw)
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/datalad/distributed/tests/test_ria_basics.py", line 468, in _test_binary_data
    ds.download_url(url, path=file, message="Add DICOM file from github")
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/datalad/distribution/dataset.py", line 503, in apply_func
    return f(**kwargs)
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/datalad/interface/utils.py", line 494, in eval_func
    return return_func(generator_func)(*args, **kwargs)
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/datalad/interface/utils.py", line 482, in return_func
    results = list(results)
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/datalad/interface/utils.py", line 469, in generator_func
    msg="Command did not complete successfully")
datalad.support.exceptions.IncompleteResultsError: Command did not complete successfully. 1 failed:
[{'action': 'download_url',
  'message': "'ConsoleLog' object has no attribute 'yesno' "
             '[__init__.py:__getattribute__:101]',
  'path': '/tmp/datalad_temp__test_binary_datajhq0ejj_/dicomfile',
  'status': 'error',
  'type': 'file'}]

Copy link
Collaborator

@kyleam kyleam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit light on context that would help an unfamiliar reviewer (and I'm not sure there are any people aside from you that are familiar with this area), but I didn't spot any obvious issues on a read through.

datalad/downloaders/configs/nda.cfg Outdated Show resolved Hide resolved
@@ -158,3 +158,12 @@ def test_boto_host_specification(tempfile):
with swallow_outputs():
providers.download(url_dandi1, path=tempfile)
assert_equal(md5sum(tempfile), '97f4290b2d369816c052607923e372d4')


def test_restricted_bucket_on_NDA():
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, this only runs on your local machine, correct?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes unfortunately. NDA access, and to this specific dataset is required and it is a not that easy procedure to get. And with their specific settings for the bucket - the value from the test is really in talking to NDA.
May be at some point we could setup a github workflow worker which would sit on "trusted" infrastructure and has access to the credentials, but then we might need to (as I think I did with buildbot in the past) to make it run tests only if PRs submitted by trusted github users or a dedicated label is added to signal that PR diff is good (no malicious code which would expose secrets) to be tested on

datalad/downloaders/s3.py Outdated Show resolved Hide resolved
For me it still fails, but may be because ATM I have no access to anything there?
…cessDeniedError

That should cause UI to ask for new credentials
…orbidden

Happens eg. for NDA which has a VERY restricted ACL: you cannot HEAD the bucket etc.
So initial HEAD boto issues for the bucket gets "Forbidden".
Comment in the code diff outlines the situation:  in buckets with minimal ACL
you cannot download a key by specifying versionId, although boto does discover
it while first obtaining the key instance.
I think they are not yet used.  Could be re-added whenever we start testing
against development instance
@kyleam kyleam merged commit 6a00875 into datalad:maint Sep 4, 2020
2 checks passed
@yarikoptic yarikoptic deleted the bf-nda branch October 7, 2020 15:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merge-if-ok OP considers this work done, and requests review/merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants