Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH+BF: S3 and downloaders: support anonymous access + interface to add new providers #2708

Merged
merged 28 commits into from Aug 28, 2018

Conversation

@yarikoptic
Copy link
Member

@yarikoptic yarikoptic commented Jul 21, 2018

  • Basic and Digest HTTP Auth should work now
  • In interactive session would now allow to establish a new provider configuration for which would get saved into a user directory
  • unittests for process_www_authenticate etc
  • Test at least for the major portions of the UI to enter the new provider/credential
  • may be would include or just be part of integration test above to test Basic and Digest auth on https://jigsaw.w3.org/HTTP/Basic and https://jigsaw.w3.org/HTTP/Digest
  • may be more RFing for credentials (allow to make new one persistent etc)
    so marking WIP for now
yarikoptic added 2 commits Jul 21, 2018
Also adjusted logic on handling then auth failure so credentials
get asked if anonimous access fails.

Also do not add versionId option to URL if no version informatino
was obtained
@yarikoptic yarikoptic added the WIP label Jul 21, 2018
@codecov
Copy link

@codecov codecov bot commented Jul 21, 2018

Codecov Report

Merging #2708 into master will increase coverage by <.01%.
The diff coverage is 90.55%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master   #2708      +/-   ##
=========================================
+ Coverage   90.19%   90.2%   +<.01%     
=========================================
  Files         245     245              
  Lines       31211   31607     +396     
=========================================
+ Hits        28151   28510     +359     
- Misses       3060    3097      +37
Impacted Files Coverage Δ
datalad/downloaders/credentials.py 85.31% <ø> (ø) ⬆️
datalad/ui/dialog.py 92.54% <ø> (-0.05%) ⬇️
datalad/ui/tests/test_dialog.py 98.79% <100%> (+0.11%) ⬆️
datalad/downloaders/base.py 80.72% <100%> (+5.62%) ⬆️
datalad/downloaders/tests/test_providers.py 100% <100%> (ø) ⬆️
datalad/downloaders/tests/test_http.py 86.71% <100%> (+1.43%) ⬆️
datalad/downloaders/providers.py 94.22% <100%> (+2.91%) ⬆️
datalad/downloaders/tests/test_s3.py 73.43% <100%> (+0.42%) ⬆️
datalad/tests/utils.py 90% <100%> (-0.74%) ⬇️
datalad/ui/base.py 95.45% <100%> (+6.56%) ⬆️
... and 45 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 90d18e3...dc440f3. Read the comment docs.

yarikoptic added 3 commits Jul 23, 2018
…denied

Also BF access to http via basic and digest HTTP auth which
is stateless, so we just need to assign .auth for the session

Information for the provider is "deduced" from the failure
message, although other alternatives are also given as non-default
option.
Copy link
Member

@kyleam kyleam left a comment

I haven't tested this out and I'm not very familiar with the downloaders code, but here are my superficial comments from a read through.

@@ -75,11 +75,12 @@ def __init__(self, credential=None, authenticator=None):
authenticator: Authenticator, optional
Authenticator to use for authentication.
"""
# Allow for anonymous connection if no credential is provided

This comment has been minimized.

@kyleam

kyleam Aug 3, 2018
Member

Is this comment misplaced? The code immediately below is unchanged and doesn't check credential. Perhaps it should be above the if not credential line below?

This comment has been minimized.

@yarikoptic

yarikoptic Aug 4, 2018
Author Member

I guess it was overall like a TODO comment reminder... I just removed it now

# Provide some default sugaring
default = kwargs.pop('default', None)
if default is not None:
if default in {True}:

This comment has been minimized.

@kyleam

kyleam Aug 3, 2018
Member

If you're going to have only one value, why not is True here and is False below?

This comment has been minimized.

@yarikoptic

yarikoptic Aug 4, 2018
Author Member

just curious -- "why not?". I do not think any performance penalty is worth mentioning
Logic for such use instead of is is that if we happen to decide to extend, data structure is there

This comment has been minimized.

@kyleam

kyleam Aug 4, 2018
Member

I find it less readable because I'm not used to seeing it. As for the logic, I don't consider changing is True to in {True, X} if the need arises to be a maintenance burden. And it seems you could use the same logic for pretty much any is ATOM, or any == ATOM for that matter. Anyway, my comment was a "just curious" one as well.

@@ -85,3 +88,6 @@ def nothing(x, k=1):
@with_testsui(responses='a')
def ask():
assert_equal(ui.question('what is a?'), 'a')


ask()

This comment has been minimized.

@kyleam

kyleam Aug 3, 2018
Member

AFAICS, this fix and the cosmetic changes above are unrelated to yesno default change, which isn't actually covered in the tests.

This comment has been minimized.

@yarikoptic

yarikoptic Aug 4, 2018
Author Member

yesno came from a usecase here where I wanted to provide "sensible" default of True for yesno. This one just spotted. but yeah -- need to add a unittest for handling those bools, will do

This comment has been minimized.

@kyleam

kyleam Aug 4, 2018
Member

This one just spotted

That's fine, but it'd be easy to put that in a separate commit. When reviewing, I find it confusing when unrelated bits are tossed in (including cosmetic fixups that just add noise to the diffs).

@@ -185,40 +194,67 @@ def access(self, method, url, allow_old_session=True, **kwargs):
if not ui.is_interactive:
lgr.error(
"Interface is non interactive, so we are "
"reraising: %s" % exc_str(e))
"re-raising: %s" % exc_str(e))

This comment has been minimized.

@kyleam

kyleam Aug 3, 2018
Member

I think reraising was fine.

This comment has been minimized.

@yarikoptic

yarikoptic Aug 4, 2018
Author Member

ok, will change throughout the codebase... i.e. in 1 more place (a comment ;-))

raise DownloadError(
title +
" %s does not have a default credential type, "
"cannot authenticate"

This comment has been minimized.

@kyleam

kyleam Aug 3, 2018
Member

Why put "any credential" on a new line?

This comment has been minimized.

@yarikoptic

yarikoptic Aug 4, 2018
Author Member

what "any credential"? ah -- way above! will fix and compress here a bit. thanks

except AttributeError:
pass
known_providers_by_name = {p.name: p for p in self._providers}
PROVIDERS_USER_DIR = self._get_providers_dirs()['user']

This comment has been minimized.

@kyleam

kyleam Aug 3, 2018
Member

This is local to this function. Why the allcaps?

This comment has been minimized.

@yarikoptic

yarikoptic Aug 4, 2018
Author Member

I finally found CAPSLOCK useful! ;-) I guess leftover from when I hoped to have those just a process-wide constants... will lowercase

if ui.yesno(
title="Known provider %s" % name,
text="Provider with name %s already known. Do you want to "
"use it for this session? If 'No' - enter another name"

This comment has been minimized.

@kyleam

kyleam Aug 3, 2018
Member

This could be misinterpreted as "To answer 'NO', enter another name here".

This comment has been minimized.

@yarikoptic

yarikoptic Aug 4, 2018
Author Member

indeed... I better move this trailing sentence to avoid confusion - I think it will be ok without it

url_re = re.escape(url)
while True:
url_re = ui.question(
title="New provider regular_expression",

This comment has been minimized.

@kyleam

kyleam Aug 3, 2018
Member

Drop the underscore?

This comment has been minimized.

@yarikoptic

yarikoptic Aug 4, 2018
Author Member

indeed. thanks!

Copy link
Member Author

@yarikoptic yarikoptic left a comment

Thanks @kyleam for comments --will push fixups shortly

# Provide some default sugaring
default = kwargs.pop('default', None)
if default is not None:
if default in {True}:

This comment has been minimized.

@yarikoptic

yarikoptic Aug 4, 2018
Author Member

just curious -- "why not?". I do not think any performance penalty is worth mentioning
Logic for such use instead of is is that if we happen to decide to extend, data structure is there

@@ -85,3 +88,6 @@ def nothing(x, k=1):
@with_testsui(responses='a')
def ask():
assert_equal(ui.question('what is a?'), 'a')


ask()

This comment has been minimized.

@yarikoptic

yarikoptic Aug 4, 2018
Author Member

yesno came from a usecase here where I wanted to provide "sensible" default of True for yesno. This one just spotted. but yeah -- need to add a unittest for handling those bools, will do

url_re = re.escape(url)
while True:
url_re = ui.question(
title="New provider regular_expression",

This comment has been minimized.

@yarikoptic

yarikoptic Aug 4, 2018
Author Member

indeed. thanks!

if ui.yesno(
title="Known provider %s" % name,
text="Provider with name %s already known. Do you want to "
"use it for this session? If 'No' - enter another name"

This comment has been minimized.

@yarikoptic

yarikoptic Aug 4, 2018
Author Member

indeed... I better move this trailing sentence to avoid confusion - I think it will be ok without it

except AttributeError:
pass
known_providers_by_name = {p.name: p for p in self._providers}
PROVIDERS_USER_DIR = self._get_providers_dirs()['user']

This comment has been minimized.

@yarikoptic

yarikoptic Aug 4, 2018
Author Member

I finally found CAPSLOCK useful! ;-) I guess leftover from when I hoped to have those just a process-wide constants... will lowercase

raise DownloadError(
title +
" %s does not have a default credential type, "
"cannot authenticate"

This comment has been minimized.

@yarikoptic

yarikoptic Aug 4, 2018
Author Member

what "any credential"? ah -- way above! will fix and compress here a bit. thanks

@@ -185,40 +194,67 @@ def access(self, method, url, allow_old_session=True, **kwargs):
if not ui.is_interactive:
lgr.error(
"Interface is non interactive, so we are "
"reraising: %s" % exc_str(e))
"re-raising: %s" % exc_str(e))

This comment has been minimized.

@yarikoptic

yarikoptic Aug 4, 2018
Author Member

ok, will change throughout the codebase... i.e. in 1 more place (a comment ;-))

@@ -75,11 +75,12 @@ def __init__(self, credential=None, authenticator=None):
authenticator: Authenticator, optional
Authenticator to use for authentication.
"""
# Allow for anonymous connection if no credential is provided

This comment has been minimized.

@yarikoptic

yarikoptic Aug 4, 2018
Author Member

I guess it was overall like a TODO comment reminder... I just removed it now

yarikoptic added 3 commits Aug 4, 2018
* origin/master: (47 commits)
  DOC: Fix a docstring typo
  RF: show only the version, without license, upon --version
  DOC: Clarify datalad.__doc__'s pointer to datalad.api
  BF: Internal cmd call, leave error handling to outer layer
  DOC: Tell datalad.__doc__ readers how to see a list of commands
  DOC: Add summary of commands to datalad.api.__doc__
  BF(TST): use .call_count since assert_called_once is N/A on older mocks
  RF: Move "get interface doc" code into dedicated function
  RF: Move "load module from interface" code to a dedicated function
  DOC: interface: Add a docstring to get_cmd_summaries
  RF: Use shorter parameter names for get_cmd_summaries
  RF: Move get_cmd_summaries from cmdline/main to interface/base
  RF: Extract CLI-independent parts from get_description_with_cmd_summary
  RF: cmdline: Use a dict rather than a list for short descriptions
  DOC: interface: Add a docstring to get_interface_groups
  RF: Teach get_interface_groups about plugins
  ENH(TST): test that ipython ui frontend gets set in under ZMQInteractiveShell
  ENH: dedicated test for double entry of hidden values if no choices given
  ENH(TST): even for hidden prompt with choices we do not reconfirm the entry
  ENH+BF: missing import and basic (largely smoke) test for IPython ui
  ...
@yarikoptic yarikoptic changed the title ENH+BF: S3 and downloaders: support anonymous access ENH+BF: S3 and downloaders: support anonymous access + interface to add new providers Aug 8, 2018
yarikoptic and others added 8 commits Aug 9, 2018
* origin/master: (28 commits)
  BF: Missing import for bound method
  fix: added a semicolon
  fix: table entries get a pointer cursor to show users they can click on entry
  BF: Unique value hashing could not deal with lists as values
  BF: Recreate old GitRepo.get_git_attributes() to keep old folks going
  TST: Test not up to the complexity of the world
  DOC: Remove question that is not relevant
  BF: Humble attempt to fix unicode issue
  TST: More or less comprehensive GitRepo.(g|s)et_attributes test
  travis: Add a comment about grunt-contrib-qunit pinning
  DOC: set_gitattributes: Fix a minor typo
  TST: travis: Pin grunt-contrib-qunit
  BF: no_annex: Correct path to .gitattributes
  TST: Plain RF to remove 'import *' madness
  RF: no_annex plugin used attributes helper
  RF: `create` no longer hacks .gitattributes, no duplication on --force
  DOC: search: Fix a typo
  RF+ENH: New search mode egrepcs for case-sensitive search
  NF: GitRepo.set_gitattributes
  RF+ENH: Improved GitRepo.get_gitattributes
  ...
The default value for url is None, and re.escape with fail if called
with None.
question() raises a ValueError if one of its choices isn't selected,
so the program should never hit this error.  Using assert here to
declare this impossible is fine (and if datalad eventually works with
"python -O", it's fine that the check would be dropped).
If the provider did not define a url_re, it would fail with a KeyError
rather than getting to the assertion below.  (Although, in general,
providers code should move away from its use of assert statements for
error handling.)
kyleam added 2 commits Aug 22, 2018
Copy link
Member Author

@yarikoptic yarikoptic left a comment

Just a few minor comments. Thanks @kyleam !

@@ -65,6 +65,17 @@
__docformat__ = 'restructuredtext'


def process_www_authenticate(v):

This comment has been minimized.

@yarikoptic

yarikoptic Aug 23, 2018
Author Member

Could you please add a short docstring to describe the purpose of the function and why output should be a list? (that is should provide value for the supported_types)

This comment has been minimized.

@kyleam

kyleam Aug 23, 2018
Member

Uhh you wrote that code. I don't know its purpose or why its output should be a list, but, sure, I could guess and add a docstring. In the time it took you to "review" the function you wrote, you could have wrote that docstring.

This comment has been minimized.

@yarikoptic

yarikoptic Aug 23, 2018
Author Member

ah, damn me -- should've added a docstring indeed, will do

def nonmatching_url():
providers.enter_new(url, auth_types=["http_basic_auth"])
nonmatching_url()
ok_exists(op.join(providers_dir, "foo3.cfg"))

This comment has been minimized.

@yarikoptic

yarikoptic Aug 23, 2018
Author Member

Lovely!
I guess the next one would be a similar one for _enter_credentials?

This comment has been minimized.

@kyleam

kyleam Aug 23, 2018
Member

I was hoping I had already fulfilled your request to help with your PR.

This comment has been minimized.

@yarikoptic

yarikoptic Aug 23, 2018
Author Member

you did help! ok,I could continue whenever I get a chance, but not sure when it would be :-(

This comment has been minimized.

@kyleam

kyleam Aug 23, 2018
Member

I wasn't saying I wouldn't help more, just stating what I had hoped :)

I'll look into filling out the tests more, starting with _enter_credentials.

This comment has been minimized.

@yarikoptic

yarikoptic Aug 23, 2018
Author Member

thanks in advance! ;-)

@yarikoptic
Copy link
Member Author

@yarikoptic yarikoptic commented Aug 27, 2018

@kyleam pushed two commits to "enable" access to the s3 buckets with period in their name (apparently needs special handling and now there is openneuro.org bucket)

kyleam added 6 commits Aug 27, 2018
One authenticator, NoneAuthenticator, doesn't actually require
authentication, so there is no need to ask for credentials.
The downstream code makes the distinction between False and None, but,
with the default value of None for self.credential, there is no way
for the code to evaluate to False.

Assume that the intention was for needs_authentication to be set to
requires_authentication (True/False) when there's an authenticator.
The upstream code always sets exc_info when access_denied is truthy.
I can't find a way to trigger this with the tests.  When a downloader
class is initialized with an authenticator but no credentials, the
downloader refuses to continue unless the user enters credentials.  So
it seems impossible to execute a codepath that assumes that there is
an authenticator but no credentials.
A few of the buildbots are failing with keyring errors.  Hopefully
this fixes them.
@kyleam
Copy link
Member

@kyleam kyleam commented Aug 28, 2018

@yarikoptic Tests now cover most of it aside from the s3 bits. Feel free to add a simple ls s3 test hitting a public URL, if you'd like.

@yarikoptic
Copy link
Member Author

@yarikoptic yarikoptic commented Aug 28, 2018

Thank you very much @kyleam

@yarikoptic
Copy link
Member Author

@yarikoptic yarikoptic commented Aug 28, 2018

Looks great. Let's see how it holds to real life testing

@yarikoptic yarikoptic merged commit adb83d1 into datalad:master Aug 28, 2018
9 checks passed
9 checks passed
@codecov
codecov/patch 90.55% of diff hit (target 90.19%)
Details
@codecov
codecov/project 90.2% (+<.01%) compared to 90d18e3
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@yarikoptic
datalad-pr-dl-osx-64 DEV build done.
Details
@yarikoptic
datalad-pr-docker-dl-nd14_04 DEV build done.
Details
@yarikoptic
datalad-pr-docker-dl-nd16_04 DEV build done.
Details
@yarikoptic
datalad-pr-docker-dl-nd80 DEV build done.
Details
@yarikoptic
datalad-pr-docker-dl-nd90 DEV build done.
Details
@yarikoptic yarikoptic deleted the yarikoptic:enh-s3 branch Sep 6, 2018
yarikoptic added a commit that referenced this pull request Sep 13, 2018
0.10.3 (Sep 13, 2018) -- Almost-perfect

This is largely a bugfix release which addressed many (but not yet all)
issues of working with git-annex direct and version 6 modes, and operation
on Windows in general.  Among enhancements you will see the
support of public S3 buckets (even with periods in their names),
ability to configure new providers interactively, and improved `egrep`
search backend.

Although we do not require with this release, it is recommended to make
sure that you are using a recent `git-annex` since it also had a variety
of fixes and enhancements in the past months.

Fixes

- Parsing of combined short options has been broken since DataLad
  v0.10.0. ([#2710])
- The `datalad save` instructions shown by `datalad run` for a command
  with a non-zero exit were incorrectly formatted. ([#2692])
- Decompression of zip files (e.g., through `datalad
  add-archive-content`) failed on Python 3.  ([#2702])
- Windows:
  - colored log output was not being processed by colorama.  ([#2707])
  - more codepaths now try multiple times when removing a file to deal
    with latency and locking issues on Windows.  ([#2795])
- Internal git fetch calls have been updated to work around a
  GitPython `BadName` issue.  ([#2712]), ([#2794])
- The progess bar for annex file transferring was unable to handle an
  empty file.  ([#2717])
- `datalad add-readme` halted when no aggregated metadata was found
  rather than displaying a warning.  ([#2731])
- `datalad rerun` failed if `--onto` was specified and the history
  contained no run commits.  ([#2761])
- Processing of a command's results failed on a result record with a
  missing value (e.g., absent field or subfield in metadata).  Now the
  missing value is rendered as "N/A".  ([#2725]).
- A couple of documentation links in the "Delineation from related
  solutions" were misformatted.  ([#2773])
- With the latest git-annex, several known V6 failures are no longer
  an issue.  ([#2777])
- In direct mode, commit changes would often commit annexed content as
  regular Git files.  A new approach fixes this and resolves a good
  number of known failures.  ([#2770])
- The reporting of command results failed if the current working
  directory was removed (e.g., after an unsuccessful `install`). ([#2788])
- When installing into an existing empty directory, `datalad install`
  removed the directory after a failed clone.  ([#2788])
- `datalad run` incorrectly handled inputs and outputs for paths with
  spaces and other characters that require shell escaping.  ([#2798])
- Globbing inputs and outputs for `datalad run` didn't work correctly
  if a subdataset wasn't installed.  ([#2796])
- Minor (in)compatibility with git 2.19 - (no) trailing period
  in an error message now. ([#2815])

Enhancements and new features

- Anonymous access is now supported for S3 and other downloaders.  ([#2708])
- A new interface is available to ease setting up new providers.  ([#2708])
- Metadata: changes to egrep mode search  ([#2735])
  - Queries in egrep mode are now case-sensitive when the query
    contains any uppercase letters and are case-insensitive otherwise.
    The new mode egrepcs can be used to perform a case-sensitive query
    with all lower-case letters.
  - Search can now be limited to a specific key.
  - Multiple queries (list of expressions) are evaluated using AND to
    determine whether something is a hit.
  - A single multi-field query (e.g., `pa*:findme`) is a hit, when any
    matching field matches the query.
  - All matching key/value combinations across all (multi-field)
    queries are reported in the query_matched result field.
  - egrep mode now shows all hits rather than limiting the results to
    the top 20 hits.
- The documentation on how to format commands for `datalad run` has
  been improved.  ([#2703])
- The method for determining the current working directory on Windows
  has been improved.  ([#2707])
- `datalad --version` now simply shows the version without the
  license.  ([#2733])
- `datalad export-archive` learned to export under an existing
  directory via its `--filename` option.  ([#2723])
- `datalad export-to-figshare` now generates the zip archive in the
  root of the dataset unless `--filename` is specified.  ([#2723])
- After importing `datalad.api`, `help(datalad.api)` (or
  `datalad.api?` in IPython) now shows a summary of the available
  DataLad commands.  ([#2728])
- Support for using `datalad` from IPython has been improved.  ([#2722])
- `datalad wtf` now returns structured data and reports the version of
  each extension.  ([#2741])
- The internal handling of gitattributes information has been
  improved.  A user-visible consequence is that `datalad create
  --force` no longer duplicates existing attributes.  ([#2744])
- The "annex" metadata extractor can now be used even when no content
  is present.  ([#2724])
- The `add_url_to_file` method (called by commands like `datalad
  download-url` and `datalad add-archive-content`) learned how to
  display a progress bar.  ([#2738])

* tag '0.10.3': (214 commits)
  Changelog entry for 2.19 git compat fix
  DOC: slight tune ups to the ChangeLog
  ENH: link issue/pull #s in CHANGELOG, use tools/link_issues_CHANGELOG
  BF: remove trailing period while matching a mesage from git
  DOC: try_multiple_dec: Add description of `duration` parameter
  CLN: annexrepo: Fix grammar in a recently added comment
  TST: auto: Reformat comment from 900ee08
  DOC: _rmtree: Drop a word from summary line
  DOC: Info on IDs (fixes gh-2801)
  BF: diff -- when reraising - just raise, do not raise that instance of exception from new location
  BF(TST): precommit before removing submodule so there is no batched processes
  ENH+BF(TST): close established ssh sockets upon test finish
  BF(TST): One more "clost all log handlers in the test"
  CHANGELOG.md: Start adding entries for v0.10.3
  BF(TST): close cookies db in the teardown since atexit is later, so cannot assure no open files
  BF(TST): explicitly close created log handlers
  ENH(TST): @known_failure_windows to replace plain @skip_if_on_windows where there is hope
  BF(TST): do not swallow output while testing AutomagicIO to not cause some open files issue
  ENH(TST): Skip a test when cannot remove curent directory
  BF(TST): explicitly precommit a repo used under swallow_outputs
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

2 participants