Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH/BF/RF: GitHub authentication #3180

Merged
merged 22 commits into from May 7, 2019
Merged

Conversation

@yarikoptic
Copy link
Member

@yarikoptic yarikoptic commented Feb 26, 2019

This pull request fixes #2004 , fixes #2005, and I also think it might fix #3126 if @effigies says so ;-)

This pull request proposes to do a more thorough pass over possibly available github authentication mechanisms to achieve the goal. I have tried to maintain currently present "non standard" handling of authentication for github:

  • token - we do not store those in the credentials store, but rely on it/them being present in the (git) config. But since multiple tokens could be defined (globally, and then within a dataset), we might need to consider multiple (I somehow get 3 values with 2 unique ones)
  • user/password could be provided in the command line (we don't do that for any other remote etc)

change in behavior

  • provided in cmdline user/password are not stored in the credential store (and thus also not deleted - so no dances with removing some credential if smth doesn't work)

additional changes

  • .ui.question has now repeat option (defaults to None to maintain previous behavior) to be able to force or disable repeated entry for a hidden value. In my case I wanted to get rid of it -- entering 2FA token twice is really not called for
  • .utils.unique got option reverse=False which, if set to True, would return unique values while maintaining the order from the back (not from the front as originally)

overall now for create-sibling-github:

  • multiple ways to authenticate are considered - we pretty much loop until we succeed by offering user to enter new user/password or until we really fail
  • unless both user and password are provided in command line, tokens are considered first (if user/password are provided -- no stored tokens are considered)
  • if user (no password) is provided, only the token(s) associated with that user will be considered
  • regular user/password credential is considered next, if fails, asked to re-enter
  • if it is detected that user/password authentication leads to 2FA error, we offer to generate token (with "user" and "repo" permissions) and store it in config ("global", "local", no "dataset) (or not store at all - empty value)
    • we only alert/error out if a token with our note (DataLad) already exists
  • if session is non-interactive, we don't do much really, and I guess with 2FA even if user/password are provided - we would simply fail. So token must be set via config first then

In addition to authentication related changes

TODOs

  • Move all the _ helper functions which now also import github as gh into datalad.support.github_. I kept that in-place for now with hope to still have some meaningful diff
  • make it possible to avoid reentering the login name if it was specified (see U1 below)
  • consider reverting aforementioned change in behavior that credentials aren't stored if provided in the cmdline. May be there should be ui.question to either do it or not and if not interactive - an INFO msg.
  • Try to come up with at least some rudimentary test for all the authentication logic

I would really appreciate any attempts to use it - once again, I have tried it only in a few scenarios and there were no and remain no tests for all the logic there.

**U1

$> datalad create-sibling-github --github-login yarikoptic-test --name gh-yarikoptic-test shablona
You need to authenticate with 'yarikoptic-test@github' credentials. https://github.com/login provides information on how to gain access
user: 
@yarikoptic yarikoptic changed the base branch from master to 0.11.x Feb 26, 2019
@yarikoptic yarikoptic added this to the Release 0.11.4 milestone Feb 26, 2019
@codecov
Copy link

@codecov codecov bot commented Feb 26, 2019

Codecov Report

Merging #3180 into 0.11.x will increase coverage by 0.18%.
The diff coverage is 87.78%.

Impacted file tree graph

@@            Coverage Diff             @@
##           0.11.x    #3180      +/-   ##
==========================================
+ Coverage   90.84%   91.02%   +0.18%     
==========================================
  Files         252      254       +2     
  Lines       33105    33293     +188     
==========================================
+ Hits        30073    30306     +233     
+ Misses       3032     2987      -45
Impacted Files Coverage Δ
datalad/interface/common_cfg.py 100% <ø> (ø) ⬆️
datalad/ui/base.py 95.45% <ø> (ø) ⬆️
datalad/utils.py 87.21% <100%> (+0.02%) ⬆️
datalad/tests/test_utils.py 96.49% <100%> (+0.02%) ⬆️
datalad/downloaders/tests/test_credentials.py 100% <100%> (ø) ⬆️
datalad/distribution/tests/test_create_github.py 95.71% <100%> (+2.69%) ⬆️
datalad/ui/dialog.py 93.2% <100%> (+0.04%) ⬆️
datalad/support/tests/test_github_.py 100% <100%> (ø)
datalad/ui/tests/test_dialog.py 98.85% <100%> (+0.05%) ⬆️
datalad/consts.py 100% <100%> (ø) ⬆️
... and 14 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b12a4cc...c304f4c. Read the comment docs.

@yarikoptic
Copy link
Member Author

@yarikoptic yarikoptic commented Feb 28, 2019

ok, noone is interested to try/review - so I will just plow forward with RF, so diff would increase ;-)

@effigies
Copy link
Contributor

@effigies effigies commented Feb 28, 2019

Hmm. Well, I tried using this branch to test out #3126, but it turned out that I was using 0.11.2 from Neurodebian. After making sure I was using your branch, it still opened up the existing keyring, and did not invalidate the credentials:

$ datalad create-sibling-github test_repo
Please enter password for encrypted keyring: 
[ERROR  ] module 'github' has no attribute 'TwoFactorException' [create_sibling_github.py:_gen_github_ses:171] (AttributeError) 
$ datalad create-sibling-github test_repo
Please enter password for encrypted keyring: 
[ERROR  ] module 'github' has no attribute 'TwoFactorException' [create_sibling_github.py:_gen_github_ses:171] (AttributeError) 

I'll need to figure out where the keyring is to delete it and try again from scratch, but at least this, as currently written, won't lead to a situation where upgrading gets people out of a pre-existing credential jam.

@yarikoptic
Copy link
Member Author

@yarikoptic yarikoptic commented Feb 28, 2019

Smells like outdated github module, what version do you have? May be it is listed in datalad wtf

@effigies
Copy link
Contributor

@effigies effigies commented Feb 28, 2019

$ conda list | grep github
pygithub                  1.34                     py36_0    conda-forge

@effigies
Copy link
Contributor

@effigies effigies commented Feb 28, 2019

Updated to pygithub 1.43.5:

$ datalad create-sibling-github test_repo
You need to authenticate with 'github' credentials. https://github.com/login provides information on how to gain access
user: effigies

password: 
password (repeat): 
[ERROR  ] 401 {'message': 'Must specify two-factor authentication OTP code.', 'documentation_url': 'https://developer.github.com/v3/auth#working-with-two-factor-authentication'} [Requester.py:__check:275] (GithubException) 
$ datalad create-sibling-github test_repo
[ERROR  ] 401 {'message': 'Must specify two-factor authentication OTP code.', 'documentation_url': 'https://developer.github.com/v3/auth#working-with-two-factor-authentication'} [Requester.py:__check:275] (GithubException) 

yarikoptic added 2 commits Mar 4, 2019
As reported by @effigies in datalad#3180 (comment)
and confirmed locally, that version of pygithub throws generic gh.GithubException
instead, so we should do local analysis ourselves
@yarikoptic
Copy link
Member Author

@yarikoptic yarikoptic commented Mar 4, 2019

FWIW I have python{,3}-github package 1.40-1 from debian buster/sid.

With the changes here, if you already have hub.oauthtoken defined in your .git/config (or globally) it should have not come to the user/password authentication AFAIK if you used this branch. Could you please try running with datalad -l debug ... -- you should see a message like

[DEBUG  ] Selected 1 tokens out of 2 for the login datalad-tester

(in my case there is one token defined in ~/.gitconfig as well for another account).
Meanwhile, I've confirmed that with upgraded pygithub to 1.43.5 there is a regression of some kind when there is no token and we try to generate one - instead of TwoFactorException it throws GithubException. I have added a workaround (for now at least) in 669fa6d -- it MUST work now ;)

@yarikoptic
Copy link
Member Author

@yarikoptic yarikoptic commented Mar 11, 2019

@effigies I see now that I have forgotten to bring it to your @ttention :-/ the things should work now for you, feedback would be appreciated!

@effigies
Copy link
Contributor

@effigies effigies commented Mar 11, 2019

@yarikoptic This works for me.

$ datalad create-sibling-github test_repo
GitHub credentials - effigies uses 2FA
Generate a GitHub token to proceed? If you already have a token for the account, just say 'no' now and specify it in config (hub.oauthtoken), otherwise say 'yes'  (choices: yes, no): yes

2FA one time password: 
Where to store token f9c...1bd?
Empty string would result in the token not being stored for future reuse, so you will have to adjust configuration manually (choices: global, local, ): global

[INFO   ] Stored hub.oauthtoken=f9c...1bd in global config. 
.: github(-) [https://github.com/effigies/test_repo.git (git)]
'https://github.com/effigies/test_repo.git' configured as sibling 'github' for <Dataset path=/home/cjmarkie/test_repo>

Thanks, this is way better UX than navigating GitHub to generate the token and then guess how to insert it into Datalad's prompts.

@yarikoptic
Copy link
Member Author

@yarikoptic yarikoptic commented Mar 11, 2019

Thank you @effigies for trying it out! I should now find some time to add basic testing of all that UI logic so we have some assurance that it remains working ;-)

@yarikoptic yarikoptic removed this from the Release 0.11.4 milestone Mar 15, 2019
@yarikoptic yarikoptic added this to the 0.11.x milestone Mar 15, 2019
@yarikoptic yarikoptic mentioned this pull request Mar 15, 2019
4 tasks
yarikoptic added 4 commits Apr 22, 2019
Otherwise previous break in case of failed authentication causes it to just
return empty res (thanks the test to identify this issue).

Adds mocked up test for the logic of this function
…ctions using VCR

This is more of a prototype, which somewhat echoes the setup (I think) used by
PyGithub itself.  They pickle interactions for their tests.  I guess whenever Github
API or pygithub behavior changes, this VCR tape would no longer be valid, but that is
yet to be seen.

To produce this tape I generated new token on github, used it within my local account
configuration, then sanitized produced tape a bit to remove it

  sed -E "s,74463[^ ]{35},TOKEN,g" -i datalad/distribution/tests/vcr_cassettes/github_yarikoptic.yaml

and then removed it from github anyways, so there should be no authorization
detail leak in the VCR tape.  In the long run, if we decided to follow this setup
it all could be streamlined more
yarikoptic added 2 commits Apr 23, 2019
Someone on travis (only) for python 3.5 it was testing "as installed" which
lead to test failures, e.g.
https://travis-ci.org/datalad/datalad/jobs/523291128
@yarikoptic
Copy link
Member Author

@yarikoptic yarikoptic commented Apr 23, 2019

Woohoo, so the testing approach seems to work! Please review, in particular to see if I didn't leak anything sensitive. Next I will try to address coverage of more of the diff and may be minor code shuffling/moving I have mentioned in the original message

@yarikoptic
Copy link
Member Author

@yarikoptic yarikoptic commented Apr 24, 2019

grr, codecov didn't report after my last push with rewrite... will need to push more

@yarikoptic yarikoptic removed this from the 0.11.x milestone Apr 24, 2019
@yarikoptic yarikoptic added this to the Release 0.11.5 milestone Apr 24, 2019
@yarikoptic
Copy link
Member Author

@yarikoptic yarikoptic commented May 7, 2019

@mvdoc Would you be kind/interested to test out this PR if it works for you -- should resolve 2FA issues

@yarikoptic
Copy link
Member Author

@yarikoptic yarikoptic commented May 7, 2019

well, indeed, things couldn't become worse with this PR, so -- merging!

@yarikoptic
Copy link
Member Author

@yarikoptic yarikoptic commented May 7, 2019

actually, for paranoid me -- would be nice to know what is in those binary blob responses:

+      string: !!binary |
+        H4sIAAAAAAAAA6VTyW7bMBT8F52jSFYdtzYQtIekRQ+FUSCHIhfiSXqSGVOkysWGa+TfO1qSKHGA
+        Au5NpDhDzvKOkTK11NEqKsmTojL27Dzb2Ng6uohkGa3my+Vs8SlLLyJtShbdVvTj5vaw3i4P99nX
+        QL/aTflN7fKHn1fru9tsffN9D2iwCgc33rdulSTUysta+k3ILwvTJGB3ybs3Wm6NE+eAkx6Km3nH
+        2p/JMWBBsjFmeyZHD+3Mcy7wmRwDFiQNNznbM1lG8DEZPh7B14ZcyUL8F+1rjik77dAi+za+ftOl
+        YwGCY1sY7RFS34WQPDXs8+56jieW7AorWy8NeqmDUp2VYsdWVpLRvoqUYyRETqBHpOUf6s6K1poH
+        LryLVt6G8UDfCemNPZz8HlUMrVnNnq2ppes40PfKKGX2cH+ykrruVxvfqDc6J/1+t9qFZfJcCvIY
+        jCydLeN0Hmcf7maLVZphzO67qWkB/ccZf2gZDOuJdiC9wfxCpITbLEZVeKjZa/Cd7j/tvMgtpduK
+        4KgGO4AF1FNuLMG9wYFcKgX9ghuS3XDDWI4H1fFrxV+MZhuck6S7jPtQKwrKDw8bA2HboOd9zBG8
+        KSd1L0iLwa8XhOTnaP3eiIoKvAy/fwdpuUGdBGvK1aQirSJwHyNNTedYZZlxh2upwHL5cXG1yNA9
+        JH/iWgWpsM0hMVw6B2j4Sh8f/wKjzBz6MwUAAA==

I will check on that first ;)

@yarikoptic
Copy link
Member Author

@yarikoptic yarikoptic commented May 7, 2019

ok, looked at a few -- seems to be ok

$> xsel -o | sed -e 's,^+ *,,g' | base64 -d | gunzip | jq '.'
{
  "login": "datalad-tester",
  "id": 22943981,
  "node_id": "MDQ6VXNlcjIyOTQzOTgx",
  "avatar_url": "https://avatars1.githubusercontent.com/u/22943981?v=4",
  "gravatar_id": "",
  "url": "https://api.github.com/users/datalad-tester",
  "html_url": "https://github.com/datalad-tester",
  "followers_url": "https://api.github.com/users/datalad-tester/followers",
  "following_url": "https://api.github.com/users/datalad-tester/following{/other_user}",
  "gists_url": "https://api.github.com/users/datalad-tester/gists{/gist_id}",
  "starred_url": "https://api.github.com/users/datalad-tester/starred{/owner}{/repo}",
  "subscriptions_url": "https://api.github.com/users/datalad-tester/subscriptions",
  "organizations_url": "https://api.github.com/users/datalad-tester/orgs",
  "repos_url": "https://api.github.com/users/datalad-tester/repos",
  "events_url": "https://api.github.com/users/datalad-tester/events{/privacy}",
  "received_events_url": "https://api.github.com/users/datalad-tester/received_events",
  "type": "User",
  "site_admin": false,
  "name": null,
  "company": null,
  "blog": "http://datalad.org",
  "location": "Cyberspace",
  "email": null,
  "hireable": null,
  "bio": null,
  "public_repos": 11,
  "public_gists": 0,
  "followers": 0,
  "following": 0,
  "created_at": "2016-10-19T19:30:33Z",
  "updated_at": "2019-03-04T04:29:37Z",
  "private_gists": 0,
  "total_private_repos": 0,
  "owned_private_repos": 0,
  "disk_usage": 12,
  "collaborators": 0,
  "two_factor_authentication": true,
  "plan": {
    "name": "free",
    "space": 976562499,
    "collaborators": 0,
    "private_repos": 10000
  }

@yarikoptic yarikoptic merged commit 3c29ac4 into datalad:0.11.x May 7, 2019
4 of 5 checks passed
@yarikoptic yarikoptic deleted the enh-github branch May 7, 2019
yarikoptic added a commit that referenced this issue May 28, 2019
0.11.5 (May 23, 2019) -- stability is not overrated

Should be faster and less buggy, with a few enhancements.

 Fixes

- [create-sibling][]  ([#3318][])
  - Siblings are no longer configured with a post-update hook unless a
    web interface is requested with `--ui`.
  - `git submodule update --init` is no longer called from the
    post-update hook.
  - If `--inherit` is given for a dataset without a superdataset, a
    warning is now given instead of raising an error.
- The internal command runner failed on Python 2 when its `env`
  argument had unicode values.  ([#3332][])
- The safeguard that prevents creating a dataset in a subdirectory
  that already contains tracked files for another repository failed on
  Git versions before 2.14.  For older Git versions, we now warn the
  caller that the safeguard is not active.  ([#3347][])
- A regression introduced in v0.11.1 prevented [save][] from committing
  changes under a subdirectory when the subdirectory was specified as
  a path argument.  ([#3106][])
- A workaround introduced in v0.11.1 made it possible for [save][] to
  do a partial commit with an annex file that has gone below the
  `annex.largefiles` threshold.  The logic of this workaround was
  faulty, leading to files being displayed as typechanged in the index
  following the commit.  ([#3365][])
- The resolve_path() helper confused paths that had a semicolon for
  SSH RIs.  ([#3425][])
- The detection of SSH RIs has been improved.  ([#3425][])

 Enhancements and new features

- The internal command runner was too aggressive in its decision to
  sleep.  ([#3322][])
- The "INFO" label in log messages now retains the default text color
  for the terminal rather than using white, which only worked well for
  terminals with dark backgrounds.  ([#3334][])
- A short flag `-R` is now available for the `--recursion-limit` flag,
  a flag shared by several subcommands.  ([#3340][])
- The authentication logic for [create-sibling-github][] has been
  revamped and now supports 2FA.  ([#3180][])
- New configuration option `datalad.ui.progressbar` can be used to
  configure the default backend for progress reporting ("none", for
  example, results in no progress bars being shown).  ([#3396][])
- A new progress backend, available by setting datalad.ui.progressbar
  to "log", replaces progress bars with a log message upon completion
  of an action.  ([#3396][])
- DataLad learned to consult the [NO_COLOR][] environment variable and
  the new `datalad.ui.color` configuration option when deciding to
  color output.  The default value, "auto", retains the current
  behavior of coloring output if attached to a TTY ([#3407][]).
- [clean][] now removes annex transfer directories, which is useful
  for cleaning up failed downloads. ([#3374][])
- [clone][] no longer refuses to clone into a local path that looks
  like a URL, making its behavior consistent with `git clone`.
  ([#3425][])
- [wtf][]
  - Learned to fall back to the `dist` package if `platform.dist`,
    which has been removed in the yet-to-be-release Python 3.8, does
    not exist.  ([#3439][])
  - Gained a `--section` option for limiting the output to specific
    sections and a `--decor` option, which currently knows how to
    format the output as GitHub's `<details>` section.  ([#3440][])

* tag '0.11.5': (96 commits)
  [DATALAD RUNCMD] make update-changelog
  Version boost and finalize CHANGELOG.md record
  ENH: new Makefile rule linkissues-changelog to link issues, which now will also be prerequisite for update-changelog
  CHANGELOG.md: Add entries for recently merged PRs
  ENH: require "distro" for python >= 3.8
  ENH: compat with python 3.8 which removed .dist -- try distro
  CLN: wtf: Remove unused (and duplicated) import
  DOC: wtf: Avoid double period in -S's description
  ENH: -D|--decor html_details -- to make it ready for pasting to github issue without clutter
  BF: assure bytes while giving to pyperclip upon its demand (on Py2)
  RF: move always present path + type "section" into "location" section, retain order of sections from cmdline
  RF: switch from nargs="*" to action=append for wtf -S
  ENH: wtf -S to specify which sections to query/display (by default -- all)
  MNT: Avoid invalid escape sequences in strings
  BF: export_to_figshare: Don't test identity of string literal
  BF(TST): do not assume user naiveness - treat any url-like looking path as a path
  BF: Check for /, \ or # in the username@hostname part while detecting SSHRI
  CHANGELOG.md: Add entry for gh-3374
  BF: revert back (remove) check for path being PathRI
  BF: list annex-transfer also in cmdline opt choice for "what"
  ...
yarikoptic added a commit that referenced this issue May 28, 2019
0.11.5 (May 23, 2019) -- stability is not overrated

Should be faster and less buggy, with a few enhancements.

 Fixes

- [create-sibling][]  ([#3318][])
  - Siblings are no longer configured with a post-update hook unless a
    web interface is requested with `--ui`.
  - `git submodule update --init` is no longer called from the
    post-update hook.
  - If `--inherit` is given for a dataset without a superdataset, a
    warning is now given instead of raising an error.
- The internal command runner failed on Python 2 when its `env`
  argument had unicode values.  ([#3332][])
- The safeguard that prevents creating a dataset in a subdirectory
  that already contains tracked files for another repository failed on
  Git versions before 2.14.  For older Git versions, we now warn the
  caller that the safeguard is not active.  ([#3347][])
- A regression introduced in v0.11.1 prevented [save][] from committing
  changes under a subdirectory when the subdirectory was specified as
  a path argument.  ([#3106][])
- A workaround introduced in v0.11.1 made it possible for [save][] to
  do a partial commit with an annex file that has gone below the
  `annex.largefiles` threshold.  The logic of this workaround was
  faulty, leading to files being displayed as typechanged in the index
  following the commit.  ([#3365][])
- The resolve_path() helper confused paths that had a semicolon for
  SSH RIs.  ([#3425][])
- The detection of SSH RIs has been improved.  ([#3425][])

 Enhancements and new features

- The internal command runner was too aggressive in its decision to
  sleep.  ([#3322][])
- The "INFO" label in log messages now retains the default text color
  for the terminal rather than using white, which only worked well for
  terminals with dark backgrounds.  ([#3334][])
- A short flag `-R` is now available for the `--recursion-limit` flag,
  a flag shared by several subcommands.  ([#3340][])
- The authentication logic for [create-sibling-github][] has been
  revamped and now supports 2FA.  ([#3180][])
- New configuration option `datalad.ui.progressbar` can be used to
  configure the default backend for progress reporting ("none", for
  example, results in no progress bars being shown).  ([#3396][])
- A new progress backend, available by setting datalad.ui.progressbar
  to "log", replaces progress bars with a log message upon completion
  of an action.  ([#3396][])
- DataLad learned to consult the [NO_COLOR][] environment variable and
  the new `datalad.ui.color` configuration option when deciding to
  color output.  The default value, "auto", retains the current
  behavior of coloring output if attached to a TTY ([#3407][]).
- [clean][] now removes annex transfer directories, which is useful
  for cleaning up failed downloads. ([#3374][])
- [clone][] no longer refuses to clone into a local path that looks
  like a URL, making its behavior consistent with `git clone`.
  ([#3425][])
- [wtf][]
  - Learned to fall back to the `dist` package if `platform.dist`,
    which has been removed in the yet-to-be-release Python 3.8, does
    not exist.  ([#3439][])
  - Gained a `--section` option for limiting the output to specific
    sections and a `--decor` option, which currently knows how to
    format the output as GitHub's `<details>` section.  ([#3440][])

* tag '0.11.5':
  BF: make test for url download more reliable in cases where connection fails
  RF: remove stale commented out duecredit in setup.py.  It has now its own section
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

2 participants