Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BF: RIA clone HTTP403 handling and support authenticated access #5025

Merged
merged 2 commits into from Oct 19, 2020

Conversation

mih
Copy link
Member

@mih mih commented Oct 13, 2020

This PR essentially removes a custom HTTP GET request handling and uses a new (but pretty trivial) download helper.

Fixes gh-5023

... although it could be made better by supporting git-credentials as a credential sources in DataLad. This, however, would only make internal changes and no have consequences for the proposed change in clone.

A significant annoyance is that low-level code insists on reporting INFO level log messages that will no report URL to be downloaded to short-lived temporary files with cryptic names. But at least it is functioning properly for now.

mih added a commit to mih/datalad that referenced this pull request Oct 13, 2020
This uses a newly established helper (from dataladgh-5025).

However, while this is working with `git annex get`, it does not with
`datalad get` -- unclear why.
Copy link
Member

@bpoldrack bpoldrack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it could be better but provides a necessary step forward. Eventually we probably want to iron out the downloader/provider/authenticator stuff and use its parts in a more sophisticated way.

@mih mih added the merge-if-ok OP considers this work done, and requests review/merge label Oct 13, 2020
)
config_content = response.text
except requests.RequestException as e:
with TemporaryDirectory() as _dest:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mih: Looking at the failure in CrippledFS, using TemporaryDirectory might be the issue. We have dedicated helpers for temp locations that deal with several env vars/configs to figure out where exactly to put it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update: Just like the interactive/authentication issue below, the CrippledFS failure actually seem to depend on moon phases and the color of my socks. grmpf
However, we ran into inconsistent temp location issues before, so I still think it's better to get that directory from the helpers.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can just fetch content via our downloaders (used by download_url) directly without saving into a file. E.g.

$> python -c 'from datalad.downloaders.providers import Providers; url="http://datasets.datalad.org/.git/config"; print(Providers.from_config_files().fetch(url))' | head
[INFO] Fetching 'http://datasets.datalad.org/.git/config' 
[core]
	repositoryformatversion = 0
	filemode = true
	bare = false
	logallrefupdates = true
	sharedrepository = 2
[receive]
...

so there is .fetch to complement .download, and thus you do not need any temp files AFAIK

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is implemented now

@bpoldrack
Copy link
Member

FTR: Looking into Travis failure. Actual error message implies an is_interactive issue, since it's trying to interactively ask a yesno, when we aren't interactive. Underlying, however, is an authentication issue it seems:

datalad.downloaders: Level 5: Calling out into <bound method BaseDownloader._download of HTTPDownloader(authenticator=None, credential=None)> for https://github.com/datalad/example-dicom-functional/blob/master/dicoms/MR.1.3.46.670589.11.38317.5.0.4476.2014042516042547586

datalad.downloaders: DEBUG: Authenticated access was denied: Access to https://github.com/datalad/example-dicom-functional/blob/master/dicoms/MR.1.3.46.670589.11.38317.5.0.4476.2014042516042547586 has failed: status code 429 [http.py:check_response_status:107]

Question is how we can end up there, when no authentication is required and HTTPDownloader(authenticator=None, credential=None) suggests that we don't try.

@codecov
Copy link

codecov bot commented Oct 14, 2020

Codecov Report

Merging #5025 into maint will decrease coverage by 0.03%.
The diff coverage is 82.60%.

Impacted file tree graph

@@            Coverage Diff             @@
##            maint    #5025      +/-   ##
==========================================
- Coverage   89.94%   89.91%   -0.04%     
==========================================
  Files         292      292              
  Lines       40849    40866      +17     
==========================================
+ Hits        36742    36744       +2     
- Misses       4107     4122      +15     
Impacted Files Coverage Δ
datalad/core/distributed/clone.py 89.05% <25.00%> (+0.29%) ⬆️
datalad/support/network.py 86.90% <83.33%> (-0.05%) ⬇️
datalad/downloaders/tests/test_http.py 88.52% <100.00%> (-1.62%) ⬇️
datalad/downloaders/http.py 81.85% <0.00%> (-2.71%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cf97236...4439834. Read the comment docs.

@bpoldrack
Copy link
Member

FTR: Tried to debug by pushing a commit with additional logging, but the issue didn't replicate. Of f***** course! :-/

Copy link
Member

@yarikoptic yarikoptic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could be fetched without tempfiles

datalad/utils.py Outdated
@@ -2531,6 +2531,39 @@ def split_cmdline(s):
return args


def download_url(url, dest, overwrite=False):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if anywhere, better to be in support.network I guess?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FTR: I am interpreting this comment as "should be moved" and not as "should not exist", although both interpretations match the statement.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved.

)
config_content = response.text
except requests.RequestException as e:
with TemporaryDirectory() as _dest:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can just fetch content via our downloaders (used by download_url) directly without saving into a file. E.g.

$> python -c 'from datalad.downloaders.providers import Providers; url="http://datasets.datalad.org/.git/config"; print(Providers.from_config_files().fetch(url))' | head
[INFO] Fetching 'http://datasets.datalad.org/.git/config' 
[core]
	repositoryformatversion = 0
	filemode = true
	bare = false
	logallrefupdates = true
	sharedrepository = 2
[receive]
...

so there is .fetch to complement .download, and thus you do not need any temp files AFAIK

It intentionally does not support all of the capabilities of
downloaders/providers.
Previous request result handling assumed that HTTP error codes
necessarily result in exceptions, which they do not.

Instead of implementing error handling per request, use an existing
abstraction.

Fixes dataladgh-5023

... although it could be made better by supporting git-credentials as
a credential sources in DataLad.
mih added a commit to mih/datalad that referenced this pull request Oct 19, 2020
This uses a newly established helper (from dataladgh-5025).

However, while this is working with `git annex get`, it does not with
`datalad get` -- unclear why.
@mih
Copy link
Member Author

mih commented Oct 19, 2020

OK, reworked. Rebuilding #5024 on top of the current state.

@yarikoptic yarikoptic merged commit ff2fcf3 into datalad:maint Oct 19, 2020
2 of 4 checks passed
@mih mih deleted the bf-5023-1 branch October 19, 2020 14:46
@yarikoptic yarikoptic added this to the 0.13.5 milestone Dec 12, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merge-if-ok OP considers this work done, and requests review/merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants