Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix download-url breakage reported in #2849 #3911

Merged
merged 2 commits into from
Dec 11, 2019
Merged

Fix download-url breakage reported in #2849 #3911

merged 2 commits into from
Dec 11, 2019

Conversation

mih
Copy link
Member

@mih mih commented Dec 10, 2019

This is the least-cost path that made it work for me. Here is the compact use case:

datalad remove -d bf2849 --nocheck ; datalad create bf2849; datalad -C bf2849 download-url https://cdnjs.cloudflare.com/ajax/libs/open-iconic/1.1.1/font/css/open-iconic-bootstrap.min.css

The dataset deletion is critical for the demo, as otherwise git-annex seems to use prior knowledge about the key and doesn't fail on second attempt, even with a git reset --hard HEAD~1.

Fixes gh-2849

Copy link
Contributor

@kyleam kyleam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Subject: [PATCH 2/2] RF: Bypass URL size check in download-url

Not fully confident here. We already have the file at this point,
so no info is lost.

Yeah, I think this should be fine.

Moreover, without this change, the original
use case from #2849
leads to:

download_url(ok): /tmp/this/open-iconic-bootstrap.min.css (file)
add(ok): open-iconic-bootstrap.min.css (file)
save(ok): . (dataset)
[WARNING] Running addurl resulted in stderr output: git-annex: addurl: 1 failed

CommandError: command 'addurl'
Error, annex reported failure for addurl (url='https://cdnjs.cloudflare.com/ajax/libs/open-iconic/1.1.1/font/css/open-iconic-bootstrap.min.css'): {'command': 'addurl', 'success': False, 'error-messages': [' while adding a new url to an already annexed file, url does not have expected file size (use --relaxed to bypass this check) https://cdnjs.cloudflare.com/ajax/libs/open-iconic/1.1.1/font/css/open-iconic-bootstrap.min.css'], 'file': 'open-iconic-bootstrap.min.css'}

Hmm, so I guess this URL is problematic because its header doesn't report its size?

$ curl -sI https://cdnjs.cloudflare.com/ajax/libs/open-iconic/1.1.1/font/css/open-iconic-bootstrap.min.css | grep length

$ curl -sI https://www.datalad.org/img/logo/studyforrest.png | grep length
content-length: 6568

@mih
Copy link
Member Author

mih commented Dec 10, 2019

re URL: Yes, this URL doesn't come with size info. It implied incompatibility with download-url though -- which I think is a bit strict.

@kyleam
Copy link
Contributor

kyleam commented Dec 10, 2019

re URL: Yes, this URL doesn't come with size info. It implied incompatibility with download-url though

I'm not sure I understand the last sentence. Is it saying that it is known that calling git annex addurl with an existing file will fail if the header doesn't have size information?

which I think is a bit strict.

Sure, I wasn't objecting to this change. I was trying to understand why that URL failed. If you understood that it was due to a lack of size in the header, it'd be good to see that mentioned in the commit message.

@mih
Copy link
Member Author

mih commented Dec 10, 2019

re URL: Yes, this URL doesn't come with size info. It implied incompatibility with download-url though

I'm not sure I understand the last sentence. Is it saying that it is known that calling git annex addurl with an existing file will fail if the header doesn't have size information?

Yes.

which I think is a bit strict.

Sure, I wasn't objecting to this change. I was trying to understand why that URL failed. If you understood that it was due to a lack of size in the header, it'd be good to see that mentioned in the commit message.

Just re-read the my message. I agree that it is not clear from the text. Sorry, will fix.

@codecov
Copy link

codecov bot commented Dec 10, 2019

Codecov Report

❗ No coverage uploaded for pull request base (master@b11930d). Click here to learn what that means.
The diff coverage is 0%.

Impacted file tree graph

@@            Coverage Diff            @@
##             master    #3911   +/-   ##
=========================================
  Coverage          ?   78.86%           
=========================================
  Files             ?      274           
  Lines             ?    37883           
  Branches          ?        0           
=========================================
  Hits              ?    29878           
  Misses            ?     8005           
  Partials          ?        0
Impacted Files Coverage Δ
datalad/interface/download_url.py 35.22% <0%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b11930d...da4d29e. Read the comment docs.

Not fully confident here. We already have the file at this point,
so no info is lost. Moreover, without this change, the original
use case from datalad#2849
leads to:

  download_url(ok): /tmp/this/open-iconic-bootstrap.min.css (file)
  add(ok): open-iconic-bootstrap.min.css (file)
  save(ok): . (dataset)
  [WARNING] Running addurl resulted in stderr output: git-annex: addurl: 1 failed

  CommandError: command 'addurl'
  Error, annex reported failure for addurl (url='https://cdnjs.cloudflare.com/ajax/libs/open-iconic/1.1.1/font/css/open-iconic-bootstrap.min.css'): {'command': 'addurl', 'success': False, 'error-messages': ['  while adding a new url to an already annexed file, url does not have expected file size (use --relaxed to bypass this check) https://cdnjs.cloudflare.com/ajax/libs/open-iconic/1.1.1/font/css/open-iconic-bootstrap.min.css'], 'file': 'open-iconic-bootstrap.min.css'}

So any URL where the server doesn't provide a size would fail to work
with `download-url` by default. I think this is too strict behavior,
hence disabling that check.
@mih
Copy link
Member Author

mih commented Dec 11, 2019

Message is amended and force-pushed. No other code change, hence merging.

@mih mih merged commit a9f61f6 into datalad:master Dec 11, 2019
@mih mih deleted the bf-2849 branch December 11, 2019 08:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

download-url should record the download url in annex
2 participants