
Try to resume interrupted downloads in FileDownloader by using HTTP range requests #6791

Merged: 8 commits merged into conan-io:develop on Apr 28, 2020

Conversation

@flashdagger (Contributor) commented Apr 3, 2020

Changelog: Feature: Resume interrupted file downloads if the server supports it.
Docs: Omit

Close #5708
Close #3610

  • Refer to the issue that supports this Pull Request.
  • If the issue has missing info, explain the purpose/use case/pain/need that covers this Pull Request.
  • I've read the Contributing guide.
  • I've followed the PEP8 style guides for Python code.
  • I've opened another PR in the Conan docs repo to the develop branch, documenting this one.
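For context, the resume-via-Range approach this PR introduces can be sketched roughly as follows. This is a simplified illustration, not the actual FileDownloader code; the function name and parameters are made up for the example.

```python
import os
import requests

def download_with_resume(url, file_path, chunk_size=8192):
    """Illustrative sketch: resume a partial download with an HTTP Range request."""
    headers = {}
    mode = "wb"
    if os.path.exists(file_path):
        downloaded = os.path.getsize(file_path)
        # Ask the server only for the bytes we are still missing.
        headers["Range"] = "bytes=%d-" % downloaded
        mode = "ab"

    response = requests.get(url, stream=True, headers=headers)
    if headers and response.status_code != 206:
        # The server ignored the Range header (no partial-content support),
        # so start over instead of appending to the existing file.
        mode = "wb"

    with open(file_path, mode) as handle:
        for chunk in response.iter_content(chunk_size=chunk_size):
            handle.write(chunk)
```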

@CLAassistant commented Apr 3, 2020

CLA assistant check
All committers have signed the CLA.

@flashdagger flashdagger changed the title Feature/download resume Fix interrupted downloads in FileDownloader Apr 3, 2020
@flashdagger flashdagger changed the title Fix interrupted downloads in FileDownloader Try to resume interrupted downloads in FileDownloader by using HTTP range requests Apr 6, 2020
@memsharded (Member) left a comment

This is looking good; it is implemented elegantly and is easy to understand with that recursion.

I would say it is still missing the case where downloads go directly to memory rather than to a file (the path argument is None). A unit test checking that will probably reveal a bug there (in that case the return value of the method is the actual contents of the transferred file). This is typically done only for small Conan files, but better to be complete, just in case. Could you please have a look?

(Inline review comments on conans/client/rest/file_downloader.py, resolved)
@flashdagger (Contributor, Author) commented Apr 8, 2020

Thanks for your feedback. For the memory case I would need to carry the content over to the next iteration/recursion. Do you want me to introduce an additional parameter?

@memsharded (Member) commented Apr 8, 2020

> Thanks for your feedback. For the memory case I would need to carry the content over to the next iteration/recursion. Do you want me to introduce an additional parameter?

Actually, on second thought, I think it is good enough to avoid the recursion with the resumable download when path=None. A unit test would be nice though, to make sure that case is covered and that it cannot resume the download and end up with corrupted data.
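For illustration, the guard being discussed could look roughly like the following sketch (the function name and structure are hypothetical, not the actual patch): when there is no target file, do a plain download and return the bytes, never taking the resumable path.

```python
import requests

def download(url, file_path=None):
    """Hypothetical sketch of the guard discussed above.

    Downloads to memory (file_path is None) have nothing on disk to resume
    from, so they never take the resumable path and simply return the bytes.
    """
    if file_path is None:
        response = requests.get(url)
        response.raise_for_status()
        return response.content  # contents handed back to the caller

    # File downloads can go through the resumable logic (see the Range sketch above).
    response = requests.get(url, stream=True)
    response.raise_for_status()
    with open(file_path, "wb") as handle:
        for chunk in response.iter_content(chunk_size=8192):
            handle.write(chunk)
```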

@flashdagger (Contributor, Author) commented Apr 8, 2020

I see. I'll add one or two more test cases to cover the false path of the following condition:

if try_resume and file_path and os.path.exists(file_path):
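For illustration, the three ways that condition can be false could be covered with small tests along these lines (a hedged sketch using a stand-in helper, not Conan's actual test suite):

```python
import os
import pytest

def should_try_resume(try_resume, file_path):
    # Mirrors the guard quoted above: resume only when requested, only when
    # downloading to a file, and only when a partial file already exists.
    return bool(try_resume and file_path and os.path.exists(file_path))

def test_no_resume_when_not_requested(tmp_path):
    partial = tmp_path / "pkg.tgz"
    partial.write_bytes(b"partial data")
    assert should_try_resume(False, str(partial)) is False

def test_no_resume_when_downloading_to_memory():
    assert should_try_resume(True, None) is False

def test_no_resume_when_partial_file_is_missing(tmp_path):
    assert should_try_resume(True, str(tmp_path / "missing.tgz")) is False
```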

flashdagger added 6 commits Apr 10, 2020

  • try to resume at the last position and append to written file
  • retry as usual if not making any progress
  • …on if the filedownload doesn't make any progress.
  • the server sometimes doesn't respond with the "Content-Range" field
  • we rather throw an error then
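As a hedged illustration of the Content-Range concern mentioned in these commits (not the code actually added by the PR): before appending to a partial file, one can verify that the server really honored the range, and raise otherwise so the download is retried from scratch.

```python
def check_ranged_response(response, resume_from):
    """Sketch: verify a response really continues at byte `resume_from`.

    Some servers ignore the Range header and reply 200 with the whole file
    and no Content-Range; appending that to the partial file would corrupt
    it, so treat it as an error and let the caller restart the download.
    """
    content_range = response.headers.get("Content-Range", "")
    # A valid partial response looks like: 206, "Content-Range: bytes <start>-<end>/<total>"
    expected_prefix = "bytes %d-" % resume_from
    if response.status_code != 206 or not content_range.startswith(expected_prefix):
        raise RuntimeError("Server did not honor the range request; "
                           "restarting the download from scratch")
```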
@flashdagger flashdagger requested a review from memsharded Apr 10, 2020
@memsharded (Member) left a comment

Hi @flashdagger

Very nice work here, I think we are almost there. I had a couple of concerns/questions about the content length, so I have branched from your work here: https://github.com/memsharded/conan/tree/feature/download-resume-review

Please have a look at the comments and at that branch (I tried to open a PR against your fork for review, but apparently I can't).

That branch also includes the minor changes to the Progress class to avoid private access.

Please let me know if that makes sense or not. Thanks!

(Inline review comments on conans/client/rest/file_downloader.py, resolved)
@flashdagger (Contributor, Author) commented Apr 10, 2020

Hello @memsharded,
thanks for the suggestion. I'm not an HTTP protocol expert, but you have to decide what you can rely on :-) Downloading files is a crucial part of Conan (and not only Conan), and users would quickly report it if something were broken.
Since your changes look fine to me, I added them to this PR.

@flashdagger flashdagger requested a review from memsharded Apr 10, 2020
@memsharded (Member) left a comment

Thanks, @flashdagger!

It is looking good to me. I will ask for another review to make sure; let's try to include this in the next release, 1.25.

Thanks for all the effort.

@memsharded memsharded added this to the 1.25 milestone Apr 11, 2020
@memsharded memsharded requested a review from jgsogo Apr 11, 2020
@flashdagger (Contributor, Author) left a comment

Some feedback from my company: we tried a huge package download from our Artifactory Pro server (v6.18) over the VPN. The connection reliably drops at 1 GB, and Conan was able to continue from there. 😄
I figured out that if you set the try_resume parameter to True by default, it even works within the retry loop (e.g. when there is a network error). But then you have to make sure the file does not exist initially...

@memsharded (Member) commented Apr 19, 2020

> Some feedback from my company: we tried a huge package download from our Artifactory Pro server (v6.18) over the VPN. The connection reliably drops at 1 GB, and Conan was able to continue from there.

Excellent, great news!

> I figured out that if you set the try_resume parameter to True by default, it even works within the retry loop (e.g. when there is a network error). But then you have to make sure the file does not exist initially...

Oh, yes, that would make sense. It would be a problem if the file were already there (and possibly corrupted) from previous Conan runs (not from the retry loop), but in theory the existing mechanisms (the dirty flag and removing files) should take care of this. Do you think it is better, then, to activate try_resume=True and check for the file's existence? Would you like to contribute that now, or is it OK as-is as a first approach?
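As a hedged sketch of the idea being discussed (illustrative names, not code from this PR): if resuming were also enabled inside the retry loop, any stale file from a previous run would have to be removed before the first attempt, so that retries only ever append to bytes written by the current download.

```python
import os

def download_with_retries(downloader, url, file_path, retries=3):
    # Illustrative only: `downloader.download` is a stand-in for the real API.
    # Delete leftovers from previous runs so a resumed retry cannot append
    # to a stale (possibly corrupted) file.
    if os.path.exists(file_path):
        os.remove(file_path)
    last_error = None
    for attempt in range(retries + 1):
        try:
            # Resume only on retries; the first attempt starts from scratch.
            return downloader.download(url, file_path, try_resume=attempt > 0)
        except Exception as exc:
            last_error = exc
    raise last_error
```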

@flashdagger (Contributor, Author) commented Apr 19, 2020

> Do you think it is better, then, to activate try_resume=True and check for the file's existence? Would you like to contribute that now, or is it OK as-is as a first approach?

I'm fine with the implementation so far. If users need more robust solutions they can provide more feedback and we can work on it. What do you think?

@memsharded (Member) commented Apr 19, 2020

> I'm fine with the implementation so far. If users need more robust solutions they can provide more feedback and we can work on it. What do you think?

Yeah, agreed. Let's wait for feedback, then.

jgsogo approved these changes Apr 28, 2020

@jgsogo (Member) left a comment

I've been trying to find a flaw in the implementation but it looks like any scenario I can think of will result in the same behavior as before, so 💯. Really good job here, @flashdagger

@memsharded memsharded merged commit 0373d74 into conan-io:develop Apr 28, 2020
2 checks passed