Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve validation performance #69

Merged
merged 1 commit into from Oct 4, 2018

Conversation

@diogocp
Copy link
Contributor

diogocp commented Sep 22, 2018

Do not attempt to extract the tarball if the MD5 matches. This makes validation an order of magnitude faster (in one test, from around 12 hours to less than 12 minutes).

Do not attempt to extract the tarball if the MD5 matches. This makes
validation an order of magnitude faster (in one test, from around 12
hours to less than 12 minutes).
@codecov-io

This comment has been minimized.

Copy link

codecov-io commented Sep 22, 2018

Codecov Report

Merging #69 into master will decrease coverage by 1.94%.
The diff coverage is 20%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #69      +/-   ##
==========================================
- Coverage   96.88%   94.94%   -1.95%     
==========================================
  Files           2        2              
  Lines         257      257              
==========================================
- Hits          249      244       -5     
- Misses          8       13       +5
Impacted Files Coverage Δ
conda_mirror/conda_mirror.py 94.88% <20%> (-1.97%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7f92926...b3648be. Read the comment docs.

@parente parente closed this Sep 25, 2018
@parente parente reopened this Sep 25, 2018
@parente

This comment has been minimized.

Copy link
Collaborator

parente commented Sep 27, 2018

@diogocp Thanks for the PR. I've looked through it and the change of logic to validate md5 and size before doing an extract makes sense to me.

Test coverage dropped quite a bit. I'd like to look at the tests a bit to make sure validation failures are being shortcircuited for the right reason.

@parente parente merged commit eaaeaf8 into Valassis-Digital-Media:master Oct 4, 2018
1 check passed
1 check passed
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@parente

This comment has been minimized.

Copy link
Collaborator

parente commented Oct 4, 2018

Got myself re-acquainted with the tests and project. I'm comfy merging what's here now. Thanks for the PR @diogocp. 🍰

@magnuhho

This comment has been minimized.

Copy link
Contributor

magnuhho commented Oct 15, 2018

Wouldn't it be better to check the size before the md5?
Size check is faster than hash calculation and if there is a size mismatch chances are pretty small the md5 is right. The other way around, chances are slim the size is NOT correct if the md5 validates.

@diogocp

This comment has been minimized.

Copy link
Contributor Author

diogocp commented Oct 15, 2018

Yes, it would be reasonable to do it that way. However, I expect that you will almost never see a validation failure, so it won't make much of a difference either way.

What really slowed down the process a lot was decompressing every single package.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.