Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix #4456 - Improve performance of OpenStudio::UnzipFile::extractAllFiles #4493

Merged
merged 4 commits into from Nov 15, 2021

Conversation

jmarrec
Copy link
Collaborator

@jmarrec jmarrec commented Nov 4, 2021

Pull request overview

Bench results, using a local zip that is about 53MB (every EnergyPlus weatherdata/ file TWICE), between the Current implementation on develop and this Modified one.

2021-11-04T16:33:33+01:00
Running Products/Zip_Benchmark
Run on (12 X 4100 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x6)
  L1 Instruction 32 KiB (x6)
  L2 Unified 256 KiB (x6)
  L3 Unified 9216 KiB (x1)
Load Average: 1.54, 1.21, 1.27
------------------------------------------------------------------------
Benchmark                              Time             CPU   Iterations
------------------------------------------------------------------------
BM_Current/1024/iterations:5        2367 ms         2363 ms            5
BM_Current/2048/iterations:5        2042 ms         2041 ms            5
BM_Current/4096/iterations:5        1908 ms         1907 ms            5
BM_Current/8192/iterations:5        1846 ms         1845 ms            5
BM_Current/16384/iterations:5       1787 ms         1786 ms            5
BM_Current/32768/iterations:5       1804 ms         1802 ms            5
BM_Current/65536/iterations:5       1729 ms         1728 ms            5
BM_Mod/1024/iterations:5            1603 ms         1601 ms            5
BM_Mod/2048/iterations:5            1392 ms         1389 ms            5
BM_Mod/4096/iterations:5            1249 ms         1248 ms            5
BM_Mod/8192/iterations:5            1234 ms         1233 ms            5
BM_Mod/16384/iterations:5           1157 ms         1156 ms            5
BM_Mod/32768/iterations:5           1092 ms         1091 ms            5
BM_Mod/65536/iterations:5           1063 ms         1062 ms            5

Note that it remains considerably slower than using ruby's own rubyzip gem which is included in the CLI so please use that instead

require 'zip'
Zip::File.open(path) do |zip_file|
    zip_file.each do |f|
      fpath = File.join(destination, f.name)
      zip_file.extract(f, fpath) unless File.exist?(fpath)
    end
  end

rubyzip is much more complicated than our implementation and infers stuff more effectively. If you need to unzip several hundred MB files, then use that!

As can be seen it's marginally faster than the previous version... Not much else we can do. Benchcode used is on #4456. Original benchmarking on #4456 was wrong: I realized only later that I needed to an expensive operation inside a loop to ensure correctness of the output...

image

Pull Request Author

  • Model API Changes / Additions
  • Any new or modified fields have been implemented in the EnergyPlus ForwardTranslator (and ReverseTranslator as appropriate)
  • Model API methods are tested (in src/model/test)
  • EnergyPlus ForwardTranslator Tests (in src/energyplus/Test)
  • If a new object or method, added a test in NREL/OpenStudio-resources: Add Link
  • If needed, added VersionTranslation rules for the objects (src/osversion/VersionTranslator.cpp)
  • Verified that C# bindings built fine on Windows, partial classes used as needed, etc.
  • All new and existing tests passes
  • If methods have been deprecated, update rest of code to use the new methods

Labels:

  • If change to an IDD file, add the label IDDChange
  • If breaking existing API, add the label APIChange
  • If deemed ready, add label Pull Request - Ready for CI so that CI builds your PR

Review Checklist

This will not be exhaustively relevant to every PR.

  • Perform a Code Review on GitHub
  • Code Style, strip trailing whitespace, etc.
  • All related changes have been implemented: model changes, model tests, FT changes, FT tests, VersionTranslation, OS App
  • Labeling is ok
  • If defect, verify by running develop branch and reproducing defect, then running PR and reproducing fix
  • If feature, test running new feature, try creative ways to break it
  • CI status: all green or justified

Bench results, using a local zip that is about 53MB (every EnergyPlus weatherdata/ file TWICE)

```
2021-11-04T16:33:33+01:00
Running Products/Zip_Benchmark
Run on (12 X 4100 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x6)
  L1 Instruction 32 KiB (x6)
  L2 Unified 256 KiB (x6)
  L3 Unified 9216 KiB (x1)
Load Average: 1.54, 1.21, 1.27
------------------------------------------------------------------------
Benchmark                              Time             CPU   Iterations
------------------------------------------------------------------------
BM_Current/1024/iterations:5        2367 ms         2363 ms            5
BM_Current/2048/iterations:5        2042 ms         2041 ms            5
BM_Current/4096/iterations:5        1908 ms         1907 ms            5
BM_Current/8192/iterations:5        1846 ms         1845 ms            5
BM_Current/16384/iterations:5       1787 ms         1786 ms            5
BM_Current/32768/iterations:5       1804 ms         1802 ms            5
BM_Current/65536/iterations:5       1729 ms         1728 ms            5
BM_Mod/1024/iterations:5            1603 ms         1601 ms            5
BM_Mod/2048/iterations:5            1392 ms         1389 ms            5
BM_Mod/4096/iterations:5            1249 ms         1248 ms            5
BM_Mod/8192/iterations:5            1234 ms         1233 ms            5
BM_Mod/16384/iterations:5           1157 ms         1156 ms            5
BM_Mod/32768/iterations:5           1092 ms         1091 ms            5
BM_Mod/65536/iterations:5           1063 ms         1062 ms            5
```
…ize to 32768, restore benchmarking to a workable state
@jmarrec jmarrec requested a review from tijcolem November 4, 2021 16:36
@jmarrec jmarrec self-assigned this Nov 4, 2021
@jmarrec jmarrec added the Pull Request - Ready for CI This pull request if finalized and is ready for continuous integration verification prior to merge. label Nov 4, 2021
@shorowit
Copy link
Contributor

shorowit commented Nov 5, 2021

Thanks for trying, @jmarrec. I didn't realize rubyzip is included in the CLI and so much faster. That works fine for our purposes. I'm fine if you want to close my issue -- whether you merge this or not.

@@ -32,7 +32,7 @@

namespace openstudio {

UnzipFile::UnzipFile(const openstudio::path& filename) : m_unzFile(unzOpen(openstudio::toString(filename).c_str())) {
UnzipFile::UnzipFile(const openstudio::path& filename) : m_unzFile(unzOpen(openstudio::toString(filename).c_str())), m_chunksize(32768) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing this change has such a big improvement.

@tijcolem tijcolem merged commit 6977f79 into develop Nov 15, 2021
@tijcolem tijcolem deleted the 4456_UnzipPerformance branch November 15, 2021 17:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component - Utilities Other Performance Pull Request - Ready for CI This pull request if finalized and is ready for continuous integration verification prior to merge.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

OpenStudio::UnzipFile extractAllFiles is very slow
4 participants