OpenStudio::UnzipFile extractAllFiles is very slow #4456
@shorowit I've done some benchmarking, and indeed OpenStudio's UnzipFile is slow. rubyzip apparently uses zlib, same as OpenStudio (which uses minizip, itself based on zlib). But the rubyzip code is much, much more complicated; UnzipFile.cpp is less than 100 lines of code...

OpenStudio/src/utilities/core/UnzipFile.cpp Lines 35 to 123 in c93fb58
UnzipFile in OpenStudio was never really meant to decompress 800MB archives...

Benchmarking methodology

I created a first zip, 'TestZipSmaller', where I zipped all E+ weather_data/ files, and then bigger variants. I then wrote this benchmark file to test OpenStudio, rubyzip, and system unzip (Ubuntu 20.04).

bench_unzip.rb

```ruby
require 'openstudio'
require 'zip'
require 'fileutils'
require 'json'

def test_openstudio(test_case)
  destination = "OpenStudio-#{test_case}"
  path = "#{test_case}.zip"
  FileUtils.rm_rf(destination)
  FileUtils.mkdir_p(destination)
  t = Time.now
  uf = OpenStudio::UnzipFile.new(path)
  uf.extractAllFiles(destination)
  t = Time.now - t
  puts "OpenStudio-#{test_case}: #{t}"
  return t
end

def test_rubyzip(test_case)
  destination = "Ruby-#{test_case}"
  path = "#{test_case}.zip"
  FileUtils.rm_rf(destination)
  FileUtils.mkdir_p(destination)
  t = Time.now
  Zip::File.open(path) do |zip_file|
    zip_file.each do |f|
      fpath = File.join(destination, f.name)
      zip_file.extract(f, fpath) unless File.exist?(fpath)
    end
  end
  t = Time.now - t
  puts "Ruby-#{test_case}: #{t}"
  return t
end

def test_system(test_case)
  destination = "System-#{test_case}"
  path = "#{test_case}.zip"
  FileUtils.rm_rf(destination)
  FileUtils.mkdir_p(destination)
  t = Time.now
  system("unzip #{path} -d #{destination} > /dev/null")
  t = Time.now - t
  puts "System-#{test_case}: #{t}"
  return t
end

timings = {}
tests = ['TestZipSmaller', 'TestZipBigger', 'TestZipTwiceBigger', 'TestZipThriceBigger']
tests.each do |test_case|
  timings[test_case] = {}
  timings[test_case]['openstudio'] = test_openstudio(test_case)
  timings[test_case]['rubyzip'] = test_rubyzip(test_case)
  timings[test_case]['system'] = test_system(test_case)
end
```

Results:

```json
{
  "TestZipSmaller": {
    "openstudio": 0.326467787,
    "rubyzip": 0.343753568,
    "system": 0.297883918
  },
  "TestZipBigger": {
    "openstudio": 2.252606288,
    "rubyzip": 1.495430877,
    "system": 1.735657113
  },
  "TestZipTwiceBigger": {
    "openstudio": 6.975214381,
    "rubyzip": 3.090054221,
    "system": 4.047724282,
    "Reference-openstudio": 6.975214381
  },
  "TestZipThriceBigger": {
    "openstudio": 12.320878892,
    "rubyzip": 4.761708978,
    "system": 6.060392038
  }
}
```
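Reading the numbers above as ratios makes the scaling problem clearer: OpenStudio is roughly on par with rubyzip on the small archive but falls further behind as the archive grows. A quick sketch over the reported timings (hypothetical post-processing, not part of the original benchmark):

```ruby
# Ratio of OpenStudio time to rubyzip time for each test case, using the
# timings reported above. A growing ratio means OpenStudio scales worse.
timings = {
  'TestZipSmaller'      => { 'openstudio' => 0.326467787,  'rubyzip' => 0.343753568 },
  'TestZipBigger'       => { 'openstudio' => 2.252606288,  'rubyzip' => 1.495430877 },
  'TestZipTwiceBigger'  => { 'openstudio' => 6.975214381,  'rubyzip' => 3.090054221 },
  'TestZipThriceBigger' => { 'openstudio' => 12.320878892, 'rubyzip' => 4.761708978 },
}

ratios = timings.transform_values { |t| (t['openstudio'] / t['rubyzip']).round(2) }
ratios.each { |name, r| puts "#{name}: OpenStudio / rubyzip = #{r}" }
# → the ratio climbs from about 0.95 to about 2.59 as the archive grows
```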
I found a way to avoid doing extra work. Using the ThriceBigger case, before/after:

[timing comparison lost in this copy]

Re-running my ruby benchmark code:

[results table lost in this copy]
Just changing the size of the chunk improves it significantly. rubyzip uses 32768: https://github.com/rubyzip/rubyzip/blob/e70e1d3080efc09fa83963b0b2b08116532ee760/lib/zip/decompressor.rb#L5
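To illustrate why the chunk size matters, here is a small self-contained Ruby sketch (file names and sizes are made up for illustration, this is not OpenStudio code): copying 1 MiB takes 1024 read calls with a 1 KiB buffer but only 32 with a 32 KiB buffer, and every iteration carries fixed per-call overhead.

```ruby
# Count how many read calls a copy loop needs at different buffer sizes.
# Fewer, larger reads mean less per-iteration overhead, which matches the
# improvement observed above when raising the chunk size.
require 'tempfile'

def copy_with_chunk(src, dst, chunk_size)
  iterations = 0
  File.open(src, 'rb') do |input|
    File.open(dst, 'wb') do |output|
      while (buf = input.read(chunk_size))
        output.write(buf)
        iterations += 1
      end
    end
  end
  iterations
end

small = large = nil
Tempfile.create('src') do |f|
  f.write('x' * 1_048_576) # 1 MiB of dummy data
  f.flush
  small = copy_with_chunk(f.path, "#{f.path}.small", 1024)   # 1 KiB chunks
  large = copy_with_chunk(f.path, "#{f.path}.large", 32_768) # 32 KiB chunks
  File.delete("#{f.path}.small", "#{f.path}.large")
end
puts "1 KiB chunks: #{small} reads; 32 KiB chunks: #{large} reads"
```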
'Current' is the current implementation, except the chunk size is configurable; 'Mod' is the variant with my other changes. Testing chunk sizes from 1024 to 65536:

[benchmark chart lost in this copy]
Fix #4456 - Improve performance of OpenStudio::UnzipFile::extractAllFiles
Issue overview
The OpenStudio::UnzipFile extractAllFiles method is slower than it seems like it should be. It appears to get exponentially slower as the unzip operation progresses.
Unzip times for the test below:

[timings lost in this copy]
cc @aspeake
Current Behavior
Slow operation.
Expected Behavior
Faster operation.
Steps to Reproduce
Download this file: https://data.nrel.gov/system/files/156/BuildStock_TMY3_FIPS.zip
Create this unzip.rb script at the same location:
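The script body did not survive this copy; a minimal sketch of what it presumably contained (the destination directory name and the timing printout are assumptions, not the original script):

```ruby
# Hypothetical reconstruction of the unzip.rb reproduction script.
# Assumes BuildStock_TMY3_FIPS.zip sits in the same directory; requires the
# openstudio gem/bindings, so run it via the `openstudio` CLI.
require 'openstudio'
require 'fileutils'

destination = 'BuildStock_TMY3_FIPS'
FileUtils.rm_rf(destination)
FileUtils.mkdir_p(destination)

t = Time.now
uf = OpenStudio::UnzipFile.new('BuildStock_TMY3_FIPS.zip')
uf.extractAllFiles(destination)
puts "Extracted in #{Time.now - t} s"
```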
Run:

    openstudio unzip.rb

and time the operation.

Possible Solution