Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Decompressing 'replay.game.events' is very slow in rbzip2 #6

Open
RyanNielson opened this issue Mar 17, 2013 · 16 comments
Open

Decompressing 'replay.game.events' is very slow in rbzip2 #6

RyanNielson opened this issue Mar 17, 2013 · 16 comments

Comments

@RyanNielson
Copy link
Contributor

It was taking me upwards of 1m30s to decompress 'replay.game.events' using rbzip2. I switched to bzip2-ruby (wrapper around native implemention), and it's much faster. I know this gem is meant to be all Ruby, but that's a drastic hit in performance. It's fine for attributes and details, but game.events seems to just have so much more information.

Might be a good idea to switch back to bzip2-ruby, unless you're completely against adding some native dependencies. Like I said, not a huge deal until I tried to parse game events.

@cadwallion
Copy link
Contributor

I ran into this exact problem while working on functionality to parse all event data. I believe the original reason for the switch was primarily JRuby compatibility, so I would like to keep compat but definitely improve performance. Let me look into it more.

@RyanNielson
Copy link
Contributor Author

Ya, I could see the jruby compatibility being an issue. I plan to do a bunch of analysis on replays, and game event data was going to be a big part of that. Because of this, rbzip2 isn't really suitable. I'll switch it over for now in my fork, and I'll keep an eye on this to see if you find any good solutions.

Good luck!

@czarneckid
Copy link
Member

Any way to abstract out that part of the parser so you could specify via configuration option what underlying bzip library to use?

@RyanNielson
Copy link
Contributor Author

I did the change locally, and all it took was a change in the require statement and replacing one line.

The only issue I can see is since it uses native bzip2-ruby, installation of that gen may fail if there are headers missing or they can't compile it maybe. Definitely a good middle ground if we can get it to work though.

@logankoester
Copy link
Contributor

I commented some benchmarks comparing the two on the original pull request #2

An key goal of the tassadar library has always been to provide a pure-ruby alternative to the other replay parsers available, so despite the (significant) performance regression we will not be depending on bzip2-ruby in the agoragames fork.

My suggestion would be to rename your forked repository to something like tassadar-native and just publish a complimentary gem matching that name.

@logankoester
Copy link
Contributor

I would of course value any input from original author @guitsaru on this issue as well.

@cadwallion
Copy link
Contributor

I would like to look at data tonight before making any decisions. The library ran on a compiled dependency for quite a while before only recently changing, and I'd like to look at solutions that don't require forking a project for the sake of speed.

@RyanNielson
Copy link
Contributor Author

Ya, I agree @cadwallion, and I think the data will show the huge performance difference . Forking is an easy solution, and it would keep this version completely in Ruby. I could see this leading to confusion and making the two versions harder to keep in sync though. Some tasks just need to be closer to the metal, especially if this gem is to be improved to support game events, which I'm currently trying to do.

I'm fine with putting together a native fork and publishing it, but I'd like to avoid that if possible. Either via configuration variables some how, or making the pure ruby sacrifice here. I have a feeling bundler/rubygems may not allow conditional dependencies though.

Here are the changes I made, if you want to mimic them for your tests.

find_data.rb:

require 'bzip2' # Add at top of file
Bzip2.uncompress(data[1, data.size - 1]) # Replace RBzip2 function in FileData#decompress

tassadar.gemspec:

s.add_dependency("bzip2-ruby") # add at bottom

@guitsaru
Copy link
Contributor

The original goal when I wrote the gem was to be able to use this library from any computer that ruby was available without having to install extra dependencies. When using MRI, bzip2 is usually available at a system level, but JRuby support is also very important.

One other option would be to default to rbzip2, but use bzip2-ruby if it's available.

I've done this on my fork in the bzip2-ruby branch at #7.

@cadwallion
Copy link
Contributor

After looking at the benchmarks of rbzip2 and bzip2-ruby, as well as the implementation, I think there is an opportunity to improve the rbzip2 implementation to get the two closer. rbzip2 has had little in the way of optimizations and stands to have some significant improvement if some love was given to it. I was starting down this road last night when the commotion over #7 popped up.

I am willing to merge #7, but if my efforts to improve rbzip2 put it more in line with bzip2-ruby, especially with regards to parsing the game event data, I would prefer that over instituting an abstract configuration method.

@cadwallion
Copy link
Contributor

From a JRuby support standpoint, a binding around bzip2 using FFI instead of the C API would resolve the matter and still be performant.

@RyanNielson
Copy link
Contributor Author

Sounds like a reasonable alternative, and a way to keep it in Ruby. Can't say I can contribute much to the rbzip2 improvements as compression isn't really an area of expertise. Depending on how long it will take to improve rbzip2, maybe it's worth waiting on #7 instead of just reverting back to rbzip2 in the near future. That's if enough improvements can be made.

@cadwallion
Copy link
Contributor

I am going to sit on #7 for another 24 hours and see if I can make substantial progress on optimizing rbzip2 tonight, and gauge it from there.

@RyanNielson
Copy link
Contributor Author

@cadwallion Any chance to look into rbzip2 optimizations yet?

@cadwallion
Copy link
Contributor

Yes, I have profiled the rbzip2 library and found some pretty drastic performance problems with its Rbzip::Decompressor#get_and_move_to_front_decode implementation causing anywhere from 15-50x speed hits based on the total rotations per block. I had started to make some improvements to its capability but was hitting the wall on what I can do without having a better understanding on the subcomponents of the bzip2 algorithm.

I started an FFI-based implementation of bzip2 to give my brain a break from optimizations. Compression and Decompression appear functional though it's a limited implementation (currently handles only OSX, capable of cross-platform support). I'm hoping to have some time tonight to get it hooked up to Tassadar and benchmark all three side-by-side to determine if continuing to improve rbzip2 is worth it or if bzip2-ffi is a better bet for balancing speed and compatibility.

@RyanNielson
Copy link
Contributor Author

Good to hear, hopefully you find a great solution that meets all the project's needs and requirements. I'm probably just going to use native bzip2 locally for the time being, and work on parsing game events.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants