Added Snappy compression #15

Merged
merged 1 commit into from Dec 19, 2012

5 participants

@bcotton

This branch adds streaming snappy compression support that is wire compatible with Java.

Currently this branch relies on a fork of the ruby snappy library. A pull request has been submitted to that project. https://github.com/watersofoblivion/snappy/tree/snappy-streaming

We've had both of these in production for months without issue.

@ept
ept commented Dec 6, 2012

This is cool — I was about to add producer support for compression, and then saw this :) Quick comments:

  • Do you think we could make the dependency on the snappy gem optional? Especially if it currently requires a forked version. For people not using Snappy, it's annoying to bloat the gemset with unused code, making deployment slower.
  • Adding Gemfile is a good idea, but adding Gemfile.lock to git is not recommended when developing a gem.
@bcotton

Sure. We'll take care of these.

@crawlik crawlik and 1 other commented on an outdated diff Dec 6, 2012
@@ -0,0 +1,7 @@
+source 'https://rubygems.org'
+
+# Specify your gem's dependencies in foo.gemspec
+gemspec
+
+gem "rake"
+gem "snappy", :git => "git://github.com/watersofoblivion/snappy.git", :branch => "snappy-streaming"
@crawlik
crawlik added a line comment Dec 6, 2012

You better include gems in gemspec file. Gemfile should stay unchanged since gem was generated by bundler.

@watersofoblivion
watersofoblivion added a line comment Dec 7, 2012

Snappy is included in the gemspec, but the gemspec does not allow (AFAIK) pulling from git, which is necessary until the main snappy repo accepts the pending pull request. Putting Snappy in the Gemfile overrides the location that the gemspec uses. That line will go away eventually.

Rake is not necessary for Kafka-rb's functionality, only for development. Therefore, it goes in the Gemfile.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@crawlik

Noticed that Rakefile uses rcov. rcov doesn't support ruby 1.9.

@bcotton

Ok. Fixes these suggestions, squashed the commits and force pushed the branch.

Take another look.

@ept ept commented on the diff Dec 7, 2012
- s.summary = %q{A Ruby client for the Kafka distributed publish/subscribe messaging service}
-
- if s.respond_to? :specification_version then
- current_version = Gem::Specification::CURRENT_SPECIFICATION_VERSION
- s.specification_version = 3
-
- if Gem::Version.new(Gem::VERSION) >= Gem::Version.new('1.2.0') then
- s.add_development_dependency(%q<rspec>, [">= 0"])
- else
- s.add_dependency(%q<rspec>, [">= 0"])
- end
- else
- s.add_dependency(%q<rspec>, [">= 0"])
- end
-end
+spec = eval(File.open("kafka-rb.gemspec", "r").read)
@ept
ept added a line comment Dec 7, 2012

@bcotton Minor nit-pick: this doesn't close the file. Better would be:

spec = eval(File.read('kafka-rb.gemspec'))
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@ept ept commented on an outdated diff Dec 7, 2012
@@ -14,10 +14,16 @@
# limitations under the License.
require 'socket'
require 'zlib'
+require 'snappy'
@ept
ept added a line comment Dec 7, 2012

@bcotton This forces use of the snappy gem. To make it optional, applications will have to require 'snappy' themselves.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@bcotton

Missed that require. Thanks.
Pushed

@bcotton

got a few more edits. stand by :)

@bcotton

Thanks for the heads up on the compressed message sets. We are taking a look at this this morning.
How would you want to iterate on this from a git perspective?

@bcotton

Force pushed the branch again.

Compression

  • Fixed the compression/MessageSet problem.
  • Specs added.

In the process, discovered that Iconv was stripping high-bits from the message payload. Under 1.9, #force_encoding says "this string is encoded using encoding X", but under 1.8, Iconv says "convert this string (from the default ASCII) to encoding X." This was causing it to mutate the string which was already correctly encoded.

Snappy Dependency

  • Removed the dependency on snappy, and all specs pass with and without snappy present.
  • Moved ensure_snappy! into Message.
@ept

You're still missing this fix for the Producer. Otherwise I can't see any more issues :)

Bob Cotton Added support for Snappy compression to the Consumer, and added compr…
…ession (Snappy and GZip) to the Producer and MultiProducer.

To enable snappy compression, include the following in your Gemfile (This branch is necessary until a pending pull request is accepted):

    gem "snappy", "0.0.4", :git => "git://github.com/watersofoblivion/snappy.git", :branch => "snappy-streaming"
f4eda48
@ept

I tested the Ruby Snappy implementation against the Java implementation just now, and it appears that they are incompatible. Since the official Kafka producer/consumer library uses snappy-java, this patch does not work with the official client. :(

This screencast shows how I compress a message using watersofoblivion/snappy at 5497c71, and try to decompress it using snappy-java version 1.0.4.1. The decompression throws an exception: NativeException: java.io.IOException: FAILED_TO_UNCOMPRESS(5).

@ept

Sorry, that last screencast was flawed because I was trying to decompress a Kafka message object rather than just the snappy-compressed payload. Here is an example that uses only Snappy, not the Kafka message encoding. It demonstrates the incompatibility.

@watersofoblivion

Snappy::Reader and Snappy::Writer in the Ruby Snappy library correspond to the SnappyInputStream and SnappyOutputStream in the Java Snappy library. They compress the stream in chunks. These are the same classes that Kafka and Kafka-rb use to encode and decode the data.

Compressing with Snappy::Writer and trying to uncompress with Snappy.uncompress or compressing with Snappy.compress and uncompressing with Snappy::Reader will fail. They are not the same.

@ept
@watersofoblivion

The update to snappy was just merged into miyucy/snappy. When the new version of the gem is released to RubyGems, we'll update the pull request (docs, etc.).

@acrosa acrosa was assigned Dec 19, 2012
@acrosa acrosa merged commit cd3d312 into acrosa:master Dec 19, 2012
@elee elee referenced this pull request in cainus/Prozess Jul 1, 2013
Open

Add Compression Support In Producer #34

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment