Much faster and lighter rails attribute serialization using json or Marshal, with gzip support
Ruby
Switch branches/tags
Nothing to show
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
lib Bump to 0.3.0 May 29, 2014
spec Add builders/encoders May 29, 2014
.document Version bump to 0.0.0 Jul 15, 2010
.gitignore ignore gems Jul 9, 2013
.rspec rspec options Jul 9, 2013
.ruby-gemset gemset Jul 9, 2013
.ruby-version rb 1.9.3 Jul 9, 2013
Gemfile Make gemmable Jul 9, 2013
Gemfile.lock Add builders/encoders May 29, 2014
Guardfile Gemfile, guard Jul 9, 2013
LICENSE.txt Make gemmable Jul 9, 2013
MIT-LICENSE Initial Commit Aug 21, 2009
README.rdoc Rails 3, Ruby 1.9 Jul 9, 2013
better_serialization.gemspec Allow any version of Rails 3 Sep 27, 2013
circle.yml STOP May 29, 2014
init.rb Initial Commit Aug 21, 2009

README.rdoc

Only works on Rails 3!

This has been updated to only work on Rails 3.

As well, it was only tested on Ruby 1.9.

Faster, Smaller, Gzip-able

This adds marshal_serialize and json_serialize to ActiveRecord::Base (class methods), which act similar to rails' own serialize except they don't use yaml, which is slooooow and biiiiig (byte-wise). Also supports adding gzip to the serialization process for really large stuff (can save a ton of database memory with minimal added insert time). Note that gzip columns must be of type “binary”, “blob”, etc (not text).

Note that there's a limitation with the plugin: you can't call destructive actions on attributes using better_serialization (ex. << for an array), and update_attribute etc won't work. See TODO for an explanation.

Example

see tests, but here's the gist:

class OrderLog < ActiveRecord::Base
  marshal_serialize :line_items_cache, :gzip => true
  marshal_serialize :product_cache
  json_serialize :customer_cache
end

order_log = OrderLog.new
product = Product.new(:name => "Woot")
order_log.product_cache = product

order_log.product_cache # => normal
order_log[:product_cache] # => raw Marshal.dump of product

Notes for json_serialize

JSON serialization is faster and smaller, but needs more configuration. This is because Marshal and the default YAML serialization dump the whole object, instance variables and all. JSON just stores an instances' attribute values. Thus, you need to tell it what class to instantiate, or tell it not to instantiate anything at all for storing a hash or array of data.

Also, when serializing ActiveRecord objects, if ActiveRecord::Base.include_root_in_json == true (the new rails defalut), it'll all 'just work'. Otherwise, you will need to use the :class_name option if it's not inferable by the association name, and will probably want modify that class's to_json to include the id.

The Problem With YAML Serialization: speed and size

NOTE: this code does not represent the BetterSerialization api, but rather the underlying ideas

The following examples came from the real-world objects that inspired the plugin.

lis is an 18-member array of line items that have many associations loaded into memory, the point being that they're relatively big objects.

marshal_data, yaml_data, json_data = nil

Benchmark.bm(7) do |x|
  x.report("Marshal") { marshal_data = Marshal.dump(lis) }
  x.report("to_yaml") { yaml_data    = lis.to_yaml       }
  x.report('to_json') { json_data    = lis.to_json       }
end

             user     system      total        real
Marshal  0.590000   0.000000   0.590000 (  0.605567)
to_yaml 11.730000   0.010000  11.740000 ( 12.179310)
to_json  0.010000   0.000000   0.010000 (  0.006129)

The reason JSON is so fast is that rails' ActiveRecord::Base#to_json only jsonifies the attributes hash, not the actual ruby object. If you want the full object though, Marshal is waaaaay faster than to_yaml, and it is representing the full ruby object.

Marshal is also a lot smaller than yaml:

>> marshal_data.size
=> 1119279
>> yaml_data.size
=> 1958610
>> json_data.size
=> 7886

Whammy. What about deserialization?

Benchmark.bm(7) do |x|
  x.report("Marshal") { Marshal.load(marshal_data) }
  x.report("to_yaml") { YAML::load(yaml_data) }
  x.report('to_json') { ActiveSupport::JSON.decode(json_data).collect {|hash| BuylistLineItem.new(hash)} }
end

             user     system      total        real
Marshal  0.040000   0.000000   0.040000 (  0.046616)
to_yaml  0.380000   0.000000   0.380000 (  0.380686)
to_json  0.020000   0.000000   0.020000 (  0.019096)

With json you can't just unmarshal it 'cuz it's just representing an attribute array, but even compensating with object creation it's still the fastest.

Finally, if you need to save some space, you can use gzip with the serialization:

>> Benchmark.measure {gz_marshal_data = Zlib::Deflate.deflate(marshal_data)}
=> #<Benchmark::Tms:0x10a7ee54 @utime=0.0300000000000011, @cstime=0.0, @cutime=0.0, @total=0.0300000000000011, @label="", @stime=0.0, @real=0.0292189121246338>
>> marshal_data.size - gz_marshal_data.size
=> 1036979
>> Benchmark.measure {Zlib::Inflate.inflate(marshal_data)}
=> #<Benchmark::Tms:0x10a8db98 @utime=0.0, @cstime=0.0, @cutime=0.0, @total=0.0, @label="", @stime=0.0, @real=0.00509905815124512>

So on this machine it would add 30ms on insert and 5ms on load, but save ~1mb of database space. Seems worth it if you're saving this much stuff.

EDIT: Also note that from personal experience, gzipping json will still use 1/10th the overall space with ~.5ms performance hit - WIN

Summary

If you want speed/small data and don't need to flash-freeze the whole object, use json. Otherwise, use Marshal. If you want to use minimal database space, add gzip to those (adds a relatively small performance hit). Use the default yaml if you really, really need to save the whole object and also have a more human-readable datastore… eh, actually, don't use yaml.

TODO

  • deserialization should be cached, and it would be nice if the deserialized value was stored in the attributes hash.

  • Allow destructive actions

    • The problem is that serializing and de-serializing creates new objects, so if you do object.array_attribute << “blah”, you will have deserialized a new object, destructively added a value to it, and thrown the new object away. ActiveRecord avoids this issue by calling to_yaml as a catch-all on the result of read_attribute in activerecord/lib/active_record/connection_adapters/abstract/quoting.rb#quote (as of 2.3.4). What we need is something like being able to specify what AR's catch-all case is (although that wouldn't work for some option/value combos, ex. gzip option on a string or date). Gratefully accepting ideas :)

    • Also, just thought of another way. have the raw data be in the raw_#{attribute} column, and object.attribute be a method that loads raw_#{attribute}. That + save hooks = win.

Copyright © 2009 Woody Peterson, released under the MIT license