ActiveAvro is a gem used for automatically deriving the Avro schemas for subclasses of ActiveRecord::Base in a non-trivial way. When using the Avro RPC protocol one can write serialization schemas for each type individually and then embed type references in other types by using only the type's name. However when using just Avro schemas only a single root type may be defined so all embedded complex types must include their definitions (i.e. if a Person has many Pets a Pet record type definition embedded within the Person record type definition is required).
ActiveAvro requires Rails 3.x or later.
In your Gemfile:
gem active_avro, :git => "git://github.com/cfeduke/active_avro.git"
ActiveAvro includes a Rails generator which can be used to visually verify the schema that will be generated for your types.
rails generate active_avro:schema Person
(Where Person
is one of your models.)
ActiveAvro derives its configuration from two files in your Rails application's /config
directory:
active_avro_enums.yml
active_avro_ignore_filter.yml
Defines the ActiveRecord models that should be treated as enumerated values instead of full blown models. By
convention the id
attribute is assumed to be the enum's value and the name
attribute its name - and this is what is
placed in the schema definition. Because Avro's enums are zero based a 0-value name is automatically generated when
the enumeration's values are retrieved with the literal value "Unknown."
The options zero_name
, value_attribute_name
and name_attribute_name
are configurable through the
active_avro_enums.yml
file.
Example:
AdGroupType: { }
CampaignStatus: { zero_name: 'None', name_attribute_name: 'code' }
In the above example the ActiveRecord models AdGroupType and CampaignStatus will be placed in the schema definition of any class associated with them as enums instead of records.
Defines the classes and attributes to ignore when generating the schema. Some fields, like foreign key identifiers, may
not need to be serialized if the entire child class' hierarchy is included in the schema. Wildcard (as '*'
- ticks
required in a YAML file for the asterisk) and ActiveRecord class names can be placed in this file. Regular expressions
may be specified for attribute names.
Example:
AdStatus:
- ad_groups
'*':
- created_at
- updated_at
- !ruby/regexp '/.*_id$/'
- active_admin_comments
- ads
In the above example the base ActiveRecord model class is Ad
so we wildcard ignore anything that may include a
collection of ads, as well as several other attributes. Additionally the ad_groups
attribute for is of the AdStatus
ActiveRecord class type is also ignored.
Generation of schemas at runtime can be done but it is costly in terms of CPU time. Its recommended that schemas are generated once and kept in memory for the lifetime of your application. In memory schemas are necessary to serialize ActiveRecord class instances to Avro.
Use the ActiveAvro::Options
class to specify the path to your active_avro_enums.yml
and active_avro_ignore_filter.yml
files at runtime:
options = ActiveAvro::Options.new('config/active_avro_ignore_filter.yml', 'config/active_avro_enums.yml')
ad_schema = Schema.new(Ad, options)
A schema can be used to serialize the class hierarchy for the model class its based upon:
first_ad = Ad.first
ad_hash = ad_schema.cast(first_ad)
# at this point ad_hash is a Hash representative of the schema contained in ad_schema
# you can use the JSON library to serialize it for transmission
The Apache Avro library works with JSON schema definitions for Avro and hashes. The ActiveAvro gem was written specifically with this in mind:
require 'avro'
writer = StringIO.new
schema = Avro::Schema.parse(ad_schema.to_json)
dw = Avro::IO::DatumWriter.new(schema)
dw.write(ad_hash, Avro::IO::BinaryEncoder.new(writer))