High performance sync integration for Elastic Search

Solidus Elastic Product

This integration for Elastic Search provides a performant way to index products for Solidus ecommerce stores. To achieve that, products are serialized and uploaded concurrently by background jobs, in batches.

The gem is used in production at:

Serializing 500 products takes ~20 seconds. Once serialized, 200K products can be uploaded to Elastic in ~10 minutes.

The integration focuses on the backend synchronization of products with Elastic Search, and as such, does not have any frontend views, and does not construct any frontend search queries.

It has a dependency on the official Elasticsearch Model library and exposes its full interface through the Index and State classes.

Background

Existing integrations for indexing data to Elastic Search perform the serialization and upload operations on the fly, which does not allow for any optimizations to be added. Instead, by adding an intermediate storage for the serialized data, and separating the serialization and upload operations, we get to:

  • pre-load the data for serialization, thereby reducing SQL lookups and avoiding the N+1 query problem

  • upload batches of products to Elastic Search, reducing network trips and number of index operations performed by Elastic

  • inspect the serialized data with ease, as well as display it in an admin user interface

  • serialize and upload in parallel

  • do a full indexation of all products within minutes (~ 10 mins per 200K products) - which is useful in two situations:

    1. Change of elastic mappings
    2. Recover from cluster failure (this eliminates the need to pay for redundant search clusters)
  • perform an inline update of the generated JSON data if only a single property has changed (such as view_count), avoiding a full re-serialization of the product
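
To make the batching and inline-update points above concrete, here is a minimal sketch in plain Ruby. It is not the gem's implementation: `StoredProduct`, `patch_json` and `bulk_bodies` are hypothetical names used only for illustration, and the records live in memory rather than in a database. The sketch only assumes the newline-delimited format of the Elasticsearch bulk API (an action line followed by the document source).

```ruby
require 'json'

# Hypothetical stand-in for one row of the intermediate storage.
StoredProduct = Struct.new(:product_id, :json)

# Patch a single property in already-serialized JSON,
# without re-serializing the whole product.
def patch_json(json, changes)
  JSON.generate(JSON.parse(json).merge(changes))
end

# Build one bulk request body per batch: an action line followed by
# the stored JSON for each record, terminated by a final newline.
def bulk_bodies(records, batch_size:)
  records.each_slice(batch_size).map do |batch|
    lines = batch.flat_map do |record|
      [JSON.generate(index: { _id: record.product_id }), record.json]
    end
    lines.join("\n") + "\n"
  end
end

records = (1..5).map do |id|
  StoredProduct.new(id, JSON.generate('name' => "Product #{id}", 'view_count' => 0))
end

# Only view_count changed for the first product, so patch its stored JSON.
records[0].json = patch_json(records[0].json, 'view_count' => 42)

# Five products in batches of two means three requests instead of five.
bodies = bulk_bodies(records, batch_size: 2)
```

Because the serialized JSON is already stored, the upload step only has to concatenate strings, which is why it can be so much cheaper than serializing on the fly.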

Installation and quick start

Add solidus_elastic_product to your Gemfile:

gem 'solidus_elastic_product'

Bundle your dependencies, then install and run the migrations:

bundle
bundle exec rake railties:install:migrations
bundle exec rake db:migrate

Serialize all products for the first time

Solidus::ElasticProduct::Schedule.serialize_all

# monitor the serialized products or just tail the logs
Solidus::ElasticProduct::State.needing_upload.count

# once serialized (or can stop midway if testing), upload them all to elastic
Solidus::ElasticProduct::ReindexJob.perform_now

To connect to Elastic Search

The cleanest approach is to set an ELASTICSEARCH_URL environment variable, for example in .env. This is not necessary when Elastic Search runs on localhost.
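
As an illustration of the fallback behaviour described above, the following sketch resolves the URL from the environment and falls back to a local node. The helper `elastic_url` and the default of `http://localhost:9200` are assumptions made for this example; they are not part of the gem.

```ruby
# Assumed default for a local Elastic Search node.
DEFAULT_ELASTIC_URL = 'http://localhost:9200'

# Return the configured URL, or the local default when the variable
# is missing or blank. `env` can be ENV or any Hash-like object.
def elastic_url(env = ENV)
  url = env['ELASTICSEARCH_URL'].to_s.strip
  url.empty? ? DEFAULT_ELASTIC_URL : url
end

local  = elastic_url({})
remote = elastic_url('ELASTICSEARCH_URL' => 'http://search.example.com:9200')
```

In a .env file the setting would be a single line such as `ELASTICSEARCH_URL=http://search.example.com:9200`, where the host name is a placeholder.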

Workflow

To work with an intermediate storage of the serialized data, the following workflow has been set up:

  1. A corresponding one-to-one record in an Elastic::Product::State table is created for every product. It stores the state of an indexed product and consists of the following fields:
{
                             :id => nil,
                     :product_id => nil,
                           :json => nil,
                       :uploaded => false,
    :locked_for_serialization_at => nil,
           :locked_for_upload_at => nil
}

Fields:

  • json - string representation of a serialized product
  • uploaded - boolean flag indicating whether the product has been synced with Elastic
  • locked_for_serialization_at - time when a worker started processing the product for serialization
  • locked_for_upload_at - time when a worker started uploading the product to Elastic Search

The two lock columns (locked_for_serialization_at and locked_for_upload_at) ensure that concurrent serialization and upload processes do not work on the same product at the same time.

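  2. To serialize products: for example, Solidus::ElasticProduct::Schedule.serialize_all, as in the quick start above.
  3. To upload products to Elastic: for example, Solidus::ElasticProduct::ReindexJob.perform_now, as in the quick start above.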
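
The role of the lock columns in this workflow can be sketched as follows. This is an in-memory illustration only: `StateRow`, `claim_for_serialization` and `finish_serialization` are hypothetical names, and a real implementation would need to claim rows atomically in the database (for instance with a single UPDATE statement) rather than in Ruby.

```ruby
# In-memory stand-in for a row of the state table described above.
StateRow = Struct.new(:product_id, :json, :uploaded,
                      :locked_for_serialization_at, :locked_for_upload_at)

# Claim up to `limit` rows that are neither serialized nor locked,
# and stamp them so that another worker skips them.
def claim_for_serialization(rows, limit:, now: Time.now)
  claimed = rows.select { |row| row.json.nil? && row.locked_for_serialization_at.nil? }
                .first(limit)
  claimed.each { |row| row.locked_for_serialization_at = now }
  claimed
end

# Store the serialized JSON, mark the row as needing upload, release the lock.
def finish_serialization(row, json)
  row.json = json
  row.uploaded = false
  row.locked_for_serialization_at = nil
end

rows = (1..5).map { |id| StateRow.new(id, nil, false, nil, nil) }

first_worker  = claim_for_serialization(rows, limit: 3)
second_worker = claim_for_serialization(rows, limit: 3)

first_worker.each { |row| finish_serialization(row, %({"id":#{row.product_id}})) }
```

The second worker only receives the rows the first one did not lock, so no product is serialized twice. The same idea applies to locked_for_upload_at for the upload step.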

To operate through Elastic Model

  • Use the Solidus::ElasticProduct::Index class to perform class operations (define index name, do mappings, perform search or manipulate the index)

  • Use the Solidus::ElasticProduct::State class to perform instance-level operations on individual indexed products (update, destroy, etc.)

To configure Elastic Search settings and mappings

All of the Elasticsearch Model class methods are available through the Index class, so you can customize them directly from an initializer:

# config/initializers/elastic_product.rb
Solidus::ElasticProduct::Index.index_name
Solidus::ElasticProduct::Index.document_type
Solidus::ElasticProduct::Index.mapping

For example, to change the default Elastic Search mappings, in an initializer (or Index decorator) do:

# config/initializers/elastic_product.rb
options = { ... }

Solidus::ElasticProduct::Index.mappings(options) do
  indexes :name,          type: 'string', analyzer: 'snowball'
  indexes :created_at,    type: 'date'
  indexes :taxons,        type: 'nested' do
    indexes :permaname,   type: 'keyword', index: 'not_analyzed'
    indexes :child do
      indexes :permaname, type: 'keyword', index: 'not_analyzed'
      indexes :child do
        indexes :permaname, type: 'keyword', index: 'not_analyzed'
      end
    end
  end
end

To customize the serialization

  1. Change the default indexed product hash

Just define an as_indexed_hash method in your Spree product_decorator. Your method will then take precedence. For example:

def as_indexed_hash
  {
    name: name,
    popularity: indexed_popularity,
    view_count: view_count,
    image: display_image.as_indexed_hash
  }
end

  2. Change any other serialized class (Variant, Property, Image) - again, just define an as_indexed_hash method on your class, and follow the default logic in the ElasticRepresentation module.

  3. Change the SerializationIterator preloader

You have two options:

a) redefine the full Serializer class by creating and specifying a Serializer class of your own.

```ruby
# config/initializers/spree.rb or so
Solidus::Elastic::Config.serializer_class = MyElasticSerializer
```

Your custom _serializer_class_ must respond to the `#generate_json` method and define an ActiveRecord refinement method `#each_for_serialization` to preload associations. See the default [Product::Serializer](https://github.com/boomerdigital/solidus_elastic_search/blob/master/app/models/solidus/elastic/product/serializer.rb) as an example.

b) use a decorator and, for example, redefine only the each_for_serialization iterator:

```ruby
# solidus/elastic_product/serializer_decorator.rb
module Solidus::ElasticProduct::Serializer::SerializationIterator
  refine ActiveRecord::Relation do
    def each_for_serialization(&blk)
      # your code
    end
  end
end
```
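
For readers unfamiliar with refinements, the following self-contained sketch shows the general shape of the contract described above: an object that responds to `#generate_json`, plus a refined `#each_for_serialization` iterator. It is only an illustration. It refines Array as a stand-in for ActiveRecord::Relation, and `DemoSerializationIterator` and `DemoSerializer` are hypothetical names, not classes from the gem.

```ruby
require 'json'

# Stand-in refinement: the gem refines ActiveRecord::Relation, while this
# sketch refines Array so that it runs without Rails.
module DemoSerializationIterator
  refine Array do
    # A real implementation would preload associations here;
    # this version just walks the collection in slices of two.
    def each_for_serialization(&blk)
      each_slice(2) { |batch| batch.each(&blk) }
    end
  end
end

class DemoSerializer
  # Activate the refinement for the rest of this class body only.
  using DemoSerializationIterator

  def self.serialize_all(products)
    results = []
    products.each_for_serialization { |product| results << new(product).generate_json }
    results
  end

  def initialize(product)
    @product = product
  end

  # The method a custom serializer class is expected to respond to.
  def generate_json
    JSON.generate(@product)
  end
end

json_docs = DemoSerializer.serialize_all([{ name: 'Shirt' }, { name: 'Mug' }, { name: 'Cap' }])
```

Because refinements are lexically scoped, `each_for_serialization` is visible only where `using` has been called, which keeps the extra method out of the rest of the application.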

To set-up background workers

Serialization is a CPU-intensive task, so ideally you would run it in multiple single-threaded processes. A Sidekiq example would be:

# config/deploy.rb
set :sidekiq_processes, 3

set :sidekiq_options_per_process, [
  "--config config/sidekiq.yml",
  "--config config/sidekiq-single-concurrency.yml",
  "--config config/sidekiq-single-concurrency.yml"
]

# config/sidekiq-single-concurrency.yml
:concurrency: 1
:queues:
  - elastic_serializer
  - paperclip

For upload, although you can upload in parallel, it is advisable to avoid overwhelming the Elastic indexer with concurrent requests and instead run a single-process, single-threaded upload worker. The upload operation on the worker is not the bottleneck, so there is little to gain from parallelizing it.

To run a sandbox app

cd spec/dummy
bin/rake db:drop
bin/rake db:reset
bin/rake spree_sample:load

Install ElasticSearch

  • Install Java - sudo apt-get install openjdk-8-jre
  • Follow the Elastic installation guide
  • Install Kibana for a user interface to Elastic

Testing

First bundle your dependencies, then run rake. rake will default to building the dummy app if it does not exist, then it will run specs, and Rubocop static code analysis (not yet). The dummy app can be regenerated by using rake test_app.

bundle
bundle exec rake test_app

When testing your application's integration with this extension you may use its factories. Simply add this require statement to your spec_helper:

require 'solidus_elastic_product/factories'

Copyright (c) 2016 Martin Tomov; Eric Anderson, released under the New BSD License