AlgoliaSearch integration to your favorite ORM
Ruby
Latest commit 7da2fa1 Feb 19, 2017 @algoliareadmebot algoliareadmebot committed with redox Update README [skip ci] (#209)

README.md

Algolia Search API Client for Rails

Algolia Search is a hosted full-text, numerical, and faceted search engine capable of delivering realtime results from the first keystroke.

Build Status Gem Version Code Climate ActiveRecord Mongoid Sequel

This gem let you easily integrate the Algolia Search API to your favorite ORM. It's based on the algoliasearch-client-ruby gem. Rails 3.x, 4.x and 5.x are all supported.

You might be interested in the sample Ruby on Rails application providing a autocomplete.js-based auto-completion and instantsearch.js-based instant search results page: algoliasearch-rails-example.

API Documentation

You can find the full reference on Algolia's website.

Table of Contents

  1. Setup

  2. Usage

  3. Options

  4. Indices

  5. Testing

Setup

Install

gem install algoliasearch-rails

Add the gem to your Gemfile:

gem "algoliasearch-rails"

And run:

bundle install

Configuration

Create a new file config/initializers/algoliasearch.rb to setup your APPLICATION_ID and API_KEY.

AlgoliaSearch.configuration = { application_id: 'YourApplicationID', api_key: 'YourAPIKey' }

The gem is compatible with ActiveRecord, Mongoid and Sequel.

Timeouts

You can configure a various timeout thresholds by setting the following options at initialization time:

AlgoliaSearch.configuration = {
  application_id: 'YourApplicationID',
  api_key: 'YourAPIKey'
  connect_timeout: 2,
  receive_timeout: 30,
  send_timeout: 30,
  batch_timeout: 120,
  search_timeout: 5
}

Notes

This gem makes intensive use of Rails' callbacks to trigger the indexing tasks. If you're using methods bypassing after_validation, before_save or after_commit callbacks, it will not index your changes. For example: update_attribute doesn't perform validations checks, to perform validations when updating use update_attributes.

All methods injected by the AlgoliaSearch module are prefixed by algolia_ and aliased to the associated short names if they aren't already defined.

Contact.algolia_reindex! # <=> Contact.reindex!

Contact.algolia_search("jon doe") # <=> Contact.search("jon doe")

Usage

Index Schema

The following code will create a Contact index and add search capabilities to your Contact model:

class Contact < ActiveRecord::Base
  include AlgoliaSearch

  algoliasearch do
    attribute :first_name, :last_name, :email
  end
end

You can either specify the attributes to send (here we restricted to :first_name, :last_name, :email) or not (in that case, all attributes are sent).

class Product < ActiveRecord::Base
  include AlgoliaSearch

  algoliasearch do
    # all attributes will be sent
  end
end

You can also use the add_attribute method, to send all model attributes + extra ones:

class Product < ActiveRecord::Base
  include AlgoliaSearch

  algoliasearch do
    # all attributes + extra_attr will be sent
    add_attribute :extra_attr
  end

  def extra_attr
    "extra_val"
  end
end

Relevancy

We provide many ways to configure your index allowing you to tune your overall index relevancy. The most important ones are the searchable attributes and the attributes reflecting record popularity.

class Product < ActiveRecord::Base
  include AlgoliaSearch

  algoliasearch do
    # list of attribute used to build an Algolia record
    attributes :title, :subtitle, :description, :likes_count, :seller_name

    # the `searchableAttributes` (formerly known as attributesToIndex) setting defines the attributes
    # you want to search in: here `title`, `subtitle` & `description`.
    # You need to list them by order of importance. `description` is tagged as
    # `unordered` to avoid taking the position of a match into account in that attribute.
    searchableAttributes ['title', 'subtitle', 'unordered(description)']

    # the `customRanking` setting defines the ranking criteria use to compare two matching
    # records in case their text-relevance is equal. It should reflect your record popularity.
    customRanking ['desc(likes_count)']
  end

end

Indexing

To index a model, simple call reindex on the class:

Product.reindex

To index all of your models, you can do something like this:

Rails.application.eager_load! # Ensure all models are loaded (required in development).

algolia_models = ActiveRecord::Base.descendants.select{ |model| model.respond_to?(:reindex) }

algolia_models.each(&:reindex)

Frontend Search (realtime experience)

Traditional search implementations tend to have search logic and functionality on the backend. This made sense when the search experience consisted of a user entering a search query, executing that search, and then being redirected to a search result page.

Implementing search on the backend is no longer necessary. In fact, in most cases it is harmful to performance because of added network and processing latency. We highly recommend the usage of our JavaScript API Client issuing all search requests directly from the end user's browser, mobile device, or client. It will reduce the overall search latency while offloading your servers at the same time.

The JS API client is part of the gem, just require algolia/v3/algoliasearch.min somewhere in your JavaScript manifest, for example in application.js if you are using Rails 3.1+:

//= require algolia/v3/algoliasearch.min

Then in your JavaScript code you can do:

var client = algoliasearch(ApplicationID, Search-Only-API-Key);
var index = client.initIndex('YourIndexName');
index.search('something', { hitsPerPage: 10, page: 0 })
  .then(function searchDone(content) {
    console.log(content)
  })
  .catch(function searchFailure(err) {
    console.error(err);
  });

We recently (March 2015) released a new version (V3) of our JavaScript client, if you were using our previous version (V2), read the migration guide

Backend Search

Notes: We recommend the usage of our JavaScript API Client to perform queries directly from the end-user browser without going through your server.

A search returns ORM-compliant objects reloading them from your database. We recommend the usage of our JavaScript API Client to perform queries to decrease the overall latency and offload your servers.

hits =  Contact.search("jon doe")
p hits
p hits.raw_answer # to get the original JSON raw answer

A highlight_result attribute is added to each ORM object:

hits[0].highlight_result['first_name']['value']

If you want to retrieve the raw JSON answer from the API, without re-loading the objects from the database, you can use:

json_answer = Contact.raw_search("jon doe")
p json_answer
p json_answer['hits']
p json_answer['facets']

Search parameters can be specified either through the index's settings statically in your model or dynamically at search time specifying search parameters as second argument of the search method:

class Contact < ActiveRecord::Base
  include AlgoliaSearch

  algoliasearch do
    attribute :first_name, :last_name, :email

    # default search parameters stored in the index settings
    minWordSizeForApprox1 4
    minWordSizeForApprox2 8
    hitsPerPage 42
  end
end
# dynamical search parameters
p Contact.raw_search("jon doe", { :hitsPerPage => 5, :page => 2 })

Backend Pagination

Even if we highly recommend to perform all search (and therefore pagination) operations from your frontend using JavaScript, we support both will_paginate and kaminari as pagination backend.

To use :will_paginate, specify the :pagination_backend as follow:

AlgoliaSearch.configuration = { application_id: 'YourApplicationID', api_key: 'YourAPIKey', pagination_backend: :will_paginate }

Then, as soon as you use the search method, the returning results will be a paginated set:

# in your controller
@results = MyModel.search('foo', hitsPerPage: 10)

# in your views
# if using will_paginate
<%= will_paginate @results %>

# if using kaminari
<%= paginate @results %>

Tags

Use the tags method to add tags to your record:

class Contact < ActiveRecord::Base
  include AlgoliaSearch

  algoliasearch do
    tags ['trusted']
  end
end

or using dynamical values:

class Contact < ActiveRecord::Base
  include AlgoliaSearch

  algoliasearch do
    tags do
      [first_name.blank? || last_name.blank? ? 'partial' : 'full', has_valid_email? ? 'valid_email' : 'invalid_email']
    end
  end
end

At query time, specify { tagFilters: 'tagvalue' } or { tagFilters: ['tagvalue1', 'tagvalue2'] } as search parameters to restrict the result set to specific tags.

Faceting

Facets can be retrieved calling the extra facets method of the search answer.

class Contact < ActiveRecord::Base
  include AlgoliaSearch

  algoliasearch do
    # [...]

    # specify the list of attributes available for faceting
    attributesForFaceting [:company, :zip_code]
  end
end
hits = Contact.search("jon doe", { :facets => '*' })
p hits                    # ORM-compliant array of objects
p hits.facets             # extra method added to retrieve facets
p hits.facets['company']  # facet values+count of facet 'company'
p hits.facets['zip_code'] # facet values+count of facet 'zip_code'
raw_json = Contact.raw_search("jon doe", { :facets => '*' })
p raw_json['facets']

Faceted search

You can also search for facet values.

Product.search_for_facet_values('category', 'Headphones') # Array of {value, highlighted, count}

This method can also take any parameter a query can take. This will adjust the search to only hits which would have matched the query.

# Only sends back the categories containing red Apple products (and only counts those)
Product.search_for_facet_values('category', 'phone', {
  query: 'red',
  filters: 'brand:Apple'
}) # Array of phone categories linked to red Apple products

Group by

More info on distinct for grouping can be found here.

class Contact < ActiveRecord::Base
  include AlgoliaSearch

  algoliasearch do
    # [...]

    # specify the attribute to be used for distinguishing the records
    # in this case the records will be grouped by company
    attributeForDistinct "company"
  end
end

Geo-Search

Use the geoloc method to localize your record:

class Contact < ActiveRecord::Base
  include AlgoliaSearch

  algoliasearch do
    geoloc :lat_attr, :lng_attr
  end
end

At query time, specify { aroundLatLng: "37.33, -121.89", aroundRadius: 50000 } as search parameters to restrict the result set to 50KM around San Jose.

Options

Auto-indexing & asynchronism

Each time a record is saved; it will be - asynchronously - indexed. On the other hand, each time a record is destroyed, it will be - asynchronously - removed from the index. That means that a network call with the ADD/DELETE operation is sent synchronously to the Algolia API but then the engine will asynchronously process the operation (so if you do a search just after, the results may not reflect it yet).

You can disable auto-indexing and auto-removing setting the following options:

class Contact < ActiveRecord::Base
  include AlgoliaSearch

  algoliasearch auto_index: false, auto_remove: false do
    attribute :first_name, :last_name, :email
  end
end

Temporary disable auto-indexing

You can temporary disable auto-indexing using the without_auto_index scope. This is often used for performance reason.

Contact.delete_all
Contact.without_auto_index do
  1.upto(10000) { Contact.create! attributes } # inside this block, auto indexing task will not run.
end
Contact.reindex! # will use batch operations

Queues & background jobs

You can configure the auto-indexing & auto-removal process to use a queue to perform those operations in background. ActiveJob (Rails >=4.2) queues are used by default but you can define your own queuing mechanism:

class Contact < ActiveRecord::Base
  include AlgoliaSearch

  algoliasearch enqueue: true do # ActiveJob will be triggered using a `algoliasearch` queue
    attribute :first_name, :last_name, :email
  end
end

Things to Consider

If you are performing updates & deletions in the background then a record deletion can be committed to your database prior to the job actually executing. Thus if you were to load the record to remove it from the database than your ActiveRecord#find will fail with a RecordNotFound.

In this case you can bypass loading the record from ActiveRecord and just communicate with the index directly:

class MySidekiqWorker
  def perform(id, remove)
    if remove
      # the record has likely already been removed from your database so we cannot
      # use ActiveRecord#find to load it
      index = Algolia::Index.new("index_name")
      index.delete_object(id)
    else
      # the record should be present
      c = Contact.find(id)
      c.index!
    end
  end
end

With Sidekiq

If you're using Sidekiq:

class Contact < ActiveRecord::Base
  include AlgoliaSearch

  algoliasearch enqueue: :trigger_sidekiq_worker do
    attribute :first_name, :last_name, :email
  end

  def self.trigger_sidekiq_worker(record, remove)
    MySidekiqWorker.perform_async(record.id, remove)
  end
end

class MySidekiqWorker
  def perform(id, remove)
    c = Contact.find(id)
    remove ? c.remove_from_index! : c.index!
  end
end

With DelayedJob

If you're using delayed_job:

class Contact < ActiveRecord::Base
  include AlgoliaSearch

  algoliasearch enqueue: :trigger_delayed_job do
    attribute :first_name, :last_name, :email
  end

  def self.trigger_delayed_job(record, remove)
    if remove
      record.delay.remove_from_index!
    else
      record.delay.index!
    end
  end
end

Synchronism & testing

You can force indexing and removing to be synchronous (in that case the gem will call the wait_task method to ensure the operation has been taken into account once the method returns) by setting the following option: (this is NOT recommended, except for testing purpose)

class Contact < ActiveRecord::Base
  include AlgoliaSearch

  algoliasearch synchronous: true do
    attribute :first_name, :last_name, :email
  end
end

Custom index name

By default, the index name will be the class name, e.g. "Contact". You can customize the index name by using the index_name option:

class Contact < ActiveRecord::Base
  include AlgoliaSearch

  algoliasearch index_name: "MyCustomName" do
    attribute :first_name, :last_name, :email
  end
end

Per-environment indices

You can suffix the index name with the current Rails environment using the following option:

class Contact < ActiveRecord::Base
  include AlgoliaSearch

  algoliasearch per_environment: true do # index name will be "Contact_#{Rails.env}"
    attribute :first_name, :last_name, :email
  end
end

Custom attribute definition

You can use a block to specify a complex attribute value

class Contact < ActiveRecord::Base
  include AlgoliaSearch

  algoliasearch do
    attribute :email
    attribute :full_name do
      "#{first_name} #{last_name}"
    end
    add_attribute :full_name2
  end

  def full_name2
    "#{first_name} #{last_name}"
  end
end

Notes: As soon as you use such code to define extra attributes, the gem is not anymore able to detect if the attribute has changed (the code uses Rails's #{attribute}_changed? method to detect that). As a consequence, your record will be pushed to the API even if its attributes didn't change. You can work-around this behavior creating a _changed? method:

class Contact < ActiveRecord::Base
  include AlgoliaSearch

  algoliasearch do
    attribute :email
    attribute :full_name do
      "#{first_name} #{last_name}"
    end
  end

  def full_name_changed?
    first_name_changed? || last_name_changed?
  end
end

Nested objects/relations

You can easily embed nested objects defining an extra attribute returning any JSON-compliant object (an array or a hash or a combination of both).

class Profile < ActiveRecord::Base
  include AlgoliaSearch

  belongs_to :user
  has_many :specializations

  algoliasearch do
    attribute :user do
      # restrict the nested "user" object to its `name` + `email`
      { name: user.name, email: user.email }
    end
    attribute :public_specializations do
      # build an array of public specialization (include only `title` and `another_attr`)
      specializations.select { |s| s.public? }.map do |s|
        { title: s.title, another_attr: s.another_attr }
      end
    end
  end

end

Custom objectID

By default, the objectID is based on your record's id. You can change this behavior specifying the :id option (be sure to use a uniq field).

class UniqUser < ActiveRecord::Base
  include AlgoliaSearch

  algoliasearch id: :uniq_name do
  end
end

Restrict indexing to a subset of your data

You can add constraints controlling if a record must be indexed by using options the :if or :unless options.

It allows you to do conditional indexing on a per document basis.

class Post < ActiveRecord::Base
  include AlgoliaSearch

  algoliasearch if: :published?, unless: :deleted? do
  end

  def published?
    # [...]
  end

  def deleted?
    # [...]
  end
end

Notes: As soon as you use those constraints, addObjects and deleteObjects calls will be performed in order to keep the index synced with the DB (The state-less gem doesn't know if the object don't match your constraints anymore or never matched, so we force ADD/DELETE operations to be sent). You can work-around this behavior creating a _changed? method:

class Contact < ActiveRecord::Base
  include AlgoliaSearch

  algoliasearch if: :published do
  end

  def published
    # true or false
  end

  def published_changed?
    # return true only if you know that the 'published' state changed
  end
end

You can index a subset of your records using either:

# will generate batch API calls (recommended)
MyModel.where('updated_at > ?', 10.minutes.ago).reindex!

or

MyModel.index_objects MyModel.limit(5)

Sanitizer

You can sanitize all your attributes using the sanitize option. It will strip all HTML tags from your attributes.

class User < ActiveRecord::Base
  include AlgoliaSearch

  algoliasearch per_environment: true, sanitize: true do
    attributes :name, :email, :company
  end
end

If you're using Rails 4.2+, you also need to depend on rails-html-sanitizer:

gem 'rails-html-sanitizer'

UTF-8 Encoding

You can force the UTF-8 encoding of all your attributes using the force_utf8_encoding option:

class User < ActiveRecord::Base
  include AlgoliaSearch

  algoliasearch force_utf8_encoding: true do
    attributes :name, :email, :company
  end
end

Notes: This option is not compatible with Ruby 1.8

Exceptions

You can disable exceptions that could be raised while trying to reach Algolia's API by using the raise_on_failure option:

class Contact < ActiveRecord::Base
  include AlgoliaSearch

  # only raise exceptions in development env
  algoliasearch raise_on_failure: Rails.env.development? do
    attribute :first_name, :last_name, :email
  end
end

Configuration example

Here is a real-word configuration example (from HN Search):

class Item < ActiveRecord::Base
  include AlgoliaSearch

  algoliasearch per_environment: true do
    # the list of attributes sent to Algolia's API
    attribute :created_at, :title, :url, :author, :points, :story_text, :comment_text, :author, :num_comments, :story_id, :story_title

    # integer version of the created_at datetime field, to use numerical filtering
    attribute :created_at_i do
      created_at.to_i
    end

    # `title` is more important than `{story,comment}_text`, `{story,comment}_text` more than `url`, `url` more than `author`
    # btw, do not take into account position in most fields to avoid first word match boost
    searchableAttributes ['unordered(title)', 'unordered(story_text)', 'unordered(comment_text)', 'unordered(url)', 'author']

    # tags used for filtering
    tags do
      [item_type, "author_#{author}", "story_#{story_id}"]
    end

    # use associated number of HN points to sort results (last sort criteria)
    customRanking ['desc(points)', 'desc(num_comments)']

    # google+, $1.5M raises, C#: we love you
    separatorsToIndex '+#$'
  end

  def story_text
    item_type_cd != Item.comment ? text : nil
  end

  def story_title
    comment? && story ? story.title : nil
  end

  def story_url
    comment? && story ? story.url : nil
  end

  def comment_text
    comment? ? text : nil
  end

  def comment?
    item_type_cd == Item.comment
  end

  # [...]
end

Indices

Manual indexing

You can trigger indexing using the index! instance method.

c = Contact.create!(params[:contact])
c.index!

Manual removal

And trigger index removing using the remove_from_index! instance method.

c.remove_from_index!
c.destroy

Reindexing

The gem provides 2 ways to reindex all your objects:

Atomical reindexing

To reindex all your records (taking into account the deleted objects), the reindex class method indices all your objects to a temporary index called <INDEX_NAME>.tmp and moves the temporary index to the final one once everything is indexed (atomically). This is the safest way to reindex all your content.

Contact.reindex

Notes: if you're using an index-specific API key, ensure you're allowing both <INDEX_NAME> and <INDEX_NAME>.tmp.

Regular reindexing

To reindex all your objects in place (without temporary index and therefore without deleting removed objects), use the reindex! class method:

Contact.reindex!

Clearing an index

To clear an index, use the clear_index! class method:

Contact.clear_index!

Using the underlying index

You can access the underlying index object by calling the index class method:

index = Contact.index
# index.get_settings, index.partial_update_object, ...

Primary/replica

You can define replica indices using the add_replica method. Use inherit: true on the replica block if you want it to inherit from the primary settings.

class Book < ActiveRecord::Base
  attr_protected

  include AlgoliaSearch

  algoliasearch per_environment: true do
    searchableAttributes [:name, :author, :editor]

    # define a replica index to search by `author` only
    add_replica 'Book_by_author', per_environment: true do
      searchableAttributes [:author]
    end

    # define a replica index with custom ordering but same settings than the main block
    add_replica 'Book_custom_order', inherit: true, per_environment: true do
      customRanking ['asc(rank)']
    end
  end

end

To search using a replica, use the following code:

Book.raw_search 'foo bar', replica: 'Book_by_editor'
# or
Book.search 'foo bar', replica: 'Book_by_editor'

Share a single index

It can make sense to share an index between several models. In order to implement that, you'll need to ensure you don't have any conflict with the objectID of the underlying models.

class Student < ActiveRecord::Base
  attr_protected

  include AlgoliaSearch

  algoliasearch index_name: 'people', id: :algolia_id do
    # [...]
  end

  private
  def algolia_id
    "student_#{id}" # ensure the teacher & student IDs are not conflicting
  end
end

class Teacher < ActiveRecord::Base
  attr_protected

  include AlgoliaSearch

  algoliasearch index_name: 'people', id: :algolia_id do
    # [...]
  end

  private
  def algolia_id
    "teacher_#{id}" # ensure the teacher & student IDs are not conflicting
  end
end

Notes: If you target a single index from several models, you must never use MyModel.reindex and only use MyModel.reindex!. The reindex method uses a temporary index to perform an atomic reindexing: if you use it, the resulting index will only contain records for the current model because it will not reindex the others.

Target multiple indices

You can index a record in several indices using the add_index method:

class Book < ActiveRecord::Base
  attr_protected

  include AlgoliaSearch

  PUBLIC_INDEX_NAME  = "Book_#{Rails.env}"
  SECURED_INDEX_NAME = "SecuredBook_#{Rails.env}"

  # store all books in index 'SECURED_INDEX_NAME'
  algoliasearch index_name: SECURED_INDEX_NAME do
    searchableAttributes [:name, :author]
    # convert security to tags
    tags do
      [released ? 'public' : 'private', premium ? 'premium' : 'standard']
    end

    # store all 'public' (released and not premium) books in index 'PUBLIC_INDEX_NAME'
    add_index PUBLIC_INDEX_NAME, if: :public? do
      searchableAttributes [:name, :author]
    end
  end

  private
  def public?
    released && !premium
  end

end

To search using an extra index, use the following code:

Book.raw_search 'foo bar', index: 'Book_by_editor'
# or
Book.search 'foo bar', index: 'Book_by_editor'

Testing

Notes

To run the specs, please set the ALGOLIA_APPLICATION_ID and ALGOLIA_API_KEY environment variables. Since the tests are creating and removing indexes, DO NOT use your production account.

You may want to disable all indexing (add, update & delete operations) API calls, you can set the disable_indexing option:

class User < ActiveRecord::Base
  include AlgoliaSearch

  algoliasearch :per_environment => true, :disable_indexing => Rails.env.test? do
  end
end

class User < ActiveRecord::Base
  include AlgoliaSearch

  algoliasearch :per_environment => true, :disable_indexing => Proc.new { Rails.env.test? || more_complex_condition } do
  end
end

Or you may want to mock Algolia's API calls. We provide a WebMock sample configuration that you can use including algolia/webmock:

require 'algolia/webmock'

describe 'With a mocked client' do

  before(:each) do
    WebMock.enable!
  end

  it "shouldn't perform any API calls here" do
    User.create(name: 'My Indexed User')  # mocked, no API call performed
    User.search('').should == {}          # mocked, no API call performed
  end

  after(:each) do
    WebMock.disable!
  end

end