jkraemer edited this page Sep 13, 2010 · 2 revisions

Using the integrated DRb Server

What’s this good for?

In production environments, multiple processes are usually responsible for serving client requests. Sometimes these processes are even spread across several physical machines.

Just like the database, the Ferret index of an application is a unique resource that has to be shared among all servers. To achieve this, acts_as_ferret comes with a built-in DRb server that acts as the central hub for all indexing and searching in your application.
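To make the "central hub" idea concrete, here is a minimal sketch of a DRb round trip using only Ruby's standard library. TinyIndexHub and demo_round_trip are hypothetical names invented for this illustration; this is not acts_as_ferret's server code, and the in-memory hash merely stands in for a Ferret index.

```ruby
require 'drb/drb'

# Hypothetical stand-in for the shared index; NOT acts_as_ferret's server class.
class TinyIndexHub
  def initialize
    @docs = {}
  end

  # Store a document under an id (stands in for an index update).
  def add(id, text)
    @docs[id] = text
    true
  end

  # Return the ids of documents containing the term (stands in for a search).
  def search(term)
    @docs.select { |_id, text| text.include?(term) }.keys.sort
  end
end

# Client and server share one process here for brevity; in production the
# hub would live in its own process and every app server would connect to it.
def demo_round_trip
  # Port 0 asks the OS for any free port; a real setup would use a fixed one.
  DRb.start_service('druby://localhost:0', TinyIndexHub.new)
  hub = DRbObject.new_with_uri(DRb.uri)
  hub.add(1, 'ferret over drb')
  hub.add(2, 'local index only')
  hub.search('drb')
ensure
  DRb.stop_service
end

p demo_round_trip
```

The point of the sketch is that all callers go through one object that owns the data, which is the role the DRb server plays for the Ferret index.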

Prerequisites

You’ll need recent versions of Ferret (>= 0.10) and acts_as_ferret (>= 0.4) for this to work.
Your application should be running fine using acts_as_ferret in development/test environments.

Switch your app to remote indexing for production mode

ferret_server.yml

If you installed aaf (acts_as_ferret) using script/plugin install, a configuration file stub has already been created in config/ferret_server.yml.
In that file you can define the DRb server hostname/ip address and port for each Rails environment (similar to database.yml).
Usually you’ll only want this for production mode:

production:
  host: ferret.yourdomain.com
  port: 9009
  pid_file: log/ferret.pid

The pid file path is relative to RAILS_ROOT and is used by the start/stop scripts.
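As a rough illustration of how such a file maps to a server address, the sketch below parses the same YAML and builds a DRb URI for one environment. The helper name drb_uri_for is hypothetical, and whether aaf assembles its URI exactly this way is an assumption; only the druby:// scheme itself is standard DRb.

```ruby
require 'yaml'

# Same shape as the ferret_server.yml example above.
EXAMPLE_CONFIG = <<~YAML
  production:
    host: ferret.yourdomain.com
    port: 9009
    pid_file: log/ferret.pid
YAML

# Hypothetical helper: returns the DRb URI for an environment,
# or nil when that environment has no server configured.
def drb_uri_for(config_text, environment)
  settings = YAML.safe_load(config_text)[environment]
  return nil unless settings
  "druby://#{settings['host']}:#{settings['port']}"
end

puts drb_uri_for(EXAMPLE_CONFIG, 'production')
puts drb_uri_for(EXAMPLE_CONFIG, 'development').inspect
```

An environment without an entry simply yields nil here, which mirrors the idea that only configured environments use the remote server.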

Model Classes

Add the :remote => true option to your calls to acts_as_ferret in your model classes.
This will let aaf connect to the DRb server for indexing and searching, but only if there is a server configured in ferret_server.yml for the current environment. So your tests and development environment will happily run against the local index, while on your production system the DRb server gets used.

class MyModel < ActiveRecord::Base
  acts_as_ferret( { :fields => [ :title, :content ], :remote => true },
                  { :analyzer => MyCustomAnalyzer.new } )
end

Never mind the other options in the example above; I just wanted to make clear that the :remote option belongs in the first options hash.
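The rule described above (remote only when :remote is set and a server is configured for the current environment) can be sketched as a small predicate. use_remote_index? is a hypothetical helper written for this page; acts_as_ferret's own check may well look different.

```ruby
# Hypothetical helper illustrating the rule; acts_as_ferret's own check may differ.
def use_remote_index?(aaf_options, server_config, environment)
  aaf_options[:remote] == true && server_config.key?(environment)
end

options = { :fields => [:title, :content], :remote => true }
server_config = {
  'production' => { 'host' => 'ferret.yourdomain.com', 'port' => 9009 }
}

# Only production has a server entry, so only production goes remote.
%w[development test production].each do |env|
  puts "#{env}: remote=#{use_remote_index?(options, server_config, env)}"
end
```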

On your server

Start the Ferret server process with

script/ferret_server -e production start

In case you are lucky enough to have multiple application servers, you’d only start the Ferret server process on one of them, of course.

You might want to run the ferret server as your deploy user (sudo -u deploy) where deploy is the user you’re using in production. Running the ferret server as root or as another user will change the permissions on the index files.

To stop the server, run

script/ferret_server -e production stop

The start and stop commands above only seem to work under bash. They fail under sh or tcsh.

If you’re using monit, you can set the user with the uid flag: “start program …. as uid deploy and gid deploy_group”. I couldn’t work out how to pass RAILS_ENV=production to the ferret start/stop script except by writing a trivial shell script (monit sanitizes the environment variables). Perhaps someone else can figure out how to do this…
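One possible shape for such a wrapper, written in Ruby rather than shell, is sketched below. The helper name ferret_server_invocation and the APP_ROOT placeholder are assumptions made for illustration; only the script/ferret_server command line is taken from this page. The sketch sets RAILS_ENV explicitly so that nothing depends on the environment monit passes along.

```ruby
# Placeholder path; replace with your application's root directory.
APP_ROOT = '/path/to/ror_root'

# Hypothetical helper: builds the environment hash and argument list
# for starting or stopping the ferret server.
def ferret_server_invocation(action, rails_env = 'production')
  unless %w[start stop].include?(action)
    raise ArgumentError, "action must be start or stop, got #{action.inspect}"
  end
  env = { 'RAILS_ENV' => rails_env }
  command = ['script/ferret_server', '-e', rails_env, action]
  [env, command]
end

# To actually launch the server from a wrapper script (not executed here):
#   env, command = ferret_server_invocation(ARGV.fetch(0))
#   Dir.chdir(APP_ROOT) { exec(env, *command) }
```

monit would then call the wrapper with start or stop as its only argument. Whether this behaves well under your monit version is untested here.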

Note for Rails 1.1.6

It seems that older versions of Rails don’t support the way the server is launched. If you get errors when running the scripts, try to start them like this:

RAILS_ENV=production script/runner "load 'script/ferret_start'"
RAILS_ENV=production script/runner "load 'script/ferret_stop'"

Note for Rails 1.0.0

Rails 1.0.0 will run the start script fine, but the stop script has to be run in the fashion illustrated above.

Check that it works

So how do you see that it works? The server writes every arriving method call to RAILS_ROOT/log/ferret_server.log, so there you should see all the saves and searches fly by if your app is in use.

Performance Concerns

Have you done any performance benchmarking to compare the search speed against the default setup? I have had poor production performance in other applications where DRb was a bottleneck.

I just tested this. It seems pretty stable. I hammered on it with 5 mongrels using ab on a UP machine. However, I too have performance concerns. I’m not sure whether the cause is DRb or the fact that searches are blocking (which I assume they are). I think eventually I’ll want a ferret cluster that can handle multiple searches in parallel, possibly residing on separate machines.

This post states a performance of 5 updates and 20-30 searches per second, which imho is not too bad. What are your numbers?

[It would be interesting to see a benchmark using UNIX Socket DRb.]

Regarding the blocking of searches: they should not block unless an update is going on. This case could be optimized by having a copy of the index in RAM; however, updating that copy after index updates would take its time, too. I could also imagine using something other than DRb for communicating with the server. But since DRb is so dead easy to work with, every other method would be more effort to implement and would therefore have to bring a real speed gain ;-) Suggestions, anybody?

Benchmarks

Caleb Jones did a comparison between acts_as_ferret’s DRb server and acts_as_solr, which connects via HTTP to the Solr search server.

The comparison covers multiple test scenarios:

  • random search with no background updates
  • cached search with no background updates
  • random search with continuous background updates
  • cached search with continuous background updates

Results were very close, with acts_as_ferret being slightly faster in three of the four test scenarios.

Howto

Reindexing

AAF 0.4.1

If you change a model that is being indexed, you’ll want to reindex it. If re-indexing fails with errors, one thing you can do is stop the ferret server, disable the creation/updating of the ferret index in the instance_methods.rb file in vendor/plugins/acts_as_ferret/lib …(just comment out one line), and then run “Model.rebuild_index” from script/console production.

AAF stable

Call Model.rebuild_index from a console on your server as described above. AAF will now build a completely new index version, leaving the original index untouched. Searches and updates will continue to work as normal on the old index while the rebuild runs. After the rebuild finishes, the DRb server will switch to the new index automatically.

Please note that changes made to the old index while the rebuild runs may or may not be reflected in the newly built index, so you’ll have to make sure for yourself that everything is correctly indexed (e.g. by recording the start time of the rebuild and, after the rebuild, reindexing all records that have been changed since that time).
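The catch-up procedure suggested above can be sketched as a small orchestration function. Everything here is hypothetical: rebuild_and_catch_up and its three callables are names made up for this illustration, and the caller decides what each one does.

```ruby
# Hypothetical orchestration of "rebuild, then reindex whatever changed".
#   rebuild:       callable that runs the full rebuild
#   changed_since: callable taking a Time, returning records changed since then
#   reindex:       callable that reindexes a single record
#   clock:         callable returning the current time (injectable for testing)
def rebuild_and_catch_up(rebuild:, changed_since:, reindex:, clock: -> { Time.now })
  started_at = clock.call                 # remember when the rebuild began
  rebuild.call                            # build the new index version
  stale = changed_since.call(started_at)  # records touched during the rebuild
  stale.each { |record| reindex.call(record) }
  stale.size                              # number of records reindexed afterwards
end
```

In an application, rebuild could wrap Model.rebuild_index, and changed_since could query a timestamp column such as updated_at, assuming your model has one. The per-record reindex call depends on your aaf version and is deliberately left to the caller.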

Use monit to monitor the DRb server

You can use monit to ensure your DRb server stays up and running.
A suitable monit configuration example (trunk/plugin/acts_as_ferret/doc/monit-example) is shipped with acts_as_ferret.

How to launch DRb server on reboot (Linux)

Many people have had a difficult time getting their DRb server to launch at reboot on newer Linux distributions. This is caused by a PATH issue that arises when Ruby is installed in /usr/local/bin and the Linux distribution uses SELinux. Here’s a fix (and a startup script):

#!/bin/bash
#
# This script starts and stops the ferret DRb server
# chkconfig: 2345 89 36
# description: Ferret search engine for ruby apps.
#
# save the current directory
CURDIR=$(pwd)
PATH=/usr/local/bin:$PATH
RORPATH="/path/to/ror_root"
case "$1" in
  start)
     cd $RORPATH
     echo "Starting ferret DRb server."
     FERRET_USE_LOCAL_INDEX=1 \
                script/runner -e production \
                vendor/plugins/acts_as_ferret/script/ferret_start
     ;;
  stop)
     cd $RORPATH
     echo "Stopping ferret DRb server."
     FERRET_USE_LOCAL_INDEX=1 \
                script/runner -e production \
                vendor/plugins/acts_as_ferret/script/ferret_stop
     ;;
  *)
     echo $"Usage: $0 {start, stop}"
     exit 1
     ;;
esac
cd $CURDIR

Running the DRb server as a Windows service

http://www.pluitsolutions.com/2007/07/30/acts-as-ferret-drbserver-win32-service/