Fork of frabcus/acts_as_xapian
Description: Xapian full text search plugin for Ruby on Rails
Homepage: http://github.com/frabcus/acts_as_xapian/wikis
Clone URL: git://github.com/boone/acts_as_xapian.git
README.txt
Contents
========

* a. Introduction to acts_as_xapian 
* b. Installation
* c. Comparison to acts_as_solr (as on 24 April 2008)
* d. Documentation - indexing
* e. Documentation - querying


a. Introduction to acts_as_xapian
=================================

"Xapian":http://www.xapian.org is a full text search engine library, which has
Ruby bindings. acts_as_xapian adds support for it to Rails. It is an
alternative to acts_as_solr or acts_as_ferret.

Xapian is an *offline indexing* search library - only one process can have the
Xapian database open for writing at once, and others that try meanwhile are
unceremoniously kicked out. For this reason, acts_as_xapian does not support
immediate writing to the database when your models change.

Instead, there is an ActsAsXapianJob model which records which models need
updating or deleting in the search index. A rake task 'xapian:update_index'
then applies the changes queued since it last ran. Run it from a cron job, or
similar.

Xapian 1.0.5 and associated Ruby bindings are required. In Debian or Ubuntu
install the packages libxapian15 and libxapian-ruby1.8.

Email francis@mysociety.org with patches.


b. Installation
===============

Retrieve the plugin directly from the git version control system by running
this command within your Rails app.

    git clone git://github.com/frabcus/acts_as_xapian.git vendor/plugins/acts_as_xapian


c. Comparison to acts_as_solr (as on 24 April 2008)
===================================================

* Offline indexing only mode - which is a minus if you want changes
immediately reflected in the search index, and a plus if you were going to
have to implement your own offline indexing anyway.

* Collapsing - the equivalent of SQL's "group by". You can specify a field
to collapse on, and only the most relevant result from each value of that
field is returned, along with a count of how many there are in total.
acts_as_solr doesn't have this.

* No highlighting - Xapian can't return you text highlighted with a search
query. You can try and make do with TextHelper::highlight (combined with
words_to_highlight below). I found the highlighting in acts_as_solr didn't
really understand the query anyway.

* Date range searching - maybe this works in acts_as_solr, but I never found
out how.

* Spelling correction - "did you mean?" built in and just works.

* Multiple models - acts_as_xapian searches multiple models if you like,
returning them mixed up together by relevancy. This is like multi_solr_search,
only it is the default mode of operation and is properly supported.

* No daemons - However, if you have more than one web server, you'll need to
work out how to use "Xapian's remote backend":http://xapian.org/docs/remote.html.

* One layer - full-powered Xapian is called directly from the Ruby, without
Solr getting in the way whenever you want to use a new feature from Lucene.

* No Java - an advantage if you're more used to working in the rest of the
open source world. acts_as_xapian is pure Ruby and C++.

* Xapian's awesome email list - the kids over at 
"xapian-discuss":http://lists.xapian.org/mailman/listinfo/xapian-discuss
are super helpful. Useful if you need to extend and improve acts_as_xapian. The
Ruby bindings are mature and well maintained as part of Xapian.


d. Documentation - indexing
===========================

1. Put acts_as_xapian in your models that need search indexing. e.g.

    acts_as_xapian :texts => [ :name, :short_name ],
       :values => [ [ :created_at, 0, "created_at", :date ] ],
       :terms => [ [ :variety, 'V', "variety" ] ]

Options must include:

* :texts, an array of fields for indexing with full text search. 
e.g. :texts => [ :title, :body ]

* :values, things which have a range of values for sorting, or for collapsing. 
Specify an array quadruple of [ field, identifier, prefix, type ] where 
** identifier is an arbitrary numeric identifier for use in the Xapian database
** prefix is the part to use in search queries that goes before the :
** type can be any of :string, :number or :date

e.g. :values => [ [ :created_at, 0, "created_at", :date ],
[ :size, 1, "size", :string ] ]

* :terms, things which come after a : in search queries. Specify an array
triple of [ field, char, prefix ] where 
** char is an arbitrary single upper case char used in the Xapian database
** prefix is the part to use in search queries that goes before the :

e.g. :terms => [ [ :variety, 'V', "variety" ] ]
        
A 'field' is a symbol referring to either an attribute or a function which
returns the text, date or number to index. Both 'identifier' and 'char' must be
the same for the same prefix in different models.

Alternatively, 
* :instead_index, a field which refers to another model that should be reindexed
         instead of this one.

Options may include:
* :eager_load, added as an :include clause when looking up search results in
the database
* :if, either an attribute or a function which, if it returns false, means the
object isn't indexed
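
As a rough sketch of the rule above that 'identifier' and 'char' must be the
same for the same prefix in different models, the plain Ruby below compares
the option hashes you would pass to acts_as_xapian. The helper name
conflicting_prefixes and the two example option hashes are made up for
illustration; nothing like this is part of the plugin itself.

```ruby
# Hypothetical helper, not part of acts_as_xapian: given the option hashes
# used in several models, list the prefixes that have been given more than
# one identifier (for :values) or more than one char (for :terms).
def conflicting_prefixes(*option_sets)
  seen = Hash.new { |hash, key| hash[key] = [] }
  option_sets.each do |options|
    (options[:values] || []).each do |_field, identifier, prefix, _type|
      seen[[:value, prefix]] << identifier
    end
    (options[:terms] || []).each do |_field, char, prefix|
      seen[[:term, prefix]] << char
    end
  end
  clashes = seen.select { |_key, used| used.uniq.size > 1 }
  clashes.keys.map { |_kind, prefix| prefix }.uniq.sort
end

# Illustrative options for two imaginary models.
body_options = {
  :texts  => [ :name, :short_name ],
  :values => [ [ :created_at, 0, "created_at", :date ] ],
  :terms  => [ [ :variety, 'V', "variety" ] ]
}
user_options = {
  :texts  => [ :name ],
  :values => [ [ :created_at, 1, "created_at", :date ] ], # clashes with 0 above
  :terms  => [ [ :variety, 'V', "variety" ] ]
}

puts conflicting_prefixes(body_options, user_options).inspect
```

Here "created_at" is reported because the two models give it different
identifiers, while "variety" uses 'V' in both and is fine.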

2. Generate a database migration to create the ActsAsXapianJob model:

    script/generate acts_as_xapian
    rake db:migrate

3. Call 'rake xapian:rebuild_index models="ModelName1 ModelName2"' to build the index
the first time (you must specify all your indexed models). The index is put in a
development/test/production dir in acts_as_xapian/xapiandbs.

4. Then call 'rake xapian:update_index' regularly: from a cron job, from a
daemon, or by hand.


e. Documentation - querying
===========================

If you just want to test that indexing is working, you'll find this rake task
useful (it has more options, see tasks/xapian.rake):

    rake xapian:query models="PublicBody User" query="moo"

To perform a query from code, call ActsAsXapian::Search.new. This takes in turn:
* model_classes - list of models to search, e.g. [PublicBody, InfoRequestEvent]
* query_string - Google-like syntax, see below

And then a hash of options:
* :offset - Offset of first result
* :limit - Number of results per page
* :sort_by_prefix - Optionally, prefix of value to sort by, otherwise sort by relevance
* :sort_by_ascending - Default true, set to false for descending sort
* :collapse_by_prefix - Optionally, prefix of value to collapse by (i.e. only return most relevant result from group)
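
For paging, :offset and :limit can be worked out from a page number. This is
only a sketch: the helper name xapian_page_options and the default of 25
results per page are assumptions, not something the plugin provides.

```ruby
# Hypothetical helper: turn a 1-based page number into the :offset and
# :limit options described above. Pages below 1 are treated as page 1.
def xapian_page_options(page, per_page = 25)
  page = [page.to_i, 1].max
  { :offset => (page - 1) * per_page, :limit => per_page }
end

options = xapian_page_options(3, 10)
puts options.inspect

# With the plugin loaded in a Rails app, the hash would be passed as the
# third argument, e.g.:
#   ActsAsXapian::Search.new([PublicBody], "moo", options)
```

Page 3 with 10 results per page gives an :offset of 20 and a :limit of 10.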

Google-like query syntax is as described in 
"Xapian::QueryParser Syntax":http://www.xapian.org/docs/queryparser.html.
Queries can include prefix:value parts, according to what you indexed in the
acts_as_xapian part above. You can also say things like model:InfoRequestEvent 
to constrain by model in more complex ways than the model_classes parameter
allows, or modelid:InfoRequestEvent-100 to only find one specific object.
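
A query string with prefix:value parts can be put together with ordinary
string handling. The sketch below assumes a made-up helper called
xapian_query; it does no quoting or escaping, so values containing spaces are
not handled.

```ruby
# Hypothetical helper: join free-text words and prefix:value filters into
# one query string. Filters are sorted by prefix so the output is stable.
def xapian_query(words, filters = {})
  parts = [words.to_s.strip]
  filters.sort_by { |prefix, _value| prefix.to_s }.each do |prefix, value|
    parts << "#{prefix}:#{value}"
  end
  parts.reject { |part| part.empty? }.join(" ")
end

puts xapian_query("moo", :variety => "cheese")
puts xapian_query("", :modelid => "InfoRequestEvent-100")
```

The resulting string is what you would pass as query_string to
ActsAsXapian::Search.new.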

Returns an ActsAsXapian::Search object. Useful methods are:
* description - a techy one, to check how the query has been parsed
* matches_estimated - a guesstimate at the total number of hits
* spelling_correction - the corrected query string if there is a correction, otherwise nil
* words_to_highlight - list of words for you to highlight, perhaps with TextHelper::highlight
* results - an array of hashes each containing:
** :model - your Rails model, this is what you most want!
** :weight - relevancy measure
** :percent - the weight as a %, 0 meaning the item did not match the query at all
** :collapse_count - number of results with the same prefix, if you specified collapse_by_prefix


For more details about anything, see the source code in lib/acts_as_xapian.rb.
If documentation is missing or wrong, please do patch this file: it's called
README.txt and is in git, using Textile formatting. The wiki page is just
copied from the README.txt file.