public
Description: High level Ruby library for interacting with Xapian, a full text search engine.
Homepage:
Clone URL: git://github.com/ryanb/xapit.git
xapit /
name age message
file .gitignore Thu May 28 15:47:36 -0700 2009 releasing xapit as a gem (v0.1.0) [ryanb]
file CHANGELOG Tue Jul 07 16:16:46 -0700 2009 releasing 0.2.7 [ryanb]
file LICENSE Tue Mar 10 16:41:16 -0700 2009 adding readme and license [ryanb]
file README.rdoc Fri Sep 04 08:04:02 -0700 2009 updating xapian installation instructions in re... [ryanb]
file Rakefile Tue Jul 07 16:16:46 -0700 2009 releasing 0.2.7 [ryanb]
directory features/ Tue Jul 07 10:28:18 -0700 2009 adding or_search method to collection for perfo... [ryanb]
file init.rb Mon Jun 15 07:44:20 -0700 2009 loading Xapit::Membership into ActiveRecord dee... [ryanb]
file install.rb Fri Jun 12 15:23:55 -0700 2009 fixing install script [ryanb]
directory lib/ Tue Jul 07 16:15:03 -0700 2009 adding support for punctuation in wildcard matc... [ryanb]
directory rails_generators/ Fri Jun 12 15:27:51 -0700 2009 fixing name of xapit generator class [ryanb]
directory spec/ Tue Jul 07 16:15:03 -0700 2009 adding support for punctuation in wildcard matc... [ryanb]
directory tasks/ Fri Jun 12 11:56:08 -0700 2009 moving rake task into xapit/rake_tasks.rb so it... [ryanb]
file uninstall.rb Fri Mar 13 08:52:33 -0700 2009 only generating/removing setup_xapit.rb initial... [ryanb]
README.rdoc

Xapit

Xapit (pronounced "zap it") is a high level interface for working with a Xapian database.

Note: This project is early in development and the API is subject to change.

Install

If you haven’t already, first install Xapian and the Xapian Bindings for Ruby. wiki.github.com/ryanb/xapit/xapian-installation

To install as a Rails plugin, run this command.

  script/plugin install git://github.com/ryanb/xapit.git

Or to install as a gem in Rails first add this to config/environment.rb.

  config.gem 'xapit'

And then install the gem and run the generator.

  sudo rake gems:install
  script/generate xapit

Important: only run the generator script on a gem install, not for the plugin.

Setup

Simply call "xapit" in the model and pass a block to define the indexed attributes.

  class Article < ActiveRecord::Base
    xapit do |index|
      index.text :name, :content
      index.field :category_id
      index.facet :author_name, "Author"
      index.sortable :id, :category_id
    end
  end

First we index "name" and "content" attributes for full text searching. The "category_id" field is indexed for :conditions searching. The "author_name" is indexed as a facet with "Author" being the display name of the facet. See the facets section below for details. Finally the "id" and "category_id" attributes are indexed as sortable attributes so they can be included in the :order option in a search.

Because the indexing happens in Ruby these attributes do no have to be database columns. They can be simple Ruby methods. For example, the "author_name" attribute mentioned above can be defined like this.

  def author_name
    author.name
  end

This way you can create a completely custom facet by simply defining your own method. Multiple facet options or field values per record are supported if you return an array.

  def author_names
    authors.map(&:name) # => ["John", "Bob"]
  end

Finally, you can pass any find options to the xapit method to determine what gets indexed or improve performance with eager loading or a different batch size.

  xapit(:batch_size => 100, :include => :author, :conditions => { :visible => true })

You can specify a :weight option to give a text attribute more importance. This will cause search terms matching that attribute to have a higher rank. The default weight is 1. Decimal (0.5) weight values are not supported.

  index.text :name, :weight => 10

Index

To perform the indexing, run the xapit:index rake task.

  rake xapit:index

It can also be triggered through Ruby code using this command.

  Xapit.remove_database
  Xapit.index_all

You may want to trigger this via a cron job on a recurring schedule (i.e. every day) to update the Xapian database. However it will only take effect after the Rails application is restarted because the Xapian database is stored in memory.

There are two projects in development to help improve this reindexing.

Search

You can then perform a search on the model.

  # perform a simple full text search
  @articles = Article.search("phone")

  # add pagination if you're using will_paginate
  @articles = Article.search("phone", :per_page => 10, :page => params[:page])

  # search based on indexed fields
  @articles = Article.search("phone", :conditions => { :category_id => params[:category_id] })

  # search for multiple negative conditions (doesn't match 3, 5, or 8)
  @articles = Article.search(:not_conditions => { :category_id => [3, 5, 8] })

  # search for range of conditions by number
  @articles = Article.search(:conditions => { :released_at => 2.years.ago..Time.now })

  # manually sort based on any number of indexed fields, sort defaults to most relevant
  @articles = Article.search("phone", :order => [:category_id, :id], :descending => true)

  # basic boolean matching is supported
  @articles = Article.search("phone OR fax NOT email")

You can also search all indexed models through Xapit.search.

  # search all indexed models
  @records = Xapit.search("phone")

Results

Simply iterate through the returned set to display the results.

  <% for article in @articles %>
    <%= article.name %>
    <%= article.xapit_relevance %>
  <% end %>

The "xapit_relevance" holds a percentage (between 0 and 100) determining how relevant the given document is to the user’s search query.

Spelling

If the searched term isn’t found, but it is similar to another term then it will show up as a spelling suggestion.

  <% if @articles.spelling_suggestion %>
    Did you mean <%= link_to h(@articles.spelling_suggestion), :overwrite_params => { :keywords => @articles.spelling_suggestion } %>?
  <% end %>

Facets

Facets allow you to further filter the result set based on certain attributes.

  <% for facet in @articles.facets %>
    <%= facet.name %>
    <% for option in facet.options %>
      <%= link_to option.name, :overwrite_params => { :facets => option } %>
      (<%= option.count %>)
    <% end %>
  <% end %>

The to_param method is defined on option to return an identifier which will be passed through the URL. Use this in the search.

  Article.search("phone", :facets => params[:facets])

You can also list the applied facets along with a remove link.

  <% for option in @articles.applied_facet_options %>
    <%=h option.name %>
    <%= link_to "remove", :overwrite_params => { :facets => option } %>
  <% end %>

Config

When installing Xapit as a Rails plugin, an initializer file is automatically created to setup. It looks like this.

  Xapit.setup(:database_path => "#{Rails.root}/db/xapiandb")

There are many other options you can pass into here. This is a more advanced configuration setting which changes the stemming language, disables spelling, and changes the indexer and parser to a classic variation. The classic ones use Xapian’s built-in term generator and query parser instead of the ones offered by Xapit.

  Xapit.setup(
    :database_path => "#{Rails.root}/db/external/xapiandb",
    :spelling => false,
    :stemming => "german",
    :indexer => ClassicIndexer,
    :query_parser => ClassicQueryParser
  )

Adapters

Adapters are used to support multiple ORMs since not everyone uses ActiveRecord. The right adapter is detected automatically so you should not have to do anything for popular ORMs. However if your ORM is not supported then it is very easy to make your own adapter. See AbstractAdapter class for details.

Bug Reports

If you have found a bug to report or a feature to request, please add it to the GitHub issue tracker if it is not there already.

github.com/ryanb/xapit/issues

Development

This project can be found on github at the following URL.

github.com/ryanb/xapit

If you would like to contribute to this project, please fork the repository and send me a pull request.