The official page for acts_as_xapian is now the Google Groups page.

http://groups.google.com/group/acts_as_xapian

frabcus's github repository is no longer the official repository; find the
official one from the Google Groups page.

------------------------------------------------------------------------

Do patch this file if documentation is missing or wrong. It's called
README.txt, it is in git, and it uses Textile formatting. The wiki page is
just copied from the README.txt file.

Contents
========

* a. Introduction to acts_as_xapian
* b. Installation
* c. Comparison to acts_as_solr (as on 24 April 2008)
* d. Documentation - indexing
* e. Documentation - querying
* f. Configuration
* g. Performance
* h. Support


a. Introduction to acts_as_xapian
=================================

"Xapian":http://www.xapian.org is a full text search engine library which has
Ruby bindings. acts_as_xapian adds support for it to Rails. It is an
alternative to acts_as_solr, acts_as_ferret, Ultrasphinx, acts_as_indexed,
acts_as_searchable or acts_as_tsearch.

acts_as_xapian is deployed in production on these websites:
* "WhatDoTheyKnow":http://www.whatdotheyknow.com
* "MindBites":http://www.mindbites.com

The section "c. Comparison to acts_as_solr" below will give you an idea of
acts_as_xapian's features.

acts_as_xapian was started by Francis Irving in May 2008 for search and email
alerts in WhatDoTheyKnow, and so was supported by "mySociety":http://www.mysociety.org
and initially paid for by the "JRSST Charitable Trust":http://www.jrrt.org.uk/jrsstct.htm.


b. Installation
===============

Retrieve the plugin directly from the git version control system by running
this command in the root directory of your Rails app:

    git clone git://github.com/frabcus/acts_as_xapian.git vendor/plugins/acts_as_xapian

Xapian 1.0.5 and its Ruby bindings are also required.

Debian or Ubuntu - install the packages libxapian15 and libxapian-ruby1.8.

Mac OS X - follow the instructions for installing from source on
the "Installing Xapian":http://xapian.org/docs/install.html page. You need the
Xapian library and bindings (you don't need Omega).

There is no Ruby gem for Xapian; it would be great if you could make one!
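
A quick way to confirm that the bindings are visible to Ruby is simply to try loading them. This is a minimal sketch, assuming the bindings load with require 'xapian' (their usual name); the helper xapian_bindings_available? is hypothetical and not part of acts_as_xapian.

```ruby
# Hypothetical helper (not part of acts_as_xapian): returns true if the
# Xapian Ruby bindings can be loaded by this Ruby interpreter.
def xapian_bindings_available?
  require 'xapian'
  true
rescue LoadError
  false
end

if xapian_bindings_available?
  # version_string reports the version of the Xapian library in use.
  puts "Xapian #{Xapian::version_string} bindings found"
else
  puts "Xapian Ruby bindings not found"
end
```

Run it with the same Ruby that runs your Rails app, since the bindings are installed per interpreter.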


c. Comparison to acts_as_solr (as on 24 April 2008)
===================================================

* Offline indexing only mode - which is a minus if you want changes
immediately reflected in the search index, and a plus if you were going to
have to implement your own offline indexing anyway.

* Collapsing - the equivalent of SQL's "group by". You can specify a field
to collapse on, and only the most relevant result from each value of that
field is returned, along with a count of how many there are in total.
acts_as_solr doesn't have this.

* No highlighting - Xapian can't return you text highlighted with a search
query. You can try to make do with TextHelper::highlight (combined with
words_to_highlight below). I found the highlighting in acts_as_solr didn't
really understand the query anyway.

* Date range searching - this exists in acts_as_solr, but I found it
wasn't documented well enough, and was hard to get working.

* Spelling correction - "did you mean?" built in and just works.

* Similar documents - acts_as_xapian has a simple command to find other models
that are like a specified model.

* Multiple models - acts_as_xapian searches multiple types of model if you
like, returning them mixed up together by relevancy. This is like
multi_solr_search, only it is the default mode of operation and is properly
supported.

* No daemons - however, if you have more than one web server, you'll need to
work out how to use "Xapian's remote backend":http://xapian.org/docs/remote.html.

* One layer - full-powered Xapian is called directly from Ruby, without
Solr getting in the way whenever you want to use a new feature from Lucene.

* No Java - an advantage if you're more used to working in the rest of the
open source world. acts_as_xapian is pure Ruby and C++.

* Xapian's awesome email list - the kids over at
"xapian-discuss":http://lists.xapian.org/mailman/listinfo/xapian-discuss
are super helpful. Useful if you need to extend and improve acts_as_xapian. The
Ruby bindings are mature and well maintained as part of Xapian.
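
To make the collapsing idea concrete, here is a plain-Ruby sketch of what it means for a list of hits. Xapian does the real collapsing itself during the search when you use :collapse_by_prefix; this sketch only illustrates the semantics. The hit hashes, including the :request field used as the collapse key, are made up for illustration.

```ruby
# For each distinct value of `key`, keep only the most relevant hit and
# record how many hits shared that value. Groups are ordered by weight.
def collapse(results, key)
  results.group_by { |r| r[key] }.map do |_value, group|
    best = group.max_by { |h| h[:weight] }
    best.merge(:collapse_count => group.size)
  end.sort_by { |r| -r[:weight] }
end

# Made-up hits: two share the same :request value, so they collapse.
hits = [
  { :id => 1, :request => "a", :weight => 0.9 },
  { :id => 2, :request => "a", :weight => 0.4 },
  { :id => 3, :request => "b", :weight => 0.7 },
]

collapse(hits, :request).each do |r|
  puts "#{r[:id]} (#{r[:collapse_count]} in group)"
end
```

Hit 2 disappears from the output because hit 1 is more relevant and has the same :request value.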


d. Documentation - indexing
===========================

Xapian is an *offline indexing* search library - only one process can have the
Xapian database open for writing at once, and others that try meanwhile are
unceremoniously kicked out. For this reason, acts_as_xapian does not support
immediate writing to the database when your models change.

Instead, there is an ActsAsXapianJob model which stores which models need
updating or deleting in the search index. A rake task 'xapian:update_index'
then performs the updates queued since it was last run. You can run it from a
cron job, or similar.
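
The in-memory sketch below illustrates the offline-indexing pattern just described: saving or destroying a model only records a job, and a single writer later applies every queued job. It is an illustration only. The Job struct, the IndexQueue class and the Hash standing in for the Xapian database are all hypothetical, and the real ActsAsXapianJob model and rake task may differ (for example, in how repeated jobs for one object are handled).

```ruby
# Hypothetical in-memory stand-in for the ActsAsXapianJob table.
Job = Struct.new(:model, :model_id, :action)

class IndexQueue
  def initialize
    @jobs = []
  end

  # Called from a model's save/destroy hooks: only records what changed.
  # In this sketch a newer job for the same object replaces an older one.
  def enqueue(model, model_id, action)
    @jobs.reject! { |j| j.model == model && j.model_id == model_id }
    @jobs << Job.new(model, model_id, action)
  end

  # Stand-in for 'rake xapian:update_index': the single writer applies every
  # queued job to the index (a Hash here) and empties the queue.
  def update_index(index)
    until @jobs.empty?
      job = @jobs.shift
      key = "#{job.model}-#{job.model_id}"
      if job.action == :destroy
        index.delete(key)
      else
        index[key] = :indexed
      end
    end
    index
  end
end

queue = IndexQueue.new
queue.enqueue("PublicBody", 1, :update)
queue.enqueue("PublicBody", 2, :update)
queue.enqueue("PublicBody", 1, :destroy)
index = queue.update_index({})
p index
```

Because only update_index touches the index, web processes never contend for the Xapian write lock.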

Here's how to add indexing to your Rails app:

1. Put acts_as_xapian in your models that need search indexing, e.g.

    acts_as_xapian :texts => [ :name, :short_name ],
                   :values => [ [ :created_at, 0, "created_at", :date ] ],
                   :terms => [ [ :variety, 'V', "variety" ] ]

Options must include:

* :texts, an array of fields for indexing with full text search.
e.g. :texts => [ :title, :body ]

* :values, things which have a range of values for sorting, or for collapsing.
Specify an array of quadruples [ field, identifier, prefix, type ] where
** identifier is an arbitrary numeric identifier for use in the Xapian database
** prefix is the part to use in search queries that goes before the :
** type can be any of :string, :number or :date

e.g. :values => [ [ :created_at, 0, "created_at", :date ],
                  [ :size, 1, "size", :string ] ]

* :terms, things which come with a prefix (before a :) in search queries.
Specify an array of triples [ field, char, prefix ] where
** char is an arbitrary single upper case char used in the Xapian database; just
pick any single uppercase character, but use a different one for each prefix.
** prefix is the part to use in search queries that goes before the :
For example, if you were making Google and indexing to be able to later do a
query like "site:www.whatdotheyknow.com", then the prefix would be "site".

e.g. :terms => [ [ :variety, 'V', "variety" ] ]

A 'field' is a symbol referring to either an attribute or a function which
returns the text, date or number to index. Both 'identifier' and 'char' must be
the same for the same prefix in different models.
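
Because the identifier and char must agree across models, it can be worth checking them when several models share a prefix. The helper below is hypothetical (inconsistent_prefixes is not part of acts_as_xapian): it takes each model's acts_as_xapian options as a plain Hash and reports any prefix given different values in different models. The "User" options are made up for illustration.

```ruby
# Hypothetical helper: options_by_model maps a model name to the options
# Hash it passes to acts_as_xapian. Returns human-readable problems where
# one prefix is given different identifiers (:values) or chars (:terms).
def inconsistent_prefixes(options_by_model)
  seen = {}      # [kind, prefix] => [identifier or char, first model using it]
  problems = []
  options_by_model.each do |model, opts|
    entries = (opts[:values] || []).map { |_field, id, prefix, _type| [:value, prefix, id] } +
              (opts[:terms] || []).map { |_field, char, prefix| [:term, prefix, char] }
    entries.each do |kind, prefix, code|
      key = [kind, prefix]
      if seen.key?(key) && seen[key][0] != code
        first_code, first_model = seen[key]
        problems << "#{prefix}: #{first_model} uses #{first_code.inspect}, #{model} uses #{code.inspect}"
      else
        seen[key] ||= [code, model]
      end
    end
  end
  problems
end

options = {
  "PublicBody" => { :values => [[:created_at, 0, "created_at", :date]],
                    :terms  => [[:variety, 'V', "variety"]] },
  "User"       => { :values => [[:created_at, 1, "created_at", :date]],
                    :terms  => [[:variety, 'V', "variety"]] },
}
puts inconsistent_prefixes(options)
```

Here the "created_at" prefix is flagged because the two models give it identifiers 0 and 1, while "variety" uses 'V' in both and passes.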

Options may include:
* :eager_load, added as an :include clause when looking up search results in
the database
* :if, either an attribute or a function; if it returns false, the object
isn't indexed

2. Generate a database migration to create the ActsAsXapianJob model:

    script/generate acts_as_xapian
    rake db:migrate

3. Call 'rake xapian:rebuild_index models="ModelName1 ModelName2"' to build the index
the first time (you must specify all your indexed models). The index is put in a
development, test or production directory under acts_as_xapian/xapiandbs. See
f. Configuration below if you want to change this.

4. Then regularly call 'rake xapian:update_index', from a cron job, a daemon,
or by hand.


e. Documentation - querying
===========================

Testing indexing
----------------

If you just want to test that indexing is working, you'll find this rake task
useful (it has more options, see tasks/xapian.rake):

    rake xapian:query models="PublicBody User" query="moo"

Performing a query
------------------

To perform a query from code, call ActsAsXapian::Search.new. This takes, in turn:
* model_classes - list of models to search, e.g. [PublicBody, InfoRequestEvent]
* query_string - Google-like syntax, see below

And then a hash of options:
* :offset - Offset of first result (default 0)
* :limit - Number of results per page
* :sort_by_prefix - Optionally, prefix of value to sort by, otherwise sort by relevance
* :sort_by_ascending - Default true (documents with higher values better/earlier), set to false for descending sort
* :collapse_by_prefix - Optionally, prefix of value to collapse by (i.e. only return most relevant result from group)
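
Paging is usually driven by a page number, so here is a sketch of turning one into the :offset and :limit options above. The helper search_page_options is hypothetical; only the option names and the (model_classes, query_string, options) argument order come from this document, and the Search.new call is shown as a comment because it needs a Rails app with acts_as_xapian installed.

```ruby
# Hypothetical helper: converts a 1-based page number into the :offset and
# :limit options described above, merging in any other options.
def search_page_options(page, per_page, extra = {})
  page = [page.to_i, 1].max  # treat nil, 0 or negative pages as page 1
  { :offset => (page - 1) * per_page, :limit => per_page }.merge(extra)
end

opts = search_page_options(3, 20, :sort_by_prefix => "created_at")
p opts
# In a Rails app with acts_as_xapian installed, opts would then be passed as
# the third argument:
#   ActsAsXapian::Search.new([PublicBody, InfoRequestEvent], "moo", opts)
```

Page 3 with 20 results per page starts at offset 40.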

Google-like query syntax is as described in
"Xapian::QueryParser Syntax":http://www.xapian.org/docs/queryparser.html.
Queries can include prefix:value parts, according to what you indexed in the
acts_as_xapian part above. You can also say things like model:InfoRequestEvent
to constrain by model in more complex ways than the model_classes parameter, or
modelid:InfoRequestEvent-100 to only find one specific object.
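
When filters come from a form, it can help to assemble the query string in one place. This is a hypothetical helper (build_query is not part of acts_as_xapian) that appends prefix:value parts to free-text words; "variety" is the prefix from the indexing example, and the value "authority" is made up.

```ruby
# Hypothetical helper: appends prefix:value filters to free-text words.
# Values are inserted verbatim, so this sketch does not handle values that
# contain spaces or other query syntax.
def build_query(words, filters = {})
  parts = [words.to_s.strip]
  filters.each { |prefix, value| parts << "#{prefix}:#{value}" }
  parts.reject(&:empty?).join(" ")
end

puts build_query("moo", "variety" => "authority", "model" => "PublicBody")
puts build_query("", "modelid" => "InfoRequestEvent-100")
```

How the resulting terms are combined is up to Xapian's query parser, as described in the syntax page linked above.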

Returns an ActsAsXapian::Search object. Useful methods are:
* description - a techy one, to check how the query has been parsed
* matches_estimated - a guesstimate at the total number of hits
* spelling_correction - the corrected query string if there is a correction, otherwise nil
* words_to_highlight - list of words for you to highlight, perhaps with TextHelper::highlight
* results - an array of hashes each containing:
** :model - your Rails model, this is what you most want!
** :weight - relevancy measure
** :percent - the weight as a %, 0 meaning the item did not match the query at all
** :collapse_count - number of results with the same value for the collapse prefix, if you specified collapse_by_prefix
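
Outside a view, where TextHelper::highlight is not to hand, a plain-Ruby stand-in is enough to show how results and words_to_highlight fit together. This is a sketch: highlight_words is hypothetical, and the results below are made up, shaped like the hashes listed above, with a Struct in place of an ActiveRecord model.

```ruby
# Hypothetical stand-in for TextHelper::highlight: wraps whole-word,
# case-insensitive matches of any of the words in <strong> tags.
def highlight_words(text, words)
  return text if words.empty?
  pattern = Regexp.union(words.map { |w| /\b#{Regexp.escape(w)}\b/i })
  text.gsub(pattern) { |match| "<strong>#{match}</strong>" }
end

# Made-up results shaped like the hashes listed above. In a real app they
# come from search.results, the words from search.words_to_highlight, and
# :model is an ActiveRecord object rather than this Struct.
FakeBody = Struct.new(:name)
results = [
  { :model => FakeBody.new("Cow Protection Agency"), :weight => 2.1, :percent => 88 },
  { :model => FakeBody.new("Department for Moo"),    :weight => 1.3, :percent => 54 },
]
words = ["moo", "cow"]

lines = results.map do |r|
  "#{r[:percent]}% #{highlight_words(r[:model].name, words)}"
end
puts lines
```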

Finding similar models
----------------------

To find models that are similar to a given set of models, call ActsAsXapian::Similar.new. This takes:
* model_classes - list of model classes to return models from within
* models - list of models that you want to find related ones to

Returns an ActsAsXapian::Similar object. It has all the methods from ActsAsXapian::Search above, except
for words_to_highlight. In addition it has:
* important_terms - the terms extracted from the input models, that were used to search for output
You need the results method to get the similar models.


f. Configuration
================

If you want to customise the configuration of acts_as_xapian, it will look for
a file called 'xapian.yml' under RAILS_ROOT/config. As is familiar from the
format of the database.yml file, separate :development, :test and :production
sections are expected.

The following options are available:
* base_db_path - specifies the directory, relative to RAILS_ROOT, in which
acts_as_xapian stores its search index databases. Default is the directory
xapiandbs within the acts_as_xapian directory.
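
The sketch below shows the shape of a per-environment xapian.yml and how a relative base_db_path resolves against RAILS_ROOT. It assumes plain environment names as YAML keys, as in database.yml; the example paths and the helper xapian_db_path are hypothetical, and this is not the plugin's own loading code.

```ruby
require 'yaml'

# Hypothetical xapian.yml contents: one section per environment (the plain
# key names are an assumption, mirroring database.yml).
XAPIAN_YML = <<~YAML
  development:
    base_db_path: db/xapiandbs
  production:
    base_db_path: ../shared/xapiandbs
YAML

# Hypothetical helper: returns the absolute index directory for an
# environment, or nil when that environment sets no base_db_path (meaning
# the default location applies).
def xapian_db_path(yaml_text, env, rails_root)
  config = YAML.safe_load(yaml_text) || {}
  relative = (config[env] || {})["base_db_path"]
  relative && File.expand_path(relative, rails_root)
end

puts xapian_db_path(XAPIAN_YML, "production", "/srv/app")
```

A relative path such as ../shared/xapiandbs can keep the index outside the deployed release directory, so it survives redeploys.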


g. Performance
==============

On development sites, acts_as_xapian automatically logs the time taken to do
searches. The time displayed is for the Xapian parts of the query; the Rails
database model lookups will be logged separately by ActiveRecord. Example:

    Xapian query (0.00029s) Search: hello

To enable this, and other performance logging, on a production site,
temporarily add this to the end of your config/environment.rb:

    ActiveRecord::Base.logger = Logger.new(STDOUT)
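
If you want comparable timing lines around your own code while investigating performance, a minimal sketch is to time the work with a monotonic clock and log it through a Logger. The helper log_timed_query is hypothetical and is not acts_as_xapian's own logging code; a StringIO stands in for STDOUT so the output can be inspected.

```ruby
require 'logger'
require 'stringio'

# Hypothetical helper: runs the block, logs how long it took in the same
# "Xapian query (...s) ..." shape as the example above, and returns the
# block's result.
def log_timed_query(logger, description)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  seconds = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  logger.info(format("Xapian query (%.5fs) %s", seconds, description))
  result
end

log_output = StringIO.new
logger = Logger.new(log_output)
matches = log_timed_query(logger, "Search: hello") { %w[hello help world].grep(/hel/) }
puts log_output.string
```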


h. Support
==========

Please ask any questions on the
"acts_as_xapian Google Group":http://groups.google.com/group/acts_as_xapian

The original home page and repository for acts_as_xapian were the
"acts_as_xapian github page":http://github.com/frabcus/acts_as_xapian/wikis,
but as noted at the top of this file, the official repository is now listed on
the Google Groups page.

For more details about anything, see the source code in lib/acts_as_xapian.rb.

For instructions on merging source changes, see "Using git for collaboration" here:
http://www.kernel.org/pub/software/scm/git/docs/gittutorial.html