Full text indexing for CouchDB. (I really mean it this time.)
There are a couple of dependencies that should hopefully be easy to overcome. I have things working on OS X 10.4, so Linux should be a breeze. Windows installation is left as an exercise for the reader.
- CouchDB - Obviously
- HyperEstraier - Full text indexing goodness
- Spidermonkey - Also a requirement of CouchDB. Might require Spidermonkey 1.7
- hypy - Python bindings to HyperEstraier
- python-spidermonkey - My version on github
- hypercouch - This project?
OpenBSD python modules:
$ pkg_add -iv py-simplejson
$ pkg_add -iv py-httplib2
$ pkg_add -iv py-nose
OS X python modules:
$ sudo port install py25-simplejson
$ sudo port install py25-httplib2
$ sudo port install py25-nose
Ubuntu:
$ sudo apt-get install python-simplejson
$ sudo apt-get install python-httplib2
$ sudo apt-get install python-nose
And couchdb-python:
$ svn checkout http://couchdb-python.googlecode.com/svn/trunk/ couchdb-python
$ cd couchdb-python
$ sudo python setup.py install
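Once the modules are installed, a quick way to confirm that Python can see them is to try importing each one. This is a minimal sketch; the import names (`simplejson`, `httplib2`, `nose`, and `couchdb`, the package installed by couchdb-python) are assumptions, and `missing_modules` is a hypothetical helper that is not part of any of these packages.

```python
import importlib

# Import names assumed for the packages installed above.
REQUIRED = ["simplejson", "httplib2", "nose", "couchdb"]


def missing_modules(names):
    """Return the subset of `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing modules: " + ", ".join(missing))
    else:
        print("All Python dependencies found.")
```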
For the moment I haven't performed this installation on Linux. Tomorrow I'll sit down and install on one of my servers, but for now I'll just go over the OS X procedure and hope the Linux equivalent isn't too far off.
I won't go over installing CouchDB because it's pretty well covered on the CouchDB wiki. If you need help with it, check there or hop onto the IRC channel (#couchdb on irc.freenode.net) and ask questions.
Make sure you have the python dev stuff installed for building extensions:
$ sudo apt-get install python2.5-dev
HyperEstraier should probably be in your package manager. On OS X it's merely:
$ sudo port install libestraier-dev libqdbm-dev
Ubuntu:
$ sudo apt-get install libestraier-dev
OpenBSD:
Install HyperEstraier from source:
$ pkg_add -iv qdbm
$ ftp http://hyperestraier.sourceforge.net/hyperestraier-1.4.13.tar.gz
$ tar xvzf hyperestraier-1.4.13.tar.gz
$ cd hyperestraier-1.4.13
$ ./configure && gmake && gmake install
Spidermonkey is similar. On OS X:
$ sudo port install spidermonkey
Ubuntu:
$ sudo apt-get install libmozjs-dev
OpenBSD:
$ pkg_add -iv spidermonkey
If either of your port commands fails due to activation conflicts, you can just deactivate and then rerun the install to get things right:
$ sudo port deactivate [package]
$ sudo port install [package]
Hopefully that was the hard part. The rest of the stuff is just about getting the python bindings installed.
Get the hypy source here: http://hypy-source.goonmill.org/archive/tip.tar.gz
For me hypy was easy to install, with the caveat that I had to make a minor tweak to setup.py to help it find HyperEstraier installed by ports. We just need to add /opt/local/include to the list of include directories:
From: (About line 10 in Hypy's setup.py)
ext = Extension("_estraiernative",
["estraiernative.c"],
libraries=["estraier"],
include_dirs=["/usr/include/estraier", "/usr/include/qdbm"],
)
To:
ext = Extension("_estraiernative",
["estraiernative.c"],
libraries=["estraier"],
library_dirs=["/usr/local/lib", "/usr/lib"],
include_dirs=["/usr/local/include", "/usr/include/estraier", "/usr/include/qdbm", "/opt/local/include/"],
)
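If you would rather not hardcode every candidate path, one option is to keep only the directories that actually exist on the build machine. This is a sketch, not part of hypy: `existing_dirs` and the two candidate lists are hypothetical names you would paste into setup.py above the `Extension` call.

```python
import os


def existing_dirs(candidates):
    """Return the candidate directories that exist, preserving their order."""
    return [d for d in candidates if os.path.isdir(d)]


# Candidate locations taken from the modified setup.py above.
INCLUDE_CANDIDATES = ["/usr/local/include", "/usr/include/estraier",
                      "/usr/include/qdbm", "/opt/local/include/"]
LIBRARY_CANDIDATES = ["/usr/local/lib", "/usr/lib"]

if __name__ == "__main__":
    print("include_dirs:", existing_dirs(INCLUDE_CANDIDATES))
    print("library_dirs:", existing_dirs(LIBRARY_CANDIDATES))
```

The results could then be passed to the extension as `include_dirs=existing_dirs(INCLUDE_CANDIDATES)` and `library_dirs=existing_dirs(LIBRARY_CANDIDATES)`.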
Hopefully that builds just dandy for you.
I had to download and patch python-spidermonkey to allow the execution of JavaScript functions from Python. I'm very unsure of the build stability, but hopefully it works without too much effort. For the time being you can either git clone it or download the tarball to install. At some point in the near future I'm going to give it a more thorough reworking to make it a real project.
Using git:
$ git clone git://github.com/davisp/python-spidermonkey.git
$ cd python-spidermonkey
No git:
$ wget http://github.com/davisp/python-spidermonkey/tarball/master
$ tar -xvzf davisp-python-spidermonkey-${HASH}.tar.gz
$ cd davisp-python-spidermonkey-${HASH}
Installing:
$ python setup.py build
$ sudo python setup.py install
Installing HyperCouch should be relatively straightforward. Just git clone or download the tarball and install.
Using git:
$ git clone git://github.com/davisp/hypercouch.git
$ cd hypercouch
No git:
$ wget http://github.com/davisp/hypercouch/tarball/master
$ tar -xvzf davisp-hypercouch-${HASH}.tar.gz
$ cd davisp-hypercouch-${HASH}
Installing:
$ sudo python setup.py install
You'll need to edit your local.ini (or alternatively your local_dev.ini if you're a fan of make dev like I am).
[external]
hyper = /path/to/python -m hypercouch
[httpd_db_handlers]
_fti = {couch_httpd_external, handle_external_req, <<"hyper">>}
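The [external] entry tells CouchDB to launch `python -m hypercouch` as a long-running process, and the `_fti` handler in [httpd_db_handlers] routes `/db_name/_fti` requests to it. As I understand CouchDB's external process protocol, each request arrives on the process's stdin as one line of JSON (including a `query` object holding the query-string parameters), and the process answers with one line of JSON containing keys such as `code` and `json`. The sketch below is a minimal echo server under those assumptions, not hypercouch's actual implementation; `handle_request` and `process_line` are hypothetical names.

```python
import json
import sys


def handle_request(req):
    """Hypothetical handler: echo the query-string parameters back as JSON."""
    return {"code": 200, "json": {"query": req.get("query", {})}}


def process_line(line):
    """Turn one JSON request line into one JSON response line."""
    return json.dumps(handle_request(json.loads(line))) + "\n"


def serve():
    """Answer requests from CouchDB until it closes our stdin."""
    for line in sys.stdin:
        if line.strip():
            sys.stdout.write(process_line(line))
            # Flush so CouchDB receives the response right away.
            sys.stdout.flush()
```

A real external would call `serve()` from its `__main__` block so that running it with `python -m` enters the request loop.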
Alternatively, in your [external] section you can use /path/to/hypercouch/dev.sh without installing, which makes development easier once you start submitting bug reports and patches. (This avoids constantly running sudo python setup.py install whenever you change the hypercouch sources.)
To use hypercouch, all you need to do is add a JavaScript function to the ft_index member of your _design/ documents. Inside this function you can call two special JavaScript functions to add indexing info for your document:
- index(data) - adds data to the full text index
- property(name, value) - adds properties to the document for use in sorting and limiting results
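Conceptually, each document contributes some free text plus a set of named properties, much like a HyperEstraier document's text and attributes. The model below mirrors that split in Python rather than JavaScript. It is a hypothetical illustration only: `Collector` and `add_property` are made-up names (the JavaScript function is called `property`), and hypercouch's real internals may differ.

```python
class Collector:
    """Gathers what an ft_index function emits for a single document."""

    def __init__(self):
        self.texts = []       # chunks passed to index(data)
        self.properties = {}  # name -> value from property(name, value)

    def index(self, data):
        self.texts.append(str(data))

    def add_property(self, name, value):
        # In this sketch a repeated name simply overwrites the earlier value.
        self.properties[name] = value


def ft_index(doc, index, add_property):
    """Python rendering of the example ft_index function in _design/foo."""
    if doc.get("body"):
        index(doc["body"])
    if doc.get("foo"):
        add_property("foo", doc["foo"])


collector = Collector()
ft_index({"body": "hello couch", "foo": "bar"}, collector.index, collector.add_property)
print(collector.texts, collector.properties)
```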
Example _design/document:
{
"_id": "_design/foo",
"_rev": "32498230012",
"ft_index": "function(doc) {if(doc.body) index(doc.body); if(doc.foo) property(\"foo\", doc.foo);}"
}
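Writing JavaScript by hand inside a JSON string is error prone, because any double quotes inside the function must be escaped. One way to sidestep that is to build the document in Python and let `json.dumps` do the escaping. `design_doc` is a hypothetical helper, and `_rev` is left out because CouchDB assigns it when the document is first saved.

```python
import json

# The index function as JavaScript source; the embedded double quotes
# are escaped automatically by json.dumps below.
FT_INDEX = ('function(doc) {'
            'if(doc.body) index(doc.body); '
            'if(doc.foo) property("foo", doc.foo);'
            '}')


def design_doc(name, ft_index_source):
    """Build a design document dict holding an ft_index function."""
    return {"_id": "_design/" + name, "ft_index": ft_index_source}


print(json.dumps(design_doc("foo", FT_INDEX), indent=2))
```

The resulting JSON can then be PUT to http://127.0.0.1:5984/db_name/_design/foo, for example with curl or couchdb-python.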
Example URLs:
$ curl 'http://127.0.0.1:5984/db_name/_fti?q=term1+term2'
$ curl 'http://127.0.0.1:5984/db_name/_fti?q=term1+AND+term2'
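When building these URLs from code, the query string has to be encoded. Python's `urlencode` turns spaces into `+`, which yields the same URLs as the curl examples. A minimal Python 3 sketch; `fti_url` is a hypothetical helper and does not escape the database name.

```python
from urllib.parse import urlencode


def fti_url(base, db, **params):
    """Build an _fti search URL; spaces in parameter values become '+'."""
    return "%s/%s/_fti?%s" % (base.rstrip("/"), db, urlencode(params))


print(fti_url("http://127.0.0.1:5984", "db_name", q="term1 AND term2"))
# prints http://127.0.0.1:5984/db_name/_fti?q=term1+AND+term2
```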
Caveat:
It may take a few seconds before the indexed results become available. There's no guarantee that a document has been indexed as soon as you commit it to the database.
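In tests or scripts that save a document and then immediately search for it, one workaround is to poll until results show up. This sketch assumes `search` is any zero-argument callable you supply (for example, one that fetches an _fti URL and returns the list of hits); `wait_for_results` is a hypothetical helper and the timeout and interval defaults are arbitrary.

```python
import time


def wait_for_results(search, timeout=10.0, interval=0.5):
    """Call search() until it returns a non-empty result or timeout expires.

    Returns the last result, which may still be empty if time ran out.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = search()
        if result or time.monotonic() >= deadline:
            return result
        time.sleep(interval)
```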
Query parameters:
- q - Requests with arbitrary AND/OR type of boolean logic.
- limit and skip - For paging type stuff (beware when not using a specified sort).
- matching - Specify a HyperEstraier parsing method.
- order - Order results by an arbitrary property. Syntax: prop_name [STRA|STRD|NUMA|NUMD]
- highlight - Retrieve a highlighted snippet. Currently only supports an html format via highlight=html.
- Other attribute limiting via an attr_name=METHOD param.
See Search Conditions in the HyperEstraier documentation for specifics.
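Paging with limit and skip works out to skip = page * limit, and the list above warns about paging without a specified sort, so this sketch lets you pass an order as well. `page_params` is a hypothetical helper; the property name `foo` matches the example design document but is otherwise arbitrary.

```python
from urllib.parse import urlencode


def page_params(q, page, per_page=20, order=None, highlight=None):
    """Query parameters for one page of _fti results (pages count from 0).

    `order` uses the "prop_name [STRA|STRD|NUMA|NUMD]" syntax described above.
    """
    params = {"q": q, "limit": per_page, "skip": page * per_page}
    if order:
        params["order"] = order
    if highlight:
        params["highlight"] = highlight
    return params


print(urlencode(page_params("couchdb", 2, per_page=10, order="foo STRA")))
# prints q=couchdb&limit=10&skip=20&order=foo+STRA
```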