swsnider / search.py
Lucene Module Documentation
The luceneUtil module provides an abstraction layer over Lucene compiled with JCC as a Python extension library (PyLucene). It does not work with Lucene compiled as a Python extension library using GCJ, but that approach is deprecated for other reasons anyway.
Prerequisites
- JCC-compiled PyLucene module
- PyYAML
API
- getAnalyzer() -- Returns the Lucene Analyzer object used by the other Lucene calls in this module. For more information on what the hell an analyzer is and why you may or may not care, see the Lucene docs.
- getWriter(store, analyzer, create=False) -- Abstracts out the creation of the Lucene IndexWriter object used by the other mutator functions in this module.
- delLock(objName) -- A kludge for when a Lucene exception is thrown. In that case Lucene does not clean up its lock, so the index becomes inaccessible to other writers and readers. This function simply deletes the lock file. At some point it may be smart to add logic here for detecting index corruption.
- getSearcher(store) -- Returns a Lucene Searcher object, which is used by the doSearch() function. See the Lucene documentation for more info.
- getStore(objName) -- Returns a FileStore object for Lucene to use. Lucene is written in Java, and a bare Python file object is (apparently) insufficiently OO for it.
- getReader(store) -- Returns a Lucene IndexReader object. Again, see the Lucene documentation for more info.
- initIndex(tbl, rowset=None) -- One of the actual powerhouse functions. Call initIndex with a model class as the first argument and, optionally, an iterable of instances of that class as the second. The Lucene index is rewritten to contain only the rows in rowset. If no rowset is specified, all rows of the model are indexed.
- reindex(row, whatsChanged={}) -- Pass in a row and, optionally, a dictionary of the attributes that have actually changed (a holdover from SQLObject), and this updates the Lucene index to reflect the new state of the row.
- doSearch(objName, queryS, field="id", defaultField="body", maxResults=200, defaultOperator=lucene.QueryParser.AND_OPERATOR) -- Performs a query on the Lucene index. objName is the class name (as a string) of the model the index was built from; queryS is the query string; field is the name of the stored field whose value you want returned for each hit; defaultField is the field searched when the query string does not name one; maxResults caps the number of results returned; and defaultOperator tells Lucene which operator to use between terms when none is given (e.g. whether the query 'monkey sauce' means 'monkey AND sauce' or 'monkey OR sauce').
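To make the doSearch() parameters more concrete, here is a toy model in plain Python. It is not Lucene and not how luceneUtil is implemented: it assumes a small, hypothetical in-memory document list, whitespace tokenization, and the strings "AND"/"OR" in place of lucene.QueryParser.AND_OPERATOR/OR_OPERATOR. It only illustrates how defaultOperator, defaultField, field and maxResults are described as interacting.

```python
# Toy model only: NOT Lucene and NOT the luceneUtil code. It mimics the described
# roles of doSearch()'s field, defaultField, maxResults and defaultOperator.
from typing import Dict, List, Set

# Hypothetical documents: each has a stored "id" and a searchable "body".
DOCS: List[Dict[str, str]] = [
    {"id": "1", "body": "monkey sauce recipe"},
    {"id": "2", "body": "monkey business"},
    {"id": "3", "body": "hot sauce"},
]


def build_index(docs: List[Dict[str, str]], field: str) -> Dict[str, Set[int]]:
    """Map each lower-cased term in `field` to the positions of the docs containing it."""
    index: Dict[str, Set[int]] = {}
    for pos, doc in enumerate(docs):
        for term in doc[field].lower().split():
            index.setdefault(term, set()).add(pos)
    return index


def toy_search(query: str, default_operator: str = "AND", field: str = "id",
               default_field: str = "body", max_results: int = 200) -> List[str]:
    """Join bare query terms with AND or OR; return the stored `field` of each hit."""
    index = build_index(DOCS, default_field)
    postings = [index.get(term, set()) for term in query.lower().split()]
    if not postings:
        return []
    if default_operator == "AND":
        hits = set.intersection(*postings)  # every term must match
    else:
        hits = set.union(*postings)  # any term may match
    # Lucene ranks hits by relevance score; this toy just keeps document order.
    return [DOCS[pos][field] for pos in sorted(hits)][:max_results]


if __name__ == "__main__":
    print(toy_search("monkey sauce"))
    print(toy_search("monkey sauce", default_operator="OR", max_results=2))
```

With the AND default, 'monkey sauce' matches only document "1"; with OR it matches all three, and maxResults then truncates the list.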
YAML Config File Format
The top level of lucene.yml is a dictionary with two keys: "Lucene Settings" and "Lucene Index Definitions".
"Lucene Settings" must contain a key "Index Directory" that specifies the base directory for Lucene indices. Each index is placed under it, in a directory named after the model class it is based on.
"Lucene Index Definitions" contains one key per model used as an index basis, named after the model's class name. Each model section can contain two keys: "fields" and "modules".
- "fields" is mandatory and contains a list of dictionaries, each with a "name" key and an optional "value" key:
  - "name" contains the name of the Lucene field you are defining.
  - "value" contains the Python expression that, when evaluated, produces the value to place in the field. Each expression can use the variable "row" to refer to the current database row object. If "value" is absent, the value is the result of evaluating the expression "row." + name, i.e. eval(".".join(["row", name])).
- "modules" is optional. If present, it lists modules to import before evaluating any of the "value" expressions in "fields".
Notes
- SETTINGS_DIR should be set to the directory in which you have placed the settings files. It defaults to ".".


