PostgresSearch

brendonh edited this page Sep 13, 2010 · 1 revision

When running on a Postgres database, Warp can help you manage a full-text search index.

Basics

Define your searchable models as warp.common.fulltext.Searchable subclasses, with a searchColumns attribute, e.g.:

from storm.locals import Int, Unicode
from warp.common import fulltext

class Product(fulltext.Searchable):
  __storm_table__ = "product"
  searchColumns = ("name", "description")

  id = Int(primary=True)
  name = Unicode()
  description = Unicode(default=u"")

The Searchable base class includes a __storm_flushed__ method which will retrieve your search column values, remove None values, concatenate what remains, and create (or update) an index entry in the magically-existing warp_fulltext table. It will also remove index entries when objects are deleted.
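
As a rough illustration of the "remove None values, concatenate what remains" step, here is a minimal sketch in plain Python. It is not Warp's actual code: the helper name build_search_document, the FakeProduct stand-in class, and the single-space separator are all assumptions made for the example.

```python
def build_search_document(obj, search_columns):
    """Join the non-None values of the named attributes into one string.

    Hypothetical helper: the space separator is an assumption.
    """
    values = (getattr(obj, name) for name in search_columns)
    return u" ".join(v for v in values if v is not None)


class FakeProduct(object):
    """Plain stand-in for a Storm model; not a real Searchable subclass."""
    searchColumns = ("name", "description")

    def __init__(self, name, description):
        self.name = name
        self.description = description


product = FakeProduct(u"Soap", None)
print(build_search_document(product, product.searchColumns))  # prints: Soap
```

The None description is skipped, so only the name ends up in the text that would be indexed.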

To initially create the index, or if you’ve made changes to your data outside Warp, you can call fulltext.reindex(), which will delete the existing index, and then re-index every object in every Searchable model.

To search, just call fulltext.search(term), which will return a generator of results in order of relevance (using Postgres’ ts_rank_cd function). Note that the results may be of different types, if you have more than one Searchable subclass.

For example, in a Mako template where every result is a Product, so mixed types are not a concern:

  <% term = u"Hello World" %>
  % for result in fulltext.search(term):
    <p>${link(result.name, getNode("products"), "view", [result.id])}</p>
  % endfor
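
Because fulltext.search returns a generator, its results can be iterated only once and have no length. If you want a fixed number of top hits, or need to know whether anything matched before rendering, one option is to materialise part of the generator first. The sketch below uses a hypothetical fake_search stand-in (a naive substring match, not a real ranked search) so that it runs without a database; only the use of itertools.islice is the point.

```python
from itertools import islice


def fake_search(term):
    """Hypothetical stand-in for fulltext.search: yields hits lazily."""
    for name in (u"Hello World Soap", u"Hello Kitty Brush", u"World Map"):
        if any(word.lower() in name.lower() for word in term.split()):
            yield name


def top_results(results, limit):
    """Take at most `limit` items from a generator without consuming the rest."""
    return list(islice(results, limit))


hits = top_results(fake_search(u"Hello World"), 2)
if hits:
    for hit in hits:
        print(hit)
else:
    print("Sorry, no results")
```

Once the hits are in a list, checking for an empty result is a plain truth test rather than a flag set inside the loop.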

Advanced use and Multilanguage Support

Sometimes the default “cat these attributes together” support isn’t good enough. Two examples:

  • When you want to control which Postgres language dictionary is used to index the document
  • When you want to include values that aren’t simple attributes on the object

In these cases, you can omit the searchColumns attribute and instead supply two methods: getSearchVals (returning a list of values to concatenate) and getSearchLanguage (returning a Postgres language dictionary name, like "english" or "simple").

You can then give an extra language argument to fulltext.search:

  for result in fulltext.search(term, "simple"):
    ...

A real example: We have a Product table with basic product information, and then a ProductDescription table with several rows in different languages for each product. We want to index the languages separately, and we want to include some product information (like the barcode) in each index entry. Our ProductDescription model looks like this:

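from storm.locals import Int, Reference
from warp.common.fulltext import Searchable

# `columns` (custom column types) and `Product` come from the application;
# their imports are not shown in the original.

class ProductDescription(Searchable):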
    __storm_table__ = "product_description"

    id = Int(primary=True)

    product_id = Int()
    product = Reference(product_id, Product.id)

    language = columns.NonEmptyUnicode(default=u"en")

    name = columns.NonEmptyUnicode()

    description = columns.Text(default=u"")
    directions = columns.Text(default=u"")
    features = columns.Text(default=u"")
    ingredients = columns.Text(default=u"")
    skintype = columns.Text(default=u"")

    def getSearchVals(self):
        product = self.product
        vals = [self.name, self.description, self.directions, 
                self.features, self.ingredients, self.skintype, 
                product.code, product.barcode]
        vals.extend(c.name for c in product.concerns)
        return vals

    def getSearchLanguage(self):
        if self.language == 'en':
            return 'english'
        return 'simple'

    def renderAsSearchResult(self):
        if not self.product.is_primary:
            return None
        return '<b>Product</b>: <a href="%s">%s</a>' % (
            self.product.publicURL(), self.name)

This causes each ProductDescription to be indexed using the appropriate Postgres dictionary, and include various information from its Product.
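
If more languages are added later, the if/else in getSearchLanguage could be replaced by a lookup table, and the same table could be reused when choosing the language argument for fulltext.search. A minimal sketch, assuming a hypothetical module-level mapping and helper name. "english", "french", "german" and "simple" are standard Postgres text search configurations, but which codes to map is up to your application:

```python
# Hypothetical mapping from the application's language codes to Postgres
# text search configuration names.
SEARCH_DICTIONARIES = {
    "en": "english",
    "fr": "french",
    "de": "german",
}


def search_language_for(code):
    """Return the Postgres configuration for a language code.

    Falls back to "simple" for anything unmapped (e.g. "zh").
    """
    return SEARCH_DICTIONARIES.get(code, "simple")


print(search_language_for("en"))
print(search_language_for("zh"))
```

Using one helper for both indexing and querying avoids the case where a description is indexed with one configuration and searched with another.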

Later, when rendering search results, we use renderAsSearchResult to give a link to the Product, with the name in the description’s language:

<h1>Results for '${term}':</h1>

<%
if request.session.language == 'zh':
  results = fulltext.search(term, "simple")
else:
  results = fulltext.search(term, "english")
%>

<div style="margin: 10px; padding: 10px; background-color: #eee; border: 1px dotted #999">
  <% found = False %>
  % for result in results:
    <% render = result.renderAsSearchResult() %>
    % if render is not None:
      ## Only count results that are actually displayed.
      <% found = True %>
      <p style="margin: 0 0 5px 20px">${render}</p>
    % endif
  % endfor
  % if not found:
    ${"Sorry, no results" | t}
  % endif
</div>

By using a method on the model to render the result, we can also handle mixed result types by giving each Searchable subclass its own renderAsSearchResult method.
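
A minimal sketch of that pattern, using plain stand-in classes rather than real Searchable models so that it runs without a database. The class names, attributes, and the render_results helper are hypothetical, and html.escape is added here as a precaution that the original example does not include.

```python
from html import escape


class ProductHit(object):
    """Stand-in for a Searchable product model (hypothetical)."""

    def __init__(self, name, url, is_primary=True):
        self.name = name
        self.url = url
        self.is_primary = is_primary

    def renderAsSearchResult(self):
        # Returning None hides the hit, as in the example above.
        if not self.is_primary:
            return None
        return '<b>Product</b>: <a href="%s">%s</a>' % (
            escape(self.url), escape(self.name))


class ArticleHit(object):
    """Stand-in for a second Searchable model (hypothetical)."""

    def __init__(self, title, url):
        self.title = title
        self.url = url

    def renderAsSearchResult(self):
        return '<b>Article</b>: <a href="%s">%s</a>' % (
            escape(self.url), escape(self.title))


def render_results(results):
    """Render each hit with its own method, dropping hidden (None) ones."""
    rendered = (result.renderAsSearchResult() for result in results)
    return [markup for markup in rendered if markup is not None]


hits = [ProductHit(u"Soap", u"/products/1"),
        ProductHit(u"Variant", u"/products/2", is_primary=False),
        ArticleHit(u"Skin care <tips>", u"/articles/3")]
for markup in render_results(hits):
    print(markup)
```

The calling code never inspects the type of a hit; each class decides how, and whether, it appears in the result list.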

A quick note to myself on Chinese support:

The “simple” language config seems to work okay for basic searches, but it would be better to get this installed:

http://code.google.com/p/nlpbamboo/wiki/TSearch2
