
Commit

FTS doc.
coleifer committed Jan 8, 2015
1 parent 540b474 commit 3076446
Showing 2 changed files with 37 additions and 1 deletion.
36 changes: 36 additions & 0 deletions docs/models.rst
@@ -162,6 +162,42 @@ You can combine filters with ordering:
Mickey
Huey
Full-text search
----------------

I've added a really (really) simple full-text search index type. Here is how to use it:

.. code-block:: pycon

    >>> class Note(Model):
    ...     database = db
    ...     content = TextField(fts=True)  # Note the "fts=True".

    >>> Note.create(content='this is a test of walrus FTS.')
    >>> Note.create(content='favorite food is walrus-mix.')
    >>> Note.create(content='do not forget to take the walrus for a walk.')

    >>> for note in Note.query(Note.content.match('walrus')):
    ...     print note.content
    favorite food is walrus-mix.
    this is a test of walrus FTS.
    do not forget to take the walrus for a walk.

    >>> for note in Note.query(Note.content.match('walk walrus')):
    ...     print note.content
    do not forget to take the walrus for a walk.

    >>> for note in Note.query(Note.content.match('walrus mix')):
    ...     print note.content
    favorite food is walrus-mix.

It is very limited in terms of what it does, but I hope to make it better as time goes on. The biggest problem is that there is currently no stemming, which I plan to address soon by adding a Porter stemming implementation. The limitations are:

* No stemming, so plural/singular forms are considered separate words.
* Default conjunction is *AND* and there is no way to override this. I plan on supporting *OR*, but I'm not sure yet what the API should look like.
* Partial strings are not matched.
* Very naive scoring function.
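To make the mechanics concrete, here is a tiny in-memory sketch of the scheme the docs describe: each word in a document is scored ``1 / len(words)``, a query *AND*\ s together the per-word posting sets, and matching documents are ranked by summed score. This is only an illustration under those assumptions, not walrus's actual Redis-backed implementation; the ``SimpleIndex`` class is hypothetical and stopword filtering is omitted.

```python
import re

def tokenize(value):
    # Treat punctuation (including '-', '=' and '_') as word separators.
    value = re.sub(r'[\.,;:"\'\\/!@#\$%\*\(\)\-\=_]', ' ', value)
    words = value.lower().split()
    fraction = 1. / len(words)  # naive scoring: each occurrence is worth 1/N.
    scores = {}
    for word in words:
        scores[word] = scores.get(word, 0) + fraction
    return scores

class SimpleIndex(object):
    def __init__(self):
        self.postings = {}  # word -> {doc_id: score}

    def add(self, doc_id, content):
        for word, score in tokenize(content).items():
            self.postings.setdefault(word, {})[doc_id] = score

    def search(self, query):
        # Default AND conjunction: a document must contain every query word.
        results = None
        for word in tokenize(query):
            docs = self.postings.get(word, {})
            if results is None:
                results = dict(docs)
            else:
                results = {d: results[d] + s
                           for d, s in docs.items() if d in results}
        if not results:
            return []
        # Rank by summed per-word score, best first.
        return sorted(results, key=results.get, reverse=True)

index = SimpleIndex()
index.add(1, 'this is a test of walrus FTS.')
index.add(2, 'favorite food is walrus-mix.')
index.add(3, 'do not forget to take the walrus for a walk.')
print(index.search('walk walrus'))  # -> [3]
```

Note how the ranking falls out of the scoring: shorter documents score each of their words higher, which is why ``favorite food is walrus-mix.`` sorts first for the single-word query ``walrus``.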

Need more power?
----------------

2 changes: 1 addition & 1 deletion walrus.py
@@ -1438,7 +1438,7 @@ def _load_stopwords(self):
         self._stopwords = set(stopwords.splitlines())

     def tokenize(self, value):
-        value = re.sub('[\.,;:"\'\\/!@#\$%\*\(\)]', ' ', value)
+        value = re.sub('[\.,;:"\'\\/!@#\$%\*\(\)\-\=_]', ' ', value)
         words = value.lower().split()
         fraction = 1. / len(words)
         scores = {}
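The one-line change above widens the separator character class so that ``-``, ``=`` and ``_`` also split words. A quick standalone demonstration of the effect (a hypothetical snippet, not part of walrus): with the new pattern, ``walrus-mix`` tokenizes into ``walrus`` and ``mix``, which is what makes the ``match('walrus mix')`` example in the docs work.

```python
import re

# Old vs. new separator classes from the diff above; only the new one
# treats '-', '=' and '_' as word boundaries.
old_pattern = r'[\.,;:"\'\\/!@#\$%\*\(\)]'
new_pattern = r'[\.,;:"\'\\/!@#\$%\*\(\)\-\=_]'

text = 'favorite food is walrus-mix.'
old_words = re.sub(old_pattern, ' ', text).lower().split()
new_words = re.sub(new_pattern, ' ', text).lower().split()
print(old_words)  # ['favorite', 'food', 'is', 'walrus-mix']
print(new_words)  # ['favorite', 'food', 'is', 'walrus', 'mix']
```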
