
Commit

FTS doc.
coleifer committed Jan 8, 2015
1 parent 540b474 commit 3076446
Showing 2 changed files with 37 additions and 1 deletion.
36 changes: 36 additions & 0 deletions docs/models.rst
@@ -162,6 +162,42 @@ You can combine filters with ordering:
Mickey
Huey
Full-text search
----------------

I've added a really (really) simple full-text search index type. Here is how to use it:

.. code-block:: pycon

    >>> class Note(Model):
    ...     database = db
    ...     content = TextField(fts=True)  # Note the "fts=True".

    >>> Note.create(content='this is a test of walrus FTS.')
    >>> Note.create(content='favorite food is walrus-mix.')
    >>> Note.create(content='do not forget to take the walrus for a walk.')

    >>> for note in Note.query(Note.content.match('walrus')):
    ...     print note.content
    favorite food is walrus-mix.
    this is a test of walrus FTS.
    do not forget to take the walrus for a walk.

    >>> for note in Note.query(Note.content.match('walk walrus')):
    ...     print note.content
    do not forget to take the walrus for a walk.

    >>> for note in Note.query(Note.content.match('walrus mix')):
    ...     print note.content
    favorite food is walrus-mix.

It is very limited in terms of what it does, but I hope to make it better as time goes on. The biggest problem is that there is currently no stemming, which I plan to address soon by adding a Porter stemming implementation. The limitations are:

* No stemming, so plural/singular forms are considered separate words.
* Default conjunction is *AND* and there is no way to override this. I plan on supporting *OR*, but I'm not sure yet what the API should look like.
* Partial strings are not matched.
* Very naive scoring function.
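To make the mechanics concrete, here is a tiny in-memory sketch of the scheme the docs describe: each word in a document is scored ``1 / len(words)``, a query *AND*\ s together the per-word posting sets, and matching documents are ranked by summed score. This is only an illustration under those assumptions, not walrus's actual Redis-backed implementation; the ``SimpleIndex`` class is hypothetical and stopword filtering is omitted.

```python
import re

def tokenize(value):
    # Treat punctuation (including '-', '=' and '_') as word separators.
    value = re.sub(r'[\.,;:"\'\\/!@#\$%\*\(\)\-\=_]', ' ', value)
    words = value.lower().split()
    fraction = 1. / len(words)  # naive scoring: each occurrence is worth 1/N.
    scores = {}
    for word in words:
        scores[word] = scores.get(word, 0) + fraction
    return scores

class SimpleIndex(object):
    def __init__(self):
        self.postings = {}  # word -> {doc_id: score}

    def add(self, doc_id, content):
        for word, score in tokenize(content).items():
            self.postings.setdefault(word, {})[doc_id] = score

    def search(self, query):
        # Default AND conjunction: a document must contain every query word.
        results = None
        for word in tokenize(query):
            docs = self.postings.get(word, {})
            if results is None:
                results = dict(docs)
            else:
                results = {d: results[d] + s
                           for d, s in docs.items() if d in results}
        if not results:
            return []
        # Rank by summed per-word score, best first.
        return sorted(results, key=results.get, reverse=True)

index = SimpleIndex()
index.add(1, 'this is a test of walrus FTS.')
index.add(2, 'favorite food is walrus-mix.')
index.add(3, 'do not forget to take the walrus for a walk.')
print(index.search('walk walrus'))  # -> [3]
```

Note how the ranking falls out of the scoring: shorter documents score each of their words higher, which is why ``favorite food is walrus-mix.`` sorts first for the single-word query ``walrus``.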

Need more power?
----------------

2 changes: 1 addition & 1 deletion walrus.py
@@ -1438,7 +1438,7 @@ def _load_stopwords(self):
         self._stopwords = set(stopwords.splitlines())

     def tokenize(self, value):
-        value = re.sub('[\.,;:"\'\\/!@#\$%\*\(\)]', ' ', value)
+        value = re.sub('[\.,;:"\'\\/!@#\$%\*\(\)\-\=_]', ' ', value)
         words = value.lower().split()
         fraction = 1. / len(words)
         scores = {}
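The one-line change above widens the separator character class so that ``-``, ``=`` and ``_`` also split words. A quick standalone demonstration of the effect (a hypothetical snippet, not part of walrus): with the new pattern, ``walrus-mix`` tokenizes into ``walrus`` and ``mix``, which is what makes the ``match('walrus mix')`` example in the docs work.

```python
import re

# Old vs. new separator classes from the diff above; only the new one
# treats '-', '=' and '_' as word boundaries.
old_pattern = r'[\.,;:"\'\\/!@#\$%\*\(\)]'
new_pattern = r'[\.,;:"\'\\/!@#\$%\*\(\)\-\=_]'

text = 'favorite food is walrus-mix.'
old_words = re.sub(old_pattern, ' ', text).lower().split()
new_words = re.sub(new_pattern, ' ', text).lower().split()
print(old_words)  # ['favorite', 'food', 'is', 'walrus-mix']
print(new_words)  # ['favorite', 'food', 'is', 'walrus', 'mix']
```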
