### Using SQLite Full-Text Search with Python and Peewee

#### This tutorial is based on https://charlesleifer.com/blog/using-sqlite-full-text-search-with-python

---

This example uses some of Peewee's extensions, which are collected under the [playhouse namespace](http://docs.peewee-orm.com/en/latest/peewee/playhouse.html).

The database name and the base models are defined in the [search_example.py](./search_example.py) module. 

In [1]:
# Install the required libraries (skip this cell if they are already installed)
!pip install peewee
!pip install psycopg2-binary



In [2]:
# importing the auxiliary file with modules definition
# peewee and playhouse are imported from that module as well
from search_example import Entry, FTSEntry

In [3]:
# Create the tables
Entry.create_table()
FTSEntry.create_table()

You can open the database in the `sqlite3` shell (or in DB Browser for SQLite) and inspect the schema with `.schema`:

```
sqlite> .schema
CREATE TABLE "entry" (
    "id" INTEGER NOT NULL PRIMARY KEY,
    "title" TEXT NOT NULL,
    "content" TEXT NOT NULL);
CREATE VIRTUAL TABLE "ftsentry" USING FTS4 (
    "content" TEXT NOT NULL);
CREATE TABLE 'ftsentry_content'(...)
CREATE TABLE 'ftsentry_segments'(...)
CREATE TABLE 'ftsentry_segdir'(...)
CREATE TABLE 'ftsentry_docsize'(...)
CREATE TABLE 'ftsentry_stat'(...)
```
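To see what `FTSEntry.create_table()` amounts to at the SQL level, here is a minimal sketch that uses Python's built-in `sqlite3` module instead of Peewee. It assumes your SQLite build ships with FTS4 (the FTS version that provides the `docid` column used throughout this notebook); the `ftsentry_*` tables it lists are shadow tables that SQLite itself creates to store the full-text index.

```python
import sqlite3

# In-memory database so the example leaves no files behind.
conn = sqlite3.connect(":memory:")

# Regular table holding the canonical data (same columns as the schema above).
conn.execute(
    'CREATE TABLE "entry" ('
    '"id" INTEGER NOT NULL PRIMARY KEY, '
    '"title" TEXT NOT NULL, '
    '"content" TEXT NOT NULL)'
)

# FTS4 virtual table holding the searchable text.
# FTS4 parses but ignores the type name and constraint on the column.
conn.execute('CREATE VIRTUAL TABLE "ftsentry" USING FTS4 ("content" TEXT NOT NULL)')

# SQLite backs the virtual table with ordinary "shadow" tables.
shadow_tables = sorted(
    name
    for (name,) in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name LIKE 'ftsentry_%'"
    )
)
print(shadow_tables)
```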

In [4]:
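# Insert some entries. Each FTSEntry row reuses the id of its Entry row as docid
# and indexes the title and content joined by a newline.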
entry = Entry.create(
      title='This is how I rewrote everything with Python',
      content='Blah blah blah, type system, channels, blurgh')

FTSEntry.create(
      docid=entry.id, # Manually set the primary key ("docid") to entry's id.
      content='\n'.join((entry.title, entry.content)))

entry = Entry.create(
      title='Why ORM is a terrible idea',
      content='Blah blah blah, leaky abstraction, impedance mismatch')

FTSEntry.create(
      docid=entry.id,  
      content='\n'.join((entry.title, entry.content)))

entry = Entry.create(
      title='What is the relation between Javascript and Canvas',
      content='HTML5 features the <canvas> element that allows you to draw 2D graphics using JavaScript')

FTSEntry.create(
      docid=entry.id,  
      content='\n'.join((entry.title, entry.content)))

entry = Entry.create(
      title='Nokia Snake with JavaScript + Canvas',
      content='I thought I\'d re-create the Nokia Snake game (a distant relative of Nibbles) using JavaScript and the canvas element')

FTSEntry.create(
      docid=entry.id,  
      content='\n'.join((entry.title, entry.content)))

entry = Entry.create(
      title='Using python and k-means to find the dominant colors in images',
      content='Most images are an RGB array where we can easily apply K-Means Clustering. The Centers of each cluster would be the most dominant colors of the image')

FTSEntry.create(
      docid=entry.id,  
      content='\n'.join((entry.title, entry.content)))

entry = Entry.create(
      title='Saturday morning hack: personalized news digest with boolean query parser in Python',
      content='Occasionally I stumble on fascinating content and that\'s what keeps me coming back')

FTSEntry.create(
      docid=entry.id,  
      content='\n'.join((entry.title, entry.content)))

entry = Entry.create(
      title='Migrating from SQLite',
      content='This command instructs pgloader to load data from a SQLite file')

FTSEntry.create(
      docid=entry.id,  
      content='\n'.join((entry.title, entry.content)))


<FTSEntry: 7>

You can add as many entries as you like; for example, copy the `title` and `content` of a few posts from any blog.
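The insert pattern used above (create the `Entry`, then create the `FTSEntry` with the same id as its `docid`) can be sketched in plain SQL with the standard-library `sqlite3` module. The helper `add_entry` is hypothetical, not part of Peewee or of `search_example.py`; wrapping both inserts in one transaction is a design choice of this sketch so the two tables cannot drift apart.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entry (id INTEGER PRIMARY KEY, title TEXT NOT NULL, content TEXT NOT NULL)"
)
conn.execute("CREATE VIRTUAL TABLE ftsentry USING FTS4 (content)")


def add_entry(conn, title, content):
    """Hypothetical helper: store an entry and index it under the same id."""
    with conn:  # commits both inserts together, or rolls both back on error
        cur = conn.execute(
            "INSERT INTO entry (title, content) VALUES (?, ?)", (title, content)
        )
        entry_id = cur.lastrowid
        # docid is FTS4's name for the rowid; reusing the entry id makes the join trivial.
        conn.execute(
            "INSERT INTO ftsentry (docid, content) VALUES (?, ?)",
            (entry_id, "\n".join((title, content))),
        )
    return entry_id


first = add_entry(
    conn,
    "Migrating from SQLite",
    "This command instructs pgloader to load data from a SQLite file",
)
second = add_entry(
    conn,
    "Why ORM is a terrible idea",
    "Blah blah blah, leaky abstraction, impedance mismatch",
)
print(conn.execute("SELECT docid, content FROM ftsentry ORDER BY docid").fetchall())
```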

### Retrieving/querying data

With Peewee, we can run simple search queries using the `FTSModel.match()` helper:

In [5]:
query = (Entry
         .select(Entry.title)
         .join(FTSEntry, on=(Entry.id == FTSEntry.docid))
         .where(FTSEntry.match('javascript AND canvas'))
         .dicts())

for row_dict in query:
    print(row_dict)

{'title': 'What is the relation between Javascript and Canvas'}
{'title': 'Nokia Snake with JavaScript + Canvas'}
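Under the hood this is a join on `docid` plus a `MATCH` against the virtual table. Here is a minimal sketch with the standard-library `sqlite3` module and a few of the entries above, assuming an FTS4-enabled SQLite build. It separates the terms with whitespace, which FTS4 treats as an implicit AND in both its standard and enhanced query syntax; the default tokenizer folds ASCII case, so `javascript` also matches `JavaScript`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entry (id INTEGER PRIMARY KEY, title TEXT, content TEXT)")
conn.execute("CREATE VIRTUAL TABLE ftsentry USING FTS4 (content)")

entries = [
    (1, "What is the relation between Javascript and Canvas",
     "HTML5 features the <canvas> element that allows you to draw 2D graphics using JavaScript"),
    (2, "Nokia Snake with JavaScript + Canvas",
     "I thought I'd re-create the Nokia Snake game (a distant relative of Nibbles) "
     "using JavaScript and the canvas element"),
    (3, "Migrating from SQLite",
     "This command instructs pgloader to load data from a SQLite file"),
]
for entry_id, title, content in entries:
    conn.execute("INSERT INTO entry VALUES (?, ?, ?)", (entry_id, title, content))
    conn.execute(
        "INSERT INTO ftsentry (docid, content) VALUES (?, ?)",
        (entry_id, title + "\n" + content),
    )

# Matching against the table name searches every indexed column.
rows = conn.execute(
    "SELECT entry.title FROM entry "
    "JOIN ftsentry ON entry.id = ftsentry.docid "
    "WHERE ftsentry MATCH ? ORDER BY entry.id",
    ("javascript canvas",),
).fetchall()
for (title,) in rows:
    print(title)
```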


In [6]:
# No entry mentions "golang", so this query matches nothing and the loop prints nothing.
query = (Entry
         .select()
         .join(FTSEntry, on=(Entry.id == FTSEntry.docid))
         .where(FTSEntry.match('golang')))

for entry in query:
    print(entry.title)

### Sorting by relevance

By default, the results returned from a MATCH query are in an unspecified order. For full-text search to be useful, though, the results should be ordered by relevance. SQLite's FTS extension does not come with a relevance function per se, but its `matchinfo()` function exposes statistics from the full-text index (such as how often each search term occurs in a row and across the whole table) that can be combined into reasonably good relevance scores.

We will use some user-defined functions implemented in the [search_example.py](./search_example.py) module.
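The functions in `search_example.py` are not reproduced in this notebook. As a sketch, assuming the ranking follows the example rank function in the SQLite FTS3/FTS4 documentation (the `rank()` scores printed below are consistent with it), a score can be computed in Python from the `matchinfo()` blob and registered with `sqlite3.Connection.create_function`. The function name `fts_rank` and the negation (so that an ascending `ORDER BY` puts the best match first) are assumptions of this sketch.

```python
import sqlite3
import struct


def parse_matchinfo(buf):
    """Decode a matchinfo() blob: a sequence of 32-bit unsigned ints in native byte order."""
    return struct.unpack("@%dI" % (len(buf) // 4), buf)


def fts_rank(raw_matchinfo):
    """Sum (hits in this row) / (hits in all rows) over every phrase and column.

    Expects matchinfo's default 'pcx' format. Returns a negative number so that
    sorting in ascending order puts the most relevant rows first.
    """
    info = parse_matchinfo(raw_matchinfo)
    n_phrases, n_cols = info[0], info[1]
    score = 0.0
    for phrase in range(n_phrases):
        for col in range(n_cols):
            base = 2 + 3 * (phrase * n_cols + col)
            hits_this_row, hits_all_rows = info[base], info[base + 1]
            if hits_this_row > 0:
                score += hits_this_row / hits_all_rows
    return -score


conn = sqlite3.connect(":memory:")
conn.create_function("fts_rank", 1, fts_rank)
conn.execute("CREATE VIRTUAL TABLE ftsentry USING FTS4 (content)")
docs = [
    "This is how I rewrote everything with Python",
    "Why ORM is a terrible idea",
    "Using python and k-means to find the dominant colors in images",
    "Saturday morning hack: personalized news digest with boolean query parser in Python",
]
conn.executemany("INSERT INTO ftsentry (content) VALUES (?)", [(d,) for d in docs])

rows = conn.execute(
    "SELECT docid, fts_rank(matchinfo(ftsentry)) AS score "
    "FROM ftsentry WHERE ftsentry MATCH ? ORDER BY score",
    ("python",),
).fetchall()
for docid, score in rows:
    print(docid, round(score, 2))
```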

Here is how we order search results by relevance, first with `rank()` and then with `bm25()`:

In [8]:
# Sorting by relevance using the rank() algorithm.
# The first result returned contains the best match.
query = (Entry
         .select(Entry, FTSEntry.rank().alias('score'))
         .join(FTSEntry, on=(Entry.id == FTSEntry.docid))
         .where(FTSEntry.match('python'))
         .order_by(FTSEntry.rank()))

for entry in query:
    print("Title:", entry.title, "| Score:", round(entry.score, 2))

Title: This is how I rewrote everything with Python | Score: -0.33
Title: Using python and k-means to find the dominant colors in images | Score: -0.33
Title: Saturday morning hack: personalized news digest with boolean query parser in Python | Score: -0.33
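Why do all three `rank()` scores tie? Assuming `rank()` implements the matchinfo-based formula from the SQLite FTS3/FTS4 documentation, negated so that an ascending sort puts the best match first, the score of a row $d$ for a query with $p$ phrases over $c$ columns is

$$
\operatorname{score}(d) = -\sum_{i=1}^{p} \sum_{j=1}^{c} \frac{x_{ij}(d)}{X_{ij}}
$$

where $x_{ij}(d)$ is the number of times phrase $i$ occurs in column $j$ of row $d$, and $X_{ij}$ is the number of times it occurs in that column across all rows. For the query `python` there is one phrase and one column; each of the three matching entries contains "python" exactly once, and the word occurs three times in the whole table, so every match gets

$$
\operatorname{score}(d) = -\frac{1}{3} \approx -0.33
$$

and this ranking cannot tell them apart.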


In [14]:
# Sorting by relevance using the bm25() algorithm.
# The first result returned contains the best match.
query = (Entry
         .select(Entry, FTSEntry.bm25().alias('score'))
         .join(FTSEntry, on=(Entry.id == FTSEntry.docid))
         .where(FTSEntry.match('python'))
         .order_by(FTSEntry.bm25()))

for entry in query:
    print("Title:", entry.title, "| Score:", round(entry.score, 2))

Title: This is how I rewrote everything with Python | Score: -0.29
Title: Saturday morning hack: personalized news digest with boolean query parser in Python | Score: -0.23
Title: Using python and k-means to find the dominant colors in images | Score: -0.19
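Unlike `rank()`, BM25 also accounts for document length and for how many rows contain a term, which is why it separates the three entries. Here is a sketch of computing it from `matchinfo(..., 'pcnalx')`, assuming the textbook Okapi BM25 formula with the common defaults k1 = 1.2 and b = 0.75, summed over columns and negated as above. This is not necessarily line-for-line what `search_example.py` or Peewee does, and the name `fts_bm25` is an assumption. Because every matching entry contains "python" once and shares the same IDF, the ordering here is driven by length: shorter entries score higher, which agrees with the order printed above.

```python
import math
import sqlite3
import struct

K1 = 1.2  # term-frequency saturation (assumed common default)
B = 0.75  # strength of document-length normalisation (assumed common default)


def fts_bm25(raw_matchinfo):
    """Okapi BM25 from matchinfo(..., 'pcnalx'), negated for ascending sorts."""
    info = struct.unpack("@%dI" % (len(raw_matchinfo) // 4), raw_matchinfo)
    n_phrases, n_cols, n_rows = info[0], info[1], info[2]
    avg_len_at = 3                    # 'a': average tokens per column (n_cols values)
    row_len_at = avg_len_at + n_cols  # 'l': tokens in this row per column (n_cols values)
    hits_at = row_len_at + n_cols     # 'x': 3 values per (phrase, column) pair
    score = 0.0
    for phrase in range(n_phrases):
        for col in range(n_cols):
            x = hits_at + 3 * (phrase * n_cols + col)
            tf = info[x]                  # hits of this phrase in this row and column
            docs_with_term = info[x + 2]  # rows with at least one hit in this column
            idf = math.log((n_rows - docs_with_term + 0.5) / (docs_with_term + 0.5))
            idf = max(idf, 1e-6)  # clamp so very common terms never lower the score
            length_ratio = info[row_len_at + col] / (info[avg_len_at + col] or 1)
            denom = tf + K1 * (1 - B + B * length_ratio)
            score += idf * tf * (K1 + 1) / denom
    return -score


docs = [
    "This is how I rewrote everything with Python\nBlah blah blah, type system, channels, blurgh",
    "Why ORM is a terrible idea\nBlah blah blah, leaky abstraction, impedance mismatch",
    "What is the relation between Javascript and Canvas\nHTML5 features the <canvas> element "
    "that allows you to draw 2D graphics using JavaScript",
    "Nokia Snake with JavaScript + Canvas\nI thought I'd re-create the Nokia Snake game "
    "(a distant relative of Nibbles) using JavaScript and the canvas element",
    "Using python and k-means to find the dominant colors in images\nMost images are an RGB "
    "array where we can easily apply K-Means Clustering. The Centers of each cluster would be "
    "the most dominant colors of the image",
    "Saturday morning hack: personalized news digest with boolean query parser in Python\n"
    "Occasionally I stumble on fascinating content and that's what keeps me coming back",
    "Migrating from SQLite\nThis command instructs pgloader to load data from a SQLite file",
]

conn = sqlite3.connect(":memory:")
conn.create_function("fts_bm25", 1, fts_bm25)
conn.execute("CREATE VIRTUAL TABLE ftsentry USING FTS4 (content)")
conn.executemany("INSERT INTO ftsentry (content) VALUES (?)", [(d,) for d in docs])

rows = conn.execute(
    "SELECT docid, fts_bm25(matchinfo(ftsentry, 'pcnalx')) AS score "
    "FROM ftsentry WHERE ftsentry MATCH ? ORDER BY score",
    ("python",),
).fetchall()
for docid, score in rows:
    print(docid, round(score, 2))
```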
