[PERF] Make the index correctly use FTS (#958)
## Description of changes
Previously, we were not querying the FTS index correctly. The FTS5 query
syntax (https://sqlite.org/fts5.html#full_text_query_syntax) expects the
query to be issued against the name of the FTS table, not against a column
name. To restrict a query to one column, you have to use a column filter,
as described in the link above. We take the approach suggested in
https://sqlite.org/forum/forumpost/1d45a7f6e17a3460 and match on id in
addition to filtering on that specific column. EXPLAIN confirms that the
query planner now uses the index.
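
As a rough illustration of the difference, here is a minimal sketch using Python's built-in `sqlite3` module. It assumes an SQLite build with FTS5 enabled. The table `docs_fts`, its columns, and the sample rows are hypothetical, not taken from this change. It compares the plan for a delete that filters only on a column with one that also matches on the table name, and shows how a column filter narrows a match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical FTS5 table; the real table and column names may differ.
conn.execute("CREATE VIRTUAL TABLE docs_fts USING fts5(id, string_value)")
rows = [(str(i), "lorem ipsum") for i in range(100)]
rows[7] = ("7", "the answer is 42")  # the token 42 also appears in another row's text
conn.executemany("INSERT INTO docs_fts (id, string_value) VALUES (?, ?)", rows)


def plan(sql, params):
    """Return the EXPLAIN QUERY PLAN detail strings for a statement."""
    return [r[3] for r in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


# Filtering on a column alone vs. additionally matching on the table name.
by_column = "DELETE FROM docs_fts WHERE id = ?"
by_table = "DELETE FROM docs_fts WHERE id = ? AND docs_fts = ?"

column_plan = plan(by_column, ("42",))
table_plan = plan(by_table, ("42", "42"))
print(column_plan)
print(table_plan)

# A bare term matches in any column; a column filter restricts it to one.
match_sql = "SELECT id FROM docs_fts WHERE docs_fts MATCH ?"
any_column = {r[0] for r in conn.execute(match_sql, ("42",))}
id_only = {r[0] for r in conn.execute(match_sql, ("id : 42",))}

# The combined predicate narrows by the index, then the id filter removes one row.
conn.execute(by_table, ("42", "42"))
remaining = conn.execute("SELECT count(*) FROM docs_fts").fetchone()[0]
```

The exact wording of the plan strings depends on the SQLite version, so the sketch prints them rather than relying on a particular format.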

The write path issues speculative delete queries on the assumption that
the index is used. Because it was not, each of those deletes ran a full
table scan, which made writes very slow. With the index in use, they are
much faster.

EXPLAIN before:
```-- SCAN VIRTUAL TABLE INDEX 0:``` -> full table scan.

EXPLAIN after:
```-- SCAN VIRTUAL TABLE INDEX 0:M2``` -> scans the index itself.

The net effect is a large increase in write speed, and the write path
time no longer grows linearly with table size (see the benchmark below).

### Quick Benchmark Results
N = 100k uniformly random vectors
D = 128
Metadata = one small key: value pair
Document = randomly generated string of length 100

Added with batch size = 1000

**Without Fix, Overall Time = 469s. Time to add a batch grows linearly
to >8000 ms**
<img width="590" alt="Screenshot 2023-08-09 at 5 53 24 PM"
src="https://github.com/chroma-core/chroma/assets/5598697/89dde745-9231-4f3f-b62c-bf8486f7e970">

**With Fix, Overall Time = 102s. Time to add a batch grows sublinearly
to ~1200 ms**
<img width="587" alt="Screenshot 2023-08-09 at 5 43 12 PM"
src="https://github.com/chroma-core/chroma/assets/5598697/2a771788-e5d9-4afe-bacb-dfbfb51b6cd1">
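
For a quick local sanity check, the delete-then-insert pattern can be timed directly against SQLite at a much smaller scale. The following is a minimal sketch, not the benchmark used above. The `time_batches` helper, the `docs_fts` table, and the record counts are all hypothetical, and it exercises only an FTS5 table, not vectors or metadata.

```python
import sqlite3
import time


def time_batches(delete_sql, n_params, n_records=1000, batch_size=100):
    """Insert records in batches, issuing a speculative delete before each insert.

    Returns the wall-clock time in seconds taken by each batch.
    """
    conn = sqlite3.connect(":memory:")
    # Hypothetical FTS5 table used only for this timing sketch.
    conn.execute("CREATE VIRTUAL TABLE docs_fts USING fts5(id, string_value)")
    timings = []
    for start in range(0, n_records, batch_size):
        t0 = time.perf_counter()
        for i in range(start, start + batch_size):
            key = str(i)
            # Speculative delete: remove any existing entry for this id first.
            conn.execute(delete_sql, (key,) * n_params)
            conn.execute(
                "INSERT INTO docs_fts (id, string_value) VALUES (?, ?)",
                (key, "lorem ipsum dolor"),
            )
        conn.commit()
        timings.append(time.perf_counter() - t0)
    conn.close()
    return timings


# Column-only predicate vs. the additional table-name match.
scan_timings = time_batches("DELETE FROM docs_fts WHERE id = ?", 1)
index_timings = time_batches(
    "DELETE FROM docs_fts WHERE id = ? AND docs_fts = ?", 2
)

for n, (a, b) in enumerate(zip(scan_timings, index_timings)):
    print(f"batch {n}: column-only {a * 1000:.1f} ms, with table match {b * 1000:.1f} ms")
```

Absolute numbers depend on the machine and the SQLite build; the point of interest is how the per-batch time changes as the table fills up.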

We will also want to make sure that the read path queries the index this
way. That will be addressed in a follow-up PR.

## Test plan
Existing tests cover the scope of this change.

## Documentation Changes
None required.
HammadB committed Aug 10, 2023
1 parent 9fb7ad6 commit cdb588b
Showing 1 changed file with 2 additions and 3 deletions.
5 changes: 2 additions & 3 deletions chromadb/segment/impl/metadata/sqlite.py
@@ -22,7 +22,7 @@
 WhereOperator,
 )
 from uuid import UUID
-from pypika import Table, Tables
+from pypika import Table, Tables, Field
 from pypika.queries import QueryBuilder
 import pypika.functions as fn
 from pypika.terms import Criterion, Function
@@ -140,8 +140,6 @@ def get_metadata(
 q = q.where(
 self._where_doc_criterion(q, where_document, embeddings_t, fulltext_t)
 )
-pass
-# q = self._where_document_query(q, where_document, embeddings_t, fulltext_t)

 if ids:
 q = q.where(embeddings_t.embedding_id.isin(ParameterValue(ids)))
@@ -247,6 +245,7 @@ def _update_metadata(self, cur: Cursor, id: int, metadata: UpdateMetadata) -> No
 self._db.querybuilder()
 .from_(t)
 .where(t.id == ParameterValue(id))
+.where(Field(t.get_table_name()) == ParameterValue(id))
 .delete()
 )
 sql, params = get_sql(q)