feat: sqlite full text search #177

Open
wants to merge 3 commits into
base: main

will7200
Contributor

Closes #103

This aims to be a drop-in replacement for Whoosh, with little to no API change for our customers.

  1. Uses FTS4 rather than FTS5, because FTS5 requires some escaping of the search terms.
  2. The underlying search implementation strips certain characters during tokenization, so characters that might be of value to the customer are lost. To compensate, the mask and filter operations now accept a callable, which lets users control their search more precisely. The supported operations are the sqlite3 column operators, wrapped in Pythonic operations.
    Examples:
    db.search("lollipop", mask={"product": lambda col: col == 'ZEBRA-2'})
    db.search("lollipop", mask={"product": lambda col: col.like('%ZEBRA-2%')})
    db.search("lollipop", mask={"product": lambda col: col.in_(['ZEBRA-2', 'ZEBRA-1'])})
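As a rough illustration of how callables like the ones above could be turned into SQL conditions, here is a minimal sketch. `Column`, `build_where`, `select_products`, and the `activities` table are hypothetical names made up for this example and are not the code in this PR. It uses a plain table instead of an FTS4 table to stay self-contained, and it assumes that a mask excludes the rows its condition matches.

```python
import sqlite3


class Column:
    """Hypothetical stand-in for the column object passed to a mask/filter callable."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        # col == value  ->  "name" = ?
        return (f'"{self.name}" = ?', [other])

    def like(self, pattern):
        # col.like(pattern)  ->  "name" LIKE ?
        return (f'"{self.name}" LIKE ?', [pattern])

    def in_(self, values):
        # col.in_(values)  ->  "name" IN (?, ?, ...)
        values = list(values)
        placeholders = ", ".join("?" for _ in values)
        return (f'"{self.name}" IN ({placeholders})', values)


def build_where(conditions, negate=False):
    """Combine {column name: callable} pairs into a WHERE fragment plus parameters."""
    clauses, params = [], []
    for name, make_condition in conditions.items():
        sql, values = make_condition(Column(name))
        clauses.append(f"NOT ({sql})" if negate else f"({sql})")
        params.extend(values)
    return " AND ".join(clauses), params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE activities (name TEXT, product TEXT)")
conn.executemany(
    "INSERT INTO activities VALUES (?, ?)",
    [("lollipop", "ZEBRA-1"), ("lollipop", "ZEBRA-2"), ("lollipop", "GIRAFFE-7")],
)


def select_products(conditions, negate=False):
    """Return the products left after applying the conditions (negate=True acts as a mask)."""
    where, params = build_where(conditions, negate)
    query = f"SELECT product FROM activities WHERE {where} ORDER BY product"
    return [row[0] for row in conn.execute(query, params)]


# Assumed mask semantics: rows matching the condition are dropped.
masked = select_products({"product": lambda col: col == "ZEBRA-2"}, negate=True)
```

Because the values are passed as bound parameters, characters such as the hyphen in `ZEBRA-2` reach SQLite unchanged instead of going through the full-text tokenizer.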


codecov bot commented Jul 23, 2024

Codecov Report

Attention: Patch coverage is 90.00000% with 16 lines in your changes missing coverage. Please review.

Project coverage is 83.24%. Comparing base (802dafe) to head (0c61d18).
Report is 14 commits behind head on main.

Files                        Patch %   Lines
bw2data/updates.py           22.22%    7 Missing ⚠️
bw2data/search/indices.py    95.95%    4 Missing ⚠️
bw2data/search/schema.py     87.87%    4 Missing ⚠️
bw2data/search/search.py     94.44%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #177      +/-   ##
==========================================
+ Coverage   83.07%   83.24%   +0.17%     
==========================================
  Files          39       39              
  Lines        3609     3730     +121     
==========================================
+ Hits         2998     3105     +107     
- Misses        611      625      +14     


@cmutel
Member

cmutel commented Jul 23, 2024

@will7200 Awesome! One small comment which is an easy fix, and then fix some test failures, and we should be GTG!

On later Python versions (> 3.11), the weights constructed for the search function are invalid because they include the primary identifier as an argument to `matchinfo`.
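To make the role of per-column weights concrete, here is a minimal sketch of a ranking function over the blob that FTS4's `matchinfo()` returns in its default `pcx` format. `parse_matchinfo` and `rank` are hypothetical helpers written for this example, not the functions in this PR. The length check reflects an assumption about what "invalid" means here: one weight is expected per column that `matchinfo` reports, so an extra entry for an identifier column would not line up.

```python
import struct


def parse_matchinfo(blob):
    """Decode a matchinfo() blob into unsigned 32-bit integers (native byte order)."""
    count = len(blob) // 4
    return struct.unpack(f"={count}I", blob[: count * 4])


def rank(blob, *weights):
    """Score a row from 'pcx' matchinfo data; more negative means a better match."""
    info = parse_matchinfo(blob)
    phrases, columns = info[0], info[1]
    if len(weights) != columns:
        # Assumption: exactly one weight per column reported by matchinfo.
        raise ValueError(f"expected {columns} weights, got {len(weights)}")
    score = 0.0
    for phrase in range(phrases):
        base = 2 + phrase * columns * 3
        for col in range(columns):
            offset = base + col * 3
            hits_in_row, hits_in_all_rows = info[offset], info[offset + 1]
            if hits_in_row > 0:
                score += weights[col] * hits_in_row / hits_in_all_rows
    # Negated so that an ascending ORDER BY puts the best match first.
    return -score


# One phrase, two columns: column 0 has 2 hits in this row out of 4 overall,
# column 1 has no hits.
example_blob = struct.pack("=8I", 1, 2, 2, 4, 1, 0, 0, 0)
example_score = rank(example_blob, 1.0, 0.5)
```

A function like this could be registered with `sqlite3.Connection.create_function` and called in a query as `rank(matchinfo(table_name), ...)`, with the weights listed in the same order as the table's columns.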
@will7200
Contributor Author

@cmutel Fixed that and found the underlying cause of those test failures. The last two remaining Windows failures are due to:

PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\\Users\\RUNNER~1\\AppData\\Local\\Temp\\tmpgwrprhc7\\dxbqkfxeszgknljeyz.46382e12\\lci\\databases.db'

This looks like it has also happened in past GitHub runs.
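For context, a common cause of `WinError 32` with SQLite files is a connection that is still open when the temporary directory is removed. Whether that is what happens in these CI runs is not confirmed. The sketch below shows the general pattern of closing the connection before cleanup; `write_and_clean_up` is a hypothetical name and the table is a placeholder.

```python
import os
import sqlite3
import tempfile
from contextlib import closing


def write_and_clean_up():
    """Create a database in a temporary directory, then remove the directory."""
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "databases.db")
        # `with sqlite3.connect(...)` alone only manages the transaction;
        # closing() makes sure the file handle is actually released.
        with closing(sqlite3.connect(db_path)) as conn:
            conn.execute("CREATE TABLE t (x INTEGER)")
            conn.commit()
        # The connection is closed here, so removing the directory should not
        # run into an open handle on the database file.
    return os.path.exists(tmp)


still_exists = write_and_clean_up()
```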

Successfully merging this pull request may close these issues.

D2 Use the SQLite text search instead of a separate Whoosh search index
2 participants