Allow dataclip body to be searched as text (tsvector) #1939

Closed
Tracked by #1803
christad92 opened this issue Mar 26, 2024 · 0 comments · Fixed by #1941

Comments

@christad92

We recently added a GIN index to dataclips.body, but the search query still treats the body as a string, so the index is not used.

For a GIN index on a JSONB column to be used, the query must use a JSON operator (@>, @@, @?, ...).

So to search dataclips as text (and stay performant), we need to:

  1. Cast the JSON as a tsvector.
    Choose the text search dictionary carefully so that JSON syntax is ignored.
  2. Store the vector in a new column (body_search_vector?).
  3. Add a GIN index to the new column.
  4. Change the history page query to use the search vector instead of casting the body to a string.
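
The four steps above could be sketched roughly as the SQL statements below, built here as plain Python strings so they can be inspected without a database. This is only an illustration, not the change that landed in #1941: the `body_search_vector` column name is the tentative one from step 2, while the index name, the `simple` text search configuration, the `["string", "numeric"]` filter, and the `$1` placeholder are assumptions. `jsonb_to_tsvector` is a PostgreSQL built-in (version 11 and later) that indexes only the selected kinds of JSON values, so braces, quotes, and commas never reach the vector.

```python
def search_vector_statements(
    table: str = "dataclips",
    source: str = "body",
    target: str = "body_search_vector",  # tentative name from the issue
    config: str = "simple",              # assumed: no stemming, no stop words
) -> list[str]:
    """Return the SQL for the four steps, in order. Nothing is executed."""
    index = f"{table}_{target}_idx"  # hypothetical index name
    return [
        # Step 2: new column that stores the vector.
        f"ALTER TABLE {table} ADD COLUMN {target} tsvector",
        # Step 1: build the vector from JSON string and numeric values only.
        f"UPDATE {table} SET {target} = "
        f"jsonb_to_tsvector('{config}', {source}, '[\"string\", \"numeric\"]')",
        # Step 3: GIN index on the new column.
        f"CREATE INDEX {index} ON {table} USING GIN ({target})",
        # Step 4: search with @@ against the vector instead of casting to text.
        f"SELECT id FROM {table} "
        f"WHERE {target} @@ plainto_tsquery('{config}', $1)",
    ]


if __name__ == "__main__":
    for statement in search_vector_statements():
        print(statement + ";")
```

Keeping the same configuration name on both the `jsonb_to_tsvector` and `plainto_tsquery` sides matters, because the stored lexemes and the query lexemes need to be normalized the same way for a match.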

NOTE: Regarding dictionary choice, we need to make sure it accommodates a variety of search patterns. Please document at least 5 different JSON document structures and/or search string variations. With tsvector it's entirely possible that compound words (camelCase) and other "not English words" will be ignored.
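
One way to build the list of search string variations requested above is to generate them from a sample key or value, then try each against the chosen dictionary. The helper below is a hypothetical pure-Python sketch: it does not reproduce PostgreSQL's text search parser, and the function names and sample terms are made up for illustration.

```python
import re


def split_compound(term: str) -> list[str]:
    """Split camelCase, snake_case, or kebab-case into lowercase parts."""
    # Insert a space at each lower-to-upper boundary (patientId -> patient Id).
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", term)
    # Split on anything that is not a letter or digit and drop empty pieces.
    return [part.lower() for part in re.split(r"[^A-Za-z0-9]+", spaced) if part]


def search_variations(term: str) -> list[str]:
    """Return the term as written, lowercased, and split into its parts."""
    variants = [term, term.lower(), *split_compound(term)]
    # Remove duplicates while keeping the original order.
    return list(dict.fromkeys(variants))


if __name__ == "__main__":
    for sample in ["patientId", "first_name", "HTTP-error"]:
        print(sample, "->", search_variations(sample))
```

Each variant that fails to match a dataclip containing the original term points to a gap in the dictionary choice worth documenting.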

@christad92 christad92 added the Launch MVP Features that are critical to launching the MVP label Mar 26, 2024
@taylordowns2000 taylordowns2000 self-assigned this Mar 26, 2024
taylordowns2000 added a commit that referenced this issue Mar 27, 2024
* make key DB options configurable via ENV

* allow the delete query to take 100s

* remove unused jsonb index on dataclips to make insert/delete faster, see #1939

* use envy

* remove unused

* update changelog

* always set queue target and interval

* disable ddl trans, migration lock; add indexes to steps
taylordowns2000 added a commit that referenced this issue Mar 29, 2024
taylordowns2000 added a commit that referenced this issue Apr 1, 2024
* make key DB options configurable via ENV

* use envy

* remove unused

* add tests to lock in current functionality

* fix rejected status

* fix #1794, close #1939

* format

* changelog