Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for PostgreSQL to_tsvector function with multiple fields #884

Closed
wants to merge 0 commits into from

Conversation

Charnelx
Copy link

@Charnelx Charnelx commented Mar 9, 2018

Hi.
I added SearchVectorFilter to support to_tsvector function for full text search. Theare two problems I faced:

  • this thing is vendor-specific (yep, only PostgreSQL)
  • to search by multiple fields we need to create one more argument (e.x. field_names) or use existing field_name with additional type check

Pros:

  • fast search
  • search by multiple fields at one's

Cons:

  • used only with PostgreSQL
  • dual meaning of field_name argument wich may lead to misunderstanding and mistakes

Case of use:

models.py

class Person(models.Model):

    name = models.CharField(max_length=128, verbose_name='person name')
    surname = models.CharField(max_length=128, verbose_name='person surname')

views.py

class PersonFilter(filters.FilterSet):
    person = SearchVectorFilter(field_names=['name', 'surname'], lookup_expr='icontains')

Copy link
Collaborator

@rpkilby rpkilby left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi. I've added more detailed feedback below, but in short, overloading field_name isn't the correct thing to do here. Instead, extend the Filter to use additional arguments.

Also, tests will need to be provided.

@@ -78,8 +78,19 @@ def get_field_parts(model, field_name):
>>> [p.verbose_name for p in parts]
['author', 'first name']

>>> parts = get_field_parts(Book, [author__first_name, 'surname'])
>>> [p.verbose_name for p in parts]
['author', 'first_name', 'surname']
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_field_parts is not intended to return a list of field names, but a list of the individual parts of a related field name (FK, m2m, 1to1). This result would be nonsensical, as first_name is a regular field on the author model, and does not represent a relationship to an additional Name model or something.

Either way, as explained above, overloading field_name is not the correct path, and these changes are a symptom of that. I would revert these changes.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You truth, I have read method descption about "the traversable relationships" and..ignored it.

@@ -484,6 +486,38 @@ class TimeRangeFilter(RangeFilter):
field_class = TimeRangeField


class SearchVectorFilter(Filter):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would inherit CharFilter, as you are validating text.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reasonably and thus accepted.

@@ -484,6 +486,38 @@ class TimeRangeFilter(RangeFilter):
field_class = TimeRangeField


class SearchVectorFilter(Filter):

Copy link
Collaborator

@rpkilby rpkilby Mar 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of overloading the purpose of field_name, create a custom __init__ that requires users to pass in a list of search fields. eg,

# Note that this provides a default `field_name` to be used for the annotation
# and query, and that `search_fields` will be a required keyword argument.
def __init__(self, field_name='search_vector', lookup_expr='exact', *, search_fields, **kwargs):
    super().__init__(field_name, lookup_expr, **kwargs)

    self.search_fields = search_fields

From here, it's really simple to overload the .filter() method to add in the annotation

def filter(self, qs, value):
    # Note that the annotation is operating on the filter's `field_name`, which was
    # set above to 'search_vector' as the filter's default value.
    qs.annotate(self.field_name=SearchVector(*self.search_fields))
    
    return super().filter(qs, value)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this was the way it should be implemented. Thanks.

@Charnelx
Copy link
Author

@rpkilby great thanks for detailed response!
Well, adding additional argument to Filter was my first idea but I thought that I would be able to make something that not require any external change at all. Now I see the failure of this approach.

I'll take into account your remarks and make another try + will write tests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants