Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed #27589 -- Added managed triggers for SearchVectorField. #7678

Closed
wants to merge 8 commits into from
Closed

Fixed #27589 -- Added managed triggers for SearchVectorField. #7678

wants to merge 8 commits into from

Conversation

eukreign
Copy link
Contributor

@eukreign eukreign commented Dec 11, 2016

This is a significantly enhanced version of django.contrib.postgresql.search.SearchVectorField with automatically generated and managed database trigger.

Remaining todo's:

  • documentation
  • only update the tsvector column if the fields on which it depends have changed (need to add a conditional in the pl/pgsql function check if relevant columns have changed)
  • instead of hard coding language in the field definition, also support column name

There are probably a lot of other things that will need to be done. Please provide feedback, thanks!

https://code.djangoproject.com/ticket/27589

@timgraham timgraham changed the title SearchVectorField with built-in managed triggers Fixed #27589 -- Added managed triggers for SearchVectorField. Dec 12, 2016
@timgraham
Copy link
Member

It would be appropriate to post on the django-developers mailing list to try to get feedback from people who are using the field.

@adamchainz
Copy link
Sponsor Member

  1. This is making django.contrib.postgres a requirement for using the Django postgres backend, which it isn't at current (afaik)

  2. This is also the first instance I know of triggers being touched by Django. Typically Django has stayed away from them, preferring such things to be done in Python code with signals etc. I don't know enough here to see why a trigger is preferable to a pre_save signal, but I'd like to see the justification.

I'd also like to know if it is possible to manage the trigger as objects on their own, maybe using Expressions, with their own separate, concrete migration operations, etc. Adding a code to the postgres schema editor that manages one specific trigger seems to me to move us away from this laudable goal.

Another thing about adding triggers is that it changes backup procedure. Idk if it's common to exclude them on postgres (pg_dump seems to include them by default?), but on MySQL the main backup utilities mysqldump and mydumper don't dump triggers by default. If Django starts managing triggers, it has to be really clearly communicated that backups now need to include them, otherwise a lot of users will be very unhappy when restoring their backups.

@charettes
Copy link
Member

As Adam mentioned there's a lot of reason to believe a trigger based approach is not always the best solution.

Is there a reason this cannot live as a third-party package? For the trigger creation and removal part you could do without a custom database backend by hooking into the pre_migrate signal and injecting some AddTSVectorTrigger and RemoveTSVectorTrigger operations when detecting an AddField or RemoveField operation the plan.

@eukreign
Copy link
Contributor Author

The primary reason I added this is so that the SearchVectorField() could be more of a black box. You add it to your model, make/run migrations and "it just works".

On the other hand, I can't think of anything a database trigger solves that is otherwise impossible to do with a pre_save signal in Python. Although, I think most pre_save scenarios would require at least one extra query to perform whatever action you want.

In cases where you want to cut down on the number of queries or how much data you are sending to postgres then stored procedures become very useful.

I'm open to doing a more general purpose stored procedure building system. But I don't think that should be a blocker for this feature. The fact that the underlying code for the new SearchVectorField is very specialized is an implementation detail. Once a general purpose procedure framework exists we could replace the current specialized version and the SearchVectorField wouldn't know the difference.

@eukreign
Copy link
Contributor Author

Can we keep this pull request open a bit longer to get more feedback. If it turns out that it's not worth to add this to contrib.postgres I can definitely do it as a 3rd party package.

@charettes Thanks for the tip about the pre_migrate, I didn't know about that.

@charettes
Copy link
Member

@eukreign there's an example of the pre_migrate approach in django.contrib.contenttypes and #7673.

@eukreign
Copy link
Contributor Author

Thanks for all of the feedback. I've decided to close the pull request. Using pre_save is probably good enough for most people. I may switch to that approach as well.

@eukreign eukreign closed this Dec 12, 2016
@timgraham
Copy link
Member

Could you update (close?) the ticket and mailing list thread also?

@eukreign
Copy link
Contributor Author

Actually, is it possible to do a INSERT with a single statement? I setup a pre_save I can't figure out a way to use the SearchVector() this way...

from django.contrib.postgres.search import SearchVector
@receiver(pre_save, sender=Task)
def task_search_vector(sender, instance, **kwargs):
    instance.search = SearchVector('name', 'description')

When I save a Task, it throws the following exception:

  File "/home/lex/projects/django/django/db/models/sql/compiler.py", line 931, in prepare_value
    'can only be used to update, not to insert.' % (value, field)
ValueError: Failed to insert expression "SearchVector(Coalesce(Col(task_task, task.Task.name), Value()) || ' ' || Coalesce(Col(task_task, task.Task.description), Value()))" on task.Task.search. F() expressions can only be used to update, not to insert.

Maybe the trigger approach is still useful?

@eukreign eukreign reopened this Dec 12, 2016
@eukreign eukreign closed this Dec 12, 2016
@jarshwah
Copy link
Member

@eukreign yes, expressions can be used for INSERT as of https://code.djangoproject.com/ticket/24509 / 134ca4d

@eukreign
Copy link
Contributor Author

@jarshwah as you can see from the traceback above it does not seem to work, at least not for SearchVector, the traceback is from current master version of django.

@adamchainz
Copy link
Sponsor Member

@eukreign F expressions explicitly can't be used for INSERT as they don't make sense, since there is no row to select the value out of.

@jarshwah
Copy link
Member

Another thing to consider is that bulk changes do not fire signals per instance.

@eukreign
Copy link
Contributor Author

I have created a separate package for this for anyone interested:

https://github.com/damoti/django-tsvector-field

@eukreign eukreign deleted the tsvector_trigger branch December 13, 2016 16:48
@charettes
Copy link
Member

@eukreign great work!

I think you might be interested in this DEP about read-only fields. This feature would allow making your SearchVectorField refresh automatically on save on PostgreSQL by relying on RETURNING.

@eukreign
Copy link
Contributor Author

Definitely looking forward to the readonly field attribute. I've created an issue in django-tsvector-field to track this.

As far as RETURNING, I would actually want to go in the opposite direction. I would like to make the field deferred by default. The contents of the tsvector is an implementation detail in postgres, it's kind of pointless to keep shuttling this back and forth between Postgres and Django.

Is there a good way to make a field deferred by default?

@charettes
Copy link
Member

As far as RETURNING, I would actually want to go in the opposite direction. I would like to make the field deferred by default. The contents of the tsvector is an implementation detail in postgres, it's kind of pointless to keep shuttling this back and forth between Postgres and Django.

You should mention that on the DEP, maybe there's a use case for always deferring such fields.

Is there a good way to make a field deferred by default?

Not yet unfortunately.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants