Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed #31071 -- Disabled insert optimization for primary keys with defaults when loading fixtures. #12209

Merged
merged 2 commits into from Dec 30, 2019

Conversation

charettes
Copy link
Member

No description provided.

@felixxm felixxm self-assigned this Dec 12, 2019
@reupen
Copy link

reupen commented Dec 12, 2019

👍

Does the 'How Django knows to UPDATE vs. INSERT' section in the docs also need an update?

You may have noticed Django database objects use the same ``save()`` method
for creating and changing objects. Django abstracts the need to use ``INSERT``
or ``UPDATE`` SQL statements. Specifically, when you call ``save()``, Django
follows this algorithm:
* If the object's primary key attribute is set to a value that evaluates to
``True`` (i.e., a value other than ``None`` or the empty string), Django
executes an ``UPDATE``.
* If the object's primary key attribute is *not* set or if the ``UPDATE``
didn't update anything (e.g. if primary key is set to a value that doesn't
exist in the database), Django executes an ``INSERT``.

@felixxm felixxm changed the title Fixed #31071 -- Disabled default primary insert optimization when loading fixtures. Fixed #31071 -- Disabled insert optimization for primary keys with defaults when loading fixtures. Dec 12, 2019
docs/releases/3.0.txt Outdated Show resolved Hide resolved
@felixxm
Copy link
Member

felixxm commented Dec 12, 2019

Does the 'How Django knows to UPDATE vs. INSERT' section in the docs also need an update?

There is already a warning:

"The one gotcha here is that you should be careful not to specify a primary-key value explicitly when saving new objects, if you cannot guarantee the primary-key value is unused."

so IMO versionchanged annotation will do the job.

@felixxm
Copy link
Member

felixxm commented Dec 12, 2019

@charettes Thanks for this patch 👍 I updated docs.

docs/releases/3.0.txt Outdated Show resolved Hide resolved
Copy link
Member

@carltongibson carltongibson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @charettes, @felixxm.

I'm going to say this here, so you can say, "No" at your leisure. 🙂

I think this might be a mistake...

From the ticket, with the default in place, I think a lot of folks are doing this:

s0 = Sample.objects.create()
s1 = Sample(pk=s0.pk, name='Test 1')
s1.save()

...and so are going to update to broken code.

Yes, they can/could put force_update in, but the line on that is this:

It should be very rare that you’ll need to use these parameters. Django will almost always do the right thing and trying to override that will lead to errors that are difficult to track down. This feature is for advanced use only.

I think here we're saying "Despite the claim that you'll rarely need them, you're going to have to use these flags" for code that is quite common, has been working correctly since Day 0. I think a lot of folks will bump into that, and that it's no longer true that "Django will almost always do the right thing"

I accept that "it does reduce the number of queries significantly when using primary key defaults" (Simon from ticket) but I think we're putting that performance over correct-by-default.

SO... 😬 I think we should revert 85458e9. We could then reopen ticket-29260 but, Tim's original comment there:

A fix might try to detect if the primary key came from a default...

5 seconds thinking about that, and I'm back to (the preceding):

I'm not sure if the issue can or should be fixed.

— it's just not going to be worth the code. (Maybe there is some way of tracking it that's clean but...)

I'd instead say, "use force_insert" in this case.

(Why's that better than the "use force_update" here? Because we're not breaking anything, and the default behaviour, whilst having a redundant query, is still correct.)

Thus I would say ticket-29260 should ultimately be wontfix.

So, have a think...


If the two of you decide that this (i.e. this PR) is the way forward, then OK.

I have a couple of comments:

  • I think the How Django knows to UPDATE vs. INSERT docs are no longer correct. (Probably should have been adjusted for 85458e9.) They'll need at least addition of a second gotcha, but, as I read it, the first "If the object’s primary key attribute is set to a value that evaluates to True..." is no longer true.
  • Then in the release notes, I think we should mention force_update. update_or_create() is very good, yes, but it's an entirely different API to the code we've broken, so we should explicitly tell folks the smallest change that fixes it.

(Wincing about this. Loop back to top. I leave it with you.)

@charettes
Copy link
Member Author

charettes commented Dec 12, 2019

...and so are going to update to broken code.

Only in the cases where they've defined a primary key with a default value. Most of the cases where an implicit id or explicit AutoField subclass won't be affected by that.

I don't have a strong feeling either way even if I think the changes proposed here makes the most sense as I've explained on Trac. It feels like in order to properly implement this optimization we'd need a way to differentiate between a user provided primary key value and default generated one as Tim mentioned.

One of the reason I think this is that doing a SELECT followed by an INSERT when a default value is provided simply doesn't make sense when dealing with database generated defaults (#11783). For example, that would result in

SELECT id FROM table WHERE id = UUID()

There's no row so we'll proceed with an INSERT with a completely different UUID() return value.

This behavior will have to be changed to perform a complex query of the form

SELECT UUID() as generated_id, EXISTS(SELECT 1 FROM table WHERE id = generated_id)

Which certainly not what a user would expect a model of the form

class Order(models.Model):
    id = models.UUIDField(primary_key=True, default=UUID)

to do when they perform an Order().save().

We'll have to figure this one out in the near future but shipping an half baked solution is likely not best way forward.

@carltongibson
Copy link
Member

carltongibson commented Dec 12, 2019

Only in the cases where they've defined a primary key with a default value. Most of the cases where an implicit id or explicit AutoField subclass won't be affected by that.

But (all?) UUID PKs will be, right? (Which is pretty common now, and why we've had a case come in already...) — tl;dr I don't think this is a super-edge-case...

Anyhow, as I say, I'm happy to go with yours and Mariusz' best judgement, but I started looking at the docs change and was... "Oh..." -- so I wanted to raise my concern.

@charettes
Copy link
Member Author

why we've had a case come in already...

Right but to be fair it was not about using this pattern directly but because fixture loading was broken. Not saying this pattern is not used in the wild but the report was about broken fixture loading.

@carltongibson
Copy link
Member

Yep. Good point. 😉

@felixxm
Copy link
Member

felixxm commented Dec 12, 2019

I know that these three lines

s0 = Sample.objects.create()
s1 = Sample(pk=s0.pk, name='Test 1')
s1.save()

don't look unusual at first glance, but ... I want to clarify what is broken, so object's primary key is known and we create a new model instance to update this object. Moreover this is broken only for primary keys with defaults. TBH, I'm surprised that we support creating a new model instance to update an existing one, it's quite unnatural (at least for me) and error-prone (a data loss possibility). I would expect to filter an object and update it, e.g.

Sample.objects.filter(pk=s0.pk).update(name='Test 1')

You can also use update_or_create() 🤷‍♂️ . Nevertheless, I understand concerns, we can wait few days for other opinions 🗣️ and decide in the next week.

@carltongibson
Copy link
Member

... it's quite unnatural (at least for me)...

Yep. Totally agree. But folks but it, and my thought is that it's quite common, combined with that for UUID PKs having a default is the standard...

Anyhow... as I say, as long as you and Simon are agreed, for which the double-check here is enough, I'm happy to go forward, but I do think we need the docs to be accurate about the situation.

@felixxm
Copy link
Member

felixxm commented Dec 18, 2019

@charettes What do you think? IMO we should move this patch forward.

Do you have an idea how to clarify bullet points in the "How Django knows to UPDATE vs. INSERT"?

@carltongibson
Copy link
Member

Happy to help work on the docs tomorrow if you folks like?

@charettes
Copy link
Member Author

@felixxm I think we should move forward as well.

Do you have an idea how to clarify bullet points in the "How Django knows to UPDATE vs. INSERT"?

What about

* If the object's primary key attribute is *not* set or if the primary key field
  defines a :attr:`~django.db.models.Field.default` or if the ``UPDATE``
  didn't update anything (e.g. if primary key is set to a value that doesn't
  exist in the database), Django executes an ``INSERT``.

Any thoughts about that @reupen?

@reupen
Copy link

reupen commented Dec 20, 2019

Looks good for the second bullet point, but the first one will also need an update if I'm not mistaken since the UPDATE will be skipped if the primary key field has a default defined and a new model instance is being saved.

It is getting a bit complicated admittedly – maybe the 'primary key field has a default defined' case could be separated from the 'primary key field doesn't have a default defined' case? For example, something like:

If if the primary key field defines a :attr:`~django.db.models.Field.default`:

* if it is an existing model instance, Django executes an ``UPDATE``
* if it is a new model instance, or if the ``UPDATE``
  didn't update anything (e.g. if primary key is set to a value that doesn't
  exist in the database), Django executes an ``INSERT``

If if the primary key field does *not* define a :attr:`~django.db.models.Field.default`:

<existing two bullet points>

@felixxm
Copy link
Member

felixxm commented Dec 27, 2019

@reupen @charettes Thanks 👍 I rebased from master, and tried to explain the current algorithm in the How Django knows to UPDATE vs. INSERT section as clearly as possible 🤕 .

@charettes
Copy link
Member Author

The new wording makes sense to me @felixxm, thanks!

…faults when loading fixtures.

Model.save_base() is called directly when loading fixtures and assumes
existing rows will be updated. Branching of "raw" allows to maintain
the optimization introduced in #29260 while supporting this edge case.

Regression in 85458e9.

Thanks Reupen Shah for the report.
@felixxm felixxm merged commit 9e14bc2 into django:master Dec 30, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants