Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed #31685 -- Added support for updating conflicts to QuerySet.bulk_create(). #13065

Merged
merged 1 commit into from
Jan 20, 2022

Conversation

ChihSeanHsu
Copy link
Contributor

@ChihSeanHsu ChihSeanHsu commented Jun 14, 2020

Ticket #31685

Adding upsert_conflicts arguments to bulk_create.
This change only supports to mysql, sqlite3 and postgresql.

  • unittest
  • refactor
  • document

@ChihSeanHsu ChihSeanHsu marked this pull request as draft June 14, 2020 17:34
@apollo13
Copy link
Member

apollo13 commented Jun 16, 2020

I just gave the patch a quick glance, but I am wondering if it wouldn't make more sense to have the argument similar to the ForeignKey.on_delete, ie on_conflict = models.IGNORE | models.UPDATE? This might allow for more advanced "on conflict update set xyz=aaa" forms in the future.

@ChihSeanHsu ChihSeanHsu marked this pull request as ready for review June 16, 2020 16:06
@ChihSeanHsu
Copy link
Contributor Author

I just gave the patch a quick glance, but I am wondering if it wouldn't make more sense to have the argument similar to the ForeignKey.on_delete, ie on_conflict = models.IGNORE | models.UPDATE? This might allow for more advanced "on conflict update set xyz=aaa" forms in the future.

It's a good idea. I'll change this argument to what you suggest.
Thanks for your help.

@felixxm
Copy link
Member

felixxm commented Jun 16, 2020

I just gave the patch a quick glance, but I am wondering if it wouldn't make more sense to have the argument similar to the ForeignKey.on_delete, ie on_conflict = models.IGNORE | models.UPDATE? This might allow for more advanced "on conflict update set xyz=aaa" forms in the future.

It was discussed in the original ticket-28668 and #9192, see comment.

@felixxm
Copy link
Member

felixxm commented Jun 16, 2020

I would change upsert_conflicts to update_conflicts (True/False).

@ChihSeanHsu
Copy link
Contributor Author

I would change upsert_conflicts to update_conflicts (True/False).

Okay, I've changed upsert to update.
And you provide the topic that is about not using string constant instead of boolean argument.
I use boolean arguments(update_conflicts, ignore_conflicts) for high-level function bulk_create , but use string constant in low-level function that's called by bulk_create. Because we can decrease some duplicate code by using string constant in low-level function.

I don' know it is a good way to doing like this or not.
Could anyone give some advice?

@chrisconlan
Copy link

Thank you for writing this pull request. Looking forward to using this feature

@jacobtylerwalls
Copy link
Member

jacobtylerwalls commented Oct 28, 2020

Hi @ChihSeanHsu I noticed you ticked Needs Tests and Needs Documentation on the ticket, which signals to reviewers this isn't ready to review. Be sure to untick those if you've got tests and docs and you're ready for a review. Also this would be a good time to resolve the merge conflict. Thanks!

@ChihSeanHsu
Copy link
Contributor Author

Hi @ChihSeanHsu I noticed you ticked Needs Tests and Needs Documentation on the ticket, which signals to reviewers this isn't ready to review. Be sure to untick those if you've got tests and docs and you're ready for a review. Also this would be a good time to resolve the merge conflict. Thanks!

Hi @jacobtylerwalls,
Thanks for your notification, I'll merge conflict later.

@alextatarinov
Copy link
Contributor

alextatarinov commented Nov 9, 2020

Hi @ChihSeanHsu, great job! However, I think there is some room for improvement.
With ignore_conflicts it makes sense to just say "ON CONFLICT DO NOTHING" (on any conflict), but IMHO it is more complicated with upsert.
There is more than fields with unique=True that can produce the conflict. At least, there are unique_together, and actually, there can be any unique constraint added either via RunSQL or manual database change, and we don't know what exactly is going to conflict. Therefore, there is a need for a more granular API, which allows users to specify both unique_fields for the ON CONFLICT part and update_fields, similarly to update_or_create.
We are using this library, and in my experience, it does not seem possible to automatically infer the intended behavior (what to update on what fields conflict) and must be specified by the developer.

@hannseman
Copy link
Contributor

@alextatarinov good point, we would need similar logic to what's used in Model._get_unique_checks to take unique_together and UniqueConstraint in account:

unique_togethers = [(self.__class__, self._meta.unique_together)]
constraints = [(self.__class__, self._meta.total_unique_constraints)]
for parent_class in self._meta.get_parent_list():
if parent_class._meta.unique_together:
unique_togethers.append((parent_class, parent_class._meta.unique_together))
if parent_class._meta.total_unique_constraints:
constraints.append(
(parent_class, parent_class._meta.total_unique_constraints)
)

But as you say it would probably make more sense to let the user specify the fields to conflict on.

@ChihSeanHsu
Copy link
Contributor Author

Hi @alextatarinov , thanks for your suggestion, I'll add it to this pull request.

@ChihSeanHsu
Copy link
Contributor Author

Hi @alextatarinov and @hannseman,

I have a question need to clarify, this new feature is based on bulk_create and it is based on INSERT ... ON DUPLICATE KEY DO ... or INSERT ... ON CONFLICT (...) DO ....
Conflicts will only happen when pk or unique keys duplicates, because it's a insert action.
IMHO, we only need to let user input what fields(update_fields) they want to update, and I'll add more detailed in get unique_fields.
Or do you two have any other concern?

And upsert function in this library is not using this SQL command, it controls this action in application level not database.

@alextatarinov
Copy link
Contributor

alextatarinov commented Nov 15, 2020

Hello again, @ChihSeanHsu. If you specify native=True, the linked library uses the ON CONFLICT statement via the querybuilder library.
I see where your logic goes, but I think there is a reason why the conflict target is required to provide when using DO UPDATE (https://www.postgresql.org/docs/9.5/sql-insert.html#SQL-ON-CONFLICT). It means the database cannot automatically infer the conflict target, neither we can do it in Django. I am not sure why, one thing that comes to mind is that the row inserted can conflict with multiple other rows on different constraints, and the database won't know which row to update. Therefore, it's the developer who should specify unique fields that may lead to conflict + update.
On a side note, I think the inability to specify unique_fields for ignore_conflicts is also a shortcoming and should be addressed (maybe even in this ticket).

P.S. I am nowhere an expert, so I suggest we ask someone more experienced.
P.P.S. It turns out, that when multiple rows conflict, all of them are updated in MySQL (which does not allow specifying unique_fields). You can also add a LIMIT to update only X first rows matched. https://stackoverflow.com/a/16377933.
Now I really wonder why PostgreSQL requires the conflict_target with ON UPDATE. I believe PostgreSQL is better designed than MySQL in general, so I would say their approach is preferable (require to specify unique_fields). But if I am not mistaking, this is not supported by MySQL, unfortunately. Not sure which API and behavior would be the best then.


for unique_together in opts.unique_together:
unique_fields.extend(unique_together)

Copy link
Contributor

@hannseman hannseman Nov 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should also collect class based unique-constraints, you get can them from opts.total_unique_constraints.

@hannseman
Copy link
Contributor

hannseman commented Nov 15, 2020

I'm not really sure that collecting all unique fields and then using them for conflict_target will work in PostgresSQL. My understanding is that it expects a field or a set of fields that can be inferred as a unique constraint. In other words the fields that is passed as conflict_target needs to match a single unique constraint. https://www.postgresql.org/docs/current/sql-insert.html#SQL-ON-CONFLICT

For example:

CREATE TABLE person(
    email varchar UNIQUE,
    phone varchar UNIQUE,
    name varchar
);
persons = [Person(name="Some name", phone="+46700000000", email="test@example.com")]
Person.objects.bulk_create(persons, update_conflicts=True, update_fields=["name"])

Would generate the following SQL:

INSERT INTO person(email, phone, name) VALUES ('test@example.com', '+46700000000', 'Some name')
ON CONFLICT (email, phone)
DO UPDATE SET name = EXCLUDED.name;

This would error with ERROR: there is no unique or exclusion constraint matching the ON CONFLICT specification as no composite index on (email, phone) exists.

@ChihSeanHsu
Copy link
Contributor Author

Hello again,

Hello again, @ChihSeanHsu. If you specify native=True, the linked library uses the ON CONFLICT statement via the querybuilder library.
I see where your logic goes, but I think there is a reason why the conflict target is required to provide when using DO UPDATE (https://www.postgresql.org/docs/9.5/sql-insert.html#SQL-ON-CONFLICT). It means the database cannot automatically infer the conflict target, neither we can do it in Django. I am not sure why, one thing that comes to mind is that the row inserted can conflict with multiple other rows on different constraints, and the database won't know which row to update. Therefore, it's the developer who should specify unique fields that may lead to conflict + update.
On a side note, I think the inability to specify unique_fields for ignore_conflicts is also a shortcoming and should be addressed (maybe even in this ticket).

P.S. I am nowhere an expert, so I suggest we ask someone more experienced.
P.P.S. It turns out, that when multiple rows conflict, all of them are updated in MySQL (which does not allow specifying unique_fields). You can also add a LIMIT to update only X first rows matched. https://stackoverflow.com/a/16377933.
Now I really wonder why PostgreSQL requires the conflict_target with ON UPDATE. I believe PostgreSQL is better designed than MySQL in general, so I would say their approach is preferable (require to specify unique_fields). But if I am not mistaking, this is not supported by MySQL, unfortunately. Not sure which API and behavior would be the best then.

After I surveyed and as you mentioned, the unique_fields argument is only for user using postgresql and sqlite.
If user use mysql as their database, they only can specify update_fields.
In my opinion, it's not a good solution, but there is this kind of limit.

I'm not really sure that collecting all unique fields and then using them for conflict_target will work in PostgresSQL. My understanding is that it expects a field or a set of fields that can be inferred as a unique constraint. In other words the fields that is passed as conflict_target needs to match a single unique constraint. https://www.postgresql.org/docs/current/sql-insert.html#SQL-ON-CONFLICT

For example:

CREATE TABLE person(
    email varchar UNIQUE,
    phone varchar UNIQUE,
    name varchar
);
persons = [Person(name="Some name", phone="+46700000000", email="test@example.com")]
Person.objects.bulk_create(persons, update_conflicts=True, update_fields=["name"])

Would generate the following SQL:

INSERT INTO person(email, phone, name) VALUES ('test@example.com', '+46700000000', 'Some name')
ON CONFLICT (email, phone)
DO UPDATE SET name = EXCLUDED.name;

This would error with ERROR: there is no unique or exclusion constraint matching the ON CONFLICT specification as no composite index on (email, phone) exists.

And I will add more unittest to cover this situation.

@ChihSeanHsu ChihSeanHsu marked this pull request as draft November 22, 2020 17:40
@ChihSeanHsu ChihSeanHsu force-pushed the dev/bulk_create_allow_upsert branch 2 times, most recently from d554d68 to fe3e848 Compare December 6, 2020 16:25
@lithammer
Copy link

lithammer commented Dec 21, 2020

Not sure if it's my place to comment but... I wonder if the API could be made a bit more flexible by modeling a bit more like QuerySet.update()? For example we have a use-case where we want increment fields when there's a conflict, rather than replacing. Something similar to this:

CREATE TABLE foo (
    timestamp timestamp NOT NULL,
    num_requests integer NOT NULL
);
INSERT INTO
    foo (timestamp, num_requests)
VALUES
    ('2020-12-21T12:00:00', 10)
ON CONFLICT
    (timestamp)
DO UPDATE SET
    num_requests = foo.num_requests + EXCLUDE.num_requests;
foos = [
    Foo(timestamp="2020-12-21T12:00:00", num_requests=F("num_requests") + 10),
]
Foo.objects.bulk_create(foos, update_conflicts=True, update_fields=["num_requests"])

Not sure how portable that is though, or if it's mostly a PostgreSQL thing.

@ChihSeanHsu
Copy link
Contributor Author

"num_requests"

Hi @lithammer,

It's a good suggestion, but bulk_create action is base on InsertQuery and F is not allowed to use in it.
If we want to support this flexible usage, we need to refactor lot of code.

But if there is requirement, we can discuss it.

@ChihSeanHsu ChihSeanHsu marked this pull request as ready for review January 24, 2021 14:41
Base automatically changed from master to main March 9, 2021 06:21
@jacobtylerwalls
Copy link
Member

Thanks for the continued work here. Is this ready for re-review? If so, please make sure to uncheck Needs Improvement on the ticket each time it's ready for re-review. (Would be helpful if you could also bump the release note and change annotations to 4.1). Cheers.

@ChihSeanHsu ChihSeanHsu force-pushed the dev/bulk_create_allow_upsert branch 4 times, most recently from ddd5ab3 to 0da99dd Compare October 18, 2021 14:05
@ChihSeanHsu
Copy link
Contributor Author

Thanks for the continued work here. Is this ready for re-review? If so, please make sure to uncheck Needs Improvement on the ticket each time it's ready for re-review. (Would be helpful if you could also bump the release note and change annotations to 4.1). Cheers.

Hi @jacobtylerwalls ,
Thanks for your remind.
I've update release note to 4.1 and change ticket status.

Copy link
Member

@felixxm felixxm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ChihSeanHsu Thanks for updates 👍 I left more comment, however didn't go through docs and tests in detail. I'll do this in the next round.

Please add also release notes for the new feature.

docs/ref/models/querysets.txt Outdated Show resolved Hide resolved
django/db/models/query.py Outdated Show resolved Hide resolved
docs/ref/models/querysets.txt Outdated Show resolved Hide resolved
django/db/backends/mysql/operations.py Outdated Show resolved Hide resolved
tests/bulk_create/tests.py Outdated Show resolved Hide resolved
django/db/backends/base/operations.py Show resolved Hide resolved
django/db/backends/sqlite3/features.py Outdated Show resolved Hide resolved
django/db/models/query.py Outdated Show resolved Hide resolved
django/db/models/query.py Outdated Show resolved Hide resolved
tests/bulk_create/tests.py Outdated Show resolved Hide resolved
Copy link
Member

@felixxm felixxm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ChihSeanHsu Thanks 👍 I left more comment. Again I didn't go through docs and tests in detail.

django/db/backends/sqlite3/operations.py Outdated Show resolved Hide resolved
django/db/models/query.py Outdated Show resolved Hide resolved
django/db/models/query.py Outdated Show resolved Hide resolved
django/db/backends/sqlite3/operations.py Outdated Show resolved Hide resolved
tests/bulk_create/tests.py Outdated Show resolved Hide resolved
django/db/models/query.py Outdated Show resolved Hide resolved
django/db/models/query.py Outdated Show resolved Hide resolved
django/db/models/query.py Outdated Show resolved Hide resolved
django/db/models/query.py Outdated Show resolved Hide resolved
docs/ref/models/querysets.txt Outdated Show resolved Hide resolved
Copy link
Member

@felixxm felixxm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ChihSeanHsu Thanks for updates and all your efforts 👍 I left small comments.

It's almost ready for my final review/edits 🥇

django/db/models/query.py Outdated Show resolved Hide resolved
django/db/models/query.py Outdated Show resolved Hide resolved
tests/bulk_create/tests.py Outdated Show resolved Hide resolved
tests/bulk_create/tests.py Outdated Show resolved Hide resolved
docs/ref/models/querysets.txt Outdated Show resolved Hide resolved
docs/ref/models/querysets.txt Outdated Show resolved Hide resolved
docs/ref/models/querysets.txt Outdated Show resolved Hide resolved
docs/ref/models/querysets.txt Outdated Show resolved Hide resolved
@felixxm
Copy link
Member

felixxm commented Jan 17, 2022

@ChihSeanHsu Do you have time to keep working on this?

@ChihSeanHsu
Copy link
Contributor Author

@ChihSeanHsu Do you have time to keep working on this?

@felixxm
Sorry.
I was a little busy previous months and did not notice that it was under another reviews.
I'll update it later, thanks.

@felixxm felixxm self-assigned this Jan 18, 2022
@felixxm felixxm force-pushed the dev/bulk_create_allow_upsert branch 2 times, most recently from 637bd4f to 54244ac Compare January 18, 2022 06:12
@felixxm
Copy link
Member

felixxm commented Jan 18, 2022

I squashed commits and rebased from the main branch. I'll work on final edits.

@felixxm felixxm force-pushed the dev/bulk_create_allow_upsert branch 2 times, most recently from 5f2e849 to c32e98f Compare January 18, 2022 11:06
Copy link
Member

@felixxm felixxm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ChihSeanHsu I pushed edits to docs and main changes. I'm going to review the tests tomorrow.

django/db/backends/mysql/operations.py Show resolved Hide resolved
django/db/models/query.py Outdated Show resolved Hide resolved
django/db/models/query.py Outdated Show resolved Hide resolved
@felixxm felixxm force-pushed the dev/bulk_create_allow_upsert branch 2 times, most recently from f055eaa to 9aeee21 Compare January 19, 2022 09:27
@felixxm
Copy link
Member

felixxm commented Jan 19, 2022

@ChihSeanHsu Thanks again for all your efforts 🥇 Welcome aboard ⛵

I pushed edits to tests. IMO, it's ready 🥳

@felixxm felixxm force-pushed the dev/bulk_create_allow_upsert branch from 9aeee21 to fba23da Compare January 19, 2022 09:51
@felixxm felixxm changed the title #31685 -- Support updating conflicts with QuerySet.bulk_create(). Fixed #31685 -- Added support for updating conflicts to QuerySet.bulk_create(). Jan 19, 2022
@felixxm felixxm force-pushed the dev/bulk_create_allow_upsert branch from fba23da to 0e62ffe Compare January 19, 2022 10:01
…_create().

Thanks Florian Apolloner, Chris Jerdonek, Hannes Ljungberg, Nick Pope,
and Mariusz Felisiak for reviews.
@felixxm felixxm force-pushed the dev/bulk_create_allow_upsert branch from 0e62ffe to 0f69464 Compare January 19, 2022 19:19
@felixxm felixxm merged commit 0f69464 into django:main Jan 20, 2022
@ChihSeanHsu
Copy link
Contributor Author

@ChihSeanHsu Thanks again for all your efforts 🥇 Welcome aboard ⛵

I pushed edits to tests. IMO, it's ready 🥳

@felixxm
Thanks for your helps and I'm glad to do this ticket.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
10 participants