Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prepare entity-level commit order computation in the UnitOfWork #10651

Conversation

mpdude
Copy link
Contributor

@mpdude mpdude commented Apr 24, 2023

This is the second chunk to break #10547 into smaller PRs suitable for reviewing. It prepares the UnitOfWork to work with a commit order computed on the entity level, as opposed to a class-based ordering as before.

Background

#10531 and #10532 show that it is not always possible to run UnitOfWork::commit() with a commit order given in terms of entity classes. Instead it is necessary to process entities in an order computed on the object level. That may include

  • a particular order for multiple entities of the same class
  • a particular order for entities of different classes, possibly even going back and forth (entity $a1 of class A, then $b of class B, then $a2 of class A – revisiting that class).

This PR is about preparing the UnitOfWork so that its methods will be able to perform inserts and deletions on that level of granularity. Subsequent PRs will deal with implementing that particular order computation.

Suggested change

Change the private executeInserts and executeDeletions methods so that they do not take a ClassMetadata identifying the class of entities that shall be processed, but pass them the list of entities directly.

The lists of entities are built in two dedicated methods. That happens basically as previously, by iterating over $this->entityInsertions or $this->entityDeletions and filtering those by class.

Potential (BC breaking?) changes that need review scrutiny

  • \Doctrine\ORM\Persisters\Entity\EntityPersister::addInsert() was previously called multiple times, before the insert would be performed by a call to \Doctrine\ORM\Persisters\Entity\EntityPersister::executeInserts(). With the changes here, this batching effectively no longer takes place. executeInserts() will always see one entity/insert only, and there may be multiple executeInserts() calls during a single UoW::commit() phase.
  • The caching in \Doctrine\ORM\Cache\Persister\Entity\AbstractEntityPersister previously would cache entities from the last executed insert batch only. Now it will collect entities across multiple batches. I don't know if that poses a problem.
  • Overhead in \Doctrine\ORM\Persisters\Entity\BasicEntityPersister::executeInserts is incurred multiple times; that may, however, only be about SQL statement preparation and might be salvageable.
  • The postPersist event previously was not dispatched before all instances of a particular entity class had been written to the database. Now, it will be dispatched immediately after every single entity that has been inserted.

postPersist event changes – meaningful?

Documentation on the postPersist event is here: https://www.doctrine-project.org/projects/doctrine-orm/en/2.14/reference/events.html#postupdate-postremove-postpersist

The postPersist event occurs for an entity after the entity has been made persistent. It will be invoked after the database insert operations. Generated primary key values are available in the postPersist event.

"It will be invoked after the database insert operations" is a bit imprecise. It previously was "after the database insert operations for that entity class, but before operations for other classes". With this PR, it would be "after the database insert operations for this entity instance".

"Generated primary key values are available in the postPersist event" should better be written as "a generated primary key value for the entity is available in the postPersist event".

Alternatively, we could defer postPersist events until all new entities have been inserted. Either way, it's a slight change from the previous implementation where the event would run after all entities of a particular class had been processed.

Extra bonus

This is what the DALL·E AI thinks it looks like when an entity persister will have to process entities one-by-one and is no longer able to batch the inserts.

DALL·E 2023-04-24 14 07 14

@mpdude mpdude changed the title This is the second chunk to break #10547 into smaller PRs suitable for reviewing. It prepares the UnitOfWork to work with a commit order determined at the entity instance level, as opposed to a class-based ordering as before. Prepare entity-level commit order computation in the UnitOfWork Apr 24, 2023
@mpdude mpdude force-pushed the unit-of-work-schedule-single-entities branch 3 times, most recently from e53681d to 1b1e673 Compare April 24, 2023 10:45
@mpdude mpdude marked this pull request as ready for review April 24, 2023 11:30
This is the second chunk to break doctrine#10547 into smaller PRs suitable for reviewing. It prepares the `UnitOfWork` to work with a commit order computed on the entity level, as opposed to a class-based ordering as before.

#### Background

doctrine#10531 and doctrine#10532 show that it is not always possible to run `UnitOfWork::commit()` with a commit order given in terms of  entity _classes_. Instead it is necessary to process entities in an order computed on the _object_ level. That may include

* a particular order for multiple entities of the _same_ class
* a particular order for entities of _different_ classes, possibly even going back and forth (entity `$a1` of class `A`, then `$b` of class `B`, then `$a2` of class `A` – revisiting that class).

This PR is about preparing the `UnitOfWork` so that its methods will be able to perform inserts and deletions on that level of granularity. Subsequent PRs will deal with implementing that particular order computation.

#### Suggested change

Change the private `executeInserts` and `executeDeletions` methods so that they do not take a `ClassMetadata` identifying the class of entities that shall be processed, but pass them the list of entities directly.

The lists of entities are built in two dedicated methods. That happens basically as previously, by iterating over `$this->entityInsertions` or `$this->entityDeletions` and filtering those by class.

#### Potential (BC breaking?) changes that need review scrutiny

* `\Doctrine\ORM\Persisters\Entity\EntityPersister::addInsert()` was previously called multiple times, before the insert would be performed by a call to `\Doctrine\ORM\Persisters\Entity\EntityPersister::executeInserts()`. With the changes here, this batching effectively no longer takes place. `executeInserts()` will always see one entity/insert only, and there may be multiple `executeInserts()` calls during a single `UoW::commit()` phase.
* The caching in `\Doctrine\ORM\Cache\Persister\Entity\AbstractEntityPersister` previously would cache entities from the last executed insert batch only. Now it will collect entities across multiple batches. I don't know if that poses a problem.
* Overhead in `\Doctrine\ORM\Persisters\Entity\BasicEntityPersister::executeInserts` is incurred multiple times; that may, however, only be about SQL statement preparation and might be salvageable.
* The `postPersist` event previously was not dispatched before _all_ instances of a particular entity class had been written to the database. Now, it will be dispatched immediately after every single entity that has been inserted.
@mpdude mpdude force-pushed the unit-of-work-schedule-single-entities branch from 1b1e673 to b42cf99 Compare April 24, 2023 11:32
@mpdude
Copy link
Contributor Author

mpdude commented Apr 29, 2023

@FabioBatSilva you were the last one to make relevant API changes to the persister classes in #808.

Maybe you still remember those days and have any thoughts on this?

@FabioBatSilva
Copy link
Member

@FabioBatSilva you were the last one to make relevant API changes to the persister classes in #808.

Maybe you still remember those days and have any thoughts on this?

Wow.. Can't believe that i'ts been almost 10y since that PR 😱

I'ts been a while since i looked into that code,
But Looks good as far as i can tell.

@mpdude
Copy link
Contributor Author

mpdude commented May 1, 2023

@FabioBatSilva thanks a lot for your feedback. Yes, time flies...

In particular, with regard to 2nd level caching, do you see any problem with calling \Doctrine\ORM\Persisters\Entity\EntityPersister::executeInserts multiple times before \Doctrine\ORM\Cache\Persister\CachedPersister::afterTransactionComplete is being called?

@FabioBatSilva
Copy link
Member

Is getInserts still return the entire set ?

This bit here might be problematic :

$this->queuedCache['insert'] = $this->persister->getInserts();

IIRC it assumes the all inserts are present when afterTransactionComplete is called.
If executeInserts is called multiple times it might be resetting the queue and afterTransactionComplete will only see the last set of entries.

@mpdude
Copy link
Contributor Author

mpdude commented May 2, 2023

I think I’ve got this covered with the changes to lib/Doctrine/ORM/Cache/Persister/Entity/AbstractEntityPersister.php in this PR.

Copy link
Member

@SenseException SenseException left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is entity-level-commit-order going to be merged into the 3.0.x branch? If the impact of the changes isn't graspable, do we miss tests for the commit order?

@mpdude
Copy link
Contributor Author

mpdude commented May 5, 2023

@SenseException Thank your for taking the time to review this!

entity-level-commit-order is a temporary branch to break #10547 down into manageable stories. Privately, I was hoping that we could merge it into 2.15 (or whatever the next 2.x release is). Since it might include deprecations and/or there is always a risk associated with such changes, no need to rush it and surprise users in a bugfix release.

Reasons (in my opinion) not to aim at 3.0:

  • The entire goal of Compute the commit order (inserts/deletes) on the entity level #10547 is to fix a long list of bugs (see the list over there), without requiring changes on the users' side.
  • In case we realize that there are releant changes (causing breakage), we should not include them in 3.0 in a "like it or lump it" style, but instead find ways so that users can opt-in as simple and as soon as possible, i. e. in 2.x.

That being said, yes - there always remains a risk that changes break things for users, and I try hard to come up with creative scenarios to imagine how that might happen.

But, on the other hand, we also need to make clear where users overstressed our promises given. One example is the semantics of the postPersist event I mentioned in the initial comment of this PR. Clearly, it is supposed to be emitted after a particular entity has been persisted – but we did not mention anything about the order in which these events happen, or if one may assume all persist operations have happened at the time the first event is being emitted.

In these cases, we'll need to discuss/decide what the reasonable assumptions are that one might have made, and where users (unknowingly?) relied on accidental implementation details only. I hope that, in the long run, we can improve specifications, tests, documentation and the like when we find such places. Hopefully, this makes it more clear for users what they may expect, and makes it more clear for us what may be changed.

Hm, did that answer your question 🤔 ?

Regarding test coverage:

The commit order is very central for everything that writes entities to the DB, so almost every functional test we have depends on it in one way or another.

For the tricky edge cases that currently do not work right (reported by many users over the years), I have created test cases in various other PRs, but all of them are consolidated in #10547.

@SenseException
Copy link
Member

One example is the semantics of the postPersist event I mentioned in the initial comment of this PR. Clearly, it is supposed to be emitted after a particular entity has been persisted – but we did not mention anything about the order in which these events happen, or if one may assume all persist operations have happened at the time the first event is being emitted.

Okay, I was afraid that this would introduce changes that would only fit into a major release. Sounds like it is not.

Hm, did that answer your question thinking ?

Yes, it did. Thank you.

@greg0ire greg0ire merged commit 9766b6b into doctrine:entity-level-commit-order May 5, 2023
@greg0ire
Copy link
Member

greg0ire commented May 5, 2023

Thanks @mpdude !

@mpdude mpdude deleted the unit-of-work-schedule-single-entities branch May 5, 2023 19:14
mpdude added a commit to mpdude/doctrine2 that referenced this pull request May 8, 2023
This is the third step to break doctrine#10547 into smaller PRs suitable for reviewing. It uses the new topological sort implementation from doctrine#10592 and the refactoring from doctrine#10651 to compute the UoW's commit order for entity insertions not on the entity class level, but for single entities and their actual dependencies instead.

#### Current situation

`UnitOfWork::getCommitOrder()` would compute the entity sequence on the class level with the following code:

https://github.com/doctrine/orm/blob/70477d81e96c0044ad6fd8c13c37b2270d082792/lib/Doctrine/ORM/UnitOfWork.php#L1310-L1325

#### Suggested change

* Instead of considering the classes of all entities that need to be inserted, updated or deleted, consider the new (inserted) entities only. We only need to find a sequence in situations where there are foreign key relationships between two _new_ entities.
* In the dependency graph, add edges for all to-one association target entities.
* Make edges "optional" when the association is nullable.

#### Test changes

I have not tried to fully understand the few changes necessary to fix the tests. My guess is that those are edge cases where the insert order changed and we need to consider this during clean-up.

Keep in mind that many of the functional tests we have assume that entities have IDs assigned in the order that they were added to the EntityManager. That does not change – so the order of entities is generally stable, equal to the previous implementation.
mpdude added a commit to mpdude/doctrine2 that referenced this pull request May 8, 2023
This is the third step to break doctrine#10547 into smaller PRs suitable for reviewing. It uses the new topological sort implementation from doctrine#10592 and the refactoring from doctrine#10651 to compute the UoW's commit order for entity insertions not on the entity class level, but for single entities and their actual dependencies instead.

#### Current situation

`UnitOfWork::getCommitOrder()` would compute the entity sequence on the class level with the following code:

https://github.com/doctrine/orm/blob/70477d81e96c0044ad6fd8c13c37b2270d082792/lib/Doctrine/ORM/UnitOfWork.php#L1310-L1325

#### Suggested change

* Instead of considering the classes of all entities that need to be inserted, updated or deleted, consider the new (inserted) entities only. We only need to find a sequence in situations where there are foreign key relationships between two _new_ entities.
* In the dependency graph, add edges for all to-one association target entities.
* Make edges "optional" when the association is nullable.

#### Test changes

I have not tried to fully understand the few changes necessary to fix the tests. My guess is that those are edge cases where the insert order changed and we need to consider this during clean-up.

Keep in mind that many of the functional tests we have assume that entities have IDs assigned in the order that they were added to the EntityManager. That does not change – so the order of entities is generally stable, equal to the previous implementation.
mpdude added a commit to mpdude/doctrine2 that referenced this pull request May 9, 2023
This is the third step to break doctrine#10547 into smaller PRs suitable for reviewing. It uses the new topological sort implementation from doctrine#10592 and the refactoring from doctrine#10651 to compute the UoW's commit order for entity insertions not on the entity class level, but for single entities and their actual dependencies instead.

#### Current situation

`UnitOfWork::getCommitOrder()` would compute the entity sequence on the class level with the following code:

https://github.com/doctrine/orm/blob/70477d81e96c0044ad6fd8c13c37b2270d082792/lib/Doctrine/ORM/UnitOfWork.php#L1310-L1325

#### Suggested change

* Instead of considering the classes of all entities that need to be inserted, updated or deleted, consider the new (inserted) entities only. We only need to find a sequence in situations where there are foreign key relationships between two _new_ entities.
* In the dependency graph, add edges for all to-one association target entities.
* Make edges "optional" when the association is nullable.

#### Test changes

I have not tried to fully understand the few changes necessary to fix the tests. My guess is that those are edge cases where the insert order changed and we need to consider this during clean-up.

Keep in mind that many of the functional tests we have assume that entities have IDs assigned in the order that they were added to the EntityManager. That does not change – so the order of entities is generally stable, equal to the previous implementation.
mpdude added a commit to mpdude/doctrine2 that referenced this pull request May 9, 2023
This is the third step to break doctrine#10547 into smaller PRs suitable for reviewing. It uses the new topological sort implementation from doctrine#10592 and the refactoring from doctrine#10651 to compute the UoW's commit order for entity insertions not on the entity class level, but for single entities and their actual dependencies instead.

#### Current situation

`UnitOfWork::getCommitOrder()` would compute the entity sequence on the class level with the following code:

https://github.com/doctrine/orm/blob/70477d81e96c0044ad6fd8c13c37b2270d082792/lib/Doctrine/ORM/UnitOfWork.php#L1310-L1325

#### Suggested change

* Instead of considering the classes of all entities that need to be inserted, updated or deleted, consider the new (inserted) entities only. We only need to find a sequence in situations where there are foreign key relationships between two _new_ entities.
* In the dependency graph, add edges for all to-one association target entities.
* Make edges "optional" when the association is nullable.

#### Test changes

I have not tried to fully understand the few changes necessary to fix the tests. My guess is that those are edge cases where the insert order changed and we need to consider this during clean-up.

Keep in mind that many of the functional tests we have assume that entities have IDs assigned in the order that they were added to the EntityManager. That does not change – so the order of entities is generally stable, equal to the previous implementation.
mpdude added a commit to mpdude/doctrine2 that referenced this pull request May 31, 2023
This is the third step to break doctrine#10547 into smaller PRs suitable for reviewing. It uses the new topological sort implementation from doctrine#10592 and the refactoring from doctrine#10651 to compute the UoW's commit order for entity insertions not on the entity class level, but for single entities and their actual dependencies instead.

 #### Current situation

`UnitOfWork::getCommitOrder()` would compute the entity sequence on the class level with the following code:

https://github.com/doctrine/orm/blob/70477d81e96c0044ad6fd8c13c37b2270d082792/lib/Doctrine/ORM/UnitOfWork.php#L1310-L1325

 #### Suggested change

* Instead of considering the classes of all entities that need to be inserted, updated or deleted, consider the new (inserted) entities only. We only need to find a sequence in situations where there are foreign key relationships between two _new_ entities.
* In the dependency graph, add edges for all to-one association target entities.
* Make edges "optional" when the association is nullable.

 #### Test changes

I have not tried to fully understand the few changes necessary to fix the tests. My guess is that those are edge cases where the insert order changed and we need to consider this during clean-up.

Keep in mind that many of the functional tests we have assume that entities have IDs assigned in the order that they were added to the EntityManager. That does not change – so the order of entities is generally stable, equal to the previous implementation.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants