Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge release 2.16.0 into 3.0.x #10861

Closed
wants to merge 113 commits into from

Conversation

github-actions[bot]
Copy link
Contributor

@github-actions github-actions bot commented Aug 1, 2023

Release Notes for 2.16.0

2.16.0

  • Total issues resolved: 0
  • Total pull requests resolved: 15
  • Total contributors: 7

Improvement

Test Suite

Deprecation

Static Analysis

New Feature

Deprecation,Improvement

mpdude and others added 30 commits March 23, 2023 08:01
Run workflows also for the `entity-level-commit-order` branch
This is the second chunk to break #10547 into smaller PRs suitable for reviewing. It prepares the `UnitOfWork` to work with a commit order computed on the entity level, as opposed to a class-based ordering as before.

#### Background

#10531 and #10532 show that it is not always possible to run `UnitOfWork::commit()` with a commit order given in terms of  entity _classes_. Instead it is necessary to process entities in an order computed on the _object_ level. That may include

* a particular order for multiple entities of the _same_ class
* a particular order for entities of _different_ classes, possibly even going back and forth (entity `$a1` of class `A`, then `$b` of class `B`, then `$a2` of class `A` – revisiting that class).

This PR is about preparing the `UnitOfWork` so that its methods will be able to perform inserts and deletions on that level of granularity. Subsequent PRs will deal with implementing that particular order computation.

#### Suggested change

Change the private `executeInserts` and `executeDeletions` methods so that they do not take a `ClassMetadata` identifying the class of entities that shall be processed, but pass them the list of entities directly.

The lists of entities are built in two dedicated methods. That happens basically as previously, by iterating over `$this->entityInsertions` or `$this->entityDeletions` and filtering those by class.

#### Potential (BC breaking?) changes that need review scrutiny

* `\Doctrine\ORM\Persisters\Entity\EntityPersister::addInsert()` was previously called multiple times, before the insert would be performed by a call to `\Doctrine\ORM\Persisters\Entity\EntityPersister::executeInserts()`. With the changes here, this batching effectively no longer takes place. `executeInserts()` will always see one entity/insert only, and there may be multiple `executeInserts()` calls during a single `UoW::commit()` phase.
* The caching in `\Doctrine\ORM\Cache\Persister\Entity\AbstractEntityPersister` previously would cache entities from the last executed insert batch only. Now it will collect entities across multiple batches. I don't know if that poses a problem.
* Overhead in `\Doctrine\ORM\Persisters\Entity\BasicEntityPersister::executeInserts` is incurred multiple times; that may, however, only be about SQL statement preparation and might be salvageable.
* The `postPersist` event previously was not dispatched before _all_ instances of a particular entity class had been written to the database. Now, it will be dispatched immediately after every single entity that has been inserted.
…tities

Prepare entity-level commit order computation in the `UnitOfWork`
This is the first chunk to break #10547 into smaller PRs suitable for reviewing. It adds a new topological sort implementation.

 #### Background

Topological sort is an algorithm that sorts the vertices of a directed acyclic graph (DAG) in a linear order such that for every directed edge from vertex A to vertex B, vertex A comes before vertex B in the ordering. This ordering is called a topological order.

Ultimately (beyond the scope of this PR), in the ORM we'll need this to find an order in which we can insert new entities into the database. When one entity needs to refer to another one by means of a foreign key, the referred-to entity must be inserted before the referring entity. Deleting entities is similar.

A topological sorting can be obtained by running a depth first search (DFS) on the graph. The order in which the DFS finishes on the vertices is a topological order. The DFS is possible iif there are no cycles in the graph. When there are cycles, the DFS will find them.

For more information about topological sorting, as well as a description of an DFS-based topological sorting algorithm, see https://en.wikipedia.org/wiki/Topological_sorting.

 #### Current situation

There is a DFS-based topological sorting implemented in the `CommitOrderCalculator`. This implementation has two kinks:

1. It does not detect cycles

When there is a cycle in the DAG that cannot be resolved, we need to know about it. Ultimately, this means we will not be able to insert entities into the database in any order that allows all foreign keys constraints to be satisfied.

If you look at `CommitOrderCalculator`, you'll see that there is no code dealing with this situation.

2. It has an obscure concept of edge "weights"

To me, it is not clear how those are supposed to work. The weights are related to whether a foreign key is nullable or not, but can (could) be arbitrary integers. An edge will be ignored if it has a higher (lower) weight than another, already processed edge... 🤷🏻?

 #### Suggested change

In fact, when inserting entities into the database, we have two kinds of foreign key relationships: Those that are `nullable`, and those that are not.

Non-nullable foreign keys are hard requirements: Referred-to entities must be inserted first, no matter what. These are "non-optional" edges in the dependency graph.

Nullable foreign keys can be set to `NULL` when first inserting an entity, and then coming back and updating the foreign key value after the referred-to (related) entity has been inserted into the database. This is already implemented in `\Doctrine\ORM\UnitOfWork::scheduleExtraUpdate`, at the expense of performing one extra `UPDATE` query after all the `INSERT`s have been processed. These edges are "optional".

When finding a cycle that consists of non-optional edges only, treat it as a failure. We won't be able to insert entities with a circular dependency when all foreign keys are non-NULLable.

When a cycle contains at least one optional edge, we can use it to break the cycle: Use backtracking to go back to the point before traversing the last _optional_ edge. This omits the edge from the topological sort order, but will cost one extra UPDATE later on.

To make the transition easier, the new implementation is provided as a separate class, which is marked as `@internal`.

 #### Outlook

Backtracking to the last optional edge is the simplest solution for now. In general, it might be better to find _another_ (optional) edge that would also break the cycle, if that edge is also part of _other_ cycles.

Remember, every optional edge skipped costs an extra UPDATE query later on. The underlying problem is known as the "minimum feedback arc set" problem, and is not easily/efficiently solvable. Thus, for the time being, picking the nearest optional edge seems to be reasonable.
…10566)

When computing the commit order for entity removals, we have to look out for `@ORM\JoinColumn(onDelete="SET NULL")` to find places where cyclic associations can be broken.

 #### Background

The UoW computes a "commit order" to find the sequence in which tables shall be processed when inserting entities into the database or performing delete operations.

For the insert case, the ORM is able to schedule _extra updates_ that will be performed after all entities have been inserted. Associations which are configured as `@ORM\JoinColumn(nullable=true, ...)` can be left as `NULL` in the database when performing the initial `INSERT` statements, and will be updated once all new entities have been written to the database. This can be used to break cyclic associations between entity instances.

For removals, the ORM does not currently implement up-front `UPDATE` statements to `NULL` out associations before `DELETE` statements are executed. That means when associations form a cycle, users have to configure `@ORM\JoinColumn(onDelete="SET NULL", ...)` on one of the associations involved. This transfers responsibility to the DBMS to break the cycle at that place.

_But_, we still have to perform the delete statements in an order that makes this happen early enough. This may be a _different_ order than the one required for the insert case. We can find it _only_ by looking at the `onDelete` behaviour. We must ignore the `nullable` property, which is irrelevant, since we do not even try to `NULL` anything.

 #### Example

Assume three entity classes `A`, `B`, `C`. There are unidirectional one-to-one associations `A -> B`, `B -> C`, `C -> A`. All those associations are `nullable= true`.

Three entities `$a`, `$b`, `$c` are created from these respective classes and associations are set up.

All operations `cascade` at the ORM level. So we can test what happens when we start the operations at the three individual entities, but in the end, they will always involve all three of them.

_Any_ insert order will work, so the improvements necessary to solve #10531 or #10532 are not needed here. Since all associations are between different tables, the current table-level computation is good enough.

For the removal case, only the `A -> B` association has `onDelete="SET NULL"`. So, the only possible execution order is `$b`, `$c`, `$a`. We have to find that regardless of where we start the cascade operation.

The DBMS will set the `A -> B` association on `$a` to `NULL` when we remove `$b`. We can then remove `$c` since it is no longer being referred to, then `$a`.

 #### Related cases

These cases ask for the ORM to perform the extra update before the delete by itself, without DBMS-level support:
* #5665
* #10548
This is the third step to break #10547 into smaller PRs suitable for reviewing. It uses the new topological sort implementation from #10592 and the refactoring from #10651 to compute the UoW's commit order for entity insertions not on the entity class level, but for single entities and their actual dependencies instead.

 #### Current situation

`UnitOfWork::getCommitOrder()` would compute the entity sequence on the class level with the following code:

https://github.com/doctrine/orm/blob/70477d81e96c0044ad6fd8c13c37b2270d082792/lib/Doctrine/ORM/UnitOfWork.php#L1310-L1325

 #### Suggested change

* Instead of considering the classes of all entities that need to be inserted, updated or deleted, consider the new (inserted) entities only. We only need to find a sequence in situations where there are foreign key relationships between two _new_ entities.
* In the dependency graph, add edges for all to-one association target entities.
* Make edges "optional" when the association is nullable.

 #### Test changes

I have not tried to fully understand the few changes necessary to fix the tests. My guess is that those are edge cases where the insert order changed and we need to consider this during clean-up.

Keep in mind that many of the functional tests we have assume that entities have IDs assigned in the order that they were added to the EntityManager. That does not change – so the order of entities is generally stable, equal to the previous implementation.
This is part of the series of issues fixed by #10547. In particular, the changes from #10566 were relevant.

See #10348 for the bug description.

Co-authored-by: Grégoire Paris <postmaster@greg0ire.fr>
Add a test case to show #10348 has been fixed
Co-authored-by: Grégoire Paris <postmaster@greg0ire.fr>
Add test to show #7006 has been fixed
@Frikkle was the first to propose these tests in #6533.
@rvanlaak followed up in #8703, making some adjustments.

Co-authored-by: Gabe van der Weijde <gabe.vanderweijde@triasinformatica.nl>
Co-authored-by: Richard van Laak <rvanlaak@gmail.com>
Co-authored-by: Grégoire Paris <postmaster@greg0ire.fr>
Tests suggested in #7180 (comment) and #7180 (comment) by @arnaud-lb.

Co-authored-by: Arnaud Le Blanc <arnaud.lb@gmail.com>
Add tests to show #6499 has been fixed
Add tests for entity insertion and deletion that require writes to different tables in an interleaved fashion, and that have to re-visit a particular table.

 #### Background

In #10531, I've given an example where it is necessary to compute the commit order on the entity (instead of table) level.

Taking a closer look at the UoW to see how this could be achieved, I noticed that the current, table-level commit order manifests itself also in the API between the UoW and `EntityPersister`s.

 #### Current situation

The UoW computes the commit order on the table level. All entity insertions for a particular table are passed through `EntityPersister::addInsert()` and finally written through `EntityPersister::executeInserts()`.

 #### Suggested change

The test in this PR contains a carefully constructed set of four entities. Two of them are of the same class (are written to the same table), but require other entities to be processed first.

In order to be able to insert this set of entities, the ORM must be able to perform inserts for a given table repeatedly, interleaved with writing other entities to their respective tables.
This test implements the situation described in #9192. The commit order computation will be fixed by #10547.
Add test to show #9192 has been fixed
Add test to show #7180 has been fixed
This refactoring does two things:

* We can avoid collecting the post insert IDs in a cumbersome array structure that will be returned by the EntityPersisters and processed by the UoW right after. Instead, use a more expressive API: Make the EntityPersisters tell the UoW about the IDs immediately.
* IDs will be available in inserted entities a tad sooner. That may help to resolve #10735, where we can use the IDs to skip extra updates.
Add tests for entity insertion and deletion that require the commit order calculation to happen on the entity level. This demonstrates the necessity for the changes in #10547.

This PR contains two tests with carefully constructed entity relationships, where we have a non-nullable `parent` foreign key relationships between entities stored in the same table.

Class diagram:

```mermaid
classDiagram
    direction LR
    class A
    class B
    A --> B : parent
    B --|> A
```

Object graph:

```mermaid
graph LR;
    b1 --> b2;
    b2 --> a;
    b3 --> b2;
```

 #### Situation before #10547

The commit order is computed by looking at the associations at the _table_ (= _class_) level. Once the ordering of _tables_ has been found, entities being mapped to the same table will be processed in the order they were given to `persist()` or `remove()`.

That means only a particular ordering of `persist()` or `remove()` calls (see comment in the test) works:

For inserts, the order must be `$a, $b2, $b1, $b3` (or `... $b3, $b1`), for deletions `$b1, $b3, $b2, $a`.

 #### Situation with entity-level commit order computation (as in #10547)

The ORM computes the commit order by considering associations at the _entity_ level. It will be able to find a working order by itself.
…ing merge()

This PR tries to improve the situation/problem explained in #3037:

Under certain conditions – there may be multiple and not all are known/well-understood – we may get inconsistencies between the `\Doctrine\ORM\UnitOfWork::$entityIdentifiers` and `\Doctrine\ORM\UnitOfWork::$identityMap` arrays.

Since the `::$identityMap` is a plain array holding object references, objects contained in it cannot be garbage-collected.
`::$entityIdentifiers`, however, is indexed by `spl_object_id` values. When those objects are destructed and/or garbage-collected, the OID may be reused and reassigned to other objects later on.

When the OID re-assignment happens to be for another entity, the UoW may assume incorrect entity states and, for example, miss INSERT or UPDATE operations.

One cause for such inconsistencies is _replacing_ identity map entries with other object instances: This makes it possible that the old object becomes GC'd, while its OID is not cleaned up. Since that is not a use case we need to support (IMHO), #10785 is about adding a safeguard against it.

In this test shown here, the `merge()` operation is currently too eager in creating a proxy object for another referred-to entity. This proxy represents an entity already present in the identity map at that time, potentially leading to this problem later on.
Co-authored-by: Grégoire Paris <postmaster@greg0ire.fr>
greg0ire and others added 28 commits July 11, 2023 12:10
Resort on Query::HINT_FORCE_PARTIAL_LOAD less
…tance

Support not Insertable/Updateable columns for entities with `JOINED` inheritance type
This was never meant to be under version control. Did not spot it in the
diff.

Closes #10836
* 2.15.x:
  PHPStan 1.10.25, Psalm 5.13.1 (#10842)
We already have the sidebar for this.
It confuses the guides, and is ugly.
Fix/Self deprecation with getQueryCacheImpl
* Introduce FilterCollection#restore method

* Add suspend method instead

* Add more tests
* Introduce FilterCollection#restore method

* Add suspend method instead

* Add more tests
* 2.15.x:
  Fix static analysis
  Other solution
  Avoid self deprecation
  fix: use platform options instead of deprecated custom options (#10855)
…cation-provided IDs

This excludes such associations from the commit order computation, since the foreign key constraint will be satisfied when inserting the row.

See #10735 for more details about this edge case.
Compute the commit order (inserts/deletes) on the entity level
@derrabus derrabus closed this Aug 9, 2023
@derrabus derrabus deleted the 2.16.x-merge-up-into-3.0.x_st4AnJwe branch August 9, 2023 09:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.