Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: ordering of the edges regardless when the parent node is persisted #11058

Closed
wants to merge 3 commits into from

Conversation

netbull
Copy link

@netbull netbull commented Nov 11, 2023

In correlation of #11057

greg0ire
greg0ire previously approved these changes Nov 11, 2023
Copy link
Member

@greg0ire greg0ire left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@derrabus there isn't a correct order, so I'm not sure there should be tests. If no tests are broken and people in #10864 confirm this fixes their issue, then I think we can merge as is.

I'd say either we commit to maintaining that order forever, and we add tests and even maybe docs, or we don't and then there are no needs for tests besides manual ones.

@derrabus
Copy link
Member

I see, so this change brings us closer to the behavior before 2.16? In that case, I don't mind merging the PR. But it's still not a bugfix (as the current behavior is not "wrong").

@derrabus derrabus added Improvement and removed Bug labels Nov 11, 2023
@greg0ire
Copy link
Member

I see, so this change brings us closer to the behavior before 2.16?

I don't know, let me ask people in #10864

@derrabus
Copy link
Member

One thing though: Please add a proper PR title because that title will appear in our changelog.

@netbull
Copy link
Author

netbull commented Nov 11, 2023

I see, so this change brings us closer to the behavior before 2.16? In that case, I don't mind merging the PR. But it's still not a bugfix (as the current behavior is not "wrong").

Yes, it's closer to the behaviour before 2.16.x. I personally thing that it's a bugfix in the sense of treating the nodes in different way if the are also an edge for another node or not. If they are standalone nodes it's totally fine, but if they are also edges then the order is forced to be reversed.

@mpdude
Copy link
Contributor

mpdude commented Nov 11, 2023

I'm afraid I still don't get it – what is the case where edges change the node order?

@mpdude
Copy link
Contributor

mpdude commented Nov 11, 2023

We have the \Doctrine\Tests\ORM\Internal\TopologicalSortTest::testNodesMaintainOrderWhenNoDepencency test to make sure entities maintain order when not constrained by foreign key references... that was to keep most of our own tests green by not changing order unnecessarily.

Also, when I add this:

--- a/tests/Doctrine/Tests/ORM/Internal/TopologicalSortTest.php
+++ b/tests/Doctrine/Tests/ORM/Internal/TopologicalSortTest.php
@@ -153,6 +153,17 @@ class TopologicalSortTest extends TestCase
         self::assertSame(['A', 'B', 'C'], $this->computeResult());
     }

+    public function testNodesMaintainOrderWhenEdgesPermit(): void
+    {
+        $this->addNodes('A', 'B', 'C');
+        $this->addEdge('A', 'B');
+        $this->addEdge('A', 'C');
+
+        // Nodes shall maintain the order in which they were added
+        // when permitted by edges/constraints
+        self::assertSame(['A', 'B', 'C'], $this->computeResult());
+    }

... it works. So where exactly is the order changed?

Copy link
Member

@derrabus derrabus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding my "Missing Tests" label again. After @mpdude's comment, I want to see the change of order that we want to enforce with this PR.

Also:

Please add a proper PR title because that title will appear in our changelog.

@netbull
Copy link
Author

netbull commented Nov 11, 2023

We have the \Doctrine\Tests\ORM\Internal\TopologicalSortTest::testNodesMaintainOrderWhenNoDepencency test to make sure entities maintain order when not constrained by foreign key references... that was to keep most of our own tests green by not changing order unnecessarily.

Also, when I add this:

--- a/tests/Doctrine/Tests/ORM/Internal/TopologicalSortTest.php
+++ b/tests/Doctrine/Tests/ORM/Internal/TopologicalSortTest.php
@@ -153,6 +153,17 @@ class TopologicalSortTest extends TestCase
         self::assertSame(['A', 'B', 'C'], $this->computeResult());
     }

+    public function testNodesMaintainOrderWhenEdgesPermit(): void
+    {
+        $this->addNodes('A', 'B', 'C');
+        $this->addEdge('A', 'B');
+        $this->addEdge('A', 'C');
+
+        // Nodes shall maintain the order in which they were added
+        // when permitted by edges/constraints
+        self::assertSame(['A', 'B', 'C'], $this->computeResult());
+    }

... it works. So where exactly is the order changed?

It appears that the order is kept if the "main/parent" node is persisted first, but not when it's persisted last.

With this PR we make sure that the order is kept whenever the main/parent node is persisted

@netbull netbull changed the title fix: #11057 fix: ordering of the edges regardless when the parent node is persisted Nov 11, 2023
@mpdude
Copy link
Contributor

mpdude commented Nov 12, 2023

@netbull Thank you for adding the test. I think I now understand what you're aiming at.

However, I am afraid that your fix is not correct since it depends on the order in which edges are added. To illustrate, if you change the two addEdge() calls in testNodesMaintainOrderWhenEdgesPermitAndMainNodePersistedLast, the test fails.

In the bigger picture, the order of TopologicalSort::addNode() calls corresponds to the order of EntityManager::persist(). This is what the user controls and what we would like – as a courtesy – adhere to as much as possible. TopologicalSort::addEdge() is driven by the order in which the UoW discovers associations between entities, which may depend on the order of fields in a class or the order of XML mapping declarations. IMHO it should be irrelevant.

So, maybe in general we can say that for any two nodes A and B that have no connection in the dependency tree (either way), when A was given to addNode() before B, then A should show up in the result before B?

@mpdude
Copy link
Contributor

mpdude commented Nov 12, 2023

So, maybe in general we can say that for any two nodes A and B that have no connection in the dependency tree (either way), when A was given to addNode() before B, then A should show up in the result before B?

This won't work.

Assume the nodes are given in order B, D, C, A, E. Edges are A->B, A->C and E->D.

So, E must go before D (edge constraint). And we want D before A and A before E (given order between unconnected nodes).

From the latter part, we get D -> A -> E, but that is a contradiction to E -> D.

This means we cannot promise that for any two nodes (like A and E) that were given in a particular order and that have no constraints between them, we will process them in the order given.

For this example, there are multiple possible orderings of nodes that all fulfill the "hard" edge constraints, but have a different number of "expectation violations", i. e. pairs of nodes processed in a order different from the order given. This would leave us with an optimization problem – I don't want to go down that road.

Copy link
Contributor

@mpdude mpdude left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not merge this unless we have a clear definition of what we're aiming to improve, so we can cover this as a "promise given" with tests.

It seems to me that right now we're just reverting the pre 2.16 behaviour for one anecdotal example.

@mpdude
Copy link
Contributor

mpdude commented Nov 12, 2023

Let me summarize again for the example where entities are given in order B, D, C, A, E, and there are foreign key references from B->A, from C->A and D->E (edges go the other way round).

  • The original order BDCAE is not possible at all.
  • In other words, for every possible solution, at least two entities are processed in another order than they were given and somebody might complain.
  • Many orderings are technically possible: ABCED, ABEDC, EDABC, EDACB, ACBED, ACEDB, AEDBC, AEDCB, ...
  • All of those work with the same amount of queries (no extra UPDATE necessary) and thus make the ORM shine ✨
  • When all of the entities are of different classes and end up in different tables, all solutions end up with the same auto-increment primary key assignments, so nobody will ever notice any difference between orderings.
  • When all of the entities are of the same class, every single ordering will end up with a different primary key assignment.
  • So, it also depends on how entities relate to tables. One particular sort implementation may be fine for one user and their primary keys seem to be assigned in the order the entities were given, whereas for another user (or another combination of entity classes) this would require a differnent sorting.
  • Even if we take the tables into consideration when sorting: Assume A, E and D belong to the same table. Should we end up with AED, EDA or EAD as sub-sequences? None of those will give D the lowest and E the highest ID values as one might expect.

@netbull
Copy link
Author

netbull commented Nov 12, 2023

Let me summarize again for the example where entities are given in order B, D, C, A, E, and there are foreign key references from B->A, from C->A and D->E (edges go the other way round).

  • The original order BDCAE is not possible at all.
  • In other words, for every possible solution, at least two entities are processed in another order than they were given and somebody might complain.
  • Many orderings are technically possible: ABCED, ABEDC, EDABC, EDACB, ACBED, ACEDB, AEDBC, AEDCB, ...
  • All of those work with the same amount of queries (no extra UPDATE necessary) and thus make the ORM shine ✨
  • When all of the entities are of different classes and end up in different tables, all solutions end up with the same auto-increment primary key assignments, so nobody will ever notice any difference between orderings.
  • When all of the entities are of the same class, every single ordering will end up with a different primary key assignment.
  • So, it also depends on how entities relate to tables. One particular sort implementation may be fine for one user and their primary keys seem to be assigned in the order the entities were given, whereas for another user (or another combination of entity classes) this would require a differnent sorting.
  • Even if we take the tables into consideration when sorting: Assume A, E and D belong to the same table. Should we end up with AED, EDA or EAD as sub-sequences? None of those will give D the lowest and E the highest ID values as one might expect.

Probably I'm not explaining well. I don't mind the order in which the sql statements are executed. Basically I don't care about the FK, so I don't argue there.

The bug /according to me/ is that if we have entities A, B, C and D. Assume that B, C and D are edges of A.

  • If you persist first the edges and lastly A, then the order will be A, D, C, A.
  • If you persist first A and then the edges, then the order will be A, B, C, D

Real world example: you have a reservation with guests Hans, Ivan and Laura. You copy the reservation and if you persist the reservation entity last, the order of the guests is reversed /Laura, Ivan and Hans/. If you copy it again the second copy of the reservation will have guests Hans, Ivan and Laura.

@mpdude
Copy link
Contributor

mpdude commented Nov 12, 2023

Assume that B, C and D are edges of A.

What exactly do you mean with an entity being an „edge“ of another one?

you have a reservation with guests Hans, Ivan and Laura

How do you define the order of guests in your reservation?

@netbull
Copy link
Author

netbull commented Nov 12, 2023

Assume that B, C and D are edges of A.

What exactly do you mean with an entity being an „edge“ of another one?

you have a reservation with guests Hans, Ivan and Laura

How do you define the order of guests in your reservation?

There are Reservation entity with oneToMany relationship with entity Guest. The Guest entity I call "edge" to the Reservation entity.

The order is defined by the order of persisting.

As you mentioned the TopologicalSort::addNode() adds the nodes according to the EntityManager::persist()

Working example /without my fix/.

  1. you persist entity Reservation
  2. then you persist three times entity Guest in the order of each guest /Hans, Ivan and Laura/
  3. in UnitOfWork::computeInsertExecutionOrder() the entities are added as nodes
  4. few lines below that via the entity associations the UnitOfWork adds the three Guests entities as Edges, because they are related to the Reservation entity
  5. in TopologicalSort::sort() we start iterating the nodes in reverse order because we add them in the sortResult array via array_unshift
  6. on the first iteration we are on Entity Guest - Laura /because we persist it last and we iterate in reverse order - from point 5/. At the end of TopologicalSort::visit() we mark it as VISITED
  7. the next two iterations are not really important as they are pretty much the same as point 6.
  8. at the last iteration we are on the entity Reservation which we process as usual, but we find edges and start iterate them to visit them.
  9. we skip all edges as they are already marked as VISITED from point 6 :)

Broken example /without my fix/.

  1. then you persist three times entity Guest in the order of each guest /Hans, Ivan and Laura/
  2. you persist entity Reservation
  3. in UnitOfWork::computeInsertExecutionOrder() the entities are added as nodes
  4. few lines below that via the entity associations the UnitOfWork adds the three Guests entities as Edges, because they are related to the Reservation entity
  5. in TopologicalSort::sort() we start iterating the nodes in reverse order because we add them in the sortResult array via array_unshift
  6. on the first iteration we are on Entity Reservation. We process it and find the edges.
  7. The edges ARE NOT iterated in reverse order, so when they are added to the sortResult they are reversed.
  8. the next three iterations from point 5 are basically skipped, because the nodes are marked as VISITED

The key point here is when the developer persists the main entity /from the above examples this is the Reservation/. If its before the guests, there is not problem, but if it's after them, the guests are reversed.

@mpdude
Copy link
Contributor

mpdude commented Nov 12, 2023

The order is defined by the order of persisting

No, the order of persist() calls does not mean or guarantee anything.

If the order of your guests within the reservation matters, you’ll need to define something like a rank field and declare the reservation to be ordered according to it.

@netbull
Copy link
Author

netbull commented Nov 12, 2023

The order is defined by the order of persisting

No, the order of persist() calls does not mean or guarantee anything.

If the order of your guests within the reservation matters, you’ll need to define something like a rank field and declare the reservation to be ordered according to it.

Ok, but persisting the Reservation entity last or first currently guarantee that the guests order will reverse

@netbull
Copy link
Author

netbull commented Nov 14, 2023

@mpdude any new thoughts?

@mpdude
Copy link
Contributor

mpdude commented Nov 14, 2023

My point of view is:

The sequence of persist() calls does not imply any kind of order for the entities.

Unless you define something like an explicit rank column, your Guest entities do not have a defined order. When the order is not defined, you cannot argue that it is reversed under certain conditions.

Probably not what you want to hear, though.

@netbull
Copy link
Author

netbull commented Nov 14, 2023

My point of view is:

The sequence of persist() calls does not imply any kind of order for the entities.

Unless you define something like an explicit rank column, your Guest entities do not have a defined order. When the order is not defined, you cannot argue that it is reversed under certain conditions.

Probably not what you want to hear, though.

@mpdude I don't argue about that :) And I understand completely your point.

I think you did not understand my point which is:

  • persisting the Reservation entity first and then the Guests will work fine and as expected
$a = new Reservation();
$em->persist($a);

$guests = ['Hans', 'Ivan', 'Laura'];
foreach ($guests as $guest) {
    $g = new Guest($guest);
    $a->addGuest($g);
    $em->persist($g);
}

Result: Hans, Ivan, Laura

  • persisting the Guests first and then the Reservation at the end will always reverse the Guests collection
$a = new Reservation();

$guests = ['Hans', 'Ivan', 'Laura'];
foreach ($guests as $guest) {
    $g = new Guest($guest);
    $a->addGuest($g);
    $em->persist($g);
}

$em->persist($a);

Result: Laura, Ivan, Hans

It is not about relying of the specific order, but WHEN we persist the $a entity matters right now, and I think this should not be dependent or i'm wrong?

@derrabus
Copy link
Member

Result: Hans, Ivan, Laura

Result: Laura, Ivan, Hans

And both results are correct.

It is not about relying of the specific order, but WHEN we persist the $a entity matters right now, and I think this should not be dependent or i'm wrong?

I don't get this sentence, I'm sorry.

@netbull
Copy link
Author

netbull commented Nov 14, 2023

Result: Hans, Ivan, Laura

Result: Laura, Ivan, Hans

And both results are correct.

It is not about relying of the specific order, but WHEN we persist the $a entity matters right now, and I think this should not be dependent or i'm wrong?

I don't get this sentence, I'm sorry.

You are wrong... sorry, but I showed examples with code and everything.
The ORM should not force the order to reverse depending when the developer persist the main entity. It's just wrong and broken behaviour.

Anyway, if you are convinced that this is how it's supposed to work, close the PR.

@greg0ire
Copy link
Member

I think this should not be dependent or i'm wrong?

Collections are unordered sets so you can't assume you will always get the same order. FWIW, you could get Ivan, Laura, Hans, which would be extremely weird but still technically correct. It's like if you were performing a SELECT statement without specifying an ORDER BY clause: technically, any order goes. Let's close this, I think all maintainers agree.

@greg0ire greg0ire closed this Nov 14, 2023
@netbull
Copy link
Author

netbull commented Nov 14, 2023

@greg0ire I agree with you, but I don't agree that this library should force change in the order if developer persists at the end of his code.

This is what you don't get... why you array_reverse and then array_unshift? Why not leave it as it is and it will be way more predictable behaviour.

@greg0ire
Copy link
Member

I don't think it's fair to frame it as if we go out of our way to force a change in order. The change in your PRs are an attempt to maintain the order if possible, by adding an extra function call. It's not like you removed a function called that messed with the order, it's the contrary, is it? If I'm mistaken, then you could achieve your goal by removing that hypothetical extra, unnecessary function call we are doing, and no tests would get broken, so why not do just that? If tests do get broken then you will know why we are doing it. Anyway, maybe other maintainers know why there is a call to array_reverse, then array_unshift, but I don't.

@netbull
Copy link
Author

netbull commented Nov 14, 2023

I don't think it's fair to frame it as if we go out of our way to force a change in order. The change in your PRs are an attempt to maintain the order if possible, by adding an extra function call. It's not like you removed a function called that messed with the order, it's the contrary, is it? If I'm mistaken, then you could achieve your goal by removing that hypothetical extra, unnecessary function call we are doing, and no tests would get broken, so why not do just that? If tests do get broken then you will know why we are doing it. Anyway, maybe other maintainers know why there is a call to array_reverse, then array_unshift, but I don't.

What you suggest at first sounds logical, but it's not really /I think/ :)

See here.

Explanation:

We aim to sort the nodes via DFS algorithm which is OK and great! In the sense of the algorithm we should add the node to the head of the sortedResults array, but it's never mentioned to reverse the order of the nodes at the very beginning of the sorting /which we do with array_reverse/, just because we add the results to the head.

In other words, we should either:

  • remove array_reverse and fix the tests to correctly test the results. This way always the results will be reversed
  • add array_reverse to the edges. This way we aim to keep the order as it should be. Also with this, no tests fail whatsoever.

Probably the maintainers can explain a bit more about the array_reverse usage and why it's only used once.

@greg0ire
Copy link
Member

Probably the maintainers can explain a bit more about the array_reverse usage and why it's only used once.

Maybe yeah. If they can't, I suggest you remove that call and run the tests, maybe there will be failures, and tests with explanations. If there are no failures, I'd rather merge a PR that removes the array_reverse than one that adds them, but maybe I'm missing the big picture… maybe the result is somewhat more intuitive/palatable with array_reverse, I don't know.

@mpdude
Copy link
Contributor

mpdude commented Nov 14, 2023

🤔 (gimme some time)

@rvanlaak
Copy link
Contributor

@greg0ire worth knowing; because the other issue you referred to earlier for feedback has been locked, people can not respond to your request for feedback over there.

@greg0ire
Copy link
Member

@rvanlaak I did write "please let us know in the PR" (as opposed to the locked issue) though, didn't I?

mpdude added a commit to mpdude/doctrine2 that referenced this pull request Dec 21, 2023
This PR changes a detail in the commit order computation for depended-upon entities.

We have a parent-child relationship between two entity classes. The association is parent one-to-many children, with the child entities containing the (owning side) back-reference.

Cascade-persist is not used, so all entities have to be passed to `EntityManager::persist()`.

Before v2.16.0, two child entities C1 and C2 will be inserted in the same order in which they are passed to `persist()`, and that is regardless of whether the parent entity was passed to `persist()` before or after the child entities.

As of v2.16.0, passing the parent entity to `persist()` _after_ the child entities will lead to an insert order that is _reversed_ compared to the order of `persist()` calls.

This PR makes the order consistent in both cases, as it was before v2.16.0.

 #### Cause

When the parent is passed to `persist()` after the children, commit order computation has to re-arrange the entities. The parent must be inserted first since it is referred to by the children.

The implementation of the topological sort from doctrine#10547 processed entities in reverse `persist()` order and unshifted finished nodes to an array to obtain the final result. That leads to dependencies (parent → before c1, parent → before c2) showing up in the result in the reverse order of which they were added.

This PR changes the topological sort to produce a result in the opposite order ("all edges pointing left"), which helps to avoid the duplicate array order reversal.

 #### Discussion

* This PR _does not_ change semantics of the `persist()` so that entities would (under all ciscumstances) be inserted in the order of `persist()` calls.
* It fixes an unnecessary inconsistency between versions before 2.16.0 and after. In particular, it may be surprising that the insert order for the child entities depends on whether another referred-to entity (the parent) was added before or after them.
* _Both_ orders (c1 before or after c2) are technically and logically correct with regard to the agreement that `commit()` is free to arrange entities in a way that allows for efficient insertion into the database.

Fixes doctrine#11058.
@mpdude
Copy link
Contributor

mpdude commented Dec 21, 2023

#11086 should fix this inconsistency between pre-2.16.0 and after

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants