Skip to content

Loading…

DDC-763: Cascade merge on associated entities can insert too many rows through "Persistence by Reachability" #5277

Open
doctrinebot opened this Issue · 23 comments

2 participants

@doctrinebot

Jira issue originally created by user ccapndave:

I think that the UnitOfWork needs to maintain a map of splobjecthash($newEntity)->$managedEntity for entities that were persisted via reachability during a merge. doMerge should then only call persistNew if the original entity has not already been persisted (if it has already been persisted it should merge the managed entity from the map). The map should be maintained until a flush() or until the UnitOfWork is cleared. The reasoning is as follows.

Imagine we have a simple doctor object with no associations:

$doctor = new Doctor();
$em->persist($doctor);
$em->persist($doctor);
$em->flush();

After the first persist() $doctor is MANAGED so the second persist has no effect and this results in a single Doctor row.

If we do the same thing using merge and persistence by reachability:

$doctor = new Doctor();
$em->merge($doctor);
$em->merge($doctor);
$em->flush();

we get 2 Doctor rows being added.

Obviously in this particular case we should use the return value from the first merge() as the parameter of the second merge which would give correct behaviour.

However, now imagine one Doctor has many Patients and many Patients have one Doctor, all the associations have cascade merge enabled, and further assume that $d1 (Doctor id=1) is already in the database. We now attempt to create two patients and assign them to the existing doctor:

$d1= new Doctor(); $d1->id = 1; // This is a DETACHED entity

$p1 = new Patient();
$p2 = new Patient();

$d1->patients->add($p1); $p1->doctor = $d1;
$d1->patients->add($p2); $p2->doctor = $d1;

$em->merge($p1);
$em->merge($p2);

$em->flush();

This actually results in 4 rows being added to the 'patients' table instead of 2, I think because $p1 and $p2 are getting persisted both as the root objects and then again from the patient->doctor->patients array. Since the cascade merging happens internally we can't replace the array contents with the managed return values without walking through the object graph (in which case there is no point in using cascade merge in the first place). Maintaining a map in UnitOfWork will allow doMerge to ensure it doesn't persist the same entities twice.

I'm not sure, but this might be relevant for cascade persist too.

P.S. Another bug report on this can be found at http://code.google.com/p/flextrine2/issues/detail?id=32 (it basically says the same thing with different entities).

@doctrinebot

Comment created by @beberlei:

@Roman A possible fix for this in my opinion is another map in UnitOfWork $mergedEntities = array(); and a patch like this:

diff --git a/lib/Doctrine/ORM/UnitOfWork.php b/lib/Doctrine/ORM/UnitOfWork.php
index 242d84b..1d0d8b3 100644
--- a/lib/Doctrine/ORM/UnitOfWork.php
<ins></ins><ins> b/lib/Doctrine/ORM/UnitOfWork.php
@@ -1340,6 </ins>1340,10 @@ class UnitOfWork implements PropertyChangedListener
             return; // Prevent infinite recursion
         }

<ins>        if (isset($this->mergedEntities[$oid])) {
</ins>            return $this->mergedEntities[$oid];
<ins>        }
</ins>
         $visited[$oid] = $entity; // mark visited

         $class = $this->em->getClassMetadata(get_class($entity));
@@ -1468,6 <ins>1472,8 @@ class UnitOfWork implements PropertyChangedListener

         $this->cascadeMerge($entity, $managedCopy, $visited);

</ins>        $this->mergedEntities[$oid] = $managedCopy;
+
         return $managedCopy;
     }
@doctrinebot

Comment created by ccapndave:

I have tested this patch with my application and it fixes the problem in all my relevant test cases apart from one. The test case that's failing is one that persists a bi-directional many to many relationship, so the associations interweave with each other (if you know what I mean).

I wonder if perhaps doMerge need to continue cascading even if it finds an item in $this->mergedEntities

This is the Flextrine code that fails - it results in no entries in movie_artist. This might also be related to DDC-758?

m1 = new Movie();
m1.title = "Movie 1";

m2 = new Movie();
m2.title = "Movie 2";

a1 = new Artist();
a1.name = "Artist 1";

a2 = new Artist();
a2.name = "Artist 2";

m1.artists.addItem(a1); a1.movies.addItem(m1);
m1.artists.addItem(a2); a2.movies.addItem(m1);

m2.artists.addItem(a1); a1.movies.addItem(m2);
m2.artists.addItem(a2); a2.movies.addItem(m2);

// These translate to cascade merges on the server
em.persist(m1);
em.persist(m2);
em.persist(a1);
em.persist(a2);

// Now flush
em.flush();

@doctrinebot

Comment created by ccapndave:

P.S. This test passes if I translate em.persist() to $em->persist() (not cascading) on the server instead of translating it to a cascade merge; not sure if that helps

@doctrinebot

Comment created by romanb:

I'd really like to avoid introducing an additional instance variable just to solve this issue but I did not find the time yet to really look into it.

Does someone have a unit test for this already and can attach it to the issue?

@doctrinebot

Comment created by romanb:

Rescheduling for RC1.

@doctrinebot

Comment created by ccapndave:

Here is a functional test case containing three tests:

testMultiMerge tests basic merging of two new entities, checking that only a single entity ends up in the database. This passes with Benjamin's patch.

testMultiCascadeMerge tests the more complex case of merging a OneToMany association. This also passes with Benjamin's patch.

testManyToManyPersistByReachability tests the ManyToMany case described above and this fails with Benjamin's patch, probably because doMerge doesn't cascade down entities that it has already merged and some ManyToMany associations are being ignored. Its a bit hard to be certain what is causing this as even without Benjamin's patch this test would fail due to DDC-758.

@doctrinebot

Comment created by @beberlei:

@Roman i thought about this issue, its not possible without that additional map of merged entities. There is no way we can get that information from other sources.

Problem is rather that the use-case probably only applies in mass-merging scenarios and client-server serialization.

@doctrinebot

Comment created by ccapndave:

Added another failing test case - adding the same entity from different ends of a many to many bi-directional association to check that there isn't an integrity constraint violation caused by Doctrine trying to add the same row twice.

@doctrinebot

Comment created by ccapndave:

Attached a patch for this issue.

@doctrinebot

Comment created by @beberlei:

can you comment why all the additionall stuff is necessary compared to my patch?

@doctrinebot

Comment created by ccapndave:

It fixes the two additional test cases - testManyToManyPersistByReachability and testManyToManyDuplicatePersistByReachability.

testManyToManyPersistByReachability was failing with your original patch because there are ManyToMany cases where an entity may have already been merged, but its still necessary to add it to an association and continue to cascade. Running the following with the original patch will miss out some of the associations.

$m1 = new Movie();
$m1->title = "Movie 1";

$m2 = new Movie();
$m2->title = "Movie 2";

$a1 = new Artist();
$a1->name = "Artist 1";

$a2 = new Artist();
$a2->name = "Artist 2";

$m1->artists->add($a1); $a1->movies->add($m1);
$m1->artists->add($a2); $a2->movies->add($m1);
$m2->artists->add($a1); $a1->movies->add($m2);
$m2->artists->add($a2); $a2->movies->add($m2);

$em->merge($a1);
$em->merge($a2);
$em->flush();

The other change in my patch is to protect against this case. It ensures that the following code doesn't add the same entity twice to a collection.

$em->merge($m1);
$em->merge($m2);
$em->merge($a2);
$em->merge($a2);
$em->flush();
@doctrinebot

Comment created by @beberlei:

I am not sure if the issue here is rather multiple calls to merge that contain different parts of the same object-graph.

There should be a very simple fix for this, call ->clear() after each merge.

I am not sure if this patch drags us into a blackhole of issues with merging.

@doctrinebot

Comment created by ccapndave:

Calling ->clear() and ->flush() after each merge is a workaround for the simple case, but unless I am misunderstanding I don't think its a solution for cases where the merging is happening automatically in cascadeMerge. I've actually encountered this issue in another project and scenario to do with creating REST APIs and merging JSON objects into entities, and applying the patch fixed it so a) I think this issue might be a more common that we first thought and b) the patch basically seems to work (plus it doesn't introduce any failing cases in the existing test suite). I can actually still find one edge case to do with cascading merging interlinked many to many associations that this doesn't fix, but I was planning to open that as a new ticket after this My feeling is that the current merge already has issues and this definitely improves it.

@doctrinebot

Comment created by @beberlei:

It cannot happen inside a single merge, single merges use the $visited to avoid infinite recursions, each entity can only be merged once inside a single merge operation.

@doctrinebot

Comment created by @beberlei:

Added a note into the documentation about using EntityManager#clear between merging of entities which share subgraphs and cascade merge.

Handling this issue in UnitOfwork will be declared an improvement, not a bug anymore and be scheduled for later releases. The required changes to the core are to dangerous and big.

@doctrinebot

Comment created by ccapndave:

Where in the docs is that?

Just to summarize, the equivalent operation to having multiple merges and a single flush is to call merge followed by flush each time, with the whole thing surrounded by a transaction? Does this have a big impact on performance?

@doctrinebot

Comment created by ccapndave:

Ben - even given the decision not to implement this (and I do understand your thinking, as it is a major change), is there any reason not to implement the bit that ensures that the same entity isn't added to a collection twice during a merge? I can't think of a situation where this should be allowed, and I have a use case where I get 'DUPLICATE KEY' errors if this isn't there.

Please see attached patch.

@doctrinebot

Comment created by @beberlei:

What bit of that huge patch is that? Can you extract it into another ticket if thats possible?

@doctrinebot

Comment created by @beberlei:

I added it to "Working with Objects" and the descripton of Merge. Its not yet live on the site.

Using this current workaround has a performance impact, since more SELECT statements have to be issued against the database.

@doctrinebot

Comment created by ccapndave:

Apologies for not being clear - only the 3rd patch (multipleaddmerge.diff) is relevant to the 'DUPLICATE KEY' error I am now talking about, but I'll put it in a nother ticket if you prefer.

@doctrinebot

Comment created by @beberlei:

please add a new ticket, patch looks good.

@doctrinebot

Comment created by ccapndave:

Created as DDC-875

@beberlei beberlei was assigned by doctrinebot
@doctrinebot doctrinebot added this to the 2.x milestone
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.