Jira issue originally created by user giorgiosironi:
on using persist() on an entity X:
So I do this:
$picard = $this->_getUser('Picard');
There are 2 instances of Picard saved instead of 1.
Failed asserting that string:2 matches expected integer:1.
The entity is recognized as new instead of detached, so no error is thrown and it is duplicated in the database.
Comment created by giorgiosironi:
Patch tentative. Resolved this bug but breaks another test of DetachedEntityTest, so I added another to make clear what is the issue: relying on keys to know if an entity is detached is not correct when entities have natural keys like CmsPhonenumber.
It seems that the only option would be hitting the database to see if the entity is already present.
Eliminating the error and simply persisting the entity anyway would not be cool because a duplicate will be created everytime entity is passed to persist(); even to pass it internally to merge() we have to know if it's new or detached.
A last option would be declaring: pass detached entities to persist at your own risk. It's normal for Doctrine not being aware of the difference between new and detached, if not 'detached' would not be so detached if the Orm remembers it. I have no problem in keeping new entities separated from detached ones in my data structure. Or I can simply merge() all, catching exceptions and add the entity that resulted in exceptions with persist().
Comment created by romanb:
Yes, that exactly is the problem and I am/was aware of this but hitting the database is a no-go for that. Determining the state (UnitOfWork#getEntityState) has to be very fast. So its just a matter of finding the best compromise.
We can't save any information in Doctrine objects because serialized or cached entities travel between http requests and sessions. merge() uses the same assumption that entities without identifiers defined are new and when passing a phonenumber it gives a false protection because the defined natural key does not result always in a real entity in the database.
Substituting a proxy subclass which adds a $_state private property is definitely an overkill. I suggest removing the exception and assuming that detached objects should be passed to merge() before any operation is performed on them, and only detached objects should be passed to merge(). Behavior if other operations are done previously would be undefined.
The use case for the merge()/detach() methods is serialization/unserialization of objects from a cache. A developer would then only merge objects in the access point of the cache or on saving.
Comment created by @beberlei:
My take at this issue:
Now the heuristics:
Problematic case: IdGenerator is Assigned and we find identifiers: Here i assume the entity is new, by definition of an entity persist() leads to a duplicate key error and catches this error.
I think this is the best approach.
Updated version that introduces the "EntityExistsException"
The problem I have with this is that by doing this, every single entity passed to persist() gets its identifier read out using reflection just to determine whether its really new.
I would rather change the contract and the documentation to clearly indicate that detached instances must not be passed to persist().
There are no problems in a stricter contract for persist() if detached instances are kept separate from new ones in applications, which is usually true since they come from a cache or a store. I would go with the change as it's the simplest reliable solution.
I updated the documentation to be more clear about this issue: http://www.doctrine-project.org/documentation/manual/2_0/en/working-with-objects:persisting-entities
What do you think?
The problem is, we can not do it reliably for all cases. Yes, we could look into the identifier property if the identifier is generated and conclude from that whether it is detached or not but that does not work for manually assigned identifiers. For these we would need to hit the database and I just don't think we can do that.
So right now changing the persist contract as I did in the documentation, explicitly stating that detached entities should not be passed to persist() and that the behavior is undefined if done so, is the "best" thing.
Tell me what you think. The alternative would be to do the above (peek into identifier field of generated identifiers or look the identifier up in the db if it is assigned manually). I'm pretty sure the JPA implementations do just that but I dont know for sure and I dont think we want that.
I am still undecided though :) Something inside me prefers to go the other route, even if it means some additional db hits for manually assigned identifiers. Please make your vote.
If an identifier is a natural key, and not generated, a detached object of that class and a new one are really indistinguishable if they have the same values on the other columns.
I am for simplicity: the detached/merge entities are related to use cases like caches, sessions and serialization of every kind. So typically you in an application you always know when they come from such a source and can call merge() appropriately.
If calling persist() erroneously will result in throwing an exception on flush() (primary key value already present), the application developer will notice the problem.
Pushing final resolution back to beta3. The problem is that this missing detection of detached objects can lead to weird errors in other cases as well. Need to think about it some more.
Issue was closed with resolution "Fixed"
Imported 3 attachments from Jira into https://gist.github.com/d9023c7662c514f9fa1f