New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DDC-586: Repo does not find "unflushed" object #5092
Comments
Comment created by romanb: Hi, you mention a good point, however, this currently only affects findBy queries made through a repository. A DQL query already triggers a flush when there are pending insertions but this still has its own problems. First of, querying against the objects in the UoW is not a viable solution in my eyes. For a regular find() (by identifier) the situation is clear anyway, you must flush prior to a lookup on an entity you previously persisted in the same request because, by definition, generated primary key values are only guaranteed to be available after the next flush. Automatic flushing if the UoW has pending inserts (new objects) and a query is executed (either through DQL or a repository) currently has its own set of problems, namely that it is still subject to infinite recursion if such a query is triggered in an event (listener) that executes during commit of a UoW, and secondly, that it will easily lead to double-flushes that cause unnecessary overhead (currently a flush() even if nothing needs to be done is not free because the UoW actually has to check whether nothing needs to be done). Both of these problems could be addressed with some sort of flags, but the question still is whether its not better to flush manually in the first place. |
Comment created by romanb: Furthermore, automatic flushing when there is no transaction active is probably also not a great idea, as it may split a single unit of work (that was supposed to be atomic) into 2 without the user knowing about it. So auto-flushing should better only happen when a transaction is active (i.e. explicit transaction demarcation is used). |
Comment created by jkleijn: That would mean, in every example, you should flush after persisting new objects, period. If I flush in some cases and not in others, I'm asking for issues that may not be caught by tests. It's an inconsistency that I personally am not comfortable with. Could be that I'm overlooking something, I've just started playing with D2. Why is querying against registered objects not viable? It's not easy, granted, but it doesn't seem impossible. There should probably be a layer between the UoW and the "persisters" (Data Mappers?). RE: the UoW double flush: state management on the UoW as a whole should prevent that. i.e. after a commit the whole UoW is clean? Just a suggestion, as I said, still getting my bearings. On a side note I just want to say that what I've seen so far, for the better part, pleases me greatly. Kudos. |
Comment created by romanb: @"That would mean, in every example, you should flush after persisting new objects, period." Yes, if you want the objects to be visible to queries in the same request. Generally, you should flush when you complete a unit of work and that is usually not the whole request (but can be). I don't want to "query" against registered objects because it is a) not easy b) likely a lot of code and c) very likely error-prone. And in addition I don't see this helping with solving any inconsistency. If you want to use find() you have to flush anyway because you can not find() without having the identifier in the first place, which is only available after a flush. @re: the UoW double flush: Yes, like I said, it can be done but it is a compromise. Having a "clean/dirty" flag in addition to calculating the changesets of the work to do (which implicitly tells us whether the UoW is dirty) adds more code and more potential for errors. Forget to update the flag in one location and you get flushes that don't do anything, because the flag was not updated. A dirty-flag for the UoW is not really required for proper working. It is similar to the approach of maintaining a separate counter for the number of elements in a collection implementation: can make many size/count requests faster but complicates the internal implementation and increases the likelihood for errors (and lock contention for the counter in a thread-safe/concurrent implementation, an interesting case where performance goes against scalability, but I digress and that does not apply to php obviously). That said, I am not strongly opposed to doing this. If you're interested in how this is specified by "big brother", take a look at section 3.8.7 of the JPA 2 specification. Shortly, with the default behavior it requires the implementation to ensure that unflushed changes are visible to queries which can be achieved by flushing these to the database automatically but only if a transaction is active, otherwise the implementation must not flush to the database. There is alternatively also a "MANUAL" flush mode, in that case the effect of updates made to entities in the UoW upon queries is unspecified. We do not have different flush modes anymore, however, in Doctrine. So I see two possible ways to go here:
That's how I see it. Now we need some votes and volunteers for the implementation :) Personally, I am not sure yet about which version I prefer, 2) does not sound too bad for me. |
Comment created by romanb: In Nr. 1) the case with the infinite recursion may actually be more problematic. I think you simply *can not see* unflushed new objects in queries made during a UoW commit. |
Comment created by jkleijn: When there's no in-memory objects inclusion, I'd say 2) as well. Again, I have no idea how this is implemented currently, but I would prefer something like this: $repo->start(); $repo->register($object); $repo->commit(); Why?
The way I see it "start" would create a UoW associated with the repo, "commit" would calculate changes and write (the enitity manager would make sure references in other UoWs are removed). Because the way it is currently implemented (or so it seems), it's unclear when to flush and when not to flush, and unclear what I'm flushing at any one point in the code (because it is not locally isolated). If I have to decide whether to flush in some bit of client code, I am apparently making an assumption about the target entity, i.e. coupling. I know, you already went beta, so it's unlikely you would consider such a large change, but anyway, for your consideration. Finally, I realize I'm borderline nagging now as you've made it clear you see nothing it, but a Repository (as in the PoEAA pattern, p 322) may provide a method of fetching native in-memory objects using criteria, acting as a "buffer" between code and database. The Repository in D2 does effectively nothing but delegate to the UoW (or mostly to the underlying persister). Ref PoEAA 327 for an example of an in-memory strategy. As a final point of critique, the Repository does not always seem to be used as entry point for data requests, which is the whole point of the pattern. Most of what's in EntityManager, should be in EntityRepository ("manager" is a bit to abstract a concept to expect clear responsibilities anyway). EntityManager::find() delegates to EntityRepository, but pretty much everything else is the other way around. EntityManager would be better off named DataGateway, as that accurately describes its intended function. I admit, it would be very difficult to use DQL on in memory objects, but it would be far superior and if it work lead to much more predictable behaviour. It's the ONLY way the data store is ever going to be truly transparent. A few examples (DQL from the docs): SELECT
SELECT u FROM User u WHERE u.id = ?1 OR u.nickname LIKE ?2 ORDER BY u.surname DESC
With joins it could get more complex, provided you want to intelligently merge results into existing objects. Question is whether that is really needed, but there's obviously a performance benefit. Actually this may already be implemented. I suspect there are edge cases, rooted in DQL still being based on SQL, but in theory it should be possible. Likely you would still want to do start(), and delegate to the driver to start an actual transaction to prevent inconsistent reads... The only way to find out if it's truly feasible is to to try it, I think. Ramble, ramble, ramble, I'm done. :) I know I seem critical, but it's positive critique, I love the direction you went with D2. |
Comment created by romanb: Maybe I was not clear, with approach Nr.1 there would be in-memory objects inclusion (of new objects), in fact, there always is, due to the identity map. When you query for objects and some of them are already in memory, these are used, not again reconstructed. The EntityRepository provided by Doctrine is just a convenient mechanism for writing your own repositories. There are many different understandings for what a repository is, you can make it whatever you want it to be. Is a PoEAA repository the same as a DDD repository? Anyway, the repository could be stripped of the project, it is optional, the state management is handled by the EntityManager and UnitOfWork. These are the core components. I agree that the delegation from EntityManager#find to the repository is suboptimal in this regard and should be the other way around. Now to your question: "When should I flush?". Generally, you should flush at the end of a transaction, which in turn is a unit of work. That means, use explicit transaction demarcation. begin() ... flush() commit(). I've added some control abstractions recently that should make this even easier. I can only recommend to explicitly demarcate your transaction boundaries. As you probably know, you can not talk to your database outside of a transaction anyway. The default behavior (flush() wrapping all its stuff in a transaction) is for convenience mostly and so as not to alienate or confuse people even more who are used to autocommit mode. Concerning the naming, we mostly stick with the JPA specification and I, for one, really like the naming and I don't want to invent new names. PoEAA is far more abstract (and the examples far too specific) than what is specified in JPA, so I recommend giving that a read. The patterns in PoEAA obviously and intentionally leave a lot of room for different variants of implementation and also leave open a lot of open questions (many of the difficult questions especially, it is for a reason that the author recommends using an existing tool instead of writing your own). In my opinion it is just not feasible to query in-memory objects in a generic way, all the examples in PoEAA do not have generic but rather concrete code examples, which is obviously a lot easier. The feasible strategy, and that is what we do, is to do in-memory lookups only when querying by PK, otherwise the query is executed and afterwards nevertheless any objects reused that are already in memory (based on the PK) and not reconstructed. This is the approach we use. Thanks for your input, I do see that you are an experienced fellow in object-relational persistence, maybe we can see you as a committer some day? :) |
Comment created by romanb: @ "SELECT u, UPPER(u.name) nameUpper FROM MyProject\Model\User u" This selects all users and their names in uppercase, the uppercase names are scalar values, the users are not modified! Scalar values are separate from objects. @ "... and unclear what I'm flushing at any one point in the code" flush() means: Synchronize the in-memory state of my objects with the database, making any changes that are only in-memory persistent. Nothing more, nothing less. Again, objects are always reused based on the identity map and the state that is in-memory prevails, unless you use refresh() or execute a query with the Query::HINT_REFRESH query hint. All objects you fetch from DQL, be it as a root object or as a joined association, are first looked up in-memory (but after the SQL query has been issued!). Maybe we have been talking past each other here, what I refer to as not feasible is querying the in-memory objects first in some way, even before the SQL query. This is just too complicated and error-prone, except for the simple case of a PK lookup and that is where we do it already. |
Comment created by jkleijn:
Right. Bad example.
I realize that it means that, but commit() would be more obvious.
Fair enough, you don't think it's feasible, so we'll keep it at that. Maybe I'll give it a shot some time. |
+1 Since this issue is still open and marked "Improvement" - any news on that? |
@edrush there is no way to support this at the moment, as the UnitOfWork only provides an ID map. |
@Ocramius Thank you! I'm sorry if this is the wrong place to ask: then how would I flush a complex object that was automatically created by $em->merge($objectFromJson), entity by entity? |
@edrush I'd suggest asking something like that on stackoverflow (with some examples about what you need to do). The issue tracker is, well... for issues :-P |
@Ocramius right, sorry. (here it is) |
Would it be possible to have the entitymanager, able to expose already managed entities in a Collection object (segregated by class, possibly accessible throught repositories) ? Which is basically what we are handcrafting in teh repository class when needed. |
This is not feasible due to the fact that except for identifier lookup operations, we cannot expose the identity map in any other way without incurring in massive memory and BC problems. Performing anything more than a direct identifier lookup will go through the database and will require an SQL query that will skip PHP's in-memory entities. Closing as |
"Performing anything more than a direct identifier lookup will go through the database" Anyway, it's obviously not a MUST HAVE feature, it would be more a convenience one. Thanks for the reply, have a nice day :) |
I fully understand your suggestion, and explained also that we can't expose
more UoW internal state.
…On 31 Jan 2018 08:58, "popy-dev" ***@***.***> wrote:
"Performing anything more than a direct identifier lookup will go through
the database"
I think you missed my point. My suggestion was to add a public method
which instanciates a fresh ArrayCollection containing ALL managed entities
(non filtered), so that a developper (not doctrine core) COULD iterate over
if he wants.
Anyway, it's obviously not a MUST HAVE feature, it would be more a
convenience one. Thanks for the reply, have a nice day :)
—
You are receiving this because you were assigned.
Reply to this email directly, view it on GitHub
<#5092 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAJakL_sPz1NMROOuAe1s7Oophrao1WWks5tQB0cgaJpZM4HVFW8>
.
|
Sorry then ! |
Jira issue originally created by user jkleijn:
The problem is this:
$bar = new \entity\content\ContentTag();
$bar->setName('bar');
$em->persist($bar);
$existingTag = $em->getRepository('entity\content\ContentTag')->findOneByName('bar');
Seeing as in EntityRepository "find()" queries the Unit of Work first, and "findBy()" goes directly to the persister, only remotely stored objects will be found. Now if I want a tag object to attach related tags, it would have to query by name to see if an object already exist, BUT it wont find one as the UoW has not been committed, resulting in a new one being created, ultimately resulting in a PDO error on the unique name constraint.
This can be "solved" by inserting a flush, but it is impossible to know whether a flush is required, without knowledge of what comes next. I.e. for one part to know it has to flush, it has to know another wants to fetch an object you just created.
This causes an unacceptable amount of coupling.
Somehow the repo will have to be able execute DQL against the objects in the UoW. This does not have to be full support (straight away), but it should fail (throw an exception) if the possibility exists that the UoW contains items that are excluded (e.g. the operation is not supported and the UoW still contains items).
For right now, this means the EntityManager should throw an exception if DQL is executed on the type when the UoW is not empty. Until the time that the EntityManager can query the UoW using DQL. The alternative would be to "flush" before every operation that goes to the database for data.
The text was updated successfully, but these errors were encountered: