Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DDC-357] Effective toOne joins #970

Conversation

fprochazka
Copy link
Contributor

What is this?

I've kind of rewriten querying of toOne relations. It is more data and query-count effective.

How?

Let's demonstrate it on CmsUser and CmsAddress from tests models. Let's solve behaviour for toOne relations that are not mentioned in the query.

SELECT u FROM CmsUser u

lazy + mapped by side

Already implemented, result is that CmsAddress would be proxy.

lazy + inverse side

CmsUser has CmsAddress relation that is mapped and owned by CmsAddress entity.

What has to happen? The identifier of CmsAddress cannot be loaded from users table nad has to be added automatic join for the table. Because it's lazy it will be hydrated as proxy, becase that is exactly what I've asked for.

If it would have been eagerly loaded, It would create 1+N queries problem that I'm trying to avoid with this. I have the relation as lazy, if I knew I would have needed it and wanned to optimized, I'd join it, but I didn't.

Result is therefore CmsUser entity + CmsAddress proxy

eager - both inverse and mapped by sides

The appropriate query component is generated with autojoin and auto selecting of the entity.

If it is self-referencing, the auto join is not generated becase it would cause infinite recursion.

Why?

I've given this a lot of thought and tested it on our not-so-small application. We have unfortunately lot of entitiy relations that are mapped on the inverse side than we need to select, which is effectively killing performace DDC-357

I would have to go and list all the entities as partials to save performace creating such monsters as this

$builder = $repository->createQueryBuilder("o")
    ->leftJoin("o.voucher", "vu")->addSelect("partial vu.{id}")
    ->leftJoin("o.address", "a")->addSelect("a")
    ->leftJoin("o.restaurant", "r")->addSelect("partial r.{id, name}")
    ->leftJoin("o.payment", "p")->addSelect("partial p.{id}")
    ->leftJoin("o.rating", "rat")->addSelect("partial rat.{id}")
    ->leftJoin("r.settings", "rs")->addSelect("partial rs.{id}")
    ->leftJoin("r.address", "ra")->addSelect("ra")
    ->leftJoin("r.position", "rp")->addSelect("partial rp.{id}");
# plus about five more just to make save performace

We all know that hydrating a large result set is a bottleneck and if I say the relation is lazy and I'm not joining it I really don't want it to be joined with all it's data!

Now imagine I just want to select few orders and render some data on the page.. I have tens of queries like this just because I have to. This is wrong that the ORM is tripping my feet like this.

What now?

I know I have to solve theese:

  • more refactoring?
  • more tests
  • what to do with issue tests that now have changed behaviour?

Any suggestions? Let's have a reasonable discussion, please don't just close this, I've put a lot of effort into this.

- [DDC-357] Real Lazy-loading toOne for inverse side: when the relation is not EAGER and is inverse it will be hydrated as proxy
- Eager toOne relations are automatically joined to the generated SQL and always loaded
@doctrinebot
Copy link

Hello,

thank you for creating this pull request. I have automatically opened an issue
on our Jira Bug Tracker for you. See the issue link:

http://www.doctrine-project.org/jira/browse/DDC-3011

We use Jira to track the state of pull requests and the versions they got
included in.

@stof
Copy link
Member

stof commented Mar 5, 2014

I wouldn't call it effective toOne: for the inverse side, you are joining the table to get the id, and then querying the same table again to load other data when the proxy is initialized. This is where you are wasting queries.

Note that lazy does not mean you will always get a proxy, but that Doctrine can use a proxy. If the entity was already fully loaded in the identity map, it will use it, not a separate proxy

@fprochazka
Copy link
Contributor Author

@stof that's not exactly true, It's a toOne relation, if I knew I would use the entire entity, I would have joined it in the DQL. This only means I have more control over how the data is loaded. With the current solution, I always get 1+N queries no matter what, the only solution is to hack it with partials which sickens me.

I know about identity map and I'm not requiring to always have a proxy. It's just a hint for the SqlWalker and ObjectHydrator - they will try to get me proxy to save resources. If it's already in memory, who cares?

@stekycz
Copy link

stekycz commented Mar 5, 2014

I like this! 👍

@stof I agree there is a waste of querying same table twice however you have to do the same thing now (for better performance) manually. If you need the relation loaded then you should write the join independently on this. This does not help to optimize querying relations you use but it helps to optimize querying relations you do NOT use. An that is the point IMO.

@Ocramius
Copy link
Member

Ocramius commented Mar 5, 2014

@fprochazka assuming that FKs and constraints are in place and guarantee integrity of the association at db level, can we completely skip the joins?

@stekycz
Copy link

stekycz commented Mar 5, 2014

@Ocramius Do you mean that FKs and constraints at db level affects joining and creating of proxies at Doctrine level? How it could be possible?

@Ocramius
Copy link
Member

Ocramius commented Mar 5, 2014

@stekycz the FK constraints are really there to ensure that an associated entity exists, otherwise it's quite useless.

Re-thinking of it, the join cannot be avoided.

@stekycz
Copy link

stekycz commented Mar 5, 2014

@Ocramius I am not sure I understand what you mean. FK could have NULL value (which is default behavior in Doctrine btw). That is why there is used LEFT JOIN clause. Do you suggest to check existence of relation before joining? That is quick hard in complicate queries. The check should be there only for the proxy but it should be already implemented. Am I missing something?

@fprochazka
Copy link
Contributor Author

@beberlei @FabioBatSilva may I please have your thoughts on this before I continue?

@fprochazka
Copy link
Contributor Author

ping

@@ -119,7 +119,17 @@ protected function prepare()

$sourceClassName = $this->_rsm->aliasMap[$this->_rsm->parentAliasMap[$dqlAlias]];
$sourceClass = $this->getClassMetadata($sourceClassName);
$assoc = $sourceClass->associationMappings[$this->_rsm->relationMap[$dqlAlias]];
$assocName = $this->_rsm->relationMap[$dqlAlias];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this?

@beberlei
Copy link
Member

This requires way too many changes. I don't like that very much. I disagree that this needs solving in such a big refactoring. You could solve this in your application by not mapping the inverse bi-directional one to one association and either use an explicit service or a more active record approach to do something like this:

class Order
{
    private $id;
    private $entityManager;

    public function postLoad($event)
    {
        $this->entityManager = $event->getEntityManager();
    }

    public function getVoucher()
    {
        return $this->entityManager
            ->getRepository('Voucher')
            ->findOneBy(array('order' => $this->id));
    }
}

This is very simple to write on your side and doesn't need so strict changes in the ORM.

@beberlei beberlei closed this Mar 23, 2014
@stekycz
Copy link

stekycz commented Mar 23, 2014

@beberlei I hardly disagree because it is very uncomfortable when you have a lot of OneToOne relations in your system. In real world you need bidirectional access many times (for example to write joins in DQL). I hacked this for a short time by usage ManyToOne relation and created API which looks like OneToOne but it is a hack, not correct solution. However it is better then your suggestion I think because it is not so effective. Yes I can update your suggestion to use private property to keep loaded entity but I have to write it by my own for every OneToOne relation which I need. I think this should be solved by ORM because it is not my job to solve imperfection of framework I use.

And a question at the end. Why did you close this PR without waiting for any reaction by community and author?

@beberlei
Copy link
Member

@stekycz You may call it the imperfection of the library at work here, but it is the impedance mismatch and bidirectional one-to-one, which is problematic even without an ORM, because it either requires many LEFT JOINs, n+1 queries, or NULL foreign keys.

As for your other arguments:

  • With 2.4 you can join without associations, SELECT o, v FROM Order o JOIN Voucher v WITH o.id = v.order
  • The LOC argument doesn't count, in the case above you trade ORM mapping code vs PHP code. In the end its either just a small amount that can hardly be considered.
  • If you need bidirectional associations many times, then why not design them more efficiently for the ORM and put the foreign key on the "main entity", in this example here Order. The impedance mismatch requires to make sacrifices from the relational perfectionism.

Given all this, I am not willing to accept this PR in this state, so i closed it. We can discuss better solutions here and then maybe have a new PR.

@stekycz
Copy link

stekycz commented Mar 23, 2014

Thanks for your response! I agree that it is problematic but I do not see better solution then mentioned refactoring. Current solution has problems you are talking about (many LEFT JOINs or n+1 queries) using Doctrine and this PR solves it in the way you do not like.

  • Thanks for mentioning of update in 2.4. I did not know about it.
  • Yes I trade these differences but I think this is PHP specific library which could help to solve PHP specific problems. You also say there should be some sacrifice from perfectionism so I see this the same.
  • You can design associations better if you design new system, that is true without any doubts. However how often you create new project instead of maintaining older project? The problem could be in legacy code and in my case it is so.

Do I understand well you accept solution of this only if there will not be many changes in code? Why? Are not there tests to check if everything is ok?

@fprochazka
Copy link
Contributor Author

I don't understand why have you closed the pullrequest. What does it mean? Delete all and start again? Major features suggest there might be required major code changes.

And again, I still think this should be solved by ORM. I believe my changes made the querying much better and disagree with you about adding SQL to the result of translating DQL.

I've provided solution that might need a bit more work but is several levels better than the current solution. I understand why you're trying to achieve pure code desing but what good is a clean and shiny tool that does shitty job?

@fprochazka
Copy link
Contributor Author

Let's be constructive here for a while. This is a hude deal for us and we need the ORM to solve this effectively. Why is my solution bad and in what classes/levels of abstraction would you implement the functionality?

@beberlei
Copy link
Member

@fprochazka Currently FETCH_ modes do not affect the SQL generation at all and I want to keep it that way. The idea is that there should be no optimization of SQL by the SQLWalker to make DQL as simple as possible to understand, given its complexity already. The approach in your PR is therefore not acceptable from a high level POV.

To describe the risk of this patch, you add LEFT JOINs somewhere inbetween joined entities, can you guarantee that for all edge cases (inheritance, filters, etc...) your code generates the right SQL? Since we currently have a bug with binding of joins in some cases already, I would assume this patch would create problems here as well. If this patch would increase edge case failures for FETCH_EAGER, then some people might need to disable this feature for their entities globally.

Adding to that the refactoring changes alot in the SqlWalker and adds properties on cached objects (ResultMapping) that affect the performance of ALL queries, not just the one executed.

Currently the Hydrators+UnitOfWork can do optimization via query hints, i.e. HINT_DEFEREAGERLOAD by the BasicEntity persisters and I would prefer a solution that works this way. While I understand this wouldnt allow to work with JOINs, you could improve the batching of eager loads for example to allow batch queries for inverse one-to-one to fetch only the IDs for proxy generation.

However the central point again, inverse bidirectional one-to-one on the main entity is problematic from an ORM<->relational perspective and the best way is to avoid it completely.

@fprochazka
Copy link
Contributor Author

Ok, so tu sum it up:

  • I create hint/logic for hydrators+UoW that will defer loading of the objects (reusing the existing code offcourse and adding what is required to complete this feature)
  • add the posibility to load only proxies for relations that are lazy inverse bidirectional one-to-one, that would result in one single query that fetches the IDs and creates the proxies
  • batch load eager inverse bidirectional one-to-one
  • batch load owning to-one relations that are eager

Sounds good? If I work on it this way (hydrator+UoW level), is there a chance we will solve the problem together and get it merged eventually?

@beberlei
Copy link
Member

Sounds good, the fourth thing is actually supported already. I don't mind if this creates more code (it will probably than the current proposal), but it will be rather isolated from the rest.

@fprochazka
Copy link
Contributor Author

I'm really happy we came to an understanding and I will work on this as soon as I can :)

@Koc
Copy link
Contributor

Koc commented Jun 25, 2014

@fprochazka could you continue work on issue please?

@fprochazka
Copy link
Contributor Author

Currently I cannot, but I have it planned.

@ryall
Copy link

ryall commented Jan 30, 2015

Looking forward to a solution to this.

@austinh
Copy link

austinh commented Feb 16, 2015

+1 Also looking forward to this. It creates very frustrating performance situations for those of us with many OneToOnes.

@JanJakes
Copy link

👍 Any updates/plans with this issue?

@glubo
Copy link

glubo commented May 7, 2015

ping

@Ocramius
Copy link
Member

Ocramius commented May 7, 2015

Instead of pinging, please pick it up yourselves if it's relevant.

@iBasit
Copy link

iBasit commented May 22, 2015

+1 this needs to be implemented, wasted so much time and read so many articles and Q&A, before coming here.

Please add extra property or using extra_lazy loading, so everyone have option to do what they want and not doctrine forcing them to follow there rules and there ways.

Please have this fixed. Thank you.

@jee7
Copy link

jee7 commented Aug 4, 2015

I would also like that. Currently a workaround for me was to change all the OneToOne relations into OneToMany / ManyToOne and have getters / setters as:

public function setMyEntity(MyEntity $myEntity) {
    $this->myEntity->clear();
    $this->myEntity->add($myEntity);

    return $this;
}

public function getMyEntity() {

    return $this->myEntity->first();
}

Although that did break, if I asked:

if (null !== $entity->getMyEntity())

Because first() seems to return false in the case, when there are no objects. So the getter could be a little better.
I still find that approach better than to start using some active record pattern. I think Doctrine is awesome because of the data mapper pattern it uses, an active record approach would not be nice.
Implicit service to match the other end would be a possibility, but then again, what are bi-directional relations for?

@iBasit
Copy link

iBasit commented Aug 4, 2015

I had to do same thing, but Ideally there should be settings for doing on demand request as per developer

@tiger-seo
Copy link

+1 need this

@antanas-arvasevicius
Copy link

Hi everyone, I've also came up with this issue, and just can't believe, is it just really happening.
Does someone is in progress to solve this?
If not, could you share a a problems which you have came up in solving this?

@stayeronglass
Copy link

+1

1 similar comment
@Cosmologist
Copy link

+1

@stof
Copy link
Member

stof commented May 30, 2016

An architecture for the solution was suggested in #970 (comment) but nobody is working on this task currently. Writing a +1 is useless here, as it is only about sending a notification email to all subscribers.
If the task is important for you, the best solution is to pick the task and implement the feature.

@clytemnestra
Copy link

clytemnestra commented Jun 16, 2016

The thing is most people don't know how to implement it, so it makes sense to let the ones that know how to do it, do it, since they'd do a better job. Doctrine is pretty big for someone to learn how it works and implement this feature properly. Most people having this problem come from Symfony, and the only thing we know about Doctrine is how to make queries, and not that much about how it works, or its internals.

But this issue creates a lot of problems and hasn't been fixed for years, requiring us to do ugly fixes or not use OneToOne at all.

@iBasit
Copy link

iBasit commented Oct 26, 2016

revisiting year later just to add +1

@ZRayCC
Copy link

ZRayCC commented May 31, 2018

+1

@fprochazka
Copy link
Contributor Author

As you might have guessed, my plans changed and I don't have capacity to fix this. Feel free to pick it up, if you need it :)

@BonBonSlick
Copy link

BonBonSlick commented May 7, 2019

Tell me please, what if we not use voted workaround for oneToOne but use this?
https://www.doctrine-project.org/projects/doctrine-orm/en/2.6/tutorials/composite-primary-keys.html#identity-through-foreign-entities

Do we still have same bug?

@syther101
Copy link

syther101 commented May 7, 2019

@BonBonSlick I'm unsure how doctrine would handle this. In example 1, the oneToMany relationship to AtricleAttribute is a collection. So doctrine will be able to proxy this fine. (See here for a description about that)

On the inverse side, I assume due to the composite key being the article id, and primary keys not being nullable. Doctrine might be able to load a proxy correctly here too.

In the example 2 thou. I assume you'd have the same problem. The bidirectional relationship isn't mapped in the example. But If you did map it by adding an address field to the user, you'd have the same problem.

Because the address could be null, doctrine will be unable to proxy this and will query to see if an address does exist.

p.s. My understanding of this could be completely wrong. I've just come across this issue in one of my own projects today and have spent the day researching 😂

@hwmatthewa
Copy link

+1

@hubertnnn
Copy link

@syther101 There are really 3 ways doctrine could handle this (plus 2 variations of loading).
Let Assume we have entities User and Address joined using oneToOne relationship

  1. You could keep User as an initialized proxy that the moment you call getAddress will load the address.
    This does not require any extra querying and will lead to n+1 if you need an address.
    Possibly the proxy class could have an extra method similar to __isInitialized() to check its relationships.

  2. You could load just the id of relationship and keep it as proxy.
    This can be done either by leftJoining Address and selecting its id or by doing a second query.
    This will "lazy" load just the ids to make the required proxies and
    as a result it would end up with memory efficient way of loading entities with toOne relationships.

  3. You could load the entire relationship and get the entity instead of proxy.
    This can be done either by leftJoining Address and selecting all its fields or by doing a second query.
    This will eager load all the relationships without using any Proxies.

I think the best solution would be to allow user to pick the method using "fetch" setting.
#2 would be the best default and the best for setting "fetch=lazy", with #3 being used if you set "fetch=eager".
About #1, I am not sure if its possible to do this, since it will prevent entities from converted from proxies to normal entities,
but so far I don't see any downsides in doing so. I would suggest making it accessible by explicitly setting "fetch=extra-lazy"

@auxmoney-tbarcala
Copy link

@hubertnnn The first solution you suggested needs to assume that getAddress is the getter for the address property, and doesn't contain any other logic, which should be true in most cases, but it is not something Doctrine usually demands. Or did I understand it wrong?

Would it also be an option to add a NullableReference proxy, so that we could opt-in to use this class, and then implement the getters/setters like below? This could be a backward-compatible change.

#[ORM\OneToOne(mappedBy: 'cmsUser')]
/** @var NullableReference<CmsAddress> */
private NullableReference $cmsAddress;

public function __construct()
{
    $this->cmsAddress = NullableReference::null();
}

public function getCmsAddress(): ?CmsAddress
{
    return $this->cmsAddress->deref();
}

public function setCmsAddress(?CmsAddress $cmsAddress): void
{
    $this->cmsAddress->set($cmsAddress);
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet