
Performance issue on nested o2m reading #11873

Closed · 3 tasks done
u12206050 opened this issue Feb 26, 2022 · 4 comments · Fixed by #11246

@u12206050 (Contributor) commented Feb 26, 2022

Preflight Checklist

Describe the Bug

Using the Directus items API and requesting specific fields on a relationship, for example:

{
  filter: { 'status': { '_eq': 'confirmed' } },
  fields: ['person.id', 'person.firstname', 'person.lastname', 'person.mobile']
}

person here is a relational field on this particular collection. I was wondering why it was slow and was shocked to find that the query executed against the database looked like this:

select * from (select `persons`.`id`, `persons`.`mobile`, `persons`.`firstname`, `persons`.`lastname` from `persons` where `persons`.`id` = '723A0-B790-FA151' limit 100) as `foo`
union all
select * from (select `persons`.`id`, `persons`.`mobile`, `persons`.`firstname`, `persons`.`lastname` from `persons` where `persons`.`id` = '823A0-B820-FA545' limit 100) as `foo`
union all
select * from (select `persons`.`id`, `persons`.`mobile`, `persons`.`firstname`, `persons`.`lastname` from `persons` where `persons`.`id` = '833A0-B830-FA414' limit 100) as `foo`
union all
select * from (select `persons`.`id`, `persons`.`mobile`, `persons`.`firstname`, `persons`.`lastname` from `persons` where `persons`.`id` = '633A0-B640-FA040' limit 100) as `foo`
union all
select * from (select `persons`.`id`, `persons`.`mobile`, `persons`.`firstname`, `persons`.`lastname` from `persons` where `persons`.`id` = '633A0-B600-FA656' limit 100) as `foo`
union all
select * from (select `persons`.`id`, `persons`.`mobile`, `persons`.`firstname`, `persons`.`lastname` from `persons` where `persons`.`id` = '243A0-B230-FA555' limit 100) as `foo`
union all
select * from (select `persons`.`id`, `persons`.`mobile`, `persons`.`firstname`, `persons`.`lastname` from `persons` where `persons`.`id` = '223A0-B240-FA444' limit 100) as `foo`
union all
select * from (select `persons`.`id`, `persons`.`mobile`, `persons`.`firstname`, `persons`.`lastname` from `persons` where `persons`.`id` = '243A0-B210-FA747' limit 100) as `foo`
union all
select * from (select `persons`.`id`, `persons`.`mobile`, `persons`.`firstname`, `persons`.`lastname` from `persons` where `persons`.`id` = '213A0-B220-FA040' limit 100) as `foo`
union all
…

Obviously so many unions are going to be slow; why not just:

select `persons`.`id`, `persons`.`mobile`, `persons`.`firstname`, `persons`.`lastname` from `persons` where `persons`.`id` IN ('723A0-B790-FA151', '823A0-B820-FA545', '223A0-B240-FA444', '243A0-B210-FA747', '213A0-B220-FA040')

To Reproduce

Set up a collection with a relationship (M2O) to another collection.
Make an API request to that collection with a query similar to the one shown above (see the sketch after these steps).
You might need to monitor your database for queries, or log the queries being executed by setting the log level to trace in the environment variables (e.g. LOG_LEVEL=trace).
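
For reference, such a request against the REST items endpoint could look roughly like the sketch below. This is only an illustration; the host, collection name, and token are placeholders and do not come from this issue.

// Sketch: the same query as above, issued against the Directus REST items endpoint.
// "directus.example.com", "appointments", and the token are placeholders.
const url = new URL("https://directus.example.com/items/appointments");
url.searchParams.set("filter[status][_eq]", "confirmed");
url.searchParams.set(
  "fields",
  "person.id,person.firstname,person.lastname,person.mobile",
);

const response = await fetch(url, {
  headers: { Authorization: "Bearer <access-token>" },
});
const { data } = await response.json();
console.log(data);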

Errors Shown

I have actually been getting 500 errors when I set the limit too high.

What version of Directus are you using?

9.5.2

What version of Node.js are you using?

16.4.0

What database are you using?

mysql 8

What browser are you using?

chrome

What operating system are you using?

linux

How are you deploying Directus?

GCP

@rijkvanzanten (Member)

This is known and in the works here: #11246

The short answer to the long question "Obviously so many unions are going to be slow; why not just:" is limits. We want to fetch n items per parent key; with a WHERE IN query like that, you fetch all nested items per parent key. This isn't an issue in a table with a couple hundred or thousand items, but it will blow up when you have hundreds of thousands or millions of related items.

The union query approach alleviates that by allowing a limit per parent item, but (as you correctly stated) it has performance drawbacks of its own 👍🏻

@benhaynes (Sponsor, Member)

Could we conditionally change the query based on the presence of the limit parameter?

@rijkvanzanten rijkvanzanten changed the title Terrible DB query execution Performance issue on nested o2m reading Feb 28, 2022
@u12206050 (Contributor, Author)

OK, I understand. Chunking the queries could help, although, if I understand correctly, the whole reason for using UNION here is a high number of child records. In that case it is probably not ideal to run a query without filtering, since you will end up with performance issues down the line anyway. Thanks though.

@rijkvanzanten (Member)

In that case it is probably not ideal to run a query without filtering, since you will end up with performance issues down the line anyway. Thanks though.

Absolutely, that's the problem here 😄

We want to fetch up to the configured limit of items per parent item; for example, we only want to fetch 3 items per parent:

id: 1
nested:
  - 1
  - 2
  - 3

id: 2
nested:
  - 4
  - 5
  - 6

However, in SQL when doing a WHERE IN query, you can't apply that limit on a per-parent basis. This:

SELECT * FROM example WHERE parent_id IN (1, 2) LIMIT 3

will only fetch 3 items total, instead of 3 items per parent. If we were to multiply the limit by the number of parents:

SELECT * FROM example WHERE parent_id IN (1, 2) LIMIT 6

you would end up with:

id: 1
nested:
  - 1
  - 2
  - 3
  - 4
  - 5

id: 2
nested:
  - 6

as the limit is on the total set of nested items (before "stitching").

At first (in a previous alpha), we would fetch all related o2m items into memory and then apply the limit in JavaScript. This obviously causes performance issues around memory, as you can end up loading a virtually endless number of records into memory. That wasn't a big deal when you had a couple thousand items to stitch, but it would really cause trouble with larger data sets.
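
As an illustration only (not the actual Directus code), that earlier approach looked roughly like this, assuming a Knex connection db and an illustrative o2m table "related" with a "parent_id" foreign key:

// Illustrative sketch of the old approach: one WHERE IN query pulls every
// related row for all parents into memory, then the per-parent limit is
// applied in JavaScript. Memory usage grows with the total number of related
// rows, not with the limit.
import { knex } from "knex";

const db = knex({
  client: "mysql2",
  connection: "mysql://user:pass@localhost/example", // placeholder connection
});

async function readNestedInMemory(parentIds: string[], limit: number) {
  const rows = await db("related")
    .select("id", "parent_id")
    .whereIn("parent_id", parentIds);

  const byParent = new Map<string, any[]>();
  for (const row of rows) {
    const bucket = byParent.get(row.parent_id) ?? [];
    if (bucket.length < limit) bucket.push(row); // limit applied in JS, after the fact
    byParent.set(row.parent_id, bucket);
  }
  return byParent;
}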

My proposed fix in the PR effectively reverts to that previous setup, but does it in a way that doesn't require loading the whole related table into memory at once. Instead, we can batch over the rows in the table in chunks of n rows at a time and short-circuit once all the parent items have their nested items filled. This should give us the best of both worlds 👍🏻
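
A rough sketch of that batched strategy (again illustrative only, not the PR itself), reusing the db handle and "related" table from the sketch above:

// Illustrative sketch of the batched approach: read related rows in chunks,
// fill each parent's bucket up to `limit`, and stop as soon as every bucket
// is full (or the table runs out of matching rows).
async function readNestedChunked(parentIds: string[], limit: number, chunkSize = 1000) {
  const byParent = new Map<string, any[]>(parentIds.map((id): [string, any[]] => [id, []]));

  for (let offset = 0; ; offset += chunkSize) {
    const chunk = await db("related")
      .select("id", "parent_id")
      .whereIn("parent_id", parentIds)
      .orderBy("id")
      .limit(chunkSize)
      .offset(offset);

    for (const row of chunk) {
      const bucket = byParent.get(row.parent_id);
      if (bucket && bucket.length < limit) bucket.push(row);
    }

    const allFull = [...byParent.values()].every((b) => b.length >= limit);
    if (allFull || chunk.length < chunkSize) break; // short-circuit once everything is filled
  }

  return byParent;
}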

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 2, 2024