
Performance issue on nested o2m reading #11873

Closed · 3 tasks done
u12206050 opened this issue Feb 26, 2022 · 4 comments · Fixed by #11246

@u12206050 (Contributor) commented Feb 26, 2022

Preflight Checklist

Describe the Bug

Using the Directus items API and requesting specific fields on a relationship, for example:

{
  filter: { 'status': { '_eq': 'confirmed' } },
  fields: ['person.id', 'person.firstname', 'person.lastname', 'person.mobile']
}

person here is a relational field on this particular collection. I was wondering why it was slow and was shocked to find that the query executed against the database looked like this:

select * from (select `persons`.`id`, `persons`.`mobile`, `persons`.`firstname`, `persons`.`lastname` from `persons` where `persons`.`id` = '723A0-B790-FA151' limit 100) as `foo`
union all
select * from (select `persons`.`id`, `persons`.`mobile`, `persons`.`firstname`, `persons`.`lastname` from `persons` where `persons`.`id` = '823A0-B820-FA545' limit 100) as `foo`
union all
select * from (select `persons`.`id`, `persons`.`mobile`, `persons`.`firstname`, `persons`.`lastname` from `persons` where `persons`.`id` = '833A0-B830-FA414' limit 100) as `foo`
union all
select * from (select `persons`.`id`, `persons`.`mobile`, `persons`.`firstname`, `persons`.`lastname` from `persons` where `persons`.`id` = '633A0-B640-FA040' limit 100) as `foo`
union all
select * from (select `persons`.`id`, `persons`.`mobile`, `persons`.`firstname`, `persons`.`lastname` from `persons` where `persons`.`id` = '633A0-B600-FA656' limit 100) as `foo`
union all
select * from (select `persons`.`id`, `persons`.`mobile`, `persons`.`firstname`, `persons`.`lastname` from `persons` where `persons`.`id` = '243A0-B230-FA555' limit 100) as `foo`
union all
select * from (select `persons`.`id`, `persons`.`mobile`, `persons`.`firstname`, `persons`.`lastname` from `persons` where `persons`.`id` = '223A0-B240-FA444' limit 100) as `foo`
union all
select * from (select `persons`.`id`, `persons`.`mobile`, `persons`.`firstname`, `persons`.`lastname` from `persons` where `persons`.`id` = '243A0-B210-FA747' limit 100) as `foo`
union all
select * from (select `persons`.`id`, `persons`.`mobile`, `persons`.`firstname`, `persons`.`lastname` from `persons` where `persons`.`id` = '213A0-B220-FA040' limit 100) as `foo`
union all
…

Obviously so many unions are going to be slow; why not just:

select `persons`.`id`, `persons`.`mobile`, `persons`.`firstname`, `persons`.`lastname` from `persons` where `persons`.`id` IN ('723A0-B790-FA151', '823A0-B820-FA545', '223A0-B240-FA444', '243A0-B210-FA747', '213A0-B220-FA040')

To Reproduce

Set up a collection with a relationship (M2O) to another collection.
Make an API request to that collection with a query similar to the one shown above (see the sketch after these steps).
You might need to monitor your database for queries, or log the queries being executed by setting the log level to trace in the environment variables (e.g. LOG_LEVEL=trace).
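
For reference, such a request against the REST items endpoint could look roughly like the sketch below. This is only an illustration; the host, collection name, and token are placeholders and do not come from this issue.

// Sketch: the same query as above, issued against the Directus REST items endpoint.
// "directus.example.com", "appointments", and the token are placeholders.
const url = new URL("https://directus.example.com/items/appointments");
url.searchParams.set("filter[status][_eq]", "confirmed");
url.searchParams.set(
  "fields",
  "person.id,person.firstname,person.lastname,person.mobile",
);

const response = await fetch(url, {
  headers: { Authorization: "Bearer <access-token>" },
});
const { data } = await response.json();
console.log(data);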

Errors Shown

I have actually been getting 500 errors when I set the limit too high.

What version of Directus are you using?

9.5.2

What version of Node.js are you using?

16.4.0

What database are you using?

mysql 8

What browser are you using?

chrome

What operating system are you using?

linux

How are you deploying Directus?

GCP

@rijkvanzanten (Member)

This is known and in the works here: #11246

The short answer to the long question "Obviously so many unions are going to be slow; why not just:" is limits. We want to fetch n items per parent key; with a WHERE IN query like that, you fetch all nested items per parent key. This isn't an issue in a table with a couple hundred or thousand items, but it will blow up when you have hundreds of thousands or millions of related items.

The union query approach alleviates that by allowing a limit per parent item, but (as you correctly stated) it has performance drawbacks of its own 👍🏻

@benhaynes (Sponsor, Member)

Could we conditionally change the query based on the presence of the limit parameter?

@rijkvanzanten rijkvanzanten changed the title Terrible DB query execution Performance issue on nested o2m reading Feb 28, 2022
@u12206050 (Contributor, Author)

OK, I understand. Chunking the queries could help, although, if I understand correctly, the whole reason for using UNION here is a high number of child records. In that case it is probably not ideal to run a query without filtering, since you will end up with performance issues down the line anyway. Thanks though.

@rijkvanzanten (Member)

In that case it is probably not ideal to run a query without filtering, since you will end up with performance issues down the line anyway. Thanks though.

Absolutely, that's the problem here 😄

We want to fetch up to the configured limit of items per parent item; for example, we only want to fetch 3 items per parent:

id: 1
nested:
  - 1
  - 2
  - 3

id: 2
nested:
  - 4
  - 5
  - 6

However, in SQL when doing a WHERE IN query, you can't apply that limit on a per-parent basis. This:

SELECT * FROM example WHERE parent_id IN (1, 2) LIMIT 3

will only fetch 3 items total, instead of 3 items per parent. If we were to multiply the limit by the number of parents:

SELECT * FROM example WHERE parent_id IN (1, 2) LIMIT 6

you would end up with:

id: 1
nested:
  - 1
  - 2
  - 3
  - 4
  - 5

id: 2
nested:
  - 6

as the limit is on the total set of nested items (before "stitching").

At first (in a previous alpha), we would fetch all related o2m items into memory and then apply the limit in JavaScript. This obviously causes performance issues around memory, as you can end up loading a virtually endless number of records into memory. That wasn't a big deal when you had a couple thousand items to stitch, but it would really cause trouble with larger data sets.
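
As an illustration only (not the actual Directus code), that earlier approach looked roughly like this, assuming a Knex connection db and an illustrative o2m table "related" with a "parent_id" foreign key:

// Illustrative sketch of the old approach: one WHERE IN query pulls every
// related row for all parents into memory, then the per-parent limit is
// applied in JavaScript. Memory usage grows with the total number of related
// rows, not with the limit.
import { knex } from "knex";

const db = knex({
  client: "mysql2",
  connection: "mysql://user:pass@localhost/example", // placeholder connection
});

async function readNestedInMemory(parentIds: string[], limit: number) {
  const rows = await db("related")
    .select("id", "parent_id")
    .whereIn("parent_id", parentIds);

  const byParent = new Map<string, any[]>();
  for (const row of rows) {
    const bucket = byParent.get(row.parent_id) ?? [];
    if (bucket.length < limit) bucket.push(row); // limit applied in JS, after the fact
    byParent.set(row.parent_id, bucket);
  }
  return byParent;
}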

My proposed fix in the PR effectively reverts to that previous setup, but does it in a way that doesn't require loading the whole related table into memory at once. Instead, we can batch over the rows in the table in chunks of n rows at a time and short-circuit once all the parent items have their nested items filled. This should give us the best of both worlds 👍🏻
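
A rough sketch of that batched strategy (again illustrative only, not the PR itself), reusing the db handle and "related" table from the sketch above:

// Illustrative sketch of the batched approach: read related rows in chunks,
// fill each parent's bucket up to `limit`, and stop as soon as every bucket
// is full (or the table runs out of matching rows).
async function readNestedChunked(parentIds: string[], limit: number, chunkSize = 1000) {
  const byParent = new Map<string, any[]>(parentIds.map((id): [string, any[]] => [id, []]));

  for (let offset = 0; ; offset += chunkSize) {
    const chunk = await db("related")
      .select("id", "parent_id")
      .whereIn("parent_id", parentIds)
      .orderBy("id")
      .limit(chunkSize)
      .offset(offset);

    for (const row of chunk) {
      const bucket = byParent.get(row.parent_id);
      if (bucket && bucket.length < limit) bucket.push(row);
    }

    const allFull = [...byParent.values()].every((b) => b.length >= limit);
    if (allFull || chunk.length < chunkSize) break; // short-circuit once everything is filled
  }

  return byParent;
}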

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 2, 2024