Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance regression in file library on ~5,000,000+ items #14796

Closed
emrahnazif opened this issue Aug 2, 2022 · 7 comments · Fixed by #16870
Closed

Performance regression in file library on ~5,000,000+ items #14796

emrahnazif opened this issue Aug 2, 2022 · 7 comments · Fixed by #16870

Comments

@emrahnazif
Copy link

Describe the Bug

TLTR:
from app it takes 60 secs to load /app/files meanwhile API returns /files in 30 ms.
Same for other collections. 3 bottlenecks: sort, filter_count, total_count

I have around 6m files in my directus projects.
Through the app when you nagivate to /files, it makes 2 requests:

This request returns latest files & total_count:
https://directus.api/files?limit=100&fields[]=id&fields[]=modified_on&fields[]=type&fields[]=title&fields[]=type&fields[]=filesize&sort[]=-uploaded_on&page=1&filter=%7B%22_and%22:[%7B%22type%22:%7B%22_nnull%22:true%7D%7D,%7B%22folder%22:%7B%22_null%22:true%7D%7D]%7D&meta[]=filter_count&meta[]=total_count

Original request takes 30 secs to return on apple m1. When remove all filtersa and ask only /files it takes 30 ms, super fast.
3 things creates bottlenecks:
sort[]=-uploaded_on (adding this increased request time from 30ms to 12secs)
meta[]= filter_count (adding this increased request time from 30ms to 11secs)
meta[]= total_count (adding this increased request time from 30ms to 2.2secs)

Info: filter_count and total_count returns the same value: around 6million.

This request returns only total_count: The request takes 12 secs to return on apple m1.
https://directus.api/files?limit=100&filter=%7B%22_and%22:[%7B%22type%22:%7B%22_nnull%22:true%7D%7D,%7B%22folder%22:%7B%22_null%22:true%7D%7D]%7D&fields[]=id&fields[]=title&fields[]=description&fields[]=tags&fields[]=location&fields[]=storage&fields[]=storage_divider&fields[]=filename_disk&fields[]=filename_download&fields[]=metadata&fields[]=type&fields[]=filesize&fields[]=modified_by&fields[]=modified_on&fields[]=embed&fields[]=uploaded_by&fields[]=uploaded_on&fields[]=folder&fields[]=width&fields[]=height&fields[]=charset&fields[]=duration&fields[]=training&fields[]=inference&fields[]=annotations&fields[]=$thumbnail&sort=-uploaded_on&aggregate=%7B%22count%22:[%22*%22]%7D

This 2nd request seems unnecessary as App already gets total_count from the first request. Also, excluding all fields reduces to time to get from 12 secs to 2 secs for the same requested data.

As app requests to gets both of these at the same time, it takes around 60 secs to complete, CPU hits 100, everything slows down. This applies

I do not know what can be done but I know this makes the APP less useful when it takes up-to a minute to load page.

To Reproduce

Create millions of items in a collection and try to open a collection from the APP

Errors Shown

No errors

What version of Directus are you using?

v9.13.0

What version of Node.js are you using?

N/A

What database are you using?

mysql 8

What browser are you using?

chrome

How are you deploying Directus?

local

@rijkvanzanten
Copy link
Member

Could you do me a favor, and see what the general speed difference is of select count(*) from directus_files versus select * from directus_files limit 100 on your mysql 8 installation? Of course it's not an inclusive representative test, but at least it gives us a rough idea of what the percentage difference is on the raw database 🙂

@rijkvanzanten rijkvanzanten changed the title Performance bottleneck in that app/collection request Performance regression in file library on ~5,000,000+ items Aug 2, 2022
@emrahnazif
Copy link
Author

emrahnazif commented Aug 2, 2022

Could you do me a favor, and see what the general speed difference is of select count(*) from directus_files versus select * from directus_files limit 100 on your mysql 8 installation? Of course it's not an inclusive representative test, but at least it gives us a rough idea of what the percentage difference is on the raw database 🙂

Sure, with my pleasure, I run the same queries for 5 times:
select count(*) from directus_files
2.316 s
2.352 s
2.299 s
2.355 s
2.362 s

select * from directus_files limit 100
14 ms
12 ms
19 ms
16 ms
10 ms

rough idea of what the percentage difference is on the raw database

I believe the difference is pretty clear, at least on my setup :)

@rijkvanzanten, I saw you changed title to "Performance regression in file library on ~5,000,000+ items" In fact, it is the same for collections as well. Probably after a little more items.

@emrahnazif
Copy link
Author

2 more tests, first one is "directus_activity" with 15,3m records. Other is my_collection with again 6 m records

select count(*) from directus_activity
11.000 s

select * from directus_activity limit 100
14 ms

select count(*) from my_collection
2.350 s

select * from my_collection limit 100
15 ms

This also shows, when the number of records increased in a table, what will happen.

@rijkvanzanten
Copy link
Member

Thanks! Beyond adding indexes or some other engine optimizations in mysql's side, we can't do too much to bring those numbers down, but one massive improvement we can do here is to treat the counts as "async" in the app, so instead of waiting around for that number to be calculated until the page loads, show the first 100 items immediately (as those are loaded fast), and update the counts once they come in (even though that might take a while)

@joachimmueller
Copy link

+1 for this solution. The UI becomes unresponsive otherwise.

@Sandros94
Copy link

@rijkvanzanten So if I understand it correctly this is more like a mysql bottleneck? With a similar number of items, on a different database like postgresql with redis, we should see an improvement?
Since every file has a record inside the database and that record is used to display the folder page, right?

I know almost nothing about databases and I have only a few months of experience with Directus, so I'm still learning.

@rijkvanzanten
Copy link
Member

Linear: ENG-214

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 2, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants