Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: performance improvement and shuffling limited queries #3

Merged
merged 3 commits into from
Oct 16, 2023

Conversation

Sandros94
Copy link
Contributor

@Sandros94 Sandros94 commented Oct 16, 2023

The original shuffleArray and getRandomItems functions had a quadratic performance, mainly caused by splice in the loop, resulting in a $O(n^2)$ performance ($n$ being the number of items to shuffle).
With the these updated ones (credit to this awesome blog post about Fisher-Yates Shuffle) we get a much better linear performance of $O(n)$. Crucial when shuffling medium to big collections.

This also fixes the fact that when doing a limited query, the output would have still being sorted by ids in alphanumeric order.

Detailed Explanation

A slightly in-depth explanation on the performance improvement that this PR introduces (mainly discussed privately on Discord):

splice on its own has a performance of O(n), doing a for loop on an array has also a performance of O(n). The sum thus is O(n^2). Removing the usage of splice (thus editing the array inplace, without an additional array) makes us achieve O(n) performance in shuffleArray.

For getRandomItems its a similar story, the original one, using splice achieve a performance of a minimum of O(amount*n) and a maximum of O(n^2), but if we remove splice we go down to a minimum of O(amount) and a maximum of O(n). But since we only picked a random set, without truly shuffling it, we need to add a shuffleArray making it in the end a performance between O(amount^2) and a maximum of O(n^2)

@br41nslug
Copy link
Owner

As you may be able to guess this extension was not made with performance in mind (or production use for that matter). But i won't say no to a free performance improvement so will have a look at it later 🚀.

@Sandros94
Copy link
Contributor Author

As you may be able to guess this extension was not made with performance in mind (or production use for that matter).

Oh I'm 100% aware of that! But since you mentioned it in directus/#3240 I wanted to contribute to this nice example. 😁

But i won't say no to a free performance improvement so will have a look at it later 🚀.

This is actually my first time placing my hands on a Directus extension, but I shouldn't have made a mess 😜

@Sandros94
Copy link
Contributor Author

I've noticed a small mistake in getRandomItems, I'm fixing it right now.

@Sandros94 Sandros94 changed the title fix: performance improvement for large collections fix: performance improvement and shuffling limited queries Oct 16, 2023
Copy link
Owner

@br41nslug br41nslug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great addition, this shuffle algorithm is measurably more efficient 🚀

@br41nslug br41nslug merged commit 347d1e0 into br41nslug:main Oct 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants