
perf(utils): improve the performance of scoring and reporting #212

Merged 10 commits into main on Nov 14, 2023

Conversation

MishaSeredenkoPushBased (Contributor)

Improved the performance of scoring and reporting.
Removed nested loops where possible and introduced dictionaries, so lookups run in constant time O(1) instead of linear time O(n).
Also did some minor refactoring and added a few checks.
@BioPhoton As the task owner, let me know whether I understood the task correctly and implemented everything properly. Also, maybe you could propose additional changes for further performance improvements.
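The nested-loop-to-dictionary change described above can be sketched roughly like this. Note this is an illustrative sketch, not the actual code-pushup implementation: the `Audit`/`GroupRef` shapes and the `scoreGroup*` function names are assumptions for the example.

```typescript
// Hypothetical sketch of the optimization: instead of resolving each group
// ref with a nested Array.find (a linear scan per lookup), build a Map once
// and look audits up in constant time.

interface Audit {
  slug: string;
  score: number;
}

interface GroupRef {
  slug: string;
  weight: number;
}

// O(n * m): for every ref, scan the whole audits array.
function scoreGroupNested(refs: GroupRef[], audits: Audit[]): number {
  let total = 0;
  let weights = 0;
  for (const ref of refs) {
    const audit = audits.find(a => a.slug === ref.slug); // linear scan
    if (!audit) continue;
    total += audit.score * ref.weight;
    weights += ref.weight;
  }
  return weights === 0 ? 0 : total / weights;
}

// O(n + m): build the dictionary once, then every lookup is O(1).
function scoreGroupMap(refs: GroupRef[], audits: Audit[]): number {
  const auditsBySlug = new Map(audits.map(a => [a.slug, a] as const));
  let total = 0;
  let weights = 0;
  for (const ref of refs) {
    const audit = auditsBySlug.get(ref.slug); // constant-time lookup
    if (!audit) continue;
    total += audit.score * ref.weight;
    weights += ref.weight;
  }
  return weights === 0 ? 0 : total / weights;
}
```

Both variants compute the same weighted average; only the lookup cost differs, which matters once the number of audits grows.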

Closes #132

@BioPhoton (Collaborator)

Could you add the new logic to the benchmark tests, please?

@BioPhoton (Collaborator) left a comment


Looks good!
One thing that would be good to add is the new benchmark, like here: #137

Review comments (resolved, outdated) on:
packages/core/src/lib/implementation/persist.ts (2)
packages/utils/src/lib/report.spec.ts (1)
packages/utils/src/lib/report.ts (2)
packages/utils/src/lib/scoring.ts (3)
@MishaSeredenkoPushBased (Contributor, Author)

@BioPhoton @matejchalk Well, it looks like the changes don't affect performance in a positive way. I added the updated functions to the benchmark (file utils/perf/optimized3.mjs), and they ran about 3 times slower than before, and even slower than base:
[benchmark results screenshot]
We definitely need to try the imperative + dictionaries approach. But it seems that filling the dictionaries also costs some performance.
Any other suggestions? 🤔

@BioPhoton (Collaborator)

Could we get the traces for that?

@MishaSeredenkoPushBased (Contributor, Author) commented Nov 13, 2023

I checked my changes by benchmarking, and it turned out they reached only 535,000 ops/s. Really bad, I would say, even worse than base. Then, by reverting my changes one function at a time, I figured out that the initial scoring performance was 750,000 ops/s: better, but still not great. There are various other optimized solutions in the benchmark, which reach up to 1,600,000 ops/s (on my machine).
It looks like the performance is hurt by the declarative approach: multiple spread operators, .map, .reduce and so on. So I tried going fully imperative to see what difference it would make.
I took the optimized2 solution from the benchmark, fixed it (it didn't work properly), and optimized it further. I put it into the optimized3 check, and it reached almost 1,800,000 ops/s!
The only caveat is that the code is highly imperative: a lot of mutations, nested functions and so on. It definitely doesn't look perfect, and I expect someone could leave comments on where it could be improved. But overall, I consider this huge 3.5x performance improvement more important than prettiness or higher readability.

[benchmark results screenshot]

@BioPhoton @matejchalk FYI
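The style difference discussed above, declarative spread/reduce versus imperative in-place mutation, can be sketched as follows. This is a generic illustration of the pattern, not the actual report.ts or scoring.ts code; the `totals*` function names are made up for the example.

```typescript
// Declarative vs. imperative accumulation of per-slug counts.

type Scores = Record<string, number>;

// Declarative: reduce + spread copies the accumulator object on every
// iteration, so n items do O(n^2) total work (plus n allocations).
function totalsDeclarative(slugs: string[]): Scores {
  return slugs.reduce<Scores>(
    (acc, slug) => ({ ...acc, [slug]: (acc[slug] ?? 0) + 1 }),
    {},
  );
}

// Imperative: one object, mutated in place, O(n) total work.
function totalsImperative(slugs: string[]): Scores {
  const acc: Scores = {};
  for (const slug of slugs) {
    acc[slug] = (acc[slug] ?? 0) + 1;
  }
  return acc;
}
```

The two produce identical results; the imperative version avoids the per-iteration object copies that the spread operator introduces, which is the kind of overhead the benchmark numbers above reflect.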

@matejchalk (Collaborator) left a comment


I have a question about the benchmarks: what numbers of audits and groups are you using?

The fact that imperative code is faster than declarative code is not surprising (that's always the case in benchmarks which use extreme values), but the trick is figuring out whether it's relevant to us or not.

If the performance optimizations are not noticeable for realistic amounts of data, then optimizing for code maintenance is still the way to go. E.g. if it doesn't make a difference until we reach thousands of groups, then it's not relevant to us, because that's not a realistic scenario, and we would do better to prioritize our ability to understand and maintain the code and prevent bugs (side effects like mutations are the antithesis of this). There's a famous quote that "premature optimization is the root of all evil", after all.

I would estimate that hundreds and, in extreme cases, thousands of audits are realistic (e.g. for the CLI, our very strict ESLint gives us 347 audits). So if the performance optimization has an impact at this level, then it's worth it; but if we don't see the performance benefits until we reach a million audits, for example, then it isn't.

For groups, the number is much lower. I can't imagine we'd have over 100, even.
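To check whether an optimization matters at a given data size, a quick ops/s measurement can be run directly. This is a minimal sketch, not the project's actual benchmark suite; the `opsPerSecond` helper and the fabricated 1000-audit data set are assumptions for illustration.

```typescript
// Minimal ops/s micro-benchmark: run a function repeatedly for a fixed
// wall-clock window and report iterations per second.

interface Audit {
  slug: string;
  score: number;
}

function opsPerSecond(fn: () => void, durationMs = 100): number {
  let ops = 0;
  const start = Date.now();
  while (Date.now() - start < durationMs) {
    fn();
    ops++;
  }
  return (ops / (Date.now() - start)) * 1000;
}

// Fabricate 1000 audits (a realistic upper range per the discussion above)
// and compare a linear scan against a Map lookup for the worst-case slug.
const audits: Audit[] = Array.from({ length: 1000 }, (_, i) => ({
  slug: `audit-${i}`,
  score: Math.random(),
}));
const bySlug = new Map(audits.map(a => [a.slug, a] as const));

const linear = opsPerSecond(() => audits.find(a => a.slug === 'audit-999'));
const mapped = opsPerSecond(() => bySlug.get('audit-999'));
console.log(`find: ${Math.round(linear)} ops/s, Map.get: ${Math.round(mapped)} ops/s`);
```

Running this at different audit/group counts is one way to answer the question above: if the gap only opens up at sizes no real plugin produces, the simpler code wins.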

Review comments on:
e2e/cli-e2e/tests/__snapshots__/collect.spec.ts.snap (resolved, outdated)
packages/utils/src/lib/scoring.ts (2, resolved)
@MishaSeredenkoPushBased (Contributor, Author)

Overall, as a comment on the changes: the final performance improvement is 2.3x compared to the base solution, with 1,000 audits and 100 groups.
[benchmark results screenshot]
The more audits and groups there are, the bigger the difference. Here is a crazy run with 10,000 audits and 100 groups:
[benchmark results screenshot]
Almost a 3x difference: it looks like a really performant solution.

MishaSeredenkoPushBased merged commit 41d7c0b into main on Nov 14, 2023
11 checks passed
MishaSeredenkoPushBased deleted the improve-scoring-performance branch on November 14, 2023 at 11:42
Labels: 🔥 performance (performance optimization), 🧩 utils
Projects: None yet
Linked issue closed by this pull request: Make scoring and reporting logic more performant
4 participants