Skip to content

Improve DeltaList behavior for large projects #335

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jun 26, 2025

Conversation

gsmet
Copy link

@gsmet gsmet commented Jun 3, 2025

This pull request targets the 3.x branch as I think this support has been rewritten in 4.

Here is the situation prior to this patch with a massive project (~40k source files):

Screenshot from 2025-06-03 17-17-35

7172 samples, 3.79%

The first commit keeps the ordering as is and make sure we write the file sorted and the outputs of DeltaList are sorted.
The behavior is exactly the same as before.

Here is the situation after the first commit:

Screenshot from 2025-06-03 17-17-27

78 samples, 0.04% - basically, the only thing left is the tiny thing you can see at the far right of the previous flame graph.

The second commit is a further optimization, which might not be useful and only sorts when displaying the output. The file is written unsorted.
It doesn't change much tbh so we might as well not include the second commit.
I did the work to be comprehensive so I thought I would present it.

Here is the situation after the second commit:

Screenshot from 2025-06-03 17-17-43

59 samples, 0.04%

Opening as a draft so that we decide what to do about the second commit: my advice would be to drop it but YMMV

BTW, it might be possible to optimize further but given this code is readable and give good results, I think it's good enough.

Fixes #334


To make clear that you license your contribution under
the Apache License Version 2.0, January 2004
you have to acknowledge this by using the following check-box.

@gsmet gsmet changed the title Improve delta list Improve DeltaList behavior for large projects Jun 3, 2025
@gsmet gsmet marked this pull request as draft June 3, 2025 15:32
Comment on lines 36 to 37
this.added = newList.stream().filter(i -> !oldList.contains(i)).collect(Collectors.toList());
this.removed = oldList.stream().filter(i -> !newList.contains(i)).collect(Collectors.toList());
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is to use HashSets and avoid copying then removing a lot of elements.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could it be better to use the old for loop instead of streams?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could sure.

@gsmet
Copy link
Author

gsmet commented Jun 13, 2025

hey @jorsol , any chance you could have a look at this PR?

gsmet added 2 commits June 26, 2025 16:07
I have been experimenting with a large code base to try to improve
Quarkus build time and I stumbled upon the fact that DeltaList was very
inefficient in these cases.

In this commit, I keep the ordering even if not actually important. I
will push further optimizations in a second commit that we can discuss
further.

Fixes apache#334
@gsmet gsmet force-pushed the improve-DeltaList branch from 2cfe2a8 to c4a7c66 Compare June 26, 2025 14:17
@cstamas
Copy link
Member

cstamas commented Jun 26, 2025

@desruisseaux @gnodet ping

Copy link
Contributor

@gnodet gnodet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@gsmet gsmet marked this pull request as ready for review June 26, 2025 14:26
@gsmet
Copy link
Author

gsmet commented Jun 26, 2025

I adjusted the commits:

  • dropped the commit removing the sorts - this commit made things a bit faster but it's probably more future proof to keep the output properly sorted
  • I switched from streams to for loops in DeltaList

@desruisseaux
Copy link
Contributor

It looks fine to me as well.

@gnodet gnodet merged commit 63846f1 into apache:maven-compiler-plugin-3.x Jun 26, 2025
88 checks passed
@github-actions github-actions bot added this to the 3.14.1 milestone Jun 26, 2025
@gnodet gnodet added the enhancement New feature or request label Jun 26, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants