Parsing performance regression testing #14599

As discovered at #14566 (and elsewhere previously), parsing performance is a critical, user-facing metric that we should be consistently measuring and verifying. We have discussed this topic before, but no action was taken and we released a "bad" version to users.

A quick hack would be to time the execution of the OpenJDK no-error CI tasks; however, this can end up being fairly inconsistent, since we download the repository to be parsed during report generation, and variables like the amount of resources allocated to the JVM also play a role.

A slightly better approach would be to improve our AST regression testing to report checkstyle execution time as a percentage difference between master and the feature branch, since both runs execute on the same machine, sequentially, with identical resource allocation.

Probably the best approach would be to create a new script that times only checkstyle execution and captures other metrics, such as memory usage, without all the I/O involved in report generation. For example, our generated report could tell us that parsing was 33% faster on the feature branch, used 10% less memory, and so on.

Comments
@nrmancuso I would like to give it a shot, but I want to discuss a few things first to make sure I follow the best approach. I will have to introduce a new test task and create a new script that benchmarks multiple metrics for checkstyle execution. I think it would be better if we also integrate this into the CI (just like the "GitHub, generate report" command).

To benchmark, it would ideally be best to run Checkstyle on some large projects with relatively long execution times. Do you have suggestions on which project we should use as the benchmark sample? Or do we want to run it over many different projects? (In that case, this would become part of the original regression test.)

Another question: do you prefer to run this benchmark inside the original regression test procedure (changing the original code), or in a completely isolated new script? IMO, it would be better to run the benchmark in a completely isolated procedure, for the following reasons:

PS: if any of my understanding above is inaccurate, please correct me :)
The latest OpenJDK has proven to be a good codebase to test on, for both performance and memory-consumption purposes.

I will start right away on building a new GitHub Actions task for measuring both performance and memory consumption on OpenJDK.

I would suggest mapping out your high-level plan here to make sure we are all on the same page. It would also be a good idea to create a POC in your own fork of checkstyle and share it here to demonstrate what this solution will look like.
Draft Plan

Goal: Implement a new CI test task that benchmarks Checkstyle execution and compares the metrics between the new patch and the original version.

Usage: This benchmark task runs alongside the other CI checks when a PR is created or pushed.

Plan: Create a new GitHub action in
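To illustrate the comparison idea, here is a minimal sketch of a script such an action could invoke. Everything in it is an assumption for illustration: the jar names, the config path, the target source directory, and the regression threshold are hypothetical placeholders, not the actual solution that was merged.

```bash
#!/usr/bin/env bash
# Hypothetical benchmark script: times checkstyle on master vs. the PR branch
# and fails if the PR is noticeably slower. All names below are placeholders.
set -euo pipefail

TARGET_SRC="./openjdk/src"   # codebase to parse (assumed already cloned by CI)
THRESHOLD_PCT=10             # assumed allowed slowdown, in percent

benchmark() {
  local jar="$1"
  local start end
  start=$(date +%s%N)
  java -jar "$jar" -c config/benchmark-config.xml "$TARGET_SRC" > /dev/null
  end=$(date +%s%N)
  echo $(( (end - start) / 1000000 ))   # elapsed milliseconds
}

master_ms=$(benchmark checkstyle-master.jar)
patch_ms=$(benchmark checkstyle-patch.jar)

diff_pct=$(( (patch_ms - master_ms) * 100 / master_ms ))
echo "master: ${master_ms} ms, patch: ${patch_ms} ms, diff: ${diff_pct}%"

if [ "$diff_pct" -gt "$THRESHOLD_PCT" ]; then
  echo "Performance regression detected" >&2
  exit 1
fi
```

Running both jars sequentially on the same machine keeps resource allocation identical, which is what makes the percentage comparison meaningful.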
PS: now we use

@nrmancuso this is a draft high-level plan, please take a look. I will be working on a sample in my forked repo over the next few days.
We just need to use https://man7.org/linux/man-pages/man1/time.1.html over some command that runs parsing only (a config with no modules under TreeWalker, or a single lightweight module). As soon as this basic proof of concept works well, we can extend it to memory and any other parameters. This basic implementation should be able to catch a regression if some PR reverts a performance optimization.
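A minimal sketch of that idea: an empty TreeWalker makes checkstyle parse every file without running any checks, and GNU time measures the wall-clock cost. The jar name and target directory are assumptions for illustration.

```bash
#!/usr/bin/env bash
# Generate a config with an empty TreeWalker so checkstyle only parses files,
# then time the run with GNU time. Jar name and source dir are placeholders.
set -euo pipefail

cat > parse-only-config.xml <<'EOF'
<?xml version="1.0"?>
<!DOCTYPE module PUBLIC
    "-//Checkstyle//DTD Checkstyle Configuration 1.3//EN"
    "https://checkstyle.org/dtds/configuration_1_3.dtd">
<module name="Checker">
  <module name="TreeWalker"/>
</module>
EOF

# %e = elapsed wall-clock time in seconds
/usr/bin/time -f "%e" -o elapsed.txt \
  java -jar checkstyle-all.jar -c parse-only-config.xml ./openjdk/src > /dev/null

echo "parse time: $(cat elapsed.txt) s"
```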
@romani I have updated my plan (#14599 (comment)). Please take a look. I will work on the POC in my fork now.

This is already done by the CI for you.
Issue checkstyle#14599: Build performance regression test
Closed via #14754
I see that we did not add memory stats, as I suggested above. Let's use this script in practice for a while and see if we really need such features :)

@nrmancuso Yes, we decided to go with the bare minimum first. If we decide to implement memory stats in the future, please feel free to let me know. I can add that quickly, since I had that feature in an early version of the PR.
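For reference, a minimal sketch of how memory stats could later be captured with GNU time alone, using the same hypothetical jar and config names as in the sketches above:

```bash
#!/usr/bin/env bash
# -v makes GNU time print detailed stats on stderr, including peak memory
# ("Maximum resident set size"). Jar/config names are placeholders.
set -euo pipefail

/usr/bin/time -v \
  java -jar checkstyle-all.jar -c parse-only-config.xml ./openjdk/src \
  > /dev/null 2> time-stats.txt

grep "Maximum resident set size" time-stats.txt
```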