New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add multi thread mode to checkstyle launcher #4370

Closed
sabaka opened this Issue May 23, 2017 · 14 comments

Comments

@sabaka
Contributor

sabaka commented May 23, 2017

We need to have ability to launch checkstyle in both modes - single thread and multi thread.
Single thread mode should be used by default, but switching between odes should not be hard.
It will help us to add new MT mode step by step and be able to merge it without breaking existent code.

@romani

This comment has been minimized.

Show comment
Hide comment
@romani

romani May 23, 2017

Member

While we are not sure about exact design, please do not document CLI arguments to keep them hidden during summer.

Member

romani commented May 23, 2017

While we are not sure about exact design, please do not document CLI arguments to keep them hidden during summer.

@romani romani added the approved label May 23, 2017

@soon

This comment has been minimized.

Show comment
Hide comment
@soon

soon May 29, 2017

Contributor

I've faced with a trouble while solving this issue. My first impression was to add a cli parameter, create two classes MultiThreadChecker and MultiThreadedTreeWalker and then create ModuleResolver class (like existing PropertyResolver) which will replace Checker with MultiThreadChecker and TreeWalker with MultiThreadTreeWalker while resolving classes according to the checker config. This allows us to update multi-thread module versions separately without affecting existing single-thread classes.

However, I'm not sure if this is a right way, at least because there are some tests which requires that all modules should be presented in the checkstyle_checks.xml config. Of course, multi-thread modules could be ignored in these tests, because they will mostly provide the same functionality as the single-thread versions, but, as I already said, not sure if the issue should be solved this way.

Another possible solution is to extract logic for launching checks into another classes, which will not be implemented as modules. However, this requires to move some logic from Checker and TreeWalker and would definitely lead to conflicts.

@sabaka @romani I'm not sure which solution should I choose. Could you please help me to select the right way?

Contributor

soon commented May 29, 2017

I've faced with a trouble while solving this issue. My first impression was to add a cli parameter, create two classes MultiThreadChecker and MultiThreadedTreeWalker and then create ModuleResolver class (like existing PropertyResolver) which will replace Checker with MultiThreadChecker and TreeWalker with MultiThreadTreeWalker while resolving classes according to the checker config. This allows us to update multi-thread module versions separately without affecting existing single-thread classes.

However, I'm not sure if this is a right way, at least because there are some tests which requires that all modules should be presented in the checkstyle_checks.xml config. Of course, multi-thread modules could be ignored in these tests, because they will mostly provide the same functionality as the single-thread versions, but, as I already said, not sure if the issue should be solved this way.

Another possible solution is to extract logic for launching checks into another classes, which will not be implemented as modules. However, this requires to move some logic from Checker and TreeWalker and would definitely lead to conflicts.

@sabaka @romani I'm not sure which solution should I choose. Could you please help me to select the right way?

@rnveach

This comment has been minimized.

Show comment
Hide comment
@rnveach

rnveach May 30, 2017

Member

there are some tests which requires that all modules should be presented in the checkstyle_checks.xml config

Tests can be rewritten and should not restrict us how we make code in main. Don't think of existing tests when writing new code, but you can think of how this new code should be tested. These current tests were made for how checkstyle was at the time, and are not really future proof for how checkstyle might change.

Another note, there is no restriction on having multiple treewalkers in one configuration. It shouldn't break anything or cause any issues as each one is an isolated module.

Another possible solution is to extract logic for launching checks into another classes

IMO, we should split these 2 modules into 3 different classes each.
Ex: Checker into AbstractChecker, SingleModeChecker, MultiModeChecker (names are just examples).
There is a predefined set of work that checker does in all modes. This work may not change fully when switching to multi-mode, but it could be separated into more modular methods so the same code can be used and called at will between the 2 modes as they see fit.
For example, Checker.processFiles might change as we will probably process files differently in multi-mode with each file it's own thread, but Checker.processFile where we do the actual work on a single file will remain the same between the 2 modes.

Could you please help me to select the right way?

I have one additional question you may not have considered. There will be an on/off for multithreading but how will you determine the number of threads to use? Will this be CLI, configuration, or based on the number of threads found? How will checker and treewalker work together to control the max number of threads?

Of the 2 you gave, CLI sounds the best.

Only other way I can see not mentioned is dropping ModuleResolver completely and embed the modes into the configuration file (IE: manually changing Checker to MultiThreadChecker), but this might not be a good idea if we decide we need both checker and treewalker to be the same mode.

So for now, the CLI option and ModuleResolver seems the best to me.

Edit: Isn't ModuleResolver the same as PackageObjectFactory?

Member

rnveach commented May 30, 2017

there are some tests which requires that all modules should be presented in the checkstyle_checks.xml config

Tests can be rewritten and should not restrict us how we make code in main. Don't think of existing tests when writing new code, but you can think of how this new code should be tested. These current tests were made for how checkstyle was at the time, and are not really future proof for how checkstyle might change.

Another note, there is no restriction on having multiple treewalkers in one configuration. It shouldn't break anything or cause any issues as each one is an isolated module.

Another possible solution is to extract logic for launching checks into another classes

IMO, we should split these 2 modules into 3 different classes each.
Ex: Checker into AbstractChecker, SingleModeChecker, MultiModeChecker (names are just examples).
There is a predefined set of work that checker does in all modes. This work may not change fully when switching to multi-mode, but it could be separated into more modular methods so the same code can be used and called at will between the 2 modes as they see fit.
For example, Checker.processFiles might change as we will probably process files differently in multi-mode with each file it's own thread, but Checker.processFile where we do the actual work on a single file will remain the same between the 2 modes.

Could you please help me to select the right way?

I have one additional question you may not have considered. There will be an on/off for multithreading but how will you determine the number of threads to use? Will this be CLI, configuration, or based on the number of threads found? How will checker and treewalker work together to control the max number of threads?

Of the 2 you gave, CLI sounds the best.

Only other way I can see not mentioned is dropping ModuleResolver completely and embed the modes into the configuration file (IE: manually changing Checker to MultiThreadChecker), but this might not be a good idea if we decide we need both checker and treewalker to be the same mode.

So for now, the CLI option and ModuleResolver seems the best to me.

Edit: Isn't ModuleResolver the same as PackageObjectFactory?

@soon

This comment has been minimized.

Show comment
Hide comment
@soon

soon May 30, 2017

Contributor

I have one additional question you may not have considered. There will be an on/off for multithreading but how will you determine the number of threads to use? Will this be CLI, configuration, or based on the number of threads found? How will checker and treewalker work together to control the max number of threads?

I think we should allow user to specify the number of threads via CLI, so, I'm going to add a parameter --threads-number=N, which will enable multi-threaded mode. This will be passed to the Checker and TreeWalker through Configuration object.

Not sure if we should specify the number of threads though the config, but this could be easily added in the future as a module parameter, so, I'm going to implement CLI parameter only for now.

Contributor

soon commented May 30, 2017

I have one additional question you may not have considered. There will be an on/off for multithreading but how will you determine the number of threads to use? Will this be CLI, configuration, or based on the number of threads found? How will checker and treewalker work together to control the max number of threads?

I think we should allow user to specify the number of threads via CLI, so, I'm going to add a parameter --threads-number=N, which will enable multi-threaded mode. This will be passed to the Checker and TreeWalker through Configuration object.

Not sure if we should specify the number of threads though the config, but this could be easily added in the future as a module parameter, so, I'm going to implement CLI parameter only for now.

@MEZk

This comment has been minimized.

Show comment
Hide comment
@MEZk

MEZk May 30, 2017

Contributor

@soon @rnveach @sabaka
As you started discussing the implementation and the details of Checker and TreeWalker hierarchies for single-thread and multi-thread modes, I have a question: did you chose what approach to use:

  1. 3rd party library (Akka, etc);
  2. Pure Java (thread pools, etc) ?
Contributor

MEZk commented May 30, 2017

@soon @rnveach @sabaka
As you started discussing the implementation and the details of Checker and TreeWalker hierarchies for single-thread and multi-thread modes, I have a question: did you chose what approach to use:

  1. 3rd party library (Akka, etc);
  2. Pure Java (thread pools, etc) ?
@soon

This comment has been minimized.

Show comment
Hide comment
@soon

soon May 31, 2017

Contributor

@MEZk Right now I'm working on the switching between multi-threaded and single-threaded modes, without actual implementation. I'm going to start write the initial MT implementation in 1-2 days.

Actually, I haven't work with Akka or similar, but I'd prefer to use plain Java threads+executors. My argument is simple - actor model will introduce unnecessary complexity. We don't have to build distributed reactive checker (building responsive distributed systems is a feature noted in the docs), we just want to run checks in parallel :)

So, I'd vote for the pure java threads.

Contributor

soon commented May 31, 2017

@MEZk Right now I'm working on the switching between multi-threaded and single-threaded modes, without actual implementation. I'm going to start write the initial MT implementation in 1-2 days.

Actually, I haven't work with Akka or similar, but I'd prefer to use plain Java threads+executors. My argument is simple - actor model will introduce unnecessary complexity. We don't have to build distributed reactive checker (building responsive distributed systems is a feature noted in the docs), we just want to run checks in parallel :)

So, I'd vote for the pure java threads.

@MEZk

This comment has been minimized.

Show comment
Hide comment
@MEZk

MEZk May 31, 2017

Contributor

@soon
This is what I wanted to know. If you chose Akka, the CLI options and the implementation will be changed dramatically as you will not actually operate threads. If I were to choose, I would also opt for plain Java threads. Checkstyle is not so big and complex project to depend on Akka.

Contributor

MEZk commented May 31, 2017

@soon
This is what I wanted to know. If you chose Akka, the CLI options and the implementation will be changed dramatically as you will not actually operate threads. If I were to choose, I would also opt for plain Java threads. Checkstyle is not so big and complex project to depend on Akka.

@sabaka

This comment has been minimized.

Show comment
Hide comment
@sabaka

sabaka Jun 2, 2017

Contributor

I would vote for plain java threads.
@soon, if we are going to have several parts of checkstyle running in parallel, what exactly will be specified with this new option?

Contributor

sabaka commented Jun 2, 2017

I would vote for plain java threads.
@soon, if we are going to have several parts of checkstyle running in parallel, what exactly will be specified with this new option?

@sabaka sabaka moved this from ToDo to In Progress in Multi-thread mode for Java files processing Jun 2, 2017

@soon

This comment has been minimized.

Show comment
Hide comment
@soon

soon Jun 3, 2017

Contributor

So, I will use plain java threads.

if we are going to have several parts of checkstyle running in parallel, what exactly will be specified with this new option?

Actually, I don't know if we should provide two threads-number options. Probably, we should create three options:

  1. threads-number - when a user doesn't want to think about implementation details and just specifies the total number of threads. The checkstyle creates two thread pools each with a half of specified threads-number available threads.
  2. checker-threads-number - when a user want to control an number of threads inside a MultiThreadChecker pool.
  3. three-walker-threads-number - when a user want to control an number of threads inside a MultiThreadTreeWalker pool.

@sabaka @rnveach What do you think, will it be useful?

Contributor

soon commented Jun 3, 2017

So, I will use plain java threads.

if we are going to have several parts of checkstyle running in parallel, what exactly will be specified with this new option?

Actually, I don't know if we should provide two threads-number options. Probably, we should create three options:

  1. threads-number - when a user doesn't want to think about implementation details and just specifies the total number of threads. The checkstyle creates two thread pools each with a half of specified threads-number available threads.
  2. checker-threads-number - when a user want to control an number of threads inside a MultiThreadChecker pool.
  3. three-walker-threads-number - when a user want to control an number of threads inside a MultiThreadTreeWalker pool.

@sabaka @rnveach What do you think, will it be useful?

@MEZk

This comment has been minimized.

Show comment
Hide comment
@MEZk

MEZk Jun 3, 2017

Contributor

@ps-sp
From my point if view, checker-threads-number and three-walker-threads-number forces users to understand Checkstyle architecture. If I were a potential user of the multi thread mode I would not want to know anything about TreeWalker and Checker.

Contributor

MEZk commented Jun 3, 2017

@ps-sp
From my point if view, checker-threads-number and three-walker-threads-number forces users to understand Checkstyle architecture. If I were a potential user of the multi thread mode I would not want to know anything about TreeWalker and Checker.

@soon

This comment has been minimized.

Show comment
Hide comment
@soon

soon Jun 3, 2017

Contributor

@MEZk Exactly, this is why I've added only one option now. I'd (as user) also don't want to care about implementation.

Contributor

soon commented Jun 3, 2017

@MEZk Exactly, this is why I've added only one option now. I'd (as user) also don't want to care about implementation.

soon added a commit to soon/checkstyle that referenced this issue Jun 4, 2017

soon added a commit to soon/checkstyle that referenced this issue Jun 4, 2017

soon added a commit to soon/checkstyle that referenced this issue Jun 4, 2017

@romani

This comment has been minimized.

Show comment
Hide comment
@romani

romani Jun 5, 2017

Member

What do you think, will it be useful?

We do not know for now, but they will be useful for testing purposes for sure.
So if we keep them but do not document them it will be very good.
When user come to optimization of execution, they need to know all details of checkstyle implementation to get what they need.

Member

romani commented Jun 5, 2017

What do you think, will it be useful?

We do not know for now, but they will be useful for testing purposes for sure.
So if we keep them but do not document them it will be very good.
When user come to optimization of execution, they need to know all details of checkstyle implementation to get what they need.

@sabaka

This comment has been minimized.

Show comment
Hide comment
@sabaka

sabaka Jun 13, 2017

Contributor

I agree with Roman and would vote for separate flag for each mt level.
We don't need them documented, but it will be useful for testing - we will be able to tweak threads distribution manually before we make up how to force checkstyle do it.

Contributor

sabaka commented Jun 13, 2017

I agree with Roman and would vote for separate flag for each mt level.
We don't need them documented, but it will be useful for testing - we will be able to tweak threads distribution manually before we make up how to force checkstyle do it.

soon added a commit to soon/checkstyle that referenced this issue Jun 14, 2017

soon added a commit to soon/checkstyle that referenced this issue Jun 17, 2017

soon added a commit to soon/checkstyle that referenced this issue Jun 18, 2017

soon added a commit to soon/checkstyle that referenced this issue Jun 18, 2017

@sabaka sabaka moved this from In Progress to In Review in Multi-thread mode for Java files processing Jun 19, 2017

soon added a commit to soon/checkstyle that referenced this issue Jun 20, 2017

soon added a commit to soon/checkstyle that referenced this issue Jun 21, 2017

soon added a commit to soon/checkstyle that referenced this issue Jun 21, 2017

soon added a commit to soon/checkstyle that referenced this issue Jun 21, 2017

soon added a commit to soon/checkstyle that referenced this issue Jun 22, 2017

soon added a commit to soon/checkstyle that referenced this issue Jun 25, 2017

soon added a commit to soon/checkstyle that referenced this issue Jun 26, 2017

soon added a commit to soon/checkstyle that referenced this issue Jun 26, 2017

soon added a commit to soon/checkstyle that referenced this issue Jun 26, 2017

soon added a commit to soon/checkstyle that referenced this issue Jun 26, 2017

soon added a commit to soon/checkstyle that referenced this issue Jun 27, 2017

soon added a commit to soon/checkstyle that referenced this issue Jun 27, 2017

soon added a commit to soon/checkstyle that referenced this issue Jun 29, 2017

soon added a commit to soon/checkstyle that referenced this issue Jun 29, 2017

soon added a commit to soon/checkstyle that referenced this issue Jun 30, 2017

soon added a commit to soon/checkstyle that referenced this issue Jul 2, 2017

soon added a commit to soon/checkstyle that referenced this issue Jul 2, 2017

soon added a commit to soon/checkstyle that referenced this issue Jul 2, 2017

romani added a commit that referenced this issue Jul 3, 2017

@romani romani added the miscellaneous label Jul 3, 2017

@romani romani added this to the 8.1 milestone Jul 3, 2017

@romani

This comment has been minimized.

Show comment
Hide comment
@romani

romani Jul 3, 2017

Member

I marked this update as miscellaneous to avoid it be visible to users in release notes.

Member

romani commented Jul 3, 2017

I marked this update as miscellaneous to avoid it be visible to users in release notes.

@romani romani closed this Jul 3, 2017

@sabaka sabaka moved this from In Review to Done in Multi-thread mode for Java files processing Jul 3, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment