Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Issue #4957: Add MT TreeWalker implementation #4958

Closed
wants to merge 1 commit into from

Conversation

soon
Copy link
Contributor

@soon soon commented Aug 17, 2017

Issue #4957

This PR adds the MT mode for TreeWalker fileset check.

WIP untill I implement tests and prepare performance report for changes

@romani
Copy link
Member

romani commented Aug 31, 2017

we released 8.2 version, please rebase all your PRs to our latest master to avoid CI problems

@rnveach
Copy link
Member

rnveach commented Sep 25, 2017

@soon ping
Since there is only 1 commit, I assume this PR can go by itself. Please rebase and make CI green.
If this PR relies on something us, please let us know.

@shalugin
Copy link

@soon Can you rebase your PR?

@soon
Copy link
Contributor Author

soon commented Feb 11, 2018

@shalugin I have rebased it to master, though I did not run tests on it and therefore current implementation might not work. I will fix these issues a bit later, thank you in advance for understanding.

@checkstyle checkstyle deleted a comment Feb 11, 2018
@checkstyle checkstyle deleted a comment Feb 11, 2018
@romani
Copy link
Member

romani commented Jun 1, 2018

@soon , sorry for long silence from my side, please rebase your PR it would help me a lot.

@romani
Copy link
Member

romani commented Jun 13, 2018

this model is to fill pool with tasks(features) on each token to visit - enormous futures emission (worse scenario: amount of futures = amount of AST nodes multiplied by amount of Checks). Not good to my mind.

If we keep in each Checkers thread separate tree of Checks instances, all we need is to split Checks in groups and do traverse of Tree in parallel in several threads , where each thread will contain separate Check instance. The only nuance here is that some Checks/Modules could be related to each by statics reference hacks - we did not supported that well before ans we will not support this now. If such Checks exists whey should be continued to be execution in separate configuration and separate execution in single-thread Checker.
After all thread are finished, we do filtering in single (do filtering in multiple threads is possible but could be done as separate step)

as separate step (idea for future): ideally to spread heavy Checks between groups, but we do not know how heavy is Check, we can measure our Checks only and give them some abstract values 0-9 by annotation to give splitter hints on how to make group that should finish in kind of the same time, to minimize time of waiting.

@pbludov
Copy link
Member

pbludov commented Jun 29, 2018

this model is to fill pool with tasks(features) on each token to visit - enormous futures emission (worse scenario: amount of futures = amount of AST nodes multiplied by amount of Checks). Not good to my mind.

@soon I think this can be easily measured. Just run the single-threaded (current) version. Then the multi-threaded version from this patch. With exactly one thread. The difference will be the overhead of the pool of tasks. Please share the results. Perhaps the overhead is so small that it can be ignored.

@rnveach
Copy link
Member

rnveach commented Jun 29, 2018

I have never been a supporter that TreeWalker by itself can be multi-threaded.

Multi-threading the visitToken call will not work well.
Each check expects visit token to be called in order of the tree. Placing all visitToken calls to different threads can lead to the tree being called out of order for a single check, disrupting it's logic.
To be able to support this, each check would need to keep in memory what it needs and only do violations after it is given all the information, if at all. This very similar to how multi-file checks will behave. So every AbstractCheck would need to be re-written to be like a mult-file check with this implementation.

If we have a free thread to spawn a visitToken or any other split, then we should be spawning a new File thread instead. It is parsing the file that leads to the current slowdown, not traversing the tree or working the checks.

@romani
Copy link
Member

romani commented Jun 30, 2018

Multi-threading the visitToken call will not work well. .... It is parsing the file that leads to the current slowdown, not traversing the tree or working the checks.

I do not support such MT model also. We definitely need to focus on threading Checker first.
But I am still want to allow Treewalker to be MT-ed, BUT it could be done easily later on if we have time or ..... .
As non of us support model proposed in this PR, I suggest to close this PR and focus on Checker MT discussion.

@rnveach , if you agree, please close this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants