Checker Cache and TranslationCheck conflict #3539
TranslationCheck looks at multiple files and performs validation only after all the files have been given to it.
Processing one file and skipping the others causes it to falsely believe that files are missing and to report excess violations, even when a non-cache run reports none. This is because we don't guess where the files are; we use the files reported to the check to know where they are and whether they exist.
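A minimal sketch of the failure mode described above, assuming a simplified stand-in for a multi-file check. The class `BundleCompletenessSketch`, its methods, and the file names are hypothetical and are not Checkstyle's real API; the point is only that a check which learns about files solely through the calls it receives will report false "missing" violations when the cache skips unchanged files.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified stand-in for a multi-file check; not Checkstyle's real API.
class BundleCompletenessSketch {
    private final Set<String> seen = new HashSet<>();
    private final List<String> requiredLocales;

    BundleCompletenessSketch(List<String> requiredLocales) {
        this.requiredLocales = requiredLocales;
    }

    // Called once for every file the driver decides to process.
    void processFile(String fileName) {
        seen.add(fileName);
    }

    // Called after the last file: reports every required bundle that was never seen.
    List<String> finish(String baseName) {
        List<String> violations = new ArrayList<>();
        for (String locale : requiredLocales) {
            String expected = baseName + "_" + locale + ".properties";
            if (!seen.contains(expected)) {
                violations.add("missing " + expected);
            }
        }
        return violations;
    }

    // Runs the check over 'files', skipping those a simulated cache marks as unchanged.
    static List<String> run(List<String> files, Set<String> cachedUnchanged) {
        BundleCompletenessSketch check = new BundleCompletenessSketch(List.of("de", "fr"));
        for (String file : files) {
            if (!cachedUnchanged.contains(file)) {
                check.processFile(file);
            }
        }
        return check.finish("messages");
    }
}
```

With an empty cache, `run` reports nothing. When the cache skips the two translated bundles, the same input yields two "missing" violations although both files exist on disk.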
Steps to reproduce on CS locally:
Expected: no violations.
Commit 441d2d3 needs to be reverted once a fix for this issue is provided.
Original problem was reported and discussed at: #3493
This is an unfortunate design flaw of TranslationCheck; we cannot fix it easily.
There are no problems with the validation logic of this Check if validation is launched on freshly cloned sources (the no-existing-cache scenario).
This issue can be fixed only after "Support multi-file validation Checks in Checkstyle" (#3540).
As all Checks will have annotations (stateless, filestateful, global), all multi-file Checks have to be "global".
Small summary: global Checks are never skipped by the cache. Multi-file Checks that already exist most likely work without the cache, so this will not be a problem for them. The only drawback is for the vast majority of third-party Checks, as they will stop benefiting from the cache. Users are unlikely to notice this quickly, so the update to add annotations will take a long time and most Checks will never be upgraded.
@rnveach, does it make sense?
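A rough sketch of the annotation idea summarized above, under the assumption that the cache only decides which checks still run on an unchanged file. The annotations `Stateless`, `FileStateful`, and `Global` and the example check classes are hypothetical placeholders declared locally for illustration; they are not the project's actual annotations.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.List;

class CacheSkipPolicySketch {
    // Hypothetical marker annotations mirroring the three categories named in the thread.
    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.TYPE) @interface Stateless {}
    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.TYPE) @interface FileStateful {}
    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.TYPE) @interface Global {}

    // Hypothetical example checks, one per category.
    @Stateless static class LineLengthLikeCheck {}
    @FileStateful static class UnusedImportsLikeCheck {}
    @Global static class TranslationLikeCheck {}

    // Returns the checks that still run on a file: all of them when the file
    // changed, only the global ones when the cache says it is unchanged.
    static List<Class<?>> checksToRun(boolean fileUnchanged, List<Class<?>> configured) {
        List<Class<?>> result = new ArrayList<>();
        for (Class<?> check : configured) {
            boolean global = check.isAnnotationPresent(Global.class);
            if (!fileUnchanged || global) {
                result.add(check);
            }
        }
        return result;
    }
}
```

Under this policy a global check sees every file on every run, which is exactly why it gains nothing from the cache.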
I understand what you are saying but I don't agree with it.
This is no different than your MT mode suggestion: it separates checks into a different, unsupported group. It is just another workaround and not real support. Java checks will definitely see a difference, as every run will now be exactly the same as running without the cache. Placing an annotation on a check will not help, especially for multi-file checks, which we are now starting to support, because under your suggestion the annotation they must have will always bypass the cache.
My proposed implementation can be seen at https://groups.google.com/forum/#!topic/checkstyle-devel/a7S2PePHG0I . It supports multi-file Java checks with cache support. To do this, every multi-file check must be cache aware and maintain its own cache for files it is skipping over.
Even translation check would need something like this to avoid reading the file twice.
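To illustrate what "cache aware" could mean for a multi-file check, here is a minimal sketch. It assumes the driver notifies the check about skipped files through a hypothetical `fileSkipped` callback; the class, its methods, and the in-memory `rememberedKeys` map (standing in for whatever persistent storage a real check would use) are illustrative only and are not taken from the linked proposal.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical cache-aware multi-file check that compares keys across bundles.
class CacheAwareKeyCheckSketch {
    // Keys remembered from earlier runs; a real check would have to persist this.
    private final Map<String, Set<String>> rememberedKeys = new HashMap<>();
    // Keys known for the current run, sorted by file name for stable output.
    private final Map<String, Set<String>> keysThisRun = new TreeMap<>();

    void beginRun() {
        keysThisRun.clear();
    }

    // The file was read and parsed in this run.
    void fileProcessed(String file, Set<String> keys) {
        rememberedKeys.put(file, keys);
        keysThisRun.put(file, keys);
    }

    // The driver skipped the file as unchanged: reuse what was remembered
    // instead of reading the file again.
    void fileSkipped(String file) {
        Set<String> keys = rememberedKeys.get(file);
        if (keys == null) {
            throw new IllegalStateException("no remembered keys for " + file);
        }
        keysThisRun.put(file, keys);
    }

    // Reports every file that lacks a key present in some other file.
    List<String> finishRun() {
        Set<String> allKeys = new TreeSet<>();
        keysThisRun.values().forEach(allKeys::addAll);
        List<String> violations = new ArrayList<>();
        for (Map.Entry<String, Set<String>> entry : keysThisRun.entrySet()) {
            for (String key : allKeys) {
                if (!entry.getValue().contains(key)) {
                    violations.add(entry.getKey() + " lacks key " + key);
                }
            }
        }
        return violations;
    }
}
```

The trade-off is that each such check carries its own bookkeeping, but a skipped file still contributes to the final decision without being read twice.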
We are extending into a completely new sector of validations, and it is not yet clear that we will benefit from it enough to change our API so seriously.
Standard simple Checks will continue to benefit from the cache. Those that want to benefit more should be updated to have an annotation.
Placing an annotation will help stateless and filestateful Checks. For global and multi-file Checks there is no benefit.
Custom Checks can do whatever they want to resolve speed problems. We do not know what they might need for caching and what matters to them.
I tried to get your idea... not sure. It looks like your major concern is that Checker actually reads the whole file and gives its content to the Checks. I already showed that this penalty is close to nothing. But yes, it might not be ideal, especially when a Check knows better what to do with the file and how to read it. I am OK with extending the API of AbstractFileSetCheck to have a method like
So TranslationCheck can override
We have a small penalty for all. Some Java Checks do not need parsing and could be done much faster with their own read, but we put them after Java parsing just to make them easier for us to maintain and to extend in the future. http://checkstyle.sourceforge.net/config_sizes.html#OuterTypeNumber could be done enormously quicker with custom parsing, but we do not do this. http://checkstyle.sourceforge.net/config_sizes.html#FileLength is also not ideal for performance.
They cannot because they do not control
Yes this is correct. This is why my example code had
You are only looking at penalties for FileSets, not AbstractChecks. My main concern is them, not FileSets, as they are where we spend the most time.
There shouldn't be any reason for Checks to re-read a file. TranslationCheck breaks that at https://github.com/checkstyle/checkstyle/blob/master/src/main/java/com/puppycrawl/tools/checkstyle/checks/TranslationCheck.java#L482 because of the lack of true multi-file support.
This is because they were incorrectly made AbstractChecks and should be FileSet checks. We have another issue to break compatibility of one such check. There shouldn't be any AbstractChecks that don't examine the Java code in some way.
No, it is examining the Java tree so it should be using the Java parser. If the parser is slow, then we should examine what can be done to make it faster.
It is not a Java check and its code looks fine.
I do not support this. This is again picking Checks to support and ignoring others. Java checks were the reason I wanted the cache enabled in the project, as a local run was taking minutes on my laptop.
If you want to know the worst offenders use https://github.com/rnveach/checkstyle/commits/more_audits to see real numbers.
Using our own project on my work laptop, full run time was 48.3 seconds.
The worst FileSet is TreeWalker with 27.3 seconds. Runners up are RegexpSinglelineCheck (0.47 seconds), JavadocPackageCheck (0.47 seconds), RegexpMultilineCheck (0.38 seconds).
The worst AbstractCheck is UnusedImportsCheck with 4.3 seconds. Runners up are IndentationCheck (0.56 seconds), CommentsIndentationCheck (0.55 seconds), ParenPadCheck (0.29 seconds).
Java parser took 2.7 seconds total. Javadoc parser took 13.5 seconds total.
How is a multi-file (MF) Java Check going to work while the cache is activated?
As soon as it is an MF Java Check, it is a submodule of TreeWalker, where parsing is actually done.
New MF Check: make sure that every *Domain.java class has a serialVersionUID field.
MF Checks can do validation only at the end of execution.
In other words, MF Checks are Checks that work on some set of files.
So, in summary, an MF Check needs the whole set of files to make a decision.
Because of a limitation of Checker, we cannot give a Check the whole set of files in one call.
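The shape of such an MF Check can be sketched as below: it is called once per file, accumulates state, and only produces a result at the end. This is a simplified illustration and not a real Checkstyle module. The class and method names are hypothetical, and a plain regular expression stands in for the AST inspection a real Java check would do.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical multi-file check: every *Domain.java file must mention serialVersionUID.
class DomainSerialVersionSketch {
    // Simplification: a text match instead of inspecting the parsed Java tree.
    private static final Pattern FIELD = Pattern.compile("\\bserialVersionUID\\b");

    private final List<String> offenders = new ArrayList<>();

    // Resets state before the first file of a run.
    void beginProcessing() {
        offenders.clear();
    }

    // Called once per file; only records findings, reports nothing yet.
    void processFile(String fileName, String source) {
        if (fileName.endsWith("Domain.java") && !FIELD.matcher(source).find()) {
            offenders.add(fileName);
        }
    }

    // Called after the last file; only now is the decision final.
    List<String> finishProcessing() {
        return new ArrayList<>(offenders);
    }
}
```

Because the result only exists after `finishProcessing`, any file the driver silently skips is invisible to the check.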
ATTENTION: we already ignore the situation where some Check might change between executions!
Let's come back to caching for MF...
MF implementations are complicated by design and we cannot predict what they will need.
An MF Check needs to be called on each file and should decide for itself whether to skip it.
WE SHOULD NOT SUPPORT MF Checks in TreeWalker, at least for now.
WE SHOULD LET USERS write MF FileSet Checks, and if a user wants, he can reuse the Parser in his MF FileSet Check.
I am not closing the door on MF Java Checks; I just want to make the first step in MF support for now.
Once FileSets support it, all TreeWalker has to do is chain it to the checks, the same way Checker chains things to the FileSets.
I'm not sure what you mean by the first part. MF AbstractChecks can cache ASTs however they want; Checker doesn't need to be involved in that, and we don't have to cache everything. All Checker needs to supply is which files are cached, whether the cache was erased, and what is skipped.
Checker's cache and file order don't need to change. The cache just needs to talk with the modules more before deciding if a file is skipped or not.
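One way to picture "the cache talks with the modules" is a veto: a file is skipped only if it is unchanged and no module objects. The sketch below assumes a hypothetical `canSkip` callback; nothing like it is declared in Checkstyle's real module interfaces, and the names are placeholders.

```java
import java.util.List;

class SkipNegotiationSketch {
    // Hypothetical callback a cache-aware module would implement.
    interface CacheAwareModule {
        boolean canSkip(String fileName);
    }

    // A file is skipped only when the cache considers it unchanged
    // and every module agrees that it does not need to see it.
    static boolean shouldSkip(String fileName, boolean unchangedSinceLastRun,
                              List<CacheAwareModule> modules) {
        if (!unchangedSinceLastRun) {
            return false;
        }
        for (CacheAwareModule module : modules) {
            if (!module.canSkip(fileName)) {
                return false;
            }
        }
        return true;
    }
}
```

For example, a module written as `f -> !f.endsWith(".properties")` would keep every properties file in the run while letting unchanged Java files be skipped.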
I have been using my own MF AbstractChecks. My most recent check ensures that all setter calls for a class are made in the same order as the setters are defined in the class. My project uses a lot of holders (POJOs) and I prefer they be created in the order of precedence, which is what the fields are ordered by.
Nothing that was written specified how the cache will be made 'better'.
MF files implementation:
While the debate is hot, a few more nuances:
We just need a global cache of ASTs. All MF Checks get called on each file (not affected by the single-file cache). All MF Checks should be FileSets with the ability to connect somehow to the global cache of ASTs; if an AST is not present, the file is parsed, the AST stays in the cache, and it is returned to the module. This way we cover every case that MF Checks might come up with. It will consume a lot of memory, but it is the only way to support all MF cases.
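The parse-on-miss behaviour of such a global AST cache could look roughly like this. It is a sketch under assumptions: the class is hypothetical, the AST type is left generic, and the parser is passed in as a plain function rather than being Checkstyle's actual parser.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical shared cache: every AST stays in memory once parsed.
class GlobalAstCacheSketch<A> {
    private final Map<String, A> asts = new ConcurrentHashMap<>();
    private final Function<String, A> parser;
    private final AtomicInteger parseCount = new AtomicInteger();

    GlobalAstCacheSketch(Function<String, A> parser) {
        this.parser = parser;
    }

    // Returns the cached AST, parsing the file only on the first request.
    A get(String fileName) {
        return asts.computeIfAbsent(fileName, name -> {
            parseCount.incrementAndGet();
            return parser.apply(name);
        });
    }

    // Number of real parses so far, to show how often the cache was missed.
    int parseCount() {
        return parseCount.get();
    }
}
```

Nothing is ever evicted here, which matches the memory concern raised above; a bounded or soft-reference cache would trade memory for repeated parsing.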
I just ran a test over the checkstyle src/main folder, commit.
The result is an extra ~580M for 364 files:
So the user will always have a choice of how to speed up execution when MF Checks are used: use MT mode, allocate more memory to the execution, or split MF Checks into a separate heavy execution (for non-on-commit validation).