[SPARK-21038][SQL] Reduce redundant generated init code in Catalyst codegen #18255
What changes were proposed in this pull request?
In Java, instance fields are guaranteed to be initialized to their corresponding default values (zero values) before the constructor is invoked. Thus, explicit code that initializes fields to their zero values is redundant and should be avoided.
It's usually harmless to have such code in hand-written constructors, but in the case of mechanically generating code, such code could contribute to a significant portion of the code size and cause issues.
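The default-value guarantee described above can be checked with a small sketch. The class and its field names are hypothetical and are not taken from Spark's generated code. `redundantInit()` stands in for the kind of init code a generator might emit.

```java
// Hypothetical class for illustration only; not part of Spark.
final class DefaultFieldValues {
    // No initializers: these fields hold their default (zero) values
    // before any constructor body runs.
    boolean isNull;
    int count;
    long total;
    double ratio;
    Object ref;

    // Stands in for mechanically generated init code. Re-assigning the
    // default values changes nothing, so this code is pure overhead.
    void redundantInit() {
        this.isNull = false;
        this.count = 0;
        this.total = 0L;
        this.ratio = 0.0;
        this.ref = null;
    }

    // Renders the current field values so two states can be compared.
    String snapshot() {
        return isNull + "," + count + "," + total + "," + ratio + "," + ref;
    }

    public static void main(String[] args) {
        DefaultFieldValues state = new DefaultFieldValues();
        String before = state.snapshot();
        state.redundantInit();
        // The state is identical before and after the redundant init code.
        System.out.println(before.equals(state.snapshot()));
    }
}
```

In a hand-written class the extra assignments cost little, but a generated class can contain hundreds of them, and each one adds bytecode to the method that holds the init code.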
This ticket is a step in reducing the likelihood of hitting the 64KB bytecode method size limit in Java class files. This PR uses simple heuristics in CodegenContext.addMutableState to filter out redundant code that initializes mutable state to its default (zero) value, by matching string patterns. Basically, if the initCode in the added mutable state is just an assignment of a default (zero) value to the variable, optionally qualified with this., then it'll be replaced by an empty string instead.
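A minimal sketch of such a string-pattern filter is shown here in Java, assuming the init code is a single assignment statement. The class name, method names, and the regular expression are hypothetical; the pattern Spark actually uses in CodegenContext.addMutableState may differ.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not Spark's actual implementation.
final class RedundantInitFilter {
    // Matches "[this.]name = <default>;" where <default> is one of a few
    // zero-value literals: false, null, 0, 0L, 0.0, 0.0f, 0.0d.
    private static final Pattern DEFAULT_INIT = Pattern.compile(
        "\\s*(?:this\\.)?[A-Za-z_$][A-Za-z0-9_$]*\\s*=\\s*"
        + "(?:false|null|0[lL]?|0\\.0[fFdD]?)\\s*;\\s*");

    // True only when the whole string is one default-value assignment.
    static boolean isRedundant(String initCode) {
        return DEFAULT_INIT.matcher(initCode).matches();
    }

    // Replaces redundant init code with an empty string and leaves
    // everything else untouched.
    static String filter(String initCode) {
        return isRedundant(initCode) ? "" : initCode;
    }

    public static void main(String[] args) {
        String[] samples = {
            "isNull1 = false;",
            "this.value2 = 0;",
            "count = 1;",
            "this.buf = new byte[8];"
        };
        for (String s : samples) {
            System.out.println("[" + s + "] -> [" + filter(s) + "]");
        }
    }
}
```

The sketch is deliberately conservative: anything it does not recognize, such as multi-statement init code or a non-zero value, passes through unchanged. A missed match costs only some extra bytecode and never changes behavior.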
This pattern will catch the most common cases. In an example drawn from production, all of the isNullNNN = false; initialization code is redundant.
Alternatives are:
Go through the addMutableState() call sites and change the redundant initCode arguments to empty strings. That's tedious and involves changing a lot of code. But if we do it this way, we could consider giving initCode a default value of "" in the declaration of addMutableState(), which could be a nice improvement too.
How was this patch tested?
Ran SQL and Catalyst unit tests. Also added some unit tests for the new filtering heuristic.