New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DRILL-6763: Codegen optimization of SQL functions with constant values #1481
Conversation
@lushuifeng, thanks for the contribution! Could you please provide the time of execution of the query with and without your change? Please note, that Drill has its own framework for scalar replacement and passing value holders to the methods as arguments may break a scalar replacement for those holders. Also, Drill can generate too large classes. To avoid problems with the class constant pool overflow was implemented splitting the classes into the smaller ones. Is this change incorporates or at least does not break it? |
@vvysotskyi Just setters are added, I think others should work as before.
|
@lushuifeng, thanks, looks great! But could you please also provide results for JDK and Janino compilers with and without this feature. Are there any cases, where this feature breaks something, or cause performance degradation? If no, then no need to provide the option for disabling this functionality. Regarding unit tests, it may be written considering compiler limitation for method size - somehow may be passed huge string literals, so query should fail without these changes. This test should be sufficient. |
@vvysotskyi other tests will be provided later. |
If this fix just improves the performance without any consequences, then there is no need to disable it or specify an option for that. Regarding the unit test, generate a query with large string literals, so they will be specified in the methods in generated code, and query should fail with "code too large" compile exception. |
@vvysotskyi unit tests are added according to your advice. |
@lushuifeng regarding jdbc-all I guess we need to exclude unneeded libs from it to reduce the size or bump the size if there is nothing to exclude. I believe this effects all new PRs so this should be done in separate Jira prior to new PRs. |
@lushuifeng, |
PR #1168 is ready; the remaining part is some advanced testing as requested by @parthchandra (using SQL clients) to ensure that all runtime dependencies are there. It seems the Drill internal unit tests do not provide such guarantees. I didn't have the cycles to perform such testing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lushuifeng, have you run all unit tests with your fix since Travis runs some of them?
@@ -135,6 +140,7 @@ public static MappingSet getDefaultMapping() { | |||
this.evaluationVisitor = eval; | |||
this.model = model; | |||
this.optionManager = optionManager; | |||
setters = Maps.newHashMap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please replace with the constructor.
generator.getEvalBlock().assign(out.getValue(), JExpr.lit(e.getLong())); | ||
return out; | ||
MajorType majorType = e.getMajorType(); | ||
JType holderType = generator.getHolderType(majorType); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Common logic from these methods can be moved to the separate one.
return new HoldingContainer(majorType, var, null, null); | ||
generator.declareSetterMethod( | ||
var, | ||
buffer -> ValueHolderHelper.getVarDecimalHolder(buffer, e.getBigDecimal().toString())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My comment is not connected to this change, but it would be good to change ValueHolderHelper.getVarDecimalHolder
method to take BigDecimal
instance instead of String
since in all places, where it is used, the string is obtained from the BigDecimal
and inside this method BigDecimal
is created using this string.
return new HoldingContainer(majorType, var, null, null); | ||
generator.declareSetterMethod( | ||
var, | ||
buffer -> ValueHolderHelper.getVarCharHolder(buffer, e.value)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please replace e.value
with e.getString()
.
@@ -72,6 +73,7 @@ public Hash64 getHash64(LogicalExpression[] hashFieldsExp, TypedFieldId[] hashFi | |||
ClassGenerator<Hash64> cg = codeGenerator.getRoot(); | |||
setupBuild64Hash(cg, GetHashIncomingBuildColMapping, recordBatch, hashFieldsExp, hashFieldIds); | |||
Hash64 hash64 = context.getImplementationClass(codeGenerator); | |||
CodeGenMemberInjector.injectMembers(cg, hash64, context); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method call should be added to every place where context.getImplementationClass()
method is called. It would be good to move injectMembers
inside of getImplementationClass()
if it is possible or somehow combine them.
Class.forName(method.params().get(0).type().binaryName())); | ||
setMethod.invoke(instance, setter.getValue().apply(context.getManagedBuffer())); | ||
} catch (Exception e) { | ||
Throwables.propagate(e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Throwables.propagate()
is deprecated. Please replace it with the combination of Throwables.throwIfUnchecked(e)
and new RuntimeException(e)
as it is recommended in its javadoc.
But anyway, is it necessary to wrap unchecked exceptions into RuntimeException
? If no, then only ReflectiveOperationException
may be specified in the catch block and wrapped into RuntimeException
.
@Test | ||
public void testHugeStringConstantCompilation() throws Exception { | ||
try { | ||
testNoResult("alter session set `%s`='JDK'", ClassCompilerSelector.JAVA_COMPILER_OPTION); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please replace with setSessionOption()
here and below.
StringBuilder sb = new StringBuilder("select *\n") | ||
.append("from cp.`employee.json`\n") | ||
.append("where last_name ='") | ||
.append(longText).append("'"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please move the second append
to the new line.
@vvysotskyi thanks for your suggestions. |
@vvysotskyi thanks for your comments. Changes are made as you suggested. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lushuifeng, did you use mvn clean install
to run all the tests?
HoldingContainer out = generator.declare(e.getMajorType()); | ||
generator.getEvalBlock().assign(out.getValue(), JExpr.lit(e.getLong())); | ||
return out; | ||
JVar var = declareClassField(generator, e.getMajorType(), "long"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant to move all this logic to the separate method and pass ClassGenerator
, MajorType
and Function
, for example, to receive something like this:
@Override
public HoldingContainer visitLongConstant(LongExpression e, ClassGenerator<?> generator) throws RuntimeException {
return getHoldingContainer(generator,
e.getMajorType(),
buffer -> ValueHolderHelper.getBigIntHolder(e.getLong()));
}
private HoldingContainer getHoldingContainer(ClassGenerator<?> generator,
MajorType majorType,Function<DrillBuf, ? extends ValueHolder> function) {
JType holderType = generator.getHolderType(majorType);
JVar var = generator.declareClassField("value", holderType);
generator.declareSetterMethod(
var,
function);
return createHoldingContainer(majorType, var);
}
@@ -228,4 +247,14 @@ public void testNestedLoopJoin() throws Exception { | |||
testNoResult("drop table if exists %s", tableName); | |||
} | |||
} | |||
|
|||
@Test | |||
public void testHugeStringConstantCompilation() throws Exception { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a unit test for Janino also.
...ava-exec/src/main/java/org/apache/drill/exec/physical/impl/common/CodeGenMemberInjector.java
Show resolved
Hide resolved
@vvysotskyi Sorry for my misunderstanding. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure that tests will pass with -T 8
, but they should pass with the default threads count.
@@ -536,6 +543,19 @@ public JVar declareClassField(String prefix, JType t, JExpression init) { | |||
return clazz.field(JMod.NONE, t, prefix + index++, init); | |||
} | |||
|
|||
public JMethod declareSetterMethod(JVar var, Function<DrillBuf, ? extends ValueHolder> function) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add Javadoc here also.
the tests without |
@lushuifeng a good practice to have clean run on master and then on the branch with your changes thus you will know the reason of the failures. Try to run tests on master first. |
exec/jdbc-all/pom.xml
Outdated
@@ -511,7 +511,7 @@ | |||
This is likely due to you adding new dependencies to a java-exec and not updating the excludes in this module. This is important as it minimizes the size of the dependency of Drill application users. | |||
|
|||
</message> | |||
<maxsize>39000000</maxsize> | |||
<maxsize>39100000</maxsize> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please rebase onto the master and revert this change, since there was PR, were excluded some classes from the JAR.
@arina-ielchiieva thanks for your advice. |
This fix causes the next unit tests failures:
|
There is no such tests on master, but also failed with the same errors on branch DRILL-6763. |
@lushuifeng, since these tests failures are absent on the latest master, they caused by your changes. Could you please fix them? Also, some of the failures in |
@vvysotskyi thanks for your testing, I'm working on this, it may take some time |
@vvysotskyi Could you please take a look at these changes?
I've just found that previous errors in my tests |
@lushuifeng, I have a question regarding nested splitting classes: these setters will be added only to the top level class, or they will be separated between both: top and nested classes? |
Both top and nested classes will be added since the new class member is in inner class if there is a inner class, the inner class member is initialized by invoking top class member function: top: |
@lushuifeng, then this change will break code compilation for the case when a lot of constants were used. Previously, these constants were separated between the outer and nested classes. [1] https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-4.html#jvms-4.6 |
I agree with you, it will cause the overflow of the class constant pool in extreme case, that is to say a projection or a filter contains thousands (or maybe larger) of constants The good news is all tests in I'd do some tests to find out how much constants will cause the overflow of the class constant pool. |
I don't think that this is a good idea to add an option. In the case when we have a combination of these two problems, option won't help. I think it is possible to enhance the way for collecting setter methods for example to map full [nested] class names and its setters, so after that receive nested class instances and call setters for them in |
In order to collect setter methods of nested class, some changes will be made:
What is your suggestions? @vvysotskyi thanks. |
@lushuifeng, I meant to replace |
I think it is the way I'm doing besides you wrap two params (DrillBuf and String) into a Pair for the first point. The method in |
@vvysotskyi All tests have been passed, could you please take a look? thanks! |
* @return the depth of nested class, setter method | ||
*/ | ||
public Pair<Integer, JMethod> declareSetterMethod(JVar var, Function<DrillBuf, ? extends ValueHolder> function) { | ||
JMethod setter = clazz.method(JMod.PUBLIC, void.class, "set" + StringUtils.capitalize(var.name())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please move the initialization to the else block below and put code which uses this var after the if...else
block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is It better by NOT declaring the setter function, these fields can be accessed by reflection directly, so the number of constants in constant pool can be reduced
@@ -646,7 +687,7 @@ public void preparePlainJava() { | |||
Class<?> p = params[i]; | |||
childNew.arg(shim.param(model._ref(p), "arg" + i)); | |||
} | |||
shim.body()._return(childNew); | |||
shim.body()._return(JExpr._this().invoke("injectMembers").arg(childNew)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please explain what is the reason for this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The innerClasses like BatchHolder may have the constants, BatchHolder instance is created via newBatchHolder dynamically when processing recordBatch, that is the difference between innerClasses and nested splitting classes.
The innerClasses member fields is initialized by invoking injectMembers(innerClassInstance) which is override generated by this code as shown in
protected BatchHolder newBatchHolder(int batchRowCount) { return this.injectMembers(new BatchHolder(batchRowCount)); }
} | ||
|
||
protected BatchHolder injectMembers(BatchHolder batchHolder) { | ||
CodeGenMemberInjector.injectMembers(cg, batchHolder, context); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to move this method call to the place, general for other similar classes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ClassGenerator and FragmentContext is needed in this method for general cases, but not for test classes, see example in ExampleTemplateWithInner.java
newBatchHolder is override by generated code can be implemented as follows to remove above method
protected BatchHolder newBatchHolder(int batchRowCount) { return CodeGenMemberInjector.injectMembers(cg, new BatchHolder(batchRowCount), context); }
but the cg
and context
params are hard coded, not flexable. I think it is better to make a function call to hide the implementation which may differ in each template.
*/ | ||
public static void injectMembers(ClassGenerator<?> cg, Object instance, FragmentContext context) { | ||
Map<Integer, Object> cachedInstances = new HashMap<>(); | ||
for (Map.Entry<Pair<Integer, JMethod>, Function<DrillBuf, ? extends ValueHolder>> setter : cg.getSetters().entrySet()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may change the type of collection which used for storing setter methods. Can be used Guava's Table, so we will be able to omit usage of cachedInstances
map.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I don't quite follow you since the instance is created somewhere else not in setter collection
@@ -87,6 +95,8 @@ | |||
private LinkedList<SizedJBlock>[] blocks; | |||
private LinkedList<SizedJBlock>[] oldBlocks; | |||
|
|||
private JVar innerClassField; | |||
|
|||
/** | |||
* Assumed that field has 3 indexes within the constant pull: index of the CONSTANT_Fieldref_info + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please note that with introducing new methods in the class, this formula should be revised to take them into account.
@vvysotskyi setter method have been removed, the formula of DRILL-6722 seems to be resolved with this change by accident, the root cause is not found yet. Test |
@vvysotskyi Could you please take a look, thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lushuifeng, overall looks good, just single comment.
Please squash commits and rebase onto the master.
@@ -151,7 +161,7 @@ public static MappingSet getDefaultMapping() { | |||
// from the JDK Modifier values to the JCodeModel JMod values: the | |||
// values are different. | |||
|
|||
int mods = JMod.PRIVATE + JMod.FINAL; | |||
int mods = JMod.PUBLIC + JMod.FINAL; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are this change and change below required for the current approach (without using setter methods for inner classes)?
ceeebba
to
c9abbf7
Compare
c9abbf7
to
cf896b1
Compare
done |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for fixing this issue, +1
Thank you for your review and comments. |
Details in DRILL-6763:
Here is the descriptions of the change:
1. add system optionexec.optimize_function_compilation
to toggle to state of this functionality.2. codegen is changed by declaring setter method in EvaluationVisitor#visitXXXconstants
3. the member declared in step2 is initialized when the instance of the class is created in XXBatch and others
4. attachment is the code of the same query mentioned in DRILL-6763 generated by setting the value of exec.optimize_function_compilation to true
@arina-ielchiieva @vdiravka Would you please take a look?
What kind of unit tests should be added?
query.txt