Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DRILL-6763: Codegen optimization of SQL functions with constant values #1481

Closed
wants to merge 1 commit into from

Conversation

lushuifeng
Copy link
Contributor

@lushuifeng lushuifeng commented Sep 26, 2018

Details in DRILL-6763:

Here is the descriptions of the change:
1. add system option exec.optimize_function_compilation to toggle to state of this functionality.
2. codegen is changed by declaring setter method in EvaluationVisitor#visitXXXconstants
3. the member declared in step2 is initialized when the instance of the class is created in XXBatch and others
4. attachment is the code of the same query mentioned in DRILL-6763 generated by setting the value of exec.optimize_function_compilation to true
@arina-ielchiieva @vdiravka Would you please take a look?
What kind of unit tests should be added?
query.txt

@vvysotskyi
Copy link
Member

vvysotskyi commented Sep 26, 2018

@lushuifeng, thanks for the contribution!

Could you please provide the time of execution of the query with and without your change?

Please note, that Drill has its own framework for scalar replacement and passing value holders to the methods as arguments may break a scalar replacement for those holders.

Also, Drill can generate too large classes. To avoid problems with the class constant pool overflow was implemented splitting the classes into the smaller ones. Is this change incorporates or at least does not break it?

@lushuifeng
Copy link
Contributor Author

@vvysotskyi Just setters are added, I think others should work as before.
comparision
The setup time of Filter changes with or without my change, following my step:

  1. toggle the state of exec.optimize_function_compilation
  2. query with sql, this one has no cache(the code generated by different state of exec.optimize_function_compilation is not the same):
    SELECT full_name, hire_date FROM cp.employee.json where last_name = 'Bernard' and hire_date >= '1990-01-01 00:00:00.0';
  3. change the constants in sql, hire_date is changed:
    SELECT full_name, hire_date FROM cp.employee.json where last_name = 'Bernard' and hire_date >= '1990-01-02 00:00:00.0';
  4. see the profile of the second time in the above

@vvysotskyi
Copy link
Member

@lushuifeng, thanks, looks great!

But could you please also provide results for JDK and Janino compilers with and without this feature.
And it would be good to know the results for other literals, except strings and more complex queries.

Are there any cases, where this feature breaks something, or cause performance degradation? If no, then no need to provide the option for disabling this functionality.

Regarding unit tests, it may be written considering compiler limitation for method size - somehow may be passed huge string literals, so query should fail without these changes. This test should be sufficient.

@lushuifeng
Copy link
Contributor Author

@vvysotskyi other tests will be provided later.
This change have been tested in out prod env for a month, no failure and performance degradation are found as far as I know.
Is it better to provide the option to disable this feature by setting the default value to true?
Could you please give more details about the unit tests?

@vvysotskyi
Copy link
Member

vvysotskyi commented Sep 26, 2018

If this fix just improves the performance without any consequences, then there is no need to disable it or specify an option for that.
But in any case, it is useful to see numbers.

Regarding the unit test, generate a query with large string literals, so they will be specified in the methods in generated code, and query should fail with "code too large" compile exception.

@lushuifeng
Copy link
Contributor Author

@vvysotskyi unit tests are added according to your advice.
What should I do due to the error in CI of drill-jdbc-all-1.15.0-SNAPSHOT.jar size (39000409) too large. Max. is 39000000 , the max size is changed temporarily, is it appropriate to change in the PR?
The option will be kept for a while to provide more sql tests

@arina-ielchiieva
Copy link
Member

@lushuifeng regarding jdbc-all I guess we need to exclude unneeded libs from it to reduce the size or bump the size if there is nothing to exclude. I believe this effects all new PRs so this should be done in separate Jira prior to new PRs.

@vvysotskyi
Copy link
Member

@lushuifeng, jdbc-all jar size will be reduced in the scope of #1168.
@sachouche, what is the status of that PR?

@sachouche
Copy link
Contributor

PR #1168 is ready; the remaining part is some advanced testing as requested by @parthchandra (using SQL clients) to ensure that all runtime dependencies are there. It seems the Drill internal unit tests do not provide such guarantees. I didn't have the cycles to perform such testing.

Copy link
Member

@vvysotskyi vvysotskyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lushuifeng, have you run all unit tests with your fix since Travis runs some of them?

@@ -135,6 +140,7 @@ public static MappingSet getDefaultMapping() {
this.evaluationVisitor = eval;
this.model = model;
this.optionManager = optionManager;
setters = Maps.newHashMap();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please replace with the constructor.

generator.getEvalBlock().assign(out.getValue(), JExpr.lit(e.getLong()));
return out;
MajorType majorType = e.getMajorType();
JType holderType = generator.getHolderType(majorType);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Common logic from these methods can be moved to the separate one.

return new HoldingContainer(majorType, var, null, null);
generator.declareSetterMethod(
var,
buffer -> ValueHolderHelper.getVarDecimalHolder(buffer, e.getBigDecimal().toString()));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comment is not connected to this change, but it would be good to change ValueHolderHelper.getVarDecimalHolder method to take BigDecimal instance instead of String since in all places, where it is used, the string is obtained from the BigDecimal and inside this method BigDecimal is created using this string.

return new HoldingContainer(majorType, var, null, null);
generator.declareSetterMethod(
var,
buffer -> ValueHolderHelper.getVarCharHolder(buffer, e.value));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please replace e.value with e.getString().

@@ -72,6 +73,7 @@ public Hash64 getHash64(LogicalExpression[] hashFieldsExp, TypedFieldId[] hashFi
ClassGenerator<Hash64> cg = codeGenerator.getRoot();
setupBuild64Hash(cg, GetHashIncomingBuildColMapping, recordBatch, hashFieldsExp, hashFieldIds);
Hash64 hash64 = context.getImplementationClass(codeGenerator);
CodeGenMemberInjector.injectMembers(cg, hash64, context);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method call should be added to every place where context.getImplementationClass() method is called. It would be good to move injectMembers inside of getImplementationClass() if it is possible or somehow combine them.

Class.forName(method.params().get(0).type().binaryName()));
setMethod.invoke(instance, setter.getValue().apply(context.getManagedBuffer()));
} catch (Exception e) {
Throwables.propagate(e);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Throwables.propagate() is deprecated. Please replace it with the combination of Throwables.throwIfUnchecked(e) and new RuntimeException(e) as it is recommended in its javadoc.

But anyway, is it necessary to wrap unchecked exceptions into RuntimeException? If no, then only ReflectiveOperationException may be specified in the catch block and wrapped into RuntimeException.

@Test
public void testHugeStringConstantCompilation() throws Exception {
try {
testNoResult("alter session set `%s`='JDK'", ClassCompilerSelector.JAVA_COMPILER_OPTION);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please replace with setSessionOption() here and below.

StringBuilder sb = new StringBuilder("select *\n")
.append("from cp.`employee.json`\n")
.append("where last_name ='")
.append(longText).append("'");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move the second append to the new line.

@lushuifeng
Copy link
Contributor Author

@vvysotskyi thanks for your suggestions.
I didn't know that Travis only run some of the test.
I'm sorry I've not run all the tests yet, I would like to run them as soon as possible

@lushuifeng
Copy link
Contributor Author

lushuifeng commented Oct 8, 2018

@vvysotskyi thanks for your comments. Changes are made as you suggested.
As you mentioned early, some tests are made to test this functionality:
Both jdk and janino compiler works fine.
Constants with different types(string, double, long, decimal, time, timestamp and so on) are tested.
But some tests beside the Travis are not passed, a lot of No default schema selected errors occured, it seems that these errors are not connected to this change.

Copy link
Member

@vvysotskyi vvysotskyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lushuifeng, did you use mvn clean install to run all the tests?

HoldingContainer out = generator.declare(e.getMajorType());
generator.getEvalBlock().assign(out.getValue(), JExpr.lit(e.getLong()));
return out;
JVar var = declareClassField(generator, e.getMajorType(), "long");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant to move all this logic to the separate method and pass ClassGenerator, MajorType and Function, for example, to receive something like this:

    @Override
    public HoldingContainer visitLongConstant(LongExpression e, ClassGenerator<?> generator) throws RuntimeException {
      return getHoldingContainer(generator,
          e.getMajorType(),
          buffer -> ValueHolderHelper.getBigIntHolder(e.getLong()));
    }

    private HoldingContainer getHoldingContainer(ClassGenerator<?> generator,
        MajorType majorType,Function<DrillBuf, ? extends ValueHolder> function) {
      JType holderType = generator.getHolderType(majorType);
      JVar var = generator.declareClassField("value", holderType);
      generator.declareSetterMethod(
          var,
          function);
      return createHoldingContainer(majorType, var);
    }

@@ -228,4 +247,14 @@ public void testNestedLoopJoin() throws Exception {
testNoResult("drop table if exists %s", tableName);
}
}

@Test
public void testHugeStringConstantCompilation() throws Exception {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a unit test for Janino also.

@lushuifeng
Copy link
Contributor Author

@vvysotskyi Sorry for my misunderstanding.
yes, simply ran with mvn -T 8 clean install, the tests cann't be passed.

Copy link
Member

@vvysotskyi vvysotskyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that tests will pass with -T 8, but they should pass with the default threads count.

@@ -536,6 +543,19 @@ public JVar declareClassField(String prefix, JType t, JExpression init) {
return clazz.field(JMod.NONE, t, prefix + index++, init);
}

public JMethod declareSetterMethod(JVar var, Function<DrillBuf, ? extends ValueHolder> function) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add Javadoc here also.

@lushuifeng
Copy link
Contributor Author

the tests without -T 8 still failed.
Could you please run one of my failed tests: org.apache.drill.exec.impersonation.TestImpersonationMetadata#testShowFilesInWSWithNoPermissionsForQueryUser

@arina-ielchiieva
Copy link
Member

@lushuifeng a good practice to have clean run on master and then on the branch with your changes thus you will know the reason of the failures. Try to run tests on master first.

@@ -511,7 +511,7 @@
This is likely due to you adding new dependencies to a java-exec and not updating the excludes in this module. This is important as it minimizes the size of the dependency of Drill application users.

</message>
<maxsize>39000000</maxsize>
<maxsize>39100000</maxsize>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please rebase onto the master and revert this change, since there was PR, were excluded some classes from the JAR.

@lushuifeng
Copy link
Contributor Author

lushuifeng commented Oct 9, 2018

@arina-ielchiieva thanks for your advice.
The tests are not passed with mvn clean install on master, got the same errors as on branch DRILL-6763.
Single Test TestImpersonationMetadata#testShowFilesInWSWithNoPermissionsForQueryUser failed as on the branch DRILL-6763.

@vvysotskyi
Copy link
Member

This fix causes the next unit tests failures:

Failed tests: 
  TestLargeFileCompilation.testJDKHugeStringConstantCompilation:257->BaseTestQuery.resetSessionOption:375 Failed to reset session option `exec.java_compiler`, Error: org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: IllegalArgumentException: Attempted to send a message when connection is no longer valid.

Query submission to Drillbit failed.

[Error Id: 6a397072-825d-4550-a39e-ea4f6eafdc6d ]

Tests in error: 
  TestLargeFileCompilation.testProject:191->BaseTestQuery.testNoResult:384 » Rpc
  TestLargeFileCompilation.testTOP_N_SORT:177->BaseTestQuery.testNoResult:358->BaseTestQuery.testNoResult:384 » Rpc
  TestLargeFileCompilation.testEXTERNAL_SORT:171->BaseTestQuery.testNoResult:358->BaseTestQuery.testNoResult:384 » Rpc
  TestLargeFileCompilation.testPARQUET_WRITER:156->BaseTestQuery.testNoResult:358->BaseTestQuery.testNoResult:384 » Rpc
  TestLargeFileCompilation.testHashJoin:206->BaseTestQuery.testNoResult:358->BaseTestQuery.testNoResult:384 » Rpc
  TestHashJoin.testHashJoinExprInCondition:252 » Rpc org.apache.drill.common.exc...

Tests run: 3545, Failures: 1, Errors: 6, Skipped: 156

@lushuifeng
Copy link
Contributor Author

There is no such tests on master, but also failed with the same errors on branch DRILL-6763.
Comment-out these two new tests and rerun all tests in TestLargeFileCompilation got the same error as above

@vvysotskyi
Copy link
Member

@lushuifeng, since these tests failures are absent on the latest master, they caused by your changes. Could you please fix them? Also, some of the failures in TestLargeFileCompilation may be caused by the previously failed unit tests from this class.

@lushuifeng
Copy link
Contributor Author

@vvysotskyi thanks for your testing, I'm working on this, it may take some time

@lushuifeng
Copy link
Contributor Author

@vvysotskyi Could you please take a look at these changes?

commit 8b44957 fix the compilation error if there are nested splitting classes
commit 5edecc4 and 413891a fix the compilation error of the innerClasses like BatchHolder and OutgoingRecordBatch contain constant values

I've just found that previous errors in my tests mvn clean install are because of there is a DOT in my user name

@vvysotskyi
Copy link
Member

@lushuifeng, I have a question regarding nested splitting classes: these setters will be added only to the top level class, or they will be separated between both: top and nested classes?

@lushuifeng
Copy link
Contributor Author

Both top and nested classes will be added since the new class member is in inner class if there is a inner class, the inner class member is initialized by invoking top class member function:

top:
public void setConstant5(IntHolder intHolder) { this.innerClassField.setConstant5(intHolder) }
inner:
public void setConstant5(IntHolder intHolder) { this.constant5 = intHolder; }

@vvysotskyi
Copy link
Member

@lushuifeng, then this change will break code compilation for the case when a lot of constants were used.

Previously, these constants were separated between the outer and nested classes.
But with these changes, additional items in the class constant pool will be used for setter methods[1], and in the outer class will be present all setters, so it will cause the overflow of the class constant pool.

[1] https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-4.html#jvms-4.6

@lushuifeng
Copy link
Contributor Author

I agree with you, it will cause the overflow of the class constant pool in extreme case, that is to say a projection or a filter contains thousands (or maybe larger) of constants

The good news is all tests in TestLargeFileCompilation are succeeded with this change, shall we add an option to disable this functionality or maybe there is a good way to avoid?

I'd do some tests to find out how much constants will cause the overflow of the class constant pool.

@vvysotskyi
Copy link
Member

I don't think that this is a good idea to add an option. In the case when we have a combination of these two problems, option won't help.

I think it is possible to enhance the way for collecting setter methods for example to map full [nested] class names and its setters, so after that receive nested class instances and call setters for them in CodeGenMemberInjector.injectMembers().

@lushuifeng
Copy link
Contributor Author

In order to collect setter methods of nested class, some changes will be made:

  1. public void setConstant4(IntHolder constant4) will be changed to public void setConstant4(IntHolder constant4, String name), the second param name is to store the name of the first param, here is the string "constant4", unfortunately the second param is useless unless there is a nested class.
  2. methods will be collected during __DRILL_INIT__, Lambda is not supported in JCodeModel 2.6, it seems that anonymousClass in JCodeModel 2.6 can't be narrowed, the <ValueHolder, String> is missing. some casts have to be added
Map<String, BiConsumer<ValueHolder, String>> nestedClassFunctions = new HashMap();
this.nestedClassFunctions.put("constant355", new BiConsumer<ValueHolder, String>() {
     @Override
      public void accept(ValueHolder constant355, String name) {
         (innerClassField).setConstant355(((IntHolder) constant355), "constant355");
      }
 });
  1. InnerClassField is referenced in anonymousClass, keyword FINAL should be added to its mods, so it should be initialized in constructor not in __DRILL_INIT__

What is your suggestions? @vvysotskyi thanks.

@vvysotskyi
Copy link
Member

vvysotskyi commented Oct 12, 2018

@lushuifeng, I meant to replace
Map<JMethod, Function<DrillBuf, ? extends ValueHolder>> setters
with something like this:
Map<JMethod, Pair<String, Function<DrillBuf, ? extends ValueHolder>>> setters,
where the String is the name of nested class field, or the name, which corresponds to the full path to this class, for example like this: nestedClassField.nestedNestedClassField.nestedNestedNestedClassField. So in the CodeGenMemberInjector.injectMembers(), required nested class instance may be received using its name and setter method may be called on it.

@lushuifeng
Copy link
Contributor Author

I think it is the way I'm doing besides you wrap two params (DrillBuf and String) into a Pair for the first point. The method in CodeGenMemberInjector.injectMembers() will get two params instead of old one.
Have you tried about anonymousClass in JCodeModel 2.6, it is a little confused to me that the narrow function not work correctly.

@lushuifeng
Copy link
Contributor Author

@vvysotskyi All tests have been passed, could you please take a look? thanks!

* @return the depth of nested class, setter method
*/
public Pair<Integer, JMethod> declareSetterMethod(JVar var, Function<DrillBuf, ? extends ValueHolder> function) {
JMethod setter = clazz.method(JMod.PUBLIC, void.class, "set" + StringUtils.capitalize(var.name()));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move the initialization to the else block below and put code which uses this var after the if...else block.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is It better by NOT declaring the setter function, these fields can be accessed by reflection directly, so the number of constants in constant pool can be reduced

@@ -646,7 +687,7 @@ public void preparePlainJava() {
Class<?> p = params[i];
childNew.arg(shim.param(model._ref(p), "arg" + i));
}
shim.body()._return(childNew);
shim.body()._return(JExpr._this().invoke("injectMembers").arg(childNew));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please explain what is the reason for this change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The innerClasses like BatchHolder may have the constants, BatchHolder instance is created via newBatchHolder dynamically when processing recordBatch, that is the difference between innerClasses and nested splitting classes.

The innerClasses member fields is initialized by invoking injectMembers(innerClassInstance) which is override generated by this code as shown in
protected BatchHolder newBatchHolder(int batchRowCount) { return this.injectMembers(new BatchHolder(batchRowCount)); }

}

protected BatchHolder injectMembers(BatchHolder batchHolder) {
CodeGenMemberInjector.injectMembers(cg, batchHolder, context);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to move this method call to the place, general for other similar classes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ClassGenerator and FragmentContext is needed in this method for general cases, but not for test classes, see example in ExampleTemplateWithInner.java

newBatchHolder is override by generated code can be implemented as follows to remove above method
protected BatchHolder newBatchHolder(int batchRowCount) { return CodeGenMemberInjector.injectMembers(cg, new BatchHolder(batchRowCount), context); }
but the cg and context params are hard coded, not flexable. I think it is better to make a function call to hide the implementation which may differ in each template.

*/
public static void injectMembers(ClassGenerator<?> cg, Object instance, FragmentContext context) {
Map<Integer, Object> cachedInstances = new HashMap<>();
for (Map.Entry<Pair<Integer, JMethod>, Function<DrillBuf, ? extends ValueHolder>> setter : cg.getSetters().entrySet()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may change the type of collection which used for storing setter methods. Can be used Guava's Table, so we will be able to omit usage of cachedInstances map.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I don't quite follow you since the instance is created somewhere else not in setter collection

@@ -87,6 +95,8 @@
private LinkedList<SizedJBlock>[] blocks;
private LinkedList<SizedJBlock>[] oldBlocks;

private JVar innerClassField;

/**
* Assumed that field has 3 indexes within the constant pull: index of the CONSTANT_Fieldref_info +
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please note that with introducing new methods in the class, this formula should be revised to take them into account.

@lushuifeng
Copy link
Contributor Author

@vvysotskyi setter method have been removed, the formula of maxIndex should be same as before.

DRILL-6722 seems to be resolved with this change by accident, the root cause is not found yet.

Test TestLargeFileCompilation#testProject has 10% improvement in the running time compare to master without cached classes.

@lushuifeng
Copy link
Contributor Author

@vvysotskyi Could you please take a look, thanks.

Copy link
Member

@vvysotskyi vvysotskyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lushuifeng, overall looks good, just single comment.

Please squash commits and rebase onto the master.

@@ -151,7 +161,7 @@ public static MappingSet getDefaultMapping() {
// from the JDK Modifier values to the JCodeModel JMod values: the
// values are different.

int mods = JMod.PRIVATE + JMod.FINAL;
int mods = JMod.PUBLIC + JMod.FINAL;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are this change and change below required for the current approach (without using setter methods for inner classes)?

@lushuifeng
Copy link
Contributor Author

done

Copy link
Member

@vvysotskyi vvysotskyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing this issue, +1

@lushuifeng
Copy link
Contributor Author

Thank you for your review and comments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants