TAJO-906: Runtime code generation for evaluating expression trees. #113
hyunsik wants to merge 136 commits
…eption and InvalidOperation.
…into CodeGen Conflicts: tajo-common/src/main/java/org/apache/tajo/exception/InvalidCastException.java tajo-core/src/main/java/org/apache/tajo/engine/eval/EvalTreeUtil.java tajo-core/src/main/java/org/apache/tajo/engine/eval/InvalidCastException.java tajo-core/src/main/java/org/apache/tajo/engine/planner/PlannerUtil.java tajo-core/src/main/java/org/apache/tajo/engine/planner/rewrite/FilterPushDownRule.java tajo-core/tajo-core-backend/pom.xml tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/eval/InvalidCastException.java
…into CodeGen Conflicts: tajo-core/src/main/java/org/apache/tajo/engine/eval/EvalTreeUtil.java tajo-core/src/main/java/org/apache/tajo/engine/eval/NotEval.java tajo-core/src/main/java/org/apache/tajo/engine/eval/SignedEval.java
…into CodeGen Conflicts: tajo-catalog/tajo-catalog-common/src/main/java/org/apache/tajo/catalog/CatalogUtil.java tajo-common/src/main/java/org/apache/tajo/datum/BooleanDatum.java tajo-common/src/main/java/org/apache/tajo/datum/exception/InvalidCastException.java tajo-common/src/main/java/org/apache/tajo/exception/InvalidCastException.java tajo-common/src/main/java/org/apache/tajo/util/TUtil.java tajo-core/src/main/java/org/apache/tajo/engine/codegen/CodeGenException.java tajo-core/src/main/java/org/apache/tajo/engine/eval/AlgebraicUtil.java tajo-core/src/main/java/org/apache/tajo/engine/eval/EvalTreeUtil.java tajo-core/src/main/java/org/apache/tajo/engine/eval/EvalType.java tajo-core/src/main/java/org/apache/tajo/engine/eval/SimpleEvalNodeVisitor.java tajo-core/src/main/java/org/apache/tajo/engine/eval/UnaryEval.java tajo-core/src/main/java/org/apache/tajo/engine/planner/ExprAnnotator.java tajo-core/src/test/java/example/Example.java tajo-core/src/test/java/org/apache/tajo/engine/eval/TestSQLExpression.java
…into TAJO-906 Conflicts: tajo-core/src/main/java/org/apache/tajo/engine/eval/EvalType.java tajo-core/src/main/java/org/apache/tajo/engine/eval/FunctionEval.java tajo-core/src/main/java/org/apache/tajo/engine/eval/SimpleEvalNodeVisitor.java
…into TAJO-906 Conflicts: tajo-common/src/main/java/org/apache/tajo/util/Pair.java tajo-core/src/main/java/org/apache/tajo/worker/TaskAttemptContext.java
I've rebased and fixed many bugs. Although a few trivial issues remain, this patch is ready for review.
…into TAJO-906 Conflicts: tajo-jdbc/src/main/java/org/apache/tajo/jdbc/MetaDataTuple.java tajo-storage/src/main/java/org/apache/tajo/storage/FrameTuple.java tajo-storage/src/main/java/org/apache/tajo/storage/LazyTuple.java tajo-storage/src/main/java/org/apache/tajo/storage/Tuple.java tajo-storage/src/main/java/org/apache/tajo/storage/VTuple.java
I've updated the patch. I've improved TajoTestingCluster to take system properties as follows: If … To test with real queries, you need to set the session variable CODEGEN as follows:
Hi @hyunsik, this is really great work. But I recommend that you consider the licensing. Cheers
* Removed ASF standard license according to http://www.apache.org/legal/src-headers.html.
* Moved ASM from the top-level dir to tajo-thirdparty.
* Add ASM's license to LICENSE
…into DirectMemTuple
Thanks @blrunner for the review. I've updated the following according to the ASF license policy.
I checked that the ASF license headers were removed from the ASM source code, and I double-checked the Tajo LICENSE file. But I have a question about this patch: is there any way to check runtime code generation on a local cluster?
Hi @blrunner, if the CODEGEN session variable is enabled, all queries will run using code generation. However, we cannot expect a performance improvement from it yet. Code generation is designed to avoid creating Datum objects during computation and to reduce interpretation overheads such as branches. But to stay compatible with our current Tuple and Datum mechanism, the code generation feature still creates many Datum objects; currently it creates objects twice for compatibility. I'm now working on a new tuple structure that uses direct memory, storing a sequence of bytes as row blocks instead of an array of Datum objects. After that, code generation should give big performance benefits. Thanks!
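To make the trade-off described above concrete, here is a minimal sketch that is not taken from the patch. It contrasts an evaluator that passes a boxed object between every node (with plain `Long` standing in for a Datum) against one that works on primitives read from a direct `ByteBuffer` row. All names (`ExprSketch`, `BoxedExpr`, `LongExpr`, and the helper methods) are hypothetical, and composed closures are used as a simple stand-in for the ASM-generated bytecode, so this only illustrates the idea of avoiding per-value object creation.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration only; none of these types exist in Tajo.
class ExprSketch {

    // Boxed style: every node returns an object, similar in spirit to
    // evaluating over an array of Datum-like values.
    interface BoxedExpr { Object eval(Object[] row); }

    static BoxedExpr boxedCol(int index) { return row -> row[index]; }
    static BoxedExpr boxedLit(long v) { return row -> Long.valueOf(v); }
    static BoxedExpr boxedAdd(BoxedExpr l, BoxedExpr r) {
        // Each addition unboxes two operands and boxes a new result object.
        return row -> Long.valueOf((Long) l.eval(row) + (Long) r.eval(row));
    }

    // Primitive style: nodes return long and read fields at fixed byte
    // offsets from a row stored as a sequence of bytes.
    interface LongExpr { long eval(ByteBuffer row); }

    static LongExpr longCol(int byteOffset) { return row -> row.getLong(byteOffset); }
    static LongExpr longLit(long v) { return row -> v; }
    static LongExpr longAdd(LongExpr l, LongExpr r) {
        return row -> l.eval(row) + r.eval(row);
    }

    // Evaluates (col0 + col1) + 10 over a row of boxed values.
    static long evalBoxed() {
        Object[] row = { 3L, 4L };
        BoxedExpr e = boxedAdd(boxedAdd(boxedCol(0), boxedCol(1)), boxedLit(10));
        return (Long) e.eval(row);
    }

    // Evaluates the same expression over a row held in direct memory.
    static long evalPrimitive() {
        ByteBuffer row = ByteBuffer.allocateDirect(2 * Long.BYTES);
        row.putLong(0, 3L).putLong(Long.BYTES, 4L);
        LongExpr e = longAdd(longAdd(longCol(0), longCol(Long.BYTES)), longLit(10));
        return e.eval(row);
    }

    public static void main(String[] args) {
        System.out.println(evalBoxed());
        System.out.println(evalPrimitive());
    }
}
```

Both methods compute the same value; the difference is that the boxed path passes an object between every pair of nodes, while the primitive path passes only `long` values. If generated code still has to convert to and from Datum objects at its boundaries, as the comment above describes, that conversion cost can cancel out the benefit.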
+1 Thank you for your detailed comments.
This is still ongoing work. I'm sharing it so that the overall approach can be reviewed.
Note that this patch bundles a third-party library because its version of ow2.asm conflicts with Hadoop's. So, we don't need to review ow2.asm itself.
Also, this patch still does not pass one unit test related to IntervalDatum. I'll fix it soon.