This repository was archived by the owner on May 12, 2021. It is now read-only.

TAJO-906: Runtime code generation for evaluating expression trees.#113

Closed
hyunsik wants to merge 136 commits into
apache:masterfrom
hyunsik:TAJO-906

Conversation

@hyunsik
Member

@hyunsik hyunsik commented Aug 11, 2014

This is still a work in progress. I'm sharing it so that the overall approach can be reviewed.

Note that this patch bundles a third-party library because the required version of ow2.asm conflicts with the one Hadoop uses. So, we don't need to review ow2.asm.

Also, this patch still does not pass one unit test related to IntervalDatum. I'll fix it soon.

hyunsik added 30 commits April 2, 2014 22:18
…into CodeGen

Conflicts:
	tajo-common/src/main/java/org/apache/tajo/exception/InvalidCastException.java
	tajo-core/src/main/java/org/apache/tajo/engine/eval/EvalTreeUtil.java
	tajo-core/src/main/java/org/apache/tajo/engine/eval/InvalidCastException.java
	tajo-core/src/main/java/org/apache/tajo/engine/planner/PlannerUtil.java
	tajo-core/src/main/java/org/apache/tajo/engine/planner/rewrite/FilterPushDownRule.java
	tajo-core/tajo-core-backend/pom.xml
	tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/eval/InvalidCastException.java
…into CodeGen

Conflicts:
	tajo-core/src/main/java/org/apache/tajo/engine/eval/EvalTreeUtil.java
	tajo-core/src/main/java/org/apache/tajo/engine/eval/NotEval.java
	tajo-core/src/main/java/org/apache/tajo/engine/eval/SignedEval.java
…into CodeGen

Conflicts:
	tajo-catalog/tajo-catalog-common/src/main/java/org/apache/tajo/catalog/CatalogUtil.java
	tajo-common/src/main/java/org/apache/tajo/datum/BooleanDatum.java
	tajo-common/src/main/java/org/apache/tajo/datum/exception/InvalidCastException.java
	tajo-common/src/main/java/org/apache/tajo/exception/InvalidCastException.java
	tajo-common/src/main/java/org/apache/tajo/util/TUtil.java
	tajo-core/src/main/java/org/apache/tajo/engine/codegen/CodeGenException.java
	tajo-core/src/main/java/org/apache/tajo/engine/eval/AlgebraicUtil.java
	tajo-core/src/main/java/org/apache/tajo/engine/eval/EvalTreeUtil.java
	tajo-core/src/main/java/org/apache/tajo/engine/eval/EvalType.java
	tajo-core/src/main/java/org/apache/tajo/engine/eval/SimpleEvalNodeVisitor.java
	tajo-core/src/main/java/org/apache/tajo/engine/eval/UnaryEval.java
	tajo-core/src/main/java/org/apache/tajo/engine/planner/ExprAnnotator.java
	tajo-core/src/test/java/example/Example.java
	tajo-core/src/test/java/org/apache/tajo/engine/eval/TestSQLExpression.java
…into TAJO-906

Conflicts:
	tajo-core/src/main/java/org/apache/tajo/engine/eval/EvalType.java
	tajo-core/src/main/java/org/apache/tajo/engine/eval/FunctionEval.java
	tajo-core/src/main/java/org/apache/tajo/engine/eval/SimpleEvalNodeVisitor.java
…into TAJO-906

Conflicts:
	tajo-common/src/main/java/org/apache/tajo/util/Pair.java
	tajo-core/src/main/java/org/apache/tajo/worker/TaskAttemptContext.java
@hyunsik
Member Author

hyunsik commented Aug 21, 2014

I've rebased and fixed many bugs. Although some trivial issues remain, this patch is ready to be reviewed.

…into TAJO-906

Conflicts:
	tajo-jdbc/src/main/java/org/apache/tajo/jdbc/MetaDataTuple.java
	tajo-storage/src/main/java/org/apache/tajo/storage/FrameTuple.java
	tajo-storage/src/main/java/org/apache/tajo/storage/LazyTuple.java
	tajo-storage/src/main/java/org/apache/tajo/storage/Tuple.java
	tajo-storage/src/main/java/org/apache/tajo/storage/VTuple.java
@hyunsik
Member Author

hyunsik commented Aug 22, 2014

I've updated the patch. TajoTestingCluster now accepts system properties, as follows:

mvn clean install -DCODEGEN=true

If the property names a session variable, as CODEGEN does, it is applied to the QueryContext instance used in all unit tests. So, to test the code generation feature, pass -DCODEGEN=true when you run mvn install. The same mechanism works for other session variables too.
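
The property-to-session-variable step described above could look roughly like the following. This is a minimal sketch, not the TajoTestingCluster code: the class `SessionVarsFromProps`, the method `collect`, and the set of recognized names are all hypothetical, and the result is a plain map rather than a QueryContext.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Properties;
import java.util.Set;

// Hypothetical helper for illustration; not part of Tajo.
class SessionVarsFromProps {
  // Assumed list of recognized session variable names (not Tajo's actual list).
  static final Set<String> KNOWN_SESSION_VARS =
      new HashSet<>(Arrays.asList("CODEGEN"));

  /** Copies only those properties whose names are known session variables. */
  static Map<String, String> collect(Properties props) {
    Map<String, String> vars = new HashMap<>();
    for (String name : props.stringPropertyNames()) {
      if (KNOWN_SESSION_VARS.contains(name)) {
        vars.put(name, props.getProperty(name));
      }
    }
    return vars;
  }

  public static void main(String[] args) {
    // Run with: java -DCODEGEN=true SessionVarsFromProps
    System.out.println(collect(System.getProperties()));
  }
}
```

Filtering by a known-name set keeps unrelated JVM properties such as `user.dir` out of the session.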

To test with real queries, set the session variable CODEGEN as follows:

tajo> \set CODEGEN true
tajo> 
tajo> SELECT .....

@blrunner
Contributor

Hi @hyunsik

This is really great work, but I recommend that you consider the licensing.
If you add the ASM project's license to the Tajo LICENSE and NOTICE files, it should not be a problem.

Cheers

* Removed ASF standard license according to http://www.apache.org/legal/src-headers.html.
* Moved ASM from the top-level dir to tajo-thirdparty.
* Added ASM's license to LICENSE
@hyunsik
Member Author

hyunsik commented Aug 23, 2014

Thanks @blrunner for the review. I've updated the patch according to the ASF license policy.

@blrunner
Contributor

I checked that the ASF license headers were removed from the ASM source code, and I double-checked the Tajo LICENSE file.
'mvn clean install -DCODEGEN=true' finished successfully. I then confirmed that the CODEGEN variable is set in tsql, as follows:
{code:xml}
default> \set CODEGEN true
default> \set
'CURRENT_DATABASE'='default'
'CODEGEN'='true'
'SESSION_ID'='f902f118-2649-46df-97ec-f15c2f0c75f5'
'USERNAME'='blrunner'
'SESSION_LAST_ACCESS_TIME'='1408801963706'
{code}

But I have a question about this patch. Is there any way to check that runtime code generation is actually being used on a local cluster?

@hyunsik
Member Author

hyunsik commented Aug 23, 2014

Hi @blrunner,

If the CODEGEN session variable is enabled, all queries are evaluated using code generation. Currently, however, we cannot expect a performance improvement from it. Code generation is designed to avoid creating Datum objects during computation and to reduce interpretation overheads such as branches.
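
To illustrate the two overheads mentioned above, here is a hand-written sketch contrasting a tree-walking evaluator that allocates a boxed value per node with the straight-line primitive code a generator could aim for. Everything here (`EvalSketch`, `BoxedLong`, `Node`, `compiled`) is a hypothetical stand-in, not Tajo's eval or Datum classes, and `compiled` is written by hand rather than generated as bytecode.

```java
// Hypothetical, simplified stand-ins; these are not Tajo's actual classes.
class EvalSketch {
  /** Boxed value, loosely analogous to a Datum object. */
  static final class BoxedLong {
    final long value;
    BoxedLong(long value) { this.value = value; }
  }

  /** Interpreted expression node: each eval call is an interface dispatch. */
  interface Node {
    BoxedLong eval(long[] row);
  }

  static Node column(int index) {
    return row -> new BoxedLong(row[index]);
  }

  static Node constant(long v) {
    return row -> new BoxedLong(v);
  }

  static Node plus(Node l, Node r) {
    // Allocates a new boxed result for every row.
    return row -> new BoxedLong(l.eval(row).value + r.eval(row).value);
  }

  static Node times(Node l, Node r) {
    return row -> new BoxedLong(l.eval(row).value * r.eval(row).value);
  }

  /** What generated code for (col0 + col1) * 2 could reduce to: primitives only. */
  static long compiled(long[] row) {
    return (row[0] + row[1]) * 2L;
  }

  public static void main(String[] args) {
    long[] row = {3L, 4L};
    Node tree = times(plus(column(0), column(1)), constant(2L));
    System.out.println(tree.eval(row).value); // interpreted: five allocations
    System.out.println(compiled(row));        // same result, no allocations
  }
}
```

Both paths compute the same value; the difference is the per-row object allocation and dispatch in the interpreted path.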

However, to stay compatible with our current Tuple and Datum mechanism, the code generation feature still creates many Datum objects; currently it creates each object twice for compatibility. I'm now working on a new tuple structure using direct memory, which represents a row block as a sequence of bytes instead of an array of Datum objects. After that, code generation should give significant performance benefits.
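
As a rough picture of the row-block idea, the sketch below stores fixed-width rows back to back in a direct `ByteBuffer` and reads columns by offset, so no per-value objects are created. The layout, the class `RowBlockSketch`, and its two-column schema are assumptions for illustration only; the tuple structure being developed may look quite different.

```java
import java.nio.ByteBuffer;

// Hypothetical fixed-width row block; not the tuple structure from this patch.
class RowBlockSketch {
  // Assumed schema: one 8-byte integer column followed by one 8-byte float column.
  static final int ROW_WIDTH = Long.BYTES + Double.BYTES;

  private final ByteBuffer buf;
  private int rows;

  RowBlockSketch(int capacityRows) {
    // Off-heap memory: rows live outside the Java heap as raw bytes.
    buf = ByteBuffer.allocateDirect(capacityRows * ROW_WIDTH);
  }

  /** Appends one row at the buffer's current position. */
  void append(long id, double price) {
    buf.putLong(id).putDouble(price);
    rows++;
  }

  /** Absolute reads by offset; they do not move the write position. */
  long getId(int row) {
    return buf.getLong(row * ROW_WIDTH);
  }

  double getPrice(int row) {
    return buf.getDouble(row * ROW_WIDTH + Long.BYTES);
  }

  int rowCount() {
    return rows;
  }

  public static void main(String[] args) {
    RowBlockSketch block = new RowBlockSketch(16);
    block.append(1L, 2.5);
    block.append(7L, 9.0);
    System.out.println(block.getId(1) + " " + block.getPrice(1));
  }
}
```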

Thanks!

@blrunner
Contributor

+1

Thank you for your detailed comments.
I'm also looking forward to it improving Tajo's performance. :)

@asfgit asfgit closed this in 7603a3d Aug 23, 2014
@hyunsik hyunsik deleted the TAJO-906 branch August 23, 2014 17:52
babokim pushed a commit to babokim/tajo that referenced this pull request Dec 11, 2014