This repository was archived by the owner on May 12, 2021. It is now read-only.

TAJO-907: Implement off-heap tuple block and zero-copy tuple. #133

Closed
hyunsik wants to merge 64 commits into apache:master from hyunsik:TAJO-907


hyunsik commented Sep 4, 2014

Hi folks,

One week ago, I contributed run-time code generation to improve computation efficiency and reduce object creation. This work has a similar purpose.

This patch adds off-heap memory and a row block (a list of tuples) backed by an off-heap memory container. It also provides various utility classes for them.

In detail, the patch includes the following changes:

  • OffHeapMemory class
  • OffHeapRowBlock class
  • ZeroCopyTuple, which just points to the actual row record stored in an OffHeapRowBlock.
  • RowWriter interface
  • RowBlockReader interface and OffHeapBlockReader
  • TupleBuilder and BaseTupleBuilder
  • HeapTuple, which keeps fields in a byte array instead of a Datum array.
    • HeapTuple internally uses Unsafe to read and store field values in the byte array.
  • TupleComparatorCompiler, a run-time code compiler for TupleComparator.
    • It reduces branches, even though a tuple comparator must handle every combination of null-first/null-last and ascending/descending order.
  • Improved ExternalSortExec to use OffHeapRowBlock.
  • Others
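To make the row block and zero-copy tuple idea concrete, here is a minimal sketch. It is not the code in this patch: the class names `SketchRowBlock` and `SketchZeroCopyTuple`, the fixed (long, int) row layout, and the use of `ByteBuffer.allocateDirect` instead of Unsafe are all assumptions made for illustration, and it is written against Java 8 for brevity.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch, not Tajo's OffHeapRowBlock: fixed-width rows of
// (long key, int value) packed back to back in direct (off-heap) memory.
final class SketchRowBlock {
    static final int ROW_BYTES = Long.BYTES + Integer.BYTES;

    private final ByteBuffer buffer;
    private int rowCount;

    SketchRowBlock(int capacityRows) {
        // allocateDirect places the bytes outside the Java heap.
        buffer = ByteBuffer.allocateDirect(capacityRows * ROW_BYTES)
                           .order(ByteOrder.nativeOrder());
    }

    // Writes one row into the next free slot; throws if the block is full.
    void append(long key, int value) {
        int offset = rowCount * ROW_BYTES;
        buffer.putLong(offset, key);
        buffer.putInt(offset + Long.BYTES, value);
        rowCount++;
    }

    int rowCount() { return rowCount; }

    ByteBuffer buffer() { return buffer; }

    public static void main(String[] args) {
        SketchRowBlock block = new SketchRowBlock(2);
        block.append(1L, 10);
        block.append(2L, 20);
        SketchZeroCopyTuple tuple = new SketchZeroCopyTuple();
        for (int i = 0; i < block.rowCount(); i++) {
            tuple.pointTo(block, i); // one tuple object reused for every row
            System.out.println(tuple.key() + " -> " + tuple.value());
        }
    }
}

// Hypothetical sketch of a zero-copy tuple: it stores only a reference to the
// block's buffer and a row offset, and reads fields in place on each access.
final class SketchZeroCopyTuple {
    private ByteBuffer buffer;
    private int offset;

    void pointTo(SketchRowBlock block, int rowIndex) {
        if (rowIndex < 0 || rowIndex >= block.rowCount()) {
            throw new IndexOutOfBoundsException("row " + rowIndex);
        }
        buffer = block.buffer();
        offset = rowIndex * SketchRowBlock.ROW_BYTES;
    }

    long key() { return buffer.getLong(offset); }

    int value() { return buffer.getInt(offset + Long.BYTES); }
}
```

Because the tuple holds no field values of its own, iterating over a block creates no per-row objects, which is the kind of object-creation saving described above.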

Later, I'll replace the current VTuple with HeapTuple or ZeroCopyTuple. I'm also planning to replace the execution engine's current pull-iterator model with a push-based block iterator model. I'll describe that in a separate JIRA issue.

Thanks,
Hyunsik

hyunsik added 30 commits August 24, 2014 02:38

hyunsik commented Sep 7, 2014

As I mentioned, I improved ExternalSort to make use of OffHeapRowBlock. I'd like to share a brief benchmark test result.

The experimental environment was as follows:

  • Linux
  • Intel Xeon
  • 1 JVM worker with a worker concurrency of 4
  • A local cluster

The data set is the lineitem table of TPC-H at 1 GB.

The test query is

select * from lineitem order by l_orderkey, l_partkey desc;

Current implementation: 55 - 60 seconds
After OffHeapRowBlock : 22 - 25 seconds

Roughly, this cuts query response time by more than half (from 55 - 60 seconds down to 22 - 25 seconds). The change only affects physical operators, so I think a local-cluster benchmark is sufficient.
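For readers unfamiliar with the sort order in the test query above (ascending l_orderkey, then descending l_partkey), the following is a small sketch of an equivalent comparator. It is not Tajo's TupleComparator or the generated code: the class name `SortSpecSketch` and the representation of a row as a `long[]` of {l_orderkey, l_partkey} are assumptions for illustration, it ignores nulls, and it uses Java 8 APIs for brevity. The directions are fixed once when the comparator is built rather than checked on every comparison.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical sketch: each row is a long[] {l_orderkey, l_partkey}.
final class SortSpecSketch {

    // Builds the comparator once, so the ascending/descending choice is
    // already decided before any pair of rows is compared.
    static Comparator<long[]> orderKeyAscPartKeyDesc() {
        Comparator<long[]> byOrderKeyAsc =
            Comparator.comparingLong((long[] row) -> row[0]);
        Comparator<long[]> byPartKeyDesc =
            Comparator.comparingLong((long[] row) -> row[1]).reversed();
        return byOrderKeyAsc.thenComparing(byPartKeyDesc);
    }

    public static void main(String[] args) {
        long[][] rows = { {2, 1}, {1, 5}, {1, 9} };
        Arrays.sort(rows, orderKeyAscPartKeyDesc());
        for (long[] row : rows) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```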

Later, I'll apply OffHeapRowBlock to the other operators and to our operator model, one step at a time.


hyunsik commented Sep 11, 2014

Rebased and implemented clone() of HeapTuple and UnSafeTuple.

…into TAJO-907

Conflicts:
	tajo-core/src/main/java/org/apache/tajo/worker/TajoWorker.java
	tajo-core/src/main/java/org/apache/tajo/worker/Task.java

jinossy commented Sep 16, 2014

@hyunsik I can't build on JDK 1.6

java version : 1.6.0_65

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.5.1:compile (default-compile) on project tajo-common: Compilation failure: Compilation failure:
[ERROR] /Users/kimjh/tajo-jinossy/incubator-tajo/tajo-common/src/main/java/org/apache/tajo/util/SizeOf.java:[43,11] cannot find symbol
[ERROR] symbol  : variable ARRAY_BOOLEAN_BASE_OFFSET
[ERROR] location: class org.apache.tajo.util.SizeOf
[ERROR] /Users/kimjh/tajo-jinossy/incubator-tajo/tajo-common/src/main/java/org/apache/tajo/util/SizeOf.java:[43,48] cannot find symbol
[ERROR] symbol  : variable ARRAY_BOOLEAN_INDEX_SCALE
[ERROR] location: class org.apache.tajo.util.SizeOf
[ERROR] /Users/kimjh/tajo-jinossy/incubator-tajo/tajo-common/src/main/java/org/apache/tajo/util/SizeOf.java:[43,37] incompatible types
[ERROR] found   : <nulltype>
[ERROR] required: long
[ERROR] /Users/kimjh/tajo-jinossy/incubator-tajo/tajo-common/src/main/java/org/apache/tajo/util/SizeOf.java:[51,11] cannot find symbol
[ERROR] symbol  : variable ARRAY_BYTE_BASE_OFFSET
[ERROR] location: class org.apache.tajo.util.SizeOf
[ERROR] /Users/kimjh/tajo-jinossy/incubator-tajo/tajo-common/src/main/java/org/apache/tajo/util/SizeOf.java:[51,45] cannot find symbol
[ERROR] symbol  : variable ARRAY_BYTE_INDEX_SCALE
[ERROR] location: class org.apache.tajo.util.SizeOf
[ERROR] /Users/kimjh/tajo-jinossy/incubator-tajo/tajo-common/src/main/java/org/apache/tajo/util/SizeOf.java:[51,34] incompatible types
[ERROR] found   : <nulltype>
[ERROR] required: long


hyunsik commented Sep 16, 2014

I've removed the constants that are only available in JDK 1.7, and rebased the branch.
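For reference, one way to obtain the same values without the constants that the build log reports as missing is to ask Unsafe for them at class-load time. This is only a sketch and not necessarily what the commit does: the class name `ArrayOffsetsSketch` and its field names are hypothetical, and it relies on the `arrayBaseOffset(Class)` and `arrayIndexScale(Class)` methods of `sun.misc.Unsafe`, an internal API that compilers warn about.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Hypothetical sketch: looks up array layout values through Unsafe methods
// instead of referring to the ARRAY_*_BASE_OFFSET / ARRAY_*_INDEX_SCALE constants.
final class ArrayOffsetsSketch {
    static final Unsafe UNSAFE = loadUnsafe();

    static final long BOOLEAN_ARRAY_BASE = UNSAFE.arrayBaseOffset(boolean[].class);
    static final long BOOLEAN_ARRAY_SCALE = UNSAFE.arrayIndexScale(boolean[].class);
    static final long BYTE_ARRAY_BASE = UNSAFE.arrayBaseOffset(byte[].class);
    static final long BYTE_ARRAY_SCALE = UNSAFE.arrayIndexScale(byte[].class);

    // Reads the private static singleton field of sun.misc.Unsafe via reflection.
    private static Unsafe loadUnsafe() {
        try {
            Field field = Unsafe.class.getDeclaredField("theUnsafe");
            field.setAccessible(true);
            return (Unsafe) field.get(null);
        } catch (Exception e) {
            throw new IllegalStateException("sun.misc.Unsafe is not available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("byte[] base offset: " + BYTE_ARRAY_BASE);
        System.out.println("byte[] index scale: " + BYTE_ARRAY_SCALE);
    }
}
```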

hyunsik added a commit to hyunsik/tajo that referenced this pull request Sep 16, 2014

hyunsik commented Sep 16, 2014

Committed it to the block_iteration branch.

hyunsik closed this Sep 16, 2014
babokim pushed a commit to babokim/tajo that referenced this pull request Dec 11, 2014
Ability to use zeppelin with ssl both https and wss

2 participants