Skip to content
This repository was archived by the owner on May 12, 2021. It is now read-only.

TAJO-1493: Make partition pruning based on catalog informations.#624

Closed
blrunner wants to merge 89 commits intoapache:masterfrom
blrunner:TAJO-1493
Closed

TAJO-1493: Make partition pruning based on catalog informations.#624
blrunner wants to merge 89 commits intoapache:masterfrom
blrunner:TAJO-1493

Conversation

@blrunner
Copy link
Contributor

@blrunner blrunner commented Jul 6, 2015

Not yet implemented unit test cases.

blrunner added 8 commits July 6, 2015 10:49
…into TAJO-1493

Conflicts:
	tajo-catalog/tajo-catalog-server/src/test/java/org/apache/tajo/catalog/TestCatalog.java
…into TAJO-1493

Conflicts:
	tajo-catalog/tajo-catalog-client/src/main/java/org/apache/tajo/catalog/AbstractCatalogClient.java
	tajo-catalog/tajo-catalog-client/src/main/proto/CatalogProtocol.proto
	tajo-catalog/tajo-catalog-common/src/main/java/org/apache/tajo/catalog/CatalogService.java
	tajo-catalog/tajo-catalog-server/src/main/java/org/apache/tajo/catalog/CatalogServer.java
	tajo-plan/src/main/java/org/apache/tajo/plan/rewrite/rules/PartitionedTableRewriter.java
…into TAJO-1493

Conflicts:
	tajo-catalog/tajo-catalog-client/src/main/proto/CatalogProtocol.proto
	tajo-catalog/tajo-catalog-server/src/main/java/org/apache/tajo/catalog/CatalogServer.java
@blrunner
Copy link
Contributor Author

blrunner commented Aug 2, 2015

I implemented CatalogStore::getPartitionsByDirectSql which execute a select statement to CatalogStore for searching right partition directories. But current tajo testing cluster use MemStore for default CatalogStore and I couldn't execute the select statement because MemStore just retain all catalog informations to HashMap. If you want to verify this patch, you can test as following:

mvn clean install -Pparallel-test -DLOG_LEVEL=WARN -Dmaven.fork.count=2 -Dtajo.catalog.store.class=org.apache.tajo.catalog.store.DerbyStore

@blrunner
Copy link
Contributor Author

blrunner commented Aug 3, 2015

I found that this patch run as expected with HiveCatalogStore and MySQLStore on my testing cluster. And simple query response had been reported as following:

  • Table schema:
create table partitioned_lineitem (L_SUPPKEY bigint, L_LINENUMBER bigint,
L_QUANTITY double, L_EXTENDEDPRICE double, L_DISCOUNT double, L_TAX double, L_RETURNFLAG text, L_LINESTATUS text,
L_SHIPDATE text, L_COMMITDATE text, L_RECEIPTDATE text, L_SHIPINSTRUCT text, L_SHIPMODE text, L_COMMENT text)
partition by column (L_ORDERKEY bigint, L_PARTKEY bigint)
  • Partition numbers: 100,000
  • Select statement: select * from partitioned_lineitem limit 10;
  • Response time:
  • previous rewriter: 15 ~ 16 sec
  • improved rewriter: 12 ~ 13 sec

Honestly, I didn't implement unit test cases for executing queries because current almost tajo unit cases operate on MemStore. If we apply DerbyStore to some unit test cases for physical operator, we would make a lot of effort. It seems not to be the scope of this patch. So, I just added unit test cases for verifying direct sql. But if you want to test this patch with build commands, you can test with -Dtajo.catalog.store.class parameter as following:

mvn clean install -Pparallel-test -DLOG_LEVEL=WARN -Dmaven.fork.count=2 -Dtajo.catalog.store.class=org.apache.tajo.catalog.store.DerbyStore

…into TAJO-1493

Conflicts:
	tajo-catalog/tajo-catalog-common/src/main/java/org/apache/tajo/catalog/CatalogService.java
	tajo-catalog/tajo-catalog-server/src/main/java/org/apache/tajo/catalog/store/MemStore.java
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may forget the removal of commented out lines.

…into TAJO-1493

Conflicts:
	tajo-catalog/tajo-catalog-server/src/main/java/org/apache/tajo/catalog/CatalogServer.java
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Catalog API is public. In terms of this point, this API design does not make sense due to the following problems:

  • Users should know table schemas, their relationship, which underlying persistent storage.
    • Users should make SQL statements for all persistent storages.
  • This approach is very prone to SQL injection.

The solution is that you can use some intermediate representation, using string or java objects. They should be transformed into SQL statements depending on persistent storages in each catalog driver.

…into TAJO-1493

Conflicts:
	tajo-catalog/tajo-catalog-client/src/main/java/org/apache/tajo/catalog/AbstractCatalogClient.java
	tajo-catalog/tajo-catalog-common/src/main/java/org/apache/tajo/catalog/CatalogService.java
	tajo-catalog/tajo-catalog-drivers/tajo-hive/src/main/java/org/apache/tajo/catalog/store/HiveCatalogStore.java
	tajo-catalog/tajo-catalog-server/src/main/java/org/apache/tajo/catalog/store/AbstractDBStore.java
	tajo-catalog/tajo-catalog-server/src/main/java/org/apache/tajo/catalog/store/CatalogStore.java
	tajo-catalog/tajo-catalog-server/src/main/java/org/apache/tajo/catalog/store/MemStore.java
	tajo-common/src/main/java/org/apache/tajo/exception/ErrorMessages.java
@blrunner
Copy link
Contributor Author

Hi @hyunsik

I've implemented partition API using algebra expressions. For this feature, I implemented some visitors as following:

  • ScanQualConverter: This converts Quals of ScanNode to algebra expressions.
  • PartitionFilterAlgebraVisitor: This build SQL statement for getting partition filters by visiting algebra expressions.
  • PartitionFilterEvalNodeVisitor: This build SQL statement for getting partition filters by visiting EvalNodes.

@blrunner
Copy link
Contributor Author

@hyunsik

Thank you for your detailed review and I've just reflected your comments.

@blrunner blrunner changed the title TAJO-1493: Add a method to get partition directories with filter conditions. TAJO-1493: Make partition pruning based on catalog informations. Sep 11, 2015
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it is not used, you may do not need to add this method.

@blrunner
Copy link
Contributor Author

Thanks @hyunsik

I reflected your comments and left some comments. :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you rename TimeStamp to Timestamp?

@hyunsik
Copy link
Member

hyunsik commented Sep 16, 2015

Thank you for your work. The latest patch looks good to me. Here is my +1. I leaved one comment. You can commit it after reflecting my comment if you agree.

@blrunner
Copy link
Contributor Author

@hyunsik

Thank you very much for your kind review. I've just reflected your comment. After finishing the travis CI build, I'll ship it. :)

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants