This repository has been archived by the owner on May 12, 2021. It is now read-only.

TAJO-1952: Implement PartitionFileFragment #846

Closed


blrunner
Contributor

@blrunner blrunner commented Nov 5, 2015

This patch contains the following modifications:

  • Remove partition paths from PartitionedTableScanNode
  • Implement PartitionedFileFragment
  • Move the method that prunes partition paths from PartitionedTableRewriter to PartitionedTableUtil
  • Build a type from the partition name, which contains the partition keys and values

/**
* Generate the list of files and make them into FileSplits.
*
* @throws IOException
*/
public List<Fragment> getSplits(String tableName, TableMeta meta, Schema schema, Path... inputs)
public List<Fragment> getSplits(String tableName, TableMeta meta, Schema schema, String[] partitions, Path... inputs)
Member

A suggestion: each method should serve a single purpose where possible. That keeps the logic simpler.

@hyunsik
Member

hyunsik commented Nov 19, 2015

Could you rebase it against the latest revision? I'd like to try the patch on my machine.

@blrunner
Contributor Author

@hyunsik

Thank you for your review. I've just rebased it against the latest version.

repeated string hosts = 5;
repeated int32 disk_ids = 6;
// Partition Name: country=KOREA/city=SEOUL
required string partitionName = 7;
Member

In my opinion, partitionKeys would be a more appropriate name because the attribute holds the concatenated partition keys.
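
For illustration, a value in the format shown in the comment above (country=KOREA/city=SEOUL) can be split back into its keys and values. The sketch below is only an assumption about how such a string might be consumed; PartitionNameParser and parse are hypothetical names, not part of the patch or of Tajo.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of the Tajo code base.
final class PartitionNameParser {
  private PartitionNameParser() {}

  /** Splits "k1=v1/k2=v2" into an insertion-ordered key-to-value map. */
  static Map<String, String> parse(String partitionName) {
    Map<String, String> result = new LinkedHashMap<>();
    for (String segment : partitionName.split("/")) {
      int eq = segment.indexOf('=');
      if (eq <= 0) {
        // Reject segments without a key, such as "SEOUL" or "=SEOUL".
        throw new IllegalArgumentException("Malformed partition segment: " + segment);
      }
      result.put(segment.substring(0, eq), segment.substring(eq + 1));
    }
    return result;
  }

  public static void main(String[] args) {
    // prints {country=KOREA, city=SEOUL}
    System.out.println(parse("country=KOREA/city=SEOUL"));
  }
}
```

Because a single string carries every key, a plural name such as partitionKeys describes the field more accurately than partitionName.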

@blrunner
Contributor Author

@hyunsik

Thank you for your review. I've just addressed your comments.

@blrunner
Contributor Author

I updated the patch as follows:

  • Renamed PartitionedFileFragment to PartitionFileFragment
  • Restored the partition-pruning code in PartitionedTableRewriter
  • Restored the partition paths in PartitionedTableScanNode and added partition keys to PartitionedTableScanNode
  • Implemented PartitionContent, which contains the partition paths, partition keys, and partition volume

For reference, I've tried to keep the existing code unchanged as far as possible.

public PartitionContent() {
}

public PartitionContent(Path[] partitionPaths) {
Member

unused constructor

@blrunner
Contributor Author

@hyunsik

I updated the patch as follows:

  • Remove unused packages
  • Remove PartitionedTableScanNode from PlanProto
  • Remove unnecessary uses of PartitionedTableScanNode in Repartitioner and PhysicalPlannerImpl

@blrunner changed the title from "TAJO-1952: Implement PartitionedFileFragment" to "TAJO-1952: Implement PartitionFileFragment" on Nov 25, 2015
@blrunner
Contributor Author

Added unit test cases for PartitionedTableRewriter

}

List<Map.Entry<String, Integer>> entries = new ArrayList<>(hostsBlockMap.entrySet());
Collections.sort(entries, new Comparator<Map.Entry<String, Integer>>() {
Member

It can be simplified with a lambda.
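
A minimal sketch of what that simplification might look like. The body of the original Comparator is not shown in the diff, so the ordering used here (descending by block count) is an assumption, and HostBlockSort and sortByBlockCountDesc are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical wrapper for illustration; not part of the patch.
final class HostBlockSort {
  private HostBlockSort() {}

  static List<Map.Entry<String, Integer>> sortByBlockCountDesc(Map<String, Integer> hostsBlockMap) {
    List<Map.Entry<String, Integer>> entries = new ArrayList<>(hostsBlockMap.entrySet());
    // The lambda replaces the anonymous Comparator class.
    // Assumed ordering: hosts with more blocks come first.
    Collections.sort(entries, (a, b) -> b.getValue().compareTo(a.getValue()));
    return entries;
  }

  public static void main(String[] args) {
    Map<String, Integer> hostsBlockMap = new HashMap<>();
    hostsBlockMap.put("host1", 3);
    hostsBlockMap.put("host2", 7);
    hostsBlockMap.put("host3", 5);
    // prints [host2=7, host3=5, host1=3]
    System.out.println(sortByBlockCountDesc(hostsBlockMap));
  }
}
```

Lambdas require Java 8 or later, so this only applies if the module is built at that language level.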

@blrunner
Contributor Author

blrunner commented Dec 3, 2015

@hyunsik

Thank you for your detailed review. I've addressed your comments.

@blrunner
Contributor Author

blrunner commented Dec 7, 2015

@hyunsik

I changed PartitionFileFragment to extend FileFragment because some existing code works only with FileFragment. For example, the FileScanner constructor casts its fragment argument as follows:

  public FileScanner(Configuration conf, final Schema schema, final TableMeta meta, final Fragment fragment) {
    this.conf = conf;
    this.meta = meta;
    this.schema = schema;
    this.fragment = (FileFragment)fragment;
    this.tableStats = new TableStats();
    this.columnNum = this.schema.size();
  }
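
To make the reasoning concrete, here is a small sketch of why subclassing keeps such a cast valid. All types below are simplified stand-ins written for this example, not Tajo's real classes; StandaloneFragment and FragmentCastDemo are hypothetical.

```java
// Simplified stand-ins for illustration; these are not Tajo's real classes.
interface Fragment {}

class FileFragment implements Fragment {
  final String path;

  FileFragment(String path) {
    this.path = path;
  }
}

// Extending FileFragment keeps existing (FileFragment) casts valid.
class PartitionFileFragment extends FileFragment {
  final String partitionKeys;

  PartitionFileFragment(String path, String partitionKeys) {
    super(path);
    this.partitionKeys = partitionKeys;
  }
}

// A fragment that only implements the interface would break such casts.
class StandaloneFragment implements Fragment {}

final class FragmentCastDemo {
  private FragmentCastDemo() {}

  // Mirrors the downcast performed in the constructor quoted above.
  static FileFragment asFileFragment(Fragment fragment) {
    return (FileFragment) fragment;
  }

  public static void main(String[] args) {
    Fragment f = new PartitionFileFragment("/warehouse/t1/country=KOREA/city=SEOUL",
        "country=KOREA/city=SEOUL");
    System.out.println(asFileFragment(f).path); // cast succeeds
    try {
      asFileFragment(new StandaloneFragment());
    } catch (ClassCastException e) {
      System.out.println("cast failed for a non-FileFragment type");
    }
  }
}
```

The trade-off is that PartitionFileFragment inherits everything FileFragment exposes, but callers that assume a FileFragment keep working without changes.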

@blrunner
Contributor Author

I tested the patch by comparing query runs before and after the PartitionFileFragment implementation, as follows:

  • Dataset : TPCH-100G
  • Tajo Cluster : 1 Master, 6 Workers
  • Queries: Q1, Q3, Q5, Q6, Q7, Q8, Q9, Q10
  • Tajo version before the PartitionFileFragment implementation:
    • 0.11.1-SNAPSHOT
  • Tajo version after the PartitionFileFragment implementation:
    • 0.12.0-SNAPSHOT (partitions exist in the catalog)
    • 0.12.0-SNAPSHOT (partitions do not exist in the catalog)

The following results were the same in all cases:

  • The number and order of execution blocks
  • The number of tasks in each execution block
  • The number of rows in the result
  • All tuples in the result (excluding a few floating-point values)

@blrunner
Contributor Author

blrunner commented Jan 5, 2016

The Travis CI build appears to have failed for an unrelated reason. Here is the pre-commit report for the latest patch: https://builds.apache.org/job/PreCommit-TAJO-Build/900//console

@blrunner
Contributor Author

@hyunsik

Is there a problem that needs to be fixed?

@blrunner
Contributor Author

@hyunsik

I updated this PR as follows:

  • Removed the list of partition paths from PartitionedTableScanNode. It now contains only the path of the table root.
  • PartitionedTableRewriter now only rewrites ScanNode to PartitionedTableScanNode in BaseLogicalPlanRewriteRuleProvider.
  • Actual partition pruning is executed in Repartitioner by PartitionedTableRewriter.
