Skip to content

[FLINK-5859] [table] Add PartitionableTableSource for partition pruning#4667

Closed
godfreyhe wants to merge 2 commits intoapache:masterfrom
godfreyhe:FLINK-5859
Closed

[FLINK-5859] [table] Add PartitionableTableSource for partition pruning#4667
godfreyhe wants to merge 2 commits intoapache:masterfrom
godfreyhe:FLINK-5859

Conversation

@godfreyhe
Copy link
Contributor

What is the purpose of the change

This pull request adds PartitionableTableSource for partition pruning when optimizing the query plan. That way both query optimization time and execution time can be reduced obviously, especially for a large partitioned table.

Brief change log

  • Adds PartitionableTableSource which extends FilterableTableSource
  • Adds setRelBuilder method in FilterableTableSource class
  • Adds implementation for partition pruning and extracting partition predicates

Verifying this change

This change added tests and can be verified as follows:

  • Added integration tests for PartitionableTableSource on batch and stream sql
  • Added test that validates the correct of partition pruning
  • Added test that validates the correct of extracting partition predicates

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (no)
  • The serializers: (no)
  • The runtime per-record code paths (performance sensitive): (no)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)

Documentation

  • Does this pull request introduce a new feature? (yes)
  • If yes, how is the feature documented? (JavaDocs)

/**
* @param relBuilder Builder for relational expressions.
*/
def setRelBuilder(relBuilder: RelBuilder): Unit
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you move this method to PartitionableTableSource?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

setRelBuilder method is called in PushFilterIntoTableSourceScanRule. If we move setRelBuilder method to PartitionableTableSource, PushFilterIntoTableSourceScanRule should know FilterableTableSource and PartitionableTableSource both.

/**
* The base class of partition
*/
trait Partition {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you provide more detailed description about what is a "Partition" and how one PartitionableTableSource will do partition pruning. User cannot get precise intuition about what is a field of partition, and what does origin value mean.

Copy link
Contributor Author

@godfreyhe godfreyhe Sep 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will add more description about them.

The origin value means the entire partition value in the Partition instance. A partition value may be simple, such as the data is split by year (year=2015, year=2016); and A partition value may be complex, such as the data is split by year and month (year=2015,month=01, year=2015,month=02, year=2016,month=01, year=2016,month=02).

* list. Don't try to reorganize the predicates if you are absolutely confident with that.
*
* @param partitionPruned Whether partition pruning is applied.
* @param prunedPartitions Remaining partitions after partition pruning applied.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the definition of "prunedPartitions" is contrary here. I think we should stick to only one definition, either "prunedPartitions" represents all partitions which have been pruned, or all remaining partitions which survive after pruning.

Copy link
Contributor Author

@godfreyhe godfreyhe Sep 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prunedPartitions=> remainingPartitions

* organized in CNF conjunctive form, and we should only take or leave each element from the
* list. Don't try to reorganize the predicates if you are absolutely confident with that.
*
* @param partitionPruned Whether partition pruning is applied.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should make this flag more clear. If you mean this flag represents whether the partition pruning is applied, i would say it should always be true, because when this method been called, at least framework had tried to apply the partition pruning.

Copy link
Contributor Author

@godfreyhe godfreyhe Sep 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

partitionPruned will be false, when the filter dose not contain partition conditions, otherwise it will be true. partitionPruned will be change to isPartitionPrunedApplied

Copy link
Contributor

@KurtYoung KurtYoung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @godfreyhe , thanks for your contribution, I left some comments.

@KurtYoung
Copy link
Contributor

LGTM, @fhueske @twalthr can you also take a look?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants