[FLINK-5859] [table] Add PartitionableTableSource for partition pruning#4667
godfreyhe wants to merge 2 commits into apache:master
/**
 * @param relBuilder Builder for relational expressions.
 */
def setRelBuilder(relBuilder: RelBuilder): Unit
Can you move this method to PartitionableTableSource?
The setRelBuilder method is called in PushFilterIntoTableSourceScanRule. If we move setRelBuilder to PartitionableTableSource, PushFilterIntoTableSourceScanRule would have to know about both FilterableTableSource and PartitionableTableSource.
/**
 * The base class of partition
 */
trait Partition {
Can you provide a more detailed description of what a "Partition" is and how a PartitionableTableSource does partition pruning? As written, users cannot get a precise intuition for what a partition field is or what "origin value" means.
I will add a more detailed description of them.
The origin value is the entire partition value held by the Partition instance. A partition value may be simple, for example when the data is split by year (year=2015, year=2016), or complex, for example when the data is split by year and month (year=2015,month=01; year=2015,month=02; year=2016,month=01; year=2016,month=02).
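To make the distinction between an origin value and its partition fields concrete, here is a minimal sketch. `SimplePartition`, `fields`, and `fieldValue` are hypothetical names chosen for illustration, and the `key=value,key=value` layout is assumed from the examples above; this is not the `Partition` trait from the pull request.

```scala
// Hypothetical model for illustration only; not the Partition trait in this PR.
// Assumes the origin value has the form "key=value,key=value".
final case class SimplePartition(originValue: String) {

  // Split the origin value into its partition fields,
  // e.g. "year=2015,month=01" becomes Map("year" -> "2015", "month" -> "01").
  // Segments without '=' are ignored.
  val fields: Map[String, String] =
    originValue
      .split(",")
      .iterator
      .map(_.trim)
      .flatMap { segment =>
        segment.split("=", 2) match {
          case Array(key, value) => Some(key -> value)
          case _                 => None
        }
      }
      .toMap

  // Look up the value of a single partition field, if present.
  def fieldValue(name: String): Option[String] = fields.get(name)
}
```

With this model, a simple partition such as `year=2015` has one field, while `year=2015,month=01` has two, and the origin value is the full string in both cases.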
 * list. Don't try to reorganize the predicates if you are absolutely confident with that.
 *
 * @param partitionPruned Whether partition pruning is applied.
 * @param prunedPartitions Remaining partitions after partition pruning applied.
The definition of "prunedPartitions" looks contradictory here: the name suggests the partitions that were pruned, but the doc describes the remaining ones. I think we should stick to one definition: either "prunedPartitions" represents all partitions that have been pruned, or all remaining partitions that survive pruning.
prunedPartitions => remainingPartitions
 * organized in CNF conjunctive form, and we should only take or leave each element from the
 * list. Don't try to reorganize the predicates if you are absolutely confident with that.
 *
 * @param partitionPruned Whether partition pruning is applied.
We should make this flag clearer. If it represents whether partition pruning was applied, I would say it should always be true, because by the time this method is called, the framework has at least tried to apply partition pruning.
partitionPruned will be false when the filter does not contain partition conditions; otherwise it will be true. partitionPruned will be renamed to isPartitionPrunedApplied.
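The flag semantics described in this reply can be sketched as follows. This is a simplified illustration under stated assumptions, not the code in this pull request: `Predicate`, `PruneResult`, and `prune` are hypothetical names, predicates are restricted to equality checks on string fields, and the predicate list is treated as a conjunction in which each element is either taken or left as a whole.

```scala
// Hypothetical types for illustration only; not the API added by this PR.
// An equality predicate of the form `field = value`.
final case class Predicate(field: String, value: String)

final case class PruneResult(
    isPartitionPruned: Boolean,                    // false if no predicate touched a partition field
    remainingPartitions: Seq[Map[String, String]], // partitions that survive pruning
    remainingPredicates: Seq[Predicate])           // predicates still to be evaluated on rows

def prune(
    partitionFields: Set[String],
    allPartitions: Seq[Map[String, String]],
    predicates: Seq[Predicate]): PruneResult = {

  // Take or leave each conjunct as a whole: a conjunct is a partition
  // predicate if it refers to a partition field.
  val (partitionPredicates, otherPredicates) =
    predicates.partition(p => partitionFields.contains(p.field))

  if (partitionPredicates.isEmpty) {
    // No partition conditions in the filter: nothing is pruned.
    PruneResult(isPartitionPruned = false, allPartitions, predicates)
  } else {
    // Keep only partitions that satisfy every partition predicate.
    val remaining = allPartitions.filter { partition =>
      partitionPredicates.forall(p => partition.get(p.field).contains(p.value))
    }
    PruneResult(isPartitionPruned = true, remaining, otherPredicates)
  }
}
```

In this sketch, a filter such as `year = 2015 AND name = a` over partitions keyed by `year` and `month` sets the flag to true, keeps only the 2015 partitions, and leaves `name = a` to be evaluated on the rows. A filter with only `name = a` leaves the flag false and all partitions in place.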
KurtYoung left a comment
Hi @godfreyhe, thanks for your contribution. I left some comments.
2fc5f9d to d993985
What is the purpose of the change
This pull request adds PartitionableTableSource, which enables partition pruning when optimizing the query plan. This can reduce both query optimization time and execution time, especially for large partitioned tables.
Brief change log
Verifying this change
This change added tests and can be verified as follows:
Does this pull request potentially affect one of the following parts:
@Public(Evolving): (no)
Documentation