
Conversation

swuferhong (Contributor)

What is the purpose of the change

This PR aims to introduce a new dynamic partition pruning strategy named FlinkDynamicPartitionPruningConverterProgram to replace the old DPP rule DynamicPartitionPruningRule and to support more DPP patterns. Currently, the old DPP rule is tightly coupled with the join reorder rules, which affects the result of join reorder. At the same time, the old DPP rule does not support patterns such as a union node or an agg node on the fact side, a dim side that is a sub-query, etc. This PR fixes both problems together.
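
For illustration only, a minimal sketch of the kind of query shape this refers to, where the dim side of the join is a sub-query. The table names and columns (fact_part, dim, day, price, etc.) are hypothetical and assumed to be already registered in the catalog; this is not code from the PR.

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class DppPatternExample {
    public static void main(String[] args) {
        // Dynamic partition pruning is a batch optimization, so use batch mode.
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inBatchMode());

        // The partitioned fact table 'fact_part' is joined on its partition
        // column 'day' with a dim side that is itself a sub-query, one of the
        // patterns the old DynamicPartitionPruningRule could not handle.
        tEnv.executeSql(
                "SELECT f.id, f.amount "
                        + "FROM fact_part f "
                        + "JOIN (SELECT `day` FROM dim WHERE price > 100) d "
                        + "ON f.`day` = d.`day`");
    }
}
```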

Brief change log

  • Introduce a new dynamic partition pruning strategy named FlinkDynamicPartitionPruningConverterProgram to replace the old DPP rule DynamicPartitionPruningRule.
  • Add more unit tests in FlinkDynamicPartitionPruningConverterProgramTest.

Verifying this change

  • Add more unit tests in FlinkDynamicPartitionPruningConverterProgramTest.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? no
  • If yes, how is the feature documented? no docs

@flinkbot (Collaborator)

flinkbot commented Dec 12, 2022

CI report:

Bot commands

The @flinkbot bot supports the following commands:
  • @flinkbot run azure: re-run the last Azure build

@godfreyhe (Contributor) left a comment

Thanks for the contribution @swuferhong, I left some comments.

@godfreyhe (Contributor) left a comment

Thanks for the update, LGTM

if (tables.size() == 0) {
    tables.add(catalogTable);
} else {
    for (ContextResolvedTable thisTable : new ArrayList<>(tables)) {
Contributor

nit: why do we need a new ArrayList instance here?

Contributor Author

nit: why do we need a new ArrayList instance here?

The list variable tables is added to inside the for loop. If we don't iterate over a new ArrayList copy, a ConcurrentModificationException will be thrown.
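
For reference, a minimal standalone sketch (not the PR code; the names are illustrative) of why iterating over a defensive copy matters when the loop body adds to the same list:

```java
import java.util.ArrayList;
import java.util.List;

public class CopyBeforeIterateDemo {
    public static void main(String[] args) {
        List<String> tables = new ArrayList<>(List.of("t1", "t2"));

        // Iterating over a copy: additions to 'tables' inside the loop are safe,
        // because the iterator belongs to the snapshot, not the mutated list.
        for (String table : new ArrayList<>(tables)) {
            tables.add(table + "_copy");
        }
        System.out.println(tables); // [t1, t2, t1_copy, t2_copy]

        // Iterating over the list itself while adding to it fails fast:
        // the iterator detects the structural modification on its next step
        // and throws ConcurrentModificationException.
        try {
            for (String table : tables) {
                tables.add(table + "_again");
            }
        } catch (java.util.ConcurrentModificationException e) {
            System.out.println("ConcurrentModificationException as expected");
        }
    }
}
```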

@godfreyhe (Contributor)

It would be better to add this optimization to the flink-tpcds-test module; we can do it in another PR.

@swuferhong (Contributor Author)

It would be better to add this optimization to the flink-tpcds-test module; we can do it in another PR.

OK. I will create a JIRA issue to track this work.

godfreyhe closed this in d9f9b55 on Jan 9, 2023
chucheng92 pushed a commit to chucheng92/flink that referenced this pull request Feb 3, 2023
akkinenivijay pushed a commit to krisnaru/flink that referenced this pull request Feb 11, 2023