[FLINK-11715][table-planner-blink] Add optimize program to organize optimization phases #7834
Conversation
Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community track review progress. Please see the Pull Request Review Guide for a full explanation of the review process.

The @flinkbot bot supports the following commands:
- @flinkbot approve all
- @flinkbot disapprove all
 *
 * @tparam OC OptimizeContext
 */
class FlinkChainedPrograms[OC <: OptimizeContext] extends Logging {
Should this class also extend FlinkOptimizeProgram?
That's a good idea!
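The composite idea discussed above could look roughly like this. This is a minimal sketch in Java with hypothetical stand-in types; the actual PR code is Scala, and the real programs operate on Calcite RelNode plans rather than strings:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the real OptimizeContext trait.
interface OptimizeContext {}

// One optimization phase; a String stands in for the RelNode plan here.
interface FlinkOptimizeProgram<OC extends OptimizeContext> {
    String optimize(String plan, OC context);
}

// The chain itself is also a FlinkOptimizeProgram, so a chain of programs
// can be nested inside another chain (composite pattern).
class FlinkChainedPrograms<OC extends OptimizeContext>
        implements FlinkOptimizeProgram<OC> {
    private final List<FlinkOptimizeProgram<OC>> programs = new ArrayList<>();

    void addLast(FlinkOptimizeProgram<OC> program) {
        programs.add(program);
    }

    @Override
    public String optimize(String plan, OC context) {
        // Apply every registered program in order; each sees the previous output.
        for (FlinkOptimizeProgram<OC> program : programs) {
            plan = program.optimize(plan, context);
        }
        return plan;
    }
}

public class ChainedProgramsSketch {
    public static void main(String[] args) {
        FlinkChainedPrograms<OptimizeContext> chain = new FlinkChainedPrograms<>();
        chain.addLast((plan, ctx) -> plan + " -> ruleA");
        chain.addLast((plan, ctx) -> plan + " -> ruleB");
        System.out.println(chain.optimize("scan", new OptimizeContext() {}));
        // prints "scan -> ruleA -> ruleB"
    }
}
```

Because the chain implements the same interface as its members, callers never need to distinguish a single program from a whole pipeline.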
/**
 * Gets the Calcite [[Context]] defined in [[org.apache.flink.table.api.TableEnvironment]].
 */
def getContext: Context
Could we try to avoid using Calcite's Context to pass the necessary pieces around? Calcite's Context is really obscure: it's hard to tell what exactly we can get from it. In this case, you might have intended to use it to pass TableConfig around, but nobody can discover that through this API.
TableConfig alone is not enough; FunctionCatalog is also needed by RexNodeExtractor, which is used in PushFilterIntoTableSourceScanRule and SelectivityEstimator in Blink.
Yes, what I meant is: why not just expose these two methods on OptimizeContext, e.g. getTableConfig and getFunctionCatalog? Through Context's API, nobody knows what they can get from it.
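That suggestion could be sketched as follows, with hypothetical Java stand-ins for TableConfig and FunctionCatalog (the real classes are Scala/Java classes in Flink with much richer APIs):

```java
// Hypothetical stand-in for Flink's TableConfig.
class TableConfig {
    final boolean nullCheckEnabled;
    TableConfig(boolean nullCheckEnabled) { this.nullCheckEnabled = nullCheckEnabled; }
}

// Hypothetical stand-in for Flink's FunctionCatalog.
class FunctionCatalog {
    final java.util.Set<String> functionNames;
    FunctionCatalog(java.util.Set<String> functionNames) { this.functionNames = functionNames; }
}

// The interface states exactly what a rule can obtain, unlike an
// opaque Calcite Context.
interface OptimizeContext {
    TableConfig getTableConfig();
    FunctionCatalog getFunctionCatalog();
}

public class ExplicitContextSketch {
    public static void main(String[] args) {
        TableConfig config = new TableConfig(true);
        FunctionCatalog catalog = new FunctionCatalog(java.util.Set.of("concat"));
        OptimizeContext context = new OptimizeContext() {
            public TableConfig getTableConfig() { return config; }
            public FunctionCatalog getFunctionCatalog() { return catalog; }
        };
        System.out.println(context.getTableConfig().nullCheckEnabled);
        // prints "true"
    }
}
```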
Most programs do not need to know the content of the Context. The Context instance is currently only used as a whole to create a HepPlanner: new HepPlanner(hepProgram, context.getContext). Calcite's Context is built in TableEnvironment now, and VolcanoPlanner and HepPlanner are built with this context instance. However, they are built in different places: VolcanoPlanner is built in FlinkRelBuilder, while HepPlanner is built in FlinkHepProgram. What I'm worried about is that someone may forget to modify FlinkHepProgram when they want to extend the Context in TableEnvironment.
If you don't depend on Context to pass things like TableConfig around, is it still necessary to build a Context in TableEnvironment? I noticed HepPlanner has a constructor without a Context.
An empty Context will be created in AbstractRelOptPlanner if a HepPlanner is built without a Context. In that case, the rules executed in the HepPlanner cannot get the TableConfig or FunctionCatalog instances set by the user.
Got your point. What do you think about just letting OptimizeContext inherit from Calcite's Context and exposing some meaningful APIs on OptimizeContext? Then each rule that wants to use TableConfig can simply get the Context and cast it to OptimizeContext.

Another comment on BatchOptimizeContext and StreamOptimizeContext: can we also try to unify these two? I know there are some differences between them in Blink's code, but could you consider the possibility? We may unify the table environments for both batch and streaming.
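A minimal sketch of that compromise, in Java with a hypothetical stand-in for Calcite's Context interface (the real org.apache.calcite.plan.Context exposes an unwrap(Class) method; TableConfig here is also a stand-in):

```java
// Minimal stand-in for org.apache.calcite.plan.Context.
interface Context {
    <T> T unwrap(Class<T> clazz);
}

// Hypothetical stand-in for Flink's TableConfig.
class TableConfig {
    final String timeZone;
    TableConfig(String timeZone) { this.timeZone = timeZone; }
}

// OptimizeContext IS a Context, so planners can still be constructed
// with it, while its own API names what rules can actually obtain.
interface OptimizeContext extends Context {
    TableConfig getTableConfig();
}

class SimpleOptimizeContext implements OptimizeContext {
    private final TableConfig config;
    SimpleOptimizeContext(TableConfig config) { this.config = config; }
    public TableConfig getTableConfig() { return config; }
    public <T> T unwrap(Class<T> clazz) {
        return clazz.isInstance(this) ? clazz.cast(this) : null;
    }
}

public class ContextCastSketch {
    // Inside a rule body, the planner-provided Context is cast back
    // to OptimizeContext to reach the config.
    static TableConfig configFrom(Context plannerContext) {
        return ((OptimizeContext) plannerContext).getTableConfig();
    }

    public static void main(String[] args) {
        Context context = new SimpleOptimizeContext(new TableConfig("UTC"));
        System.out.println(configFrom(context).timeZone);
        // prints "UTC"
    }
}
```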
Yes, it's clearer than before if OptimizeContext extends Calcite's Context and exposes meaningful APIs.

BatchOptimizeContext and StreamOptimizeContext can be unified after the table environments have been unified. We can do this later.
Looks good to me overall, but I have some concerns about Calcite's Context.
Merging this...
…ptimization phases. This closes apache#7834
What is the purpose of the change
This commit adds optimize program to organize optimization phases.
Currently, Flink organizes the optimization phases via different methods in Batch(Stream)TableEnvironment#optimize. However, this is not easy to extend, especially since there are more than ten optimization stages in Blink.
So in Blink, each optimization stage is abstracted into a FlinkOptimizeProgram, and FlinkChainedPrograms is responsible for organizing all the programs.
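The described design can be sketched as follows. This is illustrative Java: the stage names and the String plan are hypothetical stand-ins, while the real programs are Scala classes operating on Calcite RelNode trees:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

public class OptimizePhasesSketch {
    public static void main(String[] args) {
        // Each entry is one optimization stage registered under a name;
        // insertion order defines execution order.
        Map<String, UnaryOperator<String>> programs = new LinkedHashMap<>();
        programs.put("subquery_rewrite", plan -> plan + " | subquery_rewrite");
        programs.put("logical", plan -> plan + " | logical");
        programs.put("physical", plan -> plan + " | physical");

        // Run the whole chain: each stage transforms the previous output,
        // replacing one long hand-written optimize() method.
        String plan = "scan";
        for (UnaryOperator<String> program : programs.values()) {
            plan = program.apply(plan);
        }
        System.out.println(plan);
        // prints "scan | subquery_rewrite | logical | physical"
    }
}
```

Adding a new optimization stage then means registering one more program rather than editing a monolithic optimize method.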
Brief change log
Verifying this change
This change added tests and can be verified as follows:
Does this pull request potentially affect one of the following parts:
- @Public(Evolving): (no)

Documentation