[FLINK-37763][table] Support multiple table arguments in PTFs#26600
twalthr merged 2 commits into apache:master
Conversation
public static final ConfigOption<Integer> TABLE_OPTIMIZER_PTF_MAX_TABLES =
        key("table.optimizer.ptf.max-tables")
                .intType()
                .defaultValue(20)
Out of curiosity: do we know whether any other vendor offers a PTF feature, and what limit they use?
Not many vendors support multi-input PTFs. Maybe even just Oracle. In any case, the limit has been discussed with @pnowojski and should be a reasonable default that our engine might still be able to handle. All we wanted to avoid is immediately supporting Integer.MAX_VALUE.
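To make the limit discussion concrete, a multi-table PTF call together with an override of the new option could look roughly as follows. This is a sketch: the function `f` and the `Emp` table come from the examples in this PR, while `Dept` and the argument name `T2` are hypothetical placeholders.

```sql
-- Raise the maximum number of table arguments per PTF (default: 20).
SET 'table.optimizer.ptf.max-tables' = '30';

-- Hypothetical call of a PTF with two copartitioned set-semantics tables.
SELECT *
FROM f(
  T1 => TABLE Emp PARTITION BY c,
  T2 => TABLE Dept PARTITION BY c
);
```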
snuyanzin left a comment
Thanks for moving this forward.
One minor thing: it would also make sense to update the Chinese version of the PTF docs.
Thank you @snuyanzin. I will add the Chinese docs now; I just wanted to wait until the feedback had settled.
    evalCollector = new PassPartitionKeysCollector(output, changelogMode, tableSemantics);
}
onTimerCollector = new PassAllCollector(output, changelogMode);
// Collect with partition keys for each table
nit: is this comment correct? It seems to be the same as for PassPartitionKeysCollector, which collects with the partition key.
In general, the comment is correct. The result is described here; we just use a different collector to achieve the same outcome. I will adjust the comment.
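The two collectors discussed above differ only in whether partition key fields are prepended to each emitted row. A minimal, self-contained sketch of that distinction (plain Java, not the actual Flink classes) might look like:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical simplification: the real PassAllCollector and
// PassPartitionKeysCollector operate on Flink's internal RowData and
// write to an operator output; here, rows are simply lists of objects.
public class CollectorSketch {

    interface RowCollector {
        List<Object> collect(List<Object> row);
    }

    // Emits the row unchanged (analogous to PassAllCollector).
    static RowCollector passAll() {
        return row -> row;
    }

    // Prepends the given partition key values to every emitted row
    // (analogous to PassPartitionKeysCollector).
    static RowCollector passPartitionKeys(List<Object> partitionKeys) {
        return row -> {
            List<Object> out = new ArrayList<>(partitionKeys);
            out.addAll(row);
            return out;
        };
    }

    public static void main(String[] args) {
        List<Object> row = Arrays.asList("payload", 42);
        System.out.println(passAll().collect(row));
        System.out.println(passPartitionKeys(Arrays.asList("keyA")).collect(row));
    }
}
```

Both collectors produce the same logical result described in the docs; only the point at which the partition keys enter the output differs.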
What is the purpose of the change
This adds support for multiple table arguments in PTFs. It supports table arguments with set semantics only, as mentioned in the FLIP. Because `MultipleInputStreamOperator` already supports n-ary inputs, this PR can add support for more than two inputs in the first version.

Until Calcite fully supports the `COPARTITION` clause, including the tricky scope resolution of `SELECT E.* FROM f(T1 => TABLE(Emp) AS E, ...)`, we assume that all given tables must be copartitioned. This should be a reasonable initial assumption, as cross products are undesirable in a distributed system. Once `COPARTITION` is supported, the SQL calling syntax should change from `f(T1 => TABLE Emp PARTITION BY c, ...)` to `f(T1 => TABLE(Emp) AS E PARTITION BY c, ..., COPARTITION(E, ...))`. Thus, we are currently not standard compliant in this regard, but with forward compatibility in mind.

Brief change log
Splits `ProcessTableOperator` into a single-input, non-keyed one and a multi-input, keyed one.

Verifying this change
This change added tests and can be verified as follows:
and various unit tests in `ProcessTableFunctionTest`.

Does this pull request potentially affect one of the following parts:
`@Public(Evolving)`: yes

Documentation