[FLINK-39392][table] Support conditional traits for PTFs by gustavodemorais · Pull Request #27886 · apache/flink

gustavodemorais · 2026-04-02T11:36:23Z

What is the purpose of the change

Adds conditional traits for PTF table arguments, enabling traits that vary based on the SQL call context (e.g., whether PARTITION BY is present). Applied to TO_CHANGELOG: row semantics by default, set semantics when PARTITION BY is provided.

Brief change log

New StaticArgument.withConditionalTrait(trait, condition) and applyConditionalTraits(ctx) APIs
New TraitCondition interface with hasPartitionBy(), argIsEqualTo(), not()
New TraitContext interface for read-only trait resolution context
TO_CHANGELOG uses .withConditionalTrait(SET_SEMANTIC_TABLE, hasPartitionBy())
Planner resolves traits early via applyConditionalTraits in type inference, UID derivation, distribution, and runtime semantics
Renamed isPtfUpsert to ptfRequiresUpdateBefore
Distribution rule uses full trait context
Updated docs with optional [PARTITION BY] syntax
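
To make the change log concrete, here is a minimal sketch of how a conditional trait could be declared and then resolved for one call. The types below (`Trait`, `TraitContext`, `TraitCondition`, `Arg`) are simplified stand-ins written for this illustration and are not Flink's classes. In particular, `applyConditionalTraits` here returns the resolved trait set, whereas the PR's method operates on `StaticArgument`.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;

/** Simplified stand-ins for illustration only; these are not Flink's classes. */
final class ConditionalTraitSketch {

    enum Trait { ROW_SEMANTIC_TABLE, SET_SEMANTIC_TABLE }

    /** Hypothetical read-only view of the call site, reduced to one property. */
    interface TraitContext {
        boolean hasPartitionBy();
    }

    interface TraitCondition {
        boolean test(TraitContext ctx);

        static TraitCondition hasPartitionBy() {
            return TraitContext::hasPartitionBy;
        }
    }

    static final class Arg {
        private final EnumSet<Trait> traits;
        private final List<Trait> conditionalTraits = new ArrayList<>();
        private final List<TraitCondition> conditions = new ArrayList<>();

        Arg(EnumSet<Trait> traits) {
            this.traits = EnumSet.copyOf(traits);
        }

        /** Returns a new Arg; the receiver is left unchanged. */
        Arg withConditionalTrait(Trait trait, TraitCondition condition) {
            Arg copy = new Arg(traits);
            copy.conditionalTraits.addAll(conditionalTraits);
            copy.conditions.addAll(conditions);
            copy.conditionalTraits.add(trait);
            copy.conditions.add(condition);
            return copy;
        }

        /** Resolves the effective traits for one concrete call. */
        EnumSet<Trait> applyConditionalTraits(TraitContext ctx) {
            EnumSet<Trait> resolved = EnumSet.copyOf(traits);
            for (int i = 0; i < conditions.size(); i++) {
                if (conditions.get(i).test(ctx)) {
                    // ROW and SET semantics are mutually exclusive: drop both, then add one.
                    resolved.remove(Trait.ROW_SEMANTIC_TABLE);
                    resolved.remove(Trait.SET_SEMANTIC_TABLE);
                    resolved.add(conditionalTraits.get(i));
                }
            }
            return resolved;
        }
    }

    public static void main(String[] args) {
        Arg input = new Arg(EnumSet.of(Trait.ROW_SEMANTIC_TABLE))
                .withConditionalTrait(Trait.SET_SEMANTIC_TABLE, TraitCondition.hasPartitionBy());
        System.out.println(input.applyConditionalTraits(() -> true));  // call with PARTITION BY
        System.out.println(input.applyConditionalTraits(() -> false)); // call without PARTITION BY
    }
}
```

The declaration stays row-semantic by default and only switches to set semantics for calls whose context reports a PARTITION BY clause, which mirrors the TO_CHANGELOG behavior described above.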

Verifying this change

Plan test testSetSemanticsWithPartitionBy verifies set semantics produces Exchange(hash)
Existing row semantics tests verify no regression
ProcessTableFunctionSemanticTests verify no impact on other PTFs

Does this pull request potentially affect one of the following parts

Dependencies (does it add or upgrade a dependency): (no)
The public API, i.e., is any changed class annotated with @Public(Evolving): (no)
The serializers: (no)
The runtime per-record code paths (performance sensitive): (no)
Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (no)
The S3 file system connector: (no)

Documentation

Does this pull request introduce a new feature? (yes)
If yes, how is the feature documented? (docs / JavaDocs)

flinkbot · 2026-04-02T11:45:34Z

CI report:

d054743 Azure: SUCCESS

Bot commands

The @flinkbot bot supports the following commands:

@flinkbot run azure re-run the last Azure build

twalthr

Thank you for this PR @gustavodemorais. Overall I'm +1 for this change. However, we need to clearly define the boundaries: when static arguments are fully resolved and a trait condition no longer has any effect. Some locations currently look very hacky; we should take another look. We also need Table API support, which is not covered by this PR, at least not in tests.

twalthr · 2026-04-09T12:49:35Z

+    }
+
+    /** True when the named boolean argument is provided and its value is {@code true}. */
+    static TraitCondition argIsTrue(final String name) {


Generalize the is-true and is-false conditions to:

static <T> TraitCondition argIsEqualTo(T obj) {ctx.getScalarArgument(name, obj.getClass) == obj}
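
The one-line suggestion above is shorthand. A runnable sketch of the same idea follows, assuming a hypothetical `TraitContext.getScalarArgument(name, clazz)` that returns an `Optional` (the real signature in the PR may differ). It compares with `equals` rather than `==`, so boxed values and strings are compared by value.

```java
import java.util.Map;
import java.util.Optional;

/** Illustration only; the interfaces here are hypothetical stand-ins. */
final class ArgIsEqualToSketch {

    interface TraitContext {
        <T> Optional<T> getScalarArgument(String name, Class<T> clazz);
    }

    interface TraitCondition {
        boolean test(TraitContext ctx);
    }

    /** True when the named scalar argument is present and equals the expected value. */
    @SuppressWarnings("unchecked")
    static <T> TraitCondition argIsEqualTo(String name, T expected) {
        Class<T> clazz = (Class<T>) expected.getClass();
        return ctx -> ctx.getScalarArgument(name, clazz).map(expected::equals).orElse(false);
    }

    /** Test helper: a context backed by a map of literal values. */
    static TraitContext contextOf(Map<String, Object> literals) {
        return new TraitContext() {
            @Override
            public <T> Optional<T> getScalarArgument(String name, Class<T> clazz) {
                Object value = literals.get(name);
                return clazz.isInstance(value) ? Optional.of(clazz.cast(value)) : Optional.empty();
            }
        };
    }

    public static void main(String[] args) {
        TraitContext ctx = contextOf(Map.of("flag", true, "mode", "upsert"));
        System.out.println(argIsEqualTo("flag", true).test(ctx));
        System.out.println(argIsEqualTo("mode", "retract").test(ctx));
        System.out.println(argIsEqualTo("missing", 1).test(ctx));
    }
}
```

With this shape, `argIsTrue(name)` and `argIsFalse(name)` become `argIsEqualTo(name, true)` and `argIsEqualTo(name, false)`.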

twalthr · 2026-04-09T12:58:27Z

        }

        final int timeColumn = inputTimeColumns.get(tableArgCall.getInputIndex());
+        final org.apache.flink.table.types.inference.TraitContext traitCtx =


pay attention to full imports, seems Claude loves to do this

Suggested change

final org.apache.flink.table.types.inference.TraitContext traitCtx =

final TraitContext traitCtx =

same comment as above. resolve the static arg as early as possible to not reconstruct TraitContext multiple times

twalthr · 2026-04-09T13:02:32Z

+                if (operand.getKind() == SqlKind.DEFAULT || !(operand instanceof RexLiteral)) {
+                    return Optional.empty();
+                }
+                return Optional.ofNullable(((RexLiteral) operand).getValueAs(clazz));


this is too simple, it should follow the same rules as CallContext does. Otherwise it won't be possible e.g. to get Instant.class or other literals.

That's right, I've extended it to support NULL, DEFAULT, DESCRIPTOR, MAP and literals using a similar logic as we have in CallContext.

We could theoretically create an OperatorBindingCallContext here to avoid code duplication and delegate but it's heavy to have a whole OperatorBindingCallContext only for this and creates a tight coupling between StreamPhysicalProcessTableFunction

twalthr · 2026-04-09T13:03:45Z

+        final boolean hasPartitionBy = partitionKeys.length > 0;
+        final boolean reportedAsSet = tableCharacteristic.semantics == Semantics.SET;
+        final boolean setIsConditional =
+                staticArg.hasConditionalTrait(StaticArgumentTrait.SET_SEMANTIC_TABLE);


too fragile. determine the effective StaticArgument first and then execute this logic.

We're now using StreamPhysicalProcessTableFunction.buildTraitContext and resolving the traits here before using it.

raminqaf

Did a first pass and left some comments for improvements

raminqaf · 2026-04-21T05:40:11Z

+     */
+    public StaticArgument addTraitWhen(
+            final TraitCondition condition, final StaticArgumentTrait trait) {
+        final List<ConditionalTrait> newList = new ArrayList<>(this.conditionalTraits);


Any reason we copy the list every time we add new elements to it?

It's not strictly necessary in this case, but I follow the immutable builder pattern by default. It's a fluent builder that returns a new instance at each step, just like how StaticArgument.table() returns a new instance. It's the same pattern as EnumSet.of() or List.of(): each call produces a distinct value.
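
A small sketch of why the copy matters for this pattern: two arguments derived from the same base never observe each other's additions. The `Arg` class and its string-based traits are hypothetical simplifications, not the PR's `StaticArgument`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Illustration only: copy-on-add keeps every instance immutable. */
final class ImmutableFluentSketch {

    static final class Arg {
        private final List<String> conditionalTraits;

        Arg(List<String> conditionalTraits) {
            // Defensive copy plus unmodifiable view: nobody can mutate this instance later.
            this.conditionalTraits =
                    Collections.unmodifiableList(new ArrayList<>(conditionalTraits));
        }

        Arg addTraitWhen(String condition, String trait) {
            List<String> newList = new ArrayList<>(this.conditionalTraits);
            newList.add(trait + " when " + condition);
            return new Arg(newList);
        }

        List<String> conditionalTraits() {
            return conditionalTraits;
        }
    }

    public static void main(String[] args) {
        Arg base = new Arg(Collections.emptyList());
        Arg a = base.addTraitWhen("hasPartitionBy", "SET_SEMANTIC_TABLE");
        Arg b = base.addTraitWhen("someOtherCondition", "SET_SEMANTIC_TABLE");
        System.out.println(base.conditionalTraits().size());
        System.out.println(a.conditionalTraits());
        System.out.println(b.conditionalTraits());
    }
}
```

If `addTraitWhen` appended to a shared list instead, `base`, `a`, and `b` would all end up with both entries.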

raminqaf · 2026-04-21T05:44:04Z

+            @Nullable Class<?> conversionClass,
+            boolean isOptional,
+            EnumSet<StaticArgumentTrait> traits,
+            List<ConditionalTrait> conditionalTraits) {


Not sure if a list is the most suitable data structure for the traits here. Maybe representing them as a HashMap would make more sense. With a list, the user could still define two conditional traits for the same StaticArgumentTrait:
[StaticArgumentTrait.SET_SEMANTIC_TABLE, hasPartitionBy, StaticArgumentTrait.SET_SEMANTIC_TABLE, hasSomeCondition]

I've given this some thought and decided we could go with OR semantics for multiple conditional traits for the same trait. This lets the user write multiple simple conditions, so a list makes sense. The trait is activated if any of its conditions is met. I've documented the behavior.
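
The OR semantics described in this reply can be sketched as follows. `Ctx`, `ConditionalTrait`, and `isActive` are hypothetical names for this illustration, and traits are plain strings to keep the example short.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/** Illustration only: a trait is active if ANY of its registered conditions holds. */
final class OrSemanticsSketch {

    /** Hypothetical call-site facts used by the conditions. */
    static final class Ctx {
        final boolean hasPartitionBy;
        final boolean someFlag;

        Ctx(boolean hasPartitionBy, boolean someFlag) {
            this.hasPartitionBy = hasPartitionBy;
            this.someFlag = someFlag;
        }
    }

    static final class ConditionalTrait {
        final String trait;
        final Predicate<Ctx> condition;

        ConditionalTrait(String trait, Predicate<Ctx> condition) {
            this.trait = trait;
            this.condition = condition;
        }
    }

    /** Two entries for the same trait, as in the list example from the review comment. */
    static List<ConditionalTrait> entries() {
        return Arrays.asList(
                new ConditionalTrait("SET_SEMANTIC_TABLE", c -> c.hasPartitionBy),
                new ConditionalTrait("SET_SEMANTIC_TABLE", c -> c.someFlag));
    }

    static boolean isActive(String trait, List<ConditionalTrait> entries, Ctx ctx) {
        return entries.stream()
                .filter(e -> e.trait.equals(trait))
                .anyMatch(e -> e.condition.test(ctx));
    }

    public static void main(String[] args) {
        System.out.println(isActive("SET_SEMANTIC_TABLE", entries(), new Ctx(false, true)));
        System.out.println(isActive("SET_SEMANTIC_TABLE", entries(), new Ctx(false, false)));
    }
}
```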

raminqaf · 2026-04-21T05:49:55Z

+ */
+@PublicEvolving
+@FunctionalInterface
+public interface TraitCondition extends Serializable {


How about we extend Java's Predicate?

Suggested change

public interface TraitCondition extends Serializable {

public interface TraitCondition extends Serializable, Predicate<TraitContext> {

If we extend Predicate to inherit and(), or(), negate() those return Predicate not TraitCondition, so composition breaks

Also, we can't simply use Predicate because we need it to be serializable, since these are stored. I think we'll have to go with the TraitCondition as it is
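
The composition problem can be reproduced with plain JDK types. In the sketch below, `TraitCondition` and `TraitContext` are reduced stand-ins for the interfaces under review. The default methods inherited from `Predicate` are typed as `Predicate<TraitContext>`, so their results cannot be assigned to a `TraitCondition` without re-wrapping.

```java
import java.util.function.Predicate;

/** Illustration only: what happens if TraitCondition extends Predicate. */
final class PredicateCompositionSketch {

    interface TraitContext {
        boolean hasPartitionBy();
    }

    /** The variant proposed in the review: inherit and(), or(), negate(). */
    interface TraitCondition extends Predicate<TraitContext> {}

    static boolean negatedIsTraitCondition() {
        TraitCondition hasPartitionBy = TraitContext::hasPartitionBy;
        // negate() is declared on Predicate, so its static type is Predicate<TraitContext>.
        Predicate<TraitContext> negated = hasPartitionBy.negate();
        // TraitCondition broken = hasPartitionBy.negate(); // does not compile
        return negated instanceof TraitCondition;
    }

    public static void main(String[] args) {
        System.out.println(negatedIsTraitCondition()); // prints "false"
    }
}
```

A dedicated `not()` factory on `TraitCondition`, as listed in the change log, keeps the result in the `TraitCondition` type.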

gustavodemorais · 2026-04-21T11:17:27Z

Thanks for the reviews, @raminqaf and @twalthr. I've addressed all the comments and tried to keep the change scoped to only adding the conditional traits concept. I've also only resolved traits where it's necessary for now; we can do the same in other places as we add new features, instead of resolving everywhere we read the traits. Take a look.

gustavodemorais · 2026-04-21T11:43:42Z

One note @twalthr, I've had the feeling multiple times that "isPtfUpsert" wasn't the best name for the function so I did a small refactoring and renamed it to "requiresUpdateBefore" adjusting the logic in the appropriate places. That helps with understanding that it's related to the input of the ptf and says exactly what we're checking. I think the code is easier to read now. Take a look and let me know what you think c0e2242

twalthr

Thank you @gustavodemorais. I added some more comments to harden the PTF framework for this.

twalthr · 2026-04-21T12:12:33Z

-| `op`         | No       | A `DESCRIPTOR` with a single column name for the operation code column. Defaults to `op`. |
+| Parameter    | Required | Description                                                                                                                                                                                                                                                                                                                                              |
+|:-------------|:---------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| `input`      | Yes      | The input table. With `PARTITION BY`, rows with the same key are co-located and run in the same operator instance. Without `PARTITION BY`, each row is processed independently. Accepts insert-only, retract, and upsert tables. For upsert tables, providing `PARTITION BY` is recommended for better performance.                                      |


Suggested change

| `input` | Yes | The input table. With `PARTITION BY`, rows with the same key are co-located and run in the same operator instance. Without `PARTITION BY`, each row is processed independently. Accepts insert-only, retract, and upsert tables. For upsert tables, providing `PARTITION BY` is recommended for better performance. |

| `input` | Yes | The input table. With `PARTITION BY`, rows with the same key are co-located and run in the same operator instance. Without `PARTITION BY`, each row is processed independently. Accepts insert-only, retract, and upsert tables. For upsert tables, a provided `PARTITION BY` must match the upsert key of the subquery. |

The key doesn't have to match exactly, it can be a subset. Updating with "For upsert tables, the provided PARTITION BY key should match or be a subset of the upsert key of the subquery"

twalthr · 2026-04-21T12:19:46Z

    }
+
+    /** A trait that is conditionally added based on a {@link TraitCondition}. */
+    private static final class ConditionalTrait implements Serializable {


Suggested change

private static final class ConditionalTrait implements Serializable {

private static final class ConditionalTrait {

twalthr · 2026-04-21T12:21:32Z

+                for (int i = 0; i < staticArgs.size(); i++) {
+                    final StaticArgument arg = staticArgs.get(i);
+                    if (arg.is(StaticArgumentTrait.SCALAR) && arg.getName().equals(name)) {
+                        if (!callContext.isArgumentLiteral(i)) {


Just to double check: do we also need a null check here via callContext.isNullLiteral? Or is that covered by getArgumentValue?

getArgumentValue returns Optional.empty() for null args, and isArgumentLiteral returns false for null literals. So the null case is already handled

twalthr · 2026-04-21T12:22:58Z

+                                if (semantics == null) {
+                                    return Stream.<Field>empty();
+                                }


nit:

Suggested change

if (semantics == null) {

return Stream.<Field>empty();

}

twalthr · 2026-04-21T12:25:39Z

+ *
+ * StaticArgument.table("input", Row.class, false, EnumSet.of(TABLE, SUPPORT_UPDATES))
+ *         .withConditionalTrait(SET_SEMANTIC_TABLE, hasPartitionBy());
+ * }</pre>


Mention that hashCode/equals need to be implemented; otherwise StaticArgument.equals/hashCode won't work.

twalthr · 2026-04-21T12:26:16Z

+ */
+@PublicEvolving
+@FunctionalInterface
+public interface TraitCondition extends Serializable {


Suggested change

public interface TraitCondition extends Serializable {

public interface TraitCondition {

I thought we needed it - removed

twalthr · 2026-04-21T12:28:21Z

-            if (arg.is(StaticArgumentTrait.ROW_SEMANTIC_TABLE)) {
-                semantics = TableCharacteristic.Semantics.ROW;
-            } else if (arg.is(StaticArgumentTrait.SET_SEMANTIC_TABLE)) {
+            // Report SET if it may apply - which allows the use of Partition BY


Suggested change

// Report SET if it may apply - which allows the use of Partition BY

// Report SET semantics if it may apply - which allows the use of PARTITION BY

twalthr · 2026-04-21T12:31:26Z


        final int timeColumn = inputTimeColumns.get(tableArgCall.getInputIndex());
+        final StaticArgument resolvedArg =
+                tableArg.applyConditionalTraits(


Can we apply them earlier? Ideally, this topic is done as early as possible. org.apache.flink.table.planner.plan.rules.physical.stream.StreamPhysicalProcessTableFunctionRule#convert could be a good location.

Yes, that's a better idea. I've moved the resolution to StreamPhysicalProcessTableFunction's constructor instead of the rule. What do you think?

Take a look aa4f50c

I've moved the resolution to StreamPhysicalProcessTableFunction's constructor

Do we then need resolution in ChangelogModeInferenceProgram?

twalthr · 2026-04-21T12:38:59Z

+     * which require the full CallContext bridge.
+     */
+    @SuppressWarnings("unchecked")
+    private static <T> Optional<T> findScalarLiteral(


Reuse StreamPhysicalProcessTableFunction.toCallContext

Thanks for the pointer. I've created an overload that supports fetching the scalar literals but doesn't require the additional changelog information, which we neither use nor need here. Using it now:

public static CallContext toCallContext(RexCall udfCall) { return toCallContext(udfCall, null, null, null); }

raminqaf

Thanks for this PR and solving this issue! Left some comments and questions!

raminqaf · 2026-04-22T06:40:06Z

+     */
+    public StaticArgument withConditionalTrait(
+            final StaticArgumentTrait trait, final TraitCondition condition) {
+        if (trait == StaticArgumentTrait.SCALAR


Should we make these part of the ConditionalTrait class? Suggestion: an EnumSet called IllegalConditionalTraits and a method (isConditionalTrait) that checks it.

raminqaf · 2026-04-22T06:55:11Z

+
+    /** True when the named scalar argument equals the expected value. */
+    @SuppressWarnings("unchecked")
+    static <T> TraitCondition argIsEqualTo(final String name, final T expected) {


Should we ensure type safety by passing the Class<T> clazz to the method?

It's a valid question - I've only added this as a placeholder example for now. We can check whether type verification is necessary when we use it in the following PRs and adjust it accordingly.

Given that the expected value always comes from a reviewed implementation, I don't think the class is necessary here.

raminqaf · 2026-04-22T06:58:35Z

+ * <p>Conditions are evaluated at planning time using the {@link TraitContext} which provides access
+ * to the SQL call's properties (PARTITION BY presence, scalar literal values, etc.).
+ *
+ * <p>Implementations must implement {@code hashCode} and {@code equals} for {@link


The Javadoc requires implementations to provide hashCode and equals, but none of the implementations below (argIsEqualTo and not) does this.

raminqaf · 2026-04-22T07:01:45Z

+            @Nullable final TableSemantics semantics,
+            final CallContext callContext,
+            final List<StaticArgument> staticArgs) {
+        return new TraitContext() {


We have two implementations of TraitContext. One here and the other one in StreamPhysicalProcessTableFunction any reason for that?

If they need to be similar move the implementation into TraitContext itself as a static factory (TraitContext.of( @Nullable TableSemantics semantics, CallContext callContext, List<StaticArgument> staticArgs))

Yes, we need the CallContext to resolve the arguments, and there is no obvious single place to resolve this once so that we don't have multiple places. At logical time, for example when we create TypeInputStrategy, we only have the Logical Operator instance but not the CallContext yet - a logical operator can be reused multiple times throughout multiple calls inside a single SQL command. But I agree with you: I don't like this, and that's what I'm doing now, trying to find a better place so we do it only once.

I've moved the implementation to TraitContext and simplified resolution. See #27886 (comment)

raminqaf · 2026-04-22T07:12:34Z

+    /** ROW and SET semantics are mutually exclusive - adding one removes the other. */
+    private static void removeMutuallyExclusiveTraits(
+            final EnumSet<StaticArgumentTrait> traits, final StaticArgumentTrait adding) {
+        if (adding == StaticArgumentTrait.SET_SEMANTIC_TABLE) {


We can introduce a getIncompatibleWith() method and simplify this to

Suggested change

if (adding == StaticArgumentTrait.SET_SEMANTIC_TABLE) {

private static void removeMutuallyExclusiveTraits(

EnumSet<StaticArgumentTrait> traits, StaticArgumentTrait adding) {

traits.removeAll(adding.getIncompatibleWith());

}

or add an else branch for fast fails (personally prefer a switch with default here)
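
One way the suggested `getIncompatibleWith()` could look, using the switch-with-default style mentioned above. The enum below is a reduced, hypothetical copy of `StaticArgumentTrait` with only the constants named in this thread; the real enum has more constants and may model incompatibility differently.

```java
import java.util.EnumSet;
import java.util.Set;

/** Illustration only: mutual exclusion expressed on the enum itself. */
final class IncompatibleTraitsSketch {

    enum Trait {
        SCALAR,
        ROW_SEMANTIC_TABLE,
        SET_SEMANTIC_TABLE;

        /** Hypothetical helper named after the review suggestion. */
        Set<Trait> getIncompatibleWith() {
            switch (this) {
                case ROW_SEMANTIC_TABLE:
                    return EnumSet.of(SET_SEMANTIC_TABLE);
                case SET_SEMANTIC_TABLE:
                    return EnumSet.of(ROW_SEMANTIC_TABLE);
                default:
                    return EnumSet.noneOf(Trait.class);
            }
        }
    }

    /** Removes every trait that cannot coexist with the one being added. */
    static void removeMutuallyExclusiveTraits(EnumSet<Trait> traits, Trait adding) {
        traits.removeAll(adding.getIncompatibleWith());
    }

    public static void main(String[] args) {
        EnumSet<Trait> traits = EnumSet.of(Trait.ROW_SEMANTIC_TABLE);
        removeMutuallyExclusiveTraits(traits, Trait.SET_SEMANTIC_TABLE);
        traits.add(Trait.SET_SEMANTIC_TABLE);
        System.out.println(traits);
    }
}
```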

gustavodemorais · 2026-04-22T16:32:30Z

I've tried to simplify the code as much as possible, @twalthr. It's not trivial to implement the feature: having only one place where we resolve the traits is unfortunately conceptually not possible. I've adjusted the code so that all downstream code can simply read the traits and assume they're resolved. For that, I had to resolve it in three places

SystemTypeInference.resolveStaticArgs — validation, called once each from inferInputTypes and inferType. Twice per validation pass.
BridgingSqlFunction.resolveCallTraits — planning, called from FlinkLogicalTableFunctionScan converter.
BridgingSqlFunction.resolveCallTraits — restore, called from StreamExecProcessTableFunction.@JsonCreator.

We have to resolve it inside SystemTypeInference's inferType and inferInputTypes because only there do we have the actual call passed as a parameter. We have to resolve it in FlinkLogicalTableFunctionScan because it's the first place where we actually create the RexCall. And we also have to resolve it in StreamExecProcessTableFunction, which is where we restore our unresolved RexCall from the compiled plan.

Now, all relevant places where we read traits should receive the resolved traits.

twalthr

Great job @gustavodemorais. The PR looks much better now. I left 2 remaining comments. But should be good in the next iteration.

twalthr · 2026-04-23T10:43:45Z

+     * kind + args}; the {@code impl} predicate is reused but never compared, so two conditions
+     * built from the same factory inputs are equal.
+     */
+    final class BuiltInCondition implements TraitCondition {


Suggested change

final class BuiltInCondition implements TraitCondition {

private final class BuiltInCondition implements TraitCondition {

Java forbids private on nested types inside an interface. I've moved it to its own file. Top-level package-private class achieves the same encapsulation; users outside org.apache.flink.table.types.inference can't reach BuiltInCondition
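
The equality contract quoted in the Javadoc above (identity is kind plus args, the predicate is never compared) can be sketched like this. The field names, the `"HAS_PARTITION_BY"` kind string, and the reduced interfaces are assumptions made for this illustration and are not taken from the merged code.

```java
import java.util.List;
import java.util.Objects;
import java.util.function.Predicate;

/** Illustration only: value equality for conditions built by factories. */
final class BuiltInConditionSketch {

    interface TraitContext {
        boolean hasPartitionBy();
    }

    interface TraitCondition {
        boolean test(TraitContext ctx);
    }

    static final class BuiltInCondition implements TraitCondition {
        private final String kind;
        private final List<Object> args;
        private final Predicate<TraitContext> impl; // evaluated, never compared

        BuiltInCondition(String kind, List<Object> args, Predicate<TraitContext> impl) {
            this.kind = kind;
            this.args = args;
            this.impl = impl;
        }

        @Override
        public boolean test(TraitContext ctx) {
            return impl.test(ctx);
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            if (!(o instanceof BuiltInCondition)) {
                return false;
            }
            BuiltInCondition that = (BuiltInCondition) o;
            return kind.equals(that.kind) && args.equals(that.args);
        }

        @Override
        public int hashCode() {
            return Objects.hash(kind, args);
        }
    }

    /** Factory: every call yields an equal (but not identical) condition. */
    static TraitCondition hasPartitionBy() {
        return new BuiltInCondition("HAS_PARTITION_BY", List.of(), TraitContext::hasPartitionBy);
    }

    public static void main(String[] args) {
        System.out.println(hasPartitionBy().equals(hasPartitionBy()));
        System.out.println(hasPartitionBy().hashCode() == hasPartitionBy().hashCode());
    }
}
```

Bare lambdas inherit `Object.equals`, so two lambdas written at different places typically do not compare equal even when they express the same logic; wrapping them in a value class like this is one way to satisfy the hashCode/equals requirement raised earlier in the review.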

twalthr · 2026-04-23T10:49:36Z

+     * org.apache.flink.table.types.inference.CallContext}, since the planner doesn't carry one. The
+     * validation-time equivalent is {@link TraitContext#of}.
+     */
+    private static TraitContext buildTraitContext(


Reusing toCallContext from StreamPhysicalProcessFunction was not an option? Relying directly on RexLiteral might cause issues in the future. It is better to reuse the provided wrapper.

My thinking was that having the code in BridgingSqlFunction import things from the physical layer felt like a bad idea. We would be calling, in the early logical layer, code that lives in the physical layer, and we might create a weird dependency.

But it's a valid point that we should try to have the logic only once and reuse it. I've moved "toCallContext" to BridgingSqlFunction.java and we're reusing it in StreamPhysicalProcessTableFunction.java. I think that makes sense and it's also not a lot of code.

gustavodemorais · 2026-04-23T13:04:28Z

Thank you, @twalthr. I've addressed both comments

twalthr

LGTM, thanks @gustavodemorais!

raminqaf

LGTM! @gustavodemorais Thanks!

gustavodemorais marked this pull request as ready for review April 8, 2026 14:46

twalthr reviewed Apr 9, 2026

gustavodemorais marked this pull request as draft April 14, 2026 09:28

raminqaf reviewed Apr 21, 2026

github-actions Bot added the community-reviewed PR has been reviewed by the community. label Apr 21, 2026

gustavodemorais force-pushed the FLINK-39392 branch 2 times, most recently from 18f4c00 to 9e1ea8a Compare April 21, 2026 11:16

gustavodemorais marked this pull request as ready for review April 21, 2026 11:16

gustavodemorais requested review from raminqaf and twalthr April 21, 2026 11:17

[FLINK-39392][table] Support conditional traits for PTF table arguments

8e8700d

gustavodemorais force-pushed the FLINK-39392 branch from 9e1ea8a to 8e8700d Compare April 21, 2026 11:39

[FLINK-39392][table] Rename isPtfUpsert to ptfRequiresUpdateBefore

c0e2242

twalthr reviewed Apr 21, 2026

gustavodemorais added 2 commits April 21, 2026 16:00

[FLINK-39392][table] Improve documentation and simplify code

cacf1f0

[FLINK-39392][table] Resolve traits once in StreamPhysicalProcessTabl…

aa4f50c

…eFunction and reuse it

gustavodemorais force-pushed the FLINK-39392 branch from aedec22 to aa4f50c Compare April 21, 2026 14:25

raminqaf reviewed Apr 22, 2026

gustavodemorais requested a review from twalthr April 22, 2026 14:22

gustavodemorais force-pushed the FLINK-39392 branch from f7b4498 to e0c725d Compare April 22, 2026 15:00

[FLINK-39392][table] Simplify trait resolution in TableFunctionScan a…

2c04ce1

…nd ExecNode deserialization

gustavodemorais force-pushed the FLINK-39392 branch from e0c725d to 0c98e03 Compare April 22, 2026 16:03

[FLINK-39392][table] Implement hash and equals via BuiltInCondition f…

f1fe2f2

…or TraitCondition

gustavodemorais force-pushed the FLINK-39392 branch from 0c98e03 to f1fe2f2 Compare April 22, 2026 16:24

twalthr reviewed Apr 23, 2026

[FLINK-39392][table] Move BuiltInCondition out of TraitCondition for …

608d255

…stricter encapsulation

[FLINK-39392][table] Resolve scalar args via CallContext wrapper and …

3fd9bd3

…dedup toCallContext

twalthr approved these changes Apr 23, 2026

raminqaf approved these changes Apr 23, 2026

[FLINK-39392][table] Document PTF conditional traits in planner AGENT…

d054743

…S.md

gustavodemorais force-pushed the FLINK-39392 branch from 0213156 to d054743 Compare April 24, 2026 15:24

twalthr merged commit 055edcc into apache:master Apr 27, 2026
