Skip to content

[FLINK-39576][table] Expose NO_PASS_THROUGH as a PTF trait#28065

Closed
gustavodemorais wants to merge 2 commits intoapache:masterfrom
confluentinc:FLINK-39576
Closed

[FLINK-39576][table] Expose NO_PASS_THROUGH as a PTF trait#28065
gustavodemorais wants to merge 2 commits intoapache:masterfrom
confluentinc:FLINK-39576

Conversation

@gustavodemorais
Copy link
Copy Markdown
Contributor

What is the purpose of the change

Adds a new public PTF argument trait ArgumentTrait.NO_PASS_THROUGH that lets a function fully own its output schema. By default the framework prepends the PARTITION BY columns and appends a rowtime suffix when on_time is
provided; with this trait, neither is added. Useful when a PTF emits its input verbatim or otherwise produces a schema that should not be augmented by the framework.

Brief change log

  • Add public ArgumentTrait.NO_PASS_THROUGH and internal StaticArgumentTrait.NO_PASS_THROUGH.
  • Introduce PassThroughMode enum (KEY, ALL, NONE) derived per table arg from the declared traits.
  • Unify the runtime pass-through collectors (PassAllCollector, PassPartitionKeysCollector) into a single PassThroughCollector that dispatches per-arg by mode.
  • Reject mutually exclusive traits at construction time in StaticArgument.checkTraits via StaticArgumentTrait.getIncompatibleWith().
  • Document the trait and warn about consequences when the PTF does not forward partition keys or watermarked timestamps itself.

Verifying this change

  • Unit tests in StaticArgumentTest and StaticArgumentTraitTest cover trait-to-mode derivation, conditional-trait resolution, mutual-exclusion validation, and symmetry of incompatibility.
  • Plan test ProcessTableFunctionTest#testNoPassThroughFunctionOwnsOutputSchema exercises the end-to-end planner path and verifies the output rowtype omits the partition key.
  • All pre-existing PTF restore tests in ProcessTableFunctionRestoreTests continue to pass against both the new code and the pre-existing JSON plans + savepoints.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (yes - new ArgumentTrait.NO_PASS_THROUGH constant)
  • The serializers: (no)
  • The runtime per-record code paths (performance sensitive): (no)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (no)
  • The S3 file system connector: (no)

Documentation

  • Does this pull request introduce a new feature? (yes)
  • If yes, how is the feature documented? (JavaDocs on ArgumentTrait.NO_PASS_THROUGH and updates to docs/content/docs/dev/table/functions/ptfs.md and the Chinese translation)

Was generative AI tooling used to co-author this PR?
  • Yes (please specify the tool below)

2.1.117 (Claude Code)

@flinkbot
Copy link
Copy Markdown
Collaborator

flinkbot commented Apr 29, 2026

CI report:

Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build

break;
case NONE:
prefix = EMPTY_PREFIX;
break;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing default branch
better to have it otherwise adding new mode will lead to painful debug

case ALL:
return tableArg.getType().getFieldCount();
default:
// KEY
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we have dedicated case for KEY?

better to have it otherwise adding new mode will lead to painful debug

Comment on lines +1241 to +1245
callContext
.getTableSemantics(0)
.orElseThrow(IllegalStateException::new)
.dataType()
.notNull()))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about extraction into a separate var to reduce formatting

Comment on lines +417 to +419
input ->
input.hasSetSemantics()
&& input.passThroughMode() != PassThroughMode.ALL);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use separate var

.withConditionalTrait(StaticArgumentTrait.NO_PASS_THROUGH, ctx -> false);

final StaticArgument resolved = base.applyConditionalTraits(noopContext());
assertThat(resolved.getPassThroughMode()).isEqualTo(PassThroughMode.KEY);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here and in other places better to compare enums with ==

Suggested change
assertThat(resolved.getPassThroughMode()).isEqualTo(PassThroughMode.KEY);
assertThat(resolved.getPassThroughMode()).isSameAs(PassThroughMode.KEY);

Comment on lines +60 to +85
void passColumnsThroughTraitMapsToAll() {
final StaticArgument arg =
StaticArgument.table(
"in",
Row.class,
false,
EnumSet.of(
StaticArgumentTrait.TABLE,
StaticArgumentTrait.SET_SEMANTIC_TABLE,
StaticArgumentTrait.PASS_COLUMNS_THROUGH));
assertThat(arg.getPassThroughMode()).isEqualTo(PassThroughMode.ALL);
}

@Test
void noPassThroughTraitMapsToNone() {
final StaticArgument arg =
StaticArgument.table(
"in",
Row.class,
false,
EnumSet.of(
StaticArgumentTrait.TABLE,
StaticArgumentTrait.SET_SEMANTIC_TABLE,
StaticArgumentTrait.NO_PASS_THROUGH));
assertThat(arg.getPassThroughMode()).isEqualTo(PassThroughMode.NONE);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is a number of tests with input StaticArgument and checking for

assertThat(arg.getPassThroughMode())

can we just parameterized them?
probably some others as well

Comment on lines +382 to +392
trait ->
trait.getIncompatibleWith()
.forEach(
incompatible -> {
if (traits.contains(incompatible)) {
throw new ValidationException(
String.format(
"Invalid argument traits for argument '%s'. Trait %s is mutually exclusive with %s.",
name, trait, incompatible));
}
}));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this and another snippet above are very similar, can we extract common logic or at least vars to void heavy formatting?

The `ArgumentTrait.NO_PASS_THROUGH` instructs the system to omit all framework-added columns. The output is fully
controlled by the function's declared output type.

By default, the framework prepends the `PARTITION BY` columns and appends a `rowtime` column when `on_time` is provided.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rowtime column is not a pass-through column. The value depends on both the incoming on_time value OR the timer firing value. If we don't output it, following temporal operations become unavailable. It is possible to disable on_time for this purpose (via TypeInference.disableSystemArgs) or just don't pass on_time to the call. Therefore, there is an alternative for the user I would vote for only handling the partition key.

@gustavodemorais
Copy link
Copy Markdown
Contributor Author

I've though long and had a deep look at our plans with TO and FROM Changelog and changed my mind. My goal was to make this not impact the metadata that comes out the of PTFs but this is not easily possible - partition keys, unique keys and upsert keys. We need these metadata to be preserved and it's better to keep this specific ptf change out of scope from this init. Thus I'm closing this and we'll move forward with the schema changes while preserving the partition by columns with the help of the ptf framework, which pre appends them at the beginning.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants