[FLINK-39261][table] Add FROM_CHANGELOG built-in process table function by raminqaf · Pull Request #27901 · apache/flink

raminqaf · 2026-04-07T12:06:03Z

What is the purpose of the change

Implement the FROM_CHANGELOG built-in process table function as specified in FLIP-564, section 4.1.3.1 (append-only stream to upsert stream, flat mode).

FROM_CHANGELOG converts an append-only table with an explicit operation code column (e.g., Debezium's 'c', 'u', 'd') into a dynamic table backed by a Flink upsert stream ({+I, +U, -D}). This is the reverse of TO_CHANGELOG. The implementation is stateless — each input record maps directly to one output record with the appropriate RowKind.

SELECT * FROM FROM_CHANGELOG(
    input => TABLE cdc_stream PARTITION BY id,
    op => DESCRIPTOR(__op),
    op_mapping => MAP['c, r', 'INSERT', 'u', 'UPDATE_AFTER', 'd', 'DELETE']
)
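A minimal sketch of the stateless mapping described above, written in plain Java so it compiles without Flink on the classpath. The `Kind` enum is a local stand-in for Flink's `RowKind`, and `OpMappingSketch`, `expand` and `resolve` are hypothetical names, not part of this PR. Trimming whitespace around comma-separated codes is an assumption; dropping unmapped codes follows the "unmapped codes dropped" test listed under "Verifying this change".

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class OpMappingSketch {
    /** Local stand-in for Flink's RowKind (assumption: declared here so the sketch is self-contained). */
    enum Kind { INSERT, UPDATE_BEFORE, UPDATE_AFTER, DELETE }

    /** Expands keys such as "c, r" so that every individual op code points at its target kind. */
    static Map<String, Kind> expand(Map<String, String> opMapping) {
        Map<String, Kind> byCode = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : opMapping.entrySet()) {
            Kind kind = Kind.valueOf(entry.getValue().trim());
            for (String code : entry.getKey().split(",")) {
                byCode.put(code.trim(), kind);
            }
        }
        return byCode;
    }

    /** Returns the kind for one record's op code, or null when the code is unmapped (record dropped). */
    static Kind resolve(Map<String, Kind> byCode, String opCode) {
        return byCode.get(opCode);
    }

    public static void main(String[] args) {
        Map<String, String> mapping = new LinkedHashMap<>();
        mapping.put("c, r", "INSERT");
        mapping.put("u", "UPDATE_AFTER");
        mapping.put("d", "DELETE");

        Map<String, Kind> byCode = expand(mapping);
        System.out.println(resolve(byCode, "r")); // prints INSERT
        System.out.println(resolve(byCode, "x")); // prints null
    }
}
```

Because each lookup depends only on the current record's op code, no per-key state is needed, which matches the "stateless" claim in the description.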

Brief change log

Add optional ChangelogFunction delegate to BuiltInFunctionDefinition so built-in PTFs can declare their output changelog mode
Update FlinkChangelogModeInferenceProgram to handle the new delegate
Add FromChangelogTypeStrategy with input validation (op column existence, STRING type, op_mapping value validation) and output type inference (removes op column from output)
Add FROM_CHANGELOG built-in function definition with ChangelogMode.upsert(false) output
Add FromChangelogFunction runtime implementation using ProjectedRowData for zero-copy projection
Add fromChangelog() convenience method to PartitionedTable Table API
Add documentation for FROM_CHANGELOG in changelog.md

Verifying this change

This change added tests and can be verified as follows:

Added 7 unit tests for FromChangelogTypeStrategy input validation (valid mapping, op column not found, wrong type, invalid descriptor, invalid RowKind, UPDATE_BEFORE rejected, duplicate RowKind)
Added 11 semantic tests covering: default op_mapping, Debezium-style mapping, custom op column name, unmapped codes dropped, Table API convenience method, and 6 error validation tests
Added 2 plan tests verifying changelog mode propagation (changelogMode=[I,UA,D] for PTF output, changelogMode=[I] for source input)

Does this pull request potentially affect one of the following parts:

Dependencies (does it add or upgrade a dependency): no
The public API, i.e., is any changed class annotated with @Public(Evolving): yes (PartitionedTable.fromChangelog())
The serializers: no
The runtime per-record code paths (performance sensitive): no
Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
The S3 file system connector: no

Documentation

Does this pull request introduce a new feature? yes
If yes, how is the feature documented? docs / JavaDocs

flinkbot · 2026-04-07T12:10:47Z

CI report:

a56c0af Azure: SUCCESS

Bot commands

The @flinkbot bot supports the following commands:

@flinkbot run azure re-run the last Azure build

gustavodemorais

Thanks for the PR, @raminqaf! In general, it looks very consistent with to_changelog.

Here is my first set of reviews. In this one, I think there are some minor details we have to decide on and two bigger things:

Going with retract by default as the output mode
Finding a nice way of expressing the changelog mode with the BuiltInFunctionsDefinition

See below for more details

gustavodemorais · 2026-04-07T15:43:00Z

docs/content/docs/sql/reference/queries/changelog.md

+SELECT * FROM FROM_CHANGELOG(
+  input => TABLE source_table PARTITION BY key_col,
+  [op => DESCRIPTOR(op_column_name),]
+  [op_mapping => MAP['c, r', 'INSERT', 'u', 'UPDATE_AFTER', 'd', 'DELETE']]


The invalid_op_handling param would already be relevant here, but we can add it in a follow-up PR.

Good point and good idea, let's keep this PR from growing too big.

gustavodemorais · 2026-04-07T15:46:10Z

docs/content/docs/sql/reference/queries/changelog.md

+
+#### Default op_mapping
+
+When `op_mapping` is omitted, the following standard names are used:


We're missing the UPDATE_BEFORE -> UPDATE_BEFORE mapping.

We're actually not going to drop UPDATE_BEFORE in the default implementation. We want a simple flat mapping as the default behavior.

Added a dynamic way to detect the changelog mode based on the op_mapping:

no op_mapping (default) -> retract

op_mapping not containing UPDATE_BEFORE -> upsert

op_mapping containing UPDATE_BEFORE -> retract
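The three cases above can be sketched as a small decision function. This is an illustration in plain Java, not the code from the PR: `Mode` is a local stand-in for the retract and upsert variants of Flink's `ChangelogMode`, `ChangelogModeSketch` and `resolve` are hypothetical names, and a `null` map is assumed to model an omitted `op_mapping` argument.

```java
import java.util.HashMap;
import java.util.Map;

final class ChangelogModeSketch {
    /** Local stand-in for the two output modes discussed in the thread. */
    enum Mode { RETRACT, UPSERT }

    /** Picks the output mode from the op_mapping values; null models an omitted op_mapping. */
    static Mode resolve(Map<String, String> opMapping) {
        if (opMapping == null) {
            return Mode.RETRACT; // default mapping keeps UPDATE_BEFORE
        }
        boolean hasUpdateBefore =
                opMapping.values().stream().anyMatch(v -> "UPDATE_BEFORE".equals(v.trim()));
        return hasUpdateBefore ? Mode.RETRACT : Mode.UPSERT;
    }

    public static void main(String[] args) {
        Map<String, String> debezium = new HashMap<>();
        debezium.put("c, r", "INSERT");
        debezium.put("u", "UPDATE_AFTER");
        debezium.put("d", "DELETE");

        System.out.println(resolve(null));     // prints RETRACT
        System.out.println(resolve(debezium)); // prints UPSERT
    }
}
```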

docs/content/docs/sql/reference/queries/changelog.md

flink-table/flink-table-api-java/src/main/java/org/apache/flink/table/api/PartitionedTable.java

gustavodemorais · 2026-04-07T15:59:05Z

...-table-common/src/main/java/org/apache/flink/table/functions/BuiltInFunctionDefinitions.java

+                                @Override
+                                public ChangelogMode getChangelogMode(
+                                        ChangelogContext changelogContext) {
+                                    return ChangelogMode.upsert(false);


I think we'll go with ChangelogMode.ALL

Please refer to: #27901 (comment)

gustavodemorais · 2026-04-07T16:04:18Z

...k-table-common/src/main/java/org/apache/flink/table/functions/BuiltInFunctionDefinition.java

            String runtimeClass,
-            boolean isInternal) {
+            boolean isInternal,
+            @Nullable ChangelogFunction changelogFunction) {


We need to add a way for BuiltInFunctions to set the ChangelogMode, but adding it like this doesn't feel very clean. We also have to add some dead code below in the function definition when creating 'new ChangelogFunction()', and it's very specific.

A simple alternative would be to just add changelogMode to the BuiltInFunctionDefinition, but this might not be enough for eventual optimizations where we want to decide between append/upsert/retract stream output depending on the args inside opMapping:

BuiltInFunctionDefinition.newBuilder()
    .name("FROM_CHANGELOG")
    .outputChangelogMode(ChangelogMode.upsert(false))
    .build();

I think we have to sync with Timo on this. Do you have other alternatives you think could be better to solve this?

I have added a new changelogModeResolver that takes in a Function with the CallContext and outputs a ChangelogMode

/**
 * Sets a resolver that dynamically determines the output {@link ChangelogMode} for this
 * built-in PTF. The resolver receives the {@link CallContext} and can inspect function
 * arguments (e.g., op_mapping) to adapt the changelog mode. Only needed for PTFs that emit
 * updates (e.g., FROM_CHANGELOG).
 */
public Builder changelogModeResolver(
        Function<CallContext, ChangelogMode> changelogModeResolver) {
    this.changelogModeResolver = changelogModeResolver;
    return this;
}

gustavodemorais

Thanks for the updates, Ramin! Added some comments, take a look

gustavodemorais · 2026-04-10T10:52:20Z

docs/content/docs/sql/reference/queries/changelog.md

 -- +U[id:1, name:'Alice2']
 -- -D[id:2, name:'Bob']
+
+-- Table state after all events: {id:1, name:'Alice2'}


nit: this looks more like json. Can you use a table format?

gustavodemorais · 2026-04-10T10:55:04Z

...k-table-common/src/main/java/org/apache/flink/table/functions/BuiltInFunctionDefinition.java

Can't we implement "ChangelogFunction", have a default implementation that keeps the current behavior, and then just change this for our new built-in function?

I tried this, but implementing ChangelogFunction on BuiltInFunctionDefinition can't give us a dynamic changelog mode based on (input) arguments, because ChangelogContext only provides input changelog modes and downstream requirements, not the function's arguments such as op_mapping.

We need the CallContext to inspect the arguments (e.g., op_mapping), which is what the resolver provides.

I see. What do you think about extending ChangelogContext and adding new functionality to it like getScalarValue to make it able to access arguments?

gustavodemorais · 2026-04-10T10:56:47Z

...rg/apache/flink/table/planner/plan/optimize/program/FlinkChangelogModeInferenceProgram.scala

+    val inputChangelogModes = children.map(toChangelogMode(_, None, None))
+    val changelogModeOpt: Option[ChangelogMode] = definition match {
+      // User-defined PTFs that implement ChangelogFunction
+      case cf: ChangelogFunction =>


Ideally we don't touch this file. If you implement ChangelogFunction that might be enough and we can then skip these changes

See comment: #27901 (comment)

gustavodemorais · 2026-04-10T11:33:09Z

docs/content/docs/sql/reference/queries/changelog.md

+
+```sql
+SELECT * FROM FROM_CHANGELOG(
+  input => TABLE source_table PARTITION BY key_col,


We're moving to row semantics as the first default version - no partition by. I'm finishing a new PR that should help you see the changes that you'll have to make

Here is the PR #27911

gustavodemorais · 2026-04-10T14:57:57Z

...k-table-common/src/main/java/org/apache/flink/table/functions/BuiltInFunctionDefinition.java


    private final SqlCallSyntax sqlCallSyntax;

+    private final @Nullable Function<ChangelogFunction.ChangelogContext, ChangelogMode>


Suggested change

private final @Nullable Function<ChangelogFunction.ChangelogContext, ChangelogMode>

private final @Nullable Function<ChangelogContext, ChangelogMode>

gustavodemorais · 2026-04-10T14:59:58Z

...c/main/java/org/apache/flink/table/types/inference/strategies/FromChangelogTypeStrategy.java

+     */
+    @SuppressWarnings({"unchecked", "rawtypes"})
+    public static ChangelogMode resolveChangelogMode(
+            final ChangelogFunction.ChangelogContext changelogContext) {


I think we almost always follow this pattern

Suggested change

final ChangelogFunction.ChangelogContext changelogContext) {

final ChangelogContext changelogContext) {

You've done this in multiple locations. Can you take a look?

gustavodemorais · 2026-04-10T15:02:38Z

...rg/apache/flink/table/planner/plan/optimize/program/FlinkChangelogModeInferenceProgram.scala

-    definition match {
-      case changelogFunction: ChangelogFunction =>
-        val inputChangelogModes = children.map(toChangelogMode(_, None, None))
+    val inputChangelogModes = children.map(toChangelogMode(_, None, None))


Can we undo the changes to this file? This is one of the trickiest files, so we usually only want to change it if we have a good reason.

gustavodemorais · 2026-04-10T15:15:37Z

...c/main/java/org/apache/flink/table/types/inference/strategies/FromChangelogTypeStrategy.java

+                        throwOnFailure,
+                        String.format(
+                                "Invalid target mapping for argument 'op_mapping'. "
+                                        + "Duplicate change operation: '%s'.",


Some users may think they need two entries mapping different user codes to the same RowKind (e.g., 'c' -> INSERT, 'r' -> INSERT).

I think we can have a better message here: "If you need multiple change codes to map to the same type, use a comma-separated list, e.g. 'c, r' -> 'INSERT'." What do you think?
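A rough sketch of how the duplicate check and the extended message could fit together, in plain Java. `DuplicateTargetCheckSketch` and `findDuplicateTarget` are hypothetical names; the real validation lives in FromChangelogTypeStrategy and reports failures differently. The first part of the message is taken from the diff above, and the hint is the wording proposed in this comment.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

final class DuplicateTargetCheckSketch {
    /** Returns an error message if two op_mapping entries share a target, otherwise empty. */
    static Optional<String> findDuplicateTarget(Map<String, String> opMapping) {
        Map<String, String> firstKeyByTarget = new HashMap<>();
        for (Map.Entry<String, String> entry : opMapping.entrySet()) {
            String target = entry.getValue().trim();
            // putIfAbsent returns the earlier key when this target was already seen.
            if (firstKeyByTarget.putIfAbsent(target, entry.getKey()) != null) {
                return Optional.of(
                        String.format(
                                "Invalid target mapping for argument 'op_mapping'. "
                                        + "Duplicate change operation: '%s'. "
                                        + "If you need multiple change codes to map to the same type, "
                                        + "use a comma-separated list, e.g. 'c, r' -> 'INSERT'.",
                                target));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Map<String, String> mapping = new LinkedHashMap<>();
        mapping.put("c", "INSERT");
        mapping.put("r", "INSERT"); // second entry with the same target triggers the hint
        findDuplicateTarget(mapping).ifPresent(System.out::println);
    }
}
```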

gustavodemorais · 2026-04-10T15:17:05Z

...t/java/org/apache/flink/table/planner/plan/nodes/exec/stream/FromChangelogSemanticTests.java

+                FromChangelogTestPrograms.DEBEZIUM_MAPPING,
+                FromChangelogTestPrograms.UNMAPPED_CODES_DROPPED,
+                FromChangelogTestPrograms.TABLE_API_DEFAULT,
+                FromChangelogTestPrograms.MISSING_PARTITION_BY);


I think there are two good tests we could add here

round-trip FROM_CHANGELOG(TO_CHANGELOG(table)) (important, might work better when we merge [FLINK-39419][table] Switch TO_CHANGELOG to row semantics with full deletes + require update before #27911 which is approved)

custom op column name - There's no test for op => DESCRIPTOR(operation)

[FLINK-39261][table] Add FROM_CHANGELOG built-in process table function

c9f0b44

gustavodemorais reviewed Apr 7, 2026

github-actions bot added the community-reviewed label (PR has been reviewed by the community) on Apr 7, 2026

[FLINK-39261][table] Add dynamic UPDATE_BEFORE changelog detection

f11a83a

raminqaf force-pushed the FLINK-39261 branch from 2a641f6 to f11a83a Compare April 9, 2026 14:29

gustavodemorais reviewed Apr 10, 2026

[FLINK-39261][table] Improve doc formatting

1bda3fc

gustavodemorais reviewed Apr 10, 2026

[FLINK-39261][table] Fetch argument value in ChangelogContext

a56c0af

gustavodemorais reviewed Apr 10, 2026

