Skip to content

[flink] Skip emitting update events when only non-projected columns are modified in source connector #3095

@wuchong

Description

@wuchong

Search before asking

  • I searched in the issues and found nothing similar.

Motivation

Users frequently query wide tables in Flink where only a subset of columns is required. For example, a Fluss table contains columns a, b, and c, but the Flink job only projects a and b. Currently, if the upstream system updates only column c while a and b remain unchanged, the Fluss connector still reads and emits this update event. This triggers unnecessary deserialization, network transfer, and downstream computation in Flink, leading to wasted resources and reduced throughput.

The connector should leverage projection pushdown metadata to filter update events at the source level. When an update event modifies only columns that are not included in the projected column list, the connector should skip emitting this record to the Flink runtime.

Solution

  1. Extract the exact list of projected columns from the Flink planner pushdown mechanism.
  2. Leverage the underlying storage read path to identify which columns were actually modified in each update event.
  3. Implement a filtering layer in the reader that compares changed columns against projected columns. If the intersection is empty, discard the event before deserialization and transmission.
  4. Ensure correctness for primary key tracking, watermark propagation, state management, and schema evolution scenarios.

Anything else?

Benefits

  • Significantly reduces CPU, memory, and network overhead for Flink streaming jobs.
  • Improves end-to-end throughput and lowers latency in wide-table CDC and streaming scenarios.
  • Aligns with modern connector optimization practices for column pruning and change filtering.

Willingness to contribute

  • I'm willing to submit a PR!

Metadata

Metadata

Assignees

No fields configured for Feature.

Projects

No projects

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions