
[SPARK-56956][SDP] Introduce AutoCDC Flow Dataclasses #56042

Open
AnishMahto wants to merge 18 commits into apache:master from AnishMahto:SPARK-56956-introduce-flow-data-classes


AnishMahto commented on May 21, 2026

Approved AutoCDC SPIP: https://lists.apache.org/thread/j6sj9wo9odgdpgzlxtvhoy7szs0jplf7


This is a stacked PR. Review the incremental diff here: AnishMahto/spark@SPARK-56870-extend-microbatch-with-cdc-metadata...SPARK-56956-introduce-flow-data-classes


What changes were proposed in this pull request?

Introduce dataclasses for the unresolved AutoCDC flow (AutoCdcFlow) and the resolved AutoCDC flow (AutoCdcMergeFlow), and add wiring to analyze an AutoCdcFlow into an AutoCdcMergeFlow.
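To illustrate the unresolved-to-resolved step, here is a minimal Scala sketch. It is not the PR's implementation: the field names (source, keys, sequenceBy, inputs, inputSchema, outputSchema), the analyze function, and the rule that the output schema equals the source schema are all hypothetical simplifications, and a schema is modeled as a plain list of column names rather than a Spark StructType.

```scala
object AutoCdcAnalysisSketch {
  // Hypothetical simplified schema: ordered column names only (not a StructType).
  type Schema = Seq[String]

  // Unresolved AutoCDC flow as registered by the user (hypothetical fields).
  final case class AutoCdcFlow(
      name: String,
      destination: String,
      source: String,
      keys: Seq[String],
      sequenceBy: String)

  // Resolved AutoCDC flow: inputs and schemas are known after analysis
  // (hypothetical fields).
  final case class AutoCdcMergeFlow(
      name: String,
      destination: String,
      inputs: Set[String],
      inputSchema: Schema,
      outputSchema: Schema,
      keys: Seq[String],
      sequenceBy: String)

  // Resolves a flow against the schemas of datasets already known to the graph.
  // Returns Left with an error message if the source or a referenced column is missing.
  def analyze(
      flow: AutoCdcFlow,
      knownSchemas: Map[String, Schema]): Either[String, AutoCdcMergeFlow] = {
    knownSchemas.get(flow.source) match {
      case None =>
        Left(s"Flow '${flow.name}': unknown source '${flow.source}'")
      case Some(schema) =>
        // Every key column and the sequencing column must exist in the source.
        val missing = (flow.keys :+ flow.sequenceBy).filterNot(c => schema.contains(c))
        if (missing.nonEmpty) {
          Left(s"Flow '${flow.name}': missing columns ${missing.mkString(", ")}")
        } else {
          // Assumption: the merge target keeps the source's columns unchanged.
          Right(AutoCdcMergeFlow(
            name = flow.name,
            destination = flow.destination,
            inputs = Set(flow.source),
            inputSchema = schema,
            outputSchema = schema,
            keys = flow.keys,
            sequenceBy = flow.sequenceBy))
        }
    }
  }
}
```

Returning an Either keeps the sketch free of exceptions; a real analyzer would more likely report errors through Spark's own analysis exceptions.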

This PR also makes a small refactor to the UnresolvedFlow and ResolvedFlow class hierarchy.

Why are the changes needed?

These changes support AutoCDC flow registration and analysis; AutoCDC flow execution will be supported in a future PR. Previously, an UnresolvedFlow always also represented an untyped flow: a flow whose execution type (e.g. streaming, append-once) is not yet known.

AutoCdcFlow is a specialized flow that only supports streaming, so it represents a flow whose execution type is known at construction. It is still unresolved at registration time and must go through resolution to determine its position in the DAG and its input/output schemas.

Hence we introduce UntypedFlow as an intermediate child of UnresolvedFlow; all previously existing flows are classified as UntypedFlow during registration. AutoCdcFlow implements UnresolvedFlow directly (skipping UntypedFlow in its inheritance chain) because it is not untyped.
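A minimal Scala sketch of the refactored hierarchy described above. Only the names UnresolvedFlow, UntypedFlow and AutoCdcFlow come from this PR; the ExecutionType values, the isStreamingQuery/once fields and the inference rule in executionTypeOf are hypothetical stand-ins for what resolution does, not Spark's actual code.

```scala
object FlowHierarchySketch {
  // Hypothetical execution types a flow can end up with.
  sealed trait ExecutionType
  case object Streaming extends ExecutionType
  case object Batch extends ExecutionType
  case object AppendOnce extends ExecutionType

  // Every flow that has been registered but not yet analyzed.
  sealed trait UnresolvedFlow {
    def name: String
  }

  // Pre-existing flows: the execution type is only discovered during resolution.
  // The isStreamingQuery/once fields are hypothetical inputs to that inference.
  final case class UntypedFlow(name: String, isStreamingQuery: Boolean, once: Boolean)
      extends UnresolvedFlow

  // AutoCDC flows are streaming-only, so the type is fixed at construction.
  // Note: extends UnresolvedFlow directly, not UntypedFlow.
  final case class AutoCdcFlow(name: String) extends UnresolvedFlow {
    val executionType: ExecutionType = Streaming
  }

  // Resolution must infer the type for an UntypedFlow but can read it off an AutoCdcFlow.
  def executionTypeOf(flow: UnresolvedFlow): ExecutionType = flow match {
    case f: AutoCdcFlow               => f.executionType
    case UntypedFlow(_, _, true)      => AppendOnce
    case UntypedFlow(_, true, false)  => Streaming
    case UntypedFlow(_, false, false) => Batch
  }
}
```

Because UnresolvedFlow is sealed in this sketch, the compiler checks the match in executionTypeOf for exhaustiveness, so adding another typed flow later would force resolution code to handle it explicitly.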

Does this PR introduce any user-facing change?

No. The AutoCDC feature has not been released yet.

How was this patch tested?

Tested with ConnectValidPipelineSuite and AutoCdcFlowSuite.

Was this patch authored or co-authored using generative AI tooling?

Co-authored.

Generated-by: Claude-Opus-4.7-thinking-xhigh
