Skip to content

Conversation

@ericm-db
Copy link
Contributor

@ericm-db ericm-db commented Dec 29, 2025

What changes were proposed in this pull request?

This PR introduces infrastructure for tracking and propagating source identifying names through query analysis for streaming queries. It adds:

  1. StreamingSourceIdentifyingName - A sealed trait hierarchy representing the naming state of streaming sources:

    • UserProvided(name) - Explicitly set via .name() API
    • FlowAssigned(name) - Assigned by external flow systems (e.g., SDP)
    • Unassigned - No name assigned yet (to be auto-generated)
  2. NamedStreamingRelation - A transparent wrapper node that:

    • Carries source identifying names through the analyzer phase
    • Extends UnaryNode for transparent interaction with analyzer rules
    • Stays unresolved until explicitly unwrapped by a future NameStreamingSources analyzer rule
    • Provides withUserProvidedName() to attach user-specified names
  3. NAMED_STREAMING_RELATION tree pattern for efficient pattern matching

Why are the changes needed?

Streaming sources need stable, predictable names for:

  • Checkpoint location stability - Schema evolution and offset tracking require consistent source identification
  • Schema lookup at specific offsets - Analysis-time operations need to reference sources by name
  • Flow integration - SDP and similar systems need per-source metadata paths

By introducing this wrapper during analysis (rather than at execution planning), we enable these capabilities while maintaining a clean separation between parsing, analysis, and execution phases.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

New unit tests in NamedStreamingRelationSuite covering:

  • Source name state transitions (Unassigned → UserProvided)
  • Output delegation to child plan
  • Tree pattern registration
  • Resolved state behavior
  • String representation

Was this patch authored or co-authored using generative AI tooling?

No.

… identification during analysis

### What changes were proposed in this pull request?

This PR introduces infrastructure for tracking and propagating source identifying names through query analysis for streaming queries. It adds:

1. **StreamingSourceIdentifyingName** - A sealed trait hierarchy representing the naming state of streaming sources:
   - `UserProvided(name)` - Explicitly set via `.name()` API
   - `FlowAssigned(name)` - Assigned by external flow systems (e.g., SDP)
   - `Unassigned` - No name assigned yet (to be auto-generated)

2. **NamedStreamingRelation** - A transparent wrapper node that:
   - Carries source identifying names through the analyzer phase
   - Extends `UnaryNode` for transparent interaction with analyzer rules
   - Stays unresolved until explicitly unwrapped by a future `NameStreamingSources` analyzer rule
   - Provides `withUserProvidedName()` to attach user-specified names

3. **NAMED_STREAMING_RELATION** tree pattern for efficient pattern matching

### Why are the changes needed?

Streaming sources need stable, predictable names for:
- **Checkpoint location stability** - Schema evolution and offset tracking require consistent source identification
- **Schema lookup at specific offsets** - Analysis-time operations need to reference sources by name
- **Flow integration** - SDP and similar systems need per-source metadata paths
- **User control** - Allow users to explicitly name sources via the `.name()` API

By introducing this wrapper during analysis (rather than at execution planning), we enable these capabilities while maintaining a clean separation between parsing, analysis, and execution phases.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

New unit tests in `NamedStreamingRelationSuite` covering:
- Source name state transitions (Unassigned → UserProvided)
- Output delegation to child plan
- Tree pattern registration
- Resolved state behavior
- String representation

### Was this patch authored or co-authored using generative AI tooling?

No.
@ericm-db ericm-db force-pushed the named-streaming-relation branch from d7588b7 to 89b259a Compare December 29, 2025 20:38
Copy link
Contributor

@anishshri-db anishshri-db left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants