Prototype: support eager analysis inside Pipelines query functions #53700

sryza · 2026-01-06T20:42:25Z

What changes were proposed in this pull request?

This is a WIP change that adds support for using functions like DataFrame.schema and DataFrame.columns inside pipeline query functions.

The change makes graph resolution partially asynchronous.

Many of the data structures that were previously maintained as local variables inside transformDownNodes have been moved to a GraphAnalysisContext object. Moving them into a separate object makes them accessible from Spark Connect RPC handlers that:

- Register query function results
- Poll for query functions to execute
- Analyze within the context of the graph

We're also essentially introducing a new state that flows can be in during resolution: "waiting for query function result".
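As a rough illustration of the flow described above, here is a minimal, single-threaded Python sketch. It is not the implementation in this PR, which goes through Spark Connect RPC handlers and is partially asynchronous. The class borrows the name GraphAnalysisContext from the description, but the method names (poll_for_query_function, analyze, register_result), the FlowState enum, and the choice to model a schema as a plain list of column names are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Callable, Dict, List, Optional


class FlowState(Enum):
    PENDING = auto()                     # not yet handed out for execution
    WAITING_FOR_QUERY_FUNCTION = auto()  # client is running the query function
    RESOLVED = auto()                    # result registered, schema known


@dataclass
class GraphAnalysisContext:
    """Hypothetical stand-in for state shared across RPC-style handlers."""

    upstream: Dict[str, List[str]]  # flow name -> flows it reads from
    states: Dict[str, FlowState] = field(default_factory=dict)
    schemas: Dict[str, List[str]] = field(default_factory=dict)

    def __post_init__(self) -> None:
        for name in self.upstream:
            self.states[name] = FlowState.PENDING

    def poll_for_query_function(self) -> Optional[str]:
        """Hand out a pending flow whose inputs are all resolved, if any."""
        for name, deps in self.upstream.items():
            ready = all(self.states[d] is FlowState.RESOLVED for d in deps)
            if self.states[name] is FlowState.PENDING and ready:
                self.states[name] = FlowState.WAITING_FOR_QUERY_FUNCTION
                return name
        return None

    def analyze(self, dataset: str) -> List[str]:
        """Eager analysis: return the columns of an already-resolved flow."""
        if self.states.get(dataset) is not FlowState.RESOLVED:
            raise LookupError(f"{dataset} is not resolved yet")
        return self.schemas[dataset]

    def register_result(self, name: str, columns: List[str]) -> None:
        """Record a query function's result and mark the flow resolved."""
        if self.states[name] is not FlowState.WAITING_FOR_QUERY_FUNCTION:
            raise RuntimeError(f"{name} was not waiting for a result")
        self.schemas[name] = columns
        self.states[name] = FlowState.RESOLVED


QueryFunction = Callable[[GraphAnalysisContext], List[str]]


def run_client(ctx: GraphAnalysisContext, fns: Dict[str, QueryFunction]) -> List[str]:
    """Poll for work, run each query function, and register its result."""
    order = []
    while (name := ctx.poll_for_query_function()) is not None:
        ctx.register_result(name, fns[name](ctx))
        order.append(name)
    return order


def raw(ctx: GraphAnalysisContext) -> List[str]:
    return ["id", "name", "_tmp"]


def cleaned(ctx: GraphAnalysisContext) -> List[str]:
    # Stands in for reading DataFrame.columns inside a query function:
    # the column list of "raw" is needed before this function can return.
    return [c for c in ctx.analyze("raw") if not c.startswith("_")]


ctx = GraphAnalysisContext(upstream={"cleaned": ["raw"], "raw": []})
order = run_client(ctx, {"raw": raw, "cleaned": cleaned})
```

In this toy model, "cleaned" is only handed out after "raw" has been registered, so the eager lookup inside the query function always finds a resolved upstream flow. A flow sits in WAITING_FOR_QUERY_FUNCTION between the poll and the registration, which mirrors the new state mentioned above.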

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?

Was this patch authored or co-authored using generative AI tooling?

github-actions · 2026-01-06T20:42:34Z

⚠️ Pull Request Title Validation

This pull request title does not contain a JIRA issue ID.

Please update the title to either:

- Include a JIRA ID: [SPARK-12345] Your description
- Mark as minor change: [MINOR] Your description

For minor changes that don't require a JIRA ticket (e.g., typo fixes), please prefix the title with [MINOR].

This comment was automatically generated by GitHub Actions

sryza added 6 commits December 19, 2025 18:02

graph analysis context

fe33f30

PythonPipelineSuite stuff

e8fa38d

more test

a06a934

run start_run in test

a53ff15

resolvedGraph

5dad12f

instrumentation and wait for pipeline execution

4dea491

github-actions bot added SQL PYTHON CONNECT labels Jan 6, 2026

sryza changed the title ~~Analyze in query function~~ Prototype: support eager analysis inside Pipelines query functions Jan 6, 2026

remove prints

29580a8
