[SPARK-52812][SQL] Make Spark Connect Catalog.createTable eager #56064

Closed
rishav23 wants to merge 1 commit into apache:master from rishav23:fix-spark-52812-createtable-eager-v2

@rishav23
Contributor

What changes were proposed in this pull request?

This PR makes Spark Connect Catalog.createTable eager. Previously, createTable() only constructed a lazy DataFrame, requiring users to explicitly trigger an action such as .collect() for the table creation to actually execute. This change eagerly executes the command internally while preserving the existing return type. A regression test has also been added to verify that tables are created immediately without requiring an explicit action.
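The difference between the two behaviours can be illustrated with a small toy model. This is a sketch under stated assumptions, not Spark's implementation: `LazyFrame`, `ToyCatalog`, `create_table_lazy`, and `create_table_eager` are hypothetical names invented for illustration, and the in-memory `tables` set merely stands in for the server-side catalog.

```python
from dataclasses import dataclass, field
from typing import Callable, Set


@dataclass
class LazyFrame:
    """Hypothetical stand-in for a lazy DataFrame: holds a command, runs it on an action."""
    command: Callable[[], None]
    executed: bool = False

    def collect(self) -> list:
        # An "action": the point at which the pending command actually runs.
        # Running at most once is a modelling choice of this sketch.
        if not self.executed:
            self.command()
            self.executed = True
        return []


@dataclass
class ToyCatalog:
    """Hypothetical catalog; `tables` plays the role of the server-side metastore."""
    tables: Set[str] = field(default_factory=set)

    def create_table_lazy(self, name: str) -> LazyFrame:
        # Behaviour described as "previous": build the plan only, execute nothing.
        return LazyFrame(command=lambda: self.tables.add(name))

    def create_table_eager(self, name: str) -> LazyFrame:
        # Behaviour described as "new": execute internally, keep the same return type.
        frame = self.create_table_lazy(name)
        frame.collect()
        return frame

    def table_exists(self, name: str) -> bool:
        return name in self.tables


catalog = ToyCatalog()

lazy = catalog.create_table_lazy("t_lazy")
exists_before_action = catalog.table_exists("t_lazy")  # nothing has run yet
lazy.collect()                                         # explicit action
exists_after_action = catalog.table_exists("t_lazy")

eager = catalog.create_table_eager("t_eager")
exists_immediately = catalog.table_exists("t_eager")   # no explicit action needed
```

In this model the lazy variant leaves the table absent until `collect()` is called, while the eager variant makes it visible as soon as the call returns. How the real client executes the command and what the returned DataFrame contains is not specified in this description.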

Why are the changes needed?

Catalog.createTable() is a side-effecting operation and should execute eagerly to match expected Catalog API semantics.

Does this PR introduce any user-facing change?

Yes. Previously, spark.catalog.createTable(...) did not create the table in Spark Connect until an action was triggered on the returned DataFrame. Now the table is created as soon as the call returns.

How was this patch tested?

  • Added a regression test in CatalogSuite
  • Ran build/sbt compile
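The shape of such a regression test might look roughly like the following. This is only an illustrative sketch: the actual test lives in CatalogSuite and is not shown in this conversation, and `FakeCatalog`, `create_table`, and `table_exists` here are hypothetical stand-ins rather than the Spark Connect API.

```python
import unittest


class FakeCatalog:
    """Hypothetical stand-in for a catalog client (not the real Spark API)."""

    def __init__(self) -> None:
        self._tables = set()

    def create_table(self, name: str) -> str:
        # Eager by construction: the side effect happens inside the call.
        self._tables.add(name)
        return name

    def table_exists(self, name: str) -> bool:
        return name in self._tables


class CreateTableEagerTest(unittest.TestCase):
    def test_table_exists_without_action(self) -> None:
        catalog = FakeCatalog()
        catalog.create_table("eager_tbl")  # return value deliberately ignored
        # No collect() or other action here: the table must already be visible.
        self.assertTrue(catalog.table_exists("eager_tbl"))


def run_suite() -> unittest.TestResult:
    # Run the single test case programmatically and return the result object.
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(CreateTableEagerTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


result = run_suite()
```

The key point is that the assertion follows the create call directly, with no action on the returned value in between, so the test would fail if creation were deferred.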

Was this patch authored or co-authored using generative AI tooling?

No

@rishav23 force-pushed the fix-spark-52812-createtable-eager-v2 branch from 741a54c to 37369e5 on May 22, 2026 14:37
@rishav23 force-pushed the fix-spark-52812-createtable-eager-v2 branch from 37369e5 to 5645a6d on May 22, 2026 16:24
Contributor

@hvanhovell hvanhovell left a comment

LGTM

@hvanhovell
Contributor

Merging to master. Thanks!

asf-gitbox-commits pushed a commit that referenced this pull request May 26, 2026

Closes #56064 from rishav23/fix-spark-52812-createtable-eager-v2.

Authored-by: rishav23 <sinharishav31@gmail.com>
Signed-off-by: Herman van Hövell <herman@databricks.com>
(cherry picked from commit 6dbe197)
Signed-off-by: Herman van Hövell <herman@databricks.com>
@rishav23
Contributor Author

Thanks for the review and quick turnaround!
Happy to help improve Spark Connect Catalog API semantics.
