
[SPARK-55689][SQL] Add withSchemaEvolution() in dataframe writer API #55446

Closed
johanl-db wants to merge 4 commits into apache:master from johanl-db:SPARK-55689-df-with-schema-evolution

Conversation

@johanl-db
Contributor

What changes were proposed in this pull request?

Add a new method, .withSchemaEvolution(), to the DataFrame writer API to enable schema evolution during writes.

This fills a current gap: schema evolution in inserts can only be enabled via SQL, using INSERT WITH SCHEMA EVOLUTION. MERGE through the DataFrame API already exposes a similar user-facing method: withSchemaEvolution().

Checks are added so that a write fails when schema evolution is enabled on an operation that doesn't support it: creating or replacing a table, or writing to a DSv1 table.
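To make the intent of these checks concrete, here is a minimal sketch in plain Python. It is not Spark's implementation: `WriteRequest`, `validate_schema_evolution`, the operation strings, and the error messages are all hypothetical names chosen for illustration. It only models the rule stated above, that enabling schema evolution is rejected for create/replace writes and for DSv1 tables.

```python
from dataclasses import dataclass

# Hypothetical operation names used only for this sketch.
UNSUPPORTED_OPERATIONS = {"create", "replace", "create_or_replace"}


@dataclass(frozen=True)
class WriteRequest:
    """Hypothetical description of a pending write."""
    operation: str                  # e.g. "append", "overwrite", "create"
    is_v1_table: bool               # True when the target is a DSv1 table
    schema_evolution: bool = False  # True after withSchemaEvolution()


def validate_schema_evolution(request: WriteRequest) -> None:
    """Raise ValueError if schema evolution is enabled on an unsupported write."""
    if not request.schema_evolution:
        return  # nothing to check when the option is off
    if request.operation in UNSUPPORTED_OPERATIONS:
        raise ValueError(
            f"Schema evolution is not supported for '{request.operation}' writes"
        )
    if request.is_v1_table:
        raise ValueError("Schema evolution is not supported for DSv1 tables")
```

In this model, an append to a non-DSv1 table with schema evolution enabled passes validation, while the same flag on a "create" write or on a DSv1 target raises an error before any data is written.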

Does this PR introduce any user-facing change?

Yes, users can now enable schema evolution using the dataframe writer API, for example:

df.write
  .mode("append")
  .withSchemaEvolution()
  .saveAsTable(t)

and other variations of writes using DataFrameWriter and DataFrameWriterV2.

How was this patch tested?

Added tests covering enabling (and using) schema evolution via all supported dataframe write methods.
Added negative tests for calls that don't support schema evolution.

@johanl-db
Contributor Author

The PR is ready for review.
I still need to figure out how to get the protobuf files in sync.


2 participants