[SPARK-19335][SPARK-38200][SQL] Add upserts for writing to JDBC using MERGE INTO with temp table #41611

Open: wants to merge 16 commits into master from jdbc-upsert-merge-temp-table

EnricoMi (Contributor) commented Jun 15, 2023

What changes were proposed in this pull request?

Implements upsert mode for SaveMode.Append of the MsSql, Postgres, Derby, H2 and Oracle JDBC sources.

This uses MERGE INTO in combination with a temporary table. A batch of rows is inserted into the temporary table (rather than the target table) and merged into the target table with one MERGE INTO command per batch.
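
The per-batch flow described above can be sketched as a small statement generator. This is a minimal illustration and not the code in this PR: the helper name `build_merge_sql`, the aliases `t` and `s`, and the table and column names are hypothetical, and the exact MERGE INTO syntax differs between database dialects.

```python
from typing import Sequence


def build_merge_sql(target: str, staging: str,
                    key_columns: Sequence[str],
                    columns: Sequence[str]) -> str:
    """Build a generic MERGE INTO statement that merges `staging` into `target`.

    Hypothetical helper for illustration only; real dialects vary in syntax.
    """
    if not key_columns:
        raise ValueError("at least one key column is required")
    # Columns that are not part of the key get updated when a row matches.
    value_columns = [c for c in columns if c not in key_columns]
    on_clause = " AND ".join(f"t.{c} = s.{c}" for c in key_columns)
    insert_cols = ", ".join(columns)
    insert_vals = ", ".join(f"s.{c}" for c in columns)
    parts = [
        f"MERGE INTO {target} t",
        f"USING {staging} s",
        f"ON ({on_clause})",
    ]
    if value_columns:
        set_clause = ", ".join(f"{c} = s.{c}" for c in value_columns)
        parts.append(f"WHEN MATCHED THEN UPDATE SET {set_clause}")
    parts.append(
        f"WHEN NOT MATCHED THEN INSERT ({insert_cols}) VALUES ({insert_vals})"
    )
    return "\n".join(parts)


# One statement like this would be issued per batch, after the batch
# has been inserted into the temporary table.
sql = build_merge_sql("people", "people_tmp", ["id"], ["id", "name"])
print(sql)
```

The key columns drive the ON clause, so a row in the temporary table either updates the matching target row or is inserted as a new one.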

See #41518 for an alternative for databases not supporting MERGE INTO syntax.

Why are the changes needed?

The JDBC writer only supports either truncating the existing table or inserting rows. Duplicates, i.e. rows with identical values in the primary or unique index columns, cause an exception, so updating existing rows while inserting new ones (an upsert) is not possible.

Re-evaluating a partition due to executor loss will insert rows that have already been inserted in an earlier attempt, which fails the entire Spark job.
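
The retry problem can be reproduced outside Spark. The sketch below uses Python's built-in sqlite3 module as a stand-in database; since SQLite has no MERGE INTO, the merge step is approximated with INSERT OR REPLACE from a temporary table, which is a different technique from the one in this PR. The table names `target` and `staging` and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, value TEXT)")
batch = [(1, "a"), (2, "b")]


def plain_insert(rows):
    """Insert rows directly into the target table, as a plain append does."""
    with conn:  # commits on success, rolls back on error
        conn.executemany("INSERT INTO target (id, value) VALUES (?, ?)", rows)


def upsert_via_staging(rows):
    """Load rows into a temporary table, then merge them in one statement."""
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS staging (id INTEGER, value TEXT)")
    with conn:
        conn.execute("DELETE FROM staging")
        conn.executemany("INSERT INTO staging (id, value) VALUES (?, ?)", rows)
        # SQLite stand-in for MERGE INTO: rows with an existing key are replaced.
        conn.execute(
            "INSERT OR REPLACE INTO target (id, value) "
            "SELECT id, value FROM staging"
        )


plain_insert(batch)  # first attempt succeeds
try:
    plain_insert(batch)  # simulated re-evaluation of the same partition
    retry_failed = False
except sqlite3.IntegrityError:
    retry_failed = True  # duplicate primary key

upsert_via_staging([(2, "b2"), (3, "c")])
upsert_via_staging([(2, "b2"), (3, "c")])  # repeating the batch is harmless
rows = conn.execute("SELECT id, value FROM target ORDER BY id").fetchall()
```

The repeated plain insert raises an integrity error, while the repeated staged merge leaves the target table in the same state as a single attempt.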

Does this PR introduce any user-facing change?

This adds upsert and upsertKeyColumns options for SaveMode.Append of the JDBC source.
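
A write using these options might look like the following PySpark-oriented sketch. Only the option names `upsert` and `upsertKeyColumns` come from this description; the value formats ("true" and a comma-separated column list), the helper name `upsert_write_options`, and the URL and table name are assumptions made for illustration.

```python
from typing import Dict, Sequence


def upsert_write_options(url: str, table: str,
                         key_columns: Sequence[str]) -> Dict[str, str]:
    """Collect JDBC writer options for an upsert (value formats are assumed)."""
    return {
        "url": url,
        "dbtable": table,
        "upsert": "true",  # assumption: boolean-like string enables upsert mode
        "upsertKeyColumns": ",".join(key_columns),  # assumption: comma-separated
    }


# Hypothetical connection URL and table name.
options = upsert_write_options(
    "jdbc:postgresql://localhost:5432/db", "people", ["id"]
)

# With a Spark build that includes this patch and an existing DataFrame `df`:
# df.write.format("jdbc").options(**options).mode("append").save()
```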

How was this patch tested?

Tests in JdbcSuite and integration suites.

EnricoMi force-pushed the jdbc-upsert-merge-temp-table branch from 7a0e4d2 to 9cc6b39 (June 15, 2023 09:39)
EnricoMi changed the title from "[SPARK-38200][SQL] JDBC upsert MERGE INTO using temp table" to "[SPARK-38200][SQL] Add upserts for writing to JDBC using MERGE INTO with temp table" (Jun 15, 2023)
EnricoMi force-pushed the jdbc-upsert-merge-temp-table branch 3 times, most recently from 1ade6b4 to 76d0429 (June 16, 2023 06:57)
EnricoMi force-pushed the jdbc-upsert-merge-temp-table branch from 76d0429 to 0439e5d (June 23, 2023 10:23)
EnricoMi changed the title from "[SPARK-38200][SQL] Add upserts for writing to JDBC using MERGE INTO with temp table" to "[SPARK-19335][SPARK-38200][SQL] Add upserts for writing to JDBC using MERGE INTO with temp table" (Jun 23, 2023)
EnricoMi force-pushed the jdbc-upsert-merge-temp-table branch from 95dc877 to 265dd1f (June 30, 2023 09:34)
github-actions bot removed the CORE label (Jun 30, 2023)
EnricoMi force-pushed the jdbc-upsert-merge-temp-table branch from 265dd1f to 2c3faec (July 18, 2023 12:46)
EnricoMi force-pushed the jdbc-upsert-merge-temp-table branch from cdb889d to db2d78d (July 26, 2023 12:32)
github-actions bot removed the INFRA label (Jul 26, 2023)
EnricoMi force-pushed the jdbc-upsert-merge-temp-table branch from db2d78d to 464a19a (October 9, 2023 13:39)
EnricoMi force-pushed the jdbc-upsert-merge-temp-table branch 2 times, most recently from 7658bbc to 80a2a9c (October 26, 2023 10:21)
EnricoMi force-pushed the jdbc-upsert-merge-temp-table branch 3 times, most recently from 38ce6b9 to 60e41ca (January 23, 2024 09:04)
github-actions bot removed the CONNECT label (Jan 23, 2024)
EnricoMi force-pushed the jdbc-upsert-merge-temp-table branch from 10fd22b to e3ade6e (April 20, 2024 16:19)
github-actions bot removed the DOCS label (Apr 20, 2024)