
[SPARK-19478][SS] JDBC Sink #17190

Closed
wants to merge 16 commits

GaalDornick

What changes were proposed in this pull request?

Implementation of a Sink that supports storing Structured Streaming data in a JDBC-compliant RDBMS. It supports Overwrite and Append modes. By default it provides at-least-once semantics, and it can be configured to provide exactly-once semantics.

To keep track of batches that have been written to a table, the sink creates a log table named $_SINK_LOG. This table has two columns: the batch ID and the status of the batch. The status is either COMMITTED or UNCOMMITTED. When the JDBC Sink receives a batch, it checks whether the sink log table has an entry for that batch with status = COMMITTED. If so, it ignores the batch; otherwise it attempts the append/overwrite operation.
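The replay check described above can be sketched as a small decision function (a sketch only; the names `SinkLogStatus` and `shouldWriteBatch` are illustrative, not taken from this PR):

```scala
// Illustrative names, not from the PR. The sink log stores one row per batch.
object SinkLogStatus {
  val Committed   = "COMMITTED"
  val Uncommitted = "UNCOMMITTED"
}

// Given the status recorded in the sink log table for a batch id
// (None when no row exists), decide whether the batch must be (re)written.
def shouldWriteBatch(logStatus: Option[String]): Boolean = logStatus match {
  case Some(SinkLogStatus.Committed) => false // already durable: skip the replay
  case _                             => true  // missing or UNCOMMITTED: write it
}
```

This is what makes the default mode at-least-once: a batch whose commit never got recorded is simply written again.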

To enable exactly-once semantics, the client should add a column of LongType to the target table that stores the batch ID, and pass its name in the options under the key batchIdCol. If the JDBC Sink finds this option set, it runs in exactly-once mode: it sets batchIdCol to the ID of the batch that is inserting or overwriting the record. Additionally, at the beginning of a batch, if it finds a batch with status = UNCOMMITTED, it deletes the records in the target table that match that batch ID.
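A minimal in-memory simulation of that protocol (all names here are assumptions for illustration; a real sink would issue the same steps as JDBC statements against the target table and the sink log table):

```scala
import scala.collection.mutable

// In-memory sketch of the exactly-once flow described above. `table` stands in
// for the target table (batchIdCol, payload); `sinkLog` for the sink log table.
class SimulatedJdbcSink {
  val table   = mutable.Buffer.empty[(Long, String)] // (batchIdCol, payload)
  val sinkLog = mutable.Map.empty[Long, String]      // batchId -> status

  def addBatch(batchId: Long, rows: Seq[String]): Unit = {
    if (sinkLog.get(batchId).contains("COMMITTED")) return // replayed batch: skip
    if (sinkLog.get(batchId).contains("UNCOMMITTED")) {
      // Recovery: delete the partial write tagged with this batch id.
      val stale = table.filter(_._1 == batchId)
      table --= stale
    }
    sinkLog(batchId) = "UNCOMMITTED"
    rows.foreach(r => table += ((batchId, r))) // every row carries batchIdCol
    sinkLog(batchId) = "COMMITTED"
  }
}
```

On restart, a batch left UNCOMMITTED by a crash is first purged by its batch ID, so re-running the same batch cannot leave duplicates behind.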

How was this patch tested?

Implemented JDBCSinkSuite, modeled along the lines of the other Sink test suites.

@GaalDornick GaalDornick changed the title [SPARK-19478][SS] JDBC Sink [WIP] [SPARK-19478][SS] JDBC Sink Mar 10, 2017
@sgireddy

@GaalDornick @AmplabJenkins Appreciate the excellent work on bringing JDBC support to streams. I really want this feature completed. Is it going to be back-ported to 2.1.x? Could you please provide an update on this?

wuciawe commented Oct 25, 2017

@GaalDornick hi, I think

def quote(colName: String): String = {
  s""""$colName""""
}

should be

def quote(colName: String): String = {
  s"""`$colName`"""
}

(quoted in backticks instead of double quotes)?
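(For context: identifier quoting is dialect-specific. ANSI SQL, PostgreSQL, and Oracle use double quotes, while MySQL defaults to backticks, so a JDBC sink would likely need to choose per dialect rather than hard-code one style. A hedged sketch of that idea; the URL-prefix dispatch below is illustrative only, and a real implementation would consult the driver or dialect instead:)

```scala
// Illustrative dialect-aware quoting: pick the quote character based on the
// JDBC URL instead of hard-coding one style. Real code would use the dialect.
def quote(colName: String, jdbcUrl: String): String =
  if (jdbcUrl.startsWith("jdbc:mysql:")) s"`$colName`" // MySQL: backticks
  else s""""$colName""""                               // ANSI SQL: double quotes
```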

@AydinChavez

Is anybody responsible for bringing this one to life? I think a JDBC sink (or database writing in general) is an essential feature for Structured Streaming.

The manual workaround of implementing a custom ForEachWriter is not practical because of the many connection opens and teardowns in its open and close methods; see also: https://stackoverflow.com/questions/47130229/spark-2-2-struct-streaming-foreach-writer-jdbc-sink-lag
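To illustrate the churn: Spark invokes a ForeachWriter's open and close once per partition per micro-batch. The sketch below mirrors that calling pattern with a stand-in trait (MiniForeachWriter, ChurnyJdbcWriter, and runEpochs are illustrative names, not Spark API) and counts how many connection setups that implies:

```scala
// Stand-in for the shape of Spark's ForeachWriter (open/process/close).
trait MiniForeachWriter[T] {
  def open(partitionId: Long, epochId: Long): Boolean
  def process(value: T): Unit
  def close(errorOrNull: Throwable): Unit
}

// Counts how many times a "connection" would be opened.
class ChurnyJdbcWriter extends MiniForeachWriter[String] {
  var connectionsOpened = 0
  override def open(partitionId: Long, epochId: Long): Boolean = {
    connectionsOpened += 1 // a real writer opens a JDBC connection here
    true
  }
  override def process(value: String): Unit = () // the INSERT would go here
  override def close(errorOrNull: Throwable): Unit = () // and conn.close() here
}

// Mimic the driver: open/close once per partition per micro-batch (epoch).
def runEpochs(writer: MiniForeachWriter[String], partitions: Int, epochs: Int): Unit =
  for (e <- 0 until epochs; p <- 0 until partitions)
    if (writer.open(p, e)) { writer.process(s"row-$p-$e"); writer.close(null) }
```

With 4 partitions over 10 micro-batches, this opens and closes a connection 40 times, which is the overhead the linked Stack Overflow question describes.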

@GaalDornick
Author

Sorry for abandoning this. Michael Armbrust had indicated to me that this should really be a Spark package, not part of Spark itself, so it is unlikely that this will get merged. I haven't had the time to create a package.

Also, Spark 2.3 contains a rewrite of the data source/sink interface, so this component would probably need to be rewritten against the new interface.

@AmplabJenkins

Can one of the admins verify this patch?

@mshtelma

@GaalDornick
Have you thought about publishing this code as an external Spark package?
If you don't mind, I can try doing this.

@gaborgsomogyi
Contributor

@GaalDornick are you still working on this? It's been open for 2 years...

@gaborgsomogyi
Contributor

@HyukjinKwon or somebody can we just close this?
Open for almost 2 years, and here is the agreement with Michael:

Michael Armbrust had indicated to me that this should really be a Spark package, and not part of Spark itself.

@gaborgsomogyi
Contributor

Thanks @HyukjinKwon
