
[Feature]support CDC in framework (not only flink cdc, also for spark) #963

Open · Tracked by #720
CalvinKirs opened this issue Jan 6, 2022 · 8 comments

CalvinKirs (Member) commented Jan 6, 2022

Support change data capture (CDC) in SeaTunnel on both the Flink and Spark backends:

  • for Flink, we can integrate Flink CDC directly, which is itself built on Debezium (see the sketch below).
  • for Spark, maybe we can use Debezium directly.
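
A minimal sketch of the direct Flink CDC integration, assuming the com.ververica flink-connector-mysql-cdc dependency; the hostname, credentials, and table names are placeholders:

```scala
import com.ververica.cdc.connectors.mysql.source.MySqlSource
import com.ververica.cdc.debezium.JsonDebeziumDeserializationSchema
import org.apache.flink.api.common.eventtime.WatermarkStrategy
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment

object FlinkCdcSketch {
  def main(args: Array[String]): Unit = {
    // Placeholder connection settings; Flink CDC embeds Debezium under the hood.
    val source = MySqlSource.builder[String]()
      .hostname("localhost")
      .port(3306)
      .databaseList("test")
      .tableList("test.orders")
      .username("root")
      .password("123456")
      .deserializer(new JsonDebeziumDeserializationSchema())
      .build()

    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.fromSource(source, WatermarkStrategy.noWatermarks[String](), "MySQL CDC Source")
      .print()
    env.execute("flink-cdc-sketch")
  }
}
```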
CalvinKirs changed the title from "support CDC in framework (not only flink cdc, maybe others)" to "[Feature]support CDC in framework (not only flink cdc, maybe others)" on Jan 6, 2022
CalvinKirs added this to discussion in 2.0 RoadMap on Jan 6, 2022
xtr1993 (Contributor) commented Jan 7, 2022

CDC is for real-time computation, and Flink supports CDC through its Debezium compatibility.
Regarding "maybe others": my understanding is that you want to achieve it in other ways, without Debezium?
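
For context, Flink's Debezium compatibility also surfaces as the debezium-json format on a Kafka table. A minimal sketch, assuming the flink-sql-connector-kafka dependency; the topic name and column list are placeholders:

```scala
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment

object DebeziumJsonSketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val tableEnv = StreamTableEnvironment.create(env)

    // Consume Debezium change events from a Kafka topic; the topic name and
    // columns are placeholders for illustration.
    tableEnv.executeSql(
      """CREATE TABLE orders (
        |  id BIGINT,
        |  amount DECIMAL(10, 2)
        |) WITH (
        |  'connector' = 'kafka',
        |  'topic' = 'dbserver1.test.orders',
        |  'properties.bootstrap.servers' = 'localhost:9092',
        |  'properties.group.id' = 'cdc-test',
        |  'scan.startup.mode' = 'earliest-offset',
        |  'format' = 'debezium-json'
        |)""".stripMargin)

    // Flink interprets the Debezium envelope as INSERT/UPDATE/DELETE changes.
    tableEnv.executeSql("SELECT * FROM orders").print()
  }
}
```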

garyelephant changed the title from "[Feature]support CDC in framework (not only flink cdc, maybe others)" to "[Feature]support CDC in framework (not only flink cdc, also for spark)" on Jan 8, 2022
chenhu (Contributor) commented Jan 10, 2022

It would be wow ~ for Spark!

yuangjiang commented:

Spark CDC is feasible; we only need to define a Spark source to do a simple test. CDC can be developed based on Debezium, similar to this:
```scala
import java.util

import org.apache.spark.internal.Logging
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.execution.streaming.Source
import org.apache.spark.sql.sources.{DataSourceRegister, StreamSourceProvider}
import org.apache.spark.sql.types.StructType

class DefaultSource extends StreamSourceProvider with DataSourceRegister with Logging {

  override def sourceSchema(sqlContext: SQLContext, schema: Option[StructType],
                            providerName: String, parameters: Map[String, String]): (String, StructType) =
    (shortName(), schema.get)

  override def createSource(sqlContext: SQLContext, metadataPath: String, schema: Option[StructType],
                            providerName: String, parameters: Map[String, String]): Source = {
    // Wire up the embedded Debezium engine: the Handover hands change events
    // from the Debezium consumer thread over to the Spark source.
    val debeziumOffset: DebeziumOffset = new DebeziumOffset
    val handover: Handover = new Handover
    val changeConsumer: DebeziumChangeConsumer = new DebeziumChangeConsumer(handover)
    val debeziumEngine: SparkDebeziumEngine = new SparkDebeziumEngine(debeziumOffset, handover, changeConsumer)

    // Debezium expects its connector properties as a java.util.Map.
    val javaParameters: java.util.Map[String, String] = new util.HashMap[String, String]()
    for ((key, value) <- parameters) {
      javaParameters.put(key, value)
    }
    debeziumEngine.EngineInit(javaParameters)
    new DebeziumSource(sqlContext, schema.get, debeziumOffset, debeziumEngine)
  }

  override def shortName(): String = "debezium"
}
```
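
For completeness, the DebeziumSource returned above would implement Spark's org.apache.spark.sql.execution.streaming.Source trait. A minimal sketch; DebeziumOffset and SparkDebeziumEngine are the custom classes from the snippet above, and the latestOffset(), pollBatch(), and shutdown() helpers are hypothetical:

```scala
import org.apache.spark.sql.{DataFrame, Row, SQLContext}
import org.apache.spark.sql.execution.streaming.{Offset, Source}
import org.apache.spark.sql.types.StructType

// Sketch only: latestOffset(), pollBatch() and shutdown() are assumed helpers
// on the custom classes from the snippet above.
class DebeziumSource(sqlContext: SQLContext,
                     sourceSchema: StructType,
                     offset: DebeziumOffset,
                     engine: SparkDebeziumEngine) extends Source {

  override def schema: StructType = sourceSchema

  // Report the latest position the embedded Debezium engine has reached.
  override def getOffset: Option[Offset] = Option(offset.latestOffset())

  // Materialize the change records buffered between the two offsets as a DataFrame.
  override def getBatch(start: Option[Offset], end: Offset): DataFrame = {
    val rows: java.util.List[Row] = engine.pollBatch(start, end)
    sqlContext.createDataFrame(rows, sourceSchema)
  }

  override def stop(): Unit = engine.shutdown()
}
```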

obobj commented Feb 21, 2022

Is there any progress?

UUIDUsername commented:

Can we introduce a BloomFilter, given the time-ordered nature of the data?

yuangjiang commented:

This is a Spark data source that can be used by a SeaTunnel connector; test code looks like this:

```scala
import java.util

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.OutputMode
import org.apache.spark.sql.types.{DecimalType, StringType, StructField, StructType}

object DebeziumTest {
  def main(args: Array[String]): Unit = {
    val sparkSession = SparkSession
      .builder()
      .master("local[3]")
      .appName("debezium")
      .getOrCreate()

    // Schema of the change records emitted by the Debezium source,
    // including the Debezium metadata columns `op` and `ts_ms`.
    val structType = StructType(
      StructField("a", StringType, nullable = true) ::
      StructField("b", DecimalType(6, 2), nullable = true) ::
      StructField("c", DecimalType(22, 0), nullable = true) ::
      StructField("d", DecimalType(6, 0), nullable = true) ::
      StructField("e", DecimalType(22, 0), nullable = true) ::
      StructField("f", DecimalType(6, 0), nullable = true) ::
      StructField("g", DecimalType(22, 0), nullable = true) ::
      StructField("h", StringType, nullable = true) ::
      StructField("op", StringType, nullable = true) ::
      StructField("ts_ms", StringType, nullable = true) ::
      Nil
    )

    // Debezium MySQL connector properties.
    val parameters = new util.HashMap[String, String]()
    parameters.put("database.hostname", "localhost")
    parameters.put("database.port", "3306")
    parameters.put("database.user", "root")
    parameters.put("database.password", "123456")
    parameters.put("database.include.list", "test")
    parameters.put("snapshot.mode", "schema_only")
    parameters.put("table.include.list", "temp.ts_cdc_test")
    parameters.put("connector.class", "io.debezium.connector.mysql.MySqlConnector")

    val dataFrame = sparkSession
      .readStream
      .options(parameters)
      .schema(structType)
      .format("org.apache.spark.sql.execution.streaming.debezium.DefaultSource")
      .load()

    dataFrame.createOrReplaceTempView("test")
    sparkSession.sql("select * from test")
      .writeStream
      .format("console")
      .outputMode(OutputMode.Append())
      .start()
      .awaitTermination()
  }
}
```

This could support all Debezium connectors via SeaTunnel's stream mode.
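
For instance, pointing the same source at PostgreSQL should only require swapping the Debezium connector properties; a sketch, with placeholder values:

```scala
// Same readStream pipeline as above; only the Debezium connector properties change.
parameters.put("connector.class", "io.debezium.connector.postgresql.PostgresConnector")
parameters.put("database.dbname", "test")                  // Postgres-specific property
parameters.put("plugin.name", "pgoutput")                  // logical decoding plugin
parameters.put("table.include.list", "public.ts_cdc_test")
```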

yuangjiang commented Feb 23, 2022 via email

dijiekstra (Contributor) commented:

I think we need to upgrade all `Row` to `RowData` in the Flink module first.
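
For reference, a minimal sketch of what that migration means, using Flink's table runtime classes (GenericRowData, StringData); the field values are placeholders:

```scala
import org.apache.flink.table.data.{GenericRowData, RowData, StringData}
import org.apache.flink.types.Row

object RowMigrationSketch {
  def main(args: Array[String]): Unit = {
    // Old external representation used by the legacy connector stack.
    val row: Row = Row.of("user_1", Integer.valueOf(42))

    // Internal representation used by the new table runtime; note that
    // strings must be wrapped in StringData.
    val rowData: RowData = GenericRowData.of(StringData.fromString("user_1"), Integer.valueOf(42))

    println(row)
    println(rowData.getString(0))
  }
}
```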

Labels: discussion
Projects: 2.0 RoadMap (Status: Discuss)
Development: No branches or pull requests
8 participants