[SUPPORT] HoodieDeltaStreamer AWSDmsAvroPayload fails to handle deletes in MySQL #6590
Labels
on-call-triaged
priority:critical (production down; pipelines stalled; need help ASAP)
release-0.11.0
writer-core (issues relating to core transactions/write actions)
Describe the problem you faced
Inspired by this blog, I am trying to set up Hudi DeltaStreamer to continuously pick up changes in MySQL for a performance benchmark. My setup hosts MySQL on AWS RDS, captures changes in MySQL with AWS DMS as Parquet files in S3, and runs HoodieDeltaStreamer with --continuous on AWS EMR to write the changes into a Hudi table on S3. It works fine with inserts and updates but throws exceptions on deletes, and the row deleted in MySQL is not deleted in the Hudi table. I am new to Hudi, so it's possible I have something configured wrong.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
Hudi should monitor and capture any changes (inserts, updates, and deletes) in the MySQL table and write them into the Hudi table.
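As a rough illustration of the semantics expected here, the following is a minimal plain-Python sketch, not Hudi code. It assumes each change record written by AWS DMS carries an Op column with the value I, U, or D; the function name apply_changes, the key column id, and the sample rows are hypothetical and chosen only for the example.

```python
def apply_changes(table, changes, key="id"):
    """Apply DMS-style change records to an in-memory table keyed by primary key.

    This only models the expected outcome; it is not how Hudi implements merging.
    """
    for record in changes:
        op = record["Op"]
        row_key = record[key]
        if op == "D":
            # A delete in the source should remove the row from the target.
            table.pop(row_key, None)
        elif op in ("I", "U"):
            # Inserts and updates upsert the row, dropping the Op marker column.
            table[row_key] = {k: v for k, v in record.items() if k != "Op"}
        else:
            raise ValueError(f"unexpected Op value: {op!r}")
    return table


# Hypothetical change stream: two inserts, one update, one delete.
changes = [
    {"Op": "I", "id": 1, "name": "a"},
    {"Op": "I", "id": 2, "name": "b"},
    {"Op": "U", "id": 1, "name": "a2"},
    {"Op": "D", "id": 2, "name": "b"},
]
result = apply_changes({}, changes)
print(result)
```

In this model the row with id 2 is gone after the D record is applied, which is the behavior that is missing in the Hudi table.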
I specified --payload-class org.apache.hudi.payload.AWSDmsAvroPayload, which should tell Hudi the right way to handle a row with Op = D. That is, when a row in MySQL is deleted, Hudi should capture the change and delete the corresponding row in the Hudi table.
Environment Description
Hudi version : 0.11.0
Spark version : 3.2.1
Hive version : should be irrelevant, but 3.1.3
Hadoop version : 3.2.1
Storage (HDFS/S3/GCS..) : S3
Running on Docker? (yes/no) : no
Additional context
The command I ran to start Hudi is slightly different from that provided in the blog. The original one didn't work for me out of the box. Please let me know if I passed in the wrong configs in the command that might've caused this issue.
Stacktrace