
[FLINK-28910][Connectors/hbase]Fix potential data deletion while updating HBase rows #20542

Closed
wants to merge 2 commits

Conversation

ganlute

@ganlute ganlute commented Aug 11, 2022

What is the purpose of the change

https://issues.apache.org/jira/browse/FLINK-28910

Brief change log

  • Add a reduce step when the HBase connector processes mutations.
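
The reduce step described above can be sketched as follows — a minimal, hypothetical illustration (class and method names are invented, not the connector's actual code) of buffering mutations keyed by row key so that only the latest mutation per row is flushed:

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MutationBuffer {
    // ByteBuffer gives content-based equals/hashCode for byte[] row keys
    private final Map<ByteBuffer, String> pending = new LinkedHashMap<>();

    synchronized void add(byte[] rowKey, String mutation) {
        // a later mutation for the same row key replaces the earlier one
        pending.put(ByteBuffer.wrap(rowKey), mutation);
    }

    synchronized List<String> flush() {
        List<String> batch = new ArrayList<>(pending.values());
        pending.clear();
        return batch;
    }
}
```

With this reduction, an insert followed by a delete of the same row within one buffer window flushes only the final delete, instead of sending both mutations to HBase.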

Verifying this change

CI passed

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? no

@ganlute ganlute closed this Aug 11, 2022
@ganlute ganlute deleted the FLINK-28910 branch August 11, 2022 04:00
@ganlute ganlute restored the FLINK-28910 branch August 11, 2022 04:00
@ganlute ganlute reopened this Aug 11, 2022
@flinkbot
Collaborator

flinkbot commented Aug 11, 2022

CI report:

Bot commands: The @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build

@ganlute ganlute changed the title [FLINK-28910]CDC From Mysql To Hbase Bugs [WIP][FLINK-28910][connectors/hbase]CDC From Mysql To Hbase Bugs Aug 11, 2022
@ganlute ganlute changed the title [WIP][FLINK-28910][connectors/hbase]CDC From Mysql To Hbase Bugs [FLINK-28910][connectors/hbase]CDC From Mysql To Hbase Bugs Aug 12, 2022
@ganlute
Author

ganlute commented Aug 12, 2022

@flinkbot run azure

@ganlute ganlute changed the title [FLINK-28910][connectors/hbase]CDC From Mysql To Hbase Bugs [WIP][FLINK-28910][connectors/hbase]CDC From Mysql To Hbase Bugs Aug 12, 2022
@ganlute
Author

ganlute commented Aug 18, 2022

@flinkbot run azure

@ganlute ganlute changed the title [WIP][FLINK-28910][connectors/hbase]CDC From Mysql To Hbase Bugs [FLINK-28910][connectors/hbase]CDC From Mysql To Hbase Bugs Aug 18, 2022
@ganlute
Author

ganlute commented Aug 18, 2022

Hi @luoyuxia, could you please help me review the changes? Thank you.

@ganlute ganlute changed the title [FLINK-28910][connectors/hbase]CDC From Mysql To Hbase Bugs [FLINK-28910][Connectors/hbase]CDC From Mysql To Hbase Bugs Aug 25, 2022
@ganlute
Author

ganlute commented Aug 25, 2022

@wuchong @dannycranmer could you please help me review the changes? Thank you.😄

@ganlute
Author

ganlute commented Sep 2, 2022

@MartijnVisser could you please help me review the changes? Thank you.😄

@ganlute ganlute changed the title [FLINK-28910][Connectors/hbase]CDC From Mysql To Hbase Bugs [FLINK-28910][Connectors/hbase]Hbase Sink Bug Sep 13, 2022
Contributor

@kylemeow kylemeow left a comment

For clarity, the title could be changed to something like 'Fix potential data deletion while updating HBase rows'. Just my suggestion : )

@ganlute
Author

ganlute commented Sep 13, 2022

For clarity, the title could be changed to something like 'Fix potential data deletion while updating HBase rows'. Just my suggestion : )

Thank you for your suggestion, I think it is really much clearer.

@ganlute ganlute changed the title [FLINK-28910][Connectors/hbase]Hbase Sink Bug [FLINK-28910][Connectors/hbase]Fix potential data deletion while updating HBase rows Sep 13, 2022
@MartijnVisser
Contributor

@MartijnVisser could you please help me review the changes? Thank you.😄

@ganlute I have no experience with HBase, so unfortunately I can't review it. To be honest, I think the Flink community is lacking HBase maintainers in general.

@MartijnVisser
Contributor

@leonardBang Do you think you could have a look? Since you have experience with CDC, I thought you might be able to help out here :)

Contributor

@dannycranmer dannycranmer left a comment

I also have no experience with HBase; however, your thread-safe logic looks OK, besides the following callouts.

@@ -76,6 +79,7 @@
private transient ScheduledExecutorService executor;
private transient ScheduledFuture scheduledFuture;
private transient AtomicLong numPendingRequests;
private static Map<byte[], Mutation> mutationMap = new HashMap<>();
Contributor

@dannycranmer dannycranmer Sep 15, 2022

Why is this static? This means subtasks of the same job would all have access to, and try to flush the same data.
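
To illustrate the concern, here is a standalone, hypothetical sketch (SinkSubtask is an invented stand-in, not the connector's class): a static field belongs to the class, not the instance, so every sink subtask running in the same JVM reads and writes the same buffer.

```java
import java.util.HashMap;
import java.util.Map;

class SinkSubtask {
    // static: a single map shared by ALL subtask instances in this JVM
    private static final Map<String, String> sharedBuffer = new HashMap<>();

    void buffer(String rowKey, String mutation) {
        sharedBuffer.put(rowKey, mutation);
    }

    int bufferedCount() {
        return sharedBuffer.size();
    }
}

public class StaticBufferDemo {
    public static void main(String[] args) {
        SinkSubtask subtaskA = new SinkSubtask();
        SinkSubtask subtaskB = new SinkSubtask();
        subtaskA.buffer("rk1", "+I");
        // subtaskB never wrote anything, yet it sees subtaskA's mutation
        System.out.println(subtaskB.bufferedCount()); // prints 1
    }
}
```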

Author

Thank you for your review. I agree. I originally tried to make it a global queue to deduplicate row keys, but in this case there is in fact no need to declare it as static.

Author

I will fix it later.

@@ -213,6 +225,7 @@ public void close() throws Exception {

if (mutator != null) {
try {
flush();
Contributor

While your thread safety looks ok (besides the static Map), generally speaking flush() on close() can cause issues. If the destination is down, the job might fail to stop. A better solution is to checkpoint the internal buffer. Longer term we can consider migrating to the Async Sink base.

Contributor

Hi @dannycranmer, personally I reckon this might not be an issue because, according to the documentation, the mutator's close() method inherently flushes buffered data to the HBase server before closing the connection, so the flush logic was already there before this PR.

Also, as it is already in the try ... catch block, when an IOException is thrown by the client during the flush, the job-stopping process would not be interrupted either.

Author

master:

    close() {
        ...
        mutator.close();
        ...
    }

this PR:

    close() {
        ...
        flush();
        mutator.close();
        ...
    }

On the one hand, mutator.close() calls mutator.flush() as well, so I think the mentioned problem would happen on master too. On the other hand, if the destination is down, mutator.close()/mutator.flush() will throw an IOException.


Mutation mutation = mutationConverter.convertToMutation(value);
synchronized (mutationMap) {
mutationMap.put(mutation.getRow(), mutation);
Contributor

@zjuwangg zjuwangg Sep 15, 2022

In this way, when multiple mutations share the same rowkey, only the last one will remain.
But the mutationConverter behavior is not controlled, which means mutations such as the following will cause a data quality problem:

-D (rk1, f1:v1)

+I (rk1, f2:v2)

Contributor

For now, all mutationConverter implementations are OK in this case.

@ganlute
Author

ganlute commented Sep 18, 2022

The CI failure seems to have nothing to do with this PR.


Mutation mutation = mutationConverter.convertToMutation(value);
synchronized (mutationMap) {
mutationMap.put(mutation.getRow(), mutation);
Contributor

One potential bug might arise because mutation.getRow() returns a byte array. As we know, the hashCode and equals of two different array instances differ regardless of whether their contents are identical.

Contributor

To overcome this, I suggest converting it to a Base64 string, e.g.:

String key = Base64.getEncoder().encodeToString(mutation.getRow());

Or creating a simple wrapper class where equals and hashCode are overridden properly for arrays.
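
The pitfall and both suggested fixes can be demonstrated in isolation with the plain JDK (no connector code involved); ByteBuffer.wrap is a ready-made wrapper whose equals/hashCode are content-based:

```java
import java.nio.ByteBuffer;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

public class RowKeyDemo {
    public static void main(String[] args) {
        byte[] rk1 = {1, 2, 3};
        byte[] rk2 = {1, 2, 3}; // same content, different instance

        // byte[] keys use identity-based hashCode/equals: no deduplication
        Map<byte[], String> byArray = new HashMap<>();
        byArray.put(rk1, "+I");
        byArray.put(rk2, "-D");
        System.out.println(byArray.size()); // prints 2

        // Fix 1: Base64-encode the row key into a String
        Map<String, String> byString = new HashMap<>();
        byString.put(Base64.getEncoder().encodeToString(rk1), "+I");
        byString.put(Base64.getEncoder().encodeToString(rk2), "-D");
        System.out.println(byString.size()); // prints 1

        // Fix 2: wrap the row key in a ByteBuffer (content-based equality)
        Map<ByteBuffer, String> byBuffer = new HashMap<>();
        byBuffer.put(ByteBuffer.wrap(rk1), "+I");
        byBuffer.put(ByteBuffer.wrap(rk2), "-D");
        System.out.println(byBuffer.size()); // prints 1
    }
}
```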

Author

Thank you for your suggestion; I will improve it.

Contributor

@ferenc-csaky ferenc-csaky left a comment

Thank you for the ByteBuffer improvement, I added another comment, but the logic itself LGTM.

@@ -76,6 +80,7 @@
private transient ScheduledExecutorService executor;
private transient ScheduledFuture scheduledFuture;
private transient AtomicLong numPendingRequests;
private Map<ByteBuffer, Mutation> mutationMap = new HashMap<>();
Contributor

This field could be final as well. Can we move it below mutationConverter and initialize it inside the constructor, to be consistent with the other field initialization?

Contributor

@YesOrNo828 YesOrNo828 left a comment

@ganlute I tested ingesting about 2 million records into HBase.
With sink.buffer-flush.max-rows=1000, the data eventually became consistent.
With sink.buffer-flush.max-rows=1, consistency could not be guaranteed.

@@ -201,6 +208,12 @@ public void invoke(T value, Context context) throws Exception {
}

private void flush() throws IOException {
synchronized (mutationMap) {
Contributor

Adding mutationMap to drop duplicated data on the client side cannot avoid the data consistency issue. For example, with sink.buffer-flush.max-rows=1:
+I(1,...)
-U(1,...)
+U(1,...)
These three rows are put into HBase with the same timestamp version.
In the end, HBase cannot find the data with rowkey=1.

Contributor

@kylemeow kylemeow Jun 26, 2023

Hi @YesOrNo828 , this issue has already been addressed in FLINK-32139, and the PR is merged into master. Therefore, this PR may no longer be needed.

@MartijnVisser
Contributor

So is this superseded by #22612 or not?

@ferenc-csaky
Contributor

So is this superseded by #22612 or not?

Yes, the two issues have the same root cause: an insert and a delete operation are passed to HBase with the same millisecond-precision timestamp, and in that case the order of HBase execution is not guaranteed. The changes made in #22612 explicitly set nanosecond-precision timestamps for the HBase operations, which eliminates the possibility of having multiple operations "at the same time", so deletes and inserts are executed in the correct order.
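
For intuition, the idea can be sketched as follows — a hypothetical, simplified generator (not the actual code from #22612) that derives strictly increasing timestamps from the millisecond clock, so no two mutations can ever carry the same version:

```java
import java.util.concurrent.atomic.AtomicLong;

public class MonotonicTimestamps {
    private static final AtomicLong lastTs = new AtomicLong();

    // Scale milliseconds into a nanosecond-like range, then bump by at
    // least 1 if the wall clock has not advanced since the previous call.
    public static long next() {
        long candidate = System.currentTimeMillis() * 1_000_000L;
        return lastTs.updateAndGet(prev -> Math.max(prev + 1, candidate));
    }

    public static void main(String[] args) {
        long a = next();
        long b = next();
        // even within the same millisecond, b is strictly greater than a
        System.out.println(a < b); // prints true
    }
}
```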

8 participants