[ISSUE-92] fix bug that flink never flush lsn to PG #107

eric3zhao · 2021-02-23T08:23:06Z

Support the Debezium heartbeat.interval.ms option
Flush the LSN to PG after the offset is written to the checkpoint

wuchong · 2021-02-24T07:14:28Z

Hi @eric3zhao, did you validate whether this PR works well in your production environment? Is it possible to add a test to cover this?

Besides, could you drop the first commit which has been merged into master?

eric3zhao · 2021-02-24T08:35:35Z

@wuchong I have not validated this in the PRD environment; I have only done some testing in DEV. The test case is as follows, run after the method DebeziumSourceFunction#snapshotState has completed:

step.1

read the offset info from the checkpoint: "sourceOffset":{"transaction_id":null,"lsn_proc":358680102240,"lsn":358680102240,"ts_usec":1614064954691947}

step.2

use SELECT * from pg_get_replication_slots() to get the slot info from PG

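| slot_name | plugin | slot_type | datoid | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| modelhome_zwz_test_book | decoderbufs | logical | 17497 | f | f | | | 1394055 | 53/7E0016A8 | 53/83000560 |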

step.3

compare sourceOffset.lsn_proc to confirmed_flush_lsn

358680102240 -> 0x5383000560 -> 53/83000560, so the offset in the checkpoint equals confirmed_flush_lsn in PG
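The conversion in step 3 can be sketched in Java as follows. This is only an illustration of the arithmetic: the class and method names (`LsnFormat`, `toPgLsn`, `fromPgLsn`) are hypothetical and are not part of this PR, Flink, or Debezium. It assumes the PG text form is the high 32 bits and the low 32 bits of the LSN, each in hexadecimal, separated by a slash.

```java
final class LsnFormat {
    // Format a 64-bit LSN as "HIGH/LOW", both halves in upper-case hex.
    static String toPgLsn(long lsn) {
        return String.format("%X/%X", lsn >>> 32, lsn & 0xFFFFFFFFL);
    }

    // Parse the "HIGH/LOW" text form back into a 64-bit LSN.
    static long fromPgLsn(String text) {
        String[] parts = text.split("/");
        if (parts.length != 2) {
            throw new IllegalArgumentException("not an LSN: " + text);
        }
        long high = Long.parseUnsignedLong(parts[0], 16);
        long low = Long.parseUnsignedLong(parts[1], 16);
        return (high << 32) | low;
    }

    public static void main(String[] args) {
        long lsnProc = 358680102240L; // lsn_proc from the checkpoint in step 1
        System.out.println(toPgLsn(lsnProc)); // prints 53/83000560
        System.out.println(fromPgLsn("53/83000560") == lsnProc); // prints true
    }
}
```

A test could compare `fromPgLsn(confirmed_flush_lsn)` with the `lsn_proc` value read from the checkpoint, assuming both values are available to the test.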

I also want to write a unit test for this, but I have no idea how to compare the offset in the checkpoint with the LSN in the database in Java code.

Lastly, I checked out my branch from the tag release-1.1.0, which is now far behind master. I will rebase my fix branch onto master.

wuchong · 2021-02-25T02:37:21Z

Please do not use "git merge" to rebase branches; otherwise the changes are hard to track. Please use "git rebase" instead.

eric3zhao · 2021-02-25T05:27:58Z

Sorry, but I have already pushed the commits to the remote. Should I create a new PR based on master?

wuchong · 2021-02-25T05:29:24Z

You can force push the branch.

eric3zhao · 2021-02-25T07:20:12Z

> You can force push the branch.

done

Tan-JiaLiang · 2021-02-26T09:41:12Z

That's great! I ran into the same situation with my Postgres database recently, so I was lucky to find this PR. I hope it gets merged into master as soon as possible. Thanks to the committers @eric3zhao and @wuchong.

wuchong · 2021-02-26T10:45:00Z

Thanks for the contribution @eric3zhao. However, I think there are 3 problems in the PR:

1. It's very hacky to get ChangeEventSourceCoordinator and invoke methods on it. Debezium actually provides an API to commit offsets: io.debezium.engine.DebeziumEngine.RecordCommitter.
2. We should commit offsets when the checkpoint completes, not while performing the checkpoint. Otherwise, we may lose data if the checkpoint fails.
3. There are no tests to cover this case.

I have fixed the problem in 4127661.
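The second point can be illustrated with a minimal, single-threaded sketch. It is not the code from the fix commit: `OffsetCommitSketch`, `onRecord`, and the `LongConsumer` committer are hypothetical stand-ins, and the method names `snapshotState` and `notifyCheckpointComplete` only mirror the Flink callbacks they are modeled on. The idea is to remember the offset seen at each snapshot and to hand it to the committer only once that checkpoint is reported complete.

```java
import java.util.TreeMap;
import java.util.function.LongConsumer;

final class OffsetCommitSketch {
    // Offsets captured at snapshot time, keyed by checkpoint id (hypothetical structure).
    private final TreeMap<Long, Long> pendingLsnByCheckpoint = new TreeMap<>();
    // Hypothetical stand-in for whatever flushes the LSN to PG.
    private final LongConsumer committer;
    private long currentLsn;

    OffsetCommitSketch(LongConsumer committer) {
        this.committer = committer;
    }

    // Called for every consumed record; only tracks the latest offset.
    void onRecord(long lsn) {
        currentLsn = lsn;
    }

    // Performing the checkpoint: record the offset, but do not commit yet.
    void snapshotState(long checkpointId) {
        pendingLsnByCheckpoint.put(checkpointId, currentLsn);
    }

    // Checkpoint completed: now it is safe to commit the offset it contains.
    void notifyCheckpointComplete(long checkpointId) {
        Long lsn = pendingLsnByCheckpoint.get(checkpointId);
        if (lsn == null) {
            return; // unknown or already handled checkpoint
        }
        committer.accept(lsn);
        // Drop this checkpoint and all older ones; they are superseded.
        pendingLsnByCheckpoint.headMap(checkpointId, true).clear();
    }
}
```

If a checkpoint fails, `notifyCheckpointComplete` is never called for it in this sketch, so its offset is never committed and the data after the last completed checkpoint can still be replayed.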

eric3zhao · 2021-02-27T04:36:56Z

Great job, but I recommend updating debeziumOffset to the heartbeat's offset when a heartbeat record is received, rather than just skipping it:

if (isHeartbeatEvent(record)) {
    emitRecordsUnderCheckpointLock(new ArrayDeque<>(), record.sourcePartition(), record.sourceOffset());
    continue;
}

This is because, if heartbeat records are skipped, the offset is updated only when the table's data is updated. Suppose I set 'table-name' = 'table_A'; if table_A has no data updates for a long time, the WAL disk usage keeps growing. However, the offset in the heartbeat record is updated whenever the data of any table in the PG database is updated.

wuchong · 2021-02-27T05:30:01Z

@eric3zhao, you are right; otherwise the LSN can't advance when there are no updates. But we only need to update debeziumOffset under the checkpoint lock instead of calling emitRecordsUnderCheckpointLock.

Do you want to contribute a pull request?
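A rough sketch of that suggestion follows. It is not the connector's actual code: `HeartbeatOffsetSketch`, `Event`, and `handleBatch` are hypothetical names, and `Event` is a simplified stand-in for a Debezium source record. A heartbeat only advances `debeziumOffset` while holding the checkpoint lock; a data record is emitted and advances the offset under the same lock.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

final class HeartbeatOffsetSketch {
    // Hypothetical, simplified stand-in for a Debezium source record.
    static final class Event {
        final boolean heartbeat;
        final String payload;
        final Map<String, Long> sourceOffset;

        Event(boolean heartbeat, String payload, Map<String, Long> sourceOffset) {
            this.heartbeat = heartbeat;
            this.payload = payload;
            this.sourceOffset = sourceOffset;
        }
    }

    private final Object checkpointLock = new Object();
    private final List<String> emitted = new ArrayList<>();
    private Map<String, Long> debeziumOffset = Collections.emptyMap();

    void handleBatch(List<Event> batch) {
        for (Event event : batch) {
            if (event.heartbeat) {
                // Heartbeat: nothing to emit, only advance the offset under the lock.
                synchronized (checkpointLock) {
                    debeziumOffset = event.sourceOffset;
                }
                continue;
            }
            // Data record: emit and advance the offset atomically with respect to checkpoints.
            synchronized (checkpointLock) {
                emitted.add(event.payload);
                debeziumOffset = event.sourceOffset;
            }
        }
    }

    List<String> emitted() {
        return emitted;
    }

    Map<String, Long> debeziumOffset() {
        synchronized (checkpointLock) {
            return debeziumOffset;
        }
    }
}
```

With this shape, a table that receives no updates still sees its stored offset move forward whenever a heartbeat arrives, which is what allows the LSN to advance.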

wuchong linked an issue Feb 24, 2021 that may be closed by this pull request

Support ' heartbeat.interval.ms' options to reduce WAL disk space consumption of PG #92

Closed

[postgres][ISSUE-92] support debezium heartbeat.interval.ms option …

4569c6c

…and flush lsn to pg after offset was writen to checkpoint

eric3zhao force-pushed the pg-slot-flush branch from 1328a28 to 4569c6c Compare February 25, 2021 07:18

wuchong closed this Feb 26, 2021

eric3zhao deleted the pg-slot-flush branch February 27, 2021 05:47

eric3zhao mentioned this pull request Feb 27, 2021

[postgres] update debezium offset when receive heartbeat record #111

Closed
