Skip to content

Latest commit

 

History

History
95 lines (56 loc) · 6.86 KB

20201127_02.md

File metadata and controls

95 lines (56 loc) · 6.86 KB

正确的配置和使用 Debezium CDC 捕获、同步 PG 逻辑增量数据

作者

digoal

日期

2020-11-27

标签

PostgreSQL , CDC , Debezium


背景

https://elephanttamer.net/?p=50

注意事项:

  • decoder, 一定要使用 PG 10 以上版本, 内置pgoutput plugin. 因为低版本不支持pgoutput, 效率不高. 当然如果你用的是10以下的PG版本, 可以选择wal2json, decoderbufs
  • 检查wal_level, 需要logical
  • Debezium 版本, 千万不要使用2.2和3.0
  • 手工创建slot(pub发布), 为什么不让debezium自己创建呢, 因为debezium会发布所有表, 有些表可能没有PK, 有些表可能不是你需要订阅的, 非常影响性能.
  • 关注被发布表是否有PK, 建议一定要有PK, 没有PK会导致产生大量wal日志, 同时捕获的日志量也会暴增, 在下游也不能很有效率地恢复.
  • 监控消费(slot, 延迟), 如果终止消费或者debezium出现故障, 消费会延迟, 导致数据库wal文件堆积. 严重的把磁盘写满, 导致数据库故障

Debezium is a popular Open Source change data capture (CDC) solution. It’s based on a set of well-established components (Kafka, Kafka Connect, ZooKeeper) and used successfully in production by companies like Dehlivery or BlaBlaCar. However, the setup is not as straightforward as one may think, and you may encounter some obstacles. In this post, I will share some tips originating from hooking up Debezium to a quite big and busy OLTP Postgres database.

Originally, Debezium was connected to a Postgres DB using a wal2json or decoderbufs plugins. Don’t use them – as native logical replication became available in Postgres 10, a better option called “pgoutput” was introduced in Debezium. From the database perspective, the Debezium connector will be treated just like another logical replication subscriber, and usual rules of logical replication will apply.

Check wal_level

Logical decoding – on which logical replication and Debezium depend on – requires wal_level configuration parameter to be set to “logical”, and its change requires Postgres restart. Check this beforehand and schedule a database restart if necessary.

Pick right Debezium version

NEVER, EVER use Debezium versions between 2.2 and 3.0. Those versions come with a bug which is hard to understand for anoyne who isn’t a Postgres expert, but its consequences are simple and dangerous : using such buggy Debezium version will take the master database down sooner or later. This is because Debezium will consume data changes, but won’t confirm the consumption to the database server. The server will retain all WAL segments since replication slot creation, and you will eventually run out of disk space (or money, if you use some pay-as-you-go storage like Amazon EFS or keep adding new disks).

Create slot and publication manually

While Debezium is capable of creating required database entities (logical replication slot and publication) itself, it’s a bad idea. First – it requires superuser privileges for Debezium user. Second – it will create publication for all tables, what can have unpleasant side effects – for example it will silently block all write queries to tables that don’t have a primary key. Third – you may run into an “publication does not exist” error even if the publication is indeed created correctly, see here.

My advice is – create a publication and a replication slot manually before setting up a connector. Beware though – creation order matters! You have to create publication before you create replication slot, otherwise you will see “publication does not exist” error. The correct order is:

CREATE PUBLICATION debezium FOR TABLE <your table list here>;  
SELECT pg_create_logical_replication_slot('debezium', 'pgoutput');  

You are free to change the table list after that at any time using ALTER PUBLICATION.

On the Debezium side, set:

"publication.name":"debezium",  
"slot.name":"debezium"  

in your JSON config. Of course the “debezium” name isn’t mandatory, you can be more specific – this is just an example.

Also, especially for high-volume databases, it’s recommended to use the Avro format for serialization:

"key.converter": "io.confluent.connect.avro.AvroConverter",  
"value.converter":"io.confluent.connect.avro.AvroConverter"  

Monitor closely

As with any replication slot, if the subscriber doesn’t consume changes, WAL segments will start to pile up. You have to monitor if the Debezium is up, if the replication lag isn’t too big, and if the database server has some free disk space. Most of the time, if Debezium goes down, a simple restart will suffice, but you have to know when to do it. Debezium is much more crash-prone than Postgres, so a monitoring solution is a must-have.

If you look into pg_stat_replication system view in Postgres, here are some differences between Debezium and regular logical replication:

  • The replay lag (or difference between pg_current_wal_lsn and replay_lsn) can become quite big and drop suddenly in one minute intervals. This is because Debezium commits change consumption every minute.
  • The reply_time in pg_stat_replication is in the past – in my case it was 1999-12-21 03:26:12 – and it appears to be weird, but normal.

您的愿望将传达给PG kernel hacker、数据库厂商等, 帮助提高数据库产品质量和功能, 说不定下一个PG版本就有您提出的功能点. 针对非常好的提议,奖励限量版PG文化衫、纪念品、贴纸、PG热门书籍等,奖品丰富,快来许愿。开不开森.

digoal's wechat