**🎯 Connecting Debezium to PostgreSQL: step by step**
This guide walks through every step in detail.


**✅ Step 0: Verify the containers are running**


In [None]:
# Check containers
docker ps | grep -E "postgres|kafka|zookeeper|debezium"

# The output should list:
# - postgres_db
# - kafka
# - zookeeper
# - debezium

**🗄️ Step 1: Prepare PostgreSQL**
**1.1 Verify the PostgreSQL config**


In [None]:
# Check wal_level (must be 'logical')
docker exec -it postgres_db psql -U admin -d myapp_db -c "SHOW wal_level;"

# The output must be:
#  wal_level
# -----------
#  logical

If it is NOT logical:


In [None]:
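# In docker-compose.yml, set wal_level on the postgres service (you already have this):
command:
  - "postgres"
  - "-c"
  - "wal_level=logical"

# Recreate the container so the changed command takes effect
# (a plain `restart` reuses the existing container definition)
docker-compose up -d postgres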

**1.2 Create a test table**


In [None]:
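# Connect to PostgreSQL
docker exec -it postgres_db psql -U admin -d myapp_db

-- Create the test table
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100),
    email VARCHAR(100),
    created_at TIMESTAMP DEFAULT NOW()
);

-- Log full row images so UPDATE and DELETE events carry a complete "before"
-- (with the default replica identity only the primary key is available)
ALTER TABLE users REPLICA IDENTITY FULL;

-- Insert sample data
INSERT INTO users (name, email) VALUES
    ('Alice', 'alice@example.com'),
    ('Bob', 'bob@example.com'),
    ('Carol', 'carol@example.com');

-- Verify
SELECT * FROM users;

-- Exit
\q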

**🔌 Step 2: Verify Debezium is running**


In [None]:
# Check Debezium health
curl http://localhost:8087/

# Output:
# {"version":"3.6.1","commit":"5e3c2b738d253ff5","kafka_cluster_id":"..."}

# Check connectors (empty on the first run)
curl http://localhost:8087/connectors

# Output: []

**📝 Step 3: Create the Debezium connector**


**3.1 Create the connector config file**

In [None]:
{
  "name": "postgres-connector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "tasks.max": "1",

    "database.hostname": "postgres",
    "database.port": "5432",
    "database.user": "admin",
    "database.password": "admin",
    "database.dbname": "myapp_db",
    "database.server.name": "postgres",

    "table.include.list": "public.users",

    "plugin.name": "pgoutput",

    "topic.prefix": "postgres",

    "slot.name": "debezium_slot",

    "publication.name": "debezium_publication",

    "snapshot.mode": "initial",

    "decimal.handling.mode": "string",
    "time.precision.mode": "adaptive",

    "heartbeat.interval.ms": "10000",

    "topic.creation.default.partitions": 3,
    "topic.creation.default.replication.factor": 1
  }
}
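
Before posting the file, it can help to sanity-check it locally. The sketch below is only an illustration: the `REQUIRED_KEYS` tuple and the `check_connector_file` helper are hypothetical names chosen for this walkthrough, and the key list reflects what this tutorial relies on rather than Debezium's own validation. The slot-name check follows PostgreSQL's rule that replication slot names may contain only lowercase letters, digits, and underscores.

```python
import json
import re
from typing import List

# Keys this walkthrough depends on (an assumption, not Debezium's official rule set).
REQUIRED_KEYS = (
    "connector.class",
    "database.hostname",
    "database.port",
    "database.user",
    "database.password",
    "database.dbname",
    "topic.prefix",
    "plugin.name",
)


def check_connector_file(text: str) -> List[str]:
    """Return human-readable problems found in a connector JSON document."""
    doc = json.loads(text)
    problems = []
    if "name" not in doc:
        problems.append("missing top-level 'name'")
    config = doc.get("config", {})
    for key in REQUIRED_KEYS:
        if key not in config:
            problems.append(f"missing config key '{key}'")
    slot = config.get("slot.name", "")
    # PostgreSQL accepts only lowercase letters, digits and underscores in slot names.
    if slot and not re.fullmatch(r"[a-z0-9_]+", slot):
        problems.append(f"invalid slot.name '{slot}'")
    return problems
```

Calling `check_connector_file(open("postgres-connector.json").read())` on the config above should return an empty list; any entry in the result points at a key to fix before creating the connector.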

**3.2 Create the connector via the REST API**

In [None]:
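# Method 1: curl with inline JSON
# (abridged config: without the topic.creation.* settings the topic is created
# with the broker defaults instead of 3 partitions)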
curl -X POST http://localhost:8087/connectors \
  -H "Content-Type: application/json" \
  -d '{
    "name": "postgres-connector",
    "config": {
      "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
      "tasks.max": "1",
      "database.hostname": "postgres",
      "database.port": "5432",
      "database.user": "admin",
      "database.password": "admin",
      "database.dbname": "myapp_db",
      "database.server.name": "postgres",
      "table.include.list": "public.users",
      "plugin.name": "pgoutput",
      "topic.prefix": "postgres",
      "slot.name": "debezium_slot",
      "publication.name": "debezium_publication",
      "snapshot.mode": "initial"
    }
  }'

Or save the config to a file and post that:


In [None]:
# Method 2: use a file (more convenient)
# Save the config from step 3.1 as postgres-connector.json

curl -X POST http://localhost:8087/connectors \
  -H "Content-Type: application/json" \
  -d @postgres-connector.json

**3.3 Verify the connector was created**

In [None]:
# Check connector status
curl -s http://localhost:8087/connectors/postgres-connector/status | jq

# Output:
{
  "name": "postgres-connector",
  "connector": {
    "state": "RUNNING",  # ← Must be RUNNING
    "worker_id": "debezium:8083"
  },
  "tasks": [
    {
      "id": 0,
      "state": "RUNNING",  # ← The task must be RUNNING too
      "worker_id": "debezium:8083"
    }
  ]
}
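
The same check can be automated once the status JSON has been fetched. This is a minimal sketch that assumes the response has the shape shown above; `unhealthy_parts` is a hypothetical helper name, not part of Kafka Connect.

```python
from typing import Any, Dict, List


def unhealthy_parts(status: Dict[str, Any]) -> List[str]:
    """List the connector and task entries whose state is not RUNNING."""
    problems = []
    connector_state = status.get("connector", {}).get("state")
    if connector_state != "RUNNING":
        problems.append(f"connector: {connector_state}")
    tasks = status.get("tasks", [])
    if not tasks:
        # A connector with no tasks is not streaming anything yet.
        problems.append("no tasks reported")
    for task in tasks:
        if task.get("state") != "RUNNING":
            problems.append(f"task {task.get('id')}: {task.get('state')}")
    return problems
```

An empty result means the connector and all of its tasks report RUNNING. Feeding it the parsed output of the `curl` call above (for example via `json.loads`) is enough to turn this step into a scripted check.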

**🔍 Step 4: Verify PostgreSQL side effects**

By now Debezium has created a replication slot and a publication in PostgreSQL:


In [None]:
# Check replication slot
docker exec -it postgres_db psql -U admin -d myapp_db -c "SELECT * FROM pg_replication_slots;"

# Output:
#  slot_name      | plugin   | slot_type | active | ...
# ----------------+----------+-----------+--------+-----
#  debezium_slot  | pgoutput | logical   | t      | ...

# Check publication
docker exec -it postgres_db psql -U admin -d myapp_db -c "SELECT * FROM pg_publication;"

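# Output (puballtables is 'f' only with "publication.autocreate.mode": "filtered";
# with the default mode, all_tables, the publication covers every table and shows 't'):
#       pubname          | pubowner | puballtables | ...
# ----------------------+----------+--------------+-----
#  debezium_publication | 10       | f            | ...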

# Check publication tables
docker exec -it postgres_db psql -U admin -d myapp_db -c "SELECT * FROM pg_publication_tables;"

# Output:
#       pubname          | schemaname | tablename
# ----------------------+------------+-----------
#  debezium_publication | public     | users

**📊 Step 5: Verify the Kafka topics were created**

In [None]:
# List topics
docker exec -it kafka kafka-topics --list --bootstrap-server localhost:9092

# Output:
# debezium_configs
# debezium_offsets
# debezium_statuses
# postgres.public.users  ← The topic for the users table

# Describe topic
docker exec -it kafka kafka-topics \
  --describe \
  --topic postgres.public.users \
  --bootstrap-server localhost:9092

# Output:
# Topic: postgres.public.users
# PartitionCount: 3
# ReplicationFactor: 1

**👀 Step 6: View the snapshot data in Kafka**

Debezium has already snapshotted the 3 initial rows:


In [None]:
# Read the messages
docker exec -it kafka kafka-console-consumer \
  --bootstrap-server localhost:9092 \
  --topic postgres.public.users \
  --from-beginning \
  --max-messages 3

# Output: 3 JSON messages (the snapshot of Alice, Bob, and Carol)

For more readable output, pipe it through `jq`:


In [None]:
# No -it here: a TTY merges the consumer's stderr summary into stdout, which breaks jq
docker exec kafka kafka-console-consumer \
  --bootstrap-server localhost:9092 \
  --topic postgres.public.users \
  --from-beginning \
  --max-messages 1 | jq

**🧪 Step 7: Test CDC in real time**

**7.1 Test INSERT**

In [None]:
# Terminal 1: Watch Kafka messages
docker exec -it kafka kafka-console-consumer \
  --bootstrap-server localhost:9092 \
  --topic postgres.public.users \
  --property print.timestamp=true

# Terminal 2: INSERT into PostgreSQL
docker exec -it postgres_db psql -U admin -d myapp_db -c \
  "INSERT INTO users (name, email) VALUES ('David', 'david@example.com');"

**In Terminal 1 the message shows up almost immediately (abridged):**

In [None]:
{
  "payload": {
    "before": null,
    "after": {
      "id": 4,
      "name": "David",
      "email": "david@example.com",
      "created_at": 1699200060000000
    },
    "source": {
      "snapshot": "false",  // ← Not a snapshot event
      "lsn": 23456999
    },
    "op": "c",  // ← Create (INSERT)
    "ts_ms": 1699200060123
  }
}
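
Two fields in this event are easy to misread: `op` is a one-letter code, and `created_at` is a number rather than a date string. The sketch below decodes both. It assumes `time.precision.mode` is `adaptive` and the column is a microsecond-precision `TIMESTAMP`, in which case the value is microseconds since the Unix epoch, interpreted here as UTC. `OP_NAMES`, `micros_to_datetime`, and `summarize` are hypothetical names for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict

# Debezium operation codes as they appear in the "op" field.
OP_NAMES = {"c": "insert", "u": "update", "d": "delete", "r": "snapshot read"}


def micros_to_datetime(micros: int) -> datetime:
    """Convert microseconds since the Unix epoch to a UTC datetime."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return epoch + timedelta(microseconds=micros)


def summarize(event: Dict[str, Any]) -> str:
    """Describe a change event as '<operation> id=<primary key>'."""
    payload = event["payload"]
    op = OP_NAMES.get(payload["op"], "unknown")
    # Deletes have no "after" image, so fall back to "before".
    row = payload["after"] or payload["before"]
    return f"{op} id={row['id']}"
```

For the event above, `summarize` yields `insert id=4`, and `micros_to_datetime(1699200060000000)` gives 2023-11-05 16:01:00 UTC, which lines up with the `ts_ms` value of the same event.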

**7.2 Test UPDATE**

In [None]:
# Terminal 2: UPDATE
docker exec -it postgres_db psql -U admin -d myapp_db -c \
  "UPDATE users SET email = 'david.new@example.com' WHERE id = 4;"

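**Message in Kafka (abridged; a full `before` image requires `REPLICA IDENTITY FULL` on the table, otherwise `before` is usually null for this update):**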

In [None]:
{
  "payload": {
    "before": {
      "id": 4,
      "name": "David",
      "email": "david@example.com"
    },
    "after": {
      "id": 4,
      "name": "David",
      "email": "david.new@example.com"  // ← Changed
    },
    "op": "u",  // ← Update
    "ts_ms": 1699200070456
  }
}

**7.3 Test DELETE**

In [None]:
# Terminal 2: DELETE
docker exec -it postgres_db psql -U admin -d myapp_db -c \
  "DELETE FROM users WHERE id = 4;"

**Message in Kafka (abridged; without `REPLICA IDENTITY FULL`, only the primary key in `before` is populated):**

In [None]:
{
  "payload": {
    "before": {
      "id": 4,
      "name": "David",
      "email": "david.new@example.com"
    },
    "after": null,  // ← Null because the row was deleted
    "op": "d",  // ← Delete
    "ts_ms": 1699200080789
  }
}
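
A consumer typically uses these `before`/`after` images to keep a copy of the table in sync. The following is a minimal sketch of that idea using an in-memory dict keyed by `id`; `apply_event` is a hypothetical helper, and a real consumer would also need to handle schema changes, ordering, and offsets.

```python
from typing import Any, Dict, Optional


def apply_event(table: Dict[int, Dict[str, Any]], event: Optional[Dict[str, Any]]) -> None:
    """Apply one Debezium change event to an in-memory copy of the table."""
    if event is None:
        # Tombstone record (null value) that follows a delete: nothing to apply.
        return
    payload = event["payload"]
    op = payload["op"]
    if op in ("c", "r", "u"):
        # Inserts, snapshot reads and updates all carry the new row in "after".
        row = payload["after"]
        table[row["id"]] = row
    elif op == "d":
        # Deletes carry the old row (at least its primary key) in "before".
        table.pop(payload["before"]["id"], None)
```

Replaying the INSERT, UPDATE, and DELETE events from this step against an empty dict leaves it empty again, with David's row present in between.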

**📊 Step 8: Monitor with Kafka UI**

In [None]:
# Open Kafka UI
http://localhost:8085

# Navigate:
Topics → postgres.public.users → Messages

# You should see:
├─ 3 snapshot messages (op: "r")
├─ 1 INSERT message (op: "c")
├─ 1 UPDATE message (op: "u")
├─ 1 DELETE message (op: "d")
└─ 1 tombstone (null value) right after the DELETE, emitted by default (tombstones.on.delete=true)

**🎯 Step 9: Verify with Debezium UI**

In [None]:
# Open Debezium UI
http://localhost:8088

# You should see:
├─ Connector: postgres-connector
├─ Status: RUNNING
├─ Tasks: 1 (RUNNING)
└─ Tables: public.users

**✅ Completion checklist**

In [None]:
# Script that runs every check
cat > check-debezium.sh << 'EOF'
#!/bin/bash

echo "=== Debezium CDC Check ==="

echo -e "\n1. Containers running:"
docker ps --format "table {{.Names}}\t{{.Status}}" | grep -E "postgres|kafka|debezium"

echo -e "\n2. PostgreSQL WAL level:"
docker exec postgres_db psql -U admin -d myapp_db -t -c "SHOW wal_level;"

echo -e "\n3. Replication slot:"
docker exec postgres_db psql -U admin -d myapp_db -t -c "SELECT slot_name, active FROM pg_replication_slots;"

echo -e "\n4. Debezium connector:"
curl -s http://localhost:8087/connectors

echo -e "\n5. Connector status:"
curl -s http://localhost:8087/connectors/postgres-connector/status | jq '.connector.state, .tasks[0].state'

echo -e "\n6. Kafka topics:"
docker exec kafka kafka-topics --list --bootstrap-server localhost:9092 | grep postgres

echo -e "\n7. Message count:"
docker exec kafka kafka-run-class kafka.tools.GetOffsetShell \
  --bootstrap-server localhost:9092 \
  --topic postgres.public.users 2>/dev/null

echo -e "\n✅ All checks complete!"
EOF

chmod +x check-debezium.sh
./check-debezium.sh
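
Check 7 prints one `topic:partition:offset` line per partition, which still has to be added up by hand. A small sketch for doing that is shown below; `total_messages` is a hypothetical helper, and the sum equals the number of messages only as long as nothing has been removed by retention or compaction.

```python
def total_messages(offset_output: str) -> int:
    """Sum the end offsets from GetOffsetShell's 'topic:partition:offset' lines."""
    total = 0
    for line in offset_output.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split from the right so only the last two fields are treated as numbers.
        _topic, _partition, offset = line.rsplit(":", 2)
        total += int(offset)
    return total
```

The script's output for the topic can be passed to it as a single string, for example captured with `subprocess.run(..., capture_output=True, text=True).stdout`.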

**🐛 Troubleshooting: if something goes wrong**

**Problem 1: Connector is not RUNNING**

In [None]:
# Check logs
docker logs debezium | tail -50

# Common issues:
# - Wrong password
# - PostgreSQL not reachable
# - wal_level not logical

**Fix:**

In [None]:
# Test connection
docker exec -it debezium bash
ping postgres  # Should resolve
exit

# Verify credentials
docker exec postgres_db psql -U admin -d myapp_db -c "SELECT 1;"

**Problem 2: No messages in Kafka**

In [None]:
# Check replication slot active
docker exec postgres_db psql -U admin -d myapp_db -c \
  "SELECT slot_name, active FROM pg_replication_slots;"

# Output:
#  slot_name      | active
# ----------------+--------
#  debezium_slot  | t       ← Must be 't' (true)

**Fix:**

In [None]:
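# Restart the connector together with its tasks
# (without includeTasks=true only the connector instance restarts; needs Kafka Connect 3.0+)
curl -X POST "http://localhost:8087/connectors/postgres-connector/restart?includeTasks=true"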
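
When only some parts have failed, it can be more targeted to restart just those. The sketch below derives the Kafka Connect REST paths to POST to from a status document of the shape shown in step 3.3. `restart_paths` is a hypothetical helper; the paths it builds (`/connectors/{name}/restart` and `/connectors/{name}/tasks/{id}/restart`) are standard Kafka Connect endpoints.

```python
from typing import Any, Dict, List


def restart_paths(status: Dict[str, Any]) -> List[str]:
    """Return the REST paths to POST to in order to restart every FAILED part."""
    name = status["name"]
    paths = []
    if status["connector"]["state"] == "FAILED":
        paths.append(f"/connectors/{name}/restart")
    for task in status.get("tasks", []):
        if task["state"] == "FAILED":
            paths.append(f"/connectors/{name}/tasks/{task['id']}/restart")
    return paths
```

Each returned path can be appended to `http://localhost:8087` and sent with `curl -X POST`. An empty list means nothing is in the FAILED state, so the cause of missing messages lies elsewhere.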

**Problem 3: Table not captured**

In [None]:
# Check table.include.list in the config
curl http://localhost:8087/connectors/postgres-connector | jq '.config."table.include.list"'

# Output: "public.users"

**Fix:**

In [None]:
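# Update the config if it is wrong
# (PUT replaces the whole config, so send every key as a flat map, not only the changed one)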
curl -X PUT http://localhost:8087/connectors/postgres-connector/config \
  -H "Content-Type: application/json" \
  -d '{
    ...
    "table.include.list": "public.users,public.orders",
    ...
  }'
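
Because the PUT body must contain the complete config, one way to avoid retyping it is to start from what `GET /connectors/postgres-connector` returns and change a single key. This is a rough sketch under that assumption; `add_table` is a hypothetical helper and `public.orders` is just the example table from the command above.

```python
from typing import Any, Dict


def add_table(connector: Dict[str, Any], table: str) -> Dict[str, str]:
    """Build a PUT body: the existing flat config with one more captured table."""
    config = dict(connector["config"])  # copy, so the input is left untouched
    current = config.get("table.include.list", "")
    tables = [t.strip() for t in current.split(",") if t.strip()]
    if table not in tables:
        tables.append(table)
    config["table.include.list"] = ",".join(tables)
    return config
```

Serializing the result with `json.dumps` gives a body suitable for `curl -X PUT ... -d @file`. Note that the function returns only the inner config map, which is the shape the `/config` endpoint expects, not the `{"name": ..., "config": ...}` wrapper used when creating the connector.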