Originally from: [tweet](https://twitter.com/samokhvalov/status/1727344499943493900), [LinkedIn post]().

---

# How to convert a physical replica to logical

> I post a new PostgreSQL "howto" article every day. Join me in this
> journey – [subscribe](https://twitter.com/samokhvalov/), provide feedback, share!

In some cases, it might be beneficial to convert an existing regular asynchronous physical replica to logical, or to
create a new physical replica first, and then convert it to logical.

This approach:

- on the one hand, eliminates the need to execute the initial data load step, which can be fragile and quite stressful
  in the case of a large, heavily loaded DB, but
- on the other hand, the logical replica created this way has everything that the source Postgres instance has.

So, this method suits cases when you need all the data from the source to be present in the logical replica you're
creating, and it is extremely useful if you work with very large, heavily loaded clusters.

The steps below are quite straightforward. In this case, we use a physical replica that replicates data directly from
the primary via streaming replication (`primary_conninfo` and a replication slot, e.g. under Patroni's control), not
involving cascaded replication (although it is possible to implement that too).

## Step 1: have a physical replica for conversion

Choose a physical replica to convert, or create a new one using `pg_basebackup`, recovering from backups, or creating
it from a cloud snapshot.

Make sure this replica is not used by regular users while we're converting it.

## Step 2: ensure the requirements are met

First, ensure that the settings are prepared for logical replication, as described in the
[logical replication config](https://postgresql.org/docs/current/logical-replication-config.html) docs; a quick way to
verify the values is sketched after the two lists below.

Primary settings:

- `wal_level = 'logical'`
- `max_replication_slots > 0`
- `max_wal_senders > max_replication_slots`

On the physical replica we are going to convert:

- `max_replication_slots > 0`
- `max_logical_replication_workers > 0`
- `max_worker_processes >= max_logical_replication_workers + 1`
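
To verify the current values, something like this can be run on each node (a quick sanity-check sketch, not part of
the recipe itself):

```sql
-- on the primary:
select name, setting
from pg_settings
where name in ('wal_level', 'max_replication_slots', 'max_wal_senders');

-- on the physical replica we are going to convert:
select name, setting
from pg_settings
where name in (
  'max_replication_slots',
  'max_logical_replication_workers',
  'max_worker_processes'
);
```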

Additionally:

- the replication lag is low;
- every table has a PK or has
  [REPLICA IDENTITY FULL](https://postgresql.org/docs/current/sql-altertable.html#SQL-ALTERTABLE-REPLICA-IDENTITY)
  (a query to find offending tables is sketched below);
- `restore_command` is not set on the replica we'll use (if it is, temporarily set its value to an empty string);
- temporarily, increase `wal_keep_size` (PG13+; in PG12 or older, `wal_keep_segments`) on the primary to a value
  corresponding to a few hours of WAL generation.
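
The query below sketches how to find tables that have neither a PK nor `REPLICA IDENTITY FULL`, and how the temporary
`wal_keep_size` bump might look; the `100GB` value is an assumption – pick one matching a few hours of WAL generation
in your cluster:

```sql
-- on the primary: tables that have neither a PK nor REPLICA IDENTITY FULL
-- (UPDATEs and DELETEs on such tables cannot be replicated logically)
select c.oid::regclass as table_name
from pg_class c
join pg_namespace n on n.oid = c.relnamespace
where c.relkind = 'r'
  and n.nspname not in ('pg_catalog', 'information_schema')
  and c.relreplident <> 'f'  -- not REPLICA IDENTITY FULL
  and not exists (
    select 1
    from pg_constraint
    where conrelid = c.oid and contype = 'p'
  );

-- on the primary: temporarily keep more WAL (PG13+; the value is an assumption)
alter system set wal_keep_size = '100GB';
select pg_reload_conf();
```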

## Step 3: stop the physical replica

Shut down the physical replica and keep it down during the next step. This is needed so that its position is
guaranteed to be in the past compared to the logical slot we're going to create on the primary.

## Step 4: create publication, logical slot, and remember its LSN

On the primary:

- issue a manual `CHECKPOINT`;
- create a publication;
- create a logical slot and *remember its LSN position*.

Example:

```sql
checkpoint;

create publication my_pub for all tables;

select lsn
from pg_create_logical_replication_slot(
  'my_slot',
  'pgoutput'
);
```

It is important to remember the `lsn` value returned by the last command – we'll be using it later.
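
If needed, this position can also be looked up again in `pg_replication_slots` (a quick sanity check, not a separate
step of the recipe):

```sql
select slot_name, plugin, slot_type, confirmed_flush_lsn
from pg_replication_slots
where slot_name = 'my_slot';
```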

## Step 5: let the physical replica catch up

Reconfigure the physical replica (see the sketch after this list):

- `recovery_target_lsn` – set it to the LSN value we've got from the previous step
- `recovery_target_action = 'promote'`
- `restore_command`, `recovery_target_timeline`, `recovery_target_xid`, `recovery_target_time`, `recovery_target_name`
  are not set or empty
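
Since the replica is shut down at this point, these parameters go into its `postgresql.conf` (or the Patroni /
automation config) rather than through `ALTER SYSTEM`. A minimal sketch, with a placeholder LSN value – use the one
remembered in step 4:

```sql
-- the settings are added to postgresql.conf while the replica is down, e.g.:
--   recovery_target_lsn    = '2/5A1B2C30'   -- placeholder; use the LSN from step 4
--   recovery_target_action = 'promote'
--
-- once the replica is started (next paragraph), verify they are in effect:
show recovery_target_lsn;
show recovery_target_action;

-- and watch how much WAL is left to replay before reaching the target
-- (replace the placeholder LSN with the real one):
select
  pg_last_wal_replay_lsn() as replayed_lsn,
  pg_wal_lsn_diff('2/5A1B2C30', pg_last_wal_replay_lsn()) as bytes_left;
```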

Now, start the physical replica. Monitor its lag and how the replica catches up, reaching the LSN we need, and then
auto-promotes. This can take some time. Once it's done, check it:

```sql
select pg_is_in_recovery();
```

- it must return `f`, meaning that this node is now a primary itself (a clone), with its position corresponding to the
  position of the replication slot on the source node.

## Step 6: create subscription and start logical replication

Now, on the freshly created "clone", create a logical subscription with `copy_data = false` and `create_slot = false`:

```sql
create subscription my_sub
connection 'host=.. port=.. user=.. dbname=..'
publication my_pub
with (
  copy_data = false,
  create_slot = false,
  slot_name = 'my_slot'
);
```

Ensure that replication is now active – check it on the source primary:

```sql
select * from pg_replication_slots;
```

– the field `active` must be `t` for our slot.
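
Optionally, the state of the subscription workers can also be checked on the new logical replica itself (a quick
sanity check):

```sql
-- on the new logical replica (subscriber side)
select subname, received_lsn, latest_end_lsn, last_msg_receipt_time
from pg_stat_subscription;
```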

## Finalize

- Wait until the logical replication lag fully catches up (occasional acute spikes are OK) – see the query below.
- Return `wal_keep_size` (`wal_keep_segments`) to its original value on the primary.
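
A simple way to watch the logical replication lag is to compare, on the source primary, the current WAL position with
the slot's confirmed position (a sketch, assuming the slot name used above):

```sql
-- on the source primary
select
  slot_name,
  pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn) as lag_bytes
from pg_replication_slots
where slot_name = 'my_slot';
```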

## Additional notes

In this recipe, we used a single publication and logical slot. It is possible to use multiple slots, slightly
adjusting the procedure. But if you choose to do so, keep in mind the potential complexities of using multiple
slots/publications, primarily these:

- referential integrity is not guaranteed on the logical replica (occasional temporary FK violations),
- a more fragile publication creation procedure (creating a publication `FOR ALL TABLES` doesn't require table-level
  locks; but when we use multiple publications and create a publication for certain tables, table-level locks are
  required – however, this is just `ShareUpdateExclusiveLock`,
  per [this comment in the PostgreSQL source code](https://github.com/postgres/postgres/blob/1b6da28e0668eb977dcab6987d192ddedf32b752/src/backend/commands/publicationcmds.c#L1550)).

And in any case:

- make sure you are prepared to deal with the restrictions of logical replication for your version (e.g.,
  [for PG16](https://postgresql.org/docs/16/logical-replication-restrictions.html));
- if you consider using this approach to perform a major upgrade, avoid running `pg_upgrade` on the already-converted
  node – it may not be safe
  (see: [pg_upgrade and logical replication](https://postgresql.org/message-id/flat/20230217075433.u5mjly4d5cr4hcfe%40jrouhaud)).