## Idea

Logical replication is a replication technique that records the `logical` changes made in the DB.

Unlike `physical` changes, the information sent between the machines is enough to reproduce the same transactions that occurred on the original machine, but the replicating machine has to apply those changes individually.

It's more `declarative` in nature, much like SQL itself.

Instead of *Do Those Operations*, the message is *Make Sure Those Changes Persist*.

## Use Cases

- Replicate to a different hardware architecture / postgres version.
- Consolidate data from multiple servers into one summary server.
- Expose subsets of data in a more efficient way (with different indexes, or only the relevant data using row filters and column lists).

## Terminology

- Publisher - The original machine where the changes are recorded
- Subscriber - The machine that replicates the changes
- Logical Replication Worker - A general name for any one of the workers: `Apply`, `Parallel Apply`, `Table Sync`

Any postgres server can potentially serve as both a publisher and a subscriber. Unlike physical replication, with its `primary` and `standby`, there is no server-level distinction between the roles; they only exist in the context of a particular replication.

## Architecture

Logical replication is built on top of physical replication, both conceptually and implementation-wise.

### Regular Flow

The flow of an already created logical replication setup.

<img src="./helpers/Logical Replication - Regular Flow.png" alt="drawing" height="500"/>

### Init Flow

Setting up a new logical replication.

<img src="./helpers/Logical Replication - Initial Flow.png" alt="drawing" height="800"/>
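
As a minimal sketch of the setup that this flow describes, the statements below create a publication and a subscription. The table `orders`, the publication and subscription names, and the connection string are all hypothetical, and the publisher is assumed to run with `wal_level = logical`.

```sql
-- On the publisher: declare which tables are published.
-- Assumes wal_level = logical and an existing table named orders (hypothetical).
CREATE PUBLICATION pub_orders FOR TABLE orders;

-- On the subscriber: the table must already exist with a matching definition,
-- because DDL is not replicated.
-- Creating the subscription creates a replication slot on the publisher and,
-- with the default copy_data = true, starts the initial table sync.
CREATE SUBSCRIPTION sub_orders
    CONNECTION 'host=publisher.example.com dbname=shop user=replicator'
    PUBLICATION pub_orders;
```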

### Worker Types

- Table Sync - Responsible for the initial sync of a single **replicated table**. There is always a 1-1 relation between table sync workers and tables; multiple workers can run in parallel for different tables.
- Leader Apply - In charge of coordinating (in parallel mode) or applying (in regular mode) the changes of a single ongoing **subscription**.
- Parallel Apply - In charge of applying changes alongside other parallel apply workers in a single **subscription**.
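
How many workers of each type may run is bounded by server settings on the subscriber. A quick way to inspect them is sketched below; it assumes a recent PostgreSQL version, since `max_parallel_apply_workers_per_subscription` only exists from version 16.

```sql
-- Run on the subscriber.
-- Upper bound for all logical replication workers (leader apply, parallel apply, table sync).
SHOW max_logical_replication_workers;

-- How many table sync workers a single subscription may run in parallel.
SHOW max_sync_workers_per_subscription;

-- How many parallel apply workers a single subscription may use (PostgreSQL 16+).
SHOW max_parallel_apply_workers_per_subscription;
```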

### Streaming Types

- Off - Send a transaction to the subscriber only once it has committed on the publisher.
- On - Stream changes continuously to the subscriber, which writes them to a temp file and applies them only when the transaction commits on the publisher side. A good technique for dealing with long-running transactions that include a large amount of changes.
- Parallel - Same as `on`, but the changes are applied in parallel by multiple `parallel apply` workers.
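
The streaming type is a subscription option. The sketch below shows how it might be set; the subscription, publication, and connection string are hypothetical, and `parallel` is assumed to require PostgreSQL 16 or later.

```sql
-- Choose the streaming type when creating the subscription (names are hypothetical).
CREATE SUBSCRIPTION sub_orders
    CONNECTION 'host=publisher.example.com dbname=shop user=replicator'
    PUBLICATION pub_orders
    WITH (streaming = on);

-- Switch an existing subscription to parallel apply.
ALTER SUBSCRIPTION sub_orders SET (streaming = parallel);

-- Go back to sending transactions only after they commit on the publisher.
ALTER SUBSCRIPTION sub_orders SET (streaming = off);
```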

## Which Data To Replicate

### Tables Specification

- Single specified table
- Multiple specified tables
- All tables in schema (`TABLES IN SCHEMA`)
- All tables in DB (`ALL TABLES`)
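
The four options above map to `CREATE PUBLICATION` variants, roughly as follows. Table, schema, and publication names are hypothetical, and `TABLES IN SCHEMA` is assumed to be available (PostgreSQL 15 or later).

```sql
-- Single specified table.
CREATE PUBLICATION pub_single FOR TABLE orders;

-- Multiple specified tables.
CREATE PUBLICATION pub_multi FOR TABLE orders, customers;

-- All tables in a schema, including tables added to it later.
CREATE PUBLICATION pub_schema FOR TABLES IN SCHEMA sales;

-- All tables in the database, including tables created later.
CREATE PUBLICATION pub_all FOR ALL TABLES;
```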

### Operation Type

- Insert
- Update - using replica identity
- Delete - using replica identity
- Truncate
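
Which operations a publication sends is controlled by its `publish` parameter; by default all four are published. A small sketch with hypothetical names:

```sql
-- Publish only inserts and updates; deletes and truncates stay local to the publisher.
CREATE PUBLICATION pub_orders_no_delete
    FOR TABLE orders
    WITH (publish = 'insert, update');

-- Later, widen the publication to all four operation types.
ALTER PUBLICATION pub_orders_no_delete
    SET (publish = 'insert, update, delete, truncate');
```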

### Replica Identity

To perform update / delete operations, the published table must have a `replica identity` - a key that is used to find the matching rows on the subscriber side, so the operation can be applied to existing rows.

This can be:
- Primary key (by default).
- Unique index (must be non-partial, non-deferrable, and include only `NOT NULL` columns).
- FULL - the whole tuple is used for the comparison. Very inefficient, since it requires a full scan plus potentially N comparisons per tuple (N = number of columns).
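
The replica identity is set per table with `ALTER TABLE`. The sketch below assumes a hypothetical table `orders` with a unique index named `orders_order_no_key`.

```sql
-- Use the primary key (this is the default, shown here for completeness).
ALTER TABLE orders REPLICA IDENTITY DEFAULT;

-- Use a specific unique index instead (index name is hypothetical).
ALTER TABLE orders REPLICA IDENTITY USING INDEX orders_order_no_key;

-- Use the whole row; works without any key but is the least efficient option.
ALTER TABLE orders REPLICA IDENTITY FULL;

-- Check the current setting: d = default, i = index, f = full, n = nothing.
SELECT relname, relreplident
FROM pg_class
WHERE relname = 'orders';
```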

### Column Lists

Allows replicating only a subset of a table's columns from the publisher when that is all that is needed, resulting in higher performance, less network traffic and less storage.
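
A column list is written in parentheses after the table name. The sketch below uses a hypothetical `customers` table and assumes PostgreSQL 15 or later, where column lists are supported.

```sql
-- Publish only three columns of the table.
-- If updates or deletes are published, the list must include the replica identity columns
-- (here, id is assumed to be the primary key).
CREATE PUBLICATION pub_customers_public
    FOR TABLE customers (id, name, country);

-- Change the column list of an existing publication.
ALTER PUBLICATION pub_customers_public
    SET TABLE customers (id, name);
```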

### Row Filters

A `WHERE` clause on the published table specifies which rows have their changes replicated.

If the row filter evaluates to `false` or `NULL` then the row is **not replicated**.

When a subscription includes multiple publications that publish the same table with different row filters (for the same operation), the filters are combined with `OR` on the publisher.
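
As an illustration of a row filter and of the `OR` combination, consider the sketch below. All names and the connection string are hypothetical, and `id` is assumed to be the primary key of `orders`, so that the filter only references replica identity columns.

```sql
-- On the publisher: two publications on the same table, each with its own filter.
-- The WHERE expression must be wrapped in parentheses.
CREATE PUBLICATION pub_low_ids  FOR TABLE orders WHERE (id < 100);
CREATE PUBLICATION pub_high_ids FOR TABLE orders WHERE (id > 1000);

-- On the subscriber: one subscription that uses both publications.
CREATE SUBSCRIPTION sub_orders
    CONNECTION 'host=publisher.example.com dbname=shop user=replicator'
    PUBLICATION pub_low_ids, pub_high_ids;

-- Effective filter for this subscription: (id < 100) OR (id > 1000).
```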

#### Insert & Delete

The operation is sent to the subscriber if the new row (for insert) / old row (for delete) satisfies the `WHERE` clause.

#### Update

Which versions satisfy `WHERE`:
- Both old and new -> send update
- Only old -> send delete
- Only new -> send insert
- Neither -> skip
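
The four cases can be walked through with a small sketch. It assumes a hypothetical table `orders` whose primary key is `id`, with a `note` column, existing rows with ids 10, 60, 150 and 160, and no rows with ids 50 or 200.

```sql
-- On the publisher: only rows with id > 100 are replicated.
CREATE PUBLICATION pub_big_ids FOR TABLE orders WHERE (id > 100);

-- Old (150) and new (150) both satisfy the filter -> the subscriber gets an UPDATE.
UPDATE orders SET note = 'checked' WHERE id = 150;

-- Only the old row (160) satisfies the filter -> the subscriber gets a DELETE.
UPDATE orders SET id = 50 WHERE id = 160;

-- Only the new row (200) satisfies the filter -> the subscriber gets an INSERT.
UPDATE orders SET id = 200 WHERE id = 60;

-- Neither old nor new (10) satisfies the filter -> nothing is sent.
UPDATE orders SET note = 'ignored' WHERE id = 10;
```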

### Note

Row filters and column lists have no effect on `TRUNCATE` -> any truncate will affect all rows.

## Maintenance

### Handling Errors

When applying a change on the subscriber, it's possible to run into a conflict, which causes the apply worker process to exit.

Then, the launcher detects that an apply worker is missing and starts a new process, which tries to apply the last LSN again.

By default, without manual intervention, this is an *endless loop!*

#### Disable On Error

The `disable_on_error` subscription option can be turned on to disable the subscription when an error occurs, so that this loop does not happen.

Be very careful with that option: while the subscription stays disabled and the issue is not fixed, its replication slot remains on the publisher and causes the WAL directory to bloat over time.
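
A sketch of how this option might be used and watched, assuming PostgreSQL 15 or later and a hypothetical subscription named `sub_orders`:

```sql
-- On the subscriber: stop retrying after an apply error.
ALTER SUBSCRIPTION sub_orders SET (disable_on_error = true);

-- On the subscriber: see whether the subscription has been disabled.
SELECT subname, subenabled
FROM pg_subscription;

-- On the publisher: see how much WAL each replication slot is holding back.
SELECT slot_name,
       active,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots;

-- On the subscriber: once the conflict is resolved, resume replication.
ALTER SUBSCRIPTION sub_orders ENABLE;
```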

#### Ways To Solve Conflicts

- Change data / constraints / permissions on subscriber side.
- Skip LSN -> `ALTER SUBSCRIPTION sub_alltables SKIP (LSN = '<some_lsn>');`\
The LSN can be found in the server log. In `parallel` streaming mode the LSN may be missing from the log; to find it, switch the streaming mode to `on` / `off` and reproduce the conflict.
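
The skip option can be sketched end to end as follows. The LSN value is only a placeholder for the finish LSN taken from your own server log, and the statements assume PostgreSQL 15 or later.

```sql
-- On the subscriber: skip the transaction that finishes at the given LSN.
-- '0/14C0378' is a placeholder, not a real value for your system.
ALTER SUBSCRIPTION sub_alltables SKIP (lsn = '0/14C0378');

-- The pending skip is visible until the transaction has actually been skipped.
SELECT subname, subskiplsn
FROM pg_subscription
WHERE subname = 'sub_alltables';

-- If the subscription was disabled (for example by disable_on_error), resume it.
ALTER SUBSCRIPTION sub_alltables ENABLE;

-- To cancel a skip that was set by mistake and has not been applied yet:
ALTER SUBSCRIPTION sub_alltables SKIP (lsn = NONE);
```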

### Monitoring

Same as physical replication, plus:
- `pg_stat_subscription_stats` - cumulative information about errors in replication.
- `pg_stat_subscription` - dynamic information about currently running logical replication worker processes.
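
Two example queries against these views, run on the subscriber. This is only a sketch of the kind of information they expose; the exact set of columns varies between PostgreSQL versions.

```sql
-- Cumulative error counters per subscription.
SELECT subname, apply_error_count, sync_error_count, stats_reset
FROM pg_stat_subscription_stats;

-- Currently running workers; relid is set only for table sync workers.
SELECT subname,
       pid,
       relid::regclass AS syncing_table,
       received_lsn,
       latest_end_lsn,
       latest_end_time
FROM pg_stat_subscription;
```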

## Related Relations

Publication / Subscription  |   Relation Name               |   View / Table    |   Description
--------------------------  |   -------------               |   ------------    |   -----------
Subscription                |   pg_subscription             |   Table           |   contains all subscriptions
Subscription                |   pg_subscription_rel         |   Table           |   contains subscription <-> replicated tables mapping
Subscription                |   pg_stat_subscription        |   View            |   contains information about currently running logical replication workers
Subscription                |   pg_stat_subscription_stats  |   View            |   contains cumulative information about subscriptions on server
Publication                 |   pg_publication              |   Table           |   contains all publications
Publication                 |   pg_publication_rel          |   Table           |   contains publications <-> replicated tables mapping
Publication                 |   pg_publication_namespace    |   Table           |   contains publications <-> schemas mapping
Publication                 |   pg_publication_tables       |   View            |   contains publications <-> replicated tables mapping in a more readable format **+** joined to pg_publication **+** added `FOR ALL TABLES` publication information in single table granularity



Mapping between Publications <-> Tables can be seen in `pg_publication_tables`
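
A sketch of how these relations might be queried. The `attnames` and `rowfilter` columns are assumed to be present (PostgreSQL 15 or later).

```sql
-- On the publisher: which tables each publication covers,
-- with the published columns and the row filter (if any).
SELECT pubname, schemaname, tablename, attnames, rowfilter
FROM pg_publication_tables
ORDER BY pubname, schemaname, tablename;

-- On the subscriber: sync state of every table in every subscription.
-- srsubstate: i = initialize, d = data is being copied, f = finished table copy,
--             s = synchronized, r = ready (normal replication).
SELECT s.subname,
       sr.srrelid::regclass AS table_name,
       sr.srsubstate
FROM pg_subscription_rel AS sr
JOIN pg_subscription AS s ON s.oid = sr.srsubid
ORDER BY s.subname, table_name;
```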

## Caveats

### Missing Functionality

- DDL and schema changes are not replicated.
- Sequence data is not replicated.
- Only tables (including partitioned tables) are replicated; other objects, such as views, materialized views and foreign tables, are not.
- For a partitioned table, the partitions on the publisher side should exist on the subscriber side as well, except when `publish_via_partition_root=true`.
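
The partitioned table exception can be sketched as follows, using a hypothetical partitioned table named `measurements`:

```sql
-- On the publisher: publish changes of all partitions as changes of the root table.
-- The subscriber then only needs a table matching the root, which may be
-- non-partitioned or partitioned differently.
CREATE PUBLICATION pub_measurements
    FOR TABLE measurements
    WITH (publish_via_partition_root = true);
```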

### Restrictions

- Applying a `TRUNCATE` on the subscriber side will fail if the table has foreign-key links to a table that is not included in the subscription.
