# Tools

## Patroni

<img src="./helpers/patroni-logo.png" alt="drawing" width="500"/>

### Description

Patroni is a tool for managing a multi-node PostgreSQL cluster and keeping it highly available.

### Responsibility

Its main responsibilities are:
- Elect *only a single* primary, using a distributed configuration store (DCS) to coordinate the election
- Create PG clusters based on configuration
- Fail over automatically when the primary is lost
- Restart failed servers automatically (mainly so that a failed primary comes back as a standby)

### Monitoring

Patroni is easy to monitor through its REST API.

It exposes the following endpoints for monitoring:
- `GET /patroni` - Information about this particular Patroni instance and the PG node it manages.
- `GET /cluster` - Information about the whole cluster.
- `GET /history` - History of switches / failovers in the cluster.

Patroni can be integrated with monitoring tools such as `Prometheus` through the `/metrics` endpoint, or by adapting the `/patroni` JSON output to the monitoring tool in use.
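
As a rough sketch of how a monitoring script or a load balancer health check might consume the REST API: besides the JSON endpoints above, Patroni answers role checks such as `GET /primary` and `GET /replica` with HTTP 200 only when the node currently holds that role. The helper names `patroni_status` and `patroni_role` and the default URL are assumptions made for this example, not part of Patroni.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: report the role of one Patroni node via its REST API.
# Assumes the REST API is reachable at the address set in `restapi.listen`.

# Print the HTTP status code for a path (curl prints 000 if the node is unreachable).
patroni_status() {
  local base_url="$1" path="$2"
  curl -s -o /dev/null -w '%{http_code}' --max-time 2 "${base_url}${path}" || true
}

# Print "primary", "replica" or "unavailable" for the given base URL.
patroni_role() {
  local base_url="${1:-http://127.0.0.1:8008}"
  if [ "$(patroni_status "$base_url" /primary)" = "200" ]; then
    echo primary
  elif [ "$(patroni_status "$base_url" /replica)" = "200" ]; then
    echo replica
  else
    echo unavailable
  fi
}
```

Calling `patroni_role http://127.0.0.1:8008` prints a single word, which is easy to feed into an alerting rule or a cron check.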

### Install

```bash
apt install patroni # Debian
dnf install patroni patroni-etcd # RHEL

patroni --version
```

## ETCD

<img src="./helpers/etcd-icon.webp" alt="drawing" width="600"/>

### Description

A distributed key-value store that persists data reliably and is used mainly for storing the configuration of distributed systems such as K8S and our multi node PostgreSQL environment.

### Responsibility In Our Environment

Store the configuration of our multi node PG cluster consistently and in a way that is available to all postgres nodes.

### Install

```bash
export ETCD_VER=v3.5.13
export DOWNLOAD_URL=https://github.com/etcd-io/etcd/releases/download

curl -L ${DOWNLOAD_URL}/${ETCD_VER}/etcd-${ETCD_VER}-linux-amd64.tar.gz -o /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz
tar xzvf /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz -C /usr/local/bin --strip-components=1
etcd --version
```

# Demo

## Single Machine

### Architecture

<img src="./helpers/Replication - Single Machine.png" alt="drawing" width="600"/>

### Prerequisites

- A single Linux machine with PostgreSQL installed

### Install Tools

```bash
# Patroni
apt install patroni -y

# ETCD
apt install curl -y
export ETCD_VER=v3.5.13
export DOWNLOAD_URL=https://github.com/etcd-io/etcd/releases/download
curl -L ${DOWNLOAD_URL}/${ETCD_VER}/etcd-${ETCD_VER}-linux-amd64.tar.gz -o /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz
tar xzvf /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz -C /usr/local/bin --strip-components=1
```

### Configure Patroni

#### Single Node Config

Explore this `YAML` configuration file for Patroni.

This is a template for a basic Patroni deployment for a single Patroni - PostgreSQL pair.

*Configuration Explanation By Scope:*
- Global
    - Scope - Cluster name
    - Namespace - Where the data about the cluster is stored on the DCS (in our case ETCD)
    - Name - This deployment name
- restapi - Configurations for Patroni REST API
- ctl - Configurations for Patroni CTL API (script)
- etcd - Config connectivity to ETCD
- bootstrap - Configurations for bootstrapping a PostgreSQL server. Since Patroni is responsible for starting a failed or non-existent server, it needs the PG parameters to do so. \
That is complemented by the global `dcs` configurations presented in the next yaml block
- postgresql - Configurations specific to this deployment
- watchdog - Can be used to make sure that no more than one Postgres node is elected primary due to a Patroni bug / error
- tags - Patroni tags that change some behaviors of the deployment, some examples attached

```yaml
scope: batman
namespace: /service
name: postgresql0

restapi:
  listen: 127.0.0.1:8008
  connect_address: 127.0.0.1:8008

ctl:
  insecure: false # Allow connections to Patroni REST API without verifying certificates

etcd:
  host: 127.0.0.1:2379

# The bootstrap configuration. Works only when the cluster is not yet initialized.
# If the cluster is already initialized, all changes in the `bootstrap` section are ignored!
bootstrap:
  initdb:  
  - encoding: UTF8
  - data-checksums

postgresql:
  listen: 127.0.0.1:5432
  connect_address: 127.0.0.1:5432
  data_dir: data/postgresql0
  authentication:
    replication:
      username: replicator
      password: rep-pass
    superuser:
      username: postgres
      password: zalando
    rewind:
      username: rewind_user
      password: rewind_password
# Can be used to make sure more than one Postgres node is not elected to be primary due to Patroni bug / error
# Default mode is `automatic` (the watchdog is used when a device is available)
watchdog:
  mode: off

tags:
    noloadbalance: false # Only HA
    nostream: false # Change to file based continuous recovery
    # replicatefrom: # Create cascading replication with this
```

#### Global Config

This `YAML` block only needs to appear once, in a single config file for the whole cluster, because it is stored in ETCD and shared by all nodes as global configuration.

Node-specific settings in an individual config file can override it!

*Note* - Pay attention to the `parameters`, commented out are defaults.

```yaml
bootstrap:
  # This section will be written into Etcd:/<namespace>/<scope>/config after initializing new cluster
  # and all other cluster members will use it as a `global configuration`.
  # WARNING! If you want to change any of the parameters that were set up
  # via `bootstrap.dcs` section, please use `patronictl edit-config`!
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
#    primary_start_timeout: 300
#    synchronous_mode: false
    postgresql:
      use_pg_rewind: true
      pg_hba:
      - host replication replicator 127.0.0.1/32 md5
      - host all all 0.0.0.0/0 md5
      parameters:
#        wal_level: hot_standby
#        hot_standby: "on"
#        max_connections: 100
#        max_worker_processes: 8
#        wal_keep_segments: 8
#        max_wal_senders: 10
#        max_replication_slots: 10
#        max_prepared_transactions: 0
#        max_locks_per_transaction: 64
#        wal_log_hints: "on"
#        track_commit_timestamp: "off"
#        archive_mode: "on"
#        archive_timeout: 1800s
#        archive_command: mkdir -p ../wal_archive && test ! -f ../wal_archive/%f && cp %p ../wal_archive/%f
#      recovery_conf:
#        restore_command: cp ../wal_archive/%f %p
```

#### Add Configuration Files

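1. Switch to the `postgres` user
1. `cd` to `~`
1. Create 3 files:
    - `postgres0.yml` with the single node config plus the global config
    - `postgres1.yml` -> change the names (including `data_dir`) and ports accordingly
    - `postgres2.yml` -> change the names (including `data_dir`) and ports accordingly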
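
One possible way to derive the other files from `postgres0.yml` is sketched below. The helper name `make_node_config` and the port scheme (REST API on 8008+N, PostgreSQL on 5432+N) are assumptions for this example; any free ports work as long as each node gets its own.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write postgresN.yml in the current directory,
# derived from an existing postgres0.yml.
# Assumption: node N uses REST API port 8008+N and PostgreSQL port 5432+N.
make_node_config() {
  local src="$1" n="$2"
  # Rename the node (this also changes data_dir) and shift both ports.
  sed -e "s/postgresql0/postgresql${n}/g" \
      -e "s/127\.0\.0\.1:8008/127.0.0.1:$((8008 + n))/g" \
      -e "s/127\.0\.0\.1:5432/127.0.0.1:$((5432 + n))/g" \
      "$src" > "postgres${n}.yml"
}
```

Running `make_node_config postgres0.yml 1` and `make_node_config postgres0.yml 2` leaves the ETCD address untouched. If the `bootstrap` section is copied along, it is harmless, since it only applies while the cluster is not yet initialized.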

### Start ETCD

From a dedicated terminal, start the ETCD service:

```bash
etcd --data-dir=data/etcd --enable-v2=true
```

### Start Patroni

Open 3 different terminals as the `postgres` user in the directory `~` and start one Patroni instance per terminal, using the config files created earlier:

```bash
# Terminal 1
patroni postgres0.yml
# Terminal 2
patroni postgres1.yml
# Terminal 3
patroni postgres2.yml
```

### Monitor Activity

```bash
curl -s http://localhost:8008/patroni | jq .
curl -s http://localhost:8008/cluster | jq .
curl -s http://localhost:8008/history | jq .
```

### Play With Your HA Cluster

- Try killing some servers (check out `primary_start_timeout`)
- Try switching to file-based log shipping
- Think of a way to use `pgBackRest` instead of `cp`
- Think about what a periodic backup would look like in an HA setup
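
While killing servers, it can help to see which node currently answers as primary. The sketch below asks each node's REST API for `GET /primary`, which returns HTTP 200 only on the current leader. The function name `find_primary` and the REST API ports used in the usage note are assumptions for this demo.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the first REST API base URL that answers
# HTTP 200 on /primary; return non-zero if no node claims the role.
find_primary() {
  local url code
  for url in "$@"; do
    # curl prints 000 when the node is down; "|| true" keeps the loop going.
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 2 "${url}/primary" || true)
    if [ "$code" = "200" ]; then
      echo "$url"
      return 0
    fi
  done
  return 1
}
```

For example, `while sleep 1; do find_primary http://127.0.0.1:8008 http://127.0.0.1:8009 http://127.0.0.1:8010 || echo "no primary"; done` shows the leader moving during a failover, assuming the three nodes use those REST API ports.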

## Multiple Machines

### Architecture

<img src="./helpers/Replication - Multiple Machines.png" alt="drawing" width="600"/>

### Prerequisites

- 2 Linux machines with `PostgreSQL` server software and `Patroni`
- 2 Linux machines with `etcd`
- 1 Linux machine with client tools to query postgres

### Setup Network Resolution

This step is only needed in a plain on-prem environment. Most other modern environments (Docker Compose, cloud, K8S, etc.) come with DNS-like name resolution built in.

Add `IP Name` pairs to `/etc/hosts` on every machine.

For example:

10.252.55.129   pg-1\
10.252.54.125   pg-2\
10.252.54.87    etcd-1\
10.252.54.85    etcd-2
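
A minimal sketch for adding these entries without creating duplicates when the step is repeated. The helper name `add_host` is made up for this example, and writing to `/etc/hosts` requires root.

```bash
#!/usr/bin/env bash
# Hypothetical helper: append "IP name" to a hosts file unless the name
# is already mapped on a non-comment line.
add_host() {
  local ip="$1" name="$2" file="${3:-/etc/hosts}"
  if awk -v n="$name" '
       $1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
       END { exit !found }' "$file" 2>/dev/null; then
    return 0  # already present, nothing to do
  fi
  printf '%s\t%s\n' "$ip" "$name" >> "$file"
}
```

With the example addresses above, this would be `add_host 10.252.55.129 pg-1`, `add_host 10.252.54.125 pg-2`, and so on for the two etcd machines.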

### Configure ETCD Cluster

```bash
#### Inside ETCD Node ####
PEER_PORT=2380
CLIENT_PORT=2379
# Same on all machines
export ETCD_INITIAL_CLUSTER="etcd-1=http://etcd-1:$PEER_PORT,etcd-2=http://etcd-2:$PEER_PORT" # The initial cluster nodes
export ETCD_INITIAL_CLUSTER_STATE="new"

# Different on each machine: must match this machine's name in ETCD_INITIAL_CLUSTER
HOST=etcd-1 # Use etcd-2 on the second machine
export ETCD_INITIAL_ADVERTISE_PEER_URLS="http://$HOST:$PEER_PORT"
export ETCD_LISTEN_PEER_URLS="http://0.0.0.0:$PEER_PORT"
export ETCD_LISTEN_CLIENT_URLS="http://0.0.0.0:$CLIENT_PORT"
export ETCD_ADVERTISE_CLIENT_URLS="http://$HOST:$CLIENT_PORT"

etcd --data-dir=data/etcd --enable-v2=true --name "$HOST"
```

Check that the new ETCD cluster replicates data between its members:

```bash
#### Inside PG Node ####
# Both Nodes
apt install etcd-client

# First Node
ENDPOINT="etcd-1:2379"
etcdctl --endpoints=$ENDPOINT put hello "world"

# Second Node
ENDPOINT="etcd-2:2379"
etcdctl --endpoints=$ENDPOINT get hello
etcdctl --endpoints=$ENDPOINT del hello

# First Node
etcdctl --endpoints=$ENDPOINT get hello # Prints nothing now that the key is deleted
```
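
A note on cluster size, as a small worked calculation. etcd is built on Raft, so a write needs a majority (quorum) of the members. The sketch below computes the quorum and the number of member failures a cluster can survive; the function names are made up for this example.

```bash
#!/usr/bin/env bash
# Quorum needed by a cluster of N members: floor(N/2) + 1.
etcd_quorum() { echo $(( $1 / 2 + 1 )); }

# Number of members that can fail while a quorum is still reachable.
etcd_fault_tolerance() { echo $(( $1 - ($1 / 2 + 1) )); }

for n in 1 2 3 5; do
  echo "members=$n quorum=$(etcd_quorum "$n") tolerated_failures=$(etcd_fault_tolerance "$n")"
done
```

For 2 members the quorum is 2 and the tolerated failures are 0, so the two-machine cluster in this demo stops accepting writes if either etcd machine goes down. With 3 members the quorum is still 2 and one failure is tolerated, which is why odd sizes of three or more are the usual choice outside a demo.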

### Configure Patroni

This should look similar to the single machine setup, except:
- etcd machines -> point the etcd section (here `etcd3`) at etcd-1 / etcd-2
- pg machines -> change the host to pg-1 / pg-2 (restapi, pg_hba, postgresql-listen, postgresql-connect_address)
- name -> give each Patroni node a unique `name`

Here is a working example config for pg-1:

```yaml
scope: batman
namespace: /service
name: postgresql0

restapi:
  listen: 0.0.0.0:8008
  connect_address: pg-1:8008

ctl:
  insecure: false # Allow connections to Patroni REST API without verifying certificates

etcd3:
  host: etcd-1:2379

# The bootstrap configuration. Works only when the cluster is not yet initialized.
# If the cluster is already initialized, all changes in the `bootstrap` section are ignored!
bootstrap:
  initdb:  
  - encoding: UTF8
  - data-checksums
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    primary_start_timeout: 300
#    synchronous_mode: false
    postgresql:
      use_pg_rewind: true
      pg_hba:
      - host replication replicator <IP address of pg-1>/32 md5
      - host replication replicator <IP address of pg-2>/32 md5
      - host replication replicator 127.0.0.1/32 md5
      - host all all 0.0.0.0/0 md5

postgresql:
  listen: 127.0.0.1,pg-1:5432
  connect_address: pg-1:5432
  data_dir: data/postgresql0
  authentication:
    replication:
      username: replicator
      password: rep-pass
    superuser:
      username: postgres
      password: zalando
    rewind:
      username: rewind_user
      password: rewind_password
# Can be used to make sure more than one Postgres node is not elected to be primary due to Patroni bug / error
# Default mode is `automatic` (the watchdog is used when a device is available)
watchdog:
  mode: off

tags:
    noloadbalance: false # Only HA
    nostream: false # Change to file based continuous recovery
    # replicatefrom: # Create cascading replication with this
```

```bash
#### Inside PG Node ####
# Both
su postgres
cd ~
patroni <your yaml configuration file>
```

### Monitor Activity

```bash
curl -s http://localhost:8008/patroni | jq .
curl -s http://localhost:8008/cluster | jq .
curl -s http://localhost:8008/history | jq .
```