# Jepsen
Notes on [this sequence of blog posts](https://aphyr.com/tags/jepsen) exploring the consistency and durability properties of various databases.

## Postgres
* Postgres offers several isolation levels, up to and including serializable isolation.
* However, the article focuses on the communication between the client and the server.
* The Postgres commit protocol is a special case of **two-phase commit**, which makes it subject to the Two Generals problem.
* For a commit to succeed from the client's point of view, (1) the transaction must go through on the server and (2) the server must respond to the client with a success.
* If a network partition between the client and the server occurs in between the write and the ack, the client will time out and report a failure.
* The data will still have been modified, however.
* So if a network partition occurs and a failure bubbles up, services relying on Postgres cannot know whether the transaction succeeded or not.
* Is it hard to create a partition between the database and the client? Maybe.
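
A minimal sketch of the lost-ack case described above. Everything here is a toy stand-in, not Postgres or any real driver: `ToyServer`, `AckLost`, and `commit_with_check` are hypothetical names, and the "partition" is just a flag that drops the acknowledgement after the write has been applied.

```python
class AckLost(Exception):
    """Raised on the client when no acknowledgement arrives in time."""


class ToyServer:
    """Hypothetical stand-in for a database server; not Postgres itself."""

    def __init__(self):
        self.rows = {}              # committed data, keyed by transaction id
        self.drop_next_ack = False  # simulates a partition right after commit

    def commit(self, txn_id, value):
        self.rows[txn_id] = value   # step 1: the transaction goes through
        if self.drop_next_ack:      # step 2: the ack never reaches the client
            self.drop_next_ack = False
            raise AckLost("timed out waiting for commit ack")
        return "ok"

    def was_committed(self, txn_id):
        return txn_id in self.rows


def commit_with_check(server, txn_id, value):
    """Return 'acked', 'recovered' (ack lost, write landed), or 'not committed'."""
    try:
        server.commit(txn_id, value)
        return "acked"
    except AckLost:
        # The timeout alone says nothing about the outcome; the client has to ask.
        if server.was_committed(txn_id):
            return "recovered"
        return "not committed"


server = ToyServer()
server.drop_next_ack = True
outcome = commit_with_check(server, "txn-1", 42)
```

In this toy run the client sees a timeout even though the row exists. The only way out is to look the transaction up afterwards by some identifier the client chose, which is an assumption of this sketch rather than something the article prescribes.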

## Redis
* Redis by default runs on a single server. It offers serializable isolation (i.e. the highest possible guarantee level) via actual serialization (everything runs on one thread, and transactions are sequenced on that thread).
* In this configuration Redis is CP.


* Redis can be made highly available.
* It offers asynchronous single-leader replication (see Chapter 6 notes).
* A separate service, called Sentinel, is used to detect serious network partitions.
* If a network partition occurs, a quorum of nodes (at least $\lfloor N / 2 \rfloor + 1$, so that only one quorum may exist) assembles and elects a new leader.
* The quorum then instructs any client connections to abandon the old configuration and use the new configuration. Nodes are added back as the partition heals and the offline nodes come back online.
* Redis does not guarantee durability. Replication and disk writes are performed asynchronously, so data that was updated on "lost" nodes may not exist in the new quorum (having a quorum acknowledge writes protects against this, but as always it's an availability-consistency tradeoff).
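
A quick check of the quorum arithmetic above, as a sketch. The helper names `quorum_size` and `two_quorums_possible` are made up for illustration and are not part of Redis or Sentinel.

```python
def quorum_size(n):
    """Smallest strict majority of n nodes: floor(n / 2) + 1."""
    return n // 2 + 1


def two_quorums_possible(n):
    """True if two disjoint groups of quorum size could fit in n nodes."""
    return 2 * quorum_size(n) <= n


sizes = {n: quorum_size(n) for n in (3, 4, 5)}
```

Since two majorities always add up to more than $N$ nodes, `two_quorums_possible` is false for every cluster size, which is why only one side of a partition can elect a leader.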


* By default, during a network partition the old primary will continue to accept and process requests.
* This will continue until the partition heals and the new quorum-elected leader can reach the old primary again, at which point the old primary will be told to step down.
* The data that was accepted and acked by the old primary during the partition will be lost!
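
The sequence above can be sketched as a toy simulation. This is not Redis code: `Node` and `simulate_failover` are hypothetical, replication is modelled as a plain list append, and "stepping down" is modelled as the old primary discarding its log and copying the new leader's.

```python
class Node:
    """Toy node holding an ordered log of accepted writes."""

    def __init__(self, name):
        self.name = name
        self.log = []


def simulate_failover():
    old_primary = Node("n1")
    replica = Node("n2")
    acked = []  # every write some primary acknowledged to a client

    # Before the partition: writes reach the replica (asynchronously, but they arrive).
    for value in (1, 2):
        old_primary.log.append(value)
        acked.append(value)
        replica.log.append(value)

    # Partition: the majority side promotes the replica to leader.
    new_primary = replica

    # The old primary keeps accepting and acking writes that never replicate.
    for value in (3, 4):
        old_primary.log.append(value)
        acked.append(value)

    # Meanwhile the new leader accepts its own writes.
    new_primary.log.append(5)
    acked.append(5)

    # Heal: the old primary steps down and resyncs from the new leader.
    old_primary.log = list(new_primary.log)

    lost = [value for value in acked if value not in new_primary.log]
    return acked, new_primary.log, lost


acked, surviving, lost = simulate_failover()
```

Here writes `3` and `4` were acknowledged to clients but appear nowhere after the partition heals.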


* Any service built on Redis must be ready to deal with failovers that demolish consistency.
* So in the single-leader replicated configuration, Redis is not consistent.
* Redis is not available, either: while a failover is in progress, there's no node that will accept operations.
* The tradeoff is that it's very fast.
* Caches don't care about consistency loss, which is why Redis is so good for this purpose (the difficulty of cache invalidation notwithstanding).


* You can optionally configure how long until the old primary stops accepting requests.
* This essentially requires an occasional quorum ack on the primary, which slows the system down. I don't recommend it.
* Really, if you can't afford to lose acknowledged writes or deal with consistency problems, don't use Redis! It wasn't designed for this!


## MongoDB

* MongoDB also uses single-leader replication with quorum recovery.
* By default MongoDB also uses a two-phase commit against the leader (apparently it used to not even check if a write succeeded on the leader!).
* If a network partition occurs, similarly to Redis, the majority quorum will elect a new leader.
* In the meantime, the old leader will continue to accept and ack operations.
* Once the partition heals and the old leader rejoins the pack, the intervening data written to the old leader and not to the new leader is rolled back (specifically, to a rollback file that a system administrator can look at).
* The article claims it's not possible to get split-brain, but it seems extremely possible to get split-brain...
* So by default MongoDB is not consistent.


* As with Redis, you can tell MongoDB to use majority acknowledgement. This makes it properly CP, but increases latency substantially.
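
A small sketch of why majority acknowledgement closes the gap: a leader stranded on the minority side can no longer ack anything. `leader_acks_write` is a hypothetical helper that models only the counting rule, not MongoDB's actual write path.

```python
def leader_acks_write(reachable_nodes, cluster_size, require_majority):
    """Decide whether a leader acks a write.

    reachable_nodes counts the leader itself plus the replicas it can reach.
    """
    if not require_majority:
        return True  # ack as soon as the leader has the write locally
    return reachable_nodes >= cluster_size // 2 + 1


# Five-node cluster split 2 / 3, with the old leader on the two-node side.
old_leader_default = leader_acks_write(2, 5, require_majority=False)
old_leader_majority = leader_acks_write(2, 5, require_majority=True)
new_leader_majority = leader_acks_write(3, 5, require_majority=True)
```

With the default setting the isolated old leader still acks writes that will later be rolled back. With majority acknowledgement it refuses them, while the leader on the three-node side keeps working, at the cost of waiting on replicas for every write.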