- Replication means keeping multiple copies of the same data on different nodes.
- It is widely used in distributed, highly available database management systems (DBMS).

Advantage | Description |
---|---|
Scalability | Handling a higher volume of reads than a single machine could, by serving reads from replicas. |
High availability/Redundancy | Keeping the system running even when one machine (or several machines, or an entire datacenter) goes down. |
Disconnected operation | Allowing an application to continue working during a network interruption. |
Latency | Placing data geographically close to users, so that they can interact with it faster. |


Algorithm | Use Cases | Description |
---|---|---|
⭐ Single-Leader replication | SQL DBs like Amazon Aurora and PostgreSQL; message brokers like Kafka; NoSQL DBs like DynamoDB and MongoDB. | Clients send all writes to a single node (the leader), which sends a stream of data change events to the other replicas (followers). Reads can be performed on any replica, but reads from followers might be stale. |
Leaderless replication | Cassandra and other Dynamo-style systems. | Clients send each write to several nodes, and read from several nodes in parallel in order to detect and correct nodes with stale data. |
Multi-Leader replication | Multi-datacenter operation (a leader in each datacenter); clients with offline operation (like mobile apps); collaborative editing (like Google Docs). | Clients send each write to one of several leader nodes, any of which can accept writes. The leaders send streams of data change events to each other and to any follower nodes. The biggest problem with this approach is write conflicts and their resolution. |
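
The single-leader and leaderless descriptions above can be made concrete with a small in-memory sketch. This is a toy model, not how any of the listed databases is implemented: the class names (`Replica`, `SingleLeaderCluster`, `LeaderlessCluster`), the explicit `replicate()` step standing in for asynchronous replication, and the single global version counter are all assumptions made for illustration. Real Dynamo-style systems typically use per-key versions, timestamps, or vector clocks instead of one counter.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical node storing key -> (version, value)."""
    data: dict = field(default_factory=dict)

    def apply(self, key, version, value):
        # Keep only the newest version, so replayed or late events are harmless.
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def get(self, key):
        return self.data.get(key)


class SingleLeaderCluster:
    """All writes go to the leader; followers catch up when replicate() runs."""

    def __init__(self, follower_count):
        self.leader = Replica()
        self.followers = [Replica() for _ in range(follower_count)]
        self.pending = []  # stream of data change events not yet shipped
        self.version = 0

    def write(self, key, value):
        self.version += 1
        self.leader.apply(key, self.version, value)
        self.pending.append((key, self.version, value))

    def replicate(self):
        # Stand-in for asynchronous replication: ship pending events to followers.
        for follower in self.followers:
            for key, version, value in self.pending:
                follower.apply(key, version, value)
        self.pending.clear()

    def read_from_follower(self, index, key):
        entry = self.followers[index].get(key)
        return None if entry is None else entry[1]


class LeaderlessCluster:
    """Quorum reads and writes over n nodes, with read repair."""

    def __init__(self, n, w, r):
        if w + r <= n:
            raise ValueError("need w + r > n so read and write sets overlap")
        self.nodes = [Replica() for _ in range(n)]
        self.w, self.r = w, r
        self.version = 0  # assumption: one global counter, for simplicity

    def write(self, key, value, reachable):
        """Send the write to every reachable node; succeed only with >= w acks."""
        self.version += 1
        for i in reachable:
            self.nodes[i].apply(key, self.version, value)
        return len(reachable) >= self.w

    def read(self, key, reachable):
        """Read from r reachable nodes, return the newest value, repair stale ones."""
        if len(reachable) < self.r:
            raise RuntimeError("not enough nodes reachable for a quorum read")
        responses = [(i, self.nodes[i].get(key)) for i in reachable[: self.r]]
        found = [entry for _, entry in responses if entry is not None]
        if not found:
            return None
        newest = max(found, key=lambda entry: entry[0])
        for i, entry in responses:
            if entry != newest:
                # Read repair: push the newest version to the stale node.
                self.nodes[i].apply(key, newest[0], newest[1])
        return newest[1]


# Single-leader: a follower read is stale until replication catches up.
cluster = SingleLeaderCluster(follower_count=2)
cluster.write("x", "v1")
stale = cluster.read_from_follower(0, "x")
cluster.replicate()
fresh = cluster.read_from_follower(0, "x")

# Leaderless: node 2 misses the second write, but a quorum read still sees it.
quorum = LeaderlessCluster(n=3, w=2, r=2)
quorum.write("x", "v1", reachable=[0, 1, 2])
quorum.write("x", "v2", reachable=[0, 1])
value = quorum.read("x", reachable=[1, 2])
```

In this sketch the follower read returns nothing before `replicate()` runs, which is the stale-read case from the table. The quorum read works because `w + r > n` guarantees that at least one node contacted by the read also received the latest successful write. Multi-leader replication is not modelled here, since it additionally needs a conflict resolution strategy.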