Partitioning/Sharding

  • For very large datasets, or very high query throughput, replication is not sufficient - we need to break the data up into partitions, also known as sharding.
  • Instead of a single node handling all writes, we partition/shard the database across multiple nodes based on a partition key.
  • This increases query throughput and overall system write throughput, since each partition serves only its own share of the data and load.
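
To make the idea of routing by partition key concrete, here is a minimal sketch of hash-based partitioning. The partition count, the `user-N` key format, and the helper names `partition_for` and `distribution` are assumptions made for illustration; they do not describe any specific database.

```python
import hashlib
from collections import Counter

NUM_PARTITIONS = 4  # assumed partition count, for illustration only


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a partition key to a partition index using a stable hash."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer, then take the remainder.
    return int.from_bytes(digest[:8], "big") % num_partitions


def distribution(keys, num_partitions: int = NUM_PARTITIONS) -> Counter:
    """Count how many keys land on each partition."""
    return Counter(partition_for(k, num_partitions) for k in keys)


if __name__ == "__main__":
    # Hypothetical keys; a well-chosen partition key spreads rows roughly evenly.
    keys = [f"user-{i}" for i in range(10_000)]
    for partition, count in sorted(distribution(keys).items()):
        print(f"partition {partition}: {count} keys")
```

One limitation of plain `hash % N` is that changing the number of partitions remaps most keys, which is the problem consistent hashing is meant to address.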

Note - This partitioning is unrelated to a network partition (the "P" in the CAP theorem).

Key Terminologies

| Terminology | Description |
|---|---|
| Partition Key | Data is partitioned based on a partition key, so this key must be chosen carefully to distribute the data evenly between partitions. |
| Hash Function | A hash function determines the partition for a given key. MD5 is used by MongoDB's hashed sharding and by Cassandra's legacy RandomPartitioner (Cassandra's default partitioner since 1.2 uses Murmur3). |
| Secondary Indexes | Read more |
| Consistent Hashing | Handles data sharding when the number of servers changes dynamically. |
| Unique-ID-Generator | Many NoSQL databases do not provide an auto-incrementing primary key, so we would have to generate unique IDs on the application side. |
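
As a rough illustration of the consistent hashing entry above, the following sketch places servers on a hash ring using virtual nodes and assigns each key to the first server position clockwise from the key's hash. The class name `ConsistentHashRing`, the server names, and the choice of 100 virtual nodes per server are hypothetical; production systems differ in their details.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Stable 64-bit hash taken from the first 8 bytes of an MD5 digest."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class ConsistentHashRing:
    """Minimal hash ring with virtual nodes (illustrative sketch only)."""

    def __init__(self, vnodes: int = 100):
        self.vnodes = vnodes
        self._positions = []  # sorted ring positions
        self._owner = {}      # ring position -> server name

    def add_server(self, server: str) -> None:
        for i in range(self.vnodes):
            pos = _hash(f"{server}#{i}")
            if pos not in self._owner:
                self._owner[pos] = server
                bisect.insort(self._positions, pos)

    def remove_server(self, server: str) -> None:
        self._positions = [p for p in self._positions if self._owner[p] != server]
        self._owner = {p: s for p, s in self._owner.items() if s != server}

    def get_server(self, key: str) -> str:
        if not self._positions:
            raise LookupError("ring has no servers")
        # First position clockwise from the key's hash, wrapping at the end.
        idx = bisect.bisect_right(self._positions, _hash(key)) % len(self._positions)
        return self._owner[self._positions[idx]]


if __name__ == "__main__":
    ring = ConsistentHashRing()
    for name in ("server-a", "server-b", "server-c"):
        ring.add_server(name)
    print(ring.get_server("user-42"))
```

When a server is added to this ring, the only keys that move are the ones now owned by the new server; the rest keep their previous assignment, which is what makes the scheme suitable for a dynamic number of servers.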

Kafka Cluster

Other Supported Clusters

Sharding Examples

Example
MySQL Sharding @ Pinterest
PostgreSQL Sharding @ Instagram
PostgreSQL Sharding - System Design

References