Generic Tiered Replication implementation.

Trepl

Trepl is a generic implementation of Tiered Replication (Cidon et al.), designed to help choose replica placement for Kafka partitions and configure WADE chains. However, it can be used in any situation where you want to adjust the probability of data loss or unavailability from multiple replica failures.

Tiered Replication follows up on ideas introduced in the Copysets paper, where you'll find detailed information on motivations and use cases.

Usage

Basic Trepl usage is simple:

>>> trepl.build_copysets(['node1', 'node2', 'node3'], R=2, S=1)
[['node1', 'node2'], ['node1', 'node3']]

>>> trepl.build_copysets(['node1', 'node2', 'node3'], R=2, S=2)
[['node1', 'node2'], ['node1', 'node3'], ['node2', 'node3']]
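To see what a larger `S` buys and costs, you can inspect the copysets that come back. The sketch below is independent of Trepl: `scatter_widths` and `loss_fraction` are hypothetical helpers, not part of the library. It assumes the Copysets definition of scatter width (the number of distinct nodes a node shares a copyset with) and treats a failure as a uniformly random set of exactly `R` nodes going down at once, so that data is lost only when the failed set is itself a copyset.

```python
import math
from collections import defaultdict


def scatter_widths(copysets):
    """Map each node to the number of distinct nodes it shares a copyset with."""
    peers = defaultdict(set)
    for copyset in copysets:
        for node in copyset:
            peers[node].update(n for n in copyset if n != node)
    return {node: len(others) for node, others in peers.items()}


def loss_fraction(copysets, num_nodes):
    """Fraction of all R-node failure sets that wipe out an entire copyset."""
    replication = len(copysets[0])
    distinct = {frozenset(c) for c in copysets}
    return len(distinct) / math.comb(num_nodes, replication)


# Copysets taken from the two examples above (3 nodes, R=2).
s1 = [['node1', 'node2'], ['node1', 'node3']]
s2 = [['node1', 'node2'], ['node1', 'node3'], ['node2', 'node3']]

print(scatter_widths(s1))    # {'node1': 2, 'node2': 1, 'node3': 1}
print(scatter_widths(s2))    # {'node1': 2, 'node2': 2, 'node3': 2}
print(loss_fraction(s1, 3))  # 2 of the 3 possible pairs are copysets
print(loss_fraction(s2, 3))  # 1.0: every possible pair is a copyset
```

On a cluster this small the numbers are extreme, but the trade-off is the general one: more copysets raise scatter width, and with it the share of failure combinations that lose data.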

Trepl also ships with rack-aware and tier-aware check functions:

# not rack aware
>>> trepl.build_copysets(['node1', 'node2', 'node3'], R=2, S=1)
[['node1', 'node2'], ['node1', 'node3']]

# rack aware: node1 and node2 cannot share a copyset since they're in
# the same rack
>>> rack_map = { 'node1': 'rack1', 'node2': 'rack1', 'node3': 'rack3' }
>>> trepl.build_copysets(
...     rack_map.keys(), R=2, S=1,
...     checker=trepl.checkers.rack(rack_map),
... )
[['node1', 'node3'], ['node2', 'node3']]
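If you want to double-check a placement after the fact, the rack constraint is easy to verify on the returned copysets. This is a minimal sketch that does not use Trepl's own checker: `spans_distinct_racks` is a hypothetical helper, and its signature is an assumption, not the interface `trepl.checkers.rack` exposes.

```python
def spans_distinct_racks(copyset, rack_map):
    """Return True if no two nodes in the copyset share a rack."""
    racks = [rack_map[node] for node in copyset]
    return len(set(racks)) == len(racks)


rack_map = {'node1': 'rack1', 'node2': 'rack1', 'node3': 'rack3'}

# Copysets from the rack-aware example above.
copysets = [['node1', 'node3'], ['node2', 'node3']]

for copyset in copysets:
    print(copyset, spans_distinct_racks(copyset, rack_map))

# node1 and node2 both sit in rack1, so this pairing fails the check.
print(spans_distinct_racks(['node1', 'node2'], rack_map))  # False
```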

# scatter width must be 2, and data must exist on at least one node in
# the backup tier
>>> primary = ['A', 'B', 'C']
>>> backup = ['d', 'e']
>>> trepl.build_copysets(
...     primary + backup, R=2, S=2,
...     checker=trepl.checkers.tiered(backup, 2),
... )
[['A', 'd'], ['A', 'e'], ['B', 'd'], ['B', 'e'], ['C', 'd'], ['C', 'e']]
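The tier constraint can be checked the same way on the result. The sketch below is again independent of Trepl: `backup_replicas` is a hypothetical helper that lists which members of a copyset live in the backup tier, so you can confirm that every copyset keeps at least one replica there.

```python
def backup_replicas(copyset, backup):
    """Return the nodes of the copyset that belong to the backup tier."""
    backup_nodes = set(backup)
    return [node for node in copyset if node in backup_nodes]


backup = ['d', 'e']

# Copysets from the tiered example above.
copysets = [['A', 'd'], ['A', 'e'], ['B', 'd'], ['B', 'e'], ['C', 'd'], ['C', 'e']]

# Every copyset should hold at least one replica on a backup node.
for copyset in copysets:
    on_backup = backup_replicas(copyset, backup)
    print(copyset, on_backup, len(on_backup) >= 1)
```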

Authors