Replicated Cache
================

Author: Bela Ban
Id: $Id: ReplCache.txt,v 1.8 2009/01/06 10:23:19 belaban Exp $


Idea
----

For each element placed into the cache, we can define whether it should be
replicated (for high availability) or not and if yes, how many times.

There are 3 methods: put(), get() and remove().

When adding a new key/value pair, put() takes the 'replication count' as argument, along with the key, value and
a timeout.

A replication count of 1 means the data is stored only on 1 node in the cluster, and there is no replication.
The data will be placed on a node in the cluster that corresponds to the consistent hashcode of the data's key.

A replication count of -1 means that the data item is replicated to all cluster nodes.

A replication count of K means the element is stored K times, e.g. PUT(KEY, VAL, 2, timeout) means that an element
will be created on 2 nodes. When the view changes, the cluster makes sure that the above KEY, VAL is always present
on 2 nodes. Note that K has to be less than or equal to N (= number of nodes). When K > N, then ReplCache treats
K as -1 (replicate to all nodes).

K == 0 is invalid and will throw an exception.
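
As a usage sketch, assuming JGroups' ReplCache class and the put(key, val, count, timeout) signature described
above (treat the exact names, constructor arguments and config file as illustrative):

import org.jgroups.blocks.ReplCache;

public class ReplCacheDemo {

    public static void main(String[] args) throws Exception {
        // Transport config and cluster name are assumptions made for this sketch
        ReplCache<String,String> cache=new ReplCache<>("udp.xml", "replcache-demo");
        cache.start();

        cache.put("page-5", "<html>...</html>", (short)1, 0); // 1 copy only: cheap to re-read from the DB
        cache.put("session-42", "state", (short)2, 30000);    // 2 copies, expires after 30 seconds
        cache.put("config", "a=b", (short)-1, 0);             // replicated to all nodes, cached until removed

        System.out.println(cache.get("session-42"));
        cache.stop();
    }
}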

TBD: a replication count which defines a percentage, e.g. 0.3 means replicate to 30% of all nodes.

The advantage of defining replication counts per element is that we can define what reliability we want for
individual data items. For example, an element that can easily be retrieved from disk or database probably does
fine with a count of 1 (= no replication). Here, we use the cache just as a speedup to prevent DB access.
An important item that is costly to recreate, or cannot be recreated at all, should probably have a count of -1.

The point of being able to define replication counts per data item is that we can save memory. If we
compare this to RAID 1, then - because we're replicating every single data item - we can effectively only use
half of the memory (disk space) allocated to the RAID system. With per data replication counts, we can increase the
net memory that can be used (unless of course all elements are added with a count of -1 !).
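
For example, a cluster of 10 nodes with 1 GB of cache memory each has 10 GB of raw capacity. Replicating every
element to all nodes (count -1) leaves 1 GB of net capacity, a uniform count of 2 (RAID 1 style) leaves 5 GB,
and a mix of counts averaging 1.25 leaves 8 GB.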

Put() always results in a multicast across the cluster. Each node determines by itself whether it will add the KEY|VAL
or not. This is done by computing a set of consistent hashes from KEY, mapping them to a set of servers and
determining whether local_addr is part of that set.


put(KEY, VAL, K, TIMEOUT):
Places KEY,VAL into the hashmaps of selected cluster nodes. Existing data will be overwritten. KEY and VAL have to
be serializable.

K can be -1 (replicate everywhere), 1 (create only on 1 node) or > 1 and <= N (replicate to K nodes).

TIMEOUT (ms): -1 (no caching), 0 (cache until removed) or > 0 (cache for TIMEOUT milliseconds)
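
For example, put(KEY, VAL, 2, 5000) stores the element on 2 nodes and removes it from the cache 5 seconds after
insertion.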

On reception of put():

- The selected target nodes add the KEY,VAL to their local cache if
  - K is -1, or
  - K is 1 and the consistent hash matches their local_addr, or
  - K > 1 && <= N and the local_addr matches *one* of the K consistent hashes
- *Everyone* removes KEY from their L1 cache. (Optimization: only the non-selected nodes do this, the selected nodes
also add KEY,VAL to their L1 caches)


The put() method creates a message with KEY, VAL, K and TIMEOUT and multicasts it. Each node which receives the message
does the following (sketched in code after this list):
- If K == -1: add it to the local cache and return
- If K == 1: compute the server based on the consistent hashcode for KEY and see whether local_addr == server. If
so, add the KEY, VAL to the local cache and return. Else, drop the message.
- If K > 1: compute K consistent hashes from KEY. If local_addr is part of the set of server addresses, add KEY,VAL
to the local cache. Else, drop the message.
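
The per-node decision can be sketched as follows. This is illustrative, not ReplCache's actual code: the branch
structure follows the rules above, but ownerAt() is a naive stand-in for the real consistent hash function and
all names are made up.

import java.util.List;

public class PutSelectionSketch {

    // True if this node (self) should store KEY, given replication count k and the current view
    static boolean isSelected(String key, short k, List<String> members, String self) {
        if (k == -1 || k >= members.size())  // -1, or K > N: every node stores the element
            return true;
        for (int i=0; i < k; i++)            // K == 1 is simply the first of the K owners
            if (ownerAt(key, i, members).equals(self))
                return true;
        return false;                        // not an owner: drop the message
    }

    // Picks the i-th of K distinct owners, starting at the key's hash position
    static String ownerAt(String key, int i, List<String> members) {
        int n=members.size();
        return members.get((Math.floorMod(key.hashCode(), n) + i) % n);
    }

    public static void main(String[] args) {
        List<String> view=List.of("A", "B", "C", "D");
        for (String node: view)              // every node runs the same check locally
            System.out.println(node + " stores key-17 (K=2): " + isSelected("key-17", (short)2, view, node));
    }
}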


View changes:

When a node P joins or leaves, every node N does the following (see the sketch after this list):
- For each local KEY:
  - If the count K is -1: replicate KEY,VAL to P
  - If the count K is 1: compute the consistent hash and pick server S
    - If S == P, the server which hosted KEY before P joined moves KEY,VAL to P (PUT message), and
      removes KEY from its local hashmap
    - Else: do nothing (no rebalancing needed)
  - If the count K is > 1:
    - Compute K consistent hashes and determine the K servers which are supposed to be hosting KEY
    - For each server S:
      - Do the same as above (count == 1)
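
The rebalancing rule can be sketched similarly. Again illustrative only: owners() is the same naive stand-in for
consistent hashing as in the previous sketch, the cluster is simulated as a map of per-node stores, and all names
are made up.

import java.util.*;

public class RebalanceSketch {

    record Entry(String val, short count) {}  // a value plus its replication count

    // Run on node 'self' after a view change: re-derive the owner set of every local key,
    // push the entry to any new owner, and drop it locally if self is no longer an owner
    static void rebalance(String self, List<String> members, Map<String,Map<String,Entry>> cluster) {
        Map<String,Entry> local=cluster.get(self);
        for (Iterator<Map.Entry<String,Entry>> it=local.entrySet().iterator(); it.hasNext();) {
            Map.Entry<String,Entry> e=it.next();
            List<String> owners=owners(e.getKey(), e.getValue().count(), members);
            for (String owner: owners)        // a PUT to every owner which lacks the key
                cluster.get(owner).putIfAbsent(e.getKey(), e.getValue());
            if (!owners.contains(self))       // the key migrated away: remove the local copy
                it.remove();
        }
    }

    // min(K,N) distinct owners starting at the key's hash position; -1 means all members
    static List<String> owners(String key, short k, List<String> members) {
        int n=members.size();
        if (k == -1 || k >= n)
            return members;
        List<String> result=new ArrayList<>();
        int base=Math.floorMod(key.hashCode(), n);
        for (int i=0; i < k; i++)
            result.add(members.get((base + i) % n));
        return result;
    }

    public static void main(String[] args) {
        Map<String,Map<String,Entry>> cluster=new HashMap<>();
        for (String node: List.of("A", "B", "C"))
            cluster.put(node, new HashMap<>());
        cluster.get("A").put("key-17", new Entry("hello", (short)2));

        List<String> newView=List.of("A", "B", "C", "D");  // node D joined
        cluster.put("D", new HashMap<>());
        for (String node: newView)
            rebalance(node, newView, cluster);
        cluster.forEach((node, store) -> System.out.println(node + " stores " + store.keySet()));
    }
}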


