\section{Scaling Arakoon}
We want to be able to use Arakoon for increasingly large key-value spaces.
For a single Arakoon cluster the capacity is limited by the size of a single disk.
So it is only natural to allow different Arakoon clusters to team up.
A \emph{nursery}\footnote{after \emph{a nursery of raccoons}} provides a semi-unified view on a set of Arakoon clusters.
Each cluster is uniquely responsible for a prefix range.
\subsection{Limitations}
The simple strategy of mapping a cluster to a range of keys already implies some limitations compared to the single cluster setup.
As a result, applications willing to scale from a single cluster to a nursery need to do some planning.
\subsubsection{Impact on sequences}
Sequences are multiple updates that are done atomically.
Since atomicity can only be achieved inside a single cluster\footnote{you could build transactionality across clusters, but it's a can of worms}, all keys in a sequence need to share the same prefix, that is, fall in the range of the same cluster.
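The constraint on sequences can be checked on the client side before a sequence is submitted. The following Python fragment is a minimal sketch, not Arakoon's API: the routing table (\texttt{BOUNDS}, \texttt{CLUSTERS}), its contents, and the helpers \texttt{cluster\_for} and \texttt{sequence\_cluster} are hypothetical names chosen for illustration.

```python
from bisect import bisect_right

# Hypothetical routing table (assumption, not Arakoon's format):
# cluster i serves the half-open range [BOUNDS[i], BOUNDS[i + 1]);
# the last cluster is unbounded above.
BOUNDS = ["", "g", "p"]
CLUSTERS = ["left", "middle", "right"]


def cluster_for(key):
    """Return the name of the cluster whose range contains key."""
    return CLUSTERS[bisect_right(BOUNDS, key) - 1]


def sequence_cluster(keys):
    """Return the one cluster that can apply a sequence over keys atomically,
    or None if the keys are spread over more than one cluster."""
    owners = {cluster_for(k) for k in keys}
    return owners.pop() if len(owners) == 1 else None
```

With this table, a sequence touching \texttt{apple} and \texttt{banana} maps to one cluster, while one touching \texttt{apple} and \texttt{pear} has no single owner and would have to be rejected or redesigned.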
\subsubsection{Impact on ranges}
Every cluster is responsible for a specific range.
As a client's range query is served by a single cluster, only ranges that are subranges of a single cluster's range can be served.
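The subrange condition can be made concrete with a small check. This is a sketch under assumptions: the table \texttt{RANGES}, the convention that an end of \texttt{None} means unbounded, and the helper names are all hypothetical and only serve to illustrate the rule.

```python
# Hypothetical public ranges (assumption): name -> (begin, end), half-open;
# end is None when the range is unbounded above.
RANGES = {"left": ("", "g"), "middle": ("g", "p"), "right": ("p", None)}


def contains(outer, inner):
    """True if the half-open range inner lies entirely inside outer."""
    (outer_begin, outer_end), (inner_begin, inner_end) = outer, inner
    if inner_begin < outer_begin:
        return False
    if outer_end is None:
        return True
    return inner_end is not None and inner_end <= outer_end


def cluster_for_range(begin, end):
    """Return the cluster able to serve [begin, end), or None if the
    requested range crosses a cluster boundary."""
    for name, public_range in RANGES.items():
        if contains(public_range, (begin, end)):
            return name
    return None
```

A query for $[c, f)$ fits in one cluster; a query for $[e, k)$ crosses the boundary at $g$ and cannot be served as a single range query, so the application would have to split it itself.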
\subsection{Migrations}
Once a cluster is full, one needs to be able to split it, or move part of its range elsewhere. This process is called migration.
Each cluster has a \emph{public} range $[k_b,k_e)$ it serves to clients, as well as a \emph{private} range of keys it actually contains.
Keeping these two ranges separate is what makes migrating a part of a cluster's range to another cluster feasible.
If we're moving the keys in $[k_e - a, k_e)$ away from a \emph{source} cluster $[k_b,k_e)$
to a \emph{target} cluster $[k_e,k_z)$, we
\begin{itemize}
\item{} shrink the public range of the source cluster from
$[k_b,k_e)$ to $[k_b, k_e - a)$.
The private range of the source remains $[k_b,k_e)$.
\item{} add the key/value pairs in $[k_e - a,k_e)$ to the target cluster.
\item{} extend the public range of the target cluster from
$[k_e,k_z)$ to $[k_e - a, k_z)$.
\item{} delete the key/value pairs on the source in $[k_e - a,k_e)$, and
update the private range on the source to $[k_b, k_e - a)$.
\end{itemize}
This work can be done by a privileged client responsible for the migration.
That client can die at any point and, on resumption, figure out what it still needs to do to complete the task.
The problem with this migration strategy is that there is a window in time during which none of the clusters is serving $[k_e - a, k_e)$, so any request for a key in that range is refused.
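The migration steps can be sketched in Python with an in-memory stand-in for a cluster. Everything here is an assumption made for illustration: the \texttt{Cluster} class, integer keys, the \texttt{split} parameter (playing the role of $k_e - a$), the \texttt{stop\_after} parameter that simulates the migrating client dying, and the update of the target's private range in the second step. The point of the sketch is that every step is idempotent, so a resumed client can simply run the steps again.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """In-memory stand-in for a cluster (hypothetical, not Arakoon's API)."""
    public: tuple   # half-open range served to clients
    private: tuple  # half-open range of keys actually stored
    data: dict = field(default_factory=dict)

    def serves(self, key):
        return self.public[0] <= key < self.public[1]


def migrate(source, target, split, stop_after=4):
    """Move all keys >= split from source to the adjacent target.
    stop_after < 4 simulates the migrating client dying part-way."""

    def shrink_source_public():
        source.public = (source.public[0], min(source.public[1], split))

    def copy_pairs_to_target():
        for key, value in source.data.items():
            if key >= split:
                target.data[key] = value
        # Assumption: the target's private range grows with the copied keys.
        target.private = (min(target.private[0], split), target.private[1])

    def extend_target_public():
        target.public = (min(target.public[0], split), target.public[1])

    def delete_from_source():
        for key in [k for k in source.data if k >= split]:
            del source.data[key]
        source.private = (source.private[0], min(source.private[1], split))

    steps = [shrink_source_public, copy_pairs_to_target,
             extend_target_public, delete_from_source]
    for step in steps[:stop_after]:
        step()
```

Calling \texttt{migrate} with \texttt{stop\_after=2} leaves the system in the window described above, where neither cluster serves the migrated keys; calling it again with the same \texttt{split} completes the migration, and a third call changes nothing.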
\subsection{Client side support}
Each client needs to know which cluster is responsible for a certain key(-range).
This information is kept in a routing table. At construction time, a client fetches this table from a designated Arakoon cluster that knows all the clusters in a nursery.
The privileged clients performing migrations must also update this designated Arakoon cluster.
It is not critical that nursery clients have correct routing tables at all times:
if a client asks something from a cluster that is not able to comply, the cluster will simply refuse.
A refusal means that either a migration is in progress, or the client has outdated routing information.
In either case, the client can simply refetch the ranges from the clusters it knows, or refetch the routing table from the designated Arakoon cluster.
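The refuse-and-refetch behaviour can be sketched as follows. This is an illustration under assumptions, not the Arakoon client: \texttt{Refused}, \texttt{FakeCluster}, \texttt{NurseryClient}, the \texttt{fetch\_routing} callable (standing in for a read from the designated cluster), and the retry count are all hypothetical. Only the second refetch option, asking the designated cluster, is shown.

```python
class Refused(Exception):
    """Raised when a cluster is not (or not currently) serving a key."""


class FakeCluster:
    """In-memory stand-in for a cluster serving the range [begin, end)."""

    def __init__(self, begin, end, data):
        self.begin, self.end, self.data = begin, end, dict(data)

    def get(self, key):
        if not (self.begin <= key < self.end):
            raise Refused(key)
        return self.data[key]


class NurseryClient:
    def __init__(self, fetch_routing, clusters):
        # fetch_routing: callable returning a list of (begin, end, name).
        self._fetch_routing = fetch_routing
        self._clusters = clusters  # name -> cluster handle
        self._routing = fetch_routing()

    def _lookup(self, key):
        for begin, end, name in self._routing:
            if begin <= key < end:
                return self._clusters[name]
        raise Refused(key)

    def get(self, key, retries=1):
        """Try the cluster the routing table points at; on refusal,
        refetch the routing table and try again, up to retries times."""
        for attempt in range(retries + 1):
            try:
                return self._lookup(key).get(key)
            except Refused:
                if attempt == retries:
                    raise
                self._routing = self._fetch_routing()
```

If the refusal is caused by a migration that is still in progress, the refetched table does not help and the request fails after the last retry; the caller then has to retry later.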
\subsection{Problems}
We depend on having a designated arakoon that knows all the clusters in a nursery, and their routing tables.
So conceptually, we introduced a single point of failure.
Since this point is in reality an Arakoon cluster, which is synchronously replicated, that should not pose big practical problems.
\paragraph{}
Having to maintain the configuration of multiple (possibly many) Arakoon clusters on lots of machines will become a significant problem.
This information is both crucial and maintained by humans, which is a recipe for disaster. In time, we should move to service discovery.
Possible options are:
\begin{itemize}
\item{XMPP disco}
\item{DNS-Based Service Discovery}
\item{openSLP}
\item{Salutation}
\item{UPnP}
\item{svrloc}
\item{\ldots}
\end{itemize}