authors | state
---|---
Nick Zivkovic <nick.zivkovic@joyent.com> | draft
Triton services primarily store metadata in Moray, our key-value store. The values are JSON objects, and they may reference other objects in the store. For example, a VM object may reference a NIC object. However, Moray is oblivious to these relationships. As a result, if requested, Moray will delete, overwrite, or orphan an object that another object may depend on. The problem is not Moray's lack of awareness as such, but rather that some components of Triton may accidentally remove objects that are still needed, or neglect to remove objects that are no longer needed.
We've already run into consistency problems that we've repaired by modifying
other Triton modules (see RFD 58 and NAPI-327). However, we want to be able to
detect, report, and potentially repair inconsistencies automatically, rather
than through trial and error. We want the equivalent of `::findleaks` for
Moray. This way, we can become aware of an inconsistency before it manifests
itself as pathological behavior in production. For example, in the
inconsistency described by RFD 58, the pathological behavior was resource
exhaustion.
In the abstract, scrubbing is a process. Let's refer to it as `$P`. The details
of how we implement this process are not -- for the moment -- entirely
relevant. Triton's distributed nature and immense size give us many possible
ways to implement scrubbing. Before we become too attached to any single
scrubbing method, we will want to describe the process of scrubbing as if it
were done 'magically', without consuming any concrete system resources.
In a scrubbing process, there are a few essential elements that we need to be
mindful of. There is a datastore `$D`. There is at least one service `$S` that
uses the datastore as part of its operation. We know that `$D` is essentially
a set of objects that may or may not refer to each other. We know that these
objects are generated by some `$S`. There are four stages to any scrubbing
process: (1) scanning, (2) consistency checking, (3) reporting
inconsistencies, and (4) repairing inconsistencies.
Step 1 is the most trivial step: it involves walking or loading all of the
objects in `$D`. Step 2 (the essential step) requires a schema of what the
data should look like. The schema needs to describe what each individual
object should look like internally, as well as what the objects should look
like in relation to each other. The former is already largely implemented by
our `schemas` repo (and the validation code in our various services). The
latter, at the time of this writing, does not yet exist. However, we can either
extend `schemas` or create a complementary repo that contains this
information. Steps 3 and 4 are self-explanatory.
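
To make the relational half of the schema concrete, here is a minimal sketch of what step 2 might check. Everything in it is an assumption made for illustration: the `REFERENCE_RULES` format, the `type` and `nics` fields, and the in-memory `snapshot` standing in for the output of step 1. It is not the format of the existing `schemas` repo.

```python
# Hypothetical relational schema: for each object type, the fields that hold
# references, and the type of object each reference must resolve to.
REFERENCE_RULES = {
    "vm": {"nics": "nic"},  # assumed: a VM lists the keys of its NICs
}


def find_dangling_references(store, rules):
    """Return (referrer_key, field, missing_key) for every broken reference.

    `store` maps key -> object and stands in for a snapshot of the datastore
    produced by step 1 (scanning).
    """
    dangling = []
    for key, obj in sorted(store.items()):
        for field, target_type in rules.get(obj["type"], {}).items():
            for target_key in obj.get(field, []):
                target = store.get(target_key)
                # A reference is broken if the target is absent or is an
                # object of the wrong type.
                if target is None or target["type"] != target_type:
                    dangling.append((key, field, target_key))
    return dangling


snapshot = {
    "vm-1": {"type": "vm", "nics": ["nic-1", "nic-2"]},
    "nic-1": {"type": "nic"},
    # "nic-2" is absent, so vm-1 holds a dangling reference.
}
print(find_dangling_references(snapshot, REFERENCE_RULES))
```

A fuller version would also check the reverse direction (children whose parent is missing), which needs rules describing which objects must be referenced by something.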
Each of the above steps has a non-zero cost. However, thinking about the
performance implications of each step at this early stage is going to distract
us more than anything else. If we assume that each step in the scrubbing
process is zero-cost, we can redirect our attention to a much more vexing
problem: even if we could run the scrubber in an instant of time, we could end
up running it while objects are 'in flight' from some service `$S`.
Let's say we have two objects, `A` and `B`. These two objects have a
parent-child relationship `A -> B`, where `A` references `B` but not the other
way around. Both of these objects are being written to `$D` by `$S`. Suppose
the scrubbing process ran at an instant when `$S` had finished writing `A` to
`$D`, but not `B`. As a result, the scrubber sees a parent without a child.
But this would not have been the case if it had run at a later point in time.
Between any two objects `A` and `B`, there are 3 possible
reference-relationships:

- `A -> B`
- `B -> A`
- `A -> B && B -> A`

What's more, not all of the objects in a chain of references are necessarily
created by the same service. For example, `A` could have been created by
`$S[1]`, while `B` was created by `$S[2]`. So we can't even assume that they
are being created in some kind of order.
One may be tempted to work around this problem by making all services write an
additional property to their objects -- say, a boolean called `scrubable` --
that indicates whether a scrub should include the object in its
consistency-checking calculations. But this has the obvious problem that it
would only be useful for newly generated data. Most of the wasted space in
`$D` is likely to be old data.
However, what one can do without modifying the format of any objects in `$D`
is to keep a list of inconsistencies (`$I`) (e.g. object `A` missing child
`B`, or object `B` missing parent `A`) and see if they are still present the
next time we scrub. If not, we remove the inconsistency from list `$I`. Each
inconsistency should have a counter indicating how many times it was
re-detected during a scrub. If the count is very high, we can suspect that
something is (or was) wrong with the service `$S` that generated the objects
that are part of the inconsistency. When the lifetime of an inconsistency
exceeds some (operator defined?) threshold, we can move on to step 3 and
report it to the operator/developer. The developer would then determine how to
implement step 4. For example, one may want to implement the cleaning up of
orphaned children inside of an agent instead of inside of the scrubber proper.
However, the scrubber should be able to clean up the inconsistent data upon
the explicit request of an operator. The scrubber can even offer a list of
possible corrections, such as deleting the inconsistent object, creating this
object's child, modifying the object's members (for example, if a VM's
parameters do not match the parameters of its package), and so forth. Step 4
is very much a turn-your-key scenario.
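
The bookkeeping for the inconsistency list can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are hypothetical, an inconsistency is represented as an arbitrary hashable description, and the number of consecutive scrubs in which it was seen is used as a stand-in for its lifetime.

```python
def update_inconsistencies(tracked, detected):
    """Merge one scrub's findings into the running list of inconsistencies.

    `tracked` maps an inconsistency to the number of consecutive scrubs in
    which it was seen. Findings that did not recur are dropped: the objects
    involved were probably in flight during the earlier scrub.
    """
    return {item: tracked.get(item, 0) + 1 for item in detected}


def reportable(tracked, threshold):
    """Inconsistencies seen in at least `threshold` consecutive scrubs."""
    return sorted(item for item, count in tracked.items() if count >= threshold)


# Three scrubs. B is written shortly after A, so the first finding is
# transient; D never appears, so the second finding persists.
tracked = {}
tracked = update_inconsistencies(
    tracked, {("A", "missing child B"), ("C", "missing child D")})
tracked = update_inconsistencies(tracked, {("C", "missing child D")})
tracked = update_inconsistencies(tracked, {("C", "missing child D")})

print(reportable(tracked, threshold=3))
```

Only the persistent finding reaches the reporting stage; the transient one disappears without any change to the objects stored in the datastore.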
Over time, the human-intervention required in step 4 can be automated by
allowing services that have more context (like `net-agent` or `vm-agent`) to
be consumers of the scrubber.
However, steps 1, 2, and 3 seem to be the most important steps, because they can reveal potential problems and inefficiencies in the product.
So, now that we know how we will detect (likely) inconsistencies, we should
orient ourselves to the performance of steps 1, 2, and 3. They are emphatically
not zero-cost. In this model, `$D` refers to the set of objects stored in
Moray/Manatee. We can implement step 1 as a scan of all objects in Moray
through the Moray REST API. This is going to induce a transfer of all the
objects from Manatee to Moray, and from Moray to step 1 of `$P`. `$P` would
then need to store this set of data in memory while it checks for
inconsistencies.
However, instead of executing step 1 on all of the data, we may be able to execute it on a subset of the data. For example, VMs refer to NICs, and both of these things belong to a CN. We can, for example, check the consistency of data in per-CN chunks. In more abstract terms, if we were to draw a graph (as in graph theory), we would not have a single connected component. We would have a set of many connected components, and we can scrub the data one connected component at a time. Furthermore, Triton's metadata needs to only be eventually consistent. So, we can insert long pauses between consistency checks of connected components. Other strategies have been considered; for example, we can modify Moray so that it can be asked how idle/busy it is. But these other strategies seemed to offer little benefit compared to the connected component approach.
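
The connected-component idea can be illustrated with a small sketch. It assumes the references have already been extracted as `(referrer, referent)` pairs; the keys and the function name are made up for the example, and the direction of each reference is ignored because two objects belong in the same chunk if either refers to the other.

```python
from collections import defaultdict


def connected_components(references):
    """Group object keys into connected components.

    `references` is an iterable of (referrer, referent) pairs. Objects that
    take part in no reference at all do not appear in the result.
    """
    neighbours = defaultdict(set)
    for a, b in references:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen = set()
    components = []
    for start in sorted(neighbours):
        if start in seen:
            continue
        # Iterative depth-first walk from `start`.
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        components.append(sorted(component))
    return components


# Hypothetical references: each CN is linked to its VMs, each VM to its NICs.
edges = [("cn-1", "vm-1"), ("vm-1", "nic-1"),
         ("cn-2", "vm-2"), ("vm-2", "nic-2")]
for component in connected_components(edges):
    print(component)
```

Each component can then be scrubbed on its own, with a pause before the next one, so that only one chunk needs to be held in memory at a time.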
If we know that we can partition the connected components by CN, then we also
know that we can distribute step 2 (consistency checking) across all the CNs.
So we may end up with a `scrub-agent` service. We would still need a central
control-point for the scrubbing diagnostics (step 3), so we would likely need a
`scrubber0` zone on the HN as well. This would be analogous to the relationship
between `NAPI` and `net-agent`. So the `scrub-agent` would pull connected
components from `Moray` to check their consistency, and it would persist this
information to the CN. Then `scrubber0` would pull the inconsistencies from
each `scrub-agent` at some interval. `scrubber0` should also be able to induce
`scrub-agent` to do a scrub whenever it wants. This way, an operator can
manipulate the frequency with which scrubs happen, and so forth.
Some problems that the scrub may run into include upgrading a service (say, NAPI) for some period of time, and then downgrading it. If the scrubber is detecting an inconsistency in objects newly created by an older NAPI, what should we do?
Furthermore, what if the scrubber itself (and its schemas) gets upgraded and
then downgraded? Or what if we don't upgrade all of the `scrub-agents`? We may
want to extend the schema so that it can version objects (see the section
Schema Extensions below). We also may want to make `scrubber0` and
`scrub-agent` mutually aware of each other's versions. This way, a new
`scrub-agent` may refuse to interoperate with an outdated `scrubber0` -- and
would alert the operator about this situation. Because of these kinds of
situations, it is imperative that the scrubbing process only detect and
report perceived inconsistencies, while allowing a human to act on this
information.
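
One possible shape for the version check is sketched below. The policy is an assumption chosen for illustration (versions are `(major, minor)` tuples, and an agent refuses a `scrubber0` with a lower major version); the function name and message format are hypothetical.

```python
def check_peer(agent_version, scrubber_version):
    """Decide whether a scrub-agent should talk to a given scrubber0.

    Returns (ok, message). When `ok` is False, the agent would not
    interoperate, and `message` is what it would surface to the operator.
    """
    def fmt(version):
        return ".".join(str(part) for part in version)

    # Assumed policy: only a major-version gap makes the peer "outdated".
    if scrubber_version[0] < agent_version[0]:
        return (False, "scrubber0 %s is outdated for scrub-agent %s"
                % (fmt(scrubber_version), fmt(agent_version)))
    return (True, None)


print(check_peer((2, 0), (1, 4)))
print(check_peer((2, 0), (2, 1)))
```

Note that the check only refuses and reports. In keeping with the text above, it never attempts a repair on its own.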
Where should our inconsistency objects get stored? We know that each
`scrub-agent` will store a local set of these objects, which will then get
read in by `scrubber0`. But will `scrubber0` store them in `Moray`? If we
upgrade and then downgrade `scrubber0`, what will it do with the new
inconsistency objects that it does not (necessarily) recognize? If
inconsistency objects get stored in their own `Moray` bucket, will they also be
checked for consistency by `scrubber0` (or the `scrub-agents`)?
So, we know that the scrubbing logic can use the schemas (both the existing and
planned schemas) to detect inconsistencies after they have landed in Moray.
However, in an ideal world, no inconsistencies would ever land in Moray. We
can get our services and agents to use the schemas to verify the objects that
they create and push to Moray. We probably wouldn't want a service to cease
operation when verification fails -- there may exist schema violations that
occur in production but not in development. But our services should probably
be able to create an inconsistency-report that is consumable by `scrubber0`,
and push those reports to `scrubber0`. This way, we may not have to wait for
the `scrub-agent`s to detect the inconsistency at some (much) later point in
time.
However, this is not a function that is essential to the correctness of the scrubbing process. It can only help us detect some new inconsistencies as they are about to happen.
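
A service-side check along these lines might look like the following sketch. The report fields, the `required_fields` argument, and the function name are all hypothetical. The service is assumed to carry on with its write regardless and merely hand the report to `scrubber0`.

```python
def validate_before_write(obj, required_fields, service):
    """Return None if `obj` looks valid, otherwise an inconsistency report.

    The caller is expected to perform the write either way; the report is
    only meant to be pushed to scrubber0 for later inspection.
    """
    missing = sorted(field for field in required_fields if field not in obj)
    if not missing:
        return None
    return {
        "reported_by": service,
        "object_key": obj.get("uuid"),
        "problem": "missing fields",
        "fields": missing,
    }


report = validate_before_write(
    {"uuid": "vm-1", "owner": "user-1"},
    required_fields=["uuid", "owner", "nics"],
    service="vmapi",
)
print(report)
```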
We evidently want to extend schemas to be able to describe inter-object
references/connections. We may also want to add Triton-specific
meta-information to the schema. We want to be able to say that the `VM` object
is created by the `VMAPI` service. What's more, we want to version the `VM`
object, so that we can infer which version of `VMAPI` would have created which
kind of object. This way, if we run into the upgrade/downgrade problem
mentioned above, we can say that a `VM` object is inconsistent for the latest
version of VMAPI, but not the previous version of VMAPI. Currently, our
services do not expose their versions over their public APIs. However, if/when
they do, the `scrub-agents` can give an even more detailed diagnostic.
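
As a rough illustration of such an extension, the sketch below attaches a creating service and per-version requirements to a schema entry, and then asks under which versions a given object is consistent. The layout of `VM_SCHEMA`, the version labels, and the field names are assumptions for the example, not an existing format.

```python
# Hypothetical extended schema entry: names the creating service and lists
# the fields required by each version of that service.
VM_SCHEMA = {
    "created_by": "VMAPI",
    "versions": {
        "1": {"required": ["uuid", "owner"]},
        "2": {"required": ["uuid", "owner", "nics"]},
    },
}


def consistent_versions(obj, schema):
    """Return the schema versions under which `obj` is consistent."""
    return sorted(
        version
        for version, rules in schema["versions"].items()
        if all(field in obj for field in rules["required"])
    )


old_style_vm = {"uuid": "vm-1", "owner": "user-1"}
print(consistent_versions(old_style_vm, VM_SCHEMA))
```

An object that is consistent only under an older version points at the upgrade/downgrade situation rather than at corrupted data, which is the distinction the diagnostic needs to convey.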
A big challenge with storing service-related metadata in the object-schema is that we will have to keep it up to date whenever we change a service. If we use the schema(s) from within the service (as described in the previous section), then we should be able to detect a divergence very quickly during testing, provided that every service depends on the latest version of the schema repo(s) at all times.