authors	state
Nick Zivkovic <nick.zivkovic@joyent.com>	draft

RFD 21 Metadata Scrubber For Triton

Background/Problem

Triton services primarily store metadata in Moray, our key-value store. The values are JSON objects, and they may reference other objects in the store. For example, a VM object may reference a NIC object. However, Moray is oblivious to these relationships. As a result, if requested, Moray will delete, overwrite, or orphan an object that another object may depend on. The problem is not that Moray is unaware of these relationships, but rather that some components of Triton may accidentally remove objects that are still needed, or neglect to remove objects that are no longer needed.

We've already run into consistency problems that we've repaired by modifying other Triton modules (e.g. see RFD 58 and NAPI-327). However, we want to be able to detect, report, and potentially repair inconsistencies automatically, rather than through trial and error. We want the equivalent of ::findleaks for Moray. This way, we can become aware of an inconsistency before it manifests itself as pathological behavior in production. For example, in the inconsistency described by RFD 58, the pathological behavior was resource exhaustion.

Goals and Requirements

In the abstract, scrubbing is a process. Let's refer to it as $P. The details of how we implement this process are not -- for the moment -- entirely relevant. Triton's distributed nature and immense size give us many possible ways to implement scrubbing. Before we become too attached to any single scrubbing method, we will want to describe the process of scrubbing as if it were done 'magically', without consuming any concrete system resources.

In a scrubbing process, there are a few essential elements that we need to be mindful of. There is a datastore $D. There is at least one service $S that uses the datastore as part of its operation. We know that $D is essentially a set of objects that may or may not refer to each other. We know that these objects are generated by some $S. There are four stages to any scrubbing process: (1) scanning, (2) consistency checking, (3) reporting inconsistencies, (4) repairing inconsistencies.

Step 1 is the most trivial step; it involves walking or loading all of the objects in $D. Step 2 (the essential step) requires a schema of what the data should look like. The schema needs to describe what each individual object should look like internally, as well as what the objects should look like in relation to each other. The former is already largely implemented by our schemas repo (and the validation code in our various services). The latter, at the time of this writing, does not yet exist. However, we can either extend schemas, or create a complementary repo that contains this information. Steps 3 and 4 are self-explanatory.
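As a rough illustration of steps 1 and 2, the sketch below walks an in-memory stand-in for $D and checks it against a list of inter-object reference rules. Everything named here is an assumption made for the example: the `Reference` rule type, the bucket names `vms` and `nics`, and the `nics` field are hypothetical, and a plain dict stands in for Moray.

```python
from typing import Dict, List, NamedTuple


class Reference(NamedTuple):
    """Hypothetical rule: objects in src_bucket name objects in dst_bucket via field."""
    src_bucket: str
    field: str
    dst_bucket: str


def check_references(store: Dict[str, Dict[str, dict]],
                     rules: List[Reference]) -> List[str]:
    """Step 2: return one message per reference whose target is missing.

    `store` maps bucket name -> {key: JSON-like object}; it stands in for $D.
    """
    problems = []
    for rule in rules:
        targets = store.get(rule.dst_bucket, {})
        # Step 1: walk every object in the source bucket.
        for key, obj in sorted(store.get(rule.src_bucket, {}).items()):
            value = obj.get(rule.field, [])
            # A field may hold a single key or a list of keys.
            refs = value if isinstance(value, list) else [value]
            for ref in refs:
                if ref not in targets:
                    problems.append(
                        f"{rule.src_bucket}/{key} -> {rule.dst_bucket}/{ref} (missing)")
    return problems


# Illustrative data only: vm1 references nic2, which does not exist.
store = {
    "vms": {"vm1": {"nics": ["nic1", "nic2"]}, "vm2": {"nics": []}},
    "nics": {"nic1": {"mac": "90:b8:d0:00:00:01"}},
}
rules = [Reference("vms", "nics", "nics")]
problems = check_references(store, rules)
print(problems)
```

The rules are data rather than code, which is the property the relational schema would need: the checker itself stays generic while each service's references are declared separately.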

Each of the above steps has a non-zero cost. However, thinking about the performance implications of each step at this early stage is going to distract us more than anything else. If we assume that each step in the scrubbing process is zero-cost, we can redirect our attention to a much more vexing problem. The problem is that, even though we can run the scrubber in an instant of time, we can end up running it while objects are 'in flight' from some service $S.

Let's say we have two objects A and B. These two objects have a parent-child relationship A -> B where A references B, but not the other way around. Both of these objects are being written to $D by $S. Suppose the scrubbing process runs at an instant when $S has finished writing A to $D, but not B. As a result, the scrubber sees that we have a parent without a child. But this would not have been the case if it had run at a later point in time.

Between any two objects, A and B there are 3 possible reference-relationships:

A -> B
B -> A
A -> B && B -> A

What's more, not all of the objects in a chain of references are necessarily created by the same service. For example, A could have been created by $S[1], while B was created by $S[2]. So we can't even assume that they are being created in some kind of order.

One may be tempted to work around this problem by making all services write an additional property to their objects -- like say a boolean called scrubable -- that indicates whether a scrub should include the object in its consistency-checking calculations. But this has the obvious problem that it would only be useful for newly generated data. Most of the wasted space in $D is likely to be old data.

However, what one can do without modifying the format of any objects in $D is to keep a list of inconsistencies ($I) (e.g. object A missing child B, or object B missing parent A) and see if they are still present the next time we scrub. If not, remove the inconsistency from list $I. Each inconsistency should have a counter indicating how many times it was re-detected during a scrub. If the count is very high, we can suspect that something is (or was) wrong with the service $S that generated the objects that are part of the inconsistency. When the lifetime of an inconsistency exceeds some (operator defined?) threshold, we can move on to step 3 and report it to the operator/developer. The developer would then determine how to implement step 4. For example, one may want to implement the cleaning up of orphaned children inside of an agent instead of inside of the scrubber proper. However, the scrubber should be able to clean up the inconsistent data upon the explicit request of an operator. The scrubber can even offer a list of possible corrections, such as deleting the inconsistent object, creating this object's child, modifying the object's members (for example, if a VM's parameters do not match the parameters of its package), and so forth. Step 4 is very much a turn-your-key scenario.
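The bookkeeping for $I can be sketched as follows. This is a minimal sketch under stated assumptions: the `InconsistencyTracker` class and its `report_after` parameter are hypothetical names, inconsistencies are identified by plain strings, and the threshold is expressed as a number of consecutive scrubs rather than as wall-clock lifetime.

```python
from typing import Dict, Iterable, List


class InconsistencyTracker:
    """Keeps the list $I across scrubs (hypothetical helper, not an existing API)."""

    def __init__(self, report_after: int) -> None:
        # Assumed threshold: report once seen in this many consecutive scrubs.
        self.report_after = report_after
        self.counts: Dict[str, int] = {}

    def record_scrub(self, detected: Iterable[str]) -> List[str]:
        """Merge one scrub's findings into $I and return what is ready for step 3."""
        seen = set(detected)
        # An inconsistency that was not re-detected was probably an in-flight write.
        self.counts = {k: n for k, n in self.counts.items() if k in seen}
        for key in seen:
            self.counts[key] = self.counts.get(key, 0) + 1
        return sorted(k for k, n in self.counts.items() if n >= self.report_after)


tracker = InconsistencyTracker(report_after=3)
first = tracker.record_scrub(["A missing child B", "C missing child D"])
second = tracker.record_scrub(["A missing child B"])  # C/D resolved itself
third = tracker.record_scrub(["A missing child B", "C missing child D"])
print(first, second, third)
```

In this run the C/D inconsistency disappears after the first scrub and starts again from a count of one when it reappears, so only the A/B inconsistency reaches the threshold.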

Over time, the human-intervention required in step 4 can be automated by allowing services that have more context (like net-agent, or vm-agent) to be consumers of the scrubber.

However it seems that steps 1, 2, and 3 are the most important steps because they can reveal potential problems and inefficiencies in the product.

So, now that we know how we will detect (likely) inconsistencies, we should orient ourselves to the performance of steps 1, 2, and 3. They are emphatically not zero-cost. In this model $D refers to the set of objects stored in Moray/Manatee. We can implement step 1 as a scan of all objects in Moray through the Moray REST API. This is going to induce a transfer of all the objects from Manatee to Moray and from Moray to step 1 of $P. $P would then need to store this set of data in memory while it checks for inconsistencies.

However, instead of executing step 1 on all of the data, we may be able to execute it on a subset of the data. For example, VMs refer to NICs, and both of these things belong to a CN. We can, for example, check the consistency of data in per-CN chunks. In more abstract terms, if we were to draw the objects and their references as a graph (as in graph theory), we would not have a single connected component. We would have a set of many connected components, and we can scrub the data one connected component at a time. Furthermore, Triton's metadata only needs to be eventually consistent. So, we can insert long pauses between consistency checks of connected components. Other strategies have been considered; for example, we can modify Moray so that it can be asked how idle/busy it is. But these other strategies seemed to offer little benefit compared to the connected component approach.
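The partitioning idea can be illustrated with a small traversal that groups objects into connected components, treating each reference as an undirected edge. The function name and the object keys (`cn1`, `vm1`, `nic1`, and so on) are made up for the example; how the edges would actually be extracted from Moray is left open.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def connected_components(nodes: Iterable[str],
                         edges: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Group object keys into connected components of the reference graph."""
    adjacency = defaultdict(set)
    for a, b in edges:
        # Direction does not matter for partitioning, so store both ways.
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    components = []
    for start in sorted(set(nodes) | set(adjacency)):
        if start in seen:
            continue
        # Depth-first walk from an unvisited object collects its component.
        stack = [start]
        seen.add(start)
        component = []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in adjacency.get(node, ()):
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(sorted(component))
    return components


# Illustrative keys: two CNs with one VM and NIC each, plus an unreferenced NIC.
nodes = ["cn1", "vm1", "nic1", "cn2", "vm2", "nic2", "nic3"]
edges = [("vm1", "cn1"), ("vm1", "nic1"), ("vm2", "cn2"), ("vm2", "nic2")]
components = connected_components(nodes, edges)
print(components)
```

Each returned component could then be scrubbed on its own, with a pause between components. An object that nothing refers to, such as `nic3` here, shows up as a component of size one.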

If we know that we can partition the connected components by CN, then we also know that we can distribute step 2 (consistency checking) across all the CNs. So we may end up with a scrub-agent service. We would still need a central control-point for the scrubbing diagnostics (step 3), so we would likely need a scrubber0 zone on the HN as well. This would be analogous to the relationship between NAPI and net-agent. So the scrub-agent would pull connected components from Moray to check their consistency, and it would persist this information to the CN. Then scrubber0 would pull the inconsistencies from each scrub-agent at some interval. scrubber0 should also be able to induce scrub-agent to do a scrub whenever it wants. This way, an operator can manipulate the frequency with which scrubs happen and so forth.

Service Upgrade/Downgrade Problem

Some problems that the scrubber may run into include upgrading a service (say, NAPI) for some period of time, and then downgrading it. If the scrubber is detecting an inconsistency in objects newly created by an older NAPI, what should we do?

Furthermore, what if the scrubber itself (and its schemas) gets upgraded and then downgraded? Or what if we don't upgrade all of the scrub-agents? We may want to extend the schema so that it can version objects (see the section Schema Extensions below). We also may want to make scrubber0 and scrub-agent mutually aware of each other's versions. This way a new scrub-agent may refuse to interoperate with an outdated scrubber0 -- and would alert the operator about this situation. Because of these kinds of situations, it is imperative that the scrubbing process only detect and report perceived inconsistencies, while allowing a human to act on this information.
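One possible shape for the mutual version check is sketched below. The policy is an assumption, not something this RFD has settled: here a scrub-agent refuses any scrubber0 whose version is lower than its own. The helper names and the dotted version format are likewise hypothetical.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version such as "2.10.0" into a tuple that compares numerically."""
    return tuple(int(part) for part in text.split("."))


def check_peer(agent_version: str, scrubber0_version: str) -> Tuple[bool, str]:
    """Decide whether a scrub-agent should talk to this scrubber0.

    Assumed policy: refuse when scrubber0 is older than the agent, and hand
    back a message that can be surfaced to the operator.
    """
    if parse_version(scrubber0_version) < parse_version(agent_version):
        return False, (f"scrub-agent {agent_version} refuses outdated "
                       f"scrubber0 {scrubber0_version}")
    return True, "ok"


refused = check_peer("2.1.0", "2.0.3")
accepted = check_peer("2.1.0", "2.10.0")
print(refused, accepted)
```

Comparing parsed tuples rather than raw strings matters for the second call: as strings, "2.10.0" would sort before "2.1.0" is exceeded, while as tuples (2, 10, 0) is correctly newer than (2, 1, 0).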

Inconsistency Storage Problem

Where should our inconsistency objects get stored? We know that each scrub-agent will store a local set of these objects, which will then get read in by scrubber0. But will scrubber0 store them in Moray? If we upgrade and then downgrade scrubber0, what will it do with the new inconsistency objects that it does not (necessarily) recognize? If inconsistency objects get stored in their own Moray bucket, will they also be checked for consistency by scrubber0 (or the scrub-agents)?

Other Uses For The Schema(s)

So, we know that the scrubbing logic can use the schemas (both the existing and planned schemas) to detect inconsistencies after they have landed in Moray. However, in an ideal world, no inconsistencies would ever land in Moray. We can get our services and agents to use the schemas to verify the objects that they create and push to Moray. We probably wouldn't want to cease operation -- there may exist schema violations that occur in production but not in development. But our services should probably be able to create an inconsistency-report that is consumable by scrubber0, and push those to scrubber0. This way we may not have to wait for the scrub-agents to detect the inconsistency at some (much) later point in time.

However, this is not a function that is essential to the correctness of the scrubbing process. It can only help us detect some new inconsistencies as they are about to happen.

Schema Extensions

We apparently want to extend schemas to be able to describe inter-object references/connections. We may also want to add Triton-specific meta-information to the schema. We want to be able to say that the VM object is created by the VMAPI service. What's more, we want to version the VM object, so that we can infer which version of VMAPI would have created which kind of object. This way, if we run into the upgrade/downgrade problem mentioned above, we can say that a VM object is inconsistent for the latest version of VMAPI, but not the previous version of VMAPI. Currently, our services do not expose their versions over their public APIs. However, if/when they do, the scrub-agents can give an even more detailed diagnostic.
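To make the idea concrete, here is one hypothetical layout for such an extended schema entry, together with a helper that reports which object versions a given VM object satisfies. The layout, the key names (`created_by`, `versions`, `references`), and the required fields are all invented for illustration; the existing schemas repo does not define them.

```python
from typing import Dict, List

# Hypothetical extended schema entry for VM objects.
VM_SCHEMA: Dict = {
    "bucket": "vms",            # assumed bucket name
    "created_by": "vmapi",      # service that generates these objects
    "versions": {
        1: {"required": ["uuid", "owner_uuid"]},
        2: {"required": ["uuid", "owner_uuid", "nics"]},
    },
    # Inter-object references: field in this object -> bucket it points into.
    "references": [{"field": "nics", "bucket": "nics"}],
}


def consistent_versions(obj: dict, schema: Dict) -> List[int]:
    """Return the schema versions whose required fields are all present in obj."""
    return [version
            for version, spec in sorted(schema["versions"].items())
            if all(name in obj for name in spec["required"])]


# An object written by an older service: it lacks the field added in version 2.
old_vm = {"uuid": "vm-0001", "owner_uuid": "user-0001"}
matches = consistent_versions(old_vm, VM_SCHEMA)
latest = max(VM_SCHEMA["versions"])
print(matches, latest in matches)
```

With this shape, a scrub-agent could report that `old_vm` is inconsistent for the latest version but consistent for the previous one, instead of flagging it outright.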

A big challenge with storing service-related metadata in the object-schema is that we will have to make sure to keep it up to date whenever we change a service. If we use the schema(s) from within the service (as described in the previous section), then we should be able to detect a divergence very quickly during testing (we just have to make sure that every service depends on the latest version of the schema repo(s) at all times).
