tla-plus: add spec for store liveness fabric #123523
Conversation
Looks great! Had a lot of fun running it myself and tweaking the algorithm.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani, @nvanbenschoten, and @sumeerbhola)
docs/tla-plus/StoreLiveness/StoreLiveness.cfg
line 5 at r1 (raw file):
CONSTANT Nodes = {n1, n2}
If the durability property holds for two nodes, it should hold for any larger system, between any two nodes pair-wise. So it doesn't seem necessary to check this for more than two nodes, right?
docs/tla-plus/StoreLiveness/StoreLiveness.tla
line 118 at r1 (raw file):
macro restart() begin
  current_epoch[self] := current_epoch[self];
Aren't we supposed to increment the epoch upon restart?
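For what it's worth, a sketch of what the intended macro presumably looks like (a hypothetical fix, not the code under review):

```tla
macro restart() begin
  \* Presumed intent: a restarted node moves to a fresh epoch, so support
  \* recorded for the pre-restart epoch cannot be silently reused.
  current_epoch[self] := current_epoch[self] + 1;
end macro;
```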
docs/tla-plus/StoreLiveness/StoreLiveness.tla
line 160 at r1 (raw file):
ack := TRUE;
elsif by_self_for[self][msg.from].epoch = msg.epoch then
  if by_self_for[self][msg.from].end_time < msg.end_time then
It is possible that by_self_for[self][msg.from].end_time > msg.end_time if the message was an old heartbeat for the same epoch, right? So, we don't want to regress the end time. Might be worth a comment.
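Spelled out, the guarded same-epoch branch being suggested would look roughly like this (a sketch completing the truncated quote; variable names follow the spec):

```tla
elsif by_self_for[self][msg.from].epoch = msg.epoch then
  \* Out-of-order delivery means an older heartbeat for the same epoch can
  \* arrive late; only ever extend the supported end time, never regress it.
  if by_self_for[self][msg.from].end_time < msg.end_time then
    by_self_for[self][msg.from].end_time := msg.end_time;
  end if;
```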
docs/tla-plus/StoreLiveness/StoreLiveness.tla
line 171 at r1 (raw file):
elsif msg.type = MsgHeartbeatResp then
ReceiveHeartbeatResp:
This behavior differs from the algorithm in the doc. Presumably because we might receive a MsgHeartbeatResp that corresponds to an older MsgHeartbeat? Is this just an artifact of TLA+ not supporting RPC?
docs/tla-plus/StoreLiveness/StoreLiveness.tla
line 173 at r1 (raw file):
ReceiveHeartbeatResp:
  if msg.ack then
    if for_self_by[self][msg.from].epoch < msg.epoch then
I don't think this case is possible. We can't receive an ACK heartbeat response with an epoch for which we didn't explicitly request support. If we already requested support for it, it will be present in for_self_by[self][msg.from].epoch.
I confirmed by running the model check with this being the ACK if block:
assert for_self_by[self][msg.from].epoch >= msg.epoch;
if for_self_by[self][msg.from].epoch = msg.epoch then
  if for_self_by[self][msg.from].end_time < msg.end_time then
    for_self_by[self][msg.from].end_time := msg.end_time;
  end if;
end if;
I'm not sure the epoch equality check is necessary, though (the model check succeeds without it). Can we get an ACK from an older epoch with a higher end time? Maybe only if we decrease the value of HeartbeatInterval midway through the execution.
flushing some comments from yesterday since I got interrupted. I will finish looking today.
Reviewed 1 of 2 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani and @nvanbenschoten)
docs/tla-plus/StoreLiveness/StoreLiveness.cfg
line 10 at r1 (raw file):
HeartbeatInterval = 2
MsgHeartbeat = MsgHeartbeat
pluscal question: what does this mean? Is this a model value https://learntla.com/core/constants.html#model-values?
docs/tla-plus/StoreLiveness/StoreLiveness.tla
line 46 at r1 (raw file):
EpochSupportExpired(map, i, j, supporter_time) == map[i][j].end_time < supporter_time
\* Is i's support from j (according to i's for_self_by map) expired (according to j's clock)?
ForSelfByEpochSupportExpired(i, j) == EpochSupportExpired(for_self_by, i, j, clocks[j])
So this can't be computed in a real system, since it is using the for_self_by state at i and the clock at j, yes? Presumably this is defined for an invariant.
I was expecting there would also be a function that is evaluatable at i, which uses for_self_by state at i and the clock at i, for i to realize it has lost support.
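Such a predicate could presumably reuse the same helper with i's own clock; a hypothetical definition (not present in the spec):

```tla
\* Evaluatable entirely at i: i's view of whether its support from j has
\* expired, judged by i's own clock rather than j's.
ForSelfByEpochSupportExpiredAtSelf(i, j) ==
  EpochSupportExpired(for_self_by, i, j, clocks[i])
```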
docs/tla-plus/StoreLiveness/StoreLiveness.tla
line 54 at r1 (raw file):
\* Can i withdraw support for j?
CanInvlidateBySelfFor(i, j) == BySelfForEpochValid(i, j) /\ BySelfForEpochSupportExpired(i, j)
nit: invalidate
docs/tla-plus/StoreLiveness/StoreLiveness.tla
line 64 at r1 (raw file):
\A j \in Nodes \ {i}: ForSelfByEpochValid(i, j) => (SupportUpheld(i, j) \/ ForSelfByEpochSupportExpired(i, j))
makes sense. This also covers the case where i unilaterally increments its for_self_by[i][j].epoch before the expiry time (though I forget whether there was a need for that in the latest design iteration).
docs/tla-plus/StoreLiveness/StoreLiveness.tla
line 71 at r1 (raw file):
\A j \in Nodes \ {i}: current_epoch[i] >= for_self_by[i][j].epoch
end define;
we should consider adding an invariant to ensure that the withdrawal epoch does not run away from the original epoch at a node.
by_self_for[i][j].epoch > for_self_by[j][i].epoch /\ by_self_for[i][j].end_time = 0 => by_self_for[i][j].epoch - for_self_by[j][i].epoch <= 1
docs/tla-plus/StoreLiveness/StoreLiveness.tla
line 82 at r1 (raw file):
end macro
macro send_heartbeat(to)
I suppose we don't need to model the HLC time causality for safety purposes -- for this scheme it helps only for liveness.
docs/tla-plus/StoreLiveness/StoreLiveness.tla
line 118 at r1 (raw file):
macro restart() begin
  current_epoch[self] := current_epoch[self];
was this meant to increment? Currently looks like a noop.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani, @miraradeva, and @nvanbenschoten)
docs/tla-plus/StoreLiveness/StoreLiveness.tla
line 173 at r1 (raw file):
Can we get an ACK from an older epoch with a higher end time? Maybe only if we decrease the value of HeartbeatInterval midway through the execution.
The earlier design iterations allowed for that. And I think the latest one needs to, because as you mention we can change the HeartbeatInterval, and the node could restart and forget that it had support for an earlier epoch that ended at a later time.
Force-pushed 3636046 to 80843bf
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani, @miraradeva, and @sumeerbhola)
docs/tla-plus/StoreLiveness/StoreLiveness.cfg
line 5 at r1 (raw file):
Previously, miraradeva (Mira Radeva) wrote…
If the durability property holds for two nodes, it should hold for any larger system, between any two nodes pair-wise. So it doesn't seem necessary to check this for more than two nodes, right?
Yes, that's correct. In fact, we probably don't even need to heartbeat in two directions in this spec.
docs/tla-plus/StoreLiveness/StoreLiveness.cfg
line 10 at r1 (raw file):
Yes, it is a model value. I'm using these here to serve as enum variants. It's strange to me that these need to be declared in the .cfg file, but the model checker requires it; otherwise it fails with:
The constant parameter MsgHeartbeat is not assigned a value by the configuration file.
docs/tla-plus/StoreLiveness/StoreLiveness.tla
line 46 at r1 (raw file):
Right, this cannot be computed in a real system and is only used to define the invariant.
I was expecting there would also be a function that is evaluatable at i, which uses for_self_by state at i and the clock at i, for i to realize it has lost support.
Such a function could exist, but it's not needed now that a node never needs to invalidate its own support at the store liveness level.
docs/tla-plus/StoreLiveness/StoreLiveness.tla
line 54 at r1 (raw file):
Previously, sumeerbhola wrote…
nit: invalidate
Done.
docs/tla-plus/StoreLiveness/StoreLiveness.tla
line 64 at r1 (raw file):
Previously, sumeerbhola wrote…
makes sense. This also covers the case where i unilaterally increments its for_self_by[i][j].epoch before the expiry time (though I forget whether there was a need for that in the latest design iteration).
There is only a need for that in the current design after a restart.
docs/tla-plus/StoreLiveness/StoreLiveness.tla
line 71 at r1 (raw file):
Previously, sumeerbhola wrote…
we should consider adding an invariant to ensure that the withdrawal epoch does not run away from the original epoch at a node.
by_self_for[i][j].epoch > for_self_by[j][i].epoch /\ by_self_for[i][j].end_time = 0 => by_self_for[i][j].epoch - for_self_by[j][i].epoch <= 1
Good idea, done.
docs/tla-plus/StoreLiveness/StoreLiveness.tla
line 118 at r1 (raw file):
Previously, miraradeva (Mira Radeva) wrote…
Aren't we supposed to increment the epoch upon restart?
Yes, I must have broken this when trying to get invariants to fire. Fixed.
docs/tla-plus/StoreLiveness/StoreLiveness.tla
line 118 at r1 (raw file):
Previously, sumeerbhola wrote…
was this meant to increment? Currently looks like a noop.
Yes, fixed.
docs/tla-plus/StoreLiveness/StoreLiveness.tla
line 160 at r1 (raw file):
Previously, miraradeva (Mira Radeva) wrote…
It is possible that by_self_for[self][msg.from].end_time > msg.end_time if the message was an old heartbeat for the same epoch, right? So, we don't want to regress the end time. Might be worth a comment.
Correct, we allow out-of-order delivery, so we don't want to regress the end_time. I've added a comment and confirmed that the DurableSupport invariant and the new SupportProvidedLeadsSupportAssumedInvariant both fail without this protection.
docs/tla-plus/StoreLiveness/StoreLiveness.tla
line 171 at r1 (raw file):
Previously, miraradeva (Mira Radeva) wrote…
This behavior differs from the algorithm in the doc. Presumably because we might receive a MsgHeartbeatResp that corresponds to an older MsgHeartbeat? Is this just an artifact of TLA+ not supporting RPC?
By "supporting RPC", are you referring to preventing out-of-order delivery of requests and responses? I don't think we want our implementation to assume ordered delivery. Even with an RPC framework that provides ordered delivery, systems need to be careful about ambiguous delivery when the RPC stream breaks, and about the potential for out-of-order delivery after a new stream is created.
docs/tla-plus/StoreLiveness/StoreLiveness.tla
line 173 at r1 (raw file):
Previously, sumeerbhola wrote…
Can we get an ACK from an older epoch with a higher end time? Maybe only if we decrease the value of HeartbeatInterval midway through the execution.
The earlier design iterations allowed for that. And I think the latest one needs to, because as you mention we can change the HeartbeatInterval, and the node could restart and forget that it had support for an earlier epoch that ended at a later time.
Good point. I added the assertion and simplified this logic for now. I was thinking that there may be cases where we have requested support for a new epoch, but have not yet updated for_self_by. That's not true though, given how the spec is written.
We could make the heartbeat protocol more flexible to allow for such an ACK, to avoid two RTTs when support has been withdrawn. In doing so, we would remove the need for NACKs. To do so, instead of heartbeating with a specific epoch, we could heartbeat with a minimum epoch. The receiver could then take the maximum of the epoch in the heartbeat and its own local epoch to determine what epoch to support.
I think we probably don't want to do this though, as it means that a delayed message after a node has crashed and other nodes have withdrawn support for it could allow the dead node to temporarily regain support.
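For concreteness, the receiver side of that more flexible variant might look roughly like this (purely hypothetical; msg.min_epoch and the Max operator are assumptions, not part of the spec):

```tla
\* Hypothetical: the heartbeat carries a minimum epoch, and the receiver
\* supports the larger of that and its locally recorded epoch. Every
\* heartbeat can then be ACKed at some epoch, so NACKs become unnecessary.
with supported = Max(msg.min_epoch, by_self_for[self][msg.from].epoch) do
  by_self_for[self][msg.from] :=
    [epoch |-> supported, end_time |-> msg.end_time];
end with;
```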
Reviewed 2 of 2 files at r2, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani, @nvanbenschoten, and @sumeerbhola)
docs/tla-plus/StoreLiveness/StoreLiveness.tla
line 171 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
By "supporting RPC", are you referring to preventing out-of-order delivery of requests and responses? I don't think we want our implementation to assume ordered delivery. Even with an RPC framework that provides ordered delivery, systems need to be careful for ambiguous delivery when the RPC stream breaks and then the potential for out-of-order delivery after a new stream is created.
Yeah, that makes sense. It would be nice to be able to assume matching requests and responses. E.g. if I heartbeat epoch 5, epoch 6, and epoch 7, when I receive 3 responses (say, ACK, NACK, ACK), I would know that the first ACK was for epoch 7, the NACK was for epoch 5, and the other ACK was for epoch 6. Even though the responses are received in a different order than the requests were sent, we know which response corresponds to which request. Does our RPC framework guarantee this? I think message passing in TLA+, and in general, doesn't.
docs/tla-plus/StoreLiveness/StoreLiveness.tla
line 173 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Good point. I added the assertion and simplified this logic for now. I was thinking that there may be cases where we have requested support for a new epoch, but have not yet updated for_self_by. That's not true though, given how the spec is written.
We could make the heartbeat protocol more flexible to allow for such an ACK, to avoid two RTTs when support has been withdrawn. In doing so, we would remove the need for NACKs. To do so, instead of heartbeating with a specific epoch, we could heartbeat with a minimum epoch. The receiver could then take the maximum of the epoch in the heartbeat and its own local epoch to determine what epoch to support.
I think we probably don't want to do this though, as it means that a delayed message after a node has crashed and other nodes have withdrawn support for it could allow the dead node to temporarily regain support.
If we're adding significant logic to handle changes in HeartbeatInterval, do we want to model increasing/decreasing the HeartbeatInterval here? I'm not sure how much state-space blowup it will cause.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani, @miraradeva, and @sumeerbhola)
docs/tla-plus/StoreLiveness/StoreLiveness.tla
line 171 at r1 (raw file):
Previously, miraradeva (Mira Radeva) wrote…
Yeah, that makes sense. It would be nice to be able to assume matching requests and responses. E.g. if I heartbeat epoch 5, epoch 6, and epoch7, when I receive 3 responses (say, ACK, NACK, ACK), I would know that the first ACK was for epoch 7, the NACK was for epoch 5, and the other ACK was for epoch 6. Even though the responses are received in a different order than the requests were sent, we know which response corresponds to which request. Does our RPC framework guarantee this? I think message passing in TLA+, and in general, doesn't.
gRPC exposes a unary request/response interface which we were planning to use for store liveness, which does make this mapping clear. That could allow us to omit some information from responses (e.g. the epoch being heartbeated, maybe even the end_time).
However, we may not want to do this even if we can. A request/response interface can look single-threaded and hide the need to handle out-of-order or dropped messages on the request path or the response path. This can lead to a more fragile, harder-to-test implementation.
As a point of comparison, the raft library does not assume request/response pairs, which is mostly a good thing.
docs/tla-plus/StoreLiveness/StoreLiveness.tla
line 173 at r1 (raw file):
Previously, miraradeva (Mira Radeva) wrote…
If we're adding significant logic to handle changes in HeartbeatInterval, do we want to model increasing/decreasing the HeartbeatInterval here? I'm not sure how much state-space blowup it will cause.
We can try. I think it's a good idea.
Force-pushed 80843bf to 64ed3b6
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani, @miraradeva, and @sumeerbhola)
docs/tla-plus/StoreLiveness/StoreLiveness.tla
line 173 at r1 (raw file):
If we're adding significant logic to handle changes in HeartbeatInterval
Added. This required me to remove the by_self_for[self][msg.from].end_time < msg.end_time assertion in ReceiveHeartbeat. I added a comment describing why end_time regressions across epochs are safe. Want to convince yourself that you agree?
I'm not sure how much state-space blowup it will cause.
I dropped the MaxClock down from 5 to 4 to help combat this. Model checking is still reasonable at this point.
Reviewed 2 of 2 files at r3, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @arulajmani, @nvanbenschoten, and @sumeerbhola)
docs/tla-plus/StoreLiveness/StoreLiveness.tla
line 173 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
If we're adding significant logic to handle changes in HeartbeatInterval
Added. This required me to remove the by_self_for[self][msg.from].end_time < msg.end_time assertion in ReceiveHeartbeat. I added a comment describing why end_time regressions across epochs are safe. Want to convince yourself that you agree?
I'm not sure how much state-space blowup it will cause.
I dropped the MaxClock down from 5 to 4 to help combat this. Model checking is still reasonable at this point.
Looks great, thanks!
This commit adds a new TLA+ specification (expressed in PlusCal) called StoreLiveness. StoreLiveness is a specification for the Store Liveness fabric, which will sit below Raft and power the leader-lease mechanism.
The central safety property of Store Liveness is Durable Support. A node in the Store Liveness fabric can receive support from another node with an associated "end time" and be confident that this support will not be withdrawn until that supporting node's clock exceeds the end time. This property allows us to build lease disjointness, as we can be sure that a future lease will have a start time that is greater than the end time of the previous lease's corresponding store liveness support.
Epic: None
Release note: None
Force-pushed 64ed3b6 to ef95359
@miraradeva I'll merge this for now. We can iterate on it in other PRs if we want to augment store liveness with state like the
bors r=miraradeva
👎 Rejected by code reviews
bors r=miraradeva