Allow servers to compact their logs once all events have been acked #31

kuujo · 2015-10-23T05:25:11Z

This PR modifies the way logs are committed/compacted to fix #25.

Currently, Copycat does not allow servers to compact any segment that contains entries that haven't been stored on 100% of the servers in the cluster. The reasoning behind this decision was because session events are dependent on commands being applied to the state machine. When events are published to a session, they're held in memory on each server until received by the client. Events are sequenced in a manner similar to how the leader sequences entries to followers. When an event batch is sent to a client, the server that sends the batch with the index of the previous event. The client verifies that it received the previous event and acks the new event in the response. While it's normally safe for a server to take a snapshot before other servers have applied a command, the dependency of events on commands means if one server doesn't apply all commands for each session, servers can end up with inconsistent event histories. Thus, in order to avoid this case, Copycat currently requires that a command be persisted on all servers before it can be removed from the log.

But this approach is not ideal for obvious reasons. As long as at least one server in a configuration is down, no server can compact its logs. This PR proposes a fix to allow servers to compact their logs once all events have been received by their respective clients.

Servers already track which events have been received by which clients through keep-alive requests. When a client sends a KeepAliveRequest for an open session, it sends with it the index of the highest event for which it has received a batch. Event indexes are based on the commands by which events were triggered. For instance, if the application of a command at index 100 triggers the publishing of an event to 5 sessions, the index of that event will be 100. Once each session has received event 100 and sends a KeepAliveRequest, a KeepAliveEntry with the highest index received by the client is logged, replicated, committed, and applied to the internal state machine.

So, since the system already logs and replicates information about the highest event index received for each session, and because event indexes are based on log indexes, we can use the lowest acked event index for all sessions as the basis for safe log compaction. When a client submits a KeepAliveRequest and an associated KeepAliveEntry is committed and applied, each server recalculates the lowest index received by any session. Session IDs are initialized to the index of the session's RegisterEntry, and so each session's event index is also initialized to that index as well (by definition, sessions can't receive events prior to their registration). This ensures that the lowest received index is not reset to 0 when a new session is registered. When the lowest index received by all sessions is recalculated, the log is updated to make segments containing earlier entries available for compaction.

kuujo · 2015-10-23T06:05:35Z

So, there's actually still a problem with this algorithm. Consider the following scenario:

A client registers a session at index 50 in the log
The client's session is initialized with session ID 50 and event index 50
A server applies a command at index 100 which publishes event 100 to the client
The client acks event index 100
The server then compacts index 100 from its log and crashes before applying any more commands or publishing any more events
The server recovers and replays its log up to index 100
The client reconnects to the server
The server then applies a command at index 101 which publishes another event to the client's session
That event will be published with index 101 and previous index 50 (the session's initial event index), while the client's previous index will still be 100

There are a couple ways this could be handled. First, servers could keep only compact logs up to the previous command for which an event was published to the client. However, the downside to this is that a session that receives events infrequently could keep the log from being compacted. If a session receives one event over its entire lifetime, logs can't be compacted until the session is unregistered. In that case, the globalIndex could be reintroduced to compact commands that are stored on all servers.

But a better way to do this might be to use the client's KeepAliveEntry event index as the previous index if the event index is greater than the previous event index for the server's session state. In other words, in the scenario above, since the client acked event 100 in a keep-alive request, that index can be used as the previous index for the event. Safety: because commands will never be compacted until a KeepAliveEntry with a given index has been logged for all sessions, commands between any client's last KeepAliveRequest and the end of the log should never be missing. Therefore, KeepAliveEntry should always indicate the previous event index.

There's also a liveness issue here with respect to log compaction. If log compaction is dependent on clients acking events received via their session, that implies that a client must have events to ack. However, this may frequently not be the case. Some mechanism is necessary to ensure that log compaction advances for sessions that don't receive event messages or infrequently receive events. If a client never receives published events, how can it acknowledge that it knows about an event index?

To resolve this final issue, I propose using information about the event history following the last event received by a client. Once a client acks an event at index i, it essentially acknowledges receipt of all events up to the next published event. So, in the scenario above, once the client acks event 101, if the next message published is at index 201, then the client effectively acknowledged receipt of events 102-200 as well. If no events have been published to a session since the last acked event index, then the event index for that session is equivalent to lastApplied.

…index if no events are queued.

…d command outputs in the internal state machine.

kuujo added 2 commits October 22, 2015 21:38

#25: Allow log compaction without depending on 100% replication.

c51856d

Ensure timestamps for commands are monotonically increasing.

4c32a1d

kuujo added 9 commits October 23, 2015 00:28

Ensure previous event index is properly tracked after compaction.

f975f7b

Ensure last applied index for each session is used as the last event …

9c6e1a2

…index if no events are queued.

Ensure command ConsistencyLevel is taken into account for deduplicate…

fb15f4f

…d command outputs in the internal state machine.

Fix merge conflicts.

724fb30

Fix incorrect entry cast in leader unregister handling.

f66221b

Ensure UnregisterEntry is applied in a synchronous context.

508d118

Remove unused variable from ServerState.

00e24d2

Ensure events are resent for inconsistent event indexes.

7b06eb0

Fix improper removal of queued events when calculating event index.

1890fb0

kuujo added the enhancement label Oct 25, 2015

kuujo merged commit 1890fb0 into master Oct 25, 2015

kuujo mentioned this pull request Oct 28, 2015

Ensure tombstones are applied on all servers before being removed from the log #38

Closed

kuujo deleted the committed-compaction branch October 28, 2015 06:37

kuujo mentioned this pull request Oct 29, 2015

Ensure tombstones are retained in the log until applied on all servers #46

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allow servers to compact their logs once all events have been acked #31

Allow servers to compact their logs once all events have been acked #31

kuujo commented Oct 23, 2015

kuujo commented Oct 23, 2015

Allow servers to compact their logs once all events have been acked #31

Allow servers to compact their logs once all events have been acked #31

Conversation

kuujo commented Oct 23, 2015

kuujo commented Oct 23, 2015