Skip to content
This repository has been archived by the owner on Dec 19, 2017. It is now read-only.

Allow servers to compact their logs once all events have been acked #31

Merged
merged 11 commits into from
Oct 25, 2015

Conversation

kuujo
Copy link
Member

@kuujo kuujo commented Oct 23, 2015

This PR modifies the way logs are committed/compacted to fix #25.

Currently, Copycat does not allow servers to compact any segment that contains entries that haven't been stored on 100% of the servers in the cluster. The reasoning behind this decision was because session events are dependent on commands being applied to the state machine. When events are published to a session, they're held in memory on each server until received by the client. Events are sequenced in a manner similar to how the leader sequences entries to followers. When an event batch is sent to a client, the server that sends the batch with the index of the previous event. The client verifies that it received the previous event and acks the new event in the response. While it's normally safe for a server to take a snapshot before other servers have applied a command, the dependency of events on commands means if one server doesn't apply all commands for each session, servers can end up with inconsistent event histories. Thus, in order to avoid this case, Copycat currently requires that a command be persisted on all servers before it can be removed from the log.

But this approach is not ideal for obvious reasons. As long as at least one server in a configuration is down, no server can compact its logs. This PR proposes a fix to allow servers to compact their logs once all events have been received by their respective clients.

Servers already track which events have been received by which clients through keep-alive requests. When a client sends a KeepAliveRequest for an open session, it sends with it the index of the highest event for which it has received a batch. Event indexes are based on the commands by which events were triggered. For instance, if the application of a command at index 100 triggers the publishing of an event to 5 sessions, the index of that event will be 100. Once each session has received event 100 and sends a KeepAliveRequest, a KeepAliveEntry with the highest index received by the client is logged, replicated, committed, and applied to the internal state machine.

So, since the system already logs and replicates information about the highest event index received for each session, and because event indexes are based on log indexes, we can use the lowest acked event index for all sessions as the basis for safe log compaction. When a client submits a KeepAliveRequest and an associated KeepAliveEntry is committed and applied, each server recalculates the lowest index received by any session. Session IDs are initialized to the index of the session's RegisterEntry, and so each session's event index is also initialized to that index as well (by definition, sessions can't receive events prior to their registration). This ensures that the lowest received index is not reset to 0 when a new session is registered. When the lowest index received by all sessions is recalculated, the log is updated to make segments containing earlier entries available for compaction.

@kuujo
Copy link
Member Author

kuujo commented Oct 23, 2015

So, there's actually still a problem with this algorithm. Consider the following scenario:

  • A client registers a session at index 50 in the log
  • The client's session is initialized with session ID 50 and event index 50
  • A server applies a command at index 100 which publishes event 100 to the client
  • The client acks event index 100
  • The server then compacts index 100 from its log and crashes before applying any more commands or publishing any more events
  • The server recovers and replays its log up to index 100
  • The client reconnects to the server
  • The server then applies a command at index 101 which publishes another event to the client's session
  • That event will be published with index 101 and previous index 50 (the session's initial event index), while the client's previous index will still be 100

There are a couple ways this could be handled. First, servers could keep only compact logs up to the previous command for which an event was published to the client. However, the downside to this is that a session that receives events infrequently could keep the log from being compacted. If a session receives one event over its entire lifetime, logs can't be compacted until the session is unregistered. In that case, the globalIndex could be reintroduced to compact commands that are stored on all servers.

But a better way to do this might be to use the client's KeepAliveEntry event index as the previous index if the event index is greater than the previous event index for the server's session state. In other words, in the scenario above, since the client acked event 100 in a keep-alive request, that index can be used as the previous index for the event. Safety: because commands will never be compacted until a KeepAliveEntry with a given index has been logged for all sessions, commands between any client's last KeepAliveRequest and the end of the log should never be missing. Therefore, KeepAliveEntry should always indicate the previous event index.

There's also a liveness issue here with respect to log compaction. If log compaction is dependent on clients acking events received via their session, that implies that a client must have events to ack. However, this may frequently not be the case. Some mechanism is necessary to ensure that log compaction advances for sessions that don't receive event messages or infrequently receive events. If a client never receives published events, how can it acknowledge that it knows about an event index?

To resolve this final issue, I propose using information about the event history following the last event received by a client. Once a client acks an event at index i, it essentially acknowledges receipt of all events up to the next published event. So, in the scenario above, once the client acks event 101, if the next message published is at index 201, then the client effectively acknowledged receipt of events 102-200 as well. If no events have been published to a session since the last acked event index, then the event index for that session is equivalent to lastApplied.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow log compaction for all committed entries
1 participant