
Fix topology replay during bootstrap and startup, decouple Accord from TCM #3781

Closed
ifesdjeen wants to merge 28 commits into apache:cep-15-accord from ifesdjeen:CASSANDRA-20142

Conversation

@ifesdjeen
Contributor

Fix topology replay during bootstrap and startup, decouple Accord from TCM

Includes multiple changes, the primary ones being:

  • Make HarrySimulatorTest work with Accord
  • Removed nodes now live in TCM, so there is no need to discover historic epochs in order to find removed nodes
  • CommandStore <-> RangesForEpochs mappings required for startup are now stored in the journal, and a CommandStore can be set up without topology replay
  • Topology replay is done entirely via the journal (where we store the topologies themselves) and the topology metadata table (where we store redundant/closed information)
  • Fixed various bugs related to propagation and staleness:
    • TCM was previously relied on for "fetching" an epoch; we cannot rely on it, because there is no guarantee we will see a consecutive epoch when grabbing Metadata#current
    • Redundant/closed state was set with incorrect ranges during replay in one of the code paths
    • TCM was contacted multiple times for historical epochs, which made startup much longer under some circumstances
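To illustrate the journal-based startup path described above, here is a minimal sketch of persisting a CommandStore -> ranges-per-epoch mapping so a restarting node can rebuild its stores from local state instead of replaying historic topologies from TCM. All names here (`RangesForEpochJournal`, `record`, `latestPerStore`) are illustrative, not the actual Accord/TCM API:

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: each journal append records the ranges a command store
// owns as of a given epoch; on startup we read back the latest entry per store,
// with no TCM round trips and no topology replay.
final class RangesForEpochJournal
{
    // storeId -> (epoch -> opaque serialized ranges)
    private final Map<Integer, TreeMap<Long, String>> journal = new ConcurrentHashMap<>();

    // Append the ranges a command store owns as of the given epoch.
    void record(int storeId, long epoch, String ranges)
    {
        journal.computeIfAbsent(storeId, id -> new TreeMap<>()).put(epoch, ranges);
    }

    // On startup, read back the most recent mapping for every store.
    Map<Integer, String> latestPerStore()
    {
        Map<Integer, String> out = new ConcurrentHashMap<>();
        journal.forEach((store, byEpoch) -> out.put(store, byEpoch.lastEntry().getValue()));
        return out;
    }
}
```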

PAXOS_REPAIR (false, "PaxosRepairStage", "internal", FBUtilities::getAvailableProcessors, null, Stage::multiThreadedStage),
INTERNAL_METADATA (false, "InternalMetadataStage", "internal", FBUtilities::getAvailableProcessors, null, Stage::multiThreadedStage),
FETCH_LOG (false, "MetadataFetchLogStage", "internal", () -> 1, null, Stage::singleThreadedStage),
FETCH_METADATA(false, "MetadataFetchLogStage", "internal", () -> 1, null, Stage::singleThreadedStage),
Contributor

formatting?

TCM_COMMIT_REQ (802, P0, rpcTimeout, INTERNAL_METADATA, MessageSerializers::commitSerializer, () -> commitRequestHandler(), TCM_COMMIT_RSP ),
TCM_FETCH_CMS_LOG_RSP (803, P0, rpcTimeout, FETCH_LOG, MessageSerializers::logStateSerializer, RESPONSE_HANDLER ),
TCM_FETCH_CMS_LOG_REQ (804, P0, rpcTimeout, FETCH_LOG, () -> FetchCMSLog.serializer, () -> fetchLogRequestHandler(), TCM_FETCH_CMS_LOG_RSP ),
TCM_FETCH_CMS_LOG_RSP (803, P0, rpcTimeout, FETCH_METADATA, MessageSerializers::logStateSerializer, RESPONSE_HANDLER ),
Contributor

formatting

diskStateManager.loadLocalTopologyState((epoch, syncStatus, pendingSyncNotify, remoteSyncComplete, closed, redundant) -> {
getOrCreateEpochState(epoch).setSyncStatus(syncStatus);
if (syncStatus == SyncStatus.NOTIFYING)
switch (syncStatus)
Contributor

Either handle all cases and throw an exception on any unhandled one, or simply test syncStatus == NOTIFYING?

Should we be doing anything in the NOT_STARTED case? We might need to fetch information from peers about what has been synced, though this will naturally start arriving from later epochs, so perhaps not a problem.

Contributor Author

You are right; this should have only been == NOTIFYING here.

For now, we can do nothing in the NOT_STARTED case. But I will add this area of code to the list of things that need an audit.
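The two shapes being discussed can be sketched as follows. The `SyncStatus` values and method names here are illustrative, not the actual patch:

```java
// Illustrative enum; the real SyncStatus may carry more states.
enum SyncStatus { NOT_STARTED, NOTIFYING, COMPLETED }

final class SyncStatusHandling
{
    // Option 1: the agreed fix — a plain equality test, since only one
    // state requires action here.
    static boolean shouldNotify(SyncStatus status)
    {
        return status == SyncStatus.NOTIFYING;
    }

    // Option 2: an exhaustive switch that fails loudly if a new state is
    // added but not handled.
    static boolean shouldNotifyExhaustive(SyncStatus status)
    {
        switch (status)
        {
            case NOTIFYING:   return true;
            case NOT_STARTED: return false; // nothing to do yet; peers catch us up via later epochs
            case COMPLETED:   return false;
            default: throw new IllegalStateException("Unhandled sync status: " + status);
        }
    }
}
```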

toComplete[i] = it.nextLong();
Arrays.sort(toComplete);
for (long epoch : toComplete)
listener.onEndpointAck(removed, epoch);
Contributor

might be nice to move this and the below listener notifications outside of the synchronised block?
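The restructuring being suggested is the usual copy-then-notify pattern: gather the epochs to acknowledge while holding the lock, then invoke the listener after releasing it, so a slow or re-entrant listener cannot stall other threads. A minimal sketch, with all names (`AckNotifier`, `addPending`, `drainAndNotify`) invented for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongConsumer;

// Hypothetical sketch of notifying listeners outside the synchronized block.
final class AckNotifier
{
    private final List<Long> pending = new ArrayList<>();

    void addPending(long epoch)
    {
        synchronized (this) { pending.add(epoch); }
    }

    void drainAndNotify(LongConsumer listener)
    {
        List<Long> toComplete;
        synchronized (this)            // copy under the lock...
        {
            toComplete = new ArrayList<>(pending);
            pending.clear();
        }
        toComplete.sort(null);         // ...then sort and notify outside it
        for (long epoch : toComplete)
            listener.accept(epoch);
    }
}
```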

Contributor

@belliottsmith belliottsmith left a comment

+1 once feedback addressed

@ifesdjeen ifesdjeen closed this Jan 14, 2025