
Fix topology replay during bootstrap and startup, decouple Accord from TCM #3781

Closed
ifesdjeen wants to merge 28 commits into apache:cep-15-accord from ifesdjeen:CASSANDRA-20142

Conversation

@ifesdjeen
Contributor

Fix topology replay during bootstrap and startup, decouple Accord from TCM

Includes multiple changes, the primary ones being:

  • Make HarrySimulatorTest work with Accord
  • Removed nodes now live in TCM, so there is no need to discover historic epochs in order to find removed nodes
  • CommandStore <-> RangesForEpochs mappings required for startup are now stored in the journal, and a CommandStore can be set up without topology replay
  • Topology replay is done entirely via the journal (where we store the topologies themselves) and the topology metadata table (where we store redundant/closed information)
  • Fixed various bugs related to propagation and staleness:
    • TCM was previously relied on for "fetching" an epoch; we cannot rely on it, because there is no guarantee we will see a consecutive epoch when grabbing Metadata#current
    • Redundant/closed state was set with incorrect ranges during replay in one of the code paths
    • TCM was contacted multiple times for historical epochs, which made startup much longer under some circumstances
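To illustrate the journal-based startup path described above, here is a minimal sketch of persisting a CommandStore -> ranges-per-epoch mapping so a restarting node can rebuild its stores from local state instead of replaying historic topologies from TCM. All names here (`RangesForEpochJournal`, `record`, `latestPerStore`) are illustrative, not the actual Accord/TCM API:

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: each journal append records the ranges a command store
// owns as of a given epoch; on startup we read back the latest entry per store,
// with no TCM round trips and no topology replay.
final class RangesForEpochJournal
{
    // storeId -> (epoch -> opaque serialized ranges)
    private final Map<Integer, TreeMap<Long, String>> journal = new ConcurrentHashMap<>();

    // Append the ranges a command store owns as of the given epoch.
    void record(int storeId, long epoch, String ranges)
    {
        journal.computeIfAbsent(storeId, id -> new TreeMap<>()).put(epoch, ranges);
    }

    // On startup, read back the most recent mapping for every store.
    Map<Integer, String> latestPerStore()
    {
        Map<Integer, String> out = new ConcurrentHashMap<>();
        journal.forEach((store, byEpoch) -> out.put(store, byEpoch.lastEntry().getValue()));
        return out;
    }
}
```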

PAXOS_REPAIR (false, "PaxosRepairStage", "internal", FBUtilities::getAvailableProcessors, null, Stage::multiThreadedStage),
INTERNAL_METADATA (false, "InternalMetadataStage", "internal", FBUtilities::getAvailableProcessors, null, Stage::multiThreadedStage),
FETCH_LOG (false, "MetadataFetchLogStage", "internal", () -> 1, null, Stage::singleThreadedStage),
FETCH_METADATA(false, "MetadataFetchLogStage", "internal", () -> 1, null, Stage::singleThreadedStage),
Contributor

formatting?

TCM_COMMIT_REQ (802, P0, rpcTimeout, INTERNAL_METADATA, MessageSerializers::commitSerializer, () -> commitRequestHandler(), TCM_COMMIT_RSP ),
TCM_FETCH_CMS_LOG_RSP (803, P0, rpcTimeout, FETCH_LOG, MessageSerializers::logStateSerializer, RESPONSE_HANDLER ),
TCM_FETCH_CMS_LOG_REQ (804, P0, rpcTimeout, FETCH_LOG, () -> FetchCMSLog.serializer, () -> fetchLogRequestHandler(), TCM_FETCH_CMS_LOG_RSP ),
TCM_FETCH_CMS_LOG_RSP (803, P0, rpcTimeout, FETCH_METADATA, MessageSerializers::logStateSerializer, RESPONSE_HANDLER ),
Contributor

formatting

diskStateManager.loadLocalTopologyState((epoch, syncStatus, pendingSyncNotify, remoteSyncComplete, closed, redundant) -> {
getOrCreateEpochState(epoch).setSyncStatus(syncStatus);
if (syncStatus == SyncStatus.NOTIFYING)
switch (syncStatus)
Contributor

Either handle all cases and throw an exception on any unhandled one, or simply test syncStatus == NOTIFYING?

Should we be doing anything in the NOT_STARTED case? We might need to fetch information from peers about what has been synced, though this will naturally start arriving from later epochs, so perhaps not a problem.

Contributor Author

You are right; this should have only been == NOTIFYING here.

For now, we can do nothing in the NOT_STARTED case. But I will add this area of code to the list of things that need an audit.
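The two shapes being discussed can be sketched as follows. The `SyncStatus` values and method names here are illustrative, not the actual patch:

```java
// Illustrative enum; the real SyncStatus may carry more states.
enum SyncStatus { NOT_STARTED, NOTIFYING, COMPLETED }

final class SyncStatusHandling
{
    // Option 1: the agreed fix — a plain equality test, since only one
    // state requires action here.
    static boolean shouldNotify(SyncStatus status)
    {
        return status == SyncStatus.NOTIFYING;
    }

    // Option 2: an exhaustive switch that fails loudly if a new state is
    // added but not handled.
    static boolean shouldNotifyExhaustive(SyncStatus status)
    {
        switch (status)
        {
            case NOTIFYING:   return true;
            case NOT_STARTED: return false; // nothing to do yet; peers catch us up via later epochs
            case COMPLETED:   return false;
            default: throw new IllegalStateException("Unhandled sync status: " + status);
        }
    }
}
```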

toComplete[i] = it.nextLong();
Arrays.sort(toComplete);
for (long epoch : toComplete)
listener.onEndpointAck(removed, epoch);
Contributor

might be nice to move this and the below listener notifications outside of the synchronised block?
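The restructuring being suggested is the usual copy-then-notify pattern: gather the epochs to acknowledge while holding the lock, then invoke the listener after releasing it, so a slow or re-entrant listener cannot stall other threads. A minimal sketch, with all names (`AckNotifier`, `addPending`, `drainAndNotify`) invented for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongConsumer;

// Hypothetical sketch of notifying listeners outside the synchronized block.
final class AckNotifier
{
    private final List<Long> pending = new ArrayList<>();

    void addPending(long epoch)
    {
        synchronized (this) { pending.add(epoch); }
    }

    void drainAndNotify(LongConsumer listener)
    {
        List<Long> toComplete;
        synchronized (this)            // copy under the lock...
        {
            toComplete = new ArrayList<>(pending);
            pending.clear();
        }
        toComplete.sort(null);         // ...then sort and notify outside it
        for (long epoch : toComplete)
            listener.accept(epoch);
    }
}
```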

Contributor

@belliottsmith belliottsmith left a comment

+1 once feedback addressed

@ifesdjeen ifesdjeen closed this Jan 14, 2025