Cassandra 18954 #2839
Closed
jacek-lewandowski wants to merge 229 commits into apache:cep-21-tcm from jacek-lewandowski:CASSANDRA-18954
Members of the ClusterMetadataService (CMS) replicate the global log table, and are responsible for linearizing inserts into the log. Log entries contain transformations which are applied to ClusterMetadata in the prescribed order. Log entries are replicated to non-CMS members only after being ordered and inserted into the log. Co-authored-by: Marcus Eriksson <marcuse@apache.org> Co-authored-by: Alex Petrov <oleksandr.petrov@gmail.com> Co-authored-by: Sam Tunnicliffe <samt@apache.org>
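The ordering rule in the commit message above can be illustrated with a minimal sketch. It is not code from this patch: `Metadata` and `LocalLog` are hypothetical stand-ins, transformations are reduced to plain strings, and the sketch assumes epochs are consecutive integers assigned by the CMS. It shows only that a peer applies entries strictly in epoch order and buffers anything that arrives ahead of a gap.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified stand-in for cluster metadata; not the patch's class.
final class Metadata {
    final long epoch;
    final List<String> history;

    Metadata(long epoch, List<String> history) {
        this.epoch = epoch;
        this.history = new ArrayList<>(history);
    }

    // Returns a new instance with one more applied change (immutable style).
    Metadata with(long newEpoch, String change) {
        List<String> next = new ArrayList<>(history);
        next.add(change);
        return new Metadata(newEpoch, next);
    }
}

// Hypothetical local view of the log kept by each peer.
final class LocalLog {
    private Metadata current = new Metadata(0, new ArrayList<>());
    // Entries received ahead of a gap wait here, keyed by epoch.
    private final TreeMap<Long, String> pending = new TreeMap<>();

    // Accept an entry whose epoch was already assigned by the CMS.
    void append(long epoch, String change) {
        if (epoch <= current.epoch)
            return; // already applied, ignore duplicates
        pending.put(epoch, change);
        // Apply every consecutive entry and stop at the first gap.
        while (!pending.isEmpty() && pending.firstKey() == current.epoch + 1) {
            Map.Entry<Long, String> e = pending.pollFirstEntry();
            current = current.with(e.getKey(), e.getValue());
        }
    }

    Metadata metadata() {
        return current;
    }
}
```

In this sketch an entry for epoch 2 that arrives before epoch 1 is held back, and both are applied in order once epoch 1 arrives.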
Schema itself is a component of ClusterMetadata and all DDL updates are applied by the CMS inserting a log entry containing a schema transformation. As the log entries are disseminated around the cluster, each peer applies the transformation to its local ClusterMetadata, enacting the schema change. This entails some changes to the way the database objects represented in schema are initialised ("db objects" refers to classes like Keyspace, ColumnFamilyStore, etc). Co-authored-by: Marcus Eriksson <marcuse@apache.org> Co-authored-by: Alex Petrov <oleksandr.petrov@gmail.com> Co-authored-by: Sam Tunnicliffe <samt@apache.org>
Co-authored-by: Marcus Eriksson <marcuse@apache.org> Co-authored-by: Alex Petrov <oleksandr.petrov@gmail.com> Co-authored-by: Sam Tunnicliffe <samt@apache.org>
Adds a new Directory component to ClusterMetadata to manage member identity, state, location and addressing. This duplicates some of the functions of TokenMetadata, Topology et al but with updates performed consistently via the global log. Although it isn't actually used for anything yet, it is a prerequisite for managing data ownership through TCM, which will eventually replace TokenMetadata completely. Co-authored-by: Marcus Eriksson <marcuse@apache.org> Co-authored-by: Alex Petrov <oleksandr.petrov@gmail.com> Co-authored-by: Sam Tunnicliffe <samt@apache.org>
Introduce new classes for representing placement of data ranges on replicas, along with the movement of data via transitions from one placement to the next. Eventually, these placements will be statically calculated in response to events which alter either the topology of the cluster (i.e. adding/removing/moving nodes) or the replication profile of the data itself (i.e. creating/altering keyspaces). These triggering events will be distributed and enacted consistently using the global log. Co-authored-by: Marcus Eriksson <marcuse@apache.org> Co-authored-by: Alex Petrov <oleksandr.petrov@gmail.com> Co-authored-by: Sam Tunnicliffe <samt@apache.org>
Minimal modifications to AbstractReplicationStrategy implementations to support the production of DataPlacements using ClusterMetadata while retaining calculateNaturalReplicas. Also adds tests to compare the output of both methods and assert their equivalence. Eventually, the original implementations based on TokenMetadata will be retired and will be retained in the test source to guard against regressions. Co-authored-by: Marcus Eriksson <marcuse@apache.org> Co-authored-by: Alex Petrov <oleksandr.petrov@gmail.com> Co-authored-by: Sam Tunnicliffe <samt@apache.org>
Introduces transformations to modify ClusterMetadata with the effect of modifying data ownership and placement, i.e. join, replace, move & decommission. These operations do not simply modify metadata, however, so simple atomic updates to cluster metadata are not sufficient. Streaming data in and out of nodes must also occur, and read and write operations must continue whilst this is in progress. These operations are therefore performed in phases, planned in advance and properly linearized using the global log. For example, to join a new node, the full set of phased range movements required is calculated, generating an actionable plan which is then serialised into ClusterMetadata. Concurrent operations are permitted as long as they only affect disjoint token ranges, ensuring that concurrent range movements remain safe and cluster invariants are preserved at all times, including in the case of the failure to complete any operation (i.e. failed bootstrap). This commit only adds the transformations and supporting TCM components (LockedRanges, InProgressSequences etc); the implementation of actually performing the operations follows in subsequent commits. Co-authored-by: Marcus Eriksson <marcuse@apache.org> Co-authored-by: Alex Petrov <oleksandr.petrov@gmail.com> Co-authored-by: Sam Tunnicliffe <samt@apache.org>
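The "disjoint token ranges" rule above can be sketched as a simple overlap check. This is an illustration under stated assumptions, not the patch's LockedRanges: `TokenRange` and `RangeLocks` are hypothetical names, tokens are plain `long` values, ranges are half-open `(start, end]`, and ring wrap-around is ignored for brevity.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical half-open token range (start, end]; wrap-around is not modelled.
final class TokenRange {
    final long start;
    final long end;

    TokenRange(long start, long end) {
        if (start >= end)
            throw new IllegalArgumentException("start must be < end");
        this.start = start;
        this.end = end;
    }

    // Two (start, end] ranges overlap iff each starts before the other ends.
    boolean intersects(TokenRange other) {
        return start < other.end && other.start < end;
    }
}

// Illustrative only; the LockedRanges component in the patch may work differently.
final class RangeLocks {
    private final Map<String, List<TokenRange>> held = new HashMap<>();

    // Records the lock and returns true only if no held range overlaps the request.
    boolean tryLock(String operationId, List<TokenRange> wanted) {
        for (List<TokenRange> ranges : held.values())
            for (TokenRange h : ranges)
                for (TokenRange w : wanted)
                    if (h.intersects(w))
                        return false;
        held.put(operationId, new ArrayList<>(wanted));
        return true;
    }

    // Called when an operation completes or is abandoned (e.g. a failed bootstrap).
    void unlock(String operationId) {
        held.remove(operationId);
    }
}
```

With these definitions, two operations over `(0, 100]` and `(100, 200]` can hold locks at the same time, while a third over `(50, 150]` is rejected until the first two release theirs.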
WIP commit (i.e. does not compile) beginning the process of removing gossip as the source of truth regarding membership, ownership, topology and data placement. This task will be split over multiple commits. Co-authored-by: Marcus Eriksson <marcuse@apache.org> Co-authored-by: Alex Petrov <oleksandr.petrov@gmail.com> Co-authored-by: Sam Tunnicliffe <samt@apache.org>
WIP commit (i.e. does not compile) replacing initial toy implementation of CMS membership with proper implementation. Membership of the CMS is determined by ownership of keyspaces with the META replication strategy (more precisely, by being a member of the _read_ placements for meta strategy keyspaces, a node is considered a member of the CMS). Also implements more of the "real" [pre]initialization of the CMS, in preparation for supporting upgrading a running cluster from a gossip based system. Co-authored-by: Marcus Eriksson <marcuse@apache.org> Co-authored-by: Alex Petrov <oleksandr.petrov@gmail.com> Co-authored-by: Sam Tunnicliffe <samt@apache.org>
Part 1 of 7 commits applying the main changes to migrate StorageService away from managing state using TokenMetadata with updates propagated using gossip. This commit makes the initial bulk changes to StorageService itself and thoroughly breaks compilation. Subsequent commits in the series fix the main build before test code is updated later. Co-authored-by: Marcus Eriksson <marcuse@apache.org> Co-authored-by: Alex Petrov <oleksandr.petrov@gmail.com> Co-authored-by: Sam Tunnicliffe <samt@apache.org>
Part 2 of 7 moves most of the data placement/ownership code over to the TCM structures. Co-authored-by: Marcus Eriksson <marcuse@apache.org> Co-authored-by: Alex Petrov <oleksandr.petrov@gmail.com> Co-authored-by: Sam Tunnicliffe <samt@apache.org>
Part 3 of 7 adds the ability to detect version mismatches between peers on the read/write path and to handle such divergence. Lagging peers will attempt to catch up from the CMS if the coordinator in a r/w operation has seen newer metadata. Coordinators may fail writes if the cluster metadata changes while the write is in flight, if the consistency level can no longer be satisfied by the original replica plan. Co-authored-by: Marcus Eriksson <marcuse@apache.org> Co-authored-by: Alex Petrov <oleksandr.petrov@gmail.com> Co-authored-by: Sam Tunnicliffe <samt@apache.org>
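The two behaviours described above (a lagging peer catching up, and a coordinator re-checking a write against changed metadata) can be sketched as follows. Everything here is an assumption made for illustration: `Replica`, `WriteCheck`, the `LongSupplier` catch-up hook, and the use of a bare epoch number and endpoint-name sets are hypothetical and do not come from the patch.

```java
import java.util.Set;
import java.util.function.LongSupplier;

// Hypothetical replica that tracks only the epoch of its local metadata.
final class Replica {
    private long epoch;
    // Hypothetical hook: fetches missing log entries from the CMS and
    // returns the epoch reached after applying them.
    private final LongSupplier catchUpFromCms;

    Replica(long epoch, LongSupplier catchUpFromCms) {
        this.epoch = epoch;
        this.catchUpFromCms = catchUpFromCms;
    }

    // Handle a request stamped with the coordinator's epoch; returns the local epoch afterwards.
    long handle(long coordinatorEpoch) {
        if (coordinatorEpoch > epoch)
            epoch = Math.max(epoch, catchUpFromCms.getAsLong()); // lagging: catch up first
        return epoch;
    }
}

// Hypothetical coordinator-side check after metadata changed mid-write.
final class WriteCheck {
    // True if enough acknowledgements came from nodes that are still replicas.
    static boolean stillSatisfied(Set<String> acked, Set<String> currentReplicas, int required) {
        long valid = acked.stream().filter(currentReplicas::contains).count();
        return valid >= required;
    }
}
```

In this sketch the coordinator would fail the write when `stillSatisfied` returns false, which corresponds to the case where the consistency level can no longer be met by the original replica plan.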
Part 4 of 7 modifications to ColumnFamilyStore, mostly related to: * ShardBoundaries * DiskBoundaries Co-authored-by: Marcus Eriksson <marcuse@apache.org> Co-authored-by: Alex Petrov <oleksandr.petrov@gmail.com> Co-authored-by: Sam Tunnicliffe <samt@apache.org>
Part 5 of 7; the only compilation errors remaining in non-test code are directly related to TokenMetadata. Co-authored-by: Marcus Eriksson <marcuse@apache.org> Co-authored-by: Alex Petrov <oleksandr.petrov@gmail.com> Co-authored-by: Sam Tunnicliffe <samt@apache.org>
Part 6 of 7. Completely remove TokenMetadata. The intention is to bring it back in a stripped down form, available to tests only, so we can continue to verify equivalence between old and new code. Test code is still extremely broken at this point, but non-test code is buildable again, though almost certainly not actually runnable. Co-authored-by: Marcus Eriksson <marcuse@apache.org> Co-authored-by: Alex Petrov <oleksandr.petrov@gmail.com> Co-authored-by: Sam Tunnicliffe <samt@apache.org>
Part 7 of 7 brings StorageService.operationMode back into sync with previous behaviour. Many external coordination tools depend on accessing this state via JMX, so this is an important external interface. This commit also adds a virtual version of the system.local table, as we can fully construct the data for this from ClusterMetadata, meaning we no longer need the on-disk system table, though this is retained for now. In future, more system tables can be virtualised (system.peers, system_schema, etc). Co-authored-by: Marcus Eriksson <marcuse@apache.org> Co-authored-by: Alex Petrov <oleksandr.petrov@gmail.com> Co-authored-by: Sam Tunnicliffe <samt@apache.org>
Adds new nodetool commands to: * list members of the CMS * initiate a snapshot of ClusterMetadata via submitting a SealPeriod operation Co-authored-by: Marcus Eriksson <marcuse@apache.org> Co-authored-by: Alex Petrov <oleksandr.petrov@gmail.com> Co-authored-by: Sam Tunnicliffe <samt@apache.org>
Adds a handful of implementations to subclasses in the org.apache.cassandra.utils.concurrent package Co-authored-by: Marcus Eriksson <marcuse@apache.org> Co-authored-by: Alex Petrov <oleksandr.petrov@gmail.com> Co-authored-by: Sam Tunnicliffe <samt@apache.org>
Adds a property for use in tests and debugging which preserves the stacktrace of when a thread is created by NamedThreadFactory. Co-authored-by: Marcus Eriksson <marcuse@apache.org> Co-authored-by: Alex Petrov <oleksandr.petrov@gmail.com> Co-authored-by: Sam Tunnicliffe <samt@apache.org>
Following an upgrade, nodes in an existing cluster will enter a minimal modification mode. In this state, the set of allowed cluster metadata modifications is constrained to include only the addition, removal and replacement of nodes, to allow failed hosts to be replaced during the upgrade. In this mode the CMS has no members and each peer maintains its own ClusterMetadata independently. This metadata is initialised at startup from system tables and gossip is used to propagate the permitted metadata changes. When the operator is ready, one node is chosen for promotion to the initial CMS, which is done manually via nodetool. At this point, the candidate node will propose itself as the initial CMS and attempt to gain consensus from the rest of the cluster. If successful, it verifies that all peers have an identical view of cluster metadata and initialises the distributed log with a snapshot of that metadata. Once this process is complete, all future cluster metadata updates are performed via the CMS using the global log, and reverting to the previous method of metadata management is not supported. Further members can and should be added to the CMS via the nodetool command. Co-authored-by: Marcus Eriksson <marcuse@apache.org> Co-authored-by: Alex Petrov <oleksandr.petrov@gmail.com> Co-authored-by: Sam Tunnicliffe <samt@apache.org>
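The restriction described above can be pictured as a one-way gate. The sketch below is hypothetical: `ChangeKind`, `UpgradeGate` and their members are invented for illustration, and the two non-permitted kinds are merely examples of changes outside the "addition, removal and replacement of nodes" set named in the commit message.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical categories of cluster metadata change.
enum ChangeKind { ADD_NODE, REMOVE_NODE, REPLACE_NODE, ALTER_SCHEMA, MOVE_TOKEN }

// Illustrative gate; not the mechanism used by the patch.
final class UpgradeGate {
    // Only these changes are assumed to be allowed before the CMS exists.
    private static final Set<ChangeKind> ALLOWED_DURING_UPGRADE =
        EnumSet.of(ChangeKind.ADD_NODE, ChangeKind.REMOVE_NODE, ChangeKind.REPLACE_NODE);

    private boolean cmsInitialized = false;

    // Before CMS initialisation only the restricted set is permitted.
    boolean permits(ChangeKind kind) {
        return cmsInitialized || ALLOWED_DURING_UPGRADE.contains(kind);
    }

    // One-way switch: there is deliberately no method to revert it,
    // mirroring the statement that reverting is not supported.
    void initializeCms() {
        cmsInitialized = true;
    }
}
```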
Minimal changes to IEndpointSnitch implementations to have them pull location info from Directory. Co-authored-by: Marcus Eriksson <marcuse@apache.org> Co-authored-by: Alex Petrov <oleksandr.petrov@gmail.com> Co-authored-by: Sam Tunnicliffe <samt@apache.org>
Alter CassandraDaemon initialization to accommodate TCM and replay of the cluster metadata log. This is something of a WIP and there is clearly scope to further clean up this part of the code. Co-authored-by: Marcus Eriksson <marcuse@apache.org> Co-authored-by: Alex Petrov <oleksandr.petrov@gmail.com> Co-authored-by: Sam Tunnicliffe <samt@apache.org>
Co-authored-by: Marcus Eriksson <marcuse@apache.org> Co-authored-by: Alex Petrov <oleksandr.petrov@gmail.com> Co-authored-by: Sam Tunnicliffe <samt@apache.org>
Updates the existing unit tests and dtests to work with TCM. In the vast majority of cases, this just means changes to initialization or to slightly updated method signatures. In CEP-21 generally, the intention has been not to modify existing public interfaces at all and to limit any changes to code on the boundaries of internal subsystems. In addition, care has been taken to only make minimal modifications to existing tests, and to preserve their invariants. So although this commit is fairly large in terms of number of files, its changes should be semantically quite light. Co-authored-by: Marcus Eriksson <marcuse@apache.org> Co-authored-by: Alex Petrov <oleksandr.petrov@gmail.com> Co-authored-by: Sam Tunnicliffe <samt@apache.org>
patch by Marcus Eriksson; reviewed by Alex Petrov and Sam Tunnicliffe for CASSANDRA-18409
patch by Marcus Eriksson; reviewed by Alex Petrov and Sam Tunnicliffe for CASSANDRA-18410
patch by Marcus Eriksson; reviewed by Alex Petrov and Sam Tunnicliffe for CASSANDRA-18412
patch by Marcus Eriksson; reviewed by Alex Petrov and Sam Tunnicliffe for CASSANDRA-18414
…r decommission patch by Alex Petrov; reviewed by Marcus Eriksson and Sam Tunnicliffe for CASSANDRA-18416
…lterSchema transformations Only when a coordinator is preparing to submit an AlterSchemaStatement to the CMS.
Co-authored-by: Alex Petrov <oleksandr.petrov@gmail.com> Co-authored-by: Sam Tunnicliffe <samt@apache.org> Co-authored-by: Marcus Eriksson <marcus_eriksson@apple.com>
Co-authored-by: Alex Petrov <oleksandr.petrov@gmail.com>
…ntly on CI, it consistently fails locally.
… startup * Don't try to connect to them with StartupClusterConnectivityChecker * Don't pre-emptively mark them as DOWN in Gossiper::waitToSettle
Co-authored-by: Sam Tunnicliffe <samt@apache.org>
Co-authored-by: Alex Petrov <oleksandr.petrov@gmail.com> Co-authored-by: Sam Tunnicliffe <samt@apache.org>
// pause capture and resume after in applying the schema change.
schemaTransformation.enterExecution();
if (!isReplay)
    schemaTransformation.enterExecution();
Contributor
Author
I know I screwed this; will fix it
2211ddf to ece8e96
c6a6822 to 1c5c548
No description provided.