
Increment cross regional duplicate tokens to replicate the policy we have been applying manually. #1048

Merged
merged 2 commits into 3.11 on Apr 25, 2023

Conversation

mattl-netflix (Contributor)

Throw when duplicate tokens are created in-region because that would be an obvious error, and we should not add two nodes in the same region so closely together.

Increment cross regional duplicate tokens to replicate the policy we have been applying manually. Throw when duplicate tokens are created in region because that would be an obvious error and we should not add two nodes in the same region so closely together.
chengw-netflix (Contributor) left a comment

LGTM.

private boolean newTokenIsADuplicate(String newToken, ImmutableSet<PriamInstance> instances) {
    for (PriamInstance priamInstance : instances) {
        if (newToken.equals(priamInstance.getToken())) {
            Preconditions.checkState(!myInstanceInfo.getRegion().equals(priamInstance.getDC()));
Contributor


Maybe add a WARN log here to better understand the cause of the duplicate tokens?

Contributor Author


Done.

It will crash and log on failure regardless. But you bring up a great point: a failure because two region Strings don't match is totally confusing without looking at the code. I have updated the code to print a clearer error message.
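
For illustration only, a minimal sketch of what the check could look like with the WARN log and a clearer message; the SLF4J-style logger field and the exact message text are assumptions, not necessarily the merged code:

private boolean newTokenIsADuplicate(String newToken, ImmutableSet<PriamInstance> instances) {
    for (PriamInstance priamInstance : instances) {
        if (newToken.equals(priamInstance.getToken())) {
            if (myInstanceInfo.getRegion().equals(priamInstance.getDC())) {
                // In-region duplicate: an obvious error, so fail fast with a message
                // that explains itself without a trip to the source code.
                throw new IllegalStateException(
                        "Duplicate token " + newToken + " found in this region ("
                                + myInstanceInfo.getRegion() + "). Manual intervention required.");
            }
            // Cross-region duplicate: warn so operators can see why the token was incremented.
            logger.warn("Duplicate token {} found in another region ({}); incrementing.",
                    newToken, priamInstance.getDC());
            return true;
        }
    }
    return false;
}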

});
int instanceCount = membership.getRacCount() * membership.getRacMembershipSize();
String newToken = tokenManager.createToken(mySlot, instanceCount, myRegion);
while (newTokenIsADuplicate(newToken, allIds)) {
Contributor


Not strictly related to your PR but: do we need to worry about atomicity between the check here and the create below? E.g. if this is backed by Cassandra, do we need to use LWTs?

Contributor Author


Internally we don't use LWTs, but contention is accounted for by "acquiring a lock" in a separate table before writing. Note that contention is managed based on the token's relative position within the region rather than the token value itself. That prevents this change from introducing the possibility of multiple tokens that are one apart in the same region.
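
As a rough sketch of the lock-then-write pattern described above — the lock-table helper and its methods are hypothetical names for illustration, not Priam's actual API — contention is keyed on the slot rather than the token value:

// Hypothetical illustration of slot-based locking before the write.
// Because the lock key is the token's position (slot) within the region,
// two nodes racing for the same slot serialize on the same lock even if
// their computed token values end up one apart after incrementing.
String lockKey = myRegion + "_" + mySlot;
if (!lockTable.tryAcquire(lockKey)) {          // hypothetical lock-table call
    throw new IllegalStateException("Could not lock slot " + mySlot + " in " + myRegion);
}
try {
    registerInstance(mySlot, newToken);        // hypothetical write of the new instance
} finally {
    lockTable.release(lockKey);
}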

private boolean newTokenIsADuplicate(String newToken, ImmutableSet<PriamInstance> instances) {
    for (PriamInstance priamInstance : instances) {
        if (newToken.equals(priamInstance.getToken())) {
            if (myInstanceInfo.getRegion().equals(priamInstance.getDC())) {
Contributor


I'm a little unclear as to why we throw an IllegalStateException in this case vs. the case where the DCs are different. IIUC, duplicates across DCs never happen because of the region hash offset? So it feels like that's the case where we should throw, vs. returning false.

Also, I'm having trouble tracing the entire code path given the multiple impls, but is it possible that InstanceInfo#getRegion() will return us-east-1 while PriamInstance#getDC will return us-east for the same region?

Contributor Author


IIUC, duplicates across DCs never happen because of the region hash offset?

Duplicates across DCs can happen if the region hash is off by one and the tokens between regions are off by one in the opposite direction. That scenario can result if the cluster has been doubled using Priam's doubling facilities. The purpose of this PR is to adapt to this scenario. That said, there should never be in-region duplicates, and if there are, we need to fail immediately. In that case we should not simply increment until there are no duplicates, because that would lead to tokens that are one apart within the same region, which would cause a data imbalance.

is it possible that InstanceInfo#getRegion() will return us-east-1 for us-east-1 while PriamInstance#getDC will return us-east?

This is an interesting concern, but I don't believe it is a threat. The only place we ever convert away from AWS region names is in an internal tuning use case. I can go into greater depth about that this morning if you're interested.
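
Putting the two cases together, a condensed sketch of the policy this PR encodes; it assumes tokens are string-encoded BigIntegers (an assumption about the increment, not stated in the excerpts), and the surrounding names come from the snippets above:

// Condensed sketch, not the literal merged code:
// - an in-region duplicate throws inside newTokenIsADuplicate (hard failure),
// - a cross-region duplicate is tolerated by nudging the token up by one and
//   re-checking, replicating the manual policy named in the PR title.
int instanceCount = membership.getRacCount() * membership.getRacMembershipSize();
String newToken = tokenManager.createToken(mySlot, instanceCount, myRegion);
while (newTokenIsADuplicate(newToken, allIds)) {
    newToken = new BigInteger(newToken).add(BigInteger.ONE).toString();
}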

mattl-netflix merged commit 457bd3a into 3.11 on Apr 25, 2023
1 check passed
mattl-netflix added a commit that referenced this pull request Jul 13, 2023
Increment cross regional duplicate tokens to replicate the policy we have been applying manually. (#1048)

* Increment cross regional duplicate tokens to replicate the policy we have been applying manually. Throw when duplicate tokens are created in region because that would be an obvious error and we should not add two nodes in the same region so closely together.

* Improve error message on intra-regional duplicate token.
mattl-netflix added a commit that referenced this pull request Sep 6, 2023
Increment cross regional duplicate tokens to replicate the policy we have been applying manually. (#1048)

* Increment cross regional duplicate tokens to replicate the policy we have been applying manually. Throw when duplicate tokens are created in region because that would be an obvious error and we should not add two nodes in the same region so closely together.

* Improve error message on intra-regional duplicate token.
mattl-netflix added a commit that referenced this pull request Sep 15, 2023
* Remove redundant interfaces and swap log and notification lines (#1019)

* Remove EventGenerator Interface.

* Remove EventObserver Interface.

* Remove BackupEvent Interface.

* Send SNS notification on backup after logging to account for the possibility of an Exception while trying to notify.

* Use synchronized list for thread-safety (#1018)

This list of PartETag is modified on multiple threads, so it
needs to be thread-safe.
S3FileSystem already uses a synchronized list, so do the
same here.

* Log backup failures rather than ignoring them. (#1025)

* Update CHANGELOG in advance of 3.11.95

* Print cleaner stack trace on failure to upload. (#1027)

* Switch from com.google.inject to JSR-330 javax.inject annotations for better compatibility

* Update CHANGELOG.md

* Reveal property to enable auto_snapshot. (#1031)

* Fix backup verification race condition causing missing notifications (#1034)

* Remove metaproxy validation it is never null in practice.

* Remove DateRange validation. It is never null in practice.

* Remove debug logging.

* Remove latest backup metadata validation. It is never null in practice.

* Consolidate repeated code into private verifyBackup.

* Change method names to better reflect what they do.

* Update latestResult wherever possible.

* Rewrite logic in findLatestVerfiedBackup to make it look more like verifyBackupsInRange.

* Change signature of BackupNotificationMgr.notify to not depend on BackupVerificationResult.

* Return all verified BackupMetadata instead of BackupVerificationResult when verifying en masse. It has enough information to skip the call to find the most recently verified backup.

Also, fix some tests that broke in this process: remove the check for the snapshot time in TestBackupVerification that only makes sense when the Path is for a file that does not exist. Also, mock the appropriate functions in MockBackupVerification in TestBackupVerificationTask.

* Rename findLatestVerifiedBackup responding to review comments.

* Reveal hook to allow operators to restore just to the most recent snapshot (#1035)

* Remove unused code.

* Remove redundant comments and vertical whitespace.

* Remove debug comments and now-redundant logger, simplify if-else and tighten error message for code style.

* Use final where applicable and remove it where redundant.

* Remove redundant BackupRestoreException from getIncrementals method signature.

* Split getting incremental files and snapshot files into separate methods.

* Reveal hook to allow operators to restore to the last valid snapshot.

* Remove added non-shaded Guava dependency pursuant to review comments.

* minor code modifications to simplify the nfpriam spring boot migration

* Update CHANGELOG.md

* Update CHANGELOG.md

* make the constructor public

* Update CHANGELOG.md

* remove the instance info from the DI (#1042)

* Update CHANGELOG.md

* Always TTL backups. (#1038)

* Fix Github CI by explicitly creating necessary directories. (#1045)

* Change the interface of PriamScheduler (#1049)

Change the interface of PriamScheduler

* minor name change (#1051)

* Update CHANGELOG.md

* Increment cross regional duplicate tokens to replicate the policy we have been applying manually. (#1048)

* Increment cross regional duplicate tokens to replicate the policy we have been applying manually. Throw when duplicate tokens are created in region because that would be an obvious error and we should not add two nodes in the same region so closely together.

* Improve error message on intra-regional duplicate token.

* Update CHANGELOG in advance of 3.11.101

* Rollback #1042: Change the interface of EC2RoleAssumptionCredential (#1052)

* Fix snapshot location regression in SNS messages. (#1054)

* Update CHANGELOG in advance of 3.11.103

* change the CassandraMonitor to public (#1056)

* Update CHANGELOG.md

* Add new constructor (#1064)

* Update CHANGELOG.md

* Add disk_failure_policy config (#1065)

* Update CHANGELOG.md

* fix Gson serilization issue (#1067)

* Update CHANGELOG.md

* Make block_for_peers_timeout_in_secs a first-class tunable. (#1069)

* Update CHANGELOG in advance of 3.11.108

* Fix TokenRetrieverTest

---------

Co-authored-by: Ammar Khaku <akhaku@users.noreply.github.com>
Co-authored-by: Cheng Wang <chengw@netflix.com>
Co-authored-by: Cheng Wang <107727158+chengw-netflix@users.noreply.github.com>