
Increment cross regional duplicate tokens to replicate the policy we have been applying manually. #1048

Merged
merged 2 commits into 3.11 on Apr 25, 2023

Conversation

mattl-netflix (Contributor)

Throw when duplicate tokens are created in-region because that would be an obvious error, and we should not add two nodes in the same region so closely together.

Increment cross regional duplicate tokens to replicate the policy we have been applying manually. Throw when duplicate tokens are created in region because that would be an obvious error and we should not add two nodes in the same region so closely together.
chengw-netflix (Contributor) left a comment

LGTM.

private boolean newTokenIsADuplicate(String newToken, ImmutableSet<PriamInstance> instances) {
    for (PriamInstance priamInstance : instances) {
        if (newToken.equals(priamInstance.getToken())) {
            Preconditions.checkState(!myInstanceInfo.getRegion().equals(priamInstance.getDC()));
Contributor


Maybe add a WARN log here to better understand the cause of the duplicate tokens?

Contributor Author


Done.

It will crash and log on failure regardless. But you bring up a great point: a failure because two region Strings don't match is totally confusing without looking at the code. I have updated the code to print a clearer error message.
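
For illustration only, a minimal sketch of what the check could look like with the WARN log and a clearer message; the SLF4J-style logger field and the exact message text are assumptions, not necessarily the merged code:

private boolean newTokenIsADuplicate(String newToken, ImmutableSet<PriamInstance> instances) {
    for (PriamInstance priamInstance : instances) {
        if (newToken.equals(priamInstance.getToken())) {
            if (myInstanceInfo.getRegion().equals(priamInstance.getDC())) {
                // In-region duplicate: an obvious error, so fail fast with a message
                // that explains itself without a trip to the source code.
                throw new IllegalStateException(
                        "Duplicate token " + newToken + " found in this region ("
                                + myInstanceInfo.getRegion() + "). Manual intervention required.");
            }
            // Cross-region duplicate: warn so operators can see why the token was incremented.
            logger.warn("Duplicate token {} found in another region ({}); incrementing.",
                    newToken, priamInstance.getDC());
            return true;
        }
    }
    return false;
}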

});
int instanceCount = membership.getRacCount() * membership.getRacMembershipSize();
String newToken = tokenManager.createToken(mySlot, instanceCount, myRegion);
while (newTokenIsADuplicate(newToken, allIds)) {
Contributor


Not strictly related to your PR but: do we need to worry about atomicity between the check here and the create below? E.g. if this is backed by Cassandra, do we need to use LWTs?

Contributor Author


Internally we don't use LWTs, but contention is accounted for by "acquiring a lock" in a separate table before writing. Note that contention is managed based on the token's relative position within the region rather than the token value itself. That prevents this change from introducing the possibility of multiple tokens that are one apart in the same region.
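
As a rough sketch of the lock-then-write pattern described above — the lock-table helper and its methods are hypothetical names for illustration, not Priam's actual API — contention is keyed on the slot rather than the token value:

// Hypothetical illustration of slot-based locking before the write.
// Because the lock key is the token's position (slot) within the region,
// two nodes racing for the same slot serialize on the same lock even if
// their computed token values end up one apart after incrementing.
String lockKey = myRegion + "_" + mySlot;
if (!lockTable.tryAcquire(lockKey)) {          // hypothetical lock-table call
    throw new IllegalStateException("Could not lock slot " + mySlot + " in " + myRegion);
}
try {
    registerInstance(mySlot, newToken);        // hypothetical write of the new instance
} finally {
    lockTable.release(lockKey);
}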

private boolean newTokenIsADuplicate(String newToken, ImmutableSet<PriamInstance> instances) {
    for (PriamInstance priamInstance : instances) {
        if (newToken.equals(priamInstance.getToken())) {
            if (myInstanceInfo.getRegion().equals(priamInstance.getDC())) {
Contributor


I'm a little unclear as to why we throw an IllegalStateException in this case vs. the case where the DCs are different. IIUC, duplicates across DCs never happen because of the region hash offset? So it feels like that's the case where we should throw, vs. returning false.

Also, I'm having trouble tracing the entire code path given the multiple impls, but is it possible that InstanceInfo#getRegion() will return us-east-1 while PriamInstance#getDC will return us-east for the same region?

Contributor Author


IIUC, duplicates across DCs never happen because of the region hash offset?

Duplicates across DCs can happen if the region hash is off by one and the tokens between regions are off by one in the opposite direction. That scenario can result if the cluster has been doubled using Priam's doubling facilities. The purpose of this PR is to adapt to this scenario. That said, there should never be in-region duplicates, and if there are, we need to fail immediately. In that case we should not simply increment until there are no duplicates, because that would lead to tokens that are one apart within the same region, which would cause a data imbalance.

is it possible that InstanceInfo#getRegion() will return us-east-1 for us-east-1 while PriamInstance#getDC will return us-east?

This is an interesting concern, but I don't believe it is a threat. The only place we ever convert away from AWS region names is in an internal tuning use case. I can go into greater depth about that this morning if you're interested.
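
Putting the two cases together, a condensed sketch of the policy this PR encodes; it assumes tokens are string-encoded BigIntegers (an assumption about the increment, not stated in the excerpts), and the surrounding names come from the snippets above:

// Condensed sketch, not the literal merged code:
// - an in-region duplicate throws inside newTokenIsADuplicate (hard failure),
// - a cross-region duplicate is tolerated by nudging the token up by one and
//   re-checking, replicating the manual policy named in the PR title.
int instanceCount = membership.getRacCount() * membership.getRacMembershipSize();
String newToken = tokenManager.createToken(mySlot, instanceCount, myRegion);
while (newTokenIsADuplicate(newToken, allIds)) {
    newToken = new BigInteger(newToken).add(BigInteger.ONE).toString();
}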

mattl-netflix merged commit 457bd3a into 3.11 on Apr 25, 2023
1 check passed
mattl-netflix added a commit that referenced this pull request Jul 13, 2023
Increment cross regional duplicate tokens to replicate the policy we have been applying manually. (#1048)

* Increment cross regional duplicate tokens to replicate the policy we have been applying manually. Throw when duplicate tokens are created in region because that would be an obvious error and we should not add two nodes in the same region so closely together.

* Improve error message on intra-regional duplicate token.
mattl-netflix added a commit that referenced this pull request Sep 6, 2023
Increment cross regional duplicate tokens to replicate the policy we have been applying manually. (#1048)

* Increment cross regional duplicate tokens to replicate the policy we have been applying manually. Throw when duplicate tokens are created in region because that would be an obvious error and we should not add two nodes in the same region so closely together.

* Improve error message on intra-regional duplicate token.
mattl-netflix added a commit that referenced this pull request Sep 15, 2023
* Remove redundant interfaces and swap log and notification lines (#1019)

* Remove EventGenerator Interface.

* Remove EventObserver Interface.

* Remove BackupEvent Interface.

* Send SNS notification on backup after logging to account for the possibility of an Exception while trying to notify.

* Use synchronized list for thread-safety (#1018)

This list of PartETag is modified on multiple threads, so it
needs to be thread-safe.
S3FileSystem already uses a synchronized list, so do the
same here.

* Log backup failures rather than ignoring them. (#1025)

* Update CHANGELOG in advance of 3.11.95

* Print cleaner stack trace on failure to upload. (#1027)

* Switch from com.google.inject to JSR-330 javax.inject annotations for better compatibility

* Update CHANGELOG.md

* Reveal property to enable auto_snapshot. (#1031)

* Fix backup verification race condition causing missing notifications (#1034)

* Remove metaproxy validation it is never null in practice.

* Remove DateRange validation. It is never null in practice.

* Remove debug logging.

* Remove latest backup metadata validation. It is never null in practice.

* Consolidate repeated code into private verifyBackup.

* Change method names to better reflect what they do.

* Update latestResult wherever possible.

* Rewrite logic in findLatestVerfiedBackup to make it look more like verifyBackupsInRange.

* Change signature of BackupNotificationMgr.notify to not depend on BackupVerificationResult.

* Return all verified BackupMetadata instead of BackupVerificationResult when verifying en masse. It has enough information to skip the call to find the most recently verified backup.

Also, fix some tests that broke in this process: remove the check for the snapshot time in TestBackupVerification that only makes sense when the Path is for a file that does not exist. Also, mock the appropriate functions in MockBackupVerification in TestBackupVerificationTask.

* Rename findLatestVerifiedBackup responding to review comments.

* Reveal hook to allow operators to restore just to the most recent snapshot (#1035)

* Remove unused code.

* Remove redundant comments and vertical whitespace.

* Remove debug comments and now-redundant logger, simplify if-else and tighten error message for code style.

* Use final where applicable and remove it where redundant.

* Remove redundant BackupRestoreException from getIncrementals method signature.

* Split getting incremental files and snapshot files into separate methods.

* Reveal hook to allow operators to restore to the last valid snapshot.

* Remove added non-shaded Guava dependency pursuant to review comments.

* minor code modifications to simplify the nfpriam spring boot migration

* Update CHANGELOG.md

* Update CHANGELOG.md

* make the constructor public

* Update CHANGELOG.md

* remove the instance info from the DI (#1042)

* Update CHANGELOG.md

* Always TTL backups. (#1038)

* Fix Github CI by explicitly creating necessary directories. (#1045)

* Change the interface of PriamScheduler (#1049)

Change the interface of PriamScheduler

* minor name change (#1051)

* Update CHANGELOG.md

* Increment cross regional duplicate tokens to replicate the policy we have been applying manually. (#1048)

* Increment cross regional duplicate tokens to replicate the policy we have been applying manually. Throw when duplicate tokens are created in region because that would be an obvious error and we should not add two nodes in the same region so closely together.

* Improve error message on intra-regional duplicate token.

* Update CHANGELOG in advance of 3.11.101

* Rollback #1042: Change the interface of EC2RoleAssumptionCredential (#1052)

* Fix snapshot location regression in SNS messages. (#1054)

* Update CHANGELOG in advance of 3.11.103

* change the CassandraMonitor to public (#1056)

* Update CHANGELOG.md

* Add new constructor (#1064)

* Update CHANGELOG.md

* Add disk_failure_policy config (#1065)

* Update CHANGELOG.md

* fix Gson serilization issue (#1067)

* Update CHANGELOG.md

* Make block_for_peers_timeout_in_secs a first-class tunable. (#1069)

* Update CHANGELOG in advance of 3.11.108

* Fix TokenRetrieverTest

---------

Co-authored-by: Ammar Khaku <akhaku@users.noreply.github.com>
Co-authored-by: Cheng Wang <chengw@netflix.com>
Co-authored-by: Cheng Wang <107727158+chengw-netflix@users.noreply.github.com>