Akka.Cluster.Tools.Singleton: singleton moves as soon as node with higher AppVersion joins cluster? #7196

Open
Aaronontheweb opened this issue May 22, 2024 · 3 comments · May be fixed by #7197


@Aaronontheweb
Member

Version Information
Version of Akka.NET? v1.5.0
Which Akka.NET Modules? Akka.Cluster.Tools

Describe the bug

Chasing down an issue for a production support customer - they have a custom pbm command for tracking the location of cluster singletons. They confirmed the singleton was on a specific node and decided to replace that node last during a version upgrade. What they observed was: the singleton moved onto the newest node with the highest AppVersion even before that oldest node was downed!

Expected behavior

As I wrote back to the customer originally, the singleton should only move onto a new node AFTER the node it's currently on begins to leave the cluster. This leads me to believe that the following code might have a bug in how we compute the sort order that determines the most suitable location for a singleton:

/// <summary>
/// Creates a new instance of the <see cref="OldestChangedBuffer"/>.
/// </summary>
/// <param name="role">The role for which we're watching for membership changes.</param>
/// <param name="considerAppVersion">Should cluster AppVersion be considered when sorting member age</param>
public OldestChangedBuffer(string role, bool considerAppVersion)
{
    _role = role;
    _memberAgeComparer = considerAppVersion
        ? MemberAgeOrdering.DescendingWithAppVersion
        : MemberAgeOrdering.Descending;
    _membersByAge = ImmutableSortedSet<Member>.Empty.WithComparer(_memberAgeComparer);
    SetupCoordinatedShutdown();
}

In fact, I'm almost certain that this is the case.
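
To illustrate the suspected failure mode, here is a minimal, self-contained sketch. It does not use Akka.NET's actual `Member` or `MemberAgeOrdering` types: `Node`, `ByAge`, `ByAppVersionThenAge`, and `HeadAfterUpgradeJoin` are hypothetical names, and the version-first comparer is only an assumption about how an AppVersion-aware ordering could behave. The point is that if AppVersion outranks age in the comparer, the head of the sorted set changes the moment a newer node joins, even though the current host has not started leaving.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.Immutable;

// Hypothetical stand-in for a cluster member; NOT Akka.NET's Member type.
public sealed record Node(string Name, Version AppVersion, int UpNumber);

public static class SingletonOrderingSketch
{
    // Age only: a lower UpNumber means the node came up earlier, so it sorts first.
    public static readonly IComparer<Node> ByAge =
        Comparer<Node>.Create((a, b) => a.UpNumber.CompareTo(b.UpNumber));

    // ASSUMED behavior of a version-aware ordering: highest AppVersion first, then age.
    public static readonly IComparer<Node> ByAppVersionThenAge =
        Comparer<Node>.Create((a, b) =>
        {
            var byVersion = b.AppVersion.CompareTo(a.AppVersion);
            return byVersion != 0 ? byVersion : a.UpNumber.CompareTo(b.UpNumber);
        });

    // Returns the name of the node at the head of the sorted set
    // after a newer-version node joins during a rolling upgrade.
    public static string HeadAfterUpgradeJoin(IComparer<Node> comparer)
    {
        var members = ImmutableSortedSet<Node>.Empty.WithComparer(comparer)
            .Add(new Node("node-a", new Version(1, 0, 0), 1))  // current singleton host
            .Add(new Node("node-b", new Version(1, 0, 0), 2));

        // A freshly deployed node joins while node-a is still up.
        members = members.Add(new Node("node-c", new Version(1, 1, 0), 3));
        return members[0].Name;
    }

    public static void Main()
    {
        Console.WriteLine(HeadAfterUpgradeJoin(ByAge));               // node-a
        Console.WriteLine(HeadAfterUpgradeJoin(ByAppVersionThenAge)); // node-c
    }
}
```

With the age-only comparer the head stays on `node-a` until that node is removed from the set; with the version-first comparer the head jumps to `node-c` as soon as it is added, which matches the behavior the customer described.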

@Aaronontheweb
Member Author

Marking this bug as critical - one of the major side effects from this issue is that we can create split brains with all cluster singletons during deployments when the AppVersion is getting bumped. That can result in problems such as #6973

@Aaronontheweb
Member Author

So this bug likely affected fewer people than I initially thought, as the setting in question has been set to false this whole time, and that's also the default value from the HOCON extractors when this configuration isn't available. That's good news, but it still needed to be fixed.

@Aaronontheweb
Member Author

Looks like the original issue reported by the end user wasn't even caused by the AppVersion, but this feature is definitely a footgun and probably needs to be removed.

@Aaronontheweb Aaronontheweb modified the milestones: 1.5.21, 1.5.22 May 28, 2024
Aaronontheweb added a commit to Aaronontheweb/akka.net that referenced this issue May 31, 2024
@Aaronontheweb Aaronontheweb modified the milestones: 1.5.22, 1.5.23, 1.5.24, 1.5.25 Jun 3, 2024
@Aaronontheweb Aaronontheweb modified the milestones: 1.5.25, 1.5.26 Jun 14, 2024