Hello, and thanks for the release.
After updating to v15, the nodes in my two-node Podman cluster become unreachable from one another. I've tried many things, including a reinstallation, and found the following reproduction steps that show the issue in detail:
- Set up two nodes from scratch with the official Docker image. For each node, I set up proper Let's Encrypt certs + domains, and HTTPS on port :53443.
- Initialize the cluster on the primary, and join from the secondary with certificate validation ticked. The initial join works fine and syncs all data as intended. For simplicity, I only configured one IPv4 address per node.
- However, after a while, the remote nodes become marked as Unreachable from one another, and the cluster fails to function. Triggering a Resync also surfaces the errors more quickly. The symptoms include:
- The primary node shows the secondary as "Unreachable" right after the first sync. The secondary node shows the primary as "Unreachable" after a while.
- For the Cluster Catalog Zone and its member zones, manually resyncing them works fine. They use a different, non-HTTPS mechanism, so I suppose that part is still intact. There also seems to be a problem with IPv6, but I'll look into that later.
- Most importantly, it is impossible to switch to another node using the context menu. When switching to the primary from the secondary, "No active session exists. Please login and try again." is shown as an error. When switching to the secondary from the primary, I am returned to the login screen and get stuck there after multiple logins, probably due to a failing loop. So switching context bricks the entire browser session for the primary node's domain.
- On both nodes, the log fills with large numbers of repeated entries of the following nature:
```
[timestamp UTC] Heartbeat failed for Secondary node 'secondary.example.com (192.168.53.2)'. DnsServerCore.HttpApi.InvalidTokenHttpApiClientException: Invalid token or session expired.
   at DnsServerCore.HttpApi.HttpApiClient.CheckResponseStatus(JsonElement rootElement) in Z:\Technitium\Projects\DnsServer\DnsServerCore.HttpApi\HttpApiClient.cs:line 147
   at DnsServerCore.HttpApi.HttpApiClient.GetClusterStateAsync(Boolean includeServerIpAddresses, Boolean includeNodeCertificates, CancellationToken cancellationToken) in Z:\Technitium\Projects\DnsServer\DnsServerCore.HttpApi\HttpApiClient.cs:line 394
   at DnsServerCore.Cluster.ClusterNode.GetClusterStateAsync(CancellationToken cancellationToken) in Z:\Technitium\Projects\DnsServer\DnsServerCore\Cluster\ClusterNode.cs:line 517
   at DnsServerCore.Cluster.ClusterNode.HeartbeatTimerCallbackAsync(Object state) in Z:\Technitium\Projects\DnsServer\DnsServerCore\Cluster\ClusterNode.cs:line 224
```
They fail at the exact same lines, the only difference being the node addresses/IPs.
- Lastly, gracefully leaving/removing a node from the cluster is not possible either, as it also shows an "Invalid token or session expired." error. The only way out is to force leave/force remove the node itself.
This issue seems to have been reported via Reddit here too. The problem appears specific to the HTTPS API and to whatever tokens it uses between nodes. For now, I'll keep running the cluster in "desynced mode".
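In case it helps with reproduction, below is roughly the kind of check I'd use to poke at the token behaviour directly against the HTTPS API, outside the browser. It is only a sketch: the hostnames and credentials are placeholders, and the endpoint names are taken from the HTTP API docs as I understand them, so treat them as assumptions rather than verified v15 calls.

```python
# Rough diagnostic sketch -- hostnames/credentials are placeholders, and the
# endpoints are from my reading of the HTTP API docs (assumptions, not verified
# against v15). Requires the "requests" package.
import requests

PRIMARY = "https://primary.example.com:53443"
SECONDARY = "https://secondary.example.com:53443"
USER, PASS = "admin", "changeme"  # placeholder credentials


def login(base: str) -> str:
    """Log in directly on a node and return a fresh session token."""
    r = requests.get(f"{base}/api/user/login",
                     params={"user": USER, "pass": PASS}, timeout=10)
    r.raise_for_status()
    return r.json()["token"]


def check(base: str, token: str) -> str:
    """Make a cheap read-only call with the given token and report the API status."""
    r = requests.get(f"{base}/api/dashboard/stats/get",
                     params={"token": token, "type": "LastHour"}, timeout=10)
    return r.json().get("status", "unknown")


primary_token = login(PRIMARY)
secondary_token = login(SECONDARY)

# Each node with its own freshly issued token: sanity check for the web
# service + TLS setup in isolation.
print("primary / own token:    ", check(PRIMARY, primary_token))
print("secondary / own token:  ", check(SECONDARY, secondary_token))

# A token from one node used against the other, which seems to be roughly
# where the heartbeat and the node-switching UI fall over for me.
print("secondary / primary token:", check(SECONDARY, primary_token))
```

The idea is simply to see whether each node accepts its own token and whether a token issued by one node is accepted by the other, without the browser session handling in the way.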
Is there anything extra that needs to be configured for Clustering after the v15 upgrade? If so, please let me know, as I couldn't find anything of note in the changelog or the blog post. Also, let me know if you can reproduce this outside of Docker.