Issue with Akka (remoting) concurrent connections #23959

Closed
ecelgp opened this issue Nov 10, 2017 · 6 comments
Labels
0 - new Ticket is unclear on it's purpose or if it is valid or not t:cluster

Comments

ecelgp commented Nov 10, 2017

Hello experts,

We sporadically see a member fail to rejoin the cluster after it restarts in the OpenDaylight project (opendaylight.org). For reference, we use Akka 2.4.18 for our clustering solution.

After debugging multiple runs, it looks like the problem reproduces when the restarting member and the cluster leader each establish an Akka TCP connection to the other at almost the same time (one connection is normally enough to set up the Akka channel). Note that the restarting member initiates its connection as part of the boot process, while the cluster leader tries to reconnect to the lost member every 5 seconds, which would explain the sporadic behavior.
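
To illustrate why a fixed 5-second retry would make the problem sporadic, here is a rough back-of-the-envelope sketch. It assumes the leader's retry timer has a uniformly random phase relative to the member's boot, and that the member's connection setup takes some short window. The 0.5-second window below is a hypothetical value chosen for illustration, not a measurement.

```python
import random


def overlap_probability(retry_period_s: float, window_s: float) -> float:
    """Chance that a periodic retry fires inside a window of the given length,
    assuming the retry phase is uniformly random relative to the window."""
    return min(1.0, window_s / retry_period_s)


def simulate(retry_period_s: float, window_s: float,
             trials: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo check of overlap_probability under the same assumption."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Delay until the leader's next retry, measured from the moment
        # the restarting member begins its own connection attempt.
        next_retry = rng.uniform(0.0, retry_period_s)
        if next_retry < window_s:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    period = 5.0   # leader retry interval, as described above
    window = 0.5   # HYPOTHETICAL connection-setup window, in seconds
    print(overlap_probability(period, window))
    print(simulate(period, window))
```

Under these assumptions only a fraction of restarts would hit the race, which is in line with the failure showing up on some runs and not others.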

My question: has anyone reported or experienced the same issue with concurrent Akka connections? Do you have any advice on how to mitigate it?

Here is an example of the Akka config we are using in OpenDaylight:

odl-cluster-data {
  akka {
    remote {
      artery {
        enabled = off
        canonical.hostname = "10.29.15.187"
        canonical.port = 2550
      }
      netty.tcp {
        hostname = "10.29.15.187"
        port = 2550
      }
    }

    cluster {
      seed-nodes = ["akka.tcp://opendaylight-cluster-data@10.29.15.187:2550",
                    "akka.tcp://opendaylight-cluster-data@10.29.12.66:2550",
                    "akka.tcp://opendaylight-cluster-data@10.29.15.35:2550"]

      roles = ["member-1"]

    }

    persistence {
      journal {
        leveldb {
        }
      }
    }
  }
}

More logs and discussion here:
https://jira.opendaylight.org/projects/CONTROLLER/issues/CONTROLLER-1751

Thanks in advance
Luis

@ktoso ktoso changed the title Issue with AKKA concurrent connections Issue with Akka concurrent connections Nov 10, 2017
@ktoso ktoso changed the title Issue with Akka concurrent connections Issue with Akka (remoting) concurrent connections Nov 13, 2017
@ktoso ktoso added the 0 - new Ticket is unclear on it's purpose or if it is valid or not label Nov 13, 2017
Member

ktoso commented Dec 13, 2017

We're not quite sure what you mean here, could you provide a minimal reproducer?
What is "concurrent" about it?

Member

ktoso commented Dec 13, 2017

Could you also upgrade to 2.5, since 2.4.x is heading towards its end of life (this month, actually)?

Author

ecelgp commented Jan 4, 2018

"Concurrent" means that two Akka connections between the same two nodes are active at the same time, for example:

tcp6 0 431 10.30.170.136:58204 10.30.170.104:2550 ESTABLISHED 12106/java
tcp6 0 0 10.30.170.136:42848 10.30.170.146:2550 ESTABLISHED 12106/java
tcp6 0 0 10.30.170.136:2550 10.30.170.146:40140 ESTABLISHED 12106/java

Member 1 (.136) above has a single outbound connection to member 2 (.104), but two connections with member 3 (.146), one in each direction, after member 3 restarts.

This double (concurrent) connection is always present when we see a restarting member fail to join the cluster, so we wonder whether the double connection and the failure are related.
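
A quick way to spot this condition from `netstat` output is sketched below. It assumes the column layout shown above (local address in the fourth column, remote address in the fifth, state in the sixth) and that Akka remoting listens on port 2550, as in our config. The function names are made up for this example.

```python
from collections import defaultdict

SAMPLE = """\
tcp6 0 431 10.30.170.136:58204 10.30.170.104:2550 ESTABLISHED 12106/java
tcp6 0 0 10.30.170.136:42848 10.30.170.146:2550 ESTABLISHED 12106/java
tcp6 0 0 10.30.170.136:2550 10.30.170.146:40140 ESTABLISHED 12106/java
"""


def parse_established(netstat_text):
    """Yield (local_ip, local_port, remote_ip, remote_port) for ESTABLISHED rows."""
    for line in netstat_text.splitlines():
        fields = line.split()
        if len(fields) < 6 or fields[5] != "ESTABLISHED":
            continue
        # rsplit on the last colon so IPv6-style addresses are not broken up.
        local_ip, local_port = fields[3].rsplit(":", 1)
        remote_ip, remote_port = fields[4].rsplit(":", 1)
        yield local_ip, int(local_port), remote_ip, int(remote_port)


def peers_with_double_connection(netstat_text, akka_port=2550):
    """Return peers that have both an inbound and an outbound Akka connection."""
    directions = defaultdict(set)
    for _, local_port, remote_ip, remote_port in parse_established(netstat_text):
        if remote_port == akka_port:
            directions[remote_ip].add("outbound")  # we connected to the peer
        if local_port == akka_port:
            directions[remote_ip].add("inbound")   # the peer connected to us
    return sorted(ip for ip, seen in directions.items()
                  if seen == {"inbound", "outbound"})


if __name__ == "__main__":
    print(peers_with_double_connection(SAMPLE))  # ['10.30.170.146']
```

Run against the output above, this flags only member 3 (.146). It reports that both directions are open; it does not show that the double connection causes the join failure.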

Author

ecelgp commented Jan 4, 2018

BTW we plan to upgrade to 2.5 in the next OpenDaylight release.

Author

ecelgp commented Jun 8, 2018

This issue was fixed when we upgraded to Akka 2.5.11 (akka_2.12-2.5.11) with Scala 2.12.5.

@ecelgp ecelgp closed this as completed Jun 8, 2018
Member

raboof commented Jun 8, 2018

That's great to hear! Thanks for looping this back!

3 participants