Cannot get connection id for node in OTP 21.0 #70
This seems like suspect number one to me:
Relevant areas of code: Erlang OTP/21 : Libcluster: libcluster/lib/strategy/strategy.ex Line 36 in 9236fb3
|
For those tuning in, looks like this is due to a regression of some kind in OTP 21 with the new auto-connect behavior. I need to do some investigating to see if I can get a repro case to open a bug, or figure out what, if anything, we need to do differently in libcluster. |
I'm not able to reproduce this locally. @beardedeagle, do you have a minimal working example I can try to reproduce with? I was going to try and open a bug today, but I haven't been able to find a way to trigger the error yet. |
Let me get something thrown together |
Got busy at work, I'll have to make the example tonight. The app I am showing this error in uses Swarm and Mnesiam as well. Which both require libcluster to be started before them. They are a part of |
Yeah, now that I've had some sleep, I'm gonna go with ... this is not a problem in libcluster or swarm. That error log message comes directly from net_kernel, and looking back over the code there it's pretty blatant: it just can't connect to the node, in this case itself. As to why, there could be a lot of reasons, but I don't think they are going to be related to libcluster and friends. Auto-connect: Explicit connect: Of course, it's always possible that libcluster or swarm is prematurely triggering an auto-connect. But I would rule out other things first... Try firing up a plain node first (no libcluster, no swarm). |
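For anyone ruling out other causes on a plain node, here is a minimal sketch of checking an explicit connect by hand. The module and function names are hypothetical and nothing here comes from libcluster or swarm; it only relies on `Node.connect/1` returning `true`, `false`, or `:ignored`.

```elixir
defmodule ConnectCheckSketch do
  # Hypothetical diagnostic helper, not part of libcluster or swarm.

  # Try an explicit connect and translate the raw result.
  def check(node) when is_atom(node) do
    node |> Node.connect() |> classify()
  end

  # Node.connect/1 returns true on success, false on failure,
  # and :ignored when the local node is not running distributed.
  def classify(true), do: {:ok, :connected}
  def classify(false), do: {:error, :unreachable}
  def classify(:ignored), do: {:error, :local_node_not_distributed}
end
```

Calling `ConnectCheckSketch.check/1` from an `iex` session started with `--name` or `--sname`, with no libcluster or swarm loaded, shows whether the target is reachable at all before any clustering library gets involved.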
https://github.com/beardedeagle/test_app ^ I am still, personally, duplicating the issue locally. Curiously a separate issue popped up using the Dockerfile. libcluster just plain fails to connect:
|
This is mostly an undocumented breaking change in regards to the new auto_connect behavior: |
Hey guys, from the message above from @beardedeagle I can see that the name is not properly configured; maybe that is the problem? =====
I assume that some strategies can try to connect to themselves. In the DNSPoll strategy, |
I dunno... but I'm curious... what's the actual use case for connecting to yourself via node_connect/1? |
Ah, good catch on the self-connect thing. @starbelly this is usually unintentional, but results from sharing a config with a list of nodes which all need to connect to each other; if you don't explicitly remove |
I've pushed a change which will prevent libcluster from ever connecting/disconnecting the current node, so hopefully that addresses the exception being raised in OTP 21. There is still the issue of Swarm needing an update to allow starting in your own supervision tree the same way libcluster does, but that's more an issue for Swarm than libcluster. @beardedeagle I'll hold off on closing until you've had a chance to test |
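To illustrate the idea of never connecting to the current node, here is a minimal sketch. It is not the actual libcluster change; the module and function names are hypothetical, and it only assumes a shared list of node names that may include the local node.

```elixir
defmodule SelfFilterSketch do
  # Hypothetical helper, not libcluster's actual implementation.

  # Drop the current node from a shared node list so we never try to
  # connect to (or disconnect from) ourselves.
  def without_self(nodes, current \\ Node.self()) do
    Enum.reject(nodes, &(&1 == current))
  end

  # Attempt to connect to every remaining node, returning {node, result}
  # pairs; Node.connect/1 returns true, false, or :ignored.
  def connect_all(nodes) do
    nodes
    |> without_self()
    |> Enum.map(fn node -> {node, Node.connect(node)} end)
  end
end
```

With a filter like this, every node can share the same configured node list, because the local entry is skipped before any connect attempt is made.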
I'll pull it here shortly and test @bitwalker |
Output from original app experiencing issue:
Looks good to me. I'd say it's safe to close this issue @bitwalker and I'll get back to testing it some more. |
I got the same error when I had a node with a host name that is not routable, that is, something like helper@192.168.45.169 where 192.168.45.169 is an IP that cannot be pinged. When I changed the host to a correct IP, I stopped having the error. |
But when giving an arbitrary name to a node, like helper@machinename, everything works fine. |
I guess I am getting the same error here.
These logs are from the 10.42.5.58 node, so libcluster is trying to connect to self(). Update: this was my fault, the |
Hi there, I'm seeing a similar error in DataDog logs. Everything works as expected and the nodes are connected, but it logs the following message. I also noted that it is trying to connect to itself, because the log comes from the same node. Elixir: 1.11.3
config :libcluster,
  topologies: [
    k8s: [
      strategy: Cluster.Strategy.Kubernetes,
      config: [
        mode: :ip,
        # elixir node name
        kubernetes_node_basename: "cluster",
        # k8s label
        kubernetes_selector: "app=cluster",
        polling_interval: 10_000,
        # which API to query to get the running pods
        kubernetes_ip_lookup_mode: :pods,
        kubernetes_service_name: "cluster"
      ]
    ]
  ]
|
@sescobb27 can you tell me how you resolved your issue? |
Creating issue for tracking.
Given OTP 21.0, Elixir 1.6.6 and libcluster 3.0.1:
Given OTP 20.3.8, Elixir 1.6.6 and libcluster 3.0.1: