Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

null service address on initial discovery #17

Closed
zambien opened this issue Mar 24, 2017 · 3 comments
Closed

null service address on initial discovery #17

zambien opened this issue Mar 24, 2017 · 3 comments
Labels

Comments

@zambien
Copy link

zambien commented Mar 24, 2017

Hello,

At discovery time the healthy nodes are found however the host (service.getServiceAddress()) is missing. This seems to be causing the registration to fail.

I am running the master branch of this repo and Consul v0.7.5.

Problem area seems to be here. Note that port 5701 shows up but there is no host which results in a null a little later:

Mar 24, 2017 4:25:55 AM com.hazelcast.spi.discovery.integration.DiscoveryService
INFO: [10.250.130.160]:5701 [dev] [3.6.7] Discovered healthy node: :5701
Mar 24, 2017 4:25:55 AM com.hazelcast.spi.discovery.integration.DiscoveryService
INFO: [10.250.130.160]:5701 [dev] [3.6.7] Discovered healthy node: :5701

...

WARNING: [10.250.130.160]:5701 [dev] [3.6.7] Wrong bind request from Address[10.250.130.160]:5701! This node is not requested endpoint: Address[null]:5701
Mar 24, 2017 4:25:55 AM com.hazelcast.nio.tcp.TcpIpConnection

Full log output

Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=256m; support was removed in 8.0
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512m; support was removed in 8.0
Mar 24, 2017 4:30:08 AM com.hazelcast.config.XmlConfigLocator
INFO: Loading configuration /opt/idam/orchis-sessionstore/bin/hazelcast.xml from System property 'hazelcast.config'
Mar 24, 2017 4:30:08 AM com.hazelcast.config.XmlConfigLocator
INFO: Using configuration file at /opt/idam/orchis-sessionstore/bin/hazelcast.xml
Mar 24, 2017 4:30:09 AM com.hazelcast.instance.DefaultAddressPicker
INFO: [LOCAL] [dev] [3.6.7] Prefer IPv4 stack is true.
Mar 24, 2017 4:30:10 AM com.hazelcast.instance.DefaultAddressPicker
INFO: [LOCAL] [dev] [3.6.7] Picked Address[10.250.130.160]:5701, using socket ServerSocket[addr=/0:0:0:0:0:0:0:0,localport=5701], bind any local is true
Mar 24, 2017 4:30:10 AM com.hazelcast.system
INFO: [10.250.130.160]:5701 [dev] [3.6.7] Hazelcast 3.6.7 (20161208 - 3eec5a1) starting at Address[10.250.130.160]:5701
Mar 24, 2017 4:30:10 AM com.hazelcast.system
INFO: [10.250.130.160]:5701 [dev] [3.6.7] Copyright (c) 2008-2016, Hazelcast, Inc. All Rights Reserved.
Mar 24, 2017 4:30:10 AM com.hazelcast.system
INFO: [10.250.130.160]:5701 [dev] [3.6.7] Configured Hazelcast Serialization version : 1
Mar 24, 2017 4:30:10 AM com.hazelcast.spi.OperationService
INFO: [10.250.130.160]:5701 [dev] [3.6.7] Backpressure is disabled
Mar 24, 2017 4:30:10 AM com.hazelcast.spi.impl.operationexecutor.classic.ClassicOperationExecutor
INFO: [10.250.130.160]:5701 [dev] [3.6.7] Starting with 2 generic operation threads and 2 partition operation threads.
Mar 24, 2017 4:30:11 AM com.hazelcast.spi.discovery.integration.DiscoveryService
INFO: [10.250.130.160]:5701 [dev] [3.6.7] Using ConsulRegistrator: org.bitsofinfo.hazelcast.discovery.consul.DoNothingRegistrator
Mar 24, 2017 4:30:13 AM com.hazelcast.spi.discovery.integration.DiscoveryService
INFO: [10.250.130.160]:5701 [dev] [3.6.7] Registered our service instance w/ Consul OK.. delaying Hazelcast discovery, sleeping: 10000ms
Mar 24, 2017 4:30:23 AM com.hazelcast.instance.Node
INFO: [10.250.130.160]:5701 [dev] [3.6.7] Activating Discovery SPI Joiner
Mar 24, 2017 4:30:23 AM com.hazelcast.core.LifecycleService
INFO: [10.250.130.160]:5701 [dev] [3.6.7] Address[10.250.130.160]:5701 is STARTING
Mar 24, 2017 4:30:23 AM com.hazelcast.nio.tcp.nonblocking.NonBlockingIOThreadingModel
INFO: [10.250.130.160]:5701 [dev] [3.6.7] TcpIpConnectionManager configured with Non Blocking IO-threading model: 3 input threads and 3 output threads
Mar 24, 2017 4:30:24 AM com.hazelcast.spi.discovery.integration.DiscoveryService
INFO: [10.250.130.160]:5701 [dev] [3.6.7] Discovered healthy node: :5701
Mar 24, 2017 4:30:24 AM com.hazelcast.spi.discovery.integration.DiscoveryService
INFO: [10.250.130.160]:5701 [dev] [3.6.7] Discovered healthy node: :5701
Mar 24, 2017 4:30:24 AM com.hazelcast.nio.tcp.InitConnectionTask
INFO: [10.250.130.160]:5701 [dev] [3.6.7] Connecting to localhost/127.0.0.1:5701, timeout: 0, bind-any: true
Mar 24, 2017 4:30:24 AM com.hazelcast.nio.tcp.SocketAcceptorThread
INFO: [10.250.130.160]:5701 [dev] [3.6.7] Accepting socket connection from /127.0.0.1:51051
Mar 24, 2017 4:30:24 AM com.hazelcast.nio.tcp.TcpIpConnectionManager
INFO: [10.250.130.160]:5701 [dev] [3.6.7] Established socket connection between /127.0.0.1:51051 and localhost/127.0.0.1:5701
Mar 24, 2017 4:30:24 AM com.hazelcast.nio.tcp.TcpIpConnectionManager
INFO: [10.250.130.160]:5701 [dev] [3.6.7] Established socket connection between /127.0.0.1:5701 and /127.0.0.1:51051
Mar 24, 2017 4:30:24 AM com.hazelcast.nio.tcp.TcpIpConnectionManager
WARNING: [10.250.130.160]:5701 [dev] [3.6.7] Wrong bind request from Address[10.250.130.160]:5701! This node is not requested endpoint: Address[null]:5701
Mar 24, 2017 4:30:24 AM com.hazelcast.nio.tcp.TcpIpConnection
NFO: [10.250.130.160]:5701 [dev] [3.6.7] Connection [/127.0.0.1:51051] lost. Reason: Socket explicitly closed
Mar 24, 2017 4:30:24 AM com.hazelcast.nio.tcp.nonblocking.NonBlockingSocketWriter
WARNING: [10.250.130.160]:5701 [dev] [3.6.7] hz._hzInstance_1_dev.IO.thread-out-1 Closing socket to endpoint Address[]:5701, Cause:java.nio.channels.CancelledKeyException
java.nio.channels.CancelledKeyException
	at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
	at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77)
	at com.hazelcast.nio.tcp.nonblocking.AbstractHandler.unregisterOp(AbstractHandler.java:74)
	at com.hazelcast.nio.tcp.nonblocking.NonBlockingSocketWriter.unschedule(NonBlockingSocketWriter.java:295)
	at com.hazelcast.nio.tcp.nonblocking.NonBlockingSocketWriter.handle(NonBlockingSocketWriter.java:344)
	at com.hazelcast.nio.tcp.nonblocking.NonBlockingSocketWriter.run(NonBlockingSocketWriter.java:423)
	at com.hazelcast.nio.tcp.nonblocking.NonBlockingIOThread.executeTask(NonBlockingIOThread.java:293)
	at com.hazelcast.nio.tcp.nonblocking.NonBlockingIOThread.processTaskQueue(NonBlockingIOThread.java:286)
	at com.hazelcast.nio.tcp.nonblocking.NonBlockingIOThread.selectLoop(NonBlockingIOThread.java:230)
	at com.hazelcast.nio.tcp.nonblocking.NonBlockingIOThread.run(NonBlockingIOThread.java:187)

Mar 24, 2017 4:30:24 AM com.hazelcast.nio.tcp.TcpIpConnection
INFO: [10.250.130.160]:5701 [dev] [3.6.7] Connection [Address[]:5701] lost. Reason: java.io.EOFException[Remote socket closed!]
Mar 24, 2017 4:30:24 AM com.hazelcast.nio.tcp.nonblocking.NonBlockingSocketReader
WARNING: [10.250.130.160]:5701 [dev] [3.6.7] hz._hzInstance_1_dev.IO.thread-in-1 Closing socket to endpoint Address[]:5701, Cause:java.io.EOFException: Remote socket closed!

Any guess as to what is going on or what I should do to debug further?

Thanks!

@bitsofinfo
Copy link
Owner

Check these threads about the config options for public address etc and interfaces

hazelcast/hazelcast#9219
hazelcast/hazelcast#10132
hazelcast/hazelcast#4537

@zambien
Copy link
Author

zambien commented Mar 24, 2017

Thanks, I read through those and some other threads. They don't seem to explain why the server would be coming back as null though. It seems that the consul discovery spi code is finding the nodes:

INFO: [10.250.130.160]:5701 [dev] [3.6.7] Discovered healthy node: :5701

After looking more closely at my consul output I can see that it ServiceAddress is blank.Here is the output of my curl command:

curl http://localhost:8500/v1/catalog/service/orchis-hazelcast-replication

[{
	"Node": "ip-10-250-130-160",
	"Address": "10.250.130.160",
	"TaggedAddresses": {
		"lan": "10.250.130.160",
		"wan": "10.250.130.160"
	},
	"NodeMeta": {},
	"ServiceID": "orchis-hazelcast-replication",
	"ServiceName": "orchis-hazelcast-replication",
	"ServiceTags": ["orchis-hazelcast-replication", "test1"],
	"ServiceAddress": "",
	"ServicePort": 5701,
	"ServiceEnableTagOverride": false,
	"CreateIndex": 25278,
	"ModifyIndex": 25278
}, {
	"Node": "ip-10-250-130-6",
	"Address": "10.250.130.6",
	"TaggedAddresses": {
		"lan": "10.250.130.6",
		"wan": "10.250.130.6"
	},
	"NodeMeta": {},
	"ServiceID": "orchis-hazelcast-replication",
	"ServiceName": "orchis-hazelcast-replication",
	"ServiceTags": ["orchis-hazelcast-replication", "test1"],
	"ServiceAddress": "",
	"ServicePort": 5701,
	"ServiceEnableTagOverride": false,
	"CreateIndex": 25275,
	"ModifyIndex": 25275
}]

According to the consul doc, if ServiceAddress is empty node address should be used:

https://www.consul.io/docs/agent/http/catalog.html#ServiceAddress

I can see where the ServiceAddress is being used so I think that an empty string check is needed here to react to Consul's behavior here:

for (CatalogService service : response.getResponse()) {
	toReturn.add(new SimpleDiscoveryNode(
		new Address(service.getServiceAddress(), service.getServicePort())));
	getLogger().info("Discovered healthy node: " + service.getServiceAddress()+":"+service.getServicePort());
}

If you agree I can work on a PR.

@zambien
Copy link
Author

zambien commented Mar 24, 2017

Please see the PR above for the fix...

INFO: [10.250.130.160]:5701 [dev] [3.6.7] Discovered healthy node: 10.250.130.160:5701
Mar 24, 2017 5:42:07 PM com.hazelcast.spi.discovery.integration.DiscoveryService
INFO: [10.250.130.160]:5701 [dev] [3.6.7] Discovered healthy node: 10.250.130.6:5701
Mar 24, 2017 5:42:12 PM com.hazelcast.cluster.ClusterService

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants