Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix network binding for ipv4/ipv6 #12942

Closed
wants to merge 9 commits into from
Closed

Conversation

rmuir
Copy link
Contributor

@rmuir rmuir commented Aug 17, 2015

Elasticsearch doesn't work well today with modern machines that might support both ipv4 and ipv6. The worst problem here that curl http://localhost... does not work e.g. on mac (#12906), because we aren't bound to any v6 address.

Equally bad, we only bind to loopback by default for 2.0, so if you want to connect to the network "for real" you might provide an interface name, but that always tends to do the wrong thing too (#12915), e.g. pick link local or some other useless address.

This is a compromise fix that tries to keep things simple:

  • we bind to multiple addresses when a specified host/interface name has multiple addresses. If you dont like this, then specify a single address.
  • we still only publish by default to one (this would require more work).
  • no changes for ipv6 multicast or anything like that yet.

The default for which address to publish when bound to multiple addresses is pretty simple, we prefer ipv4 by default (java.net.preferIPv4Stack) and I think we should until things like multicast are fixed to work correct over ipv6. Otherwise we prefer "real addresses" > site local > link local and so on.

Some things are still a bit messy, because real cleanups need to not be right before a beta release.

Closes #12906
Closes #12915

@clintongormley clintongormley added >bug :Distributed/Network Http and internode communication implementations labels Aug 17, 2015
@dakrone
Copy link
Member

dakrone commented Aug 17, 2015

So, an interesting byproduct of this is that if I have an ES node bind to the public interface with bin/elasticsearch -Des.network.host=_en0_ and then start up another with no options (so it binds to loopback interfaces only), the loopback one detects and joins a cluster with the public facing one.

I'm not sure if this is desired or not, I guess I would expect the loopback one only to be able to form clusters with other nodes bound to the loopback interface. What do you think?

@dakrone
Copy link
Member

dakrone commented Aug 17, 2015

Also, this is an OSX-only issue, Linux doesn't have this loopback <-> non-loopback communication weirdness.

@rmuir
Copy link
Contributor Author

rmuir commented Aug 17, 2015

That is not related to this change! That is #12914 and I deferred it because its Mac OS X specific, and because its not related to these changes.

@dakrone
Copy link
Member

dakrone commented Aug 17, 2015

That is not related to this change!

Okay, I totally agree about the deferring it. Just wanted to make sure it was something we are aware of (which of course there is the other issue open for it).

bindHost = settings.get(GLOBAL_NETWORK_BINDHOST_SETTING, settings.get(GLOBAL_NETWORK_HOST_SETTING));
}
if (bindHost == null) {
bindHost = defaultValue2;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you prefer the double-if? If not I guess you could do:

if (bindHost == null) {
  bindHost = settings.get(GLOBAL_NETWORK_BINDHOST_SETTING, settings.get(GLOBAL_NETWORK_HOST_SETTING, defaultValue2));
}
return resolveInetAddress(bindHost);

Instead

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have changes coming in a commit. Basically it was too hard for me to see the various fallbacks for the default case (null).

@dakrone
Copy link
Member

dakrone commented Aug 17, 2015

Left a few comments but generally LGTM. I tested this locally on OSX and Linux. Would be great to have someone manually test on Windows as well just for a sanity check.

@@ -50,16 +50,16 @@ h3. Indexing
Let's try and index some twitter like information. First, let's create a twitter user, and add some tweets (the @twitter@ index will be created automatically):

<pre>
curl -XPUT 'http://127.0.0.1:9200/twitter/user/kimchy' -d '{ "name" : "Shay Banon" }'
curl -XPUT 'http://localhost:9200/twitter/user/kimchy' -d '{ "name" : "Shay Banon" }'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

who is this shay banon guy after all?


InetSocketAddress boundAddress = (InetSocketAddress) serverChannels.get(0).getLocalAddress();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I first looked at this I wondered if this is prone to concurrent modification exception? I wondered if we should make the serverChannels final and use a CopyOnWriteArrayList instead. Reading without sync would be safe. But then looking at the entire file I think we should just make the list final and clear it before we close each channel? the volatile confuses me and I think it's not needd

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the concurrency in these methods is super confusing. Its doing really slow stuff like binding to ports, which should not happen all the time. I am happy to fix the concurrency, it will be to make everything synchronized: this is all nuts.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree - we can do this in a follow up!

@s1monw
Copy link
Contributor

s1monw commented Aug 17, 2015

Left mostly minor comments! This looks great to me! I will run tests with node.mode=network quickly as well on that.

@@ -146,7 +146,7 @@
// node id to actual channel
protected final ConcurrentMap<DiscoveryNode, NodeChannels> connectedNodes = newConcurrentMap();
protected final Map<String, ServerBootstrap> serverBootstraps = newConcurrentMap();
protected final Map<String, Channel> serverChannels = newConcurrentMap();
protected final Map<String, List<Channel>> serverChannels = newConcurrentMap();
protected final Map<String, BoundTransportAddress> profileBoundAddresses = newConcurrentMap();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this must be a ConcurrentMap you use putIfAbsent

@s1monw s1monw added the blocker label Aug 17, 2015
@s1monw
Copy link
Contributor

s1monw commented Aug 17, 2015

I ran tests with -Des.node.mode=network and they pass +1 to push here is my LGTM

thanks rob!

@clintongormley
Copy link

Merged in b3a9abb and 68307aa

@rmuir rmuir mentioned this pull request Aug 21, 2015
@jpountz jpountz removed the v2.1.0 label Aug 21, 2015
jaymode added a commit to jaymode/elasticsearch that referenced this pull request Sep 23, 2015
In elastic#12942, the NettyTransport and NettyHttpServerTransport were updated to allow for binding
to multiple addresses. However, the BoundTransportAddress holder only exposed the first address
that the transport was bound to and this object is used to populate the values returned to the user
via our APIs.

This change exposes all of the bound addresses in the BoundTransportAddress holder, which allows
for an accurate representation of all interfaces that elasticsearch is bound to and listening on.
jaymode added a commit that referenced this pull request Sep 23, 2015
In #12942, the NettyTransport and NettyHttpServerTransport were updated to allow for binding
to multiple addresses. However, the BoundTransportAddress holder only exposed the first address
that the transport was bound to and this object is used to populate the values returned to the user
via our APIs.

This change exposes all of the bound addresses in the BoundTransportAddress holder, which allows
for an accurate representation of all interfaces that elasticsearch is bound to and listening on.
jaymode added a commit that referenced this pull request Sep 23, 2015
In #12942, the NettyTransport and NettyHttpServerTransport were updated to allow for binding
to multiple addresses. However, the BoundTransportAddress holder only exposed the first address
that the transport was bound to and this object is used to populate the values returned to the user
via our APIs.

This change exposes all of the bound addresses in the BoundTransportAddress holder, which allows
for an accurate representation of all interfaces that elasticsearch is bound to and listening on.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
blocker >bug :Distributed/Network Http and internode communication implementations v2.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

-Des.network.host=_eth0_ binds to ipv6 link-local address Ensure localhost works on IPv4 and IPv6
5 participants