Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Default to unicast discovery, with default host list of 127.0.0.1, [::1] #12999

Closed
wants to merge 29 commits into from

Conversation

rmuir
Copy link
Contributor

@rmuir rmuir commented Aug 19, 2015

Multicast discovery has serious issues such as #12993
It is never going to work correctly and securely with a "default to localhost" scenario. It also has numerous issues with operating system and JDK support.

Instead we should encourage unicast discovery by default, since thats what we encourage for production, and configure it to look at "localhost" by default which will give the same experience for startup/development.

This PR changes the default unicast host list (if not specified) to be "127.0.0.1, [::1]", enables unicast by default, disables multicast by default, fixes the parsing of unicast hosts to be strict, fixes it to allow IPV6 addresses unambiguously, and fixes it to work with hostnames that resolve to multiple addresses.

Things that should be deferred to followups (trust me, this is a huge change on its own, and i know a bunch of tests are angry, which i need help with, but the change works):

  • Move multicast zen discovery out to a plugin, possibly deprecate it.
  • Review all uses + ban InetSocketAddress(java.lang.String, int), which is lenient and bad to use. I did some cleanups here already.

If anyone wants to help, let me know I will give commit access to my branch here.

@@ -64,7 +64,7 @@

public static final String ACTION_NAME = "internal:discovery/zen/unicast";

public static final int LIMIT_PORTS_COUNT = 1;
public static final int LIMIT_PORTS_COUNT = 10;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really a nocommit. We should only do 10 i think when its the localhost default. Just trying to start simple :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe when its localhost we dont need any limiting, not sure, we should look into it. At least any limiting should be "per-address", since a host here could resolve to multiple addresses (e.g. ipv4 and v6, among other possibilities).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool! If we merge it I think we can close #8833 ! It will also help in discovery plugins.

@s1monw
Copy link
Contributor

s1monw commented Aug 19, 2015

ok we are down to 1 awaits fix and all other tests pass - there is some cleanup work needed here for sure but this is a great start!!

rmuir and others added 10 commits August 19, 2015 20:59
We used static methods reading sys properties to define the node mode
per cluster. this had lots of problems when tests couldn't cope with
mixed or only local mode. Now we are passing it down to the cluster from the test
which allows to @SuppressNetworkMode / @SupressLocalMode on the test to force
consistent node configurations.
We only run multicast tests now when we explicitly state it. A working
multicast env is required which is not always the case.
@s1monw
Copy link
Contributor

s1monw commented Aug 20, 2015

I fixed all the remaining tests. Everything now runs on unicast and passes with network and local. We still have some forbiddenAPI:

[ERROR] Forbidden method invocation: java.net.InetSocketAddress#getHostName() [Use getHostString() instead, which avoids a DNS lookup]
[ERROR]   in org.elasticsearch.common.transport.InetSocketTransportAddress (InetSocketTransportAddress.java:96)
[ERROR] Forbidden method invocation: java.net.InetSocketAddress#getHostName() [Use getHostString() instead, which avoids a DNS lookup]
[ERROR]   in org.elasticsearch.common.transport.InetSocketTransportAddress (InetSocketTransportAddress.java:132)
[ERROR] Scanned 5737 (and 1524 related) class file(s) for forbidden API invocations (in 1.90s), 2 error(s).

but those are the last remaining issues. We can SuppressForbidden them if we wanna push the decision out to a different issue?

@@ -360,16 +360,16 @@ public int hashCode() {
public String toString() {
StringBuilder sb = new StringBuilder();
if (nodeName.length() > 0) {
sb.append('[').append(nodeName).append(']');
sb.append('{').append(nodeName).append('}');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this change? did we decide to change our usage of [] to {}?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because it is part of a bracketed ipv6 address specification.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for this change though RFCs are a good thing to follow here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, when an IPv6 address is combined with a port number, you really need them. Its needed to make things unambiguous, its needed to work inside a user's browser, its needed to conform with numerous RFCs on the display of this stuff. If you need more justification, see ones like https://tools.ietf.org/html/rfc5952

On the other hand, the brackets in ES logging provide NOTHING. Part of me wanted to remove them across all logging across the board.

@s1monw s1monw added blocker v2.0.0-beta1 :Distributed/Discovery-Plugins Anything related to our integration plugins with EC2, GCP and Azure labels Aug 20, 2015
@@ -64,7 +64,9 @@

public static final String ACTION_NAME = "internal:discovery/zen/unicast";

public static final int LIMIT_PORTS_COUNT = 1;
public static final int LIMIT_FOREIGN_PORTS_COUNT = 1;
public static final int LIMIT_LOCAL_PORTS_COUNT = 25;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a big deal, but 25 ports feels a bit high - we will try to connect to all these ports all the time, per local address. Maybe lower to 5? note that people would still be able to start more than 5 nodes (but if they kill the first 5, the cluster will not form again).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1. The most I have seen in dev/test was 4 nodes.
The most I have seen in PROD is 3 (2 data nodes + 1 master node on 128gb machines)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dadoonet this is only for default config without any configured unicast hosts. I don't think this is used in production. I can change it to 5 for now I don't have strong feelings

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have strong feelings. I dont want this lowered unless its per address.

Wake up guys, computers have more than one address these days.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm changing this to per-address. Otherwise someone is just going to follow up with some other PR and break dual-stack machines.

@s1monw
Copy link
Contributor

s1monw commented Aug 20, 2015

I am +1 on pushing this - it looks good and has alll the fixes needed. We can do all otehr things in folllowups

@rmuir
Copy link
Contributor Author

rmuir commented Aug 20, 2015

Summarizing the current tradeoffs (versus multicast) so we are all aware:

  • out of box bin/elasticsearch will discover nodes on the local machine by default just like multicast did.
  • adding "localhost" as a default unicast discovery host has some advantages over multicast: for example it works with ipv6-only configurations.
  • however, unlike multicast, as soon as you start messing with configuration (changing port numbers, etc), you are going to need to deal with discovery. in most cases, multicast would continue to work all the way into production, regardless.

@jaymode
Copy link
Member

jaymode commented Aug 20, 2015

+1, this looks good to me

@rmuir rmuir closed this in e2ab625 Aug 20, 2015
rmuir added a commit that referenced this pull request Aug 20, 2015
Fix unicast discovery to work when a host has multiple addresses.
Ban dangerous methods in java.net with forbidden APIs.
Fix ipv6 bugs and formatting of network addresses everywhere.

Closes #12999
Closes #12993

Squashed commit of the following:

commit 6c1aa00
Author: Robert Muir <rmuir@apache.org>
Date:   Thu Aug 20 14:25:43 2015 -0400

    Fix these to be correct with addresses just in case

commit 6482156
Merge: d00561b 41d8fbe
Author: Robert Muir <rmuir@apache.org>
Date:   Thu Aug 20 13:23:09 2015 -0400

    Merge branch 'master' into unicast_all_the_way_down

commit d00561b
Author: Simon Willnauer <simonw@apache.org>
Date:   Thu Aug 20 16:38:50 2015 +0200

    limit local ports to 5 in UnicastZenPing

commit e2e15c5
Author: Robert Muir <rmuir@apache.org>
Date:   Thu Aug 20 10:32:47 2015 -0400

    fix port limiting

commit 10153cb
Author: Robert Muir <rmuir@apache.org>
Date:   Thu Aug 20 10:18:37 2015 -0400

    don't serialize scopeids: that's broken

commit 2aa63d4
Author: Simon Willnauer <simonw@apache.org>
Date:   Thu Aug 20 16:06:51 2015 +0200

    restore @Network

commit c840f1d
Author: Simon Willnauer <simonw@apache.org>
Date:   Thu Aug 20 16:02:30 2015 +0200

     Use NetworkAddress.formatAddress where applicable in plugins

commit 374ce87
Author: Simon Willnauer <simonw@apache.org>
Date:   Thu Aug 20 15:34:06 2015 +0200

    Use NetworkAddress.formatAddress where applicable

commit e7a606d
Author: Simon Willnauer <simonw@apache.org>
Date:   Thu Aug 20 10:17:57 2015 +0200

    Add @multicast annotation to disable multicast tests by default.

    We only run multicast tests now when we explicitly state it. A working
    multicast env is required which is not always the case.

commit 2d7d2d0
Author: Simon Willnauer <simonw@apache.org>
Date:   Thu Aug 20 09:51:28 2015 +0200

    Remove extra check for local mode in InternalTestCluster

commit dda59ac
Author: Simon Willnauer <simonw@apache.org>
Date:   Thu Aug 20 09:37:03 2015 +0200

    Handle node mode across entire test cluster

    We used static methods reading sys properties to define the node mode
    per cluster. this had lots of problems when tests couldn't cope with
    mixed or only local mode. Now we are passing it down to the cluster from the test
    which allows to @SuppressNetworkMode / @SupressLocalMode on the test to force
    consistent node configurations.

commit 058197b
Author: Robert Muir <rmuir@apache.org>
Date:   Thu Aug 20 03:19:14 2015 -0400

    really ban InetSocketAddress's trappy method and break build and go to sleep, sorry

commit ac87791
Author: Robert Muir <rmuir@apache.org>
Date:   Thu Aug 20 03:16:52 2015 -0400

    Ban methods that might surprisingly cause DNS lookups

commit e64fe3d
Author: Robert Muir <rmuir@apache.org>
Date:   Thu Aug 20 02:59:05 2015 -0400

    Add unit test

commit f15434f
Author: Robert Muir <rmuir@apache.org>
Date:   Thu Aug 20 02:39:02 2015 -0400

    fix ipv6 formatting bugs

commit 05c2c74
Author: Robert Muir <rmuir@apache.org>
Date:   Thu Aug 20 02:12:05 2015 -0400

    format addresses correctly so I can actually read what comes out of our logs and stats apis

commit 4f9389d
Author: Robert Muir <rmuir@apache.org>
Date:   Wed Aug 19 21:26:52 2015 -0400

    ban dangerous methods in java.net

commit 6aacd4d
Author: Robert Muir <rmuir@apache.org>
Date:   Wed Aug 19 20:59:24 2015 -0400

    ban lenient method

commit f466a84
Author: Simon Willnauer <simonw@apache.org>
Date:   Thu Aug 20 00:29:00 2015 +0200

    fix tests to not mix local transport and zen unicast disco

commit 0de007a
Author: Simon Willnauer <simonw@apache.org>
Date:   Thu Aug 20 00:10:07 2015 +0200

    fix tests to not mix local transport and zen unicast disco

commit 539f6ca
Author: Simon Willnauer <simonw@apache.org>
Date:   Thu Aug 20 00:02:01 2015 +0200

    fix tests to not mix local transport and zen unicast disco

commit 004c288
Author: Robert Muir <rmuir@apache.org>
Date:   Wed Aug 19 17:51:45 2015 -0400

    Fix multinode

commit 54113af
Author: Robert Muir <rmuir@apache.org>
Date:   Wed Aug 19 17:36:45 2015 -0400

    fix integration tests

commit 0156a77
Author: Simon Willnauer <simonw@apache.org>
Date:   Wed Aug 19 23:32:18 2015 +0200

    enable multicast in MulticastZenPingIT.java

commit 1791caa
Author: Robert Muir <rmuir@apache.org>
Date:   Wed Aug 19 17:23:16 2015 -0400

    Fix constant

commit 22820b5
Author: Simon Willnauer <simonw@apache.org>
Date:   Wed Aug 19 22:59:09 2015 +0200

    give it some extra ids for local transport crazyness

commit b2138fa
Author: Simon Willnauer <simonw@apache.org>
Date:   Wed Aug 19 22:51:42 2015 +0200

    pass on local addresses from configured transport rather than hard code IP addresses

commit 1bf5de1
Author: Simon Willnauer <simonw@apache.org>
Date:   Wed Aug 19 22:04:31 2015 +0200

    fix PluggableTransportModuleIT.java to use local disco and detach port limit for node local disco

commit b6706ed
Author: Robert Muir <rmuir@apache.org>
Date:   Wed Aug 19 14:16:03 2015 -0400

    Default to unicast discovery, with default host list of 127.0.0.1, [::1]
rjernst added a commit to rjernst/elasticsearch that referenced this pull request Aug 20, 2015
Multicast has known issues (see elastic#12999 and elastic#12993). This change moves
multicast into a plugin, and deprecates it in the docs.  It also allows
for plugging in multiple zen ping implementations.

closes elastic#13019
This was referenced Aug 21, 2015
@brwe brwe removed the v2.0.0 label Aug 21, 2015
@s1monw s1monw deleted the unicast_all_the_way_down branch August 21, 2015 19:25
karmi added a commit to elastic/elasticsearch-ruby that referenced this pull request Sep 17, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
blocker >breaking :Distributed/Discovery-Plugins Anything related to our integration plugins with EC2, GCP and Azure v2.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants