
7.2.0 fails to start on docker-swarm #410

Closed
rong0312 opened this issue Jul 1, 2019 · 33 comments

@rong0312 rong0312 commented Jul 1, 2019

Problem description

Yesterday I started working on an ELK version upgrade (6.5.4 to 7.2.0).
Sadly, I came across a lot of problems.
I am deploying this solution on Docker Swarm and making the relevant changes
(e.g. discovery.zen.ping.unicast.hosts: tasks.elasticsearch, discovery.type: zen).

I have noticed that some mandatory settings, such as cluster.initial_master_nodes, were added in version 7, which makes it hard for me to work in Swarm mode.
I discovered this when I received the following message:

"master not discovered yet, this node has not previously joined a bootstrapped (v7+) cluster, and [cluster.initial_master_nodes] is empty on this node: have discovered []... "

This results in 'MasterNotDiscoveredException: null', which then led me to the Bootstrapping a cluster article, where this error is mentioned.

I can't find a smart way to make my ELK cluster work in Swarm mode on version 7.2.0.
Did anyone make that upgrade and manage to stay alive?

@antoineco antoineco (Collaborator) commented Jul 1, 2019

I don't think this is easily doable if you use the default hostname, which is a random ID.
You could maybe try to use predictable hostnames as described here. The available values are documented here.

Example:

services:
  elasticsearch:
    environment:
      node.name: "{{.Task.Name}}"
      discovery.type: zen
      discovery.seed_hosts: tasks.elasticsearch
      cluster.initial_master_nodes: elk_elasticsearch.1,elk_elasticsearch.2,elk_elasticsearch.3

This works for me ☝️

This repository uses fixed hostnames as a workaround.

@rong0312 rong0312 (Author) commented Jul 1, 2019

Even with your snippet, I am getting the same error:

"message": "master not discovered yet, this node has not previously joined a bootstrapped (v7+) cluster, and this node must discover master-eligible nodes [elk_elasticsearch-0, elk_elasticsearch-1, elk_elasticsearch-2] to **bootstrap a cluster: have discovered []; discovery will continue using [10.0.39.6:9300, 10.0.39.7:9300] from hosts providers and** [{elk_elasticsearch.9x3j6damrcxfhm2om576vkp3f.xs60j3s5bo0bzq1hjroaib8nj}{13omqskbQv-cPpQluiSFhQ}{hMyXoE2zQS6T5hYn0lcewQ}{10.0.39.8}{10.0.39.8:9300}{ml.machine_memory=8371490816, xpack.installed=true, ml.max_open_jobs=20}] from last-known cluster state; node term 0, last-accepted version 0 in term 0" }

It seems like the nodes listed in cluster.initial_master_nodes aren't discovered in my case, and I don't know why.
Are you using Swarm mode as well?

@antoineco antoineco (Collaborator) commented Jul 1, 2019

Sorry, the correct format is elk_elasticsearch.1,elk_elasticsearch.2,elk_elasticsearch.3. I updated my message.

@antoineco antoineco (Collaborator) commented Jul 1, 2019

Actually I'm also having issues with the names. It seems like Swarm adds another random id to the task name (e.g. "elk_elasticsearch.1.p8d7aufb80h8f8dwtfcmsyzy4" instead of "elk_elasticsearch.1"):

elk_elasticsearch.1.p8d7aufb80h8@docker-desktop    | {"type": "server", "timestamp": "2019-07-01T12:16:51,541+0000", "level": "WARN", "component": "o.e.c.c.ClusterFormationFailureHelper", "cluster.name": "docker-cluster", "node.name": "elk_elasticsearch.1.p8d7aufb80h8f8dwtfcmsyzy4",  "message": "master not discovered yet, this node has not previously joined a bootstrapped (v7+) cluster, and this node must discover master-eligible nodes [elk_elasticsearch.1, elk_elasticsearch.2, elk_elasticsearch.3] to bootstrap a cluster: have discovered [{elk_elasticsearch.2.ofkoacc3wg7a94ahh5nkcma0x}{UfShM0o_TUK28lBtE1Bt-g}{ytSjfbozRKqdSUXNuFdb-Q}{10.0.2.22}{10.0.2.22:9300}{ml.machine_memory=6247051264, ml.max_open_jobs=20, xpack.installed=true}]; discovery will continue using [10.0.2.22:9300] from hosts providers and [{elk_elasticsearch.1.p8d7aufb80h8f8dwtfcmsyzy4}{H2PJbPUDTX2dcUQ2PoZJLg}{e-RvOi-gRuOKT8GNpWppnw}{10.0.2.23}{10.0.2.23:9300}{ml.machine_memory=6247051264, xpack.installed=true, ml.max_open_jobs=20}] from last-known cluster state; node term 0, last-accepted version 0 in term 0"  }

I have no idea how to deal with these :(

@rong0312 rong0312 (Author) commented Jul 1, 2019

Yeah, I was about to write that :(
First of all, thank you for your efforts.
Secondly, do you think this is going to be a problem for this project from now on? (Swarm mode)

@antoineco antoineco (Collaborator) commented Jul 1, 2019

It's definitely going to become a problem for scaling, unless there is another way to assign predictable names to Swarm tasks.

There is a long discussion about the DNS name at docker/swarmkit#1242, but it doesn't seem to be getting a lot of attention.

@rong0312 rong0312 (Author) commented Jul 1, 2019

OK, so if I understand you correctly, the alternative is to split my global-mode 'elasticsearch' service into 3 separate elasticsearch services.
If you don't mind, I will keep this issue open, in case of future fixes.

@antoineco antoineco (Collaborator) commented Jul 1, 2019

I think that's the only alternative for this use case, yes, sadly. A quick Google search returns the exact same recommendation (example).

We can keep this open for some time but there is currently nothing we can do to fix Swarm's behaviour on our side. A better alternative is probably to use Nomad or Kubernetes.

@saifat29 saifat29 commented Jul 30, 2019

The following docker-compose.yml file worked perfectly for me in both deploy mode: replicated and deploy mode: global for elasticsearch:7.2.0 on Docker Swarm:

version: '3.2'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.2.0
    restart: always
    environment:
      - node.name={{.Node.Hostname}}
      - discovery.seed_hosts=elasticsearch
      - cluster.name=docker-cluster
      - cluster.initial_master_nodes=node1
      - network.host=0.0.0.0
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    deploy:
      mode: global
    ports:
      - "9200:9200"
      - "9300:9300"
    volumes:
      - es_data:/usr/share/elasticsearch/data

volumes:
  es_data:


@antoineco antoineco (Collaborator) commented Jul 30, 2019

Setting node.name to the host name is not a bad idea, but it will only work if you can be 100% sure there is no more than one ES instance per Swarm node.

Regarding discovery.seed_hosts=elasticsearch: I had no success with this, because as far as I understood, the option requires a comma-delimited list of ES node names. Did I get it wrong?

Regarding cluster.initial_master_nodes, how can you guarantee that you will have an instance created on "node1" if you have a larger cluster?

@saifat29 saifat29 commented Jul 31, 2019

Setting node.name to the host name works for me because I run only a single ES instance per Swarm node. It obviously won't work with multiple ES instances on a single node.

The ES documentation says about discovery.seed_hosts:
When you want to form a cluster with nodes on other hosts, you must use the discovery.seed_hosts setting to provide a list of other nodes in the cluster that are master-eligible and likely to be live and contactable in order to seed the discovery process. This setting should normally contain the addresses of all the master-eligible nodes in the cluster. This setting contains either an array of hosts or a comma-delimited string.
Source - https://www.elastic.co/guide/en/elasticsearch/reference/current/discovery-settings.html

discovery.seed_hosts=elasticsearch works in Swarm mode, probably because Docker's internal DNS resolves the service name elasticsearch to the other ES instances running in Swarm.
However, you are right: the documentation specifically says that it should be a list of master-eligible nodes, specified using either an array or a comma-delimited string.

About cluster.initial_master_nodes: there is no guarantee that a specific instance will be created on "node1", but this works for me as I have a very small cluster of only 2-3 nodes.
I start my cluster with just one instance on the master Swarm node, and only after it is created do I join other Swarm nodes to the cluster and scale accordingly. This ensures that "node1" is created on the master Swarm node.

This method is obviously not for large clusters, but it is fine for small use cases until a more elegant solution is found.

@rong0312 rong0312 (Author) commented Jul 31, 2019

@saifat29
discovery.seed_hosts=elasticsearch definitely gets resolved, otherwise how could we explain this behavior? ;)
I will wait for an elegant solution, as you mentioned, or swap to k8s if that's not possible.

@deviantony deviantony (Owner) commented Jul 31, 2019

As @antoineco stated above, I think it makes more sense to use tasks.elasticsearch instead of elasticsearch in the discovery.seed_hosts property. Even better: issue a DNS lookup on tasks.elasticsearch and use the result as the property value.

By using the service name elasticsearch, it will use the virtual IP associated with the service, returning the IP of one of the service's tasks (load balanced).

See the Container discovery section in https://docs.docker.com/network/overlay/
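The pre-resolution idea above can be sketched with standard tools. This is a hypothetical helper, not part of the repository; inside a Swarm task you would call it with tasks.elasticsearch (that name only resolves on the overlay network), while localhost is used below purely so the example runs anywhere.

```shell
# Hypothetical helper: resolve a DNS name to a comma-delimited list of IPv4
# addresses, as one might do to pre-compute a discovery.seed_hosts value.
# In a real Swarm task, the argument would be "tasks.elasticsearch".
resolve_seed_hosts() {
  getent ahostsv4 "$1" | awk '{print $1}' | sort -u | paste -sd, -
}

resolve_seed_hosts localhost   # → 127.0.0.1
```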

@saifat29 saifat29 commented Aug 1, 2019

I just found a very weird result.
I used the following docker-compose.yml file, which is based on using
{{.Task.Name}} as node.name,
tasks.elasticsearch as discovery.seed_hosts, and
es_stack_elasticsearch.1, es_stack_elasticsearch.2, ... as cluster.initial_master_nodes:

version: '3.2'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.2.0
    restart: always
    environment:
      - node.name={{.Task.Name}}
      - discovery.seed_hosts=tasks.elasticsearch
      - cluster.name=docker-cluster
      - cluster.initial_master_nodes=es_stack_elasticsearch.1,es_stack_elasticsearch.2
      - network.host=0.0.0.0
      - bootstrap.memory_lock=false
      - "ES_JAVA_OPTS=-Xms3g -Xmx3g"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    deploy:
      mode: replicated
      replicas: 1
    ports:
      - "9200:9200"
      - "9300:9300"
    volumes:
      - es_data:/usr/share/elasticsearch/data

volumes:
  es_data:

Deployed the stack using the following commands-

docker stack deploy -c docker-compose-1.yml es_stack
docker service scale es_stack_elasticsearch=2

(Notice that the stack name es_stack matches the prefix used in cluster.initial_master_nodes)

I ran a two-node Swarm cluster on https://labs.play-with-docker.com
and it worked perfectly, without any catch.
Here is the relevant output of the http://x.x.x.x:9200/_nodes URL:

{
	"_nodes": {
		"total": 2,
		"successful": 2,
		"failed": 0
	},
	"cluster_name": "docker-cluster",
	"nodes": {
		"d2T9BoM7TLuj7I18ZgNBaw": {
			"name": "es_stack_elasticsearch.1.3ogomu8v0koz6otr1h0exsk2x",
			"transport_address": "10.0.3.3:9300",
			"host": "10.0.3.3",
			"ip": "10.0.3.3",
			"version": "7.2.0"
		},
		"4zsXxpPdSEaAc5gYYPBBjA": {
			"name": "es_stack_elasticsearch.2.lvqntcfs3saqw5f7n11ghcod7",
			"transport_address": "10.0.3.5:9300",
			"host": "10.0.3.5",
			"ip": "10.0.3.5",
			"version": "7.2.0"
		}
	}
}

I tried the exact same thing on a private VPS using the exact same docker version but surprisingly it failed with the same error as @antoineco (#410 (comment)).

I do not understand how exactly this is possible.
You can try for yourself on https://labs.play-with-docker.com

@antoineco antoineco (Collaborator) commented Aug 1, 2019

And the cluster formed even though your nodes are named something like es_stack_elasticsearch.2.lvqntcfs3saqw5f7n11ghcod7 instead of the expected es_stack_elasticsearch.2? That's awesome and puzzling at the same time 🤔

@ranjithvaddepally ranjithvaddepally commented Sep 14, 2019

Has anyone solved this? I'm facing the same issue with versions > 7.0.

@antoineco antoineco (Collaborator) commented Sep 14, 2019

@ranjithvaddepally it works if you create multiple elasticsearch services inside your stack file (es1, es2 and es3 for example), and set the discovery.seed_hosts and cluster.initial_master_nodes properties to use those names.

There hasn't been any change on the Swarm side, so it is currently the only option I know about.

@ranjithvaddepally ranjithvaddepally commented Sep 18, 2019

@antoineco thank you for the reply. We noticed this behaviour with version 7.0; scaling would not be easy, so we decided not to upgrade until there is a solution for this.

@antoineco antoineco (Collaborator) commented Nov 7, 2019

A possible solution to this issue, which I haven't tested yet:

services:
  elasticsearch:
    environment:
      node.name: "elk_elasticsearch.{{.Task.Slot}}"
      discovery.type: zen
      discovery.seed_hosts: tasks.elasticsearch
      cluster.initial_master_nodes: elk_elasticsearch.1,elk_elasticsearch.2,elk_elasticsearch.3

{{.Task.Slot}} supposedly contains the index part of {{.Task.Name}}, e.g. 2 when Task.Name == elk_elasticsearch.2.p8d7aufb80h.
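The relationship between the two template placeholders can be sketched with plain shell (the task name below is made up for illustration):

```shell
# A Swarm task name has the shape "<stack>_<service>.<slot>.<task-id>";
# {{.Task.Slot}} is the middle component, so a predictable node name can
# be rebuilt from it. The sample task name is fabricated.
task_name="elk_elasticsearch.2.p8d7aufb80h"

slot=${task_name#*.}    # strip "<stack>_<service>." prefix -> "2.p8d7aufb80h"
slot=${slot%%.*}        # strip ".<task-id>" suffix         -> "2"

node_name="elk_elasticsearch.${slot}"
echo "$node_name"       # → elk_elasticsearch.2
```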

I'll try to find time to validate this today or tomorrow.


edit: It works! 🎉
cc @rong0312 @saifat29 @ranjithvaddepally

elk_elasticsearch.1.qrjnhmnk0qu4@manny    | {
   "type":"server",
   "timestamp":"2019-11-07T21:08:11,271Z",
   "level":"INFO",
   "component":"o.e.c.s.MasterService",
   "cluster.name":"docker-cluster",
   "node.name":"elk_elasticsearch.1",
   "message":"elected-as-master ([2] nodes joined)[{elk_elasticsearch.1}{_PQh5XQuTW6CPgsItFXyow}{NbiZHwd4TFOmWBZ_K5HqCA}{10.0.0.19}{10.0.0.19:9300}{dilm}{ml.machine_memory=15347986432, xpack.installed=true, ml.max_open_jobs=20} elect leader, {elk_elasticsearch.2}{ngjSTrJ6RyaIvfjntl1VTg}{A91_kAAsRUCHQqmU6RJavQ}{10.0.0.17}{10.0.0.17:9300}{dilm}{ml.machine_memory=15347986432, ml.max_open_jobs=20, xpack.installed=true} elect leader, _BECOME_MASTER_TASK_, _FINISH_ELECTION_], term: 2, version: 1, reason: master node changed {previous [], current [{elk_elasticsearch.1}{_PQh5XQuTW6CPgsItFXyow}{NbiZHwd4TFOmWBZ_K5HqCA}{10.0.0.19}{10.0.0.19:9300}{dilm}{ml.machine_memory=15347986432, xpack.installed=true, ml.max_open_jobs=20}]}, added {{elk_elasticsearch.2}{ngjSTrJ6RyaIvfjntl1VTg}{A91_kAAsRUCHQqmU6RJavQ}{10.0.0.17}{10.0.0.17:9300}{dilm}{ml.machine_memory=15347986432, ml.max_open_jobs=20, xpack.installed=true},}"
}

[Screenshot: Stack Monitoring - docker-cluster - Elasticsearch - Nodes]

@antoineco antoineco (Collaborator) commented Nov 7, 2019

Wiki updated. Closing.

Thanks everyone for the help and feedback!

@antoineco antoineco closed this Nov 7, 2019
@eoli3n eoli3n commented Dec 5, 2019

The wiki doesn't set discovery.type: zen, and I'm facing the same issue with 7.4.2:

  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.4.1
    ports:
      - "9200:9200"
      - "9300:9300"
    configs:
      - source: elastic_config
        target: /usr/share/elasticsearch/config/elasticsearch.yml
    environment:
      # https://github.com/deviantony/docker-elk/wiki/Elasticsearch-cluster#swarm-mode
      node.name: elk_elasticsearch.{{.Task.Slot}}
      discovery.type: zen
      discovery.seed_hosts: tasks.elasticsearch
      cluster.initial_master_nodes: elk_elasticsearch.1,elk_elasticsearch.2,elk_elasticsearch.3
      node.max_local_storage_nodes: '9'
      ES_JAVA_OPTS: "-Xmx1G -Xms1G"
      ELASTIC_PASSWORD: changeme
    networks:
      - elk
    deploy:
      mode: replicated
      replicas: 3

I tried both with and without the zen discovery.type.
I get:

elk_elasticsearch.3.5yxdtwrczs91@tspeda-swarm-worker1    | {"type": "server", "timestamp": "2019-12-05T13:31:46,769Z", "level": "WARN", "component": "o.e.c.c.ClusterFormationFailureHelper", "cluster.name": "docker-cluster", "node.name": "elk_elasticsearch.3", "message": "master not discovered yet, this node has not previously joined a bootstrapped (v7+) cluster, and this node must discover master-eligible nodes [elk_elasticsearch.1, elk_elasticsearch.2, elk_elasticsearch.3] to bootstrap a cluster: have discovered [{elk_elasticsearch.3}{t86tzFSnRt-HRCNy8M23eA}{pBC5kfN1SYiCjwPB4m9oLA}{10.0.0.240}{10.0.0.240:9300}{dilm}{ml.machine_memory=4092379136, xpack.installed=true, ml.max_open_jobs=20}]; discovery will continue using [10.0.21.6:9300, 10.0.21.7:9300, 10.0.21.8:9300] from hosts providers and [{elk_elasticsearch.3}{t86tzFSnRt-HRCNy8M23eA}{pBC5kfN1SYiCjwPB4m9oLA}{10.0.0.240}{10.0.0.240:9300}{dilm}{ml.machine_memory=4092379136, xpack.installed=true, ml.max_open_jobs=20}] from last-known cluster state; node term 0, last-accepted version 0 in term 0" }

@antoineco antoineco (Collaborator) commented Dec 5, 2019

@eoli3n There is no zen discovery anymore in ES. Setting discovery.type: zen still technically works, but the option is not supported, which is why it's not in the wiki.

The Elasticsearch config file contains discovery.type: single-node. If you omit the option in your environment, discovery.type falls back to the value from the config file. You either have to remove the option from the config file, or disable single-node discovery by setting discovery.type: '' (as mentioned in the wiki):

 # disable single-node discovery
 discovery.type: ''

@eoli3n eoli3n commented Dec 5, 2019

That's what I meant by "with and without the zen discovery.type".
I get the same message with discovery.type: ''.

@antoineco antoineco (Collaborator) commented Dec 5, 2019

You need to disable those explicit 9x00:9x00 port mappings if you run multiple instances on the same node.
https://github.com/deviantony/docker-elk/wiki/Elasticsearch-cluster#port-mapping
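
For reference, a sketch of what that change looks like in the stack file: dropping the host side of each mapping lets Swarm publish every task on a free port of its own, so several ES tasks can run on the same node without colliding.

```yaml
services:
  elasticsearch:
    ports:
      - "9200"   # Swarm picks a free published port (e.g. 30000)
      - "9300"   # same for the transport port
```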

@eoli3n eoli3n commented Dec 5, 2019

That didn't do the trick:

❯ docker service inspect --pretty elk_elasticsearch

ID:		m3qg9hjsgqjk5hm7fz5julxv5
Name:		elk_elasticsearch
Labels:
 com.docker.stack.image=docker.elastic.co/elasticsearch/elasticsearch:7.4.1
 com.docker.stack.namespace=elk
Service Mode:	Replicated
 Replicas:	3
Placement:
UpdateConfig:
 Parallelism:	1
 On failure:	pause
 Monitoring Period: 5s
 Max failure ratio: 0
 Update order:      stop-first
RollbackConfig:
 Parallelism:	1
 On failure:	pause
 Monitoring Period: 5s
 Max failure ratio: 0
 Rollback order:    stop-first
ContainerSpec:
 Image:		docker.elastic.co/elasticsearch/elasticsearch:7.4.1@sha256:88c2ee30115f378b8f7e66662ec26bca0c8778c69096bee6b161128ce833585f
 Env:		ELASTIC_PASSWORD=changeme ES_JAVA_OPTS=-Xmx1G -Xms1G cluster.initial_master_nodes=elk_elasticsearch.1,elk_elasticsearch.2,elk_elasticsearch.3 discovery.seed_hosts=tasks.elasticsearch discovery.type= node.max_local_storage_nodes=9 node.name=elk_elasticsearch.{{.Task.Slot}} 
Mounts:
 Target:	/usr/share/elasticsearch/data
  Source:	/data/elasticsearch
  ReadOnly:	false
  Type:		bind
Configs:
 Target:	/usr/share/elasticsearch/config/elasticsearch.yml
  Source:	elk_elastic_config
Resources:
Networks: elk_elk 
Endpoint Mode:	vip
Ports:
 PublishedPort = 30000
  Protocol = tcp
  TargetPort = 9200
  PublishMode = ingress
 PublishedPort = 30001
  Protocol = tcp
  TargetPort = 9300
  PublishMode = ingress 

I continue to have

elk_elasticsearch.1.rfc5pwl830sl@tspeda-swarm-worker2    | {"type": "server", "timestamp": "2019-12-05T13:54:09,897Z", "level": "WARN", "component": "o.e.c.c.ClusterFormationFailureHelper", "cluster.name": "docker-cluster", "node.name": "elk_elasticsearch.1", "message": "master not discovered yet, this node has not previously joined a bootstrapped (v7+) cluster, and this node must discover master-eligible nodes [elk_elasticsearch.1, elk_elasticsearch.2, elk_elasticsearch.3] to bootstrap a cluster: have discovered [{elk_elasticsearch.1}{uReji52PTImpHPRM3JETFw}{zunMOe-QQQKyZBH9enYAXw}{10.0.0.41}{10.0.0.41:9300}{dilm}{ml.machine_memory=4092379136, xpack.installed=true, ml.max_open_jobs=20}]; discovery will continue using [10.0.22.9:9300, 10.0.22.10:9300, 10.0.22.11:9300] from hosts providers and [{elk_elasticsearch.1}{uReji52PTImpHPRM3JETFw}{zunMOe-QQQKyZBH9enYAXw}{10.0.0.41}{10.0.0.41:9300}{dilm}{ml.machine_memory=4092379136, xpack.installed=true, ml.max_open_jobs=20}] from last-known cluster state; node term 0, last-accepted version 0 in term 0" }

And I think this is not important in Swarm mode; you can expose 9200:9200, replicas handle it by default.

@eoli3n eoli3n commented Dec 5, 2019

See "Publish a port for a service" in https://docs.docker.com/engine/swarm/ingress/

@antoineco antoineco (Collaborator) commented Dec 5, 2019

Actually you're right: setting the discovery type to an empty string is not enough, it is necessary to remove the option entirely from the config file.

java.lang.IllegalArgumentException: setting [cluster.initial_master_nodes] is not allowed when [discovery.type] is set to [single-node]",

This works for me.

diff --git a/elasticsearch/config/elasticsearch.yml b/elasticsearch/config/elasticsearch.yml
index 39bfd40..b25774e 100644
--- a/elasticsearch/config/elasticsearch.yml
+++ b/elasticsearch/config/elasticsearch.yml
@@ -8,7 +8,7 @@ network.host: 0.0.0.0
 ## Use single node discovery in order to disable production mode and avoid bootstrap checks
 ## see https://www.elastic.co/guide/en/elasticsearch/reference/current/bootstrap-checks.html
 #
-discovery.type: single-node
+#discovery.type: single-node

 ## X-Pack settings
 ## see https://www.elastic.co/guide/en/elasticsearch/reference/current/setup-xpack.html

diff --git a/docker-stack.yml b/docker-stack.yml
index 4883a27..bb15c95 100644
--- a/docker-stack.yml
+++ b/docker-stack.yml
@@ -5,19 +5,28 @@ services:
   elasticsearch:
     image: docker.elastic.co/elasticsearch/elasticsearch:7.4.1
     ports:
-      - "9200:9200"
-      - "9300:9300"
+      - "9200"
+      - "9300"
     configs:
       - source: elastic_config
         target: /usr/share/elasticsearch/config/elasticsearch.yml
     environment:
+      # set a predictable node name
+      node.name: elk_elasticsearch.{{.Task.Slot}}
+      # disable single-node discovery
+      discovery.type: ''
+      # use internal Docker round-robin DNS for unicast discovery
+      discovery.seed_hosts: tasks.elasticsearch
+      # define initial masters, assuming a cluster size of at least 3
+      cluster.initial_master_nodes: elk_elasticsearch.1,elk_elasticsearch.2,elk_elasticsearch.3
+      node.max_local_storage_nodes: '3'
       ES_JAVA_OPTS: "-Xmx256m -Xms256m"
       ELASTIC_PASSWORD: changeme
     networks:
       - elk
     deploy:
       mode: replicated
-      replicas: 1
+      replicas: 3

   logstash:
     image: docker.elastic.co/logstash/logstash:7.4.1
$ sudo sysctl vm.max_map_count=262144
vm.max_map_count = 262144
$ docker stack deploy -c docker-stack.yml elk
$ docker stack services elk
ID                  NAME                MODE                REPLICAS            IMAGE                                                 PORTS
06k72tup86v4        elk_elasticsearch   replicated          3/3                 docker.elastic.co/elasticsearch/elasticsearch:7.4.1   *:30000->9200/tcp, *:30001->9300/tcp
ahrniojf8tkh        elk_logstash        replicated          1/1                 docker.elastic.co/logstash/logstash:7.4.1             *:5000->5000/tcp, *:9600->9600/tcp
ea2s74opk0za        elk_kibana          replicated          1/1                 docker.elastic.co/kibana/kibana:7.4.1                 *:5601->5601/tcp
$ curl -D- 'http://127.0.0.1:30000/_cluster/health?pretty' -u elastic:changeme
HTTP/1.1 200 OK
content-type: application/json; charset=UTF-8
content-length: 469

{
  "cluster_name" : "docker-cluster",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 11,
  "active_shards" : 22,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}
antoineco added a commit that referenced this issue Dec 5, 2019
ref. #410
antoineco added a commit that referenced this issue Dec 5, 2019
ref. #410
@eoli3n eoli3n commented Dec 5, 2019

~/infra-docker-swarm/docker-elk-test master* root@tspeda-swarm-manager1
❯ git log | head
commit 7591ea2f1996956259794abd1c86eed961535692
Author: Antoine Cotten <hello@acotten.com>
Date:   Thu Dec 5 15:26:53 2019 +0100

    Set discovery.type option in Compose file for easier override
    
    ref. #410

commit d7f5deb6ff308436715aa0c79f81938659e2a07e
Author: Antoine Cotten <hello@acotten.com>

~/infra-docker-swarm/docker-elk-test master* root@tspeda-swarm-manager1
❯ git --no-pager diff
diff --git a/docker-stack.yml b/docker-stack.yml
index 42f20b5..62c45a0 100644
--- a/docker-stack.yml
+++ b/docker-stack.yml
@@ -5,12 +5,21 @@ services:
   elasticsearch:
     image: docker.elastic.co/elasticsearch/elasticsearch:7.4.1
     ports:
-      - "9200:9200"
-      - "9300:9300"
+      - "9200"
+      - "9300"
     configs:
       - source: elastic_config
         target: /usr/share/elasticsearch/config/elasticsearch.yml
     environment:
+      # set a predictable node name
+      node.name: elk_elasticsearch.{{.Task.Slot}}
+      # disable single-node discovery
+      discovery.type: ''
+      # use internal Docker round-robin DNS for unicast discovery
+      discovery.seed_hosts: tasks.elasticsearch
+      # define initial masters, assuming a cluster size of at least 3
+      cluster.initial_master_nodes: elk_elasticsearch.1,elk_elasticsearch.2,elk_elasticsearch.3
+      node.max_local_storage_nodes: '3'
       ES_JAVA_OPTS: "-Xmx256m -Xms256m"
       ELASTIC_PASSWORD: changeme
       # Use single node discovery in order to disable production mode and avoid bootstrap checks
@@ -20,7 +29,7 @@ services:
       - elk
     deploy:
       mode: replicated
-      replicas: 1
+      replicas: 3
 
   logstash:
     image: docker.elastic.co/logstash/logstash:7.4.1
diff --git a/elasticsearch/config/elasticsearch.yml b/elasticsearch/config/elasticsearch.yml
index 86822dd..b06c1d2 100644
--- a/elasticsearch/config/elasticsearch.yml
+++ b/elasticsearch/config/elasticsearch.yml
@@ -8,6 +8,6 @@ network.host: 0.0.0.0
 ## X-Pack settings
 ## see https://www.elastic.co/guide/en/elasticsearch/reference/current/setup-xpack.html
 #
-xpack.license.self_generated.type: trial
+xpack.license.self_generated.type: basic
 xpack.security.enabled: true
 xpack.monitoring.collection.enabled: true

And I now get this message (which I didn't get before):

elk_elasticsearch.3.9hvnpa0zpmh7@tspeda-swarm-worker1    | "Caused by: java.lang.IllegalArgumentException: setting [cluster.initial_master_nodes] is not allowed when [discovery.type] is set to [single-node]"

@antoineco antoineco (Collaborator) commented Dec 5, 2019

I think you need to update the ES config (push it to Swarm). It seems like it still contains the old single-node line.

@eoli3n eoli3n commented Dec 5, 2019

Yep, I forgot a line. Now that's OK, but still:

~/infra-docker-swarm/docker-elk-test master* root@tspeda-swarm-manager1
❯ git --no-pager diff
diff --git a/docker-stack.yml b/docker-stack.yml
index 42f20b5..bb15c95 100644
--- a/docker-stack.yml
+++ b/docker-stack.yml
@@ -5,22 +5,28 @@ services:
   elasticsearch:
     image: docker.elastic.co/elasticsearch/elasticsearch:7.4.1
     ports:
-      - "9200:9200"
-      - "9300:9300"
+      - "9200"
+      - "9300"
     configs:
       - source: elastic_config
         target: /usr/share/elasticsearch/config/elasticsearch.yml
     environment:
+      # set a predictable node name
+      node.name: elk_elasticsearch.{{.Task.Slot}}
+      # disable single-node discovery
+      discovery.type: ''
+      # use internal Docker round-robin DNS for unicast discovery
+      discovery.seed_hosts: tasks.elasticsearch
+      # define initial masters, assuming a cluster size of at least 3
+      cluster.initial_master_nodes: elk_elasticsearch.1,elk_elasticsearch.2,elk_elasticsearch.3
+      node.max_local_storage_nodes: '3'
       ES_JAVA_OPTS: "-Xmx256m -Xms256m"
       ELASTIC_PASSWORD: changeme
-      # Use single node discovery in order to disable production mode and avoid bootstrap checks
-      # see https://www.elastic.co/guide/en/elasticsearch/reference/current/bootstrap-checks.html
-      discovery.type: single-node
     networks:
       - elk
     deploy:
       mode: replicated
-      replicas: 1
+      replicas: 3
 
   logstash:
     image: docker.elastic.co/logstash/logstash:7.4.1
diff --git a/elasticsearch/config/elasticsearch.yml b/elasticsearch/config/elasticsearch.yml
index 86822dd..b06c1d2 100644
--- a/elasticsearch/config/elasticsearch.yml
+++ b/elasticsearch/config/elasticsearch.yml
@@ -8,6 +8,6 @@ network.host: 0.0.0.0
 ## X-Pack settings
 ## see https://www.elastic.co/guide/en/elasticsearch/reference/current/setup-xpack.html
 #
-xpack.license.self_generated.type: trial
+xpack.license.self_generated.type: basic
 xpack.security.enabled: true
 xpack.monitoring.collection.enabled: true

~/infra-docker-swarm/docker-elk-test master* root@tspeda-swarm-manager1
❯ docker service logs elk_elasticsearch | grep 'master not'
elk_elasticsearch.2.ugwwfflv7zlg@tspeda-swarm-manager1    | OpenJDK 64-Bit Server VM warning: Option UseConcMarkSweepGC was deprecated in version 9.0 and will likely be removed in a future release.
elk_elasticsearch.1.7gozxvl9fuqf@tspeda-swarm-worker1    | OpenJDK 64-Bit Server VM warning: Option UseConcMarkSweepGC was deprecated in version 9.0 and will likely be removed in a future release.
elk_elasticsearch.2.ugwwfflv7zlg@tspeda-swarm-manager1    | {"type": "server", "timestamp": "2019-12-05T14:47:48,305Z", "level": "WARN", "component": "o.e.c.c.ClusterFormationFailureHelper", "cluster.name": "docker-cluster", "node.name": "elk_elasticsearch.2", "message": "master not discovered yet, this node has not previously joined a bootstrapped (v7+) cluster, and this node must discover master-eligible nodes [elk_elasticsearch.1, elk_elasticsearch.2, elk_elasticsearch.3] to bootstrap a cluster: have discovered [{elk_elasticsearch.2}{dfVuWjJ_QpKiDTR3FRlSjQ}{_uoyRa-mSWKak-ajLEBxeA}{10.0.0.186}{10.0.0.186:9300}{dilm}{ml.machine_memory=4092379136, xpack.installed=true, ml.max_open_jobs=20}]; discovery will continue using [10.0.32.7:9300, 10.0.32.8:9300, 10.0.32.6:9300] from hosts providers and [{elk_elasticsearch.2}{dfVuWjJ_QpKiDTR3FRlSjQ}{_uoyRa-mSWKak-ajLEBxeA}{10.0.0.186}{10.0.0.186:9300}{dilm}{ml.machine_memory=4092379136, xpack.installed=true, ml.max_open_jobs=20}] from last-known cluster state; node term 0, last-accepted version 0 in term 0" }
elk_elasticsearch.2.ugwwfflv7zlg@tspeda-swarm-manager1    | (the same "master not discovered yet" warning repeats every 10 seconds)
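For reference, the warning above shows the node waiting to discover the replica task names (elk_elasticsearch.1 through .3). A minimal sketch of the Swarm service settings that make those node names predictable and seed bootstrap, assuming a 3-replica service deployed in a stack named elk (service and stack names here are assumptions, following the template suggestion earlier in this thread):

```yaml
# Hypothetical fragment of a Swarm stack file; names are assumptions.
services:
  elasticsearch:
    environment:
      node.name: "{{.Task.Name}}"               # resolves to e.g. elk_elasticsearch.1
      discovery.seed_hosts: tasks.elasticsearch # Swarm DNS entry listing all replica IPs
      cluster.initial_master_nodes: elk_elasticsearch.1,elk_elasticsearch.2,elk_elasticsearch.3
    deploy:
      replicas: 3
```

With this, the names listed in cluster.initial_master_nodes match the node.name each task derives from its template, so the cluster can bootstrap once all replicas discover each other through tasks.elasticsearch.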
@eoli3n


@eoli3n eoli3n commented Dec 5, 2019

I just git cloned it and did exactly the same thing you did, plus changing the license type.
I removed some duplicate log lines, but I get that message from each of the 3 containers.

@ob-server83


@ob-server83 ob-server83 commented Dec 9, 2019

Hi,
I just noticed one thing with Docker Swarm and Elasticsearch: it worked for me on one server and did not work on another.
Here is what I noticed on my system.
First, the ports to open; this is what I did on CentOS 7:

firewall-cmd --permanent --add-port=2376/tcp
firewall-cmd --permanent --add-port=2377/tcp
firewall-cmd --permanent --add-port=7946/tcp
firewall-cmd --permanent --add-port=80/tcp
firewall-cmd --permanent --add-port=7946/udp
firewall-cmd --permanent --add-port=4789/udp
firewall-cmd --reload

Then:

docker swarm init

and run the join on the other hosts. The token and IP are auto-generated; docker swarm init prints the exact command, for example:

docker swarm join --token SWMTKN-1-someAutoGenTokenHere IP:2377

Then, on the Docker Swarm leader:

docker network create --driver overlay --subnet 10.0.9.0/24 mynet

and deploy this simple stack file (attached):
docker-stack.txt

docker stack deploy -c docker-stack.yml elk
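The attached docker-stack.txt is not reproduced here, but for a pre-created overlay network like mynet above, a stack file would typically declare that network as external so the services attach to it instead of creating a new one. A rough sketch (an assumption, not the contents of the attached file):

```yaml
# Sketch only; the real configuration is in the attached docker-stack.txt.
version: "3.7"
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.2.0
    networks:
      - mynet
    deploy:
      replicas: 3
networks:
  mynet:
    external: true   # use the overlay network created with "docker network create"
```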

So the difference between these hosts was the Docker version.
What I did was run yum remove for the Docker packages, then install an older version:

yum install docker-ce-19.03.2-3.el7 docker-ce-cli-19.03.2-3.el7 containerd.io-1.2.6-3.3.el7
systemctl start docker

It works better for me with this version:

docker --version
Docker version 19.03.2, build 6a30dfc

and this is what I had before (which did not work for me):

docker --version
Docker version 19.03.5, build 633a0ea

I also noticed that my other hosts still had Docker version 19.03.5, so it seems the downgrade was only needed on the leader host:

docker node ls
ID                          HOSTNAME   STATUS  AVAILABILITY  MANAGER STATUS  ENGINE VERSION
xcz66f64dwo62qq34cie83ah2 * kis-elk01  Ready   Active        Leader          19.03.2
m76h9dkrm7xcxpf22llafkkoh   kis-elk02  Ready   Active                        19.03.5
4n9hwuitsrv93ulu6uf5e2hpw   kis-elk03  Ready   Active                        19.03.5

@eoli3n


@eoli3n eoli3n commented Dec 9, 2019

Read this: #455
