
failed to determine master #2

Closed
mark-kubacki opened this issue Sep 14, 2014 · 8 comments
@mark-kubacki

Sep 14 14:35:30 ct-1 systemd[1]: Starting Stampede : Server...
Sep 14 14:35:31 ct-1 bash[5322]: 2014/09/14 14:35:31 Moving pid 5405 to /sys/fs/cgroup/systemd/system.slice/cattle-stampede-server.24e04e-a1f13b.service/cgroup.procs
Sep 14 14:35:31 ct-1 bash[5322]: 2014/09/14 14:35:31 Moving pid 5405 to /sys/fs/cgroup/blkio/system.slice/cattle-stampede-server.24e04e-a1f13b.service/cgroup.procs
Sep 14 14:35:31 ct-1 bash[5322]: 2014/09/14 14:35:31 Moving pid 5405 to /sys/fs/cgroup/freezer/cgroup.procs
Sep 14 14:35:31 ct-1 bash[5322]: 2014/09/14 14:35:31 Moving pid 5405 to /sys/fs/cgroup/devices/system.slice/cattle-stampede-server.24e04e-a1f13b.service/cgroup.procs
Sep 14 14:35:31 ct-1 systemd[1]: cattle-stampede-server.24e04e-a1f13b.service: Supervising process 5405 which is not our child. We'll most likely not notice when it exits.
Sep 14 14:35:31 ct-1 bash[5322]: 2014/09/14 14:35:31 Moving pid 5405 to /sys/fs/cgroup/memory/system.slice/cattle-stampede-server.24e04e-a1f13b.service/cgroup.procs
Sep 14 14:35:31 ct-1 bash[5322]: 2014/09/14 14:35:31 Moving pid 5405 to /sys/fs/cgroup/cpu,cpuacct/system.slice/cattle-stampede-server.24e04e-a1f13b.service/cgroup.procs
Sep 14 14:35:31 ct-1 bash[5322]: Master
Sep 14 14:35:31 ct-1 bash[5322]: Public IP 178.62.254.106
Sep 14 14:35:31 ct-1 bash[5322]: Private IP 10.133.255.234
Sep 14 14:35:31 ct-1 bash[5322]: Failed to determine master
Sep 14 14:37:30 ct-1 systemd[1]: cattle-stampede-server.24e04e-a1f13b.service start operation timed out. Terminating.
Sep 14 14:37:30 ct-1 systemd[1]: Failed to start Stampede : Server.
$ fleetctl list-units

UNIT                                            MACHINE                         ACTIVE          SUB
cattle-libvirt.a181a6-960088.service            960088b9.../178.62.248.14       active          running
cattle-libvirt.a181a6-a1f13b.service            a1f13ba0.../178.62.254.106      active          running
cattle-libvirt.a181a6-e73dee.service            e73deec2.../178.62.248.13       active          running
cattle-stampede-agent.a0a767-960088.service     960088b9.../178.62.248.14       active          running
cattle-stampede-agent.a0a767-a1f13b.service     a1f13ba0.../178.62.254.106      active          running
cattle-stampede-agent.a0a767-e73dee.service     e73deec2.../178.62.248.13       active          running
cattle-stampede-server.24e04e-960088.service    960088b9.../178.62.248.14       activating      start
cattle-stampede-server.24e04e-a1f13b.service    a1f13ba0.../178.62.254.106      activating      start
cattle-stampede-server.24e04e-e73dee.service    e73deec2.../178.62.248.13       activating      start
cattle-stampede.service                         960088b9.../178.62.248.14       active          running
@mark-kubacki
Author

startup.sh only works if etcd is publicly reachable, or at least reachable from within the containers.

@ibuildthecloud
Contributor

@wmark what does your cloud config look like? It does assume etcd is listening on the IP on docker0.
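For context, here is a sketch of what that assumption implies (my illustration, not actual startup.sh code): inside a container, the host side of docker0 is the default gateway, so etcd bound to that address would be reachable there on port 4001.

```shell
# Sketch (assumption, not verified against startup.sh): extract the default
# gateway, which inside a Docker container is the host's docker0 address.
gateway_of() {
  # Print the gateway field of the default route.
  awk '/^default/ {print $3}'
}

# Typical default route as seen inside a 2014-era Docker container:
echo "default via 172.17.42.1 dev eth0" | gateway_of
# One could then probe etcd with:
#   curl "http://$(ip route | gateway_of):4001/v2/machines"
```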

@mark-kubacki
Author

Thank you for looking into this. It basically boils down to having etcd locked down using iptables to prevent malicious containers from messing with the cluster.

To solve this issue, you just need to expose etcd's ports (on the host) to the containers.

An excerpt from the iptables rules, for reference:

  units:
    - name: iptables.service
      enable: true
    - name: iptables-restore.service
      command: start
    - name: ip6tables.service
      enable: true
    - name: ip6tables-restore.service
      command: start

write_files:
  - path: /var/lib/iptables/rules-save
    permissions: 0600
    owner: root:root
    content: |
      *filter
      :INPUT ACCEPT [0:0]
      :FORWARD DROP [0:0]
      :OUTPUT ACCEPT [0:0]
      :cluster - [0:0]
      :local-ap - [0:0]

      # NTP and etcd are restricted to servers in the cluster
      -A INPUT -p tcp -m multiport --dports 123,4001,7001 -j cluster
      -A INPUT -p udp -m multiport --dports 123,4001,7001 -j cluster

      -A cluster -s $private_ipv4/16 -j ACCEPT
      -A cluster -j local-ap

      # add private bridges (br0, virbr0, … — NOT docker0…) here
      -A local-ap ! -i lo -p tcp -j REJECT --reject-with tcp-reset
      -A local-ap ! -i lo -p udp -j REJECT --reject-with icmp-port-unreachable

      COMMIT
      # end
  - path: /var/lib/ip6tables/rules-save
    permissions: 0600
    owner: root:root
    content: |
      *filter
      :INPUT ACCEPT [0:0]
      :FORWARD DROP [0:0]
      :OUTPUT ACCEPT [0:0]
      :cluster - [0:0]
      :local-ap - [0:0]

      # NTP and etcd are restricted to servers in the cluster
      -A INPUT -p tcp -m multiport --dports 123,4001,7001 -j cluster
      -A INPUT -p udp -m multiport --dports 123,4001,7001 -j cluster

      -A cluster -s 2a03:f00d:f00d::/48 -j ACCEPT
      -A cluster -j local-ap

      # add private bridges (br0, virbr0, … — NOT docker0…) here
      -A local-ap ! -i lo -p tcp -j REJECT --reject-with tcp-reset
      -A local-ap ! -i lo -p udp -j REJECT --reject-with icmp6-port-unreachable

      COMMIT
      # end
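For completeness (my assumption of a workaround, not part of the rules above): if one did want containers on docker0 to reach etcd, a single additional rule accepting that traffic before it falls through to the REJECT chain would suffice.

```
# Hypothetical workaround, matching "expose etcd ports to the containers":
# accept etcd client/peer traffic arriving on the docker0 bridge.
-A INPUT -i docker0 -p tcp -m multiport --dports 4001,7001 -j ACCEPT
```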

@ibuildthecloud
Contributor

I consider addressing the security around etcd and fleet one of the major hurdles to making stampede production-ready. Currently some things can be done to restrict access, but I think they are either too cumbersome for the user or not sufficient. You bring up a good point: it's really not good for stampede to assume that etcd is available to the containers, as that has security implications. I think I will change it such that etcd is only accessed from outside the containers, in the host OS/namespace. @wmark Would that approach work better for you?
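As a sketch of what that could look like (the unit fragment and paths are hypothetical, not actual stampede code): a host-side systemd/fleet unit could talk to etcd over loopback before starting the container, so containers never need network access to ports 4001/7001.

```
# Hypothetical unit fragment: all etcd access happens in the host namespace.
[Service]
ExecStartPre=/usr/bin/etcdctl --peers http://127.0.0.1:4001 set /stampede/master %H
ExecStart=/usr/bin/docker run --rm stampede/server
```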

@mark-kubacki
Author

Darren, thanks a lot for taking the time to analyze the issue, and for your suggestion.

I believe that due to etcd-io/etcd#91 we can close the issue here.

@ibuildthecloud
Contributor

@wmark I encourage you to check out rancher.io. Rancher.io is a continuation of work that started with stampede, but now we have a company and a large amount of resources dedicated to it.

@mark-kubacki
Author

Thanks Darren, I will definitely take a look.

Drawing on some years of experience with Linux and Gentoo, I am currently working on a fork of CoreOS that incorporates the missing pieces (central logging, for example; hardware monitoring and reporting; authenticated and encrypted-by-default networking between nodes) and fixes what's not done right yet (software versions, in-place updates using overlays, security settings). It is meant to run my encrypted email service, but with the ultimate goal of turning it into a Gentoo/ChromeOS distribution.

@ibuildthecloud
Contributor

Sounds really interesting, I'd love to see it.
