
failed to determine master #2

Closed
mark-kubacki opened this issue Sep 14, 2014 · 8 comments
@mark-kubacki

Sep 14 14:35:30 ct-1 systemd[1]: Starting Stampede : Server...
Sep 14 14:35:31 ct-1 bash[5322]: 2014/09/14 14:35:31 Moving pid 5405 to /sys/fs/cgroup/systemd/system.slice/cattle-stampede-server.24e04e-a1f13b.service/cgroup.procs
Sep 14 14:35:31 ct-1 bash[5322]: 2014/09/14 14:35:31 Moving pid 5405 to /sys/fs/cgroup/blkio/system.slice/cattle-stampede-server.24e04e-a1f13b.service/cgroup.procs
Sep 14 14:35:31 ct-1 bash[5322]: 2014/09/14 14:35:31 Moving pid 5405 to /sys/fs/cgroup/freezer/cgroup.procs
Sep 14 14:35:31 ct-1 bash[5322]: 2014/09/14 14:35:31 Moving pid 5405 to /sys/fs/cgroup/devices/system.slice/cattle-stampede-server.24e04e-a1f13b.service/cgroup.procs
Sep 14 14:35:31 ct-1 systemd[1]: cattle-stampede-server.24e04e-a1f13b.service: Supervising process 5405 which is not our child. We'll most likely not notice when it exits.
Sep 14 14:35:31 ct-1 bash[5322]: 2014/09/14 14:35:31 Moving pid 5405 to /sys/fs/cgroup/memory/system.slice/cattle-stampede-server.24e04e-a1f13b.service/cgroup.procs
Sep 14 14:35:31 ct-1 bash[5322]: 2014/09/14 14:35:31 Moving pid 5405 to /sys/fs/cgroup/cpu,cpuacct/system.slice/cattle-stampede-server.24e04e-a1f13b.service/cgroup.procs
Sep 14 14:35:31 ct-1 bash[5322]: Master
Sep 14 14:35:31 ct-1 bash[5322]: Public IP 178.62.254.106
Sep 14 14:35:31 ct-1 bash[5322]: Private IP 10.133.255.234
Sep 14 14:35:31 ct-1 bash[5322]: Failed to determine master
Sep 14 14:37:30 ct-1 systemd[1]: cattle-stampede-server.24e04e-a1f13b.service start operation timed out. Terminating.
Sep 14 14:37:30 ct-1 systemd[1]: Failed to start Stampede : Server.
$ fleetctl list-units

UNIT                                            MACHINE                         ACTIVE          SUB
cattle-libvirt.a181a6-960088.service            960088b9.../178.62.248.14       active          running
cattle-libvirt.a181a6-a1f13b.service            a1f13ba0.../178.62.254.106      active          running
cattle-libvirt.a181a6-e73dee.service            e73deec2.../178.62.248.13       active          running
cattle-stampede-agent.a0a767-960088.service     960088b9.../178.62.248.14       active          running
cattle-stampede-agent.a0a767-a1f13b.service     a1f13ba0.../178.62.254.106      active          running
cattle-stampede-agent.a0a767-e73dee.service     e73deec2.../178.62.248.13       active          running
cattle-stampede-server.24e04e-960088.service    960088b9.../178.62.248.14       activating      start
cattle-stampede-server.24e04e-a1f13b.service    a1f13ba0.../178.62.254.106      activating      start
cattle-stampede-server.24e04e-e73dee.service    e73deec2.../178.62.248.13       activating      start
cattle-stampede.service                         960088b9.../178.62.248.14       active          running
@mark-kubacki
Author

startup.sh only works if etcd is publicly reachable, or at least reachable from within the containers.

@ibuildthecloud
Contributor

@wmark what does your cloud config look like? It does assume etcd is listening on the IP on docker0.
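For context, here is a sketch of what that assumption implies (my illustration, not actual startup.sh code): inside a container, the host side of docker0 is the default gateway, so etcd bound to that address would be reachable there on port 4001.

```shell
# Sketch (assumption, not verified against startup.sh): extract the default
# gateway, which inside a Docker container is the host's docker0 address.
gateway_of() {
  # Print the gateway field of the default route.
  awk '/^default/ {print $3}'
}

# Typical default route as seen inside a 2014-era Docker container:
echo "default via 172.17.42.1 dev eth0" | gateway_of
# One could then probe etcd with:
#   curl "http://$(ip route | gateway_of):4001/v2/machines"
```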

@mark-kubacki
Author

Thank you for looking into this. It basically boils down to having etcd locked down using iptables to prevent malicious containers from messing with the cluster.

To solve this issue, you just need to expose etcd's ports (on the host) to the containers.

An excerpt from the iptables rules, for reference:

  units:
    - name: iptables.service
      enable: true
    - name: iptables-restore.service
      command: start
    - name: ip6tables.service
      enable: true
    - name: ip6tables-restore.service
      command: start

write_files:
  - path: /var/lib/iptables/rules-save
    permissions: 0600
    owner: root:root
    content: |
      *filter
      :INPUT ACCEPT [0:0]
      :FORWARD DROP [0:0]
      :OUTPUT ACCEPT [0:0]
      :cluster - [0:0]
      :local-ap - [0:0]

      # NTP and etcd are restricted to servers in the cluster
      -A INPUT -p tcp -m multiport --dports 123,4001,7001 -j cluster
      -A INPUT -p udp -m multiport --dports 123,4001,7001 -j cluster

      -A cluster -s $private_ipv4/16 -j ACCEPT
      -A cluster -j local-ap

      # add private bridges (br0, virbr0, … — NOT docker0…) here
      -A local-ap ! -i lo -p tcp -j REJECT --reject-with tcp-reset
      -A local-ap ! -i lo -p udp -j REJECT --reject-with icmp-port-unreachable

      COMMIT
      # end
  - path: /var/lib/ip6tables/rules-save
    permissions: 0600
    owner: root:root
    content: |
      *filter
      :INPUT ACCEPT [0:0]
      :FORWARD DROP [0:0]
      :OUTPUT ACCEPT [0:0]
      :cluster - [0:0]
      :local-ap - [0:0]

      # NTP and etcd are restricted to servers in the cluster
      -A INPUT -p tcp -m multiport --dports 123,4001,7001 -j cluster
      -A INPUT -p udp -m multiport --dports 123,4001,7001 -j cluster

      -A cluster -s 2a03:f00d:f00d::/48 -j ACCEPT
      -A cluster -j local-ap

      # add private bridges (br0, virbr0, … — NOT docker0…) here
      -A local-ap ! -i lo -p tcp -j REJECT --reject-with tcp-reset
      -A local-ap ! -i lo -p udp -j REJECT --reject-with icmp6-port-unreachable

      COMMIT
      # end
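For completeness (my assumption of a workaround, not part of the rules above): if one did want containers on docker0 to reach etcd, a single additional rule accepting that traffic before it falls through to the REJECT chain would suffice.

```
# Hypothetical workaround, matching "expose etcd ports to the containers":
# accept etcd client/peer traffic arriving on the docker0 bridge.
-A INPUT -i docker0 -p tcp -m multiport --dports 4001,7001 -j ACCEPT
```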

@ibuildthecloud
Contributor

I consider addressing the security around etcd and fleet one of the major hurdles to making stampede production-ready. Currently some things can be done to restrict access, but I think they are either too cumbersome for the user or not sufficient. You bring up a good point: it's really not good for stampede to assume that etcd is available to the containers, as that has security implications. I think I will change it such that etcd is only accessed from outside the containers, in the host OS/namespace. @wmark Would that approach work better for you?
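As a sketch of what that could look like (the unit fragment and paths are hypothetical, not actual stampede code): a host-side systemd/fleet unit could talk to etcd over loopback before starting the container, so containers never need network access to ports 4001/7001.

```
# Hypothetical unit fragment: all etcd access happens in the host namespace.
[Service]
ExecStartPre=/usr/bin/etcdctl --peers http://127.0.0.1:4001 set /stampede/master %H
ExecStart=/usr/bin/docker run --rm stampede/server
```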

@mark-kubacki
Author

Darren, thanks a lot for taking the time to analyze the issue, and for your suggestion.

I believe that due to etcd-io/etcd#91 we can close the issue here.

@ibuildthecloud
Contributor

@wmark I encourage you to check out rancher.io. Rancher.io is a continuation of work that started with stampede, but now we have a company and a large amount of resources dedicated to it.

@mark-kubacki
Author

Thanks Darren, I will definitely take a look.

Drawing on some years of experience with Linux and Gentoo, I am currently working on a fork of CoreOS that incorporates the missing pieces (central logging, for example; hardware monitoring and reporting; authenticated and encrypted-by-default networking between nodes) and fixes what's not done right yet (software versions, in-place updates using overlays, security settings). It is meant to run my encrypted email service, but with the ultimate goal of turning it into a Gentoo/ChromeOS distribution.

@ibuildthecloud
Contributor

Sounds really interesting, I'd love to see it.
