etcd-member unit becomes ready before accepting requests #2286

Closed
discordianfish opened this Issue Dec 13, 2017 · 2 comments

discordianfish commented Dec 13, 2017

Issue Report

Bug

Container Linux Version

$ cat /etc/os-release
NAME="Container Linux by CoreOS"
ID=coreos
VERSION=1520.8.0
VERSION_ID=1520.8.0
BUILD_ID=2017-10-26-0342
PRETTY_NAME="Container Linux by CoreOS 1520.8.0 (Ladybug)"
ANSI_COLOR="38;5;75"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://issues.coreos.com"
COREOS_BOARD="amd64-usr"

Environment

AWS

Expected Behavior

When services with After=etcd-member.service are started, the etcd API should be ready to serve requests.

Actual Behavior

Sometimes etcd isn't ready at this point, causing dependent units to fail.
Specifically, I'm running kubeadm in a unit file with After=etcd-member.service, yet sometimes it gets started too early and fails with:

Dec 13 12:52:01 ip-172-20-144-6 kubeadm[858]:         [ERROR ExternalEtcdVersion]: couldn't parse external etcd version "": Version string empty

Reproduction Steps

It's a bit racy, so it can't be reproduced reliably, but it boils down to starting etcd followed by something that talks to the API.
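
To see what the API actually returns right after etcd starts, something like the following could be used. This is only a rough sketch under assumptions that are not stated in the report: etcd is assumed to listen for clients on http://127.0.0.1:2379 without TLS, `curl` is assumed to be available, and `/version` is assumed to return a JSON object with an `etcdserver` field. The helper names `etcd_version_from_body` and `wait_for_etcd_version` are made up for this example.

```shell
#!/bin/sh
# Hypothetical helpers; the endpoint and all names are assumptions, adjust as needed.
ETCD_ENDPOINT="${ETCD_ENDPOINT:-http://127.0.0.1:2379}"

# Print the value of the "etcdserver" field from a /version response body.
# Prints an empty string if the field is missing or empty.
etcd_version_from_body() {
    printf '%s' "$1" |
        sed -n 's/.*"etcdserver"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Poll /version once per second until it reports a non-empty server version,
# or give up after the given number of attempts (default 30).
wait_for_etcd_version() {
    attempts="${1:-30}"
    while [ "$attempts" -gt 0 ]; do
        body="$(curl -fsS --max-time 2 "$ETCD_ENDPOINT/version" 2>/dev/null || true)"
        version="$(etcd_version_from_body "$body")"
        if [ -n "$version" ]; then
            echo "etcd reports version $version"
            return 0
        fi
        echo "etcd not ready yet, response body was: '$body'" >&2
        attempts=$((attempts - 1))
        sleep 1
    done
    return 1
}
```

Running `wait_for_etcd_version 30` immediately after starting etcd would log any empty or unparsable responses seen during the race window. The same function could also serve as a stopgap, for example called from an `ExecStartPre=` line of the dependent unit, though that only works around the race rather than explaining it.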

lucab Dec 13, 2017

Member

Thanks for reporting this. According to the kubeadm source, when that error occurs the pre-flight check got a non-error response from etcd. However, it is unable to parse that response, seemingly because it is empty.

etcd-member.service is a Type=notify unit, and the etcd source seems to correctly notify systemd of its startup once everything else is done.

I suggest filing a ticket against kubeadm or etcd directly, to investigate why the version endpoint seems to return a successful but empty response. On the OS side, everything looks as I would expect on the surface.
Just for completeness, you could check the journal around the time kubeadm starts, to verify that the etcd unit is marked as started before the kubeadm one starts.
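
That ordering check could look roughly like the sketch below. It rests on assumptions: `kubeadm.service` is a placeholder for whichever unit runs kubeadm (the report does not name it), and the helpers `unit_timestamp`, `started_before` and `check_order` are made up for illustration. It compares the monotonic timestamps that systemd records for when etcd-member became active and when the other unit's main process was started, then prints the interleaved journal for a manual look.

```shell
#!/bin/sh
# Assumed unit names; kubeadm.service is a placeholder for the reporter's unit.
ETCD_UNIT="${ETCD_UNIT:-etcd-member.service}"
DEPENDENT_UNIT="${DEPENDENT_UNIT:-kubeadm.service}"

# Read one monotonic timestamp property (in microseconds) of a unit.
# systemctl prints "Property=value"; keep only the value.
unit_timestamp() {
    systemctl show -p "$2" "$1" | cut -d= -f2
}

# Succeed if the first timestamp is set (non-zero) and not later than the second.
started_before() {
    [ "${1:-0}" -gt 0 ] && [ "${1:-0}" -le "${2:-0}" ]
}

check_order() {
    etcd_ready="$(unit_timestamp "$ETCD_UNIT" ActiveEnterTimestampMonotonic)"
    dep_start="$(unit_timestamp "$DEPENDENT_UNIT" ExecMainStartTimestampMonotonic)"
    if started_before "$etcd_ready" "$dep_start"; then
        echo "ordering ok: $ETCD_UNIT was active before $DEPENDENT_UNIT started"
    else
        echo "ordering suspect: $DEPENDENT_UNIT started before $ETCD_UNIT was active"
    fi
    # Interleaved journal of both units for the current boot.
    journalctl -b -o short-precise --no-pager -u "$ETCD_UNIT" -u "$DEPENDENT_UNIT"
}
```

Calling `check_order` after a failed boot prints the verdict followed by the journal. Note that the timestamps only reflect the most recent start of each unit, so if either unit has been restarted since the failure, the journal output is the more trustworthy source.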

discordianfish Dec 13, 2017

Thanks @lucab, you're right, this rather appears to be an etcd issue.

bgilbert closed this May 11, 2018