Skip to content
This repository has been archived by the owner on Nov 30, 2021. It is now read-only.

Errors returned by etcd_set_default are ignored #2214

Closed
gabrtv opened this issue Oct 21, 2014 · 5 comments · Fixed by #2308
Closed

Errors returned by etcd_set_default are ignored #2214

gabrtv opened this issue Oct 21, 2014 · 5 comments · Fixed by #2308
Assignees
Labels
Milestone

Comments

@gabrtv
Copy link
Member

gabrtv commented Oct 21, 2014

We have seen cases in the wild where etcd_set_default returns an error, resulting in required keys never being seeded. We swallow the errors with || true in many cases. The result is components waiting on confd due to required keys not existing.

For example, the controller hanging due to:

REGISTRY_URL = '<no value>://10.21.2.93:5000'  # noqa
@carmstrong
Copy link
Contributor

Curious... we've been using etcd_set_default for many many moons. Wondering if something changed recently with etcd that would cause this to sometimes fail. The --no-sync should guarantee we're always just communicating with the local etcd, which should never fail to set a value.

@gabrtv gabrtv modified the milestones: 0.15, 0.16 Oct 22, 2014
@carmstrong
Copy link
Contributor

We can't remove || true because we expect the etcdctl mk to fail every time except the original call. The only time we are incorrectly swallowing the error is on the initial create, which never was an issue before. We may need to rewrite this logic, although this seemed like the most straightforward way. Perhaps we could check for a specific exit code (i.e. does "key already exists" have a specific error code? we could swallow in that case).

@carmstrong
Copy link
Contributor

@gabrtv What's a reliable way to reproduce this? I haven't seen any occurrences of this when provisioning Deis recently.

@carmstrong carmstrong modified the milestones: 0.16, 0.15 Oct 27, 2014
@carmstrong
Copy link
Contributor

So the results of some testing:

core@deis-1 ~ $ etcdctl get /foo
Error:  100: Key not found (/foo) [44]
core@deis-1 ~ $ echo $?
4
core@deis-1 ~ $ sudo systemctl stop etcd && etcdctl get /foo
Error:  Cannot sync with the cluster using peers 127.0.0.1:4001
core@deis-1 ~ $ echo $?
2

So I'm thinking a reasonable approach would be to swallow exit code 4, but no others.

@carmstrong
Copy link
Contributor

In testing #2308, I'm seeing the builder have an etcd every single time on startup. This would explain why others are seeing builder enter a "wait-for-confd" loop because some keys are not set.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants