Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

daemon: add watchdog checker #209

Merged
merged 2 commits into from Nov 9, 2016
Merged

daemon: add watchdog checker #209

merged 2 commits into from Nov 9, 2016

Conversation

guilhem
Copy link
Contributor

@guilhem guilhem commented Nov 7, 2016

Add a function SdWatchdogEnabled() who check if service have to report its status
This is inspirated from the behavior of sd_watchdog_enabled

@ghost
Copy link

ghost commented Nov 7, 2016

Can one of the admins verify this patch?

@guilhem
Copy link
Contributor Author

guilhem commented Nov 7, 2016

I may need help to write tests on this feature

@squeed
Copy link
Contributor

squeed commented Nov 8, 2016

Can you add documentation for the method? Thanks!

@guilhem
Copy link
Contributor Author

guilhem commented Nov 8, 2016

@squeed should be done :)

@guilhem
Copy link
Contributor Author

guilhem commented Nov 8, 2016

@squeed
Copy link
Contributor

squeed commented Nov 8, 2016

Great, thanks.
I think you can test this pretty simply by just calling os.Setenv in your testing code. No need to get too complicated. Just store the variables somewhere when you start, and restore them when you finish.

// (0, err) - an error happened (e.g. error converting time).
// (time, nil) - watchdog is enabled and we can send ping.
// time is delay before inactive service will be killed.
func SdWatchdogEnabled() (time.Duration, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sd_watchdog_enabled() takes an additional unset_environment bool which will unconditionally unset the two environment variables. I think it is worth to have it here too.

Copy link
Contributor Author

@guilhem guilhem Nov 8, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally think it is fine if the two are not aligned. @jonboulle @squeed may have a different opinion though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really see a reason to introduce the inconsistency; perhaps we could land this and follow up adding that variable? (or is it somehow more useful for this call than for sdnotify?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think @jonboulle is right - might as well add the parameter. Sorry for the runaround.

@lucab
Copy link
Contributor

lucab commented Nov 8, 2016

@guilhem thanks for PR. It looks fine overall and I think that you can follow @squeed suggestion regarding testing it. Additionally, I've left inline a small request which you may consider for the next iteration of this PR.

@guilhem
Copy link
Contributor Author

guilhem commented Nov 8, 2016

@squeed tests are there :)

}
s, err := strconv.Atoi(wusec)
if err != nil {
return 0, fmt.Errorf("Error converting WATCHDOG_USEC: %s", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please lower-case error strings (e.g. "error converting...")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

return 0, fmt.Errorf("Error converting WATCHDOG_USEC: %s", err)
}
if s <= 0 {
return 0, fmt.Errorf("Error WATCHDOG_USEC should be positive")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/Error WATCHDOG_USEC should be positive/WATCHDOG_USEC must be a positive number/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Add a function SdWatchdogEnabled() who check if service have to report its status
This is inspirated from the behavior of sd_watchdog_enabled
Copy link

@euank euank left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple suggestions around the testing.

The implementation code looks okay to me.

// (time, nil)
err := os.Setenv("WATCHDOG_USEC", "100")
if err != nil {
panic(err)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

t.Fatal preferred over panic to halt a test early.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually disagree on that one - my usual classification is:

  • t.Error for test failures related to user code that can continue
  • t.Fatal for test failures related to user code that cannot continue
  • panic for test failures unrelated to user code e.g. in setup etc (very explicit that it's an environment failure/solar flare and not a code bug)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreeing with @jonboulle here, let's keep it as is.


delay, err := SdWatchdogEnabled()
if delay == 0 || err != nil {
t.Errorf("TEST: Watchdog enabled FAILED")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're okay with this test being 1.7+ only, we could use subtests to cleanup these repetitive tests and still have nice error reporting.

I don't know how much this project needs compatibility with older go versions.

Failing at using subtests, I'd still rather have a slice of cases to loop through than this, though It's not a blocking concern.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

subtests are perhaps a bit of a stretch, I agree on having the test-cases in a better tabular form if possible.

@guilhem
Copy link
Contributor Author

guilhem commented Nov 9, 2016

And so?.... 👼

@squeed
Copy link
Contributor

squeed commented Nov 9, 2016

Tests look great, thanks!

@lucab
Copy link
Contributor

lucab commented Nov 9, 2016

It looks like we agree this is mostly ok as it is. The only remaining concerns here are:

  • implementing unset_env for watchdog and notify
  • re-arranging the tests to be in a tabular form

Neither of those are blockers, so we can either do another iteration here or land this first and followup with a cleaning PR, depending on @guilhem availability. I'd suggest to at least re-arrange the tests in the scope of this PR, and be ready for a change in the function signature happening soon later. @guilhem sounds ok?

@guilhem
Copy link
Contributor Author

guilhem commented Nov 9, 2016

@lucab I'm not very familiar with go tests and don't have many ideas to improve those.
But with new github feature, any maintainer is free to add more commits in my watchdog branch to improve this PR ^^

@jonboulle
Copy link
Contributor

I need to do something mind-numbing after the US election so I'll take a shot.

@jonboulle
Copy link
Contributor

@lucab ptal

@lucab
Copy link
Contributor

lucab commented Nov 9, 2016

LGTM, thanks everybody. I'll follow up with the unset_environment breaking change.

@lucab lucab changed the title Add watchdog checker daemon: add watchdog checker Nov 9, 2016
@lucab lucab merged commit d7387fd into coreos:master Nov 9, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants