New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conf int with reload #1817

Merged
merged 67 commits into from Jul 26, 2016

Conversation

Projects
None yet
2 participants
@kylebrandt
Member

kylebrandt commented Jul 9, 2016

wip:

# kbrandt at grove in ~ on git:master x [16:19:15]
$ curl -XPOST 'http://localhost:8080/api/config/alert' -d "{
    \"Name\" : \"meowNew\", \"AlertText\" : \"alert meowNew {\n\twarn = 0 + 6\n}\", \"Hash\": \"\"
}"
reloaded%
# kbrandt at grove in ~ on git:master x [16:19:12]
$ curl -XDELETE 'http://localhost:8080/api/config/alert/meowNew'
meowNew deleted%

@kylebrandt kylebrandt referenced this pull request Jul 9, 2016

Closed

wip: config reloading #1777

@kylebrandt

This comment has been minimized.

Member

kylebrandt commented Jul 9, 2016

@kylebrandt

This comment has been minimized.

Member

kylebrandt commented Jul 13, 2016

@kylebrandt

This comment has been minimized.

Member

kylebrandt commented Jul 17, 2016

Example sys conf

Hostname = "bosun.example.com"
HTTPListen = ":8080"
TimeAndDate = [ 202, 75, 179, 136 ]
ShortURLKey = "aKey"
CommandHookPath = "/Users/kbrandt/src/hook/hook"
RuleFilePath = "/Users/kbrandt/src/testProdRepo/prod.conf"

[OpenTSDBConf]
    Host = "ny-tsdb01:4242"
    Version = 2.2
    ResponseLimit = 25000000

#Test comment

[ElasticConf]
    Hosts = ["http://ny-lselastic01.example.com:9200", "http://ny-lselastic02.example.com:9200","http://ny-lselastic03.example.com:9200"]

[DBConf]
    RedisHost = "localhost:6389"

[SMTPConf]
    EmailFrom = "bosun@example.com"
    Host = "mail.example.com"

StateFile string
LedisDir string
LedisBindAddr string
type SystemConfProvider interface {

This comment has been minimized.

@captncraig

captncraig Jul 18, 2016

Contributor

I think the interface adds complexity here we don't really need. Why not just a struct without Getter Functions? Are we afraid some code will change it without going through the conf package? I'm not really super worried about that.

This comment has been minimized.

@kylebrandt

kylebrandt Jul 18, 2016

Member

I'm okay with removing the systemconf interface.

}
func ValidateSystemConf(sc SystemConfProvider) error {
if sc.GetSMTPHost() == "" && sc.GetEmailFrom() == "" {

This comment has been minimized.

@captncraig

captncraig Jul 18, 2016

Contributor

I'm not sure about this check. What if someone is only using post notifications?

This comment has been minimized.

@kylebrandt

kylebrandt Jul 18, 2016

Member

bad logic, the previous was ||, but what I mean was "if one, than the other". Will fix

RedisDb int
RedisPassword string
type RuleConfProvider interface {
RuleConfWriter // Wrong place for now

This comment has been minimized.

@captncraig

captncraig Jul 18, 2016

Contributor

Why is this wrong? I think its pretty cool.

MinGroupSize int
GetAlerts() map[string]*Alert
GetAlert(string) *Alert
SetAlert(string, string) (string, error)

This comment has been minimized.

@captncraig

captncraig Jul 18, 2016

Contributor

Set/Delete belong on RuleConfWriter, but you may already know that.

AnnotateElasticHosts []string // CSV of Elastic Hosts, currently the only backend in annotate
AnnotateIndex string // name of index / table
GetMacro(string) *Macro

This comment has been minimized.

@captncraig

captncraig Jul 18, 2016

Contributor

No GetMacros or GetNotifications? Are those only looked at when an alert references them?

This comment has been minimized.

@kylebrandt

kylebrandt Jul 18, 2016

Member

GetMacro isn't actually used anywhere. Will remove for now

BulkEdit(BulkEditRequest) error
GetRawText() string
GetHash() string
SaveRawText(rawConf, diff, user, message string, args ...string) error

This comment has been minimized.

@captncraig

captncraig Jul 18, 2016

Contributor

I'm ok leaving this for now, but the rawText abstraction may get broken if/when we add an alternate backend

s = strings.TrimSpace(s)
n := c.GetNotification(s)
if n == nil {
continue // TODO error here?

This comment has been minimized.

@captncraig

captncraig Jul 18, 2016

Contributor

I think this is the correct behavior for now. But we will need to think about what happens if a config provider fails. All of those methods will likely need to return an error in the future.

}
type Macro struct {
Text string
Pairs interface{} // this is BAD TODO

This comment has been minimized.

@captncraig

captncraig Jul 18, 2016

Contributor

Is this your TODO?

This comment has been minimized.

@kylebrandt

kylebrandt Jul 18, 2016

Member

Before it was non exported, nodePair, Pairs []nodePair, so this was a shortcut to get it working when I moved the rule stuff to a different package

}
type sectionType int
type LocationType int

This comment has been minimized.

@captncraig

captncraig Jul 18, 2016

Contributor

What value does LocationType add? Wondering if we just leave it out for now.

@@ -0,0 +1,221 @@
package rule

This comment has been minimized.

@captncraig

captncraig Jul 18, 2016

Contributor

I'm a little shaky on calling this package rule. Not sure what would be better though. Maybe alerts?

This comment has been minimized.

@captncraig

captncraig Jul 18, 2016

Contributor

Or is this really the file provider? Not sure.

if err != nil {
slog.Fatal(err)
}
ruleProvider.SetSaveHook(cmdHook)

This comment has been minimized.

@captncraig

captncraig Jul 18, 2016

Contributor

Why does the ruleProvider need to be given the save hook. Can't the reload function below just execute it?

This comment has been minimized.

@kylebrandt

kylebrandt Jul 19, 2016

Member

The file is saved, and then the hook is called. If the hook fails, the orginal file is restored:

if c.saveHook != nil {
        err := c.callSaveHook(c.Name, user, message, args...)
        if err != nil {
            sErr := c.SaveConf(c)
            restore := "successful"
            if sErr != nil {
                restore = sErr.Error()
            }
            return fmt.Errorf("failed to call save hook: %v. Restoring config: %v", err, restore)
        }
    }
    err = c.reload()

Also the hook is passed arguments that reload doesn't need to be aware of. But maybe there is a better way?

reload = func() error {
select {
case reloading <- true:
fmt.Println("got lock")

This comment has been minimized.

@captncraig

captncraig Jul 18, 2016

Contributor

fmt

// ValidateSystemConf runs sanity checks on the system configuration
func ValidateSystemConf(sc SystemConfProvider) error {
if sc.GetSMTPHost() == "" && sc.GetEmailFrom() != "" || sc.GetSMTPHost() != "" && sc.GetEmailFrom() == "" {

This comment has been minimized.

@captncraig

captncraig Jul 20, 2016

Contributor

parens here please. I can never remember priorities.

This comment has been minimized.

@captncraig

captncraig Jul 20, 2016

Contributor

or how about:

hasHost := sc.GetSMTPHost() != ""
hasFrom := sc.GetEmailFrom() != ""
if hasFrom != hasHost{ERROR!}
type Template struct {
Text string
Vars
// A Lookup is used to return values based on the tags of a response. It

This comment has been minimized.

@captncraig

captncraig Jul 20, 2016

Contributor

all of these comments are really helpful btw. Thanks.

newRawConf.WriteString("\n")
return newRawConf.String()
}
newRawConf.WriteString(orginalRaw[:getLocationStart(l)])

This comment has been minimized.

@captncraig

captncraig Jul 20, 2016

Contributor

I love the emoji, but these incantations are repeated a few times. How about a helper on the Location type?

textBefore(originalRaw, l)
textAfter(originalRaw, l)

may not be worth it.

// NewSystemConf retruns a system conf with default values set
func newSystemConf() *SystemConf {
return &SystemConf{
CheckFrequency: Duration{Duration: time.Minute * 5},

This comment has been minimized.

@captncraig

captncraig Jul 20, 2016

Contributor

I still feel like some of these fields should be in the rule conf. Is there a technical reason they are here, or is it a choice?

CheckFrequency DefaultRunEvery MinGroupSize PingDuration(maybe) and UnknownThreshold all seem like good candidates for rule config.

This comment has been minimized.

@kylebrandt

kylebrandt Jul 20, 2016

Member

That was my original thought as well. The reason I put them in system was more of a permissions. This is because editing those will change the way all alerts in Bosun will run. So having those in the system conf seemed safer to me.

}
// LoadSystemConfig is like LoadSystemConfigFile but loads the config from a string
func LoadSystemConfig(conf string) (*SystemConf, error) {

This comment has been minimized.

@captncraig

captncraig Jul 20, 2016

Contributor

Can we reduce duplication here? Maybe LoadFile uses ioutil.ReadFile then calls this method?

results *expr.Results
error error
}
rc := make(chan res)

This comment has been minimized.

@captncraig

captncraig Jul 20, 2016

Contributor

Trying to reason about orphaned goroutines here. I think this chan should be buffered with size 1.

results *expr.Results
error error
}
rc := make(chan res)

This comment has been minimized.

@captncraig

captncraig Jul 20, 2016

Contributor

buffer of 1

t := a.Unknown
if t == 0 {
t = s.Conf.CheckFrequency * 2 * time.Duration(a.RunEvery)
runEvery := s.SystemConf.GetDefaultRunEvery()

This comment has been minimized.

@captncraig

captncraig Jul 20, 2016

Contributor

I think we should go back to precalculating runevery per alert. This also looks to me like a reason default run every and check freq should be moved to rule conf again.

if s.Conf.SkipLast {
func (s *Schedule) Close(reload bool) {
s.cancelChecks()
s.checksRunning.Wait()

This comment has been minimized.

@captncraig

captncraig Jul 20, 2016

Contributor

I feel like there should be a timeout here, but I'm not sure what we would do. I don't like the idea of a long running expression being unkillable. Maybe I'm worrying too much?

This comment has been minimized.

@kylebrandt

kylebrandt Jul 21, 2016

Member

The expressions are cancelled while running. So the only thing that we wait for to complete is the run history.

func MakeIncidentSummary(c conf.RuleConfProvider, s SilenceTester, is *models.IncidentState) IncidentSummaryView {
alert := c.GetAlert(is.AlertKey.Name())
if alert == nil {
panic("should not be called on nonexistant alert, for now at least TODO?")

This comment has been minimized.

@captncraig

captncraig Jul 20, 2016

Contributor

I hate panics. Return an err maybe? Return nil maybe? Is this only called from web requests?

@captncraig

This comment has been minimized.

Contributor

captncraig commented Jul 20, 2016

That's all I have for now.

@kylebrandt

This comment has been minimized.

Member

kylebrandt commented Jul 21, 2016

@captncraig Do you see any potential issues with dispatchNotifications which is run from schedule.Run() in alertRunner.go? I'm wonder if on reload (and currently in a shutdown) notifications could potentially be lost and never sent - that would be bad.

@@ -0,0 +1,52 @@
Thoughts:

This comment has been minimized.

@captncraig

captncraig Jul 26, 2016

Contributor

Lets go ahead and delete this at this point.

@kylebrandt kylebrandt merged commit 554d588 into master Jul 26, 2016

2 checks passed

bosun All checks Passed!
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@kylebrandt kylebrandt deleted the confIntWithReload branch Aug 18, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment