Delete a reminder if the corresponding healthcheck is blacklisted #201

Merged
merged 1 commit into from Oct 15, 2017

Gerrrr
Collaborator

@Gerrrr Gerrrr commented Oct 7, 2017

Solves #198.

Problem

Currently, if a health check is already blacklisted when it fails, no alert or reminder is created. However, if a health check fails first and is blacklisted afterwards, the reminder that was created for it can survive and keep alerting.

Steps to reproduce:

$ consul agent -dev
==> Starting Consul agent...
==> Starting Consul agent RPC...
==> Consul agent running!
           Version: 'v0.7.5'
           Node ID: 'fb2170a8-257d-3c64-b14d-bc06cc94e34c'
         Node name: 'Aleksandrs-MacBook-Pro.local'
        Datacenter: 'dc1'
            Server: true (bootstrap: false)
       Client Addr: 127.0.0.1 (HTTP: 8500, HTTPS: -1, DNS: 8600, RPC: 8400)
      Cluster Addr: 127.0.0.1 (LAN: 8301, WAN: 8302)
    Gossip encrypt: false, RPC-TLS: false, TLS-Incoming: false
             Atlas: <disabled>

==> Log data will now stream in as it occurs:
...

$ nomad agent -dev
    No configuration files loaded
==> Starting Nomad agent...
==> Nomad agent configuration:

                 Atlas: <disabled>
                Client: true
             Log Level: DEBUG
                Region: global (DC: dc1)
                Server: true
               Version: 0.5.6

==> Nomad agent started! Log data will stream in below:
...

Consul-alerts config:

consul-alerts/config/notif-profiles/log_with_reminders:

{
  "Interval": 1,
  "NotifList": {
    "log":true,
    "email":false
  }
}

consul-alerts/config/notif-selection/hosts/Aleksandrs-MacBook-Pro.local:

log_with_reminders

To trigger a reminder, I blocked port 4646 (the default Nomad HTTP port) so that the Nomad health checks fail. On OS X, I added the following line to /private/etc/pf.conf:

block drop quick on lo0 proto tcp from any to any port = 4646

Then reloaded the configuration:

sudo pfctl -f /etc/pf.conf

Then I started Consul-alerts and enabled pf:

$ sudo pfctl -e

At this stage, Nomad Server HTTP Check and Nomad Client HTTP Check were in the critical state. Consul-alerts wrote notification messages to the log file and created reminders (expected behavior).

Then I blacklisted the node by creating the key consul-alerts/config/checks/blacklist/nodes/Aleksandrs-MacBook-Pro.local in Consul. Despite the blacklist entry, the existing reminders kept writing alert messages to the log file.
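
For anyone who wants to script the blacklisting step, here is a minimal sketch that creates the same key through Consul's KV HTTP endpoint (`PUT /v1/kv/<key>`). The helper names `nodeBlacklistKey`, `kvURL` and `blacklistNode` are made up for this example and are not part of consul-alerts. It assumes the dev agent from the steps above is listening on 127.0.0.1:8500 and that the mere existence of the key (with an empty value) is enough to blacklist the node.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"strings"
)

// nodeBlacklistKey returns the KV key used in the reproduction steps
// to blacklist a node. (Hypothetical helper, not a consul-alerts function.)
func nodeBlacklistKey(node string) string {
	return "consul-alerts/config/checks/blacklist/nodes/" + node
}

// kvURL builds the Consul KV endpoint URL for a key, escaping each
// path segment while keeping the "/" separators of the key intact.
func kvURL(consulAddr, key string) string {
	parts := strings.Split(key, "/")
	for i, p := range parts {
		parts[i] = url.PathEscape(p)
	}
	return strings.TrimRight(consulAddr, "/") + "/v1/kv/" + strings.Join(parts, "/")
}

// blacklistNode creates the blacklist key with an empty value.
// Assumption: only the existence of the key matters, not its value.
func blacklistNode(client *http.Client, consulAddr, node string) error {
	req, err := http.NewRequest(http.MethodPut, kvURL(consulAddr, nodeBlacklistKey(node)), nil)
	if err != nil {
		return err
	}
	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("unexpected status from Consul: %s", resp.Status)
	}
	return nil
}

func main() {
	// Assumes a local dev agent on the default HTTP port 8500.
	err := blacklistNode(http.DefaultClient, "http://127.0.0.1:8500", "Aleksandrs-MacBook-Pro.local")
	if err != nil {
		fmt.Println("blacklisting failed:", err)
	}
}
```

Creating the key in the Consul UI or with the `consul kv put` command should have the same effect; the sketch only makes the step repeatable.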

Solution

Delete a reminder if the corresponding health check is blacklisted. The appropriate place is CheckProcessor.reminderRun, because it is the only place that regularly checks reminders, renews them, and queues them for notification.
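
To illustrate the idea, here is a simplified sketch of the filtering step. It is not the code merged in this PR: the `Reminder` struct, `pruneReminders`, the `isBlacklisted` callback, and the service and check names in `main` are all hypothetical stand-ins, and the real reminder loop also has to delete the reminder from the KV store and handle renewal.

```go
package main

import "fmt"

// Reminder is a hypothetical, simplified reminder record; the real
// consul-alerts type has more fields.
type Reminder struct {
	Node    string
	Service string
	CheckID string
}

// pruneReminders separates reminders whose check is blacklisted (drop)
// from those that should still be renewed and queued (keep).
// isBlacklisted is a hypothetical callback standing in for the
// blacklist lookup in Consul's KV store.
func pruneReminders(reminders []Reminder, isBlacklisted func(Reminder) bool) (keep, drop []Reminder) {
	for _, r := range reminders {
		if isBlacklisted(r) {
			drop = append(drop, r)
			continue
		}
		keep = append(keep, r)
	}
	return keep, drop
}

func main() {
	// Illustrative data only; the service and check IDs are invented.
	reminders := []Reminder{
		{Node: "Aleksandrs-MacBook-Pro.local", Service: "nomad", CheckID: "nomad-server-http"},
		{Node: "other-node", Service: "nomad-client", CheckID: "nomad-client-http"},
	}
	blacklistedNodes := map[string]bool{"Aleksandrs-MacBook-Pro.local": true}

	keep, drop := pruneReminders(reminders, func(r Reminder) bool {
		return blacklistedNodes[r.Node]
	})
	for _, r := range drop {
		fmt.Println("delete reminder:", r.Node, r.CheckID)
	}
	for _, r := range keep {
		fmt.Println("keep reminder:", r.Node, r.CheckID)
	}
}
```

Doing the check inside the periodic reminder loop means a reminder disappears on the next pass after its check is blacklisted, without needing a separate watcher on the blacklist keys.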


@fusiondog Can you please review?

@Gerrrr Gerrrr requested a review from fusiondog October 7, 2017 19:29
@djenriquez

Awesome!

@fusiondog fusiondog merged commit e7a911f into AcalephStorage:master Oct 15, 2017
dagvl added a commit to vimond/consul-alerts that referenced this pull request Jul 5, 2018
* upstream/branch/master: (68 commits)
  End-to-end tests in Travis against Consul 1.0.x (AcalephStorage#226) (AcalephStorage#228)
  Fix typo in README
  Notify Mattermost via Incoming Webhooks (AcalephStorage#206)
  Test in travis against multiple versions of Consul (AcalephStorage#202)
  update reminder state with CAS instead of Put (AcalephStorage#193)
  Delete a reminder if the corresponding healthcheck is blacklisted (AcalephStorage#201)
  Trim config parameters (AcalephStorage#203)
  Giving clearer examples on slack setup
  Add tests for SlackNotifier unmarshalling from json
  Enable cluster name override on slack integration
  Update README.md (AcalephStorage#196)
  ACL example (AcalephStorage#194)
  Add cluster-name to awssns notifier
  Add templating to awssns notifier
  Add test for aws-sns-notifier
  Extract templating so it can be used from multiple notifiers
  bump version (AcalephStorage#189)
  Travis runs tests now (AcalephStorage#188)
  README entry for the specific tresholds
  Tests for consul.GetChangeThreshold
  ...

# Conflicts:
#	Dockerfile
#	notifier/opsgenie-notifier.go