9volt

A modern, distributed monitoring system written in Go.

Another monitoring system? Why?

While there are a bunch of solutions for monitoring and alerting using time series data, there aren't many (or any?) modern solutions for 'regular'/'old-skool' remote monitoring similar to Nagios and Icinga.

9volt offers the following out of the box:

  • Single binary deploy
  • Fully distributed
  • Incredibly easy to scale to hundreds of thousands of checks
  • Uses etcd for all configuration
  • Real-time configuration pick-up (update etcd - 9volt immediately picks up the change)
  • Support for assigning checks to specific (groups of) nodes
    • Helpful for getting around network restrictions (or requiring certain checks to run from a specific region)
  • Interval-based monitoring (i.e. run check XYZ every 1s, 1y, 1d or even 1ms)
  • Natively supported monitors:
    • TCP
    • HTTP
    • Exec
    • DNS
  • Natively supported alerters:
    • Slack
    • Pagerduty
    • Email
  • RESTful API for querying current monitoring state and loaded configuration
  • Comes with a built-in, React-based UI that provides another way to view and manage the cluster
  • Comes with a built-in monitor and alerter config management util (that parses and syncs YAML-based configs to etcd)
    • ./9volt cfg --help

Usage

  • Install/setup etcd
  • Download latest 9volt release
  • Start server: ./9volt server -e http://etcd-server-1.example.com:2379 -e http://etcd-server-2.example.com:2379 -e http://etcd-server-3.example.com:2379
  • Optional: use 9volt cfg for managing configs
  • Optional: have 9volt managed by supervisord, upstart or another process manager
  • Optional: Several configuration params can be passed to 9volt via env vars

... or, if you prefer to do things via Docker, check out these docs.
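For a quick local (non-Docker) trial, the steps above boil down to something like the following sketch. It assumes a single-node etcd dev instance listening on its default client port (2379) and a 9volt binary in the current directory; adjust hostnames, ports and paths for a real deployment.

    # 1. Start a local, single-node etcd (listens on http://127.0.0.1:2379 by default)
    etcd &

    # 2. Start 9volt and point it at that etcd instance
    ./9volt server -e http://127.0.0.1:2379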

H/A and scaling

Scaling 9volt is incredibly simple. Launch another 9volt service on a separate host and point it to the same etcd hosts as the main 9volt service.
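For example, on a second host (a sketch reusing the same -e endpoints shown in the Usage section):

    # On host 2 (and any additional hosts): start another 9volt instance
    # pointed at the same etcd cluster -- no other coordination is required
    ./9volt server -e http://etcd-server-1.example.com:2379 -e http://etcd-server-2.example.com:2379 -e http://etcd-server-3.example.com:2379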

Your main 9volt node will produce output similar to this when it detects a node join:

[screenshot: node join output]

Checks will be automatically divided among all 9volt instances.

If one of the nodes goes down, a new leader is elected (if the node that went down was the previous leader) and its checks are redistributed among the remaining nodes.

This produces output similar to the following (also available in the event stream via the API and UI):

[screenshot: node leave output]

API

API documentation can be found here.

Minimum requirements (can handle ~1,000-3,000 checks with <10s intervals)

  • 1 x 9volt instance (1 core, 256MB RAM)
  • 1 x etcd node (1 core, 256MB RAM)

Note: In the minimum configuration, you could run both 9volt and etcd on the same node.

Recommended (production) requirements (can handle 10,000+ checks with <10s intervals)

  • 3 x 9volt instances (2+ cores, 512MB RAM)
  • 3 x etcd nodes (2+ cores, 1GB RAM)

Configuration

While you can manage 9volt alerter and monitor configs via the API, another approach to config management is to use the built-in config utility (9volt cfg <flags>).

This utility scans a given directory for any YAML files that resemble 9volt configs (a file must contain either a 'monitor' or an 'alerter' section) and automatically parses, validates and pushes them to your etcd server(s).

By default, the utility will keep your local configs in sync with your etcd server(s). In other words, if it comes across a config in etcd that does not exist locally (in your config file(s)), it will remove that config entry from etcd (and vice versa). This behavior can be turned off by passing the --nosync flag.

[screenshot: cfg run output]

You can look at an example of a YAML-based config here.
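If that link isn't handy, the sketch below shows roughly the shape such a config might take and how you might drop it into a directory for 9volt cfg to scan. The top-level 'monitor' and 'alerter' sections come from the description above, but the nested field names are hypothetical placeholders, not the documented schema; consult the linked example and the docs dir for the real fields.

    # Write an illustrative config into a directory for 9volt cfg to scan.
    # NOTE: everything below the 'monitor:'/'alerter:' keys is a hypothetical
    # placeholder, not the documented 9volt schema.
    mkdir -p configs
    cat > configs/example.yaml <<'EOF'
    monitor:
      example-http-check:          # hypothetical check name
        type: http                 # one of the natively supported monitors
        interval: 10s
    alerter:
      example-slack-alert:         # hypothetical alerter name
        type: slack                # one of the natively supported alerters
    EOF

    # See ./9volt cfg --help for the flags that point the utility at this
    # directory and at your etcd server(s).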

Docs

Read through the docs dir.

Suggestions/ideas

Got a suggestion/idea? Something that is preventing you from using 9volt over another monitoring system because of a missing feature? Submit an issue and we'll see what we can do!