0.5.0

captncraig released this on Jun 21, 2016

0.5.0 Release Notes

Bosun

This is our first non-preview release in 9 months, and it includes significant changes since 0.4.0. Future releases should come at a quicker pace, since this release involved a complete refactor of Bosun's internal storage.

  • We have moved Bosun's internal storage from purely in-memory (serialized to bolt) to redis/ledis and refactored the code to be more incident-based. In 0.4.0 the dashboard could sometimes take 10-30 seconds to load; it should now take no more than a second under normal conditions, and should be even faster in a future release. The change also results in faster startup times for Bosun and other performance improvements.
  • Deprecated the logstash queries and replaced them with more generic elastic functions. These support different time formats and index naming schemes, and add more search possibilities.
  • Added support for basic series operations. Previously, operators could only be applied to seriesSets by combining them with scalars or numberSets. Now you can do operations between two seriesSets, such as q(...) * q(...).
  • Added various functions to the expression language:
    • merge, shift, and over: combine series sets, shift series in time, and show time-over-time graphs when querying OpenTSDB
    • series: Manually construct a series; useful for testing and drawing lines on graphs
    • month: Get the start or end of the calendar month; useful for alerts that follow the calendar, such as bandwidth billing
    • tod: Turns a number of seconds into a duration string, so query durations can be computed with arithmetic
    • linelr: Draws a line to visualize the result of forecastlr
  • Added support for OpenTSDB 2.2 filters, which allow you to aggregate a subset of tag values for a tag key.
  • Added annotations to Bosun, which are stored in Elastic (the design allows other backends to be supported in the future). Bosun annotations have a start and end time, so you can use them to capture outages and maintenance windows.
  • Added Grafana integration via the new Bosun Grafana Plugin
  • Added a route for OpServer integration (Preview)
  • Added bar graphs to the expression page when the result is a numberSet
  • Added forceClose and Purge actions
  • Improved incident filters (now supports AND, OR, !, and () grouping)
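To illustrate what an operation between two series computes, here is a minimal Go sketch. It assumes each series is a map from Unix timestamp to value and that only timestamps present in both series produce an output point; the Series type and combine function are hypothetical, and Bosun's actual alignment and tag-matching rules for seriesSets may differ.

```go
package main

import (
	"fmt"
	"sort"
)

// Series maps a Unix timestamp (seconds) to a value, loosely mirroring one
// series in a seriesSet. This type is a hypothetical illustration.
type Series map[int64]float64

// combine applies op to every timestamp present in both a and b.
// Timestamps that appear in only one series are skipped (an assumption).
func combine(a, b Series, op func(x, y float64) float64) Series {
	out := make(Series)
	for ts, av := range a {
		if bv, ok := b[ts]; ok {
			out[ts] = op(av, bv)
		}
	}
	return out
}

func main() {
	requests := Series{0: 10, 60: 20, 120: 30}
	latency := Series{0: 0.5, 60: 0.25, 180: 1}

	// Roughly the shape of q(...) * q(...): multiply point by point.
	product := combine(requests, latency, func(x, y float64) float64 { return x * y })

	// Print in timestamp order for readability.
	keys := make([]int64, 0, len(product))
	for ts := range product {
		keys = append(keys, ts)
	}
	sort.Slice(keys, func(i, j int) bool { return keys[i] < keys[j] })
	for _, ts := range keys {
		fmt.Println(ts, product[ts])
	}
}
```

Dropping unmatched timestamps keeps the result free of invented values; an alternative design would interpolate or fill gaps, which changes the result near missing data.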
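For reference, this is roughly what an OpenTSDB 2.2 query using filters looks like when built in Go. The JSON field names (type, tagk, filter, groupBy) and the literal_or filter type come from OpenTSDB's /api/query interface; the metric, host names, and Go types are made up for illustration, and this is not necessarily how Bosun builds its requests.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Filter mirrors one entry of the "filters" array in an OpenTSDB 2.2 sub-query.
type Filter struct {
	Type    string `json:"type"`
	Tagk    string `json:"tagk"`
	Filter  string `json:"filter"`
	GroupBy bool   `json:"groupBy"`
}

// SubQuery is a trimmed-down sub-query; real requests support more fields.
type SubQuery struct {
	Aggregator string   `json:"aggregator"`
	Metric     string   `json:"metric"`
	Filters    []Filter `json:"filters"`
}

// Query is the top-level request body for POST /api/query.
type Query struct {
	Start   string     `json:"start"`
	Queries []SubQuery `json:"queries"`
}

// encode serializes q into the JSON body that would be sent to OpenTSDB.
func encode(q Query) (string, error) {
	b, err := json.Marshal(q)
	return string(b), err
}

func main() {
	q := Query{
		Start: "1h-ago",
		Queries: []SubQuery{{
			Aggregator: "sum",
			Metric:     "os.net.bytes",
			// groupBy=false: keep only web01 and web02 and sum them into a
			// single series instead of returning one series per host.
			Filters: []Filter{{Type: "literal_or", Tagk: "host", Filter: "web01|web02", GroupBy: false}},
		}},
	}
	body, err := encode(q)
	if err != nil {
		panic(err)
	}
	fmt.Println(body)
}
```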

Upgrading

Before upgrading to this version, decide whether you want to use a dedicated redis instance (recommended for production use) or the embedded ledisdb instance (the default). Instructions for configuring redis/ledis can be found on our website. The first time Bosun starts with this version, it will migrate all data from the old boltdb file into the new redis store; after that, the bolt file is no longer needed. Back up your bolt state file before upgrading, and note that Bosun may take several minutes to start while the migration runs.

Breaking Changes you need to be aware of

  • When using OpenTSDB 2.2, wildcard expansion in queries is now done by OpenTSDB instead of Bosun. This is cleaner and performs better, but mixing wildcards and alternation (for example, *foo*|*baz*) is no longer supported until OpenTSDB supports it. Bosun will not warn you about this, so alerts that use it may silently fail. Be sure to look for these patterns in your config before upgrading.
  • The graphite backend now rejects tags that are not valid Bosun tags (which have the same restrictions as OpenTSDB tags), since some of them could cause panics. Only the following characters are allowed: a to z, A to Z, 0 to 9, -, _, ., / or Unicode letters. Some graphite-based alerts may require updating.
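Because Bosun will not warn about filters that mix wildcards and alternation, a quick scan of the config before upgrading can help. The Go sketch below is a rough heuristic: it only inspects {key=value,...} tag sections that sit on a single line and flags values containing both * and |. It does not expand macros or variables, so it can miss cases; findMixedFilters is a hypothetical helper, not part of Bosun.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// tagBlock matches a {...} tag section such as {host=*foo*|*baz*,env=prod}.
// This is a rough heuristic, not Bosun's own query parser.
var tagBlock = regexp.MustCompile(`\{([^{}]*)\}`)

// findMixedFilters reports, with 1-based line numbers, every tag value that
// combines a wildcard (*) with alternation (|).
func findMixedFilters(config string) []string {
	var hits []string
	sc := bufio.NewScanner(strings.NewReader(config))
	for line := 1; sc.Scan(); line++ {
		for _, m := range tagBlock.FindAllStringSubmatch(sc.Text(), -1) {
			for _, pair := range strings.Split(m[1], ",") {
				kv := strings.SplitN(pair, "=", 2)
				if len(kv) != 2 {
					continue
				}
				v := strings.TrimSpace(kv[1])
				if strings.Contains(v, "*") && strings.Contains(v, "|") {
					hits = append(hits, fmt.Sprintf("line %d: %s=%s", line, strings.TrimSpace(kv[0]), v))
				}
			}
		}
	}
	return hits
}

func main() {
	conf := `alert cpu {
	crit = avg(q("avg:os.cpu{host=*foo*|*baz*}", "5m", "")) > 90
}
alert mem {
	crit = avg(q("avg:os.mem.used{host=ny-web01|ny-web02}", "5m", "")) > 90
}`
	for _, h := range findMixedFilters(conf) {
		fmt.Println(h) // line 2: host=*foo*|*baz*
	}
}
```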
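The character rule quoted above can be expressed as a small Go check. This is a sketch of the stated rule, not Bosun's actual validation code; treating the empty string as invalid is an assumption.

```go
package main

import (
	"fmt"
	"unicode"
)

// validTag reports whether s uses only the allowed characters:
// a-z, A-Z, 0-9, '-', '_', '.', '/', or Unicode letters.
// Rejecting the empty string is an assumption.
func validTag(s string) bool {
	if s == "" {
		return false
	}
	for _, r := range s {
		switch {
		case unicode.IsLetter(r), r >= '0' && r <= '9':
			// letters (ASCII or Unicode) and ASCII digits are allowed
		case r == '-', r == '_', r == '.', r == '/':
			// the four allowed punctuation characters
		default:
			return false
		}
	}
	return true
}

func main() {
	for _, tag := range []string{"web-01.example", "käse", "disk:sda", "has space"} {
		fmt.Printf("%q valid=%v\n", tag, validTag(tag))
	}
}
```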

In Progress for Future Release

  • Config reloading without requiring the bosun process to be restarted
  • (Experiment) Making the last available data accessible through the expression language, so some alerts could still work if OpenTSDB is down
  • (Experiment) Working with distributions in addition to series (i.e. histograms)
  • Better post notifications

scollector

  • New collectors:
    • systemd service collector
    • varnish
    • Oracle
    • status.io pages
    • Fastly API stats
    • ExtraHop
    • Elastic v2
    • Google Analytics collector
    • Nexpose collector
    • Cisco IOS BGP information (via SNMP)
    • Fortinet SNMP collector
    • cAdvisor
    • Support for MSSQL Named Instances
  • Added a local listener so datapoints can be passed to scollector via HTTP
  • Bad datapoints in a batch no longer invalidate the entire batch
  • Better filtering options for excluding collectors and/or specific metrics
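The idea behind the batch change can be sketched in Go: validate each datapoint individually, send the ones that pass, and report the rest, rather than rejecting the whole batch. The DataPoint type and the specific checks in validate are assumptions for illustration, not scollector's actual rules.

```go
package main

import (
	"fmt"
	"math"
)

// DataPoint is a simplified, hypothetical stand-in for an OpenTSDB-style datapoint.
type DataPoint struct {
	Metric    string
	Timestamp int64
	Value     float64
	Tags      map[string]string
}

// validate returns an error describing why d cannot be sent, or nil.
// The specific checks are assumptions chosen for illustration.
func validate(d DataPoint) error {
	switch {
	case d.Metric == "":
		return fmt.Errorf("empty metric name")
	case math.IsNaN(d.Value) || math.IsInf(d.Value, 0):
		return fmt.Errorf("%s: non-finite value", d.Metric)
	case len(d.Tags) == 0:
		return fmt.Errorf("%s: no tags", d.Metric)
	}
	return nil
}

// splitBatch keeps the valid points and collects an error for each invalid
// one, so a single bad point no longer causes the whole batch to be rejected.
func splitBatch(batch []DataPoint) (good []DataPoint, errs []error) {
	for _, d := range batch {
		if err := validate(d); err != nil {
			errs = append(errs, err)
			continue
		}
		good = append(good, d)
	}
	return good, errs
}

func main() {
	tags := map[string]string{"host": "ny-web01"}
	batch := []DataPoint{
		{"os.cpu", 1466500000, 12.5, tags},
		{"os.mem.used", 1466500000, math.NaN(), tags},
		{"", 1466500000, 1, tags},
	}
	good, errs := splitBatch(batch)
	fmt.Println(len(good), "sent;", len(errs), "dropped")
	for _, err := range errs {
		fmt.Println("dropped:", err)
	}
}
```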

TSDBRelay

  • Added "external counters" for infrequent or sporadic metrics. These are counters that can receive increments from multiple sources.
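A minimal Go sketch of the external counter idea: increments from several sources are summed into one running total per metric and tag set. The CounterStore type is hypothetical and keeps everything in memory; it says nothing about how or where TSDBRelay actually stores its counters or how they are reported.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// CounterStore accumulates increments for counters identified by a metric
// name plus tag set. Hypothetical, in-memory illustration only.
type CounterStore struct {
	mu     sync.Mutex
	counts map[string]int64
}

func NewCounterStore() *CounterStore {
	return &CounterStore{counts: make(map[string]int64)}
}

// key builds a canonical identifier so the same metric and tags always map
// to the same counter, regardless of map iteration order.
func key(metric string, tags map[string]string) string {
	pairs := make([]string, 0, len(tags))
	for k, v := range tags {
		pairs = append(pairs, k+"="+v)
	}
	sort.Strings(pairs)
	return metric + "{" + strings.Join(pairs, ",") + "}"
}

// Add records an increment; it is safe to call from many goroutines,
// e.g. one per reporting source.
func (c *CounterStore) Add(metric string, tags map[string]string, delta int64) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[key(metric, tags)] += delta
}

// Get returns the running total for a counter (0 if never incremented).
func (c *CounterStore) Get(metric string, tags map[string]string) int64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.counts[key(metric, tags)]
}

func main() {
	store := NewCounterStore()
	tags := map[string]string{"job": "nightly-backup"}

	var wg sync.WaitGroup
	for src := 0; src < 3; src++ { // three independent sources
		wg.Add(1)
		go func() {
			defer wg.Done()
			store.Add("backup.runs", tags, 1)
		}()
	}
	wg.Wait()
	fmt.Println(store.Get("backup.runs", tags)) // 3
}
```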

The following are the autogenerated release notes:

other:

  • Not allowing invalid datapoints to ruin entire batches #1779
  • Add trafficSource tracking for detailed GA metrics. #1780
  • document lookupSeries. fix #1035 #1760
  • List open incidents #1764
  • revendor annotate after repo move #1708
  • Add html function to templates #1721
  • Adds resource reference to bosun_emitter #1727
  • Remove unnecessary go get in travis #1706
  • Fix merge #1673
  • Small utility to clean up search data for a metric. #1632
  • tod(scalar) was returning minutes as hours #1677
  • Annotate edit view #1636
  • moving version to _version #1637
  • working party of elastic.v3 #1562
  • Elastic v2 support and elastic expr refactor #1561
  • Escape soapLogin credentials #1612
  • Fixed link and Go version #1609

scollector:

  • fixing redis counters collector to work with ledis or redis #1732
  • fastly collector #1728
  • fastly status.io monitoring and status.io lib #1735
  • Additional functionality into ExtraHop collector #1698
  • Make MaxMem kill switch configurable #1652
  • fix error: interval.go:64: c_google_analytics: #1641
  • negative -f filters and total_time metric for httpunit collector #1630
  • add metadata to redis collector #1622
  • Fix for unset MaxQueueLen #1596

docs:

  • Documenting force close and purge actions. #1697
  • Fixing of typos and avoiding a potential ambiguity #1691

bosun/expr:

  • fix crash on invalid graphite tags #1663

collect:

  • Flush purges internal collections as well. #1552

bosun:

  • squelched keys don't go unknown. #1790
  • add missing redis connection close #1755
  • expr.execute refactor #1775
  • fill in unknown subject in incident view #1772
  • Fix ungroup to actually return a scalar #1744
  • Allow series func to create empty group #1745
  • move graphite and tsdb funcs to their own files #1742
  • series operations #1672
  • support png on egraph api route #1712
  • fix braced variable expansion when used in macros #1722
  • skiplast cmdline switch for development #1725
  • add month func to get end of or start of month #1740
  • stopping notifications that should no longer fire #1716
  • fix query links on expr page #1675
  • link multiple queries from graph UI #1676
  • clear filters when changing metrics in graph view #1685
  • support actions by incident id #1696
  • fix issue removing last annotation from graph view #1639
  • don't autocomplete old metrics on graph page #1660
  • func to turn seconds (scalars) to duration string #1669
  • series func to create series from scalars #1667
  • add Ledis bind address config option #1651
  • convert datastore to Ledisdb/redis implementation. #1332
  • don't panic on opentsdb version #1629
  • adding "unknownIsNormal" flag to alerts to convert unknown events into normal ones. #1620
  • push annotations to top on graph page #1619
  • add remote mac addresses to the host API #1551
  • add UnmarshalJSON() to Status type #1555
  • 2x API routes: Metadata for all metrics, metrics per tagk #1560
  • denormalized metadata should resolve to parent metric #1519
  • Show bar graph on the expr page if type == number #1011
  • annotate support #1610
  • fixing ledis error saving temp config. #1614
  • Auto-Closing open alerts if alert doesn't exist anymore. #1604
  • angular from 1.2.x to 1.5 #1603
  • Implementing purge and forceClose actions #1599
  • Add linelr func #1602
  • optimizing redis access to get tag sets #1575
  • serving temporary configs from redis. #1593
  • Notifications moved to redis #1592
  • fixing bug with chained notifications. #1584
  • fixing empty email body, and handling absence better #1588
  • fix migration from new install #1581
  • performing a preliminary save on new incidents to keep templates consistent #1605
  • Don't create filter in Opentsdb v2.1 #1569
  • Add over, shift, and merge funcs #1598
