Finishing up docs, adding seasonality search
hl-tstrawbridge committed May 30, 2023
1 parent 8f13ac0 commit 56c5738
Showing 5 changed files with 62 additions and 0 deletions.
1 change: 1 addition & 0 deletions README.md
@@ -40,4 +40,5 @@ https://brokenhosts.hurricanelabs.com
- `min_count`
- `search_additions`
- `wineventlog_index`
- `bh_volume_alerting_indexes`
- You can also require a ticket number in the comments when updating the table on the Configure Broken Hosts Lookup page. This option is available on the app's Setup page.
35 changes: 35 additions & 0 deletions default/savedsearches.conf
@@ -119,3 +119,38 @@ search = index=summary source=bh_stats_gen orig_index IN (`bh_volume_alerting_in
| eval moving_perc = round(moving_perc, 2)."%" \
| table orig_index, last_count, avg_hour_count, perc_low_count, moving_average, moving_perc, zscore\
| rename orig_index AS Index, last_count AS "Current Count", avg_hour_count AS "Average Count", perc_low_count AS "First Percentile Low Count", moving_average AS "Moving Average", moving_perc AS "Current Count Percentage of Moving Average", zscore AS "Standard Score"

[Broken Hosts Alert Search - Volume Alerting with Seasonality]
action.webhook.enable_allowlist = 0
alert.suppress = 0
alert.track = 1
counttype = number of events
cron_schedule = */60 * * * *
dispatch.earliest_time = -30d@d
dispatch.latest_time = @d
display.general.type = statistics
display.page.search.tab = statistics
enableSched = 1
quantity = 0
relation = greater than
request.ui_dispatch_app = broken_hosts
request.ui_dispatch_view = search
search = index=summary source=bh_stats_gen orig_index IN (`bh_volume_alerting_indexes`)\
| bin _time span=1h\
| stats sum(count) AS hour_count by orig_index, _time\
| eval hour=tonumber(strftime(_time,"%H"))\
| eval month_day=strftime(_time, "%m-%d")\
| eval weekday=strftime(_time,"%a")\
| eval is_holiday=case(month_day=="01-01",1,month_day=="07-04",1,month_day=="11-25",1,month_day=="12-24",1,month_day=="12-31",1,1=1,0)\
| eval weekday_weekend=if((weekday=="Sun" OR weekday=="Sat"), "weekend", "weekday")\
| eval biz_hours=if((((hour < 8) OR (hour > 18)) OR (weekday_weekend=="weekend") OR is_holiday==1), "No", "Yes")\
| streamstats window=36 avg(hour_count) as moving_average by orig_index\
| eval diff = hour_count - moving_average\
| eval moving_perc = (hour_count/moving_average) * 100\
| stats avg(hour_count) AS avg_hour_count, stdev(hour_count) AS stdev_hour_count, perc1(hour_count) AS perc_low_count, latest(hour_count) AS last_count, latest(moving_average) AS moving_average, latest(diff) AS diff, latest(moving_perc) AS moving_perc by orig_index, biz_hours\
| eval low_perc_ratio = perc_low_count/avg_hour_count\
| eval zscore = (last_count-avg_hour_count)/stdev_hour_count\
| where zscore < -3 OR (last_count < perc_low_count AND low_perc_ratio<.33) OR moving_perc < 2\
| eval moving_perc = round(moving_perc, 2)."%" \
| table orig_index, biz_hours, last_count, avg_hour_count, perc_low_count, moving_average, moving_perc, zscore\
| rename orig_index AS Index, last_count AS "Current Count", avg_hour_count AS "Average Count", perc_low_count AS "First Percentile Low Count", moving_average AS "Moving Average", moving_perc AS "Current Count Percentage of Moving Average", zscore AS "Standard Score", biz_hours AS "Within Business Hours"
1 change: 1 addition & 0 deletions docs/architecture/lookups.rst
@@ -67,3 +67,4 @@ follows:
23. Entries where index=\* AND sourcetype=\* AND lateSecs is permanently modified
24. Entries where index=\* AND host=\* AND lateSecs is permanently modified
25. Entries where sourcetype=\* AND host=\* AND lateSecs is permanently modified
26. Default entries
8 changes: 8 additions & 0 deletions docs/architecture/macros.rst
@@ -46,3 +46,11 @@ The ``default_expected_time`` macro is used to set a default ``lateSecs`` value
defined in the lookup. The ``lateSecs`` value tells Broken Hosts how long a specific source of data
is allowed to go without sending data before an alert should be triggered. This setting is in
seconds, and defaults to 14400 (4 hours).

bh_volume_alerting_indexes
--------------------------

The ``bh_volume_alerting_indexes`` macro is used by the searches
``Broken Hosts Alert - Volume Alerting`` and
``Broken Hosts Alert - Volume Alerting with Seasonality``. It contains a comma-separated list of
the indexes those searches should alert on.
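A minimal sketch of a local override, assuming placeholder index names ``firewall`` and ``proxy``
(substitute your own). Because both searches wrap the macro in
``orig_index IN (`bh_volume_alerting_indexes`)``, the definition is a bare comma-separated list
with no quotes or parentheses:

```ini
# local/macros.conf in the broken_hosts app (index names are placeholders)
[bh_volume_alerting_indexes]
definition = firewall, proxy
iseval = 0
```

With this definition, the searches expand to ``orig_index IN (firewall, proxy)``.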
17 changes: 17 additions & 0 deletions docs/architecture/searches.rst
@@ -53,3 +53,20 @@ none is defined in the lookup table.
If you're coming from an older version of Broken Hosts and choose to implement this search, we'd
still recommend you review the new ``Broken Hosts Alert Search``, as it may support uses that were
difficult or impossible in previous versions of the app.

Broken Hosts Alert - Volume Alerting
------------------------------------

``Broken Hosts Alert - Volume Alerting`` and
``Broken Hosts Alert - Volume Alerting with Seasonality`` are two example searches that alert on
indexes that may have stopped ingesting data properly while still receiving some logs. Both
searches use a combination of standard score (z-score), moving averages, and percentiles to
determine whether log volume is anomalously low for an index.
``Broken Hosts Alert - Volume Alerting with Seasonality`` additionally factors in the time of day,
the day of the week, and whether the day is a holiday to establish normal logging activity for
indexes whose volume depends on user activity. Concretely, it computes its baselines separately
for business hours (hours 08 through 18 on weekdays that are not in its fixed holiday list) and
for all other hours.
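The alert condition in the seasonality search's ``where`` clause can be summarized as follows. For
each index and business-hours bucket $b$, $x_b$ is the latest hourly count, $\mu_b$ and
$\sigma_b$ are the mean and standard deviation of hourly counts over the search window,
$P_{1,b}$ is the first-percentile hourly count, and $M_b$ is the latest 36-row moving average
computed by ``streamstats``. The numbers on the last line are illustrative, not taken from real
data.

$$
\begin{aligned}
z_b &= \frac{x_b - \mu_b}{\sigma_b}, \qquad m_b = 100 \cdot \frac{x_b}{M_b} \\
\text{alert} &\iff z_b < -3 \;\lor\; \left( x_b < P_{1,b} \;\land\; \frac{P_{1,b}}{\mu_b} < 0.33 \right) \;\lor\; m_b < 2 \\
\mu_b = 1000,\ \sigma_b = 100,\ x_b = 650 &\implies z_b = \frac{650 - 1000}{100} = -3.5 < -3 \implies \text{alert}
\end{aligned}
$$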

Both searches read the indexes to alert on from the ``bh_volume_alerting_indexes`` macro. If you
run both ``Broken Hosts Alert - Volume Alerting`` and
``Broken Hosts Alert - Volume Alerting with Seasonality`` and want each to cover a different set of
indexes, create a new macro and reference it in one of the searches in place of
``bh_volume_alerting_indexes``.
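For example, assuming a hypothetical second macro named ``bh_volume_alerting_seasonal_indexes``
(both the name and the index list are placeholders), the indexes for the seasonality search could
be defined separately:

```ini
# local/macros.conf: hypothetical macro used only by the seasonality search
[bh_volume_alerting_seasonal_indexes]
definition = wineventlog, o365
iseval = 0
```

In the seasonality search, ``orig_index IN (`bh_volume_alerting_indexes`)`` would then become
``orig_index IN (`bh_volume_alerting_seasonal_indexes`)``, while
``Broken Hosts Alert - Volume Alerting`` keeps using the original macro. If you make this change in
``local/savedsearches.conf``, note that a ``search`` setting there replaces the shipped search
string entirely, so the full pipeline has to be copied, not just the changed line.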
