Import docs into repo
See merge request splunk/broken_hosts!14
Tim Baldwin committed Nov 16, 2018
2 parents c551e8c + e34bed6 commit a0f1ce3
Showing 20 changed files with 856 additions and 0 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -3,3 +3,4 @@ local/
.DS_Store
metadata/local.meta
lookups/expectedTime.csv
docs/_build
21 changes: 21 additions & 0 deletions docs/LICENSE
@@ -0,0 +1,21 @@
The MIT License (MIT)

Copyright (c) 2018 Hurricane Labs, LLC

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
19 changes: 19 additions & 0 deletions docs/Makefile
@@ -0,0 +1,19 @@
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line.
SPHINXOPTS =
SPHINXBUILD = sphinx-build
SOURCEDIR = .
BUILDDIR = _build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
Binary file added docs/_static/broken_hosts_app_diagram.jpg
Binary file added docs/_static/broken_hosts_dashboard.png
Binary file added docs/_static/configure_broken_hosts_lookup.png
Binary file added docs/_static/investigation_dashboard.png
14 changes: 14 additions & 0 deletions docs/architecture/advanced.rst
@@ -0,0 +1,14 @@
Advanced Configuration
======================

In addition to all of the Splunk-native configurations, the Broken Hosts app has additional
internal configuration. These items are considered "advanced" and may or may not be useful to you.
These settings can be found in ``bh.conf``.

[validation]
------------

- ``comments_must_have_ticket_number`` (boolean) - Primarily intended for Hurricane Labs managed
  Splunk customers. Enforces a restriction on the ``comments`` field of the Broken Hosts Lookup,
  requiring a number of five or more digits to be entered for change management purposes (in the
  format ``#12345``). The default value for this setting is ``false``.
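
As a rough illustration, enabling this check might look like the following override. The stanza
and setting name come from the description above; placing the override in the app's
``local/bh.conf`` follows the usual Splunk convention and is an assumption here.

::

    # local/bh.conf (assumed location for local overrides)
    [validation]
    # Require a ticket reference such as #12345 in the comments field
    comments_must_have_ticket_number = true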
37 changes: 37 additions & 0 deletions docs/architecture/dashboards.rst
@@ -0,0 +1,37 @@
Dashboards
==========

Broken Hosts Dashboard
~~~~~~~~~~~~~~~~~~~~~~

.. image:: ../_static/broken_hosts_dashboard.png

The ``Broken Hosts`` dashboard is the main overview dashboard for the Broken Hosts app. It provides
you with a quick glance into hosts that are not sending data, hosts that are sending data with a
timestamp in the future, Eventtype Aggregations and Suppressions, and your "suppressed" items
(note: suppressed here refers to items with a ``lateSecs`` value of ``0``, meaning to never alert).
Clicking on any of the broken or future hosts will take you to the ``Investigation Dashboard``
where you can get additional information in order to troubleshoot the data.

Investigation Dashboard
~~~~~~~~~~~~~~~~~~~~~~~

.. image:: ../_static/investigation_dashboard.png

The ``Investigation`` dashboard can be used to troubleshoot why data has stopped coming in for a
particular ``index``/``sourcetype``/``host`` combination. The filters let you select the data you
are interested in, and you can also select the field to aggregate by. This is useful to
determine whether a particular host or source is having issues. You can also identify the
frequency at which data comes into Splunk in order to determine an appropriate ``lateSecs`` value,
and quickly see whether Splunk, or the host itself, was recently stopped or restarted.

Configure Broken Hosts Lookup
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. image:: ../_static/configure_broken_hosts_lookup.png

The ``Configure Broken Hosts Lookup`` dashboard is where you configure the ``lateSecs`` value for a
particular ``index``/``sourcetype``/``host`` combination. You can also provide comments and an
expiration time for the configuration (if, for example, you have a maintenance window for a
firewall and it is expected to be offline and not sending logs for a certain period of time). You
can also set the ``contact`` field if you're using the ``Broken Hosts Alert - by contact`` search.
53 changes: 53 additions & 0 deletions docs/architecture/eventtypes.rst
@@ -0,0 +1,53 @@
Eventtype Aggregations
======================

Using eventtypes to aggregate data
----------------------------------

A common request from users of older versions of Broken Hosts was to be able to aggregate certain
types of data together. For example, if any of the ``WinEventLog`` sourcetypes are coming in from
a particular Windows host, that's usually enough to feel comfortable that things are working as
expected. While this was possible in those versions of Broken Hosts thanks to the
``search_additions`` macro, that macro would quickly become complex and hard to manage. Starting
with Broken Hosts 4.0, however, there's now an easier mechanism for defining complex aggregations.

Eventtype Aggregations provide a simple, Splunk-native way to define these complex aggregations.
Eventtype Aggregations are eventtypes named in a specific format:
``bh_aggregate-$index,$sourcetype,$host``. The ``$index``, ``$sourcetype``, and ``$host`` here can
be replaced by either a field placeholder (``%orig_index%``, for example) or with a static value.
Using a static value, such as ``WinEventLog`` for ``$sourcetype``, allows you to group matching
data sources together.

This concept is likely best illustrated by an example: Imagine you have a pfSense firewall, along
with the pfSense TA. This means the syslog from your firewall is coming into Splunk, and is split
into several different sourcetypes. However, pfSense has one stream of syslog, and if any of these
sourcetypes is working, it is generally safe to assume that the syslog function in pfSense is
operational. To aggregate these sourcetypes together, you could use an eventtype similar to the
following:

::

    orig_sourcetype=pfsense*

You would then name this eventtype something like
``bh_aggregate-%orig_index%,pfsense,%orig_host%``. Once you have this created, you can add a single
line to the Broken Hosts Lookup, using the actual index and host, but using "pfsense" for the
sourcetype. Now, for each pfSense firewall you have, you will receive one alert if **all** of the
sourcetypes stop coming in for that firewall.
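
If you manage eventtypes in configuration files rather than through Splunk Web, the pfSense
aggregation above could be sketched as the following ``eventtypes.conf`` stanza. The stanza name
and search are taken from the example; where the file lives (for instance, the app's ``local``
directory) is an assumption.

::

    # eventtypes.conf
    # Group every pfsense* sourcetype under one "pfsense" entry per index and host
    [bh_aggregate-%orig_index%,pfsense,%orig_host%]
    search = orig_sourcetype=pfsense*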

Suppressions
------------

In addition to setting ``lateSecs`` to ``0`` in the Broken Hosts Lookup, the Broken Hosts app also
supports an eventtype-based suppression mechanism. This allows you to access all of the fields
available in the summary data, including the ``date_*`` fields, allowing you to create some very
complex suppressions using eventtypes that would otherwise be impossible with just the lookup. The
naming scheme for these eventtypes is ``bh_suppress-label``, where label can be any arbitrary text
(assuming it produces a valid eventtype name).

For example, if you wanted to suppress events in your proxy index off-hours, you could create an
eventtype called ``bh_suppress-proxy_off_hours`` similar to the following:

::

    orig_index=proxy date_wday="saturday" OR date_wday="sunday" OR date_hour<8 OR date_hour>17
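
Expressed as configuration, the off-hours suppression might look like the sketch below. The
eventtype name and search terms come from the example above; the parentheses are added only to
make the intended grouping explicit, and the file location is an assumption.

::

    # eventtypes.conf
    # Suppress proxy alerts on weekends and outside 08:00-17:59
    [bh_suppress-proxy_off_hours]
    search = orig_index=proxy (date_wday="saturday" OR date_wday="sunday" OR date_hour<8 OR date_hour>17)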
12 changes: 12 additions & 0 deletions docs/architecture/index.rst
@@ -0,0 +1,12 @@
Architecture
============

.. toctree::
   :maxdepth: 2

   searches
   macros
   eventtypes
   lookups
   dashboards
   advanced
69 changes: 69 additions & 0 deletions docs/architecture/lookups.rst
@@ -0,0 +1,69 @@
Broken Hosts Lookup
===================

::

    implemented in kvstore
    describe how ordering matters
    case matching
    wildcards
    how editing works through the dashboard

Using the Broken Hosts Lookup
-----------------------------

There are seven fields in this lookup table (all fields are case *insensitive*):

- ``index`` - The index for the data that you would like to match. This field accepts wildcards
  and is required.
- ``sourcetype`` - The sourcetype for the data that you would like to match. This field accepts
  wildcards and is required.
- ``host`` - The host for the data that you would like to match. This field accepts wildcards
  and is required.
- ``lateSecs`` - The amount of time (in seconds) that the index/sourcetype/host combination is
  allowed to be late before it alerts. This field is required.
- ``suppressUntil`` - Alerts for the index/sourcetype/host combination will be suppressed until
  this date. Because the value is parsed with the ``convert auto()`` function, you can use any
  date format that converts to a number. We recommend ``MM/DD/YYYY HH:MM:SS`` or epoch time. This
  field is optional.
- ``contact`` - The email address where you would like the alert to be sent. If this is blank,
  the email address from the ``default_contact`` macro will be used. This field is optional.
- ``comments`` - Any comments that you would like to add for that line of the lookup table. This
  information is not used in the alert. This field is typically used to record information about
  why the entry is needed, when it was added, who added it, or any other details. This field is
  optional.

Ordering
--------

Ordering of entries in the Broken Hosts Lookup is important, but the Broken Hosts app ships with
a saved search that re-orders the lookup table in a logical way. After several years of
analyzing expected behavior across our customers, we've settled on the following order:

1. Entries where index=\* AND sourcetype=\* AND alerting is temporarily suppressed
2. Entries where sourcetype=\* AND alerting is temporarily suppressed
3. Entries where index=\* AND alerting is temporarily suppressed
4. Entries where host=\* AND alerting is temporarily suppressed
5. Entries where index=\* AND host=\* AND alerting is temporarily suppressed
6. Entries where sourcetype=\* AND host=\* AND alerting is temporarily suppressed
7. Entries where alerting is temporarily suppressed
8. Entries where index=\* AND sourcetype=\* AND alerting is permanently suppressed
9. Entries where lateSecs is temporarily modified
10. Entries where sourcetype=\* AND lateSecs is temporarily modified
11. Entries where index=\* AND lateSecs is temporarily modified
12. Entries where host=\* AND lateSecs is temporarily modified
13. Entries where index=\* AND sourcetype=\* AND lateSecs is temporarily modified
14. Entries where index=\* AND host=\* AND lateSecs is temporarily modified
15. Entries where sourcetype=\* AND host=\* AND lateSecs is temporarily modified
16. Entries where alerting is permanently suppressed
17. Entries where lateSecs is permanently modified, or host=\* AND alerting is permanently
    suppressed, or host=\* AND lateSecs is permanently modified, or sourcetype=\* AND host=\* AND
    alerting is permanently suppressed
18. Entries where index=\* AND host=\* AND alerting is permanently suppressed
19. Entries where sourcetype=\* AND alerting is permanently suppressed
20. Entries where index=\* AND alerting is permanently suppressed
21. Entries where sourcetype=\* AND lateSecs is permanently modified
22. Entries where index=\* AND lateSecs is permanently modified
23. Entries where index=\* AND sourcetype=\* AND lateSecs is permanently modified
24. Entries where index=\* AND host=\* AND lateSecs is permanently modified
25. Entries where sourcetype=\* AND host=\* AND lateSecs is permanently modified
48 changes: 48 additions & 0 deletions docs/architecture/macros.rst
@@ -0,0 +1,48 @@
.. _macros:

Macros
======

bh_stats_gen_constraints
------------------------

The ``bh_stats_gen_constraints`` macro is used to control what data is examined by the
``bh_stats_gen`` search when generating the metrics used by the alerting searches. The default
behavior is to exclude all data in the summary index, and all data from the stash sourcetype, but
include all other data.

**NOTE**: This macro is used within a ``tstats`` command, and therefore the macro's definition
must be valid ``tstats`` syntax.
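
As a minimal sketch, an override that excludes one extra index might look like this. The index
name ``sandbox`` is hypothetical, and the definition shown is not the default shipped with the
app; the point is only that the definition is a plain ``tstats`` ``where`` filter.

::

    # macros.conf (illustrative override, not the shipped default)
    [bh_stats_gen_constraints]
    # Skip the summary index, the stash sourcetype, and a hypothetical "sandbox" index
    definition = index=* index!=summary index!=sandbox sourcetype!=stash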

bh_stats_gen_additions
----------------------

The ``bh_stats_gen_additions`` macro is used to insert arbitrary SPL into the ``bh_stats_gen``
search in order to transform data before it is written to the summary index.

Example: use ``eventstats`` and ``eval`` statements to calculate custom metrics to be stored in
the summary data.

bh_alert_additions
------------------

The ``bh_alert_additions`` macro is used to insert arbitrary SPL into the alerting searches, in
order to transform or filter the results before alerts are triggered.

Example: apply subsearch logic from a monitoring system to automatically exclude hosts that are
known to be offline.

default_contact
---------------

The ``default_contact`` macro is used only for the ``Broken Hosts Alert - by contact`` search. It
is used to set the default email address for items that don’t have a separate contact listed in
the ``contact`` column of the lookup table.

default_expected_time
---------------------

The ``default_expected_time`` macro is used to set a default ``lateSecs`` value for things not
defined in the lookup. The ``lateSecs`` value tells Broken Hosts how long a specific source of data
is allowed to go without sending data before an alert should be triggered. This setting is in
seconds, and defaults to 14400 (4 hours).
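
For instance, raising the default threshold to eight hours might look like the sketch below. It
assumes the macro's definition holds only the number of seconds, which this page does not
confirm; check the shipped definition before overriding it.

::

    # macros.conf (illustrative override)
    [default_expected_time]
    # 28800 seconds = 8 hours
    definition = 28800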
55 changes: 55 additions & 0 deletions docs/architecture/searches.rst
@@ -0,0 +1,55 @@
.. _searches:

Saved Searches
==============

bh_stats_gen
------------

The ``bh_stats_gen`` search is responsible for generating statistics about data coming into Splunk.
The results are written to the ``summary`` index, to be picked up and read by other searches for
alerting purposes. It can be fine-tuned using the ``bh_stats_gen_constraints`` and
``bh_stats_gen_additions`` macros.

Broken Hosts - Auto Sort
------------------------

The ``Broken Hosts - Auto Sort`` search was implemented in order to optimize the ordering of the
Broken Hosts Lookup. Because the lookup is evaluated in a first-match fashion, the ordering of the
lookup is critical to preventing incorrect matches. You can view more information about the
ordering of the lookup in the :doc:`lookups` documentation.

This search modifies the Broken Hosts Lookup in the following ways:

1. Entries are reordered based on the ordering rules defined in the :doc:`lookups` documentation.
2. All fields are converted to lower case, as the lookup is case insensitive.

Broken Hosts Alert Search
-------------------------

``Broken Hosts Alert Search`` is the recommended way to get started building your own custom
alerting search. This search produces a single output row for each broken item, and ignores the
``contact`` field from the lookup completely. There are no alert actions defined on this search,
so you are free to configure them as needed. A few examples include:

- Add an email alert action to send a tuning report to your Splunk admins
- Add a webhook alert action to create tickets in your ticketing system

You can also create clones of this search to enable different alerting for different types of data.
For example, you may want to send email notifications to your Windows server admins when a server
stops sending ``WinEventLog:Security`` but want to trigger a ticket to your helpdesk when your
anti-virus system stops sending logs. You can even run a version of this search on your
``Enterprise Security`` search head to generate notable events.
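
As one possible starting point, an email alert action could be attached to this search (or to a
clone of it) with a ``savedsearches.conf`` override along these lines. The stanza name is assumed
to match the search name used above, and the recipient address and subject are placeholders.

::

    # savedsearches.conf (illustrative override)
    [Broken Hosts Alert Search]
    # Turn on the built-in email alert action
    action.email = 1
    action.email.to = splunk-admins@example.com
    action.email.subject = Broken Hosts tuning report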

Broken Hosts Alert - by contact
-------------------------------

``Broken Hosts Alert - by contact`` is primarily intended for anyone upgrading from an older
version of Broken Hosts. This search groups the alert lines by the ``contact`` field from the
lookup, and each contact will receive one email (the email action is configured by default on this
search). This search also relies on the ``default_contact`` macro to populate the contact when
none is defined in the lookup table.

If you're coming from an older version of Broken Hosts and choose to implement this search, we'd
still recommend you review the new ``Broken Hosts Alert Search``, as you may find additional uses
for it that were difficult or impossible in previous versions of the app.
