Merge pull request #436 from earthgecko/SNAB
Update documentation
earthgecko committed Aug 2, 2021
2 parents 70b8eea + 81b8786 commit d6a4763
Showing 48 changed files with 1,591 additions and 569 deletions.
52 changes: 47 additions & 5 deletions docs/alerts.rst
@@ -2,12 +2,13 @@
Alerts
======

Alerts in Skyline are configured via two sets of settings. There are the
Alerts in Skyline are configured via multiple sets of settings. There are the
Analyzer (Mirage and Ionosphere) related settings and the Boundary related
alert settings. This is due to the two classes of alerts being different,
with Analyzer, Mirage and Ionosphere alerts being related to anomalies and
Boundary alerts being related to breaches of the static and dynamic thresholds
defined for Boundary metrics.
defined for Boundary metrics. Further to this there are alert related settings
for each alert output route, namely smtp, slack, pagerduty and sms.

Required smtp alerter for Analyzer and Mirage metrics
=====================================================
Expand Down Expand Up @@ -36,30 +37,50 @@ and :mod:`settings.SMTP_OPTS` would need to look like this.
.. code-block:: python

    ALERTS = (
        ('carbon', 'smtp', 3600), # Analyzer metric, only analysed at FULL_DURATION
        ('carbon', 'smtp', 3600), # Analyzer metric, only analysed at FULL_DURATION - Mirage NOT enabled
        ('skyline', 'smtp', 3600, 168), # Mirage enabled as 168 is passed as SECOND_ORDER_RESOLUTION_HOURS
        ('stats', 'smtp', 1800, 168), # Mirage enabled as 168 is passed as SECOND_ORDER_RESOLUTION_HOURS
        ('telegraf', 'smtp', 3600, 168), # Mirage enabled as 168 is passed as SECOND_ORDER_RESOLUTION_HOURS
        # smtp alert tuples MUST be declared first
        ('carbon', 'slack', 3600),
        ('skyline', 'slack', 3600),
        ('stats', 'slack', 1800),
        ('telegraf', 'slack', 3600),
    )

    SMTP_OPTS = {
        # This specifies the sender of email alerts.
        'sender': 'no_email',
        # recipients is a dictionary mapping metric names
        # (exactly matching those listed in ALERTS) to an array of e-mail addresses
        'recipients': {
            'stats': ['no_email'],
            'skyline': ['no_email'],
            'carbon': ['no_email'],
            'skyline': ['no_email'],
            'stats': ['no_email'],
            'telegraf': ['no_email'],
        },
        # This is the default recipient which acts as a catchall for alert tuples
        # that do not have a matching namespace defined in recipients
        'default_recipient': ['no_email'],
        'embed-images': True,
    }

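As a rough illustration of how the alert tuples in this example can be read, the following sketch parses an ``ALERTS`` tuple and checks the smtp-first ordering. It is not Skyline's own code: ``Alert``, ``parse_alert`` and ``smtp_declared_first`` are hypothetical names, and treating the third element as an expiry in seconds is an assumption. Only the meaning of the optional fourth element (``SECOND_ORDER_RESOLUTION_HOURS``, which enables Mirage) and the smtp-first rule are taken from the comments in the example.

```python
from typing import NamedTuple, Optional


class Alert(NamedTuple):
    namespace: str
    alerter: str
    expiration_seconds: int  # assumption: third element is an expiry in seconds
    second_order_resolution_hours: Optional[int] = None

    @property
    def mirage_enabled(self) -> bool:
        # Per the comments in the example, passing a fourth element
        # (SECOND_ORDER_RESOLUTION_HOURS) is what enables Mirage.
        return self.second_order_resolution_hours is not None


def parse_alert(alert_tuple) -> Alert:
    """Build an Alert from a 3- or 4-element ALERTS tuple (hypothetical helper)."""
    if len(alert_tuple) not in (3, 4):
        raise ValueError('expected 3 or 4 elements, got %d' % len(alert_tuple))
    return Alert(*alert_tuple)


def smtp_declared_first(alerts) -> bool:
    """Return True if no smtp tuple appears after a non-smtp tuple."""
    seen_other = False
    for alert in map(parse_alert, alerts):
        if alert.alerter != 'smtp':
            seen_other = True
        elif seen_other:
            return False
    return True


ALERTS = (
    ('carbon', 'smtp', 3600),        # Analyzer only
    ('skyline', 'smtp', 3600, 168),  # Mirage enabled
    ('carbon', 'slack', 3600),
)
```

A check like ``smtp_declared_first(ALERTS)`` could be run when editing settings to catch an ordering mistake before a restart.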
Smart alerting
==============

Analysing lots of time series means there are times when LOTS of events happen
at once. Something changes and causes widespread change, for example a cloud
provider failure, reboots or a bad deployment; these things happen. When they
do, lots and lots of metrics can be submitted for further analysis and,
although Skyline is fast, getting through complex analysis on 100s of metrics
takes the time it takes. At times like these Skyline starts waterfall alerting.
This means that if Analyzer sent 100 metrics to Mirage to check and after
5 minutes there are still 30 pending, Analyzer will remove those items from the
Mirage queue and just alert on them. The same is true for checks submitted to
Ionosphere by Mirage: any check sent upstream will be alerted on by the parent
app if the result of further analysis is not available after a defined period.
This way alerts are not missed, although under these conditions there will be
some false positives.
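The waterfall behaviour can be sketched as a simple timeout split over pending checks. This is a minimal illustration, not Skyline's implementation: ``PendingCheck``, ``waterfall`` and ``max_wait_seconds`` are hypothetical names, and the 300 second default simply mirrors the 5 minute example in the text.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PendingCheck:
    metric: str
    submitted_at: float  # unix timestamp when the check was sent upstream


def waterfall(pending: List[PendingCheck], now: float,
              max_wait_seconds: float = 300.0
              ) -> Tuple[List[PendingCheck], List[PendingCheck]]:
    """Split pending checks into (alert_now, keep_waiting).

    Checks that have waited at least max_wait_seconds are handed back to the
    parent app to alert on; the rest stay queued for further analysis.
    """
    alert_now, keep_waiting = [], []
    for check in pending:
        if now - check.submitted_at >= max_wait_seconds:
            alert_now.append(check)
        else:
            keep_waiting.append(check)
    return alert_now, keep_waiting
```

The parent app would alert on everything in ``alert_now`` without waiting for the upstream result, which is where the extra false positives under load come from.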

Alert settings
==============

@@ -71,6 +92,8 @@ the alerter:
- :mod:`settings.PAGERDUTY_ENABLED`
- :mod:`settings.SLACK_ENABLED`
- :mod:`settings.HTTP_ALERTERS_ENABLED`
- :mod:`settings.AWS_SNS_SMS_ALERTS_ENABLED`
- :mod:`settings.SMS_ALERT_OPTS`

Analyzer, Mirage and Ionosphere related alert settings (anomaly detection) are:

@@ -90,6 +113,12 @@ Analyzer, Mirage and Ionosphere related alert settings (anomaly detection) are:
alerts to a http endpoint
- :mod:`settings.MIRAGE_ENABLE_ALERTS` - must be set to `True` to enable alerts
from Mirage
- :mod:`settings.AWS_SNS_SMS_ALERTS_ENABLED` - must be set to `True` if you want
to send alerts via SMS. boto3 also needs to be set up and the AWS/IAM resource
that boto3 uses needs permission to publish to AWS SNS (see the boto3
documentation - https://github.com/boto/boto3).
- :mod:`settings.SMS_ALERT_OPTS` - must be defined if you want to send SMS
alerts.

Boundary related alert settings (static and dynamic thresholds) are:

@@ -107,6 +136,19 @@ Boundary related alert settings (static and dynamic thresholds) are:
Slack
- :mod:`settings.BOUNDARY_HTTP_ALERTERS_OPTS` - must be defined if you want to
push alerts to a http endpoint
- :mod:`settings.AWS_SNS_SMS_ALERTS_ENABLED` - must be set to `True` if you want
to send alerts via SMS. boto3 also needs to be set up and the AWS/IAM resource
that boto3 uses needs permission to publish to AWS SNS (see the boto3
documentation - https://github.com/boto/boto3).
- :mod:`settings.SMS_ALERT_OPTS` - must be defined if you want to send SMS
alerts.

SMS alerts
==========

Skyline can send SMS alerts via AWS SNS. Setting up AWS SNS is beyond the scope
of this documentation; see the AWS SNS documentation
(https://docs.aws.amazon.com/sns/latest/dg/sms_publish-to-phone.html).
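To check that the AWS/IAM resource can publish SMS before enabling the setting, a direct publish through boto3 is a reasonable smoke test. The sketch below is independent of Skyline: ``send_sms_alert`` is a hypothetical helper, and the region and phone number in the commented wiring are placeholders. It relies only on the SNS client ``publish`` call with ``PhoneNumber`` and ``Message``.

```python
def send_sms_alert(sns_client, phone_number: str, message: str) -> str:
    """Publish one SMS through an SNS client and return the MessageId.

    sns_client is expected to behave like boto3.client('sns').
    """
    response = sns_client.publish(PhoneNumber=phone_number, Message=message)
    return response['MessageId']


# Typical wiring (needs boto3 and AWS credentials, so it is left commented):
# import boto3
# client = boto3.client('sns', region_name='us-east-1')  # placeholder region
# send_sms_alert(client, '+15555550100', 'Skyline SMS alert test')
```

Passing the client in as an argument keeps the helper testable without AWS access.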

http_alerter alerts
===================
2 changes: 1 addition & 1 deletion docs/algorithms/custom-algorithms.rst
@@ -2,7 +2,7 @@
Custom algorithms
=================

**EXPERIMENTAL - py3 only**
**py3 only**

This section describes the process, steps and resources required to run custom
algorithms in Skyline. Adding a custom algorithm or algorithms is easier and
1 change: 1 addition & 0 deletions docs/algorithms/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -5,3 +5,4 @@ Algorithms
.. toctree::

custom-algorithms.rst
matrixprofile.rst
124 changes: 124 additions & 0 deletions docs/algorithms/matrixprofile.rst
@@ -0,0 +1,124 @@
.. role:: skyblue
.. role:: red

`matrixprofile`_
================

The Matrix Profile algorithm was pioneered by Professor Eamonn Keogh at the
University of California-Riverside and Professor Abdullah Mueen at the
University of New Mexico in 2015. When it came onto the scene it made some
remarkable claims. Unfortunately its initial implementation was in MATLAB, but
the claims made by Eamonn Keogh et al could not be ignored: the algorithm was
set to change the face of time series mining, and it has. Those claims were not
unfounded and, thanks to the efforts of the matrixprofile.org team, the
`matrixprofile`_ algorithm is now available in Python. If you want a CUDA
enabled version there is STUMPY.

`matrixprofile`_ uses the `MASS`_ similarity search method to compare
subsequences (short sections of a time series, also known as shapelets) and
identify those that are discords (not similar, or least similar, to anything
else in the time series). These discords are the anomalies.
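To make the idea of a discord concrete, here is a brute-force sketch in pure Python. It is an illustration only: it does not use `MASS`_ or the `matrixprofile`_ library, it is far too slow for real data, and the names ``znorm``, ``naive_matrix_profile`` and ``top_discord`` are hypothetical. For each subsequence it records the distance to its nearest non-overlapping neighbour; the subsequence with the largest such distance is the top discord.

```python
import math


def znorm(seq):
    """Z-normalise a subsequence; a flat subsequence maps to all zeros."""
    mean = sum(seq) / len(seq)
    std = math.sqrt(sum((x - mean) ** 2 for x in seq) / len(seq))
    if std == 0:
        return [0.0] * len(seq)
    return [(x - mean) / std for x in seq]


def naive_matrix_profile(ts, m):
    """Distance from each length-m subsequence to its nearest neighbour."""
    n = len(ts) - m + 1
    subs = [znorm(ts[i:i + m]) for i in range(n)]
    exclusion = max(1, m // 2)  # skip trivial matches with overlapping windows
    profile = []
    for i in range(n):
        best = math.inf
        for j in range(n):
            if abs(i - j) < exclusion:
                continue
            dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(subs[i], subs[j])))
            best = min(best, dist)
        profile.append(best)
    return profile


def top_discord(ts, m):
    """Start index of the subsequence least similar to everything else."""
    profile = naive_matrix_profile(ts, m)
    return max(range(len(profile)), key=profile.__getitem__)


# A repeating pattern with a short flat section injected at indices 30-33.
series = [math.sin(2 * math.pi * i / 8) for i in range(64)]
for i in range(30, 34):
    series[i] = 3.0
discord_start = top_discord(series, 8)
```

In this toy series every clean window has an almost exact match one period away, so the top discord starts in a window that overlaps the injected section.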

Skyline can use `matrixprofile`_ as an additional algorithm in the analysis
pipeline. The use of `matrixprofile`_ massively reduces false positive
detections compared to the basic three-sigma based algorithm methods on time
series of 7 days in length. However, as the saying goes, all magic comes with a
price.

The price in the case of `matrixprofile`_ is a few false negatives. Where
three-sigma based algorithms would detect an instance as anomalous and you would
agree it was, `matrixprofile`_ will rarely, but occasionally, identify it as not
anomalous. A comparison of three-sigma vs. `matrixprofile`_ after the review of
662 real production data events is as follows:

+-------------------+------+------+------+------+
| algorithm_group | tP | fP | tN | fN |
+===================+======+======+======+======+
| three-sigma | 69 | 790 | N/A | N/A |
+-------------------+------+------+------+------+
| matrixprofile | 50 | 93 | 697 | 19 |
+-------------------+------+------+------+------+

tN and fN are not scored for three-sigma as it was the baseline. Each
three-sigma triggered instance was sent to SNAB, `matrixprofile`_ was run
against the instance and the results were recorded for comparison.
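The table can be turned into a few ratios with some simple arithmetic. The sketch below is one way to read the counts, not the scoring method used by SNAB. It assumes that the matrixprofile tN count is the set of three-sigma false positives it filtered out and that the fN count is the set of three-sigma true positives it missed, which is consistent with 93 + 697 = 790 and 50 + 19 = 69. Percentages quoted elsewhere in this document may use different denominators.

```python
# Counts copied from the comparison table.
three_sigma = {'tP': 69, 'fP': 790}
matrixprofile = {'tP': 50, 'fP': 93, 'tN': 697, 'fN': 19}


def precision(counts):
    """Fraction of triggered events that were true positives."""
    return counts['tP'] / (counts['tP'] + counts['fP'])


# Share of three-sigma false positives that matrixprofile classed as not anomalous.
fp_filtered = matrixprofile['tN'] / three_sigma['fP']
# Share of three-sigma true positives that matrixprofile missed.
tp_missed = matrixprofile['fN'] / three_sigma['tP']

print('three-sigma precision:   %.3f' % precision(three_sigma))
print('matrixprofile precision: %.3f' % precision(matrixprofile))
print('false positives removed: %.3f' % fp_filtered)
print('true positives missed:   %.3f' % tp_missed)
```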

The logical place for `matrixprofile`_ to sit in Skyline is as a custom algorithm
that is run only by Mirage after three-sigma consensus has been achieved. It is
too expensive, in both computation and time, to run in Analyzer, although it
could run in Analyzer if you only have a few hundred metrics.

Considerations when implementing matrixprofile
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

By default in v3.0.0 matrixprofile is not enabled as an additional algorithm to
run. Whether to add matrixprofile to the analysis pipeline is a user decision,
because it has a significant impact on the number of events that are recorded
and alerted on: matrixprofile significantly decreases the number of false
positive alerts that are sent. **HOWEVER**, because this is a fundamental
change to Skyline, current Skyline users (running v2.0.0) must be aware of what
this means in practice.

1. matrixprofile will result in a small number of false negatives (~2.48%)
2. matrixprofile will filter out a very large number of false positives (~88.4%)

These two points must be stressed. If you are accustomed to using Skyline as an
information stream of changes in your environments and you turn on
matrixprofile, you may question whether Skyline is running properly, because
matrixprofile will probably filter out ~88.4% of your events.

Although the panacea of anomaly detection **alerting** is 0 false positives and
0 false negatives, anomaly detection is not only about alerting, it is about the
bad, the good and the unknown. This drastic decrease in events also results in
a reduction of correlation data, fewer training data opportunities and an
overall reduced picture of events in your timeline.

If you are a current Skyline user and you implement matrixprofile, be prepared
for most of the events you have become accustomed to vanishing. For instance,
when you or developers deploy something, the subtle changes that used to
trigger and send an event into slack probably will not do so any more. Most,
if not all, of the events triggered when changes were made to things or some
action was taken will stop. You will be accustomed to making changes or doing
something and often seeing events being triggered in slack; that reinforcement
feedback is mostly lost with the addition of matrixprofile as an algorithm.

Users of Skyline have possibly learnt a lot about their systems and
specifically the interactions between the components in those systems:
discovering related things that were unknown, learning about metrics they had
never heard of and finding things that they should watch and understand more.
A lot of our learnings have been a result of these events that fired on subtle
changes. So consider and assess carefully before deciding that the anomaly
**alerting** panacea is what you really want.

However that said, if you are already running 1000s of metrics in Skyline and
you have training on metrics, you have probably learnt a lot already, so you
would probably want to enable it. Just prepare yourself for a big change.

To further inform your considerations, it must be pointed out that although
matrixprofile is not enabled by default, the use of `MASS`_ in Ionosphere is
enabled by default. This means that the core functionality of matrixprofile is
implemented in the pattern matching side of things. If you find Skyline very
useful as an informational stream and not simply for alerting, for the reasons
stated above, carefully consider whether you want to use matrixprofile in the
analysis workflow, as you will lose a lot of the visibility you have become
accustomed to having. The consideration here is that if you do not enable
matrixprofile on the algorithm analysis side, Skyline can still provide its
core similarity search functionality on all your trained features profiles.
Maybe that is the best of both worlds; the decision is yours.

You can consider adding some namespaces to be analyzed by matrixprofile and
assess the results before jumping in with both feet. It really depends on
whether you want to use Skyline as an alerter only or as an informational
stream of the bad, the good and the unknown, that YOU learn from.

Enabling matrixprofile to run in Mirage
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

See the current ``'skyline_matrixprofile'`` example in
:mod:`settings.CUSTOM_ALGORITHMS` and change the ``'namespaces':`` parameter as
appropriate.

.. _MASS: https://www.cs.unm.edu/~mueen/FastestSimilaritySearch.html
.. _matrixprofile: https://github.com/matrix-profile-foundation/matrixprofile
76 changes: 76 additions & 0 deletions docs/anomify-cutting-edge-skyline.rst
@@ -0,0 +1,76 @@
.. role:: skyblue
.. role:: red

Anomify - cutting edge Skyline
==============================

Anomify is a fully managed Skyline application and more. Anomify are offering
free access to Skyline users who sign up now (future integrations will be
charged).

Built by the team behind Skyline, Anomify takes the good parts of open-source
Skyline, like the training process, and streamlines them. Anomify has also
added things like a modern, user friendly dashboard, supercharged graphs,
contextual alerts, distributed notifications and a well documented API.

Get Started at `anomify.ai/skyline <https://anomify.ai/skyline>`_.

Benefit from a managed service
------------------------------

Skyline is a complex beast which can take a lot of effort and time to get
running well. With Anomify the set up overhead is taken off your hands. Anomify
run the latest Skyline version so you never have to install updates, handle
scaling issues or manage redundant copies of your data. There is no need to
understand how new analysis features apply to your metrics or how to configure
and implement them, that is all done for you. All you have to do is send
metrics and configure alerts.

Benefit from direct message support with the team behind Skyline via Slack and
email.

Anomify is currently free for Skyline users.

Get Started at `anomify.ai/skyline <https://anomify.ai/skyline>`_.

.. figure:: images/anomify.overview.png
:alt: Anomify overview


More eyes on metrics
--------------------

Leave Anomify running alongside your existing Skyline setup and see the results
for yourself. The intuitive dashboard and actionable alerts open monitoring up
to the wider team. After running Anomify for 3 months you will receive fewer
false positive alerts giving you more time to focus on other things. You’ll have
learnt much more about your metrics and you might even have developers
monitoring their own applications!

Why Anomify?
------------

We’ve been managing and developing Skyline for over 5 years, running on our own
production data, and we know just how hard it can be to deploy. So, this got us
thinking... how can we help more users and organisations unlock the power of
Skyline’s anomaly detection on their data? Anomify was created to do just that.

Many Skyline users miss out on key features because they are hard to configure.
We wanted to showcase how well Skyline performs at only identifying true
positive anomalies, when it is trained using semi-supervised learning, so we
built an intuitive UI and improved alerts so the whole team can get involved
with ops.

As devops culture matures and infrastructure moves into the cloud and requires
less ‘ops’ personnel it’s vital that we preserve and teach operations skills.
Anomify does just that, it allows the team to get curious about issues. You’ll
learn more about your metrics' behaviours and deepen your understanding of the
systems they track.

Why is Anomify free?
--------------------

There is no such thing as a free lunch so we’re asking for constructive feedback
as payment for using the Anomify service. Many of the features we build are
added to Skyline for use by the open source community. You’ll be able to shape
our roadmap to best serve monitoring professionals.
2 changes: 1 addition & 1 deletion docs/conf.py
@@ -98,7 +98,7 @@ def setup(app):

# General information about the project.
project = u'Skyline'
copyright = u'2013-2014, Etsy Inc; 2015, Abe Stanway; 2015-2020, Gary Wilson'
copyright = u'2013-2014, Etsy Inc; 2015, Abe Stanway; 2015-2021, Gary Wilson'
author = u'etsy, astanway, earthgecko'

# The version info for the project you're documenting, acts as replacement for
19 changes: 19 additions & 0 deletions docs/deprecated-docs/running-in-python-virtualenv-py2.rst
@@ -64,6 +64,25 @@ This is using your **system** pip at this point only to install virtualenv.
pip install 'virtualenv==15.2.0'
openssl (optional)
~~~~~~~~~~~~~~~~~~

Depending on your OS, if it is a bit older, you may need to compile Python
with a newer version of openssl.

.. code-block:: bash

    openssl_version="1.1.1g"
    mkdir /opt/openssl
    mkdir -p "/usr/local/openssl/openssl-${openssl_version}"
    cd /opt/openssl
    wget -q "https://www.openssl.org/source/openssl-${openssl_version}.tar.gz"
    tar -zxvf "openssl-${openssl_version}.tar.gz"
    cd "/opt/openssl/openssl-${openssl_version}"
    /opt/openssl/openssl-${openssl_version}/config --prefix=/usr/local/openssl/openssl-${openssl_version} --openssldir=/usr/local/openssl/openssl-${openssl_version}
    make
    make install

Python version
~~~~~~~~~~~~~~

7 changes: 4 additions & 3 deletions docs/development/building-documentation.rst
@@ -85,7 +85,8 @@ Build
PYTHON_MAJOR_VERSION="3.8"
PYTHON_VIRTUALENV_DIR="/opt/python_virtualenv"
PROJECT="skyline-py383"
PROJECT="skyline-py386"
CURRENT_BRANCH="SNAB"
cd "${PYTHON_VIRTUALENV_DIR}/projects/${PROJECT}"
source bin/activate
@@ -174,8 +175,8 @@ Build
# Usage: build_docs <app_dir>
# e.g.
# cd /opt/python_virtualenv/projects/skyline-py383/
# build_docs /home/gary/sandbox/of/github/earthgecko/skyline/ionosphere/skyline
# cd /opt/python_virtualenv/projects/skyline-py386/
# build_docs /home/gary/sandbox/of/github/earthgecko/skyline/$CURRENT_BRANCH/skyline
Auto generating .rst files
