Merge pull request #383 from earthgecko/SNAB
ANALYZER_CHECK_LAST_TIMESTAMP and metrics_manager - sync_cluster_files
earthgecko committed Dec 14, 2020
2 parents 846af48 + 4dd5705 commit 35a1d10
Showing 6 changed files with 1,215 additions and 37 deletions.
32 changes: 18 additions & 14 deletions docs/running-multiple-skylines.rst
@@ -13,7 +13,8 @@ required, e.g. Luminosity.
Some organisations have multiple Graphite instances for sharding, geographic or
latency reasons. In such a case it is possible that each Graphite instance
would pickle to a Skyline instance and for there to be multiple Skyline
instances. Or you might want to run multiple Skyline instances due to the
number of metrics being handled. However...

Running Skyline in a distributed, HA manner is mostly related to running the
components of Skyline in this fashion, for example using Galera cluster to
@@ -27,7 +28,7 @@ which is beyond the scope of this documentation.
However one word of caution, **do not cluster Redis**. Although some sharded
configuration may work, it is simpler just to use local Redis data. Due to
the metric time series constantly changing, Redis clustering or slaving
results in almost the entire Redis data store being shipped constantly, which
is not desired.

That said the actual Skyline modules have certain settings and configurations
@@ -37,9 +38,9 @@ these.

The following settings pertain to running multiple Skyline instances:

- :mod:`settings.REMOTE_SKYLINE_INSTANCES`
- :mod:`settings.HORIZON_SHARDS`
- :mod:`settings.SYNC_CLUSTER_FILES`

With the introduction of Luminosity a requirement for Skyline to pull the time
series data from remote Skyline instances was added to allow for cross
@@ -51,22 +52,25 @@ and ensure performance is maintained.

Running Skyline in any form of clustered configuration requires that each
Skyline instance knows about the other instances and has access to them via the
webapp, therefore appropriate firewall, network rules and reverse proxy
configuration (Apache or nginx) need to allow this.

:mod:`settings.REMOTE_SKYLINE_INSTANCES` is a required list of the URL,
username, password and hostname of each of the other instances in the Skyline
cluster, so that if a request is made to the Skyline webapp for a resource it
does not have, it can return the other URLs to the client. This is also used
with :mod:`settings.SYNC_CLUSTER_FILES` so that each instance in the cluster can
sync the relevant Ionosphere training data and features profiles data to itself.
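As an illustration only, a :mod:`settings.REMOTE_SKYLINE_INSTANCES` entry
carrying URL, username, password and hostname might be sketched as follows
(every URL, credential and hostname below is invented, and the exact entry
format should be checked against ``settings.py``):

```python
# Hypothetical illustration of settings.REMOTE_SKYLINE_INSTANCES; all
# values below are invented for the example.
REMOTE_SKYLINE_INSTANCES = [
    ['http://skyline-2.example.org:8080', 'username', 'password', 'skyline-2'],
    ['http://skyline-3.example.org:8080', 'username', 'password', 'skyline-3'],
]


def remote_urls(instances):
    """Return just the URLs, e.g. to hand back to a client whose requested
    resource lives on another instance in the cluster."""
    return [entry[0] for entry in instances]
```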

It is used by Skyline internally to request resources from other Skyline
instances to:

1. Retrieve time series data and general data for metrics served by the other
Skyline instance/s.
2. To retrieve resources for certain client and API requests to respond with
all the data for the cluster, in terms of unique_metrics, alerting_metrics,
etc.
3. To sync Ionosphere data between the cluster instances.

Read about :mod:`settings.HORIZON_SHARDS` see
`HORIZON_SHARDS <horizon.html#HORIZON_SHARDS>`__ section on the Horizon page.
@@ -140,4 +144,4 @@ Webapp UI

In terms of the functionality in webapp, the webapp is multiple instance aware.
Where any "not in Redis" UI errors are found, webapp responds to the request
with a 302 redirect to the remote Skyline instance that is assigned the metric.
116 changes: 115 additions & 1 deletion skyline/analyzer/analyzer.py
@@ -312,6 +312,12 @@
except:
CHECK_DATA_SPARSITY = True

# @added 20201212 - Feature #3884: ANALYZER_CHECK_LAST_TIMESTAMP
try:
ANALYZER_CHECK_LAST_TIMESTAMP = settings.ANALYZER_CHECK_LAST_TIMESTAMP
except:
ANALYZER_CHECK_LAST_TIMESTAMP = False
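The try/except fallback above is the codebase's guarded settings lookup; the
same default-to-False behaviour can be expressed with ``getattr``, shown here
as a sketch against a stand-in settings object rather than Skyline's real
``settings`` module:

```python
class _Settings:
    """Stand-in for skyline's settings module in this sketch; it does not
    define ANALYZER_CHECK_LAST_TIMESTAMP, so the default applies."""
    pass


settings = _Settings()

# Fall back to False when the setting is absent, mirroring the try/except
ANALYZER_CHECK_LAST_TIMESTAMP = getattr(
    settings, 'ANALYZER_CHECK_LAST_TIMESTAMP', False)
```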

# @added 20190522 - Feature #2580: illuminance
# Disabled for now as in concept phase. This would work better if
# the illuminance_datapoint was determined from the time series
@@ -494,6 +500,25 @@ def spin_process(self, i, unique_metrics):
logger.info('nothing to do, no unique_metrics')
return

# @added 20201214 - Feature #3892: global_anomalies
# TODO
        # If an event occurs that causes a near global anomalous event, handle the swamp of work that will result on the systems.
# In terms of analysis, creating alert resources, sending alerts, Graphite requests, etc, etc
# Add a global_anomalies feature in Analyzer and a table in the DB.
# If analyzer encounters x anomalies in a run - start a global_anomalies event:
# Set a global anomaly key to put the system into a global anomaly state
# Do not send to Mirage
# Do not send alerts
# Do send to panorama
# Do send global_anomalies alert
# When analyzer encounters less than x anomalies per run for y runs:
# Set the global anomaly end_timestamp
# Remove the global anomaly key so the system reverts to normal
# global_anomalies can occur per shard in a cluster
global_anomaly_in_progress = False
if global_anomaly_in_progress:
logger.warn('warning :: a global anomaly event is in progress')
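The TODO above describes entering a global anomaly state when a run sees x or
more anomalies and leaving it after y calm runs, i.e. a hysteresis. A minimal
sketch of that state machine follows; this feature is concept phase only and
the class name, thresholds and run counts here are invented:

```python
class GlobalAnomalyState:
    """Track a global anomaly event with hysteresis: enter when a run sees
    enter_threshold or more anomalies, leave only after calm_runs_required
    consecutive runs below the threshold."""

    def __init__(self, enter_threshold=100, calm_runs_required=3):
        self.enter_threshold = enter_threshold
        self.calm_runs_required = calm_runs_required
        self.in_progress = False
        self.calm_runs = 0

    def update(self, anomalies_this_run):
        """Feed the anomaly count of one analyzer run, return whether a
        global anomaly event is in progress afterwards."""
        if anomalies_this_run >= self.enter_threshold:
            self.in_progress = True
            self.calm_runs = 0
        elif self.in_progress:
            self.calm_runs += 1
            if self.calm_runs >= self.calm_runs_required:
                self.in_progress = False
        return self.in_progress
```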

# @added 20201007 - Feature #3774: SNAB_LOAD_TEST_ANALYZER
# The number of metrics to load test Analyzer with, testing is only done
# after the normal analysis run.
@@ -550,6 +575,11 @@ def spin_process(self, i, unique_metrics):
high_priority_assigned_metrics = []
low_priority_assigned_metrics = []

# @added 20201212 - Feature #3884: ANALYZER_CHECK_LAST_TIMESTAMP
# all_stale_metrics is required in ANALYZER_ANALYZE_LOW_PRIORITY_METRICS and
# ANALYZER_CHECK_LAST_TIMESTAMP so set default
all_stale_metrics = []

# @added 20201030 - Feature #3812: ANALYZER_ANALYZE_LOW_PRIORITY_METRICS
determine_low_priority_metrics = False

@@ -1213,6 +1243,36 @@ def spin_process(self, i, unique_metrics):
logger.error('error :: failed to determine if custom_algorithms are to be run with %s' % (
skyline_app))

# @added 20201212 - Feature #3884: ANALYZER_CHECK_LAST_TIMESTAMP
metrics_last_timestamp_dict = {}
metrics_without_new_data = []
metrics_with_new_data = []
metrics_added_to_last_timestamp_hash_key = []
if ANALYZER_CHECK_LAST_TIMESTAMP:
try:
metrics_last_timestamp_dict = self.redis_conn_decoded.hgetall(metrics_last_timestamp_hash_key)
if metrics_last_timestamp_dict:
logger.info('ANALYZER_CHECK_LAST_TIMESTAMP - got %s metrics and last analysed timestamps from %s Redis hash key' % (
str(len(metrics_last_timestamp_dict)),
metrics_last_timestamp_hash_key))
else:
logger.warn('warning :: ANALYZER_CHECK_LAST_TIMESTAMP enabled but got no data from the %s Redis hash key' % (
metrics_last_timestamp_hash_key))
except:
logger.error(traceback.format_exc())
logger.error('error :: failed to get Redis hash key %s' % (
metrics_last_timestamp_hash_key))
metrics_last_timestamp_dict = {}
if not all_stale_metrics:
# If all_stale_metrics were not determined for ANALYZER_ANALYZE_LOW_PRIORITY_METRICS
# get them
try:
all_stale_metrics = list(self.redis_conn_decoded.smembers('aet.analyzer.stale'))
except:
logger.error(traceback.format_exc())
logger.error('error :: failed to get Redis key aet.analyzer.stale for ANALYZER_CHECK_LAST_TIMESTAMP')
all_stale_metrics = []

# Distill timeseries strings into lists
for i, metric_name in enumerate(assigned_metrics):
self.check_if_parent_is_alive()
@@ -2222,6 +2282,29 @@ def spin_process(self, i, unique_metrics):
except:
pass

# @added 20201212 - Feature #3884: ANALYZER_CHECK_LAST_TIMESTAMP
analyzer_check_last_last_timestamp_new_timestamp = True
if ANALYZER_CHECK_LAST_TIMESTAMP and metrics_last_timestamp_dict and last_timeseries_timestamp:
try:
last_analyzed_timestamp = metrics_last_timestamp_dict[base_name]
except:
last_analyzed_timestamp = None
if last_analyzed_timestamp:
try:
if int(last_analyzed_timestamp) == int(last_timeseries_timestamp):
analyzer_check_last_last_timestamp_new_timestamp = False
if base_name not in all_stale_metrics:
if int(spin_start) - int(last_timeseries_timestamp) >= settings.STALE_PERIOD:
                            # Send it for analysis to be classified as
                            # stale
analyzer_check_last_last_timestamp_new_timestamp = True
if not analyzer_check_last_last_timestamp_new_timestamp:
metrics_without_new_data.append(base_name)
else:
metrics_with_new_data.append(base_name)
except:
pass
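The check above reduces to a small decision: skip analysis when the last
analysed timestamp equals the latest timeseries timestamp, unless the metric
has gone stale without yet being classified as such. A simplified sketch of
that decision as a pure function (the function and parameter names are
invented, not the analyzer's own):

```python
def has_new_data(base_name, last_analyzed_timestamp,
                 last_timeseries_timestamp, now, all_stale_metrics,
                 stale_period):
    """Return True if the metric should be analysed this run."""
    if last_analyzed_timestamp is None:
        # Never analysed before, analyse it
        return True
    if int(last_analyzed_timestamp) != int(last_timeseries_timestamp):
        # New data has arrived since the last analysis
        return True
    # No new data, but if the metric has newly gone stale, send it for
    # analysis once so it can be classified as stale
    if base_name not in all_stale_metrics:
        if int(now) - int(last_timeseries_timestamp) >= stale_period:
            return True
    return False
```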

try:

# @added 20200425 - Feature #3508: ionosphere.untrainable_metrics
@@ -2283,9 +2366,18 @@

# @added 20201018 - Feature #3810: ANALYZER_MAD_LOW_PRIORITY_METRICS
check_for_anomalous = True

# @added 20201212 - Feature #3884: ANALYZER_CHECK_LAST_TIMESTAMP
if ANALYZER_CHECK_LAST_TIMESTAMP:
if not analyzer_check_last_last_timestamp_new_timestamp:
check_for_anomalous = False

# @added 20201018 - Feature #3810: ANALYZER_MAD_LOW_PRIORITY_METRICS
if ANALYZER_MAD_LOW_PRIORITY_METRICS and ANALYZER_ANALYZE_LOW_PRIORITY_METRICS:
try:
if low_priority_metric:
# @modified 20201212 - Feature #3884: ANALYZER_CHECK_LAST_TIMESTAMP
# if low_priority_metric:
if low_priority_metric and check_for_anomalous:
mad_data = [item[1] for item in timeseries[-ANALYZER_MAD_LOW_PRIORITY_METRICS:]]
# @added 20201020 - Feature #3810: ANALYZER_MAD_LOW_PRIORITY_METRICS
# Handle very sparsely populated metrics with only a
@@ -2339,6 +2431,16 @@
if LOCAL_DEBUG:
logger.debug('debug :: metric %s - anomalous - %s' % (str(metric_name), str(anomalous)))

# @added 20201212 - Feature #3884: ANALYZER_CHECK_LAST_TIMESTAMP
if ANALYZER_CHECK_LAST_TIMESTAMP and check_for_anomalous:
try:
self.redis_conn.hset(
metrics_last_timestamp_hash_key, base_name,
int_metric_timestamp)
metrics_added_to_last_timestamp_hash_key.append(base_name)
except:
pass

# @added 20200608 - Feature #3566: custom_algorithms
# @modified 20201127 - Feature #3848: custom_algorithms - run_before_3sigma parameter
# Only run DEBUG_CUSTOM_ALGORITHMS if there is a custom
@@ -3309,6 +3411,18 @@

del low_priority_assigned_metrics

# @added 20201212 - Feature #3884: ANALYZER_CHECK_LAST_TIMESTAMP
if ANALYZER_CHECK_LAST_TIMESTAMP:
try:
logger.info('ANALYZER_CHECK_LAST_TIMESTAMP - there were %s metrics with no new data that were not analysed' % str(len(metrics_without_new_data)))
logger.info('ANALYZER_CHECK_LAST_TIMESTAMP - there were %s metrics with new data that were analysed' % str(len(metrics_with_new_data)))
                logger.info('ANALYZER_CHECK_LAST_TIMESTAMP - %s metrics last analysed timestamps were updated in %s Redis hash key' % (
str(len(metrics_added_to_last_timestamp_hash_key)),
metrics_last_timestamp_hash_key))
except:
logger.error(traceback.format_exc())
logger.error('error :: ANALYZER_CHECK_LAST_TIMESTAMP log error')

# @added 20201111 - Feature #3480: batch_processing
# Bug #2050: analyse_derivatives - change in monotonicity
if not manage_derivative_metrics:
