Merge pull request #383 from earthgecko/SNAB
ANALYZER_CHECK_LAST_TIMESTAMP and metrics_manager - sync_cluster_files
earthgecko committed Dec 14, 2020
2 parents 846af48 + 4dd5705 commit 35a1d10
Showing 6 changed files with 1,215 additions and 37 deletions.
32 changes: 18 additions & 14 deletions docs/running-multiple-skylines.rst
@@ -13,7 +13,8 @@ required, e.g. Luminosity.
Some organisations have multiple Graphite instances for sharding, geographic or
latency reasons. In such a case it is possible that each Graphite instance
would pickle to a Skyline instance and for there to be multiple Skyline
instances. Or you might want to run multiple Skyline instances due to the
number of metrics being handled. However...

Running Skyline in a distributed, HA manner is mostly related to running the
components of Skyline in this fashion, for example using Galera cluster to
@@ -27,7 +28,7 @@ which is beyond the scope of this documentation.
However one word of caution, **do not cluster Redis**. Although some sharded
configuration may work, it is simpler just to use local Redis data. Due to
the metric time series constantly changing, Redis clustering or slaving
results in almost the entire Redis data store being shipped constantly, which
is not desired.

That said the actual Skyline modules have certain settings and configurations
@@ -37,9 +38,9 @@ these.

The following settings pertain to running multiple Skyline instances:

- :mod:`settings.REMOTE_SKYLINE_INSTANCES`
- :mod:`settings.HORIZON_SHARDS`
- :mod:`settings.SYNC_CLUSTER_FILES`

With the introduction of Luminosity a requirement for Skyline to pull the time
series data from remote Skyline instances was added to allow for cross
@@ -51,22 +52,25 @@ and ensure performance is maintained.

Running Skyline in any form of clustered configuration requires that each
Skyline instance knows about the other instances and has access to them via the
webapp, therefore appropriate firewall, network rules and reverse proxy
configuration (Apache or nginx) need to allow this.

:mod:`settings.REMOTE_SKYLINE_INSTANCES` is a required list of the URL,
username, password and hostname of each of the other instances in the Skyline
cluster, so that if a request is made to the Skyline webapp for a resource it
does not have, it can return the other URLs to the client. This is also used
with :mod:`settings.SYNC_CLUSTER_FILES` so that each instance in the cluster can
sync the relevant Ionosphere training data and features profiles data to itself.
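As an illustration only, a :mod:`settings.REMOTE_SKYLINE_INSTANCES` entry
carrying URL, username, password and hostname might be sketched as follows
(every URL, credential and hostname below is invented, and the exact entry
format should be checked against ``settings.py``):

```python
# Hypothetical illustration of settings.REMOTE_SKYLINE_INSTANCES; all
# values below are invented for the example.
REMOTE_SKYLINE_INSTANCES = [
    ['http://skyline-2.example.org:8080', 'username', 'password', 'skyline-2'],
    ['http://skyline-3.example.org:8080', 'username', 'password', 'skyline-3'],
]


def remote_urls(instances):
    """Return just the URLs, e.g. to hand back to a client whose requested
    resource lives on another instance in the cluster."""
    return [entry[0] for entry in instances]
```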

It is used by Skyline internally to request resources from other Skyline
instances to:

1. Retrieve time series data and general data for metrics served by the other
Skyline instance/s.
2. To retrieve resources for certain client and API requests to respond with
all the data for the cluster, in terms of unique_metrics, alerting_metrics,
etc.
3. To sync Ionosphere data between the cluster instances.

Read about :mod:`settings.HORIZON_SHARDS` see
`HORIZON_SHARDS <horizon.html#HORIZON_SHARDS>`__ section on the Horizon page.
@@ -140,4 +144,4 @@ Webapp UI

In terms of the functionality in webapp, the webapp is multiple instance aware.
Where any "not in Redis" UI errors are found, webapp responds to the request
with a 302 redirect to the remote Skyline instance that is assigned the metric.
116 changes: 115 additions & 1 deletion skyline/analyzer/analyzer.py
@@ -312,6 +312,12 @@
except:
CHECK_DATA_SPARSITY = True

# @added 20201212 - Feature #3884: ANALYZER_CHECK_LAST_TIMESTAMP
try:
ANALYZER_CHECK_LAST_TIMESTAMP = settings.ANALYZER_CHECK_LAST_TIMESTAMP
except:
ANALYZER_CHECK_LAST_TIMESTAMP = False
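The try/except fallback above is the codebase's guarded settings lookup; the
same default-to-False behaviour can be expressed with ``getattr``, shown here
as a sketch against a stand-in settings object rather than Skyline's real
``settings`` module:

```python
class _Settings:
    """Stand-in for skyline's settings module in this sketch; it does not
    define ANALYZER_CHECK_LAST_TIMESTAMP, so the default applies."""
    pass


settings = _Settings()

# Fall back to False when the setting is absent, mirroring the try/except
ANALYZER_CHECK_LAST_TIMESTAMP = getattr(
    settings, 'ANALYZER_CHECK_LAST_TIMESTAMP', False)
```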

# @added 20190522 - Feature #2580: illuminance
# Disabled for now as in concept phase. This would work better if
# the illuminance_datapoint was determined from the time series
@@ -494,6 +500,25 @@ def spin_process(self, i, unique_metrics):
logger.info('nothing to do, no unique_metrics')
return

# @added 20201214 - Feature #3892: global_anomalies
# TODO
        # If an event occurs that causes a near global anomalous event, handle the swamp of work that will result on the systems.
# In terms of analysis, creating alert resources, sending alerts, Graphite requests, etc, etc
# Add a global_anomalies feature in Analyzer and a table in the DB.
# If analyzer encounters x anomalies in a run - start a global_anomalies event:
# Set a global anomaly key to put the system into a global anomaly state
# Do not send to Mirage
# Do not send alerts
# Do send to panorama
# Do send global_anomalies alert
# When analyzer encounters less than x anomalies per run for y runs:
# Set the global anomaly end_timestamp
# Remove the global anomaly key so the system reverts to normal
# global_anomalies can occur per shard in a cluster
global_anomaly_in_progress = False
if global_anomaly_in_progress:
logger.warn('warning :: a global anomaly event is in progress')
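The TODO above describes entering a global anomaly state when a run sees x or
more anomalies and leaving it after y calm runs, i.e. a hysteresis. A minimal
sketch of that state machine follows; this feature is concept phase only and
the class name, thresholds and run counts here are invented:

```python
class GlobalAnomalyState:
    """Track a global anomaly event with hysteresis: enter when a run sees
    enter_threshold or more anomalies, leave only after calm_runs_required
    consecutive runs below the threshold."""

    def __init__(self, enter_threshold=100, calm_runs_required=3):
        self.enter_threshold = enter_threshold
        self.calm_runs_required = calm_runs_required
        self.in_progress = False
        self.calm_runs = 0

    def update(self, anomalies_this_run):
        """Feed the anomaly count of one analyzer run, return whether a
        global anomaly event is in progress afterwards."""
        if anomalies_this_run >= self.enter_threshold:
            self.in_progress = True
            self.calm_runs = 0
        elif self.in_progress:
            self.calm_runs += 1
            if self.calm_runs >= self.calm_runs_required:
                self.in_progress = False
        return self.in_progress
```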

# @added 20201007 - Feature #3774: SNAB_LOAD_TEST_ANALYZER
# The number of metrics to load test Analyzer with, testing is only done
# after the normal analysis run.
@@ -550,6 +575,11 @@ def spin_process(self, i, unique_metrics):
high_priority_assigned_metrics = []
low_priority_assigned_metrics = []

# @added 20201212 - Feature #3884: ANALYZER_CHECK_LAST_TIMESTAMP
# all_stale_metrics is required in ANALYZER_ANALYZE_LOW_PRIORITY_METRICS and
# ANALYZER_CHECK_LAST_TIMESTAMP so set default
all_stale_metrics = []

# @added 20201030 - Feature #3812: ANALYZER_ANALYZE_LOW_PRIORITY_METRICS
determine_low_priority_metrics = False

@@ -1213,6 +1243,36 @@ def spin_process(self, i, unique_metrics):
logger.error('error :: failed to determine if custom_algorithms are to be run with %s' % (
skyline_app))

# @added 20201212 - Feature #3884: ANALYZER_CHECK_LAST_TIMESTAMP
metrics_last_timestamp_dict = {}
metrics_without_new_data = []
metrics_with_new_data = []
metrics_added_to_last_timestamp_hash_key = []
if ANALYZER_CHECK_LAST_TIMESTAMP:
try:
metrics_last_timestamp_dict = self.redis_conn_decoded.hgetall(metrics_last_timestamp_hash_key)
if metrics_last_timestamp_dict:
logger.info('ANALYZER_CHECK_LAST_TIMESTAMP - got %s metrics and last analysed timestamps from %s Redis hash key' % (
str(len(metrics_last_timestamp_dict)),
metrics_last_timestamp_hash_key))
else:
logger.warn('warning :: ANALYZER_CHECK_LAST_TIMESTAMP enabled but got no data from the %s Redis hash key' % (
metrics_last_timestamp_hash_key))
except:
logger.error(traceback.format_exc())
logger.error('error :: failed to get Redis hash key %s' % (
metrics_last_timestamp_hash_key))
metrics_last_timestamp_dict = {}
if not all_stale_metrics:
# If all_stale_metrics were not determined for ANALYZER_ANALYZE_LOW_PRIORITY_METRICS
# get them
try:
all_stale_metrics = list(self.redis_conn_decoded.smembers('aet.analyzer.stale'))
except:
logger.error(traceback.format_exc())
logger.error('error :: failed to get Redis key aet.analyzer.stale for ANALYZER_CHECK_LAST_TIMESTAMP')
all_stale_metrics = []

# Distill timeseries strings into lists
for i, metric_name in enumerate(assigned_metrics):
self.check_if_parent_is_alive()
@@ -2222,6 +2282,29 @@ def spin_process(self, i, unique_metrics):
except:
pass

# @added 20201212 - Feature #3884: ANALYZER_CHECK_LAST_TIMESTAMP
analyzer_check_last_last_timestamp_new_timestamp = True
if ANALYZER_CHECK_LAST_TIMESTAMP and metrics_last_timestamp_dict and last_timeseries_timestamp:
try:
last_analyzed_timestamp = metrics_last_timestamp_dict[base_name]
except:
last_analyzed_timestamp = None
if last_analyzed_timestamp:
try:
if int(last_analyzed_timestamp) == int(last_timeseries_timestamp):
analyzer_check_last_last_timestamp_new_timestamp = False
if base_name not in all_stale_metrics:
if int(spin_start) - int(last_timeseries_timestamp) >= settings.STALE_PERIOD:
                            # Send it for analysis to be classified as
                            # stale
analyzer_check_last_last_timestamp_new_timestamp = True
if not analyzer_check_last_last_timestamp_new_timestamp:
metrics_without_new_data.append(base_name)
else:
metrics_with_new_data.append(base_name)
except:
pass
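The check above reduces to a small decision: skip analysis when the last
analysed timestamp equals the latest timeseries timestamp, unless the metric
has gone stale without yet being classified as such. A simplified sketch of
that decision as a pure function (the function and parameter names are
invented, not the analyzer's own):

```python
def has_new_data(base_name, last_analyzed_timestamp,
                 last_timeseries_timestamp, now, all_stale_metrics,
                 stale_period):
    """Return True if the metric should be analysed this run."""
    if last_analyzed_timestamp is None:
        # Never analysed before, analyse it
        return True
    if int(last_analyzed_timestamp) != int(last_timeseries_timestamp):
        # New data has arrived since the last analysis
        return True
    # No new data, but if the metric has newly gone stale, send it for
    # analysis once so it can be classified as stale
    if base_name not in all_stale_metrics:
        if int(now) - int(last_timeseries_timestamp) >= stale_period:
            return True
    return False
```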

try:

# @added 20200425 - Feature #3508: ionosphere.untrainable_metrics
@@ -2283,9 +2366,18 @@

# @added 20201018 - Feature #3810: ANALYZER_MAD_LOW_PRIORITY_METRICS
check_for_anomalous = True

# @added 20201212 - Feature #3884: ANALYZER_CHECK_LAST_TIMESTAMP
if ANALYZER_CHECK_LAST_TIMESTAMP:
if not analyzer_check_last_last_timestamp_new_timestamp:
check_for_anomalous = False

# @added 20201018 - Feature #3810: ANALYZER_MAD_LOW_PRIORITY_METRICS
if ANALYZER_MAD_LOW_PRIORITY_METRICS and ANALYZER_ANALYZE_LOW_PRIORITY_METRICS:
try:
if low_priority_metric:
# @modified 20201212 - Feature #3884: ANALYZER_CHECK_LAST_TIMESTAMP
# if low_priority_metric:
if low_priority_metric and check_for_anomalous:
mad_data = [item[1] for item in timeseries[-ANALYZER_MAD_LOW_PRIORITY_METRICS:]]
# @added 20201020 - Feature #3810: ANALYZER_MAD_LOW_PRIORITY_METRICS
# Handle very sparsely populated metrics with only a
@@ -2339,6 +2431,16 @@
if LOCAL_DEBUG:
logger.debug('debug :: metric %s - anomalous - %s' % (str(metric_name), str(anomalous)))

# @added 20201212 - Feature #3884: ANALYZER_CHECK_LAST_TIMESTAMP
if ANALYZER_CHECK_LAST_TIMESTAMP and check_for_anomalous:
try:
self.redis_conn.hset(
metrics_last_timestamp_hash_key, base_name,
int_metric_timestamp)
metrics_added_to_last_timestamp_hash_key.append(base_name)
except:
pass

# @added 20200608 - Feature #3566: custom_algorithms
# @modified 20201127 - Feature #3848: custom_algorithms - run_before_3sigma parameter
# Only run DEBUG_CUSTOM_ALGORITHMS if there is a custom
@@ -3309,6 +3411,18 @@

del low_priority_assigned_metrics

# @added 20201212 - Feature #3884: ANALYZER_CHECK_LAST_TIMESTAMP
if ANALYZER_CHECK_LAST_TIMESTAMP:
try:
logger.info('ANALYZER_CHECK_LAST_TIMESTAMP - there were %s metrics with no new data that were not analysed' % str(len(metrics_without_new_data)))
logger.info('ANALYZER_CHECK_LAST_TIMESTAMP - there were %s metrics with new data that were analysed' % str(len(metrics_with_new_data)))
                logger.info('ANALYZER_CHECK_LAST_TIMESTAMP - %s metrics last analysed timestamps were updated in %s Redis hash key' % (
str(len(metrics_added_to_last_timestamp_hash_key)),
metrics_last_timestamp_hash_key))
except:
logger.error(traceback.format_exc())
logger.error('error :: ANALYZER_CHECK_LAST_TIMESTAMP log error')

# @added 20201111 - Feature #3480: batch_processing
# Bug #2050: analyse_derivatives - change in monotonicity
if not manage_derivative_metrics:
