Handle Panorama stampede on restart after not running #26

Closed
earthgecko opened this issue Sep 7, 2016 · 1 comment

Comments

@earthgecko
Owner

Panorama is set up to record anomalies using the actual data in the check
file, similar to a queue. However, if Panorama happens to stall and is
restarted later, its anomaly submissions are somewhat skewed, and as it
populates the DB it will probably fire any Skyline Analyzer or Mirage alerts
that are set on any MySQL metrics namespaces :)

Although Panorama creates a Panorama key, the keys will not match the event if
these are old check files. Realistically, we still want to record the
anomalies in the DB with the same logic that is applied to current data, in
terms of setting keys or some similar method that achieves the desired goal.

However, I do not think that is as trivial as it may seem.

Current workaround

Move the check files or let them be processed.

If they are processed, they will skew other metrics, even Skyline metrics and
possibly any related MySQL metrics too.

# Move the check files
PANORAMA_DIR="<YOUR_SKYLINE_PANORAMA_DIR>"  # e.g. PANORAMA_DIR="/opt/skyline/panorama"

# STOP your Panorama if it is running

mkdir -p "${PANORAMA_DIR}/manually_skipped"
# find -exec passes each path to mv intact, so names with whitespace are safe
find "${PANORAMA_DIR}/check" -type f -name "*.txt" \
  -exec /bin/mv -f {} "${PANORAMA_DIR}/manually_skipped/" \;

# START your Panorama, sorry some anomalies are not recorded
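
The same workaround could also be sketched in Python, which makes it possible to skip only the stale check files and leave recent ones to be processed. This is a minimal sketch, not part of Skyline: the function name `skip_check_files` and the `min_age_seconds` parameter are hypothetical, and taking the age from the file modification time is an assumption. As with the shell version, Panorama should be stopped first.

```python
import shutil
import time
from pathlib import Path


def skip_check_files(panorama_dir, min_age_seconds=0, now=None):
    """Move check files at least min_age_seconds old into manually_skipped.

    With min_age_seconds=0 every check file is moved, like the shell commands.
    Age is taken from the file modification time, which is an assumption.
    Returns the names of the files that were moved.
    """
    now = time.time() if now is None else now
    check_dir = Path(panorama_dir) / "check"
    skipped_dir = Path(panorama_dir) / "manually_skipped"
    skipped_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    # sorted() materialises the listing before any file is moved
    for check_file in sorted(check_dir.rglob("*.txt")):
        if not check_file.is_file():
            continue
        if now - check_file.stat().st_mtime >= min_age_seconds:
            shutil.move(str(check_file), str(skipped_dir / check_file.name))
            moved.append(check_file.name)
    return moved
```

For example, `skip_check_files("/opt/skyline/panorama", min_age_seconds=3600)` would set aside only the checks that are at least an hour old, assuming that directory layout.
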
earthgecko added a commit that referenced this issue Sep 21, 2016
- Mirage changes include a change to the Panorama-style skyline_functions
  load_metric_vars and fail_check
- Handle Panorama stampede on restart after not running #26
  Added PANORAMA_CHECK_MAX_AGE to settings and Panorama to allow any checks
  older than PANORAMA_CHECK_MAX_AGE to be discarded, preventing a stampede if
  desired. Not ideal, but it solves the stampede problem for now - #26
- Added the original Skyline UI back as a "then" tab, for nostalgic and
  historical reasons.
- Bumped to version 1.0.8

Added:
skyline/webapp/static/css/skyline.css
skyline/webapp/templates/then.html

Modified:
docs/panorama.rst
skyline/mirage/mirage.py
skyline/panorama/panorama.py
skyline/settings.py
skyline/webapp/webapp.py
skyline/webapp/templates/layout.html
skyline/tests/test_imports.py
skyline/skyline_version.py
@earthgecko
Owner Author

An additional variable, PANORAMA_CHECK_MAX_AGE, was added to settings.py in v1.0.8. It allows any checks older than this value to be discarded, which stopped the stampeding of MySQL. This is not ideal, as the discarded anomalies are not recorded. However, matching expiration times, etc. on old metric checks and replaying the metric checks in the same fashion as real time is non-trivial, so for now this will do.

PANORAMA_CHECK_MAX_AGE = 300
"""
:var PANORAMA_CHECK_MAX_AGE: Panorama will only process a check file if it is
    not older than PANORAMA_CHECK_MAX_AGE seconds.  If it is set to 0, all
    check files are processed.  This setting ensures that if Panorama stalls
    for some hours and is restarted, the user can choose to discard older
    checks, and miss those anomalies being recorded, to prevent Panorama
    stampeding against MySQL if something went down and Panorama comes back
    online with lots of checks.
:vartype PANORAMA_CHECK_MAX_AGE: int
"""
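
To illustrate the semantics described in the docstring, here is a minimal sketch of how checks might be split by age. It is not the code in panorama.py: the function name `split_checks_by_age` and the mapping of check names to creation timestamps are assumptions made for illustration only.

```python
def split_checks_by_age(check_timestamps, now, max_age=300):
    """Split checks into (to_process, to_discard) by age in seconds.

    check_timestamps maps a check name to the Unix timestamp it was created
    (a hypothetical input shape).  max_age mirrors PANORAMA_CHECK_MAX_AGE:
    a value of 0 disables discarding, so every check is processed.
    """
    to_process, to_discard = [], []
    for name, timestamp in sorted(check_timestamps.items()):
        age = int(now) - int(timestamp)
        # A check is kept if it is not older than max_age seconds
        if max_age > 0 and age > max_age:
            to_discard.append(name)
        else:
            to_process.append(name)
    return to_process, to_discard


# Usage: one fresh check and one that is two hours old
now = 1473250000
checks = {"stats.mysql.a": now - 30, "stats.mysql.b": now - 7200}
to_process, to_discard = split_checks_by_age(checks, now)
print(to_process, to_discard)
```

With the default of 300 seconds, the two-hour-old check is discarded while the recent one is still processed; with `max_age=0`, both are processed.
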
