graphite queue grows to ~10k or more after director deployment #8586

Open
marcelfischer opened this issue Jan 6, 2021 · 8 comments
Labels
area/graphite (Metrics to Graphite), bug (Something isn't working), needs feedback (We'll only proceed once we hear from you again)

Comments

@marcelfischer

Describe the bug

When we run a deployment in Director, the Graphite queue in Icinga 2 grows to ~10k items or even more. The queue itself drains back to 0 fairly quickly, but afterwards we typically see a loss of 10 to 15 minutes of performance data.
I wonder whether this Graphite queue makes sense at all. I'm not a Graphite expert, but as far as I understand, Graphite (or Whisper) automatically sets missing values to null. So if I expect a value every 2 minutes and Icinga is reloading and queues some Graphite data in the meantime, that data won't fit into the Whisper file anymore because it's already too "late".
Two icinga masters with graphite in HA mode.
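
For reference, a minimal sketch of how a timestamped datapoint maps to a Whisper slot. Graphite's plaintext protocol carries an explicit timestamp with every datapoint, and Whisper aligns that timestamp down to the start of its archive interval. Whether a queued point still lands in its original slot therefore depends on the timestamp the sender attaches rather than on when the line arrives. That Icinga 2 sends the original check time for queued data is an assumption here and is not confirmed anywhere in this thread. The function names, the metric path and the 120 second step (the "value every 2 minutes" above) are illustrative only.

```python
def whisper_slot(timestamp: int, seconds_per_point: int) -> int:
    """Align a Unix timestamp down to the start of its archive interval."""
    return timestamp - (timestamp % seconds_per_point)


def plaintext_line(path: str, value: float, timestamp: int) -> str:
    """Format one datapoint in Graphite's plaintext protocol: path, value, timestamp."""
    return f"{path} {value} {timestamp}\n"


if __name__ == "__main__":
    STEP = 120                     # assumed retention step: one point every 2 minutes
    measured_at = 1610930911       # hypothetical check time (Unix seconds)
    flushed_at = measured_at + 600  # queue drained 10 minutes later

    # The slot is derived from the timestamp inside the line,
    # so a delayed flush does not change the target slot.
    print(whisper_slot(measured_at, STEP))
    # Only a sender that stamps the flush time instead would hit a later slot.
    print(whisper_slot(flushed_at, STEP))
    print(plaintext_line("icinga2.example.metric", 1.5, measured_at), end="")
```

If the points arrive with their original timestamps and are still within the archive's retention, a short queueing delay alone should not explain a 10 to 15 minute gap, which may point towards data not being produced or being dropped during the reload instead.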

To Reproduce

Deploy a config with Director. I guess simply reloading Icinga would lead to the same situation.

Expected behavior

I don't want to lose 10-15 minutes of performance data.

Your Environment

Include as many relevant details about the environment you experienced the problem in

  • Version used (icinga2 --version): 2.11.8
  • Operating System and version: RHEL 7.9
  • Enabled features (icinga2 feature list): api checker graphite ido-mysql mainlog notification
  • Icinga Web 2 version and modules (System - About): 2.8.2
@Al2Klimov
Member

Tasks

  • Reproduce
  • If successful, keep VM for now

@yhabteab
Member

Hi @marcelfischer, thanks for the report. Could you share some more details on how to reproduce this, for example how often you deploy configs and whether you also have sync rules configured or only deploy single configs? Unfortunately, I can't reproduce it. Thanks!

@yhabteab added the "needs feedback" label Jan 15, 2021
@marcelfischer
Author

marcelfischer commented Jan 18, 2021

Currently we deploy once a day automatically via Director, with some sync rules running beforehand. Sometimes we have many changes, sometimes only a few custom variables for a bunch of hosts. But the issue is more or less independent of the number of changes we activate. Every time, we see a huge spike in the Graphite queue and also in the MySQL queue. I think it's also related to #5465.

Here is the log output from our last deployment at 01:48:

config master:

[2021-01-18 01:48:31 +0100] information/WorkQueue: #7 (GraphiteWriter, graphite) items: 1, rate: 1136.57/s (68194/min 341687/5min 1025428/15min);
[2021-01-18 01:50:01 +0100] information/WorkQueue: #7 (GraphiteWriter, graphite) items: 1, rate: 1129.5/s (67770/min 341365/5min 1025042/15min);
[2021-01-18 01:50:21 +0100] information/WorkQueue: #7 (GraphiteWriter, graphite) items: 2, rate: 1125.17/s (67510/min 341269/5min 1025015/15min);
[2021-01-18 01:57:41 +0100] information/GraphiteWriter: 'graphite' resumed.
[2021-01-18 02:00:07 +0100] information/GraphiteWriter: Finished reconnecting to Graphite in 0.0180399 second(s).
[2021-01-18 02:00:29 +0100] information/WorkQueue: #7 (GraphiteWriter, graphite) items: 9, rate: 584.233/s (35054/min 40355/5min 40355/15min);
[2021-01-18 02:00:39 +0100] information/WorkQueue: #7 (GraphiteWriter, graphite) items: 5, rate: 808.05/s (48483/min 54122/5min 54122/15min);
[2021-01-18 02:00:59 +0100] information/WorkQueue: #7 (GraphiteWriter, graphite) items: 0, rate: 999.15/s (59949/min 66493/5min 66493/15min);
[2021-01-18 02:01:39 +0100] information/WorkQueue: #7 (GraphiteWriter, graphite) items: 5, rate: 620.633/s (37238/min 92020/5min 92020/15min);
[2021-01-18 02:02:29 +0100] information/WorkQueue: #7 (GraphiteWriter, graphite) items: 1536, rate: 1055.42/s (63325/min 148920/5min 148920/15min);
[2021-01-18 02:02:39 +0100] information/WorkQueue: #7 (GraphiteWriter, graphite) items: 1312, rate: 1186.7/s (71202/min 163634/5min 163634/15min);
[2021-01-18 02:02:49 +0100] information/WorkQueue: #7 (GraphiteWriter, graphite) items: 1040, rate: 1336.83/s (80210/min 176427/5min 176744/15min);
[2021-01-18 02:03:29 +0100] information/WorkQueue: #7 (GraphiteWriter, graphite) items: 583, rate: 1346.05/s (80763/min 228483/5min 230831/15min);
[2021-01-18 02:03:39 +0100] information/WorkQueue: #7 (GraphiteWriter, graphite) items: 444, rate: 1322.25/s (79335/min 240860/5min 243631/15min);
[2021-01-18 02:03:49 +0100] information/WorkQueue: #7 (GraphiteWriter, graphite) items: 278, rate: 1322.98/s (79379/min 253818/5min 257052/15min);
[2021-01-18 02:04:19 +0100] information/WorkQueue: #7 (GraphiteWriter, graphite) items: 1, rate: 1290.35/s (77421/min 290581/5min 295410/15min);
[2021-01-18 02:04:29 +0100] information/WorkQueue: #7 (GraphiteWriter, graphite) items: 2, rate: 1278.28/s (76697/min 303043/5min 308344/15min);
[2021-01-18 02:05:29 +0100] information/WorkQueue: #7 (GraphiteWriter, graphite) items: 8, rate: 920.733/s (55244/min 323241/5min 364425/15min);
[2021-01-18 02:05:39 +0100] information/WorkQueue: #7 (GraphiteWriter, graphite) items: 5, rate: 896.933/s (53816/min 319350/5min 374132/15min);
[2021-01-18 02:06:09 +0100] information/WorkQueue: #7 (GraphiteWriter, graphite) items: 0, rate: 811.183/s (48671/min 315985/5min 385549/15min);
[2021-01-18 02:06:19 +0100] information/WorkQueue: #7 (GraphiteWriter, graphite) items: 1736, rate: 846.883/s (50813/min 324706/5min 402625/15min);
[2021-01-18 02:06:29 +0100] information/WorkQueue: #7 (GraphiteWriter, graphite) items: 5281, rate: 840.083/s (50405/min 329986/5min 415580/15min);
[2021-01-18 02:06:39 +0100] information/WorkQueue: #7 (GraphiteWriter, graphite) items: 5598, rate: 898.383/s (53903/min 335976/5min 428408/15min);
[2021-01-18 02:06:49 +0100] information/WorkQueue: #7 (GraphiteWriter, graphite) items: 1154, rate: 1075.93/s (64556/min 344596/5min 441130/15min);
[2021-01-18 02:09:59 +0100] information/WorkQueue: #7 (GraphiteWriter, graphite) items: 4, rate: 802.6/s (48156/min 233422/5min 558289/15min);
[2021-01-18 02:10:19 +0100] information/WorkQueue: #7 (GraphiteWriter, graphite) items: 3, rate: 1196.5/s (71790/min 235629/5min 587441/15min);
[2021-01-18 02:10:29 +0100] information/WorkQueue: #7 (GraphiteWriter, graphite) items: 665, rate: 1390.25/s (83415/min 239372/5min 604548/15min);
[2021-01-18 02:10:39 +0100] information/WorkQueue: #7 (GraphiteWriter, graphite) items: 4041, rate: 1460.75/s (87645/min 244225/5min 618730/15min);
[2021-01-18 02:10:49 +0100] information/WorkQueue: #7 (GraphiteWriter, graphite) items: 4118, rate: 1515.53/s (90932/min 256522/5min 633096/15min);
[2021-01-18 02:10:59 +0100] information/WorkQueue: #7 (GraphiteWriter, graphite) items: 4193, rate: 1447.02/s (86821/min 266859/5min 646220/15min);
[2021-01-18 02:11:09 +0100] information/WorkQueue: #7 (GraphiteWriter, graphite) items: 4256, rate: 1339.03/s (80342/min 272481/5min 658671/15min);
[2021-01-18 02:11:19 +0100] information/WorkQueue: #7 (GraphiteWriter, graphite) items: 2515, rate: 1417.35/s (85041/min 269676/5min 673049/15min);
[2021-01-18 02:11:29 +0100] information/WorkQueue: #7 (GraphiteWriter, graphite) items: 3518, rate: 1346.23/s (80774/min 269862/5min 686347/15min);
[2021-01-18 02:11:39 +0100] information/WorkQueue: #7 (GraphiteWriter, graphite) items: 1579, rate: 1315.07/s (78904/min 269053/5min 698388/15min);

second master:

[2021-01-18 01:53:21 +0100] information/WorkQueue: #7 (GraphiteWriter, graphite) items: 0, rate: 0/s (0/min 0/5min 0/15min);
[2021-01-18 01:56:45 +0100] information/GraphiteWriter: 'graphite' resumed.
[2021-01-18 01:56:45 +0100] information/GraphiteWriter: Finished reconnecting to Graphite in 0.001858 second(s).
[2021-01-18 01:56:51 +0100] information/WorkQueue: #7 (GraphiteWriter, graphite) items: 7949, rate: 104.95/s (6297/min 6297/5min 6297/15min); empty in 10 seconds
[2021-01-18 01:57:01 +0100] information/WorkQueue: #7 (GraphiteWriter, graphite) items: 17459, rate: 257.333/s (15440/min 15440/5min 15440/15min); empty in 18 seconds
[2021-01-18 01:57:11 +0100] information/WorkQueue: #7 (GraphiteWriter, graphite) items: 19407, rate: 408.9/s (24534/min 24534/5min 24534/15min); empty in 1 minute and 39 seconds
[2021-01-18 01:57:21 +0100] information/WorkQueue: #7 (GraphiteWriter, graphite) items: 15866, rate: 557.833/s (33470/min 33470/5min 33470/15min); empty in less than 1 millisecond
[2021-01-18 01:57:31 +0100] information/WorkQueue: #7 (GraphiteWriter, graphite) items: 3244, rate: 768.2/s (46092/min 46092/5min 46092/15min);
[2021-01-18 01:57:34 +0100] information/GraphiteWriter: 'graphite' paused.
[2021-01-18 01:58:21 +0100] information/WorkQueue: #7 (GraphiteWriter, graphite) items: 0, rate: 262.717/s (15763/min 49337/5min 49337/15min);
[2021-01-18 02:03:31 +0100] information/WorkQueue: #7 (GraphiteWriter, graphite) items: 0, rate: 0/s (0/min 0/5min 49337/15min);
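
To make spikes like the ones above easier to spot, the WorkQueue lines can be parsed mechanically. The following is a rough sketch based only on the log format visible in these excerpts. The regular expression and the helper names (`parse_workqueue_line`, `peak_queue`) are hypothetical and not part of Icinga 2, and other Icinga 2 versions may format these lines differently.

```python
import re

# Matches lines such as:
# [2021-01-18 01:57:11 +0100] information/WorkQueue: #7 (GraphiteWriter, graphite) items: 19407, rate: 408.9/s (...)
WORKQUEUE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] information/WorkQueue: #\d+ "
    r"\((?P<writer>\w+), (?P<name>[^)]+)\) "
    r"items: (?P<items>\d+), rate: (?P<rate>[\d.]+)/s"
)


def parse_workqueue_line(line):
    """Return a dict for a WorkQueue log line, or None for any other line."""
    match = WORKQUEUE_RE.match(line)
    if match is None:
        return None
    return {
        "timestamp": match["ts"],
        "writer": match["writer"],
        "name": match["name"],
        "items": int(match["items"]),
        "rate_per_s": float(match["rate"]),
    }


def peak_queue(lines):
    """Return the parsed entry with the largest queue length, or None."""
    parsed = [entry for entry in map(parse_workqueue_line, lines) if entry]
    return max(parsed, key=lambda entry: entry["items"], default=None)


if __name__ == "__main__":
    import sys

    # Usage: python workqueue_peak.py < icinga2.log
    peak = peak_queue(sys.stdin)
    if peak is not None:
        print(peak["timestamp"], peak["items"], peak["rate_per_s"])
```

Run against the second master's excerpt, this would single out the entry with the longest queue, which helps to line up the spike with the "resumed" and "paused" messages around it.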

@yhabteab removed the "needs feedback" label Jan 27, 2021
@Al2Klimov
Member

@N-o-X Aren't you working on this at the moment?

@Al2Klimov added the "needs feedback" label Aug 9, 2021
@Al2Klimov added the "area/graphite" label Oct 19, 2021
@Al2Klimov
Member

PING @N-o-X

@N-o-X
Contributor

N-o-X commented Nov 3, 2021

Nope, I didn't work on this or any related issues, sorry.

@N-o-X removed the "needs feedback" label Nov 3, 2021
@N-o-X removed their assignment Nov 3, 2021
@Al2Klimov added the "bug" label Feb 15, 2022
@Al2Klimov
Member

@yhabteab How large was your reproduction setup? How long did it take to load the config?

@Al2Klimov
Member

Two icinga masters with graphite in HA mode.

@marcelfischer Please share the config of those Icinga Graphite features, and also try v2.13.3+.

@Al2Klimov added the "needs feedback" label Apr 16, 2024

4 participants