Alert plugins can prevent alert conditions from firing #3663

avongluck-r1soft · 2017-03-27T17:54:34Z

We see an issue around the OpsGenie plugin for Graylog preventing all alert conditions from firing in 2.2.2 (with zero error messages).

After uninstalling the OpsGenie plugin and restarting Graylog the issue went away and alert conditions in streams started working. We've had an issue open to OpsGenie for a week now on it.

avongluck-r1soft · 2017-03-27T17:58:09Z

Testing:

Erase all conditions on stream.
Configure basic condition around message counts.
Alert to email and Slack to test
Manually send alert to trigger.
No alerts triggered.
Restart graylog
No alerts triggered.
Uninstall OpsGenie plugin
Restart graylog
Alerts triggered.

While it isn't Graylog's fault since this is a 3rd party plugin... the ability for a 3rd party plugin to potentially break all of the alerting is a serious issue as "Backup" notification methods will silently fail.

joschi · 2017-03-28T12:31:01Z

@avongluck-r1soft Are there any alerting or OpsGenie related exceptions in the logs of your Graylog node(s)?

avongluck-r1soft · 2017-03-31T22:21:10Z

So I tried to narrow the issue down.. but the behavior seems erratic. When does an alert "resolve"? I noticed that editing the conditions and unchecking the "Repeat notifications(optional)" box seems to "resolve" the alert. However the alert refuses to trigger any additional notifications after this point.

avongluck-r1soft · 2017-03-31T22:30:06Z

(I can re-check repeat notifications and still can't trigger any notifications 30 minutes since the last successful one)

avongluck-r1soft · 2017-03-31T22:45:52Z

If I restart graylog-server (leaving repeat notifications checked) I get a flood of them after startup.. then a alert is open and doesn't seem to ever resolve.

Here is the alert log:

2017-03-31 17:31:29.108
Graylog sent Full filesystem (Slack alarm callback) notification
2017-03-31 17:31:41.049
Graylog sent Email On-call about full filesystems (Email Alert Callback) notification
2017-03-31 17:44:48.391
Condition is configured to repeat notifications, Graylog will send notifications when evaluating the condition until it is no longer satisfied
2017-03-31 17:44:48.392
Condition is still satisfied, alert is unresolved

So 12 minutes later it is still considered open even though the threshold is 5 minutes.

avongluck-r1soft · 2017-03-31T22:46:48Z

Full filesystem (Message Count Alert Condition)
Alerting on stream Full filesystems
Configuration: Alert is triggered when there are more than 0 messages in the last 5 minutes. Grace period: 0 minutes. Including last 5 messages in alert notification. Configured to repeat notifications.

avongluck-r1soft · 2017-03-31T22:51:56Z

20 minutes later, still unresolved.
so:

Alert is unresolved
I toggle the Repeat notifications(optional) box in the conditions to test.
Alert shows as resolved, however internally it "isn't"?
No additional notifications will trigger
Restart graylog-server
Burst of alerts. Rinse and repeat.

avongluck-r1soft · 2017-03-31T22:58:32Z

Maybe the OpsGenie notification is at fault somehow and "holds" the alerts open?

2017-03-31 17:31:41.049
Graylog sent Email On-call about full filesystems (Email Alert Callback) notification
2017-03-31 17:57:33.698
Condition is configured to repeat notifications, Graylog will send notifications when evaluating the condition until it is no longer satisfied
2017-03-31 17:57:33.698
Condition is still satisfied, alert is unresolved

Attaching a server.log which doesn't tell us much.

server.log.txt

avongluck-r1soft · 2017-03-31T23:15:07Z

Ok. Here is some solid documented oddness:

2017-03-31 17:31:41.049
Graylog sent Email On-call about full filesystems (Email Alert Callback) notification
2017-03-31 18:12:01.520
Condition is configured to repeat notifications, Graylog will send notifications when evaluating the condition until it is no longer satisfied
2017-03-31 18:12:01.520
Condition is still satisfied, alert is unresolved

Notice as of 18:12 "no notifications since 17:31".
However "messages evaluated" shows:

2017-03-31 17:32:37.081	XXX -- WARNING XXX.net (XXX) DiskUsage high = 98 mount_point = /JUST_A_TEST
2017-03-31 17:34:48.928	XXX -- WARNING XXX.net (XXX) DiskUsage high = 98 mount_point = /JUST_A_TEST
2017-03-31 17:42:12.168	XXX -- WARNING XXX.net (XXX) DiskUsage high = 100 mount_point = /storage06
2017-03-31 18:02:39.432	XXX -- WARNING XXX.net (XXX) DiskUsage high = 100 mount_point = /storage06
2017-03-31 18:11:07.225	XXX -- WARNING XXX.net (XXX) DiskUsage high = 98 mount_point = /JUST_A_TEST

So:

Repeat notifications checked
Alert open
New messages evaluated
No new events since 17:31

avongluck-r1soft · 2017-03-31T23:21:45Z

^^^ all of that testing was with the OpsGenie plugin installed.

I uninstalled the OpsGenie plugin again and removed the notification, now events seem to always happen for open alerts as expected. No errors seen in server.log up through now.

avongluck-r1soft · 2017-04-03T15:47:19Z

I painfully realized over the weekend at 3am that "repeat notifiations" means a notification every minute while an alert is open, even if only one condition was matched. With that understanding the comment above still proves to be out-of-line with intended behaviour.

edmundoa · 2017-04-04T10:00:15Z

Hi @avongluck-r1soft,

Sorry to read that 😞. We have an issue with grace period and repeat notifications (#3579), and it will be fixed in an upcoming patch release.

avongluck-r1soft · 2017-04-17T14:54:18Z

Ok. I think this issue is related to odd alert behavior. The alerts never seem to resolve.
Screenshots taken at 9:48am the same day:

Alert begin 1 hour 12 minutes ago:

Alert unresolved still after 1 hour 12 minutes:

Only one message evaluated 1 hour 12 minutes ago.

So at this point in time:

The grace period is expired (60 minutes)
The "0 messages in 5 minutes" is expired.

Anyone have any idea what's going on here? When should alerts consider themselves closed? I re-enabled the OpsGenie plugin and set it up after upgrading to 2.2.3. The OpsGenie guys said they couldn't reproduce the issue, however.

avongluck-r1soft · 2017-04-17T14:56:06Z

The first screenshot shows: "Stream had 1 messages in the last 5 minutes" (which shouldn't be true any longer) Maybe "more than 0 messages" is incorrectly configured? Normally "more than 0" means 1 or more... maybe the logic is >= 0 vs > 0 ?

avongluck-r1soft · 2017-04-17T15:07:04Z

Nah. https://github.com/Graylog2/graylog2-server/blob/master/graylog2-server/src/main/java/org/graylog2/alerts/types/MessageCountAlertCondition.java#L184 Logic seems fine there (MORE is >)

dennisoelkers · 2017-05-15T09:39:42Z

@avongluck-r1soft: Is this still an issue with the most recent Graylog version (2.2.3)?

avongluck-r1soft · 2017-05-16T15:15:07Z

Yeah. Here's an awesome screenshot showing some weird alert state:

avongluck-r1soft · 2017-05-16T15:15:47Z

Given the configuration, i'd expect the alert to close after 2 minutes. However its been open for 15 with no signs of closing.

avongluck-r1soft · 2017-05-16T15:44:26Z

Ah. ok. I got some great screenshots showing the problem:

Without OpsGenie plugin installed: (working as expected. Closes after 2 minutes)

With OpsGenie plugin installed and configured: (not working. Alert gets stuck open, no OpsGenie notification (others are listed as triggered, but never seem to fire?)

joschi · 2017-05-16T15:58:30Z

@avongluck-r1soft Anything interesting in the logs of your Graylog nodes at that time?

avongluck-r1soft · 2017-05-16T17:54:15Z

Nothing at all interesting in the logs (quiet)
I'm not crazy, right? That second screenshot shows an alert in a state it shouldn't be in.

2017-05-16T10:33:12.031-05:00 INFO  [CmdLineTool] Loaded plugin: OpsGenieAlarmCallback 1.0.0 [com.opsgenie.plugin.graylog.OpsGenieAlarmCallbackPlugin]
2017-05-16T10:33:12.033-05:00 INFO  [CmdLineTool] Loaded plugin: Elastic Beats Input 2.2.3 [org.graylog.plugins.beats.BeatsInputPlugin]
2017-05-16T10:33:12.033-05:00 INFO  [CmdLineTool] Loaded plugin: Collector 2.2.3 [org.graylog.plugins.collector.CollectorPlugin]
2017-05-16T10:33:12.034-05:00 INFO  [CmdLineTool] Loaded plugin: Enterprise Integration Plugin 2.2.3 [org.graylog.plugins.enterprise_integration.EnterpriseIntegrationPlugin]
2017-05-16T10:33:12.034-05:00 INFO  [CmdLineTool] Loaded plugin: MapWidgetPlugin 2.2.3 [org.graylog.plugins.map.MapWidgetPlugin]
2017-05-16T10:33:12.040-05:00 INFO  [CmdLineTool] Loaded plugin: Pipeline Processor Plugin 2.2.3 [org.graylog.plugins.pipelineprocessor.ProcessorPlugin]
2017-05-16T10:33:12.040-05:00 INFO  [CmdLineTool] Loaded plugin: Anonymous Usage Statistics 2.2.3 [org.graylog.plugins.usagestatistics.UsageStatsPlugin]
2017-05-16T10:33:12.041-05:00 INFO  [CmdLineTool] Loaded plugin: Slack 2.4.0 [org.graylog2.plugins.slack.callback.SlackAlarmCallback]
2017-05-16T10:33:12.194-05:00 INFO  [CmdLineTool] Running with JVM arguments: -Xms10g -Xmx10g -XX:NewRatio=1 -XX:PermSize=128m -XX:MaxPermSize=256m -XX:+ResizeTLAB -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled -XX:+CMSClassUnloadingEnabled -XX:+UseParNewGC -XX:-OmitStackTraceInFastThrow -Dlog4j.configurationFile=file:///etc/graylog/server/log4j2.xml -Djava.library.path=/usr/share/graylog-server/lib/sigar -Dgraylog2.installation_source=rpm
2017-05-16T10:33:12.324-05:00 INFO  [Version] HV000001: Hibernate Validator null
2017-05-16T10:33:13.789-05:00 INFO  [InputBufferImpl] Message journal is enabled.
2017-05-16T10:33:13.804-05:00 INFO  [NodeId] Node ID: XXX-c572-4717-b7e8-f7cada1266c6
2017-05-16T10:33:13.933-05:00 INFO  [LogManager] Loading logs.
2017-05-16T10:33:13.969-05:00 INFO  [LogManager] Logs loading complete.
2017-05-16T10:33:13.969-05:00 INFO  [KafkaJournal] Initialized Kafka based journal at /var/lib/graylog-server/journal
2017-05-16T10:33:14.083-05:00 INFO  [InputBufferImpl] Initialized InputBufferImpl with ring size <65536> and wait strategy <BlockingWaitStrategy>, running 1 parallel message handlers.
2017-05-16T10:33:14.102-05:00 INFO  [cluster] Cluster created with settings {hosts=[10.XXX.XXX.XXX:27017], mode=SINGLE, requiredClusterType=UNKNOWN, serverSelectionTimeout='30000 ms', maxWaitQueueSize=500}
2017-05-16T10:33:14.143-05:00 INFO  [cluster] No server chosen by ReadPreferenceServerSelector{readPreference=primary} from cluster description ClusterDescription{type=UNKNOWN, connectionMode=SINGLE, serverDescriptions=[ServerDescription{address=10.XXX.XXX.XXX:27017, type=UNKNOWN, state=CONNECTING}]}. Waiting for 30000 ms before timing out
2017-05-16T10:33:14.162-05:00 INFO  [connection] Opened connection [connectionId{localValue:1, serverValue:157}] to 10.XXX.XXX.XXX:27017
2017-05-16T10:33:14.164-05:00 INFO  [cluster] Monitor thread successfully connected to server with description ServerDescription{address=10.XXX.XXX.XXX:27017, type=STANDALONE, state=CONNECTED, ok=true, version=ServerVersion{versionList=[3, 2, 12]}, minWireVersion=0, maxWireVersion=4, maxDocumentSize=16777216, roundTripTimeNanos=435048}
2017-05-16T10:33:14.173-05:00 INFO  [connection] Opened connection [connectionId{localValue:2, serverValue:158}] to 10.XXX.XXX.XXX:27017
2017-05-16T10:33:14.399-05:00 INFO  [node] [graylog-d2636a89-c572-4717-b7e8-f7cada1266c6] version[2.4.4], pid[12887], build[fcbb46d/2017-01-03T11:33:16Z]
2017-05-16T10:33:14.399-05:00 INFO  [node] [graylog-d2636a89-c572-4717-b7e8-f7cada1266c6] initializing ...
2017-05-16T10:33:14.404-05:00 INFO  [plugins] [graylog-d2636a89-c572-4717-b7e8-f7cada1266c6] modules [], plugins [graylog-monitor], sites []
2017-05-16T10:33:15.359-05:00 INFO  [node] [graylog-d2636a89-c572-4717-b7e8-f7cada1266c6] initialized
2017-05-16T10:33:15.399-05:00 INFO  [ProcessBuffer] Initialized ProcessBuffer with ring size <65536> and wait strategy <BlockingWaitStrategy>.
2017-05-16T10:33:16.497-05:00 INFO  [RulesEngineProvider] No static rules file loaded.
2017-05-16T10:33:16.805-05:00 INFO  [OutputBuffer] Initialized OutputBuffer with ring size <65536> and wait strategy <BlockingWaitStrategy>.
2017-05-16T10:33:17.544-05:00 INFO  [ServerBootstrap] Graylog server 2.2.3+7adc951 starting up
2017-05-16T10:33:17.545-05:00 INFO  [ServerBootstrap] JRE: Oracle Corporation 1.8.0_121 on Linux 3.10.0-327.36.3.el7.x86_64
2017-05-16T10:33:17.545-05:00 INFO  [ServerBootstrap] Deployment: rpm
2017-05-16T10:33:17.545-05:00 INFO  [ServerBootstrap] OS: CentOS Linux 7 (Core) (centos)
2017-05-16T10:33:17.545-05:00 INFO  [ServerBootstrap] Arch: amd64
2017-05-16T10:33:17.556-05:00 WARN  [DeadEventLoggingListener] Received unhandled event of type <org.graylog2.plugin.lifecycles.Lifecycle> from event bus <AsyncEventBus{graylog-eventbus}>
2017-05-16T10:33:17.567-05:00 INFO  [PeriodicalsService] Starting 26 periodicals ...
2017-05-16T10:33:17.568-05:00 INFO  [Periodicals] Starting [org.graylog2.periodical.ThroughputCalculator] periodical in [0s], polling every [1s].
2017-05-16T10:33:17.569-05:00 INFO  [node] [graylog-d2636a89-c572-4717-b7e8-f7cada1266c6] starting ...
2017-05-16T10:33:17.572-05:00 INFO  [Periodicals] Starting [org.graylog2.periodical.AlertScannerThread] periodical in [10s], polling every [60s].
2017-05-16T10:33:17.575-05:00 INFO  [Periodicals] Starting [org.graylog2.periodical.BatchedElasticSearchOutputFlushThread] periodical in [0s], polling every [1s].
2017-05-16T10:33:17.576-05:00 INFO  [Periodicals] Starting [org.graylog2.periodical.ClusterHealthCheckThread] periodical in [120s], polling every [20s].
2017-05-16T10:33:17.576-05:00 INFO  [Periodicals] Starting [org.graylog2.periodical.ContentPackLoaderPeriodical] periodical, running forever.
2017-05-16T10:33:17.576-05:00 INFO  [Periodicals] Starting [org.graylog2.periodical.GarbageCollectionWarningThread] periodical, running forever.
2017-05-16T10:33:17.576-05:00 INFO  [Periodicals] Starting [org.graylog2.periodical.IndexerClusterCheckerThread] periodical in [0s], polling every [30s].
2017-05-16T10:33:17.577-05:00 INFO  [Periodicals] Starting [org.graylog2.periodical.IndexRetentionThread] periodical in [0s], polling every [300s].
2017-05-16T10:33:17.577-05:00 INFO  [Periodicals] Starting [org.graylog2.periodical.IndexRotationThread] periodical in [0s], polling every [10s].
2017-05-16T10:33:17.577-05:00 INFO  [Periodicals] Starting [org.graylog2.periodical.NodePingThread] periodical in [0s], polling every [1s].
2017-05-16T10:33:17.577-05:00 INFO  [IndexRetentionThread] Elasticsearch cluster not available, skipping index retention checks.
2017-05-16T10:33:17.586-05:00 INFO  [Periodicals] Starting [org.graylog2.periodical.VersionCheckThread] periodical in [300s], polling every [1800s].
2017-05-16T10:33:17.586-05:00 INFO  [Periodicals] Starting [org.graylog2.periodical.ThrottleStateUpdaterThread] periodical in [1s], polling every [1s].
2017-05-16T10:33:17.587-05:00 INFO  [Periodicals] Starting [org.graylog2.events.ClusterEventPeriodical] periodical in [0s], polling every [1s].
2017-05-16T10:33:17.587-05:00 INFO  [Periodicals] Starting [org.graylog2.events.ClusterEventCleanupPeriodical] periodical in [0s], polling every [86400s].
2017-05-16T10:33:17.588-05:00 INFO  [connection] Opened connection [connectionId{localValue:4, serverValue:160}] to 10.XXX.XXX.XXX:27017
2017-05-16T10:33:17.590-05:00 INFO  [connection] Opened connection [connectionId{localValue:3, serverValue:159}] to 10.XXX.XXX.XXX:27017
2017-05-16T10:33:17.590-05:00 INFO  [Periodicals] Starting [org.graylog2.periodical.ClusterIdGeneratorPeriodical] periodical, running forever.
2017-05-16T10:33:17.591-05:00 INFO  [Periodicals] Starting [org.graylog2.periodical.IndexRangesMigrationPeriodical] periodical, running forever.
2017-05-16T10:33:17.592-05:00 INFO  [Periodicals] Starting [org.graylog2.periodical.IndexRangesCleanupPeriodical] periodical in [15s], polling every [3600s].
2017-05-16T10:33:17.595-05:00 INFO  [connection] Opened connection [connectionId{localValue:6, serverValue:162}] to 10.XXX.XXX.XXX:27017
2017-05-16T10:33:17.595-05:00 INFO  [connection] Opened connection [connectionId{localValue:7, serverValue:163}] to 10.XXX.XXX.XXX:27017
2017-05-16T10:33:17.599-05:00 INFO  [connection] Opened connection [connectionId{localValue:5, serverValue:161}] to 10.XXX.XXX.XXX:27017
2017-05-16T10:33:17.644-05:00 INFO  [PeriodicalsService] Not starting [org.graylog2.periodical.UserPermissionMigrationPeriodical] periodical. Not configured to run on this node.
2017-05-16T10:33:17.644-05:00 INFO  [Periodicals] Starting [org.graylog2.periodical.AlarmCallbacksMigrationPeriodical] periodical, running forever.
2017-05-16T10:33:17.644-05:00 INFO  [Periodicals] Starting [org.graylog2.periodical.ConfigurationManagementPeriodical] periodical, running forever.
2017-05-16T10:33:17.646-05:00 INFO  [Periodicals] Starting [org.graylog2.periodical.LdapGroupMappingMigration] periodical, running forever.
2017-05-16T10:33:17.647-05:00 INFO  [Periodicals] Starting [org.graylog2.periodical.IndexFailuresPeriodical] periodical, running forever.
2017-05-16T10:33:17.647-05:00 INFO  [Periodicals] Starting [org.graylog.plugins.usagestatistics.UsageStatsNodePeriodical] periodical in [300s], polling every [21600s].
2017-05-16T10:33:17.648-05:00 INFO  [Periodicals] Starting [org.graylog.plugins.usagestatistics.UsageStatsClusterPeriodical] periodical in [300s], polling every [21600s].
2017-05-16T10:33:17.650-05:00 INFO  [PeriodicalsService] Not starting [org.graylog.plugins.pipelineprocessor.periodical.LegacyDefaultStreamMigration] periodical. Not configured to run on this node.
2017-05-16T10:33:17.650-05:00 INFO  [Periodicals] Starting [org.graylog.plugins.collector.periodical.PurgeExpiredCollectorsThread] periodical in [0s], polling every [3600s].
2017-05-16T10:33:17.718-05:00 INFO  [IndexerClusterCheckerThread] Indexer not fully initialized yet. Skipping periodic cluster check.
2017-05-16T10:33:17.890-05:00 INFO  [V20161130141500_DefaultStreamRecalcIndexRanges] Cluster not connected yet, delaying migration until it is reachable.
2017-05-16T10:33:17.911-05:00 INFO  [transport] [graylog-d2636a89-c572-4717-b7e8-f7cada1266c6] publish_address {127.0.0.1:9350}, bound_addresses {[::1]:9350}, {127.0.0.1:9350}
2017-05-16T10:33:17.916-05:00 INFO  [discovery] [graylog-d2636a89-c572-4717-b7e8-f7cada1266c6] graylog-prod/HPPt87SoSmewgUKqwfs6UQ
2017-05-16T10:33:18.054-05:00 INFO  [JerseyService] Enabling CORS for HTTP endpoint
2017-05-16T10:33:20.919-05:00 WARN  [discovery] [graylog-d2636a89-c572-4717-b7e8-f7cada1266c6] waited for 3s and no initial state was set by the discovery
2017-05-16T10:33:20.919-05:00 INFO  [node] [graylog-d2636a89-c572-4717-b7e8-f7cada1266c6] started
2017-05-16T10:33:21.059-05:00 INFO  [service] [graylog-d2636a89-c572-4717-b7e8-f7cada1266c6] detected_master {Neophyte}{p-qRoaYrS7Gb0yfRMgky9g}{10.XXX.XXX.XXX}{10.XXX.XXX.XXX:9300}, added {{Neophyte}{p-qRoaYrS7Gb0yfRMgky9g}{10.XXX.XXX.XXX}{10.XXX.XXX.XXX:9300},{graylog-bca2dfe8-77c8-4d01-8495-d08f7bdea82f}{TgllDbp2TOOcMijSrWLE1g}{10.XXX.XX.XX}{10.170.70.17:9350}{client=true, data=false, master=false},}, reason: zen-disco-receive(from master [{Neophyte}{p-qRoaYrS7Gb0yfRMgky9g}{10.XXX.XXX.XXX}{10.XXX.XXX.XXX:9300}])
2017-05-16T10:33:25.239-05:00 INFO  [NetworkListener] Started listener bound to [10.XXX.XXX.XXX:12900]
2017-05-16T10:33:25.241-05:00 INFO  [HttpServer] [HttpServer] Started.
2017-05-16T10:33:25.241-05:00 INFO  [JerseyService] Started REST API at <http://10.XXX.XXX.XXX:12900/>
2017-05-16T10:33:27.199-05:00 INFO  [NetworkListener] Started listener bound to [127.0.0.1:9000]
2017-05-16T10:33:27.199-05:00 INFO  [HttpServer] [HttpServer-1] Started.
2017-05-16T10:33:27.200-05:00 INFO  [JerseyService] Started Web Interface at <http://127.0.0.1:9000/>
2017-05-16T10:33:27.201-05:00 INFO  [ServiceManagerListener] Services are healthy
2017-05-16T10:33:27.202-05:00 INFO  [ServerBootstrap] Services started, startup times in ms: {KafkaJournal [RUNNING]=9, InputSetupService [RUNNING]=11, JournalReader [RUNNING]=19, OutputSetupService [RUNNING]=25, BufferSynchronizerService [RUNNING]=25, ConfigurationEtagService [RUNNING]=47, PeriodicalsService [RUNNING]=99, StreamCacheService [RUNNING]=258, IndexerSetupService [RUNNING]=3514, JerseyService [RUNNING]=9634}
2017-05-16T10:33:27.204-05:00 INFO  [ServerBootstrap] Graylog server up and running.
2017-05-16T10:33:27.209-05:00 INFO  [InputSetupService] Triggering launching persisted inputs, node transitioned from Uninitialized [LB:DEAD] to Running [LB:ALIVE]
2017-05-16T10:33:27.226-05:00 INFO  [InputStateListener] Input [Syslog UDP/58f4bfb00ed1c97c79d8f863] is now STARTING
2017-05-16T10:33:27.264-05:00 WARN  [NettyTransport] receiveBufferSize (SO_RCVBUF) for input SyslogUDPInput{title=Syslog UDP Input, type=org.graylog2.inputs.syslog.udp.SyslogUDPInput, nodeId=null} should be 262144 but is 212992.
2017-05-16T10:33:27.270-05:00 INFO  [InputStateListener] Input [Syslog UDP/58f4bfb00ed1c97c79d8f863] is now RUNNING

avongluck-r1soft · 2017-06-15T20:55:50Z

I've finally found an interesting correlation with a repeatable pass situation.

I remove our Nginx proxy from infront of the Graylog server.
Alerts fire as expected. OpsGenie plugin works as expected.
OpsGenie notifications trigger and no longer hang up
- We were seeing the notifications never firing, and the alerts never closing.

This really makes no sense to me unless binding graylog on 127.0.0.1:9000 was somehow messing up the outbound network within the java server (hanging up the notification?). We don't see any http calls failing, but there was a noticeable removal of erratic behavior once nginx was removed from infront of the web interface (aka. we bound web_listen_uri to the private interface and killed off nginx)

I've attached the proxy + graylog config we were using.

graylog.config.txt

alexmateescu · 2017-06-20T12:13:51Z

avongluck

i have the same issue with opsgenie but i have no nginx in front of the graylog server.

could you please paste the graylog server conf so i can compare it to mine. Thanks

this issue has been driving us nuts.

bernd · 2020-11-14T19:20:10Z

The old alerting system has been replaced with the alerts and events system in the meantime. This issue shouldn't happen anymore. Please open a new issue if this is still a problem. Thank you!

dennisoelkers self-assigned this Mar 28, 2017

joschi added to-verify bug labels Apr 3, 2017

dennisoelkers removed their assignment Mar 15, 2018

bernd closed this as completed Nov 14, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Alert plugins can prevent alert conditions from firing #3663

Alert plugins can prevent alert conditions from firing #3663

avongluck-r1soft commented Mar 27, 2017

avongluck-r1soft commented Mar 27, 2017

joschi commented Mar 28, 2017

avongluck-r1soft commented Mar 31, 2017

avongluck-r1soft commented Mar 31, 2017

avongluck-r1soft commented Mar 31, 2017

avongluck-r1soft commented Mar 31, 2017

avongluck-r1soft commented Mar 31, 2017

avongluck-r1soft commented Mar 31, 2017

avongluck-r1soft commented Mar 31, 2017 •

edited

avongluck-r1soft commented Mar 31, 2017

avongluck-r1soft commented Apr 3, 2017

edmundoa commented Apr 4, 2017

avongluck-r1soft commented Apr 17, 2017

avongluck-r1soft commented Apr 17, 2017 •

edited

avongluck-r1soft commented Apr 17, 2017

dennisoelkers commented May 15, 2017

avongluck-r1soft commented May 16, 2017

avongluck-r1soft commented May 16, 2017 •

edited

avongluck-r1soft commented May 16, 2017

joschi commented May 16, 2017

avongluck-r1soft commented May 16, 2017

avongluck-r1soft commented Jun 15, 2017 •

edited

alexmateescu commented Jun 20, 2017

bernd commented Nov 14, 2020

Alert plugins can prevent alert conditions from firing #3663

Alert plugins can prevent alert conditions from firing #3663

Comments

avongluck-r1soft commented Mar 27, 2017

avongluck-r1soft commented Mar 27, 2017

joschi commented Mar 28, 2017

avongluck-r1soft commented Mar 31, 2017

avongluck-r1soft commented Mar 31, 2017

avongluck-r1soft commented Mar 31, 2017

avongluck-r1soft commented Mar 31, 2017

avongluck-r1soft commented Mar 31, 2017

avongluck-r1soft commented Mar 31, 2017

avongluck-r1soft commented Mar 31, 2017 • edited

avongluck-r1soft commented Mar 31, 2017

avongluck-r1soft commented Apr 3, 2017

edmundoa commented Apr 4, 2017

avongluck-r1soft commented Apr 17, 2017

avongluck-r1soft commented Apr 17, 2017 • edited

avongluck-r1soft commented Apr 17, 2017

dennisoelkers commented May 15, 2017

avongluck-r1soft commented May 16, 2017

avongluck-r1soft commented May 16, 2017 • edited

avongluck-r1soft commented May 16, 2017

joschi commented May 16, 2017

avongluck-r1soft commented May 16, 2017

avongluck-r1soft commented Jun 15, 2017 • edited

alexmateescu commented Jun 20, 2017

bernd commented Nov 14, 2020

avongluck-r1soft commented Mar 31, 2017 •

edited

avongluck-r1soft commented Apr 17, 2017 •

edited

avongluck-r1soft commented May 16, 2017 •

edited

avongluck-r1soft commented Jun 15, 2017 •

edited