
Alert plugins can prevent alert conditions from firing #3663

Closed

avongluck-r1soft opened this issue Mar 27, 2017 · 24 comments

Comments

@avongluck-r1soft

We're seeing an issue where the OpsGenie plugin for Graylog prevents all alert conditions from firing in 2.2.2, with no error messages at all.

After uninstalling the OpsGenie plugin and restarting Graylog, the issue went away and alert conditions on streams started working again. We've had an issue open with OpsGenie about this for a week now.

@avongluck-r1soft
Author

Testing:

  1. Erase all conditions on the stream.
  2. Configure a basic condition around message counts.
  3. Alert to email and Slack to test.
  4. Manually send a message to trigger the alert.
  5. No alerts triggered.
  6. Restart Graylog.
  7. No alerts triggered.
  8. Uninstall the OpsGenie plugin.
  9. Restart Graylog.
  10. Alerts triggered.

While this isn't Graylog's fault, since it's a third-party plugin, the ability of a third-party plugin to break all alerting is a serious issue: "backup" notification methods will silently fail.
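
To illustrate the concern, here is a minimal, purely hypothetical Java sketch (not Graylog's actual implementation; all names are made up) of why sequential callback dispatch is fragile: if notifications are invoked one after another with no timeout, a single plugin that hangs on an outbound HTTP call delays or suppresses every callback behind it, while isolating each callback keeps the "backup" methods firing.

import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch only; not Graylog's actual alerting code.
public class CallbackDispatchSketch {

    interface AlarmCallback {
        void call(String alert) throws Exception;
    }

    // Naive dispatch: one hanging callback stalls the whole chain, and the
    // callbacks behind it (email, Slack, ...) silently never run on time.
    static void dispatchNaively(List<AlarmCallback> callbacks, String alert) {
        for (AlarmCallback cb : callbacks) {
            try {
                cb.call(alert); // no timeout: a stuck HTTP call blocks everything after it
            } catch (Exception e) {
                System.err.println("callback failed: " + e);
            }
        }
    }

    // Defensive variant: run each callback with its own timeout so one
    // misbehaving plugin cannot take the other notification methods down with it.
    static void dispatchIsolated(List<AlarmCallback> callbacks, String alert) {
        ExecutorService pool = Executors.newCachedThreadPool();
        for (AlarmCallback cb : callbacks) {
            Future<?> f = pool.submit(() -> {
                cb.call(alert);
                return null;
            });
            try {
                f.get(10, TimeUnit.SECONDS);
            } catch (TimeoutException e) {
                f.cancel(true); // give up on this callback, keep going with the rest
            } catch (ExecutionException | InterruptedException e) {
                System.err.println("callback failed: " + e);
            }
        }
        pool.shutdownNow();
    }
}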

@dennisoelkers dennisoelkers self-assigned this Mar 28, 2017
@joschi
Contributor

joschi commented Mar 28, 2017

@avongluck-r1soft Are there any alerting or OpsGenie related exceptions in the logs of your Graylog node(s)?

@avongluck-r1soft
Author

So I tried to narrow the issue down, but the behavior seems erratic. When does an alert "resolve"? I noticed that editing the conditions and unchecking the "Repeat notifications (optional)" box seems to "resolve" the alert. However, the alert refuses to trigger any additional notifications after this point.

@avongluck-r1soft
Author

(I can re-check repeat notifications and still can't trigger any notifications 30 minutes after the last successful one.)

@avongluck-r1soft
Author

If I restart graylog-server (leaving repeat notifications checked), I get a flood of notifications after startup; then an alert is open and never seems to resolve.

Here is the alert log:

2017-03-31 17:31:29.108
Graylog sent Full filesystem (Slack alarm callback) notification
2017-03-31 17:31:41.049
Graylog sent Email On-call about full filesystems (Email Alert Callback) notification
2017-03-31 17:44:48.391
Condition is configured to repeat notifications, Graylog will send notifications when evaluating the condition until it is no longer satisfied
2017-03-31 17:44:48.392
Condition is still satisfied, alert is unresolved

So 12 minutes later it is still considered open, even though the condition window is only 5 minutes.

@avongluck-r1soft
Author

Full filesystem (Message Count Alert Condition)
Alerting on stream Full filesystems
Configuration: Alert is triggered when there are more than 0 messages in the last 5 minutes. Grace period: 0 minutes. Including last 5 messages in alert notification. Configured to repeat notifications.
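
Just to spell out what I understand that configuration to mean (a hypothetical sketch, not the actual implementation), the condition should evaluate roughly like this and resolve on the first pass where the 5-minute window no longer contains any messages:

import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch of the condition as configured above, not Graylog's code:
// "more than 0 messages in the last 5 minutes", grace period 0 minutes.
public class MessageCountConditionSketch {

    static final long THRESHOLD = 0;                      // "more than 0 messages"
    static final Duration WINDOW = Duration.ofMinutes(5); // "in the last 5 minutes"
    static final Duration GRACE = Duration.ZERO;          // "Grace period: 0 minutes"

    static boolean triggered(long messagesInWindow, Instant lastTriggered, Instant now) {
        boolean overThreshold = messagesInWindow > THRESHOLD; // strictly "more than"
        boolean graceExpired = lastTriggered == null
                || !Duration.between(lastTriggered, now).minus(GRACE).isNegative();
        return overThreshold && graceExpired;
    }

    public static void main(String[] args) {
        Instant now = Instant.now();
        System.out.println(triggered(1, null, now)); // true: a message is inside the window
        System.out.println(triggered(0, null, now)); // false: window empty, alert should resolve
    }
}

By that reading, the alert should close on the first evaluation after the window empties, which is not what I'm seeing.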

@avongluck-r1soft
Author

20 minutes later, still unresolved.
So:

  1. Alert is unresolved.
  2. I toggle the "Repeat notifications (optional)" box in the conditions to test.
  3. Alert shows as resolved; however, internally it apparently isn't?
  4. No additional notifications will trigger.
  5. Restart graylog-server.
  6. Burst of alerts. Rinse and repeat.

@avongluck-r1soft
Author

Maybe the OpsGenie notification is at fault somehow and "holds" the alerts open?

2017-03-31 17:31:41.049
Graylog sent Email On-call about full filesystems (Email Alert Callback) notification
2017-03-31 17:57:33.698
Condition is configured to repeat notifications, Graylog will send notifications when evaluating the condition until it is no longer satisfied
2017-03-31 17:57:33.698
Condition is still satisfied, alert is unresolved

Attaching a server.log which doesn't tell us much.

server.log.txt

@avongluck-r1soft
Author

avongluck-r1soft commented Mar 31, 2017

OK, here is some solidly documented oddness:

2017-03-31 17:31:41.049
Graylog sent Email On-call about full filesystems (Email Alert Callback) notification
2017-03-31 18:12:01.520
Condition is configured to repeat notifications, Graylog will send notifications when evaluating the condition until it is no longer satisfied
2017-03-31 18:12:01.520
Condition is still satisfied, alert is unresolved

Notice that as of 18:12 there have been no notifications since 17:31.
However, "messages evaluated" shows:

2017-03-31 17:32:37.081	XXX -- WARNING XXX.net (XXX) DiskUsage high = 98 mount_point = /JUST_A_TEST
2017-03-31 17:34:48.928	XXX -- WARNING XXX.net (XXX) DiskUsage high = 98 mount_point = /JUST_A_TEST
2017-03-31 17:42:12.168	XXX -- WARNING XXX.net (XXX) DiskUsage high = 100 mount_point = /storage06
2017-03-31 18:02:39.432	XXX -- WARNING XXX.net (XXX) DiskUsage high = 100 mount_point = /storage06
2017-03-31 18:11:07.225	XXX -- WARNING XXX.net (XXX) DiskUsage high = 98 mount_point = /JUST_A_TEST

So:

  • Repeat notifications checked
  • Alert open
  • New messages evaluated
  • No new events since 17:31

@avongluck-r1soft
Author

^^^ All of that testing was with the OpsGenie plugin installed.

I uninstalled the OpsGenie plugin again and removed the notification; now events always seem to fire for open alerts as expected. No errors seen in server.log so far.

@avongluck-r1soft
Author

I painfully realized over the weekend, at 3am, that "repeat notifications" means a notification every minute while an alert is open, even if only one condition was matched. With that understanding, the behaviour documented in the comment above is still out of line with the intended behaviour.
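
To make sure I have the mental model right, here is a hypothetical sketch (not Graylog's actual code, and assuming the alert scanner re-evaluates conditions roughly once a minute) of what "repeat notifications" appears to mean:

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch only; assumes the scanner re-evaluates once a minute.
public class RepeatNotificationSketch {

    interface Condition { boolean satisfied(); }
    interface Notifier  { void notifyOnCall(); }

    static void startScanner(Condition condition, Notifier notifier, boolean repeat) {
        ScheduledExecutorService scanner = Executors.newSingleThreadScheduledExecutor();
        final boolean[] alertOpen = {false};
        scanner.scheduleAtFixedRate(() -> {
            if (condition.satisfied()) {
                if (!alertOpen[0] || repeat) {
                    // First trigger, or another notification on every pass
                    // while the alert stays open and "repeat notifications" is on.
                    notifier.notifyOnCall();
                }
                alertOpen[0] = true;
            } else {
                alertOpen[0] = false; // condition no longer satisfied: alert resolves
            }
        }, 0, 60, TimeUnit.SECONDS);
    }
}

So one page per minute is expected while the alert is open; what still looks wrong is that the alert never leaves the open state.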

@edmundoa
Contributor

edmundoa commented Apr 4, 2017

Hi @avongluck-r1soft,

Sorry to read that 😞. We have an issue with grace period and repeat notifications (#3579), and it will be fixed in an upcoming patch release.

@avongluck-r1soft
Author

OK, I think this issue is related to odd alert behavior: the alerts never seem to resolve.
Screenshots taken at 9:48am the same day:

Alert began 1 hour 12 minutes ago: [screenshot 1]

Alert still unresolved after 1 hour 12 minutes: [screenshot 2]

Only one message evaluated, 1 hour 12 minutes ago: [screenshot 3]

So at this point in time:

  • The grace period (60 minutes) has expired.
  • The "more than 0 messages in the last 5 minutes" window has passed.

Anyone have any idea what's going on here? When should alerts consider themselves closed? I re-enabled the OpsGenie plugin and set it up after upgrading to 2.2.3. The OpsGenie guys said they couldn't reproduce the issue, however.

@avongluck-r1soft
Author

avongluck-r1soft commented Apr 17, 2017

The first screenshot shows "Stream had 1 messages in the last 5 minutes" (which shouldn't be true any longer). Maybe "more than 0 messages" is handled incorrectly? Normally "more than 0" means 1 or more... maybe the logic is >= 0 instead of > 0?
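
Purely to illustrate the off-by-one I'm suspecting (hypothetical, not the actual code):

// Hypothetical illustration of the suspected boundary bug. With a threshold of 0,
// "count > 0" stops firing once the window is empty, while "count >= 0" fires on
// every evaluation, so the alert would never resolve.
public class ThresholdBoundarySketch {
    public static void main(String[] args) {
        long threshold = 0;
        long countInWindow = 0; // no messages in the last 5 minutes

        System.out.println(countInWindow > threshold);  // false: alert should resolve
        System.out.println(countInWindow >= threshold); // true: alert stays open forever
    }
}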

@dennisoelkers
Member

@avongluck-r1soft: Is this still an issue with the most recent Graylog version (2.2.3)?

@avongluck-r1soft
Author

Yeah. Here's an awesome screenshot showing some weird alert state:

[screenshot: nonsense]

@avongluck-r1soft
Author

avongluck-r1soft commented May 16, 2017

Given the configuration, I'd expect the alert to close after 2 minutes. However, it's been open for 15 with no sign of closing.

@avongluck-r1soft
Author

Ah, OK. I got some great screenshots showing the problem:

Without the OpsGenie plugin installed (working as expected; the alert closes after 2 minutes): [screenshot: timeline-ok]

With the OpsGenie plugin installed and configured (not working; the alert gets stuck open, there is no OpsGenie notification, and the other notifications are listed as triggered but never seem to fire): [screenshot: timeline-problem]

@joschi
Contributor

joschi commented May 16, 2017

@avongluck-r1soft Anything interesting in the logs of your Graylog nodes at that time?

@avongluck-r1soft
Author

Nothing at all interesting in the logs (quiet)
I'm not crazy, right? That second screenshot shows an alert in a state it shouldn't be in.

2017-05-16T10:33:12.031-05:00 INFO  [CmdLineTool] Loaded plugin: OpsGenieAlarmCallback 1.0.0 [com.opsgenie.plugin.graylog.OpsGenieAlarmCallbackPlugin]
2017-05-16T10:33:12.033-05:00 INFO  [CmdLineTool] Loaded plugin: Elastic Beats Input 2.2.3 [org.graylog.plugins.beats.BeatsInputPlugin]
2017-05-16T10:33:12.033-05:00 INFO  [CmdLineTool] Loaded plugin: Collector 2.2.3 [org.graylog.plugins.collector.CollectorPlugin]
2017-05-16T10:33:12.034-05:00 INFO  [CmdLineTool] Loaded plugin: Enterprise Integration Plugin 2.2.3 [org.graylog.plugins.enterprise_integration.EnterpriseIntegrationPlugin]
2017-05-16T10:33:12.034-05:00 INFO  [CmdLineTool] Loaded plugin: MapWidgetPlugin 2.2.3 [org.graylog.plugins.map.MapWidgetPlugin]
2017-05-16T10:33:12.040-05:00 INFO  [CmdLineTool] Loaded plugin: Pipeline Processor Plugin 2.2.3 [org.graylog.plugins.pipelineprocessor.ProcessorPlugin]
2017-05-16T10:33:12.040-05:00 INFO  [CmdLineTool] Loaded plugin: Anonymous Usage Statistics 2.2.3 [org.graylog.plugins.usagestatistics.UsageStatsPlugin]
2017-05-16T10:33:12.041-05:00 INFO  [CmdLineTool] Loaded plugin: Slack 2.4.0 [org.graylog2.plugins.slack.callback.SlackAlarmCallback]
2017-05-16T10:33:12.194-05:00 INFO  [CmdLineTool] Running with JVM arguments: -Xms10g -Xmx10g -XX:NewRatio=1 -XX:PermSize=128m -XX:MaxPermSize=256m -XX:+ResizeTLAB -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled -XX:+CMSClassUnloadingEnabled -XX:+UseParNewGC -XX:-OmitStackTraceInFastThrow -Dlog4j.configurationFile=file:///etc/graylog/server/log4j2.xml -Djava.library.path=/usr/share/graylog-server/lib/sigar -Dgraylog2.installation_source=rpm
2017-05-16T10:33:12.324-05:00 INFO  [Version] HV000001: Hibernate Validator null
2017-05-16T10:33:13.789-05:00 INFO  [InputBufferImpl] Message journal is enabled.
2017-05-16T10:33:13.804-05:00 INFO  [NodeId] Node ID: XXX-c572-4717-b7e8-f7cada1266c6
2017-05-16T10:33:13.933-05:00 INFO  [LogManager] Loading logs.
2017-05-16T10:33:13.969-05:00 INFO  [LogManager] Logs loading complete.
2017-05-16T10:33:13.969-05:00 INFO  [KafkaJournal] Initialized Kafka based journal at /var/lib/graylog-server/journal
2017-05-16T10:33:14.083-05:00 INFO  [InputBufferImpl] Initialized InputBufferImpl with ring size <65536> and wait strategy <BlockingWaitStrategy>, running 1 parallel message handlers.
2017-05-16T10:33:14.102-05:00 INFO  [cluster] Cluster created with settings {hosts=[10.XXX.XXX.XXX:27017], mode=SINGLE, requiredClusterType=UNKNOWN, serverSelectionTimeout='30000 ms', maxWaitQueueSize=500}
2017-05-16T10:33:14.143-05:00 INFO  [cluster] No server chosen by ReadPreferenceServerSelector{readPreference=primary} from cluster description ClusterDescription{type=UNKNOWN, connectionMode=SINGLE, serverDescriptions=[ServerDescription{address=10.XXX.XXX.XXX:27017, type=UNKNOWN, state=CONNECTING}]}. Waiting for 30000 ms before timing out
2017-05-16T10:33:14.162-05:00 INFO  [connection] Opened connection [connectionId{localValue:1, serverValue:157}] to 10.XXX.XXX.XXX:27017
2017-05-16T10:33:14.164-05:00 INFO  [cluster] Monitor thread successfully connected to server with description ServerDescription{address=10.XXX.XXX.XXX:27017, type=STANDALONE, state=CONNECTED, ok=true, version=ServerVersion{versionList=[3, 2, 12]}, minWireVersion=0, maxWireVersion=4, maxDocumentSize=16777216, roundTripTimeNanos=435048}
2017-05-16T10:33:14.173-05:00 INFO  [connection] Opened connection [connectionId{localValue:2, serverValue:158}] to 10.XXX.XXX.XXX:27017
2017-05-16T10:33:14.399-05:00 INFO  [node] [graylog-d2636a89-c572-4717-b7e8-f7cada1266c6] version[2.4.4], pid[12887], build[fcbb46d/2017-01-03T11:33:16Z]
2017-05-16T10:33:14.399-05:00 INFO  [node] [graylog-d2636a89-c572-4717-b7e8-f7cada1266c6] initializing ...
2017-05-16T10:33:14.404-05:00 INFO  [plugins] [graylog-d2636a89-c572-4717-b7e8-f7cada1266c6] modules [], plugins [graylog-monitor], sites []
2017-05-16T10:33:15.359-05:00 INFO  [node] [graylog-d2636a89-c572-4717-b7e8-f7cada1266c6] initialized
2017-05-16T10:33:15.399-05:00 INFO  [ProcessBuffer] Initialized ProcessBuffer with ring size <65536> and wait strategy <BlockingWaitStrategy>.
2017-05-16T10:33:16.497-05:00 INFO  [RulesEngineProvider] No static rules file loaded.
2017-05-16T10:33:16.805-05:00 INFO  [OutputBuffer] Initialized OutputBuffer with ring size <65536> and wait strategy <BlockingWaitStrategy>.
2017-05-16T10:33:17.544-05:00 INFO  [ServerBootstrap] Graylog server 2.2.3+7adc951 starting up
2017-05-16T10:33:17.545-05:00 INFO  [ServerBootstrap] JRE: Oracle Corporation 1.8.0_121 on Linux 3.10.0-327.36.3.el7.x86_64
2017-05-16T10:33:17.545-05:00 INFO  [ServerBootstrap] Deployment: rpm
2017-05-16T10:33:17.545-05:00 INFO  [ServerBootstrap] OS: CentOS Linux 7 (Core) (centos)
2017-05-16T10:33:17.545-05:00 INFO  [ServerBootstrap] Arch: amd64
2017-05-16T10:33:17.556-05:00 WARN  [DeadEventLoggingListener] Received unhandled event of type <org.graylog2.plugin.lifecycles.Lifecycle> from event bus <AsyncEventBus{graylog-eventbus}>
2017-05-16T10:33:17.567-05:00 INFO  [PeriodicalsService] Starting 26 periodicals ...
2017-05-16T10:33:17.568-05:00 INFO  [Periodicals] Starting [org.graylog2.periodical.ThroughputCalculator] periodical in [0s], polling every [1s].
2017-05-16T10:33:17.569-05:00 INFO  [node] [graylog-d2636a89-c572-4717-b7e8-f7cada1266c6] starting ...
2017-05-16T10:33:17.572-05:00 INFO  [Periodicals] Starting [org.graylog2.periodical.AlertScannerThread] periodical in [10s], polling every [60s].
2017-05-16T10:33:17.575-05:00 INFO  [Periodicals] Starting [org.graylog2.periodical.BatchedElasticSearchOutputFlushThread] periodical in [0s], polling every [1s].
2017-05-16T10:33:17.576-05:00 INFO  [Periodicals] Starting [org.graylog2.periodical.ClusterHealthCheckThread] periodical in [120s], polling every [20s].
2017-05-16T10:33:17.576-05:00 INFO  [Periodicals] Starting [org.graylog2.periodical.ContentPackLoaderPeriodical] periodical, running forever.
2017-05-16T10:33:17.576-05:00 INFO  [Periodicals] Starting [org.graylog2.periodical.GarbageCollectionWarningThread] periodical, running forever.
2017-05-16T10:33:17.576-05:00 INFO  [Periodicals] Starting [org.graylog2.periodical.IndexerClusterCheckerThread] periodical in [0s], polling every [30s].
2017-05-16T10:33:17.577-05:00 INFO  [Periodicals] Starting [org.graylog2.periodical.IndexRetentionThread] periodical in [0s], polling every [300s].
2017-05-16T10:33:17.577-05:00 INFO  [Periodicals] Starting [org.graylog2.periodical.IndexRotationThread] periodical in [0s], polling every [10s].
2017-05-16T10:33:17.577-05:00 INFO  [Periodicals] Starting [org.graylog2.periodical.NodePingThread] periodical in [0s], polling every [1s].
2017-05-16T10:33:17.577-05:00 INFO  [IndexRetentionThread] Elasticsearch cluster not available, skipping index retention checks.
2017-05-16T10:33:17.586-05:00 INFO  [Periodicals] Starting [org.graylog2.periodical.VersionCheckThread] periodical in [300s], polling every [1800s].
2017-05-16T10:33:17.586-05:00 INFO  [Periodicals] Starting [org.graylog2.periodical.ThrottleStateUpdaterThread] periodical in [1s], polling every [1s].
2017-05-16T10:33:17.587-05:00 INFO  [Periodicals] Starting [org.graylog2.events.ClusterEventPeriodical] periodical in [0s], polling every [1s].
2017-05-16T10:33:17.587-05:00 INFO  [Periodicals] Starting [org.graylog2.events.ClusterEventCleanupPeriodical] periodical in [0s], polling every [86400s].
2017-05-16T10:33:17.588-05:00 INFO  [connection] Opened connection [connectionId{localValue:4, serverValue:160}] to 10.XXX.XXX.XXX:27017
2017-05-16T10:33:17.590-05:00 INFO  [connection] Opened connection [connectionId{localValue:3, serverValue:159}] to 10.XXX.XXX.XXX:27017
2017-05-16T10:33:17.590-05:00 INFO  [Periodicals] Starting [org.graylog2.periodical.ClusterIdGeneratorPeriodical] periodical, running forever.
2017-05-16T10:33:17.591-05:00 INFO  [Periodicals] Starting [org.graylog2.periodical.IndexRangesMigrationPeriodical] periodical, running forever.
2017-05-16T10:33:17.592-05:00 INFO  [Periodicals] Starting [org.graylog2.periodical.IndexRangesCleanupPeriodical] periodical in [15s], polling every [3600s].
2017-05-16T10:33:17.595-05:00 INFO  [connection] Opened connection [connectionId{localValue:6, serverValue:162}] to 10.XXX.XXX.XXX:27017
2017-05-16T10:33:17.595-05:00 INFO  [connection] Opened connection [connectionId{localValue:7, serverValue:163}] to 10.XXX.XXX.XXX:27017
2017-05-16T10:33:17.599-05:00 INFO  [connection] Opened connection [connectionId{localValue:5, serverValue:161}] to 10.XXX.XXX.XXX:27017
2017-05-16T10:33:17.644-05:00 INFO  [PeriodicalsService] Not starting [org.graylog2.periodical.UserPermissionMigrationPeriodical] periodical. Not configured to run on this node.
2017-05-16T10:33:17.644-05:00 INFO  [Periodicals] Starting [org.graylog2.periodical.AlarmCallbacksMigrationPeriodical] periodical, running forever.
2017-05-16T10:33:17.644-05:00 INFO  [Periodicals] Starting [org.graylog2.periodical.ConfigurationManagementPeriodical] periodical, running forever.
2017-05-16T10:33:17.646-05:00 INFO  [Periodicals] Starting [org.graylog2.periodical.LdapGroupMappingMigration] periodical, running forever.
2017-05-16T10:33:17.647-05:00 INFO  [Periodicals] Starting [org.graylog2.periodical.IndexFailuresPeriodical] periodical, running forever.
2017-05-16T10:33:17.647-05:00 INFO  [Periodicals] Starting [org.graylog.plugins.usagestatistics.UsageStatsNodePeriodical] periodical in [300s], polling every [21600s].
2017-05-16T10:33:17.648-05:00 INFO  [Periodicals] Starting [org.graylog.plugins.usagestatistics.UsageStatsClusterPeriodical] periodical in [300s], polling every [21600s].
2017-05-16T10:33:17.650-05:00 INFO  [PeriodicalsService] Not starting [org.graylog.plugins.pipelineprocessor.periodical.LegacyDefaultStreamMigration] periodical. Not configured to run on this node.
2017-05-16T10:33:17.650-05:00 INFO  [Periodicals] Starting [org.graylog.plugins.collector.periodical.PurgeExpiredCollectorsThread] periodical in [0s], polling every [3600s].
2017-05-16T10:33:17.718-05:00 INFO  [IndexerClusterCheckerThread] Indexer not fully initialized yet. Skipping periodic cluster check.
2017-05-16T10:33:17.890-05:00 INFO  [V20161130141500_DefaultStreamRecalcIndexRanges] Cluster not connected yet, delaying migration until it is reachable.
2017-05-16T10:33:17.911-05:00 INFO  [transport] [graylog-d2636a89-c572-4717-b7e8-f7cada1266c6] publish_address {127.0.0.1:9350}, bound_addresses {[::1]:9350}, {127.0.0.1:9350}
2017-05-16T10:33:17.916-05:00 INFO  [discovery] [graylog-d2636a89-c572-4717-b7e8-f7cada1266c6] graylog-prod/HPPt87SoSmewgUKqwfs6UQ
2017-05-16T10:33:18.054-05:00 INFO  [JerseyService] Enabling CORS for HTTP endpoint
2017-05-16T10:33:20.919-05:00 WARN  [discovery] [graylog-d2636a89-c572-4717-b7e8-f7cada1266c6] waited for 3s and no initial state was set by the discovery
2017-05-16T10:33:20.919-05:00 INFO  [node] [graylog-d2636a89-c572-4717-b7e8-f7cada1266c6] started
2017-05-16T10:33:21.059-05:00 INFO  [service] [graylog-d2636a89-c572-4717-b7e8-f7cada1266c6] detected_master {Neophyte}{p-qRoaYrS7Gb0yfRMgky9g}{10.XXX.XXX.XXX}{10.XXX.XXX.XXX:9300}, added {{Neophyte}{p-qRoaYrS7Gb0yfRMgky9g}{10.XXX.XXX.XXX}{10.XXX.XXX.XXX:9300},{graylog-bca2dfe8-77c8-4d01-8495-d08f7bdea82f}{TgllDbp2TOOcMijSrWLE1g}{10.XXX.XX.XX}{10.170.70.17:9350}{client=true, data=false, master=false},}, reason: zen-disco-receive(from master [{Neophyte}{p-qRoaYrS7Gb0yfRMgky9g}{10.XXX.XXX.XXX}{10.XXX.XXX.XXX:9300}])
2017-05-16T10:33:25.239-05:00 INFO  [NetworkListener] Started listener bound to [10.XXX.XXX.XXX:12900]
2017-05-16T10:33:25.241-05:00 INFO  [HttpServer] [HttpServer] Started.
2017-05-16T10:33:25.241-05:00 INFO  [JerseyService] Started REST API at <http://10.XXX.XXX.XXX:12900/>
2017-05-16T10:33:27.199-05:00 INFO  [NetworkListener] Started listener bound to [127.0.0.1:9000]
2017-05-16T10:33:27.199-05:00 INFO  [HttpServer] [HttpServer-1] Started.
2017-05-16T10:33:27.200-05:00 INFO  [JerseyService] Started Web Interface at <http://127.0.0.1:9000/>
2017-05-16T10:33:27.201-05:00 INFO  [ServiceManagerListener] Services are healthy
2017-05-16T10:33:27.202-05:00 INFO  [ServerBootstrap] Services started, startup times in ms: {KafkaJournal [RUNNING]=9, InputSetupService [RUNNING]=11, JournalReader [RUNNING]=19, OutputSetupService [RUNNING]=25, BufferSynchronizerService [RUNNING]=25, ConfigurationEtagService [RUNNING]=47, PeriodicalsService [RUNNING]=99, StreamCacheService [RUNNING]=258, IndexerSetupService [RUNNING]=3514, JerseyService [RUNNING]=9634}
2017-05-16T10:33:27.204-05:00 INFO  [ServerBootstrap] Graylog server up and running.
2017-05-16T10:33:27.209-05:00 INFO  [InputSetupService] Triggering launching persisted inputs, node transitioned from Uninitialized [LB:DEAD] to Running [LB:ALIVE]
2017-05-16T10:33:27.226-05:00 INFO  [InputStateListener] Input [Syslog UDP/58f4bfb00ed1c97c79d8f863] is now STARTING
2017-05-16T10:33:27.264-05:00 WARN  [NettyTransport] receiveBufferSize (SO_RCVBUF) for input SyslogUDPInput{title=Syslog UDP Input, type=org.graylog2.inputs.syslog.udp.SyslogUDPInput, nodeId=null} should be 262144 but is 212992.
2017-05-16T10:33:27.270-05:00 INFO  [InputStateListener] Input [Syslog UDP/58f4bfb00ed1c97c79d8f863] is now RUNNING

@avongluck-r1soft
Author

avongluck-r1soft commented Jun 15, 2017

I've finally found an interesting correlation with a repeatable pass situation.

  • I removed our nginx proxy from in front of the Graylog server.
  • Alerts fire as expected. The OpsGenie plugin works as expected.
  • OpsGenie notifications trigger and no longer hang.
    • We were seeing the notifications never firing and the alerts never closing.

This really makes no sense to me unless binding Graylog to 127.0.0.1:9000 was somehow messing up outbound networking within the Java server (hanging the notification?). We don't see any HTTP calls failing, but there was a noticeable drop in erratic behavior once nginx was removed from in front of the web interface (i.e., we bound web_listen_uri to the private interface and killed off nginx).

I've attached the proxy + graylog config we were using.

graylog.config.txt

@alexmateescu

@avongluck-r1soft

I have the same issue with OpsGenie, but I have no nginx in front of the Graylog server.

Could you please paste your Graylog server conf so I can compare it to mine? Thanks.

This issue has been driving us nuts.

@dennisoelkers dennisoelkers removed their assignment Mar 15, 2018
@bernd
Member

bernd commented Nov 14, 2020

The old alerting system has since been replaced by the alerts and events system, so this issue shouldn't happen anymore. Please open a new issue if it is still a problem. Thank you!

@bernd bernd closed this as completed Nov 14, 2020