pending->ACK->pending loop #253

trikosuave · 2015-11-11T21:21:58Z

Log-courier 1.8.1
logstash 1.5.4

debug logging shows:

2015/11/11 21:11:02.844349 ACKN message received for payload 5e2645e96e5566f86dd03931867375e2 sequence 0
2015/11/11 21:11:02.844377 3 payloads still pending, resetting timeout
2015/11/11 21:11:07.849274 ACKN message received for payload 5e2645e96e5566f86dd03931867375e2 sequence 0
2015/11/11 21:11:07.849303 3 payloads still pending, resetting timeout
2015/11/11 21:11:12.503068 Spooler flushing 21 events due to spool timeout exceeded
2015/11/11 21:11:12.503111 Sending new payload of 21 events
2015/11/11 21:11:12.504060 4/10 pending payloads now in transit
2015/11/11 21:11:12.504154 Send now open: Awaiting events for new payload
2015/11/11 21:11:12.858196 ACKN message received for payload 5e2645e96e5566f86dd03931867375e2 sequence 0
2015/11/11 21:11:12.858242 4 payloads still pending, resetting timeout
2015/11/11 21:11:17.858407 ACKN message received for payload 5e2645e96e5566f86dd03931867375e2 sequence 0
2015/11/11 21:11:17.858434 4 payloads still pending, resetting timeout

Log-courier keeps receiving an ACK for the same payload while the pending payload count increases. This continues until I restart Logstash, at which point log-courier gets an EOF and starts again.
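For anyone wanting to spot this pattern in a longer debug log, here is a minimal sketch. It assumes the ACKN line format shown in the excerpt above, and it assumes that a healthy connection would not keep acknowledging the same payload at the same sequence. The function name `repeated_acks` and the `min_repeats` threshold are made up for illustration and are not part of log-courier.

```python
import re
from collections import Counter

# Matches the ACKN debug lines as they appear in the excerpt above.
ACKN_RE = re.compile(
    r"ACKN message received for payload ([0-9a-f]+) sequence (\d+)"
)


def repeated_acks(lines, min_repeats=3):
    """Return {(payload, sequence): count} for ACKNs seen at least min_repeats times.

    Assumption: seeing the same (payload, sequence) pair many times means the
    receiver is not making progress on that payload.
    """
    counts = Counter()
    for line in lines:
        match = ACKN_RE.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return {key: n for key, n in counts.items() if n >= min_repeats}
```

Any iterable of lines works as input, for example an open file handle on the log-courier debug log. A non-empty result would suggest the same loop as described here.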

The only thing out of the ordinary on the logstash side is:

:message=>["INFLIGHT_EVENTS_REPORT", "2015-11-11T21:19:40+00:00", {"input_to_filter"=>20, "filter_to_output"=>0, "outputs"=>[]}],

I have 5 Logstash servers behind an ELB, all pointing to the same ES cluster. I've seen filter_to_output at both 20 and 0.

I'm not sure where to begin on this.

Thanks for your help!

driskell · 2015-11-11T23:03:07Z

There are bits in #219 that will help, and some more info on deciphering the output in #243.

First off - check Logstash logs.

Essentially, an ACKN repeating the same payload number means the pipeline in Logstash is blocked, and the plugin has gone into back-off: it stops events coming through and avoids a communication error while it waits for the pipeline to pick up again. In your case it is likely stuck permanently. The Logstash logs sometimes tell you what the problem is.

Failing that, send the QUIT signal to Logstash (kill -QUIT 12345, where 12345 is the process ID) and you'll see a really ugly Java thread dump in the Logstash stdout (usually a log file). You can look at it yourself using the info at the bottom of #243, or post it in a gist and I'll gladly run through it.

The most common problem is zero |filterworkers; the thread dump is a way of confirming that, and it then becomes a case of finding out why they crashed. The other common problem is >elasticsearch outputs being stuck due to some networking or discovery issue.
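As a rough aid for reading the dump, here is a sketch that counts threads by the first character of their name. It assumes the usual JVM thread dump layout, where each thread entry starts with the thread name in double quotes, and it assumes that filter worker threads are named with a leading `|` and output threads with a leading `>`, as the names above suggest. Check the exact thread names in your own dump; `count_by_prefix` is a hypothetical helper, not something shipped with Logstash.

```python
import re

# A thread entry in a JVM thread dump starts with the quoted thread name.
THREAD_HEADER = re.compile(r'^"([^"]+)"')


def thread_names(dump_lines):
    """Extract thread names from the header lines of a thread dump."""
    names = []
    for line in dump_lines:
        match = THREAD_HEADER.match(line)
        if match:
            names.append(match.group(1))
    return names


def count_by_prefix(dump_lines, prefixes=("|", ">")):
    """Count threads whose name starts with each prefix.

    Assumption: "|" marks filter workers and ">" marks outputs.
    """
    counts = {prefix: 0 for prefix in prefixes}
    for name in thread_names(dump_lines):
        for prefix in prefixes:
            if name.startswith(prefix):
                counts[prefix] += 1
                break
    return counts
```

A count of zero for `|` would be consistent with the filter workers having crashed. It does not say why, so the Logstash logs are still the next place to look.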

In most cases, though, ACKN is a sign of a healthy courier.

driskell · 2015-11-22T14:09:15Z

If you have any further problems, feel free to reopen or open new issues.

driskell closed this as completed Nov 22, 2015
