pending->ACK->pending loop #253

Closed
trikosuave opened this issue Nov 11, 2015 · 2 comments

@trikosuave

Log-courier 1.8.1
logstash 1.5.4

debug logging shows:

2015/11/11 21:11:02.844349 ACKN message received for payload 5e2645e96e5566f86dd03931867375e2 sequence 0
2015/11/11 21:11:02.844377 3 payloads still pending, resetting timeout
2015/11/11 21:11:07.849274 ACKN message received for payload 5e2645e96e5566f86dd03931867375e2 sequence 0
2015/11/11 21:11:07.849303 3 payloads still pending, resetting timeout
2015/11/11 21:11:12.503068 Spooler flushing 21 events due to spool timeout exceeded
2015/11/11 21:11:12.503111 Sending new payload of 21 events
2015/11/11 21:11:12.504060 4/10 pending payloads now in transit
2015/11/11 21:11:12.504154 Send now open: Awaiting events for new payload
2015/11/11 21:11:12.858196 ACKN message received for payload 5e2645e96e5566f86dd03931867375e2 sequence 0
2015/11/11 21:11:12.858242 4 payloads still pending, resetting timeout
2015/11/11 21:11:17.858407 ACKN message received for payload 5e2645e96e5566f86dd03931867375e2 sequence 0
2015/11/11 21:11:17.858434 4 payloads still pending, resetting timeout

Log-courier keeps receiving an ACKN for the same payload while the pending payload count increases. This continues until I restart Logstash, at which point log-courier gets an EOF and starts again.

The only thing out of the ordinary on the logstash side is:

:message=>["INFLIGHT_EVENTS_REPORT", "2015-11-11T21:19:40+00:00", {"input_to_filter"=>20, "filter_to_output"=>0, "outputs"=>[]}],

I have 5 Logstash servers behind an ELB, all pointing to the same ES cluster. I've seen filter_to_output at both 20 and 0.

I'm not sure where to begin on this.

Thanks for your help!

@driskell
Owner

There are bits in #219 that will help, and some more on deciphering the information in #243.

First off - check Logstash logs.

Essentially, an ACKN repeating for the same payload means the pipeline in Logstash is blocked, and the plugin has gone into back-off to stop events coming through and to prevent any communication error while it waits for the pipeline to pick up. In your case it is likely stuck permanently. The Logstash logs sometimes tell you the problem.
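
As a rough illustration of that pattern, the sketch below scans log-courier debug output for ACKN lines that repeat the same payload and sequence, meaning no progress between acknowledgements. The function name `find_repeated_ackn` and the `threshold` parameter are hypothetical, and the regular expression assumes the debug line format shown in the log excerpt in this issue.

```python
import re
from collections import Counter

# Assumed line format, taken from the debug log in this issue:
# 2015/11/11 21:11:02.844349 ACKN message received for payload <id> sequence <n>
ACKN_RE = re.compile(r"ACKN message received for payload (\w+) sequence (\d+)")


def find_repeated_ackn(lines, threshold=3):
    """Return {(payload, sequence): count} for ACKNs seen at least `threshold` times.

    A payload that keeps being acknowledged at the same sequence is not
    advancing. `threshold` is an arbitrary, hypothetical cut-off.
    """
    counts = Counter()
    for line in lines:
        match = ACKN_RE.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return {key: n for key, n in counts.items() if n >= threshold}


# Sample input copied from the debug log in this issue.
sample = [
    "2015/11/11 21:11:02.844349 ACKN message received for payload 5e2645e96e5566f86dd03931867375e2 sequence 0",
    "2015/11/11 21:11:02.844377 3 payloads still pending, resetting timeout",
    "2015/11/11 21:11:07.849274 ACKN message received for payload 5e2645e96e5566f86dd03931867375e2 sequence 0",
    "2015/11/11 21:11:12.858196 ACKN message received for payload 5e2645e96e5566f86dd03931867375e2 sequence 0",
]

for (payload, sequence), n in find_repeated_ackn(sample).items():
    print(f"payload {payload} stuck at sequence {sequence} ({n} ACKNs)")
```

Keying on the payload and the sequence together matters here: a payload whose sequence keeps increasing is still being processed, so only an unchanged pair is treated as a stall.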

Failing that, send the QUIT signal to Logstash (kill -QUIT 12345, where 12345 is the process ID) and you'll see a really ugly Java thread dump in the Logstash stdout (usually a log file). You can look at it yourself using the info at the bottom of #243, or post it in a gist and I'll gladly run through it.

The most common problem is zero |filterworkers; the thread dump is a way of confirming that, and then it becomes a case of finding out why they crashed. The other common problem is that the >elasticsearch outputs are stuck due to some networking or discovery issue.

In most cases, though, ACKN is a sign of a healthy courier.

@driskell
Owner

If you have any further problems, feel free to reopen or open new issues.
