Exception on LogRotate #50

Open · schodemeiss opened this issue Aug 18, 2016 · 16 comments

Comments

@schodemeiss commented Aug 18, 2016

Hello,

Whenever logrotate rotates away a log file being written by a webserver, the following exception is thrown:

(FileTailer[fh:SlimAccess:/var/log/access_log]) com.amazon.kinesis.streaming.agent.tailing.FileTailer [ERROR] FileTailer[fh:SlimAccess:/var/log/access_log]: Error when processing current input file or when tracking its status.
java.lang.IllegalStateException
at com.google.common.base.Preconditions.checkState(Unknown Source)
at com.amazon.kinesis.streaming.agent.tailing.TrackedFileRotationAnalyzer.findCurrentOpenFileAfterTruncate(Unknown Source)
at com.amazon.kinesis.streaming.agent.tailing.SourceFileTracker.updateCurrentFile(Unknown Source)
at com.amazon.kinesis.streaming.agent.tailing.SourceFileTracker.refresh(Unknown Source)
at com.amazon.kinesis.streaming.agent.tailing.FileTailer.updateRecordParser(Unknown Source)
at com.amazon.kinesis.streaming.agent.tailing.FileTailer.processRecords(Unknown Source)
at com.amazon.kinesis.streaming.agent.tailing.FileTailer.runOnce(Unknown Source)
at com.amazon.kinesis.streaming.agent.tailing.FileTailer.run(Unknown Source)
at com.google.common.util.concurrent.AbstractExecutionThreadService$1$2.run(Unknown Source)
at com.google.common.util.concurrent.Callables$3.run(Unknown Source)
at java.lang.Thread.run(Thread.java:745)

Logrotate isn't running more than once a second either.

/var/log/access_log {
    copytruncate
    compress
    rotate 365
    size=+1M
    olddir rotated
    notifempty
    missingok
}

The log folder and file have r+x permissions too. How are we supposed to rotate logs away from Firehose? Even Linux truncate manages to blow up the Firehose agent. I don't actually want to keep the raw log, since Firehose has basically done that for me by putting it in its stream; I just don't want my EC2 instance to fill up with huge logs.

@schodemeiss (Author) commented Aug 18, 2016

See the response below (this is all fine, basically):
On top of this, I just noticed that while the agent is running I get messages such as this:

Tailer Progress: Tailer has parsed 171872 records (18218432 bytes), and has successfully sent 171439 records to destination.

Notice that the number of records parsed and the number sent to the destination are not equal! Does that seem right?

@chaochenq (Contributor) commented Aug 18, 2016

Hello, to quickly answer the question:

Notice that the number of records parsed and the number sent to the destination are not equal! Does that seem right?

This is fine. The agent batches the parsed records in a publish queue and sends the batches via Firehose.PutRecordBatch. So there can be parsed records that haven't been sent yet, as well as failed-but-retriable records held in memory, pending to be sent.

I'll take another look at the exception; I imagine it's due to the same race condition as pointed out in the referenced issue.

@schodemeiss (Author) commented Aug 19, 2016

@chaochenq Thanks for the reply about the two different numbers.

I'm still having the logrotate problem. I just can't get my (essentially Apache) logs to rotate without causing the exception above.

I really don't want to keep the log; I just want to truncate the file and save the space without having to restart Apache when the file gets big.

@chris-gilmore (Contributor) commented Sep 7, 2016

@schodemeiss This is just a wild guess, but have you tried using the delaycompress option along with compress?
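
For reference, this is roughly what that would look like applied to the config posted above (a sketch only; delaycompress is the only new directive, the rest is copied from the original config):

/var/log/access_log {
    copytruncate
    compress
    delaycompress
    rotate 365
    size=+1M
    olddir rotated
    notifempty
    missingok
}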

@schodemeiss (Author) commented Sep 14, 2016

Yes, but that hasn't helped at all. I still get exactly the same error as above, I'm afraid :(. Any word on this at all?

@chaochenq (Contributor) commented Sep 14, 2016

@schodemeiss can you try another rotation mode, like create/rename, rather than copytruncate? copytruncate seems to be the one mode we've found the agent can't handle well, and it can cause race conditions.

@schodemeiss (Author) commented Sep 19, 2016

Yeahp, tried that too. Still unhappy, sorry!

@chaochenq (Contributor) commented Sep 19, 2016

@schodemeiss what does your agent config look like? Could it possibly be due to the same root cause as this issue? #17

@urjitbhatia commented Sep 23, 2016

@schodemeiss I ran into the same issue and somewhat solved it. The agent does not like copytruncate mode, but modifying apps to work with create mode would be a nightmare.
So the hack is to keep using copytruncate but make the agent config look for a wildcard.
Example:

"filePattern": "/var/log/foo/*my_firehose_log*"

This way it will still track the files after logrotate moves them around.
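
For context, a rough sketch of the full flow entry in agent.json (the delivery stream name here is made up; substitute your own):

{
  "flows": [
    {
      "filePattern": "/var/log/foo/*my_firehose_log*",
      "deliveryStream": "my-delivery-stream"
    }
  ]
}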

@urjitbhatia commented Sep 23, 2016

@chaochenq using create mode with logrotate means our services have to deal with changing inodes while they are writing, detect whether logrotate has run, and so on. Is there a timeline for fixing the agent so it doesn't have a race condition with copytruncate mode?

@chaochenq (Contributor) commented Sep 23, 2016

I wonder what your agent config looked like when you saw the exceptions with copytruncate, because the race condition shouldn't occur all the time. I see that you mentioned the workaround of putting a wildcard around "my_firehose_log", and that's exactly what we suggest for using the agent. The idea is that each flow should have a filePattern like "yourfile.log*" or "yourfile*" that matches all the rotated files, so that the agent is able to keep tracking them:

  1. when the file gets rotated, we don't lose the state of the tracked files
  2. the agent can recover from failures and catch up from the rotated logs

If you just set the pattern to "/var/log/foo/*my_firehose_log" in the config, I think it's highly likely that undefined behavior will happen during a rotation, like what's discussed in this issue: #17
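
In other words, each flow's filePattern should end with a wildcard so the rotated copies still match, roughly like this (the path is a placeholder):

"filePattern": "/var/log/foo/yourfile.log*"

rather than a pattern that only matches the live file.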

@mohrt commented Aug 29, 2017

I'm getting this same error WITH the wildcard. Elastic Beanstalk rotates its httpd logs into a subfolder, so I have two flow entries: one for the live log and one for the rotated directory. This may be part of my problem, since they are treated as separate logs, and the agent doesn't support subfolder globs AFAIK.

@rkustner commented Aug 30, 2017

I'd suggest making it possible to send some kind of signal to the agent to make it reload the open files, so you can run it as a post action after rotating logs.

Not sure if Java understands something like kill -HUP?
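
In the meantime, a blunt approximation of that idea might be to restart the agent from a logrotate post action (a sketch only; it assumes the agent is installed as the aws-kinesis-agent service, since I'm not aware of a reload signal it handles):

/var/log/access_log {
    copytruncate
    notifempty
    missingok
    postrotate
        service aws-kinesis-agent restart > /dev/null 2>&1 || true
    endscript
}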

@spullara commented Apr 3, 2018

Seeing the same bug in an endless loop:

2018-04-03 17:51:21.779+0000 ip-10-199-93-156 (FileTailer[fh:lukew_nginx_access_log:/var/log/nginx/access.log]) com.amazon.kinesis.streaming.agent.tailing.FileTailer [ERROR] FileTailer[fh:lukew_nginx_access_log:/var/log/nginx/access.log]: Error when processing current input file or when tracking its status.
java.lang.IllegalArgumentException: Current file is not contained in current snapshot!
	at com.google.common.base.Preconditions.checkArgument(Preconditions.java:122)
	at com.amazon.kinesis.streaming.agent.tailing.TrackedFileRotationAnalyzer.<init>(Unknown Source)
	at com.amazon.kinesis.streaming.agent.tailing.SourceFileTracker.updateCurrentFile(Unknown Source)
	at com.amazon.kinesis.streaming.agent.tailing.SourceFileTracker.refresh(Unknown Source)
	at com.amazon.kinesis.streaming.agent.tailing.FileTailer.updateRecordParser(Unknown Source)
	at com.amazon.kinesis.streaming.agent.tailing.FileTailer.processRecords(Unknown Source)
	at com.amazon.kinesis.streaming.agent.tailing.FileTailer.runOnce(Unknown Source)
	at com.amazon.kinesis.streaming.agent.tailing.FileTailer.run(Unknown Source)
	at com.google.common.util.concurrent.AbstractExecutionThreadService$1$2.run(AbstractExecutionThreadService.java:60)
	at com.google.common.util.concurrent.Callables$3.run(Callables.java:95)
	at java.lang.Thread.run(Thread.java:748)

@yokochi commented Sep 30, 2018

I have the same issue. I use Elastic Beanstalk, and it rotates logs into a subfolder, so I created a dummy log file.
Example:

touch /var/log/nginx/access.log.dummy

/etc/aws-kinesis/agent.json:

"filePattern": "/var/log/nginx/access.log*"

@eunseon-ya commented Oct 29, 2018

I use Elastic Beanstalk, and the nginx log file is rotated into a subfolder by default.

I solved this issue by commenting out the following line in TrackedFileRotationAnalyzer.java:

Preconditions.checkState(incoming.size() > 1); // always true after rotation by Truncate

It works the same as the dummy-file workaround...
