Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to tune NanoLog to better handle burst? #46

Closed
wizwx opened this issue Nov 12, 2020 · 3 comments
Closed

How to tune NanoLog to better handle burst? #46

wizwx opened this issue Nov 12, 2020 · 3 comments

Comments

@wizwx
Copy link

wizwx commented Nov 12, 2020

I'm testing NanoLog in a program, and noticed that roughly half of the messages were dropped in the log.

Here is the setting of my testing program. The program is multithreaded, and one of the worker threads is reading some msg (roughly 0.5k bytes per msg) from socket and log it and then process it. During the test, I read roughly ~160 msgs within 2 min from the socket, and logged all of them. However, only ~80 of them showed up in the log file. One of the cases I noticed is that the program handled ~10 msgs within 2 mini second, and only the first and last of the 10 msgs were successfully logged.

If I tweak the program to sleep for 1 second or call NanoLog::sync before logging the msg, all ~160 messages will be logged successfully.

So I looked at the runtime/Config.h file, and tried the following two settings, but strangely enough I saw no improvement and still roughly half of the messages were dropped.
STAGING_BUFFER_SIZE = 1<<25
RELEASE_THRESHOLD = 1<<19

STAGING_BUFFER_SIZE = 1<<25
RELEASE_THRESHOLD = 1<<20

Any suggestions on how I should tune the parameters to better handle burstiness? Thank you.

@syang0
Copy link
Member

syang0 commented Nov 13, 2020

Hm, that is very strange. The statistics you gave should be well within NanoLog's performance envelope. Even if the messages were 1KB (rounded up), that should only result in 160KB of space, and the default configuration allocates ~1MB of staging buffer space per thread. There should be enough space to handle your workload. Additionally, NanoLog is a guaranteed logger, so all statements should have executed before continuing to the next line of code in your application.

Is there some source code you could point me to that would reproduce this bug?

In the mean time, could you perform some sanity checks for me:

  1. Does the code path with the NanoLog statement execute for sure (i.e. if you add a regular printf("%d\r\n", counter++) next to the NanoLog statement, does it execute?).
  2. Is sync() executed before the application exits to dump the remaining log statements to disk?
  3. Is there any pattern to the log statements that made it to disk? For example, do all the log statements only come from one thread when multiple are logging, are all the log statements at the head/tail of the execution, or are they only dropped in a burst?

@wizwx
Copy link
Author

wizwx commented Nov 13, 2020

I will have to dig further to understand why.

For your sanity check questions,

  1. the NanoLog is for sure executed. If I simply add NanoLog::sync or sleep(1) before the log statement, none of the logged message will be missing in the log file; if not then about half of the messages are missing; The reason I noticed the messages are missing is because I count the message processed and the count does not match with the messages extracted from the logs;

  2. I registered atexit to call NanoLog::sync. Also the messages are logged in the same thread and the missing messages can be in the middle, not always at the end.

  3. All the missing messages come from the same thread. the dropped messages seem to be related burstiness, as in one case i see 7 out of 9 messages are dropped, and the 9 messages are on the socket within 2 mini second. I just noticed the issue and I did not have a chance to drill down further yet to confirm any pattern. Also I actually play/logged another 80 messages with roughly the same size from another socket and thread before playing/recording the 160 messages (on another socket/thread). For the first 80 messages, I never see any missing messages in the log file.

Another strange thing is that it does not help if I add NanoLog::sync right after the NANO_LOG statement, and still about half of the messages are dropped in that case. I have to add NanoLog::sync right before the Log statement and then no more missing messages.

I will have to play with it a little bit more and see what I can get. Will let you know if any finding.

@wizwx
Copy link
Author

wizwx commented Nov 19, 2020

There is no bug in NanoLog. In the same thread where I logged the received message, I had another log that has a typo in the format string, where "%s" is used for an integer value by mistake. So if several messages are encoded in one call of encodeLogMsgs in the compression thread, all messages after the log message with wrong format are lost in the decompressor.

@wizwx wizwx closed this as completed Nov 19, 2020
syang0 added a commit that referenced this issue Nov 21, 2020
Added compile-time format checking to C++17 NanoLog's example
GNUmakefiles and associated documentation. This helps users
avoid formatting errors that could corrupt the log file at
runtime (such as with issue #46) and closes issue #47.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants