Async: NLog is swallowing log messages #1652
To rule out any misconfiguration issues I switched to the following NLog.config:

```xml
<?xml version="1.0" encoding="utf-8" ?>
<nlog xmlns="http://www.nlog-project.org/schemas/NLog.xsd"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.nlog-project.org/schemas/NLog.xsd NLog.xsd"
      autoReload="true"
      throwExceptions="false"
      internalLogLevel="Warn" internalLogFile="logs\nlog-internal.log">
  <targets async="true">
    <target xsi:type="File" name="tracelog" fileName="${basedir}/logs/trace.log" keepFileOpen="true"
            layout="${longdate} ${uppercase:${level:padding=-6:fixedLength=true}} ${logger} ${message} ${exception:format=tostring}" />
  </targets>
  <rules>
    <logger name="*" minlevel="Trace" writeTo="tracelog" />
  </rules>
</nlog>
```

Slow(er) application: the application ran for approximately 18 minutes and the log file size is 270 MB. Apart from the database optimization, no application or NLog configuration changes were made. I also made sure the exact same data was processed in both scenarios. In my application I have a Timer which reports statistics on the remaining work every 5000 ms.

When the application started there were 50,000 unprocessed records. Given that the "slow application" log file was larger last Friday, I assume some messages are still missing even from the slow application's tracelog file. Update 2: further tests conclude it is `async=true` causing this kind of issue. |
Hi, this is by design. By default, log messages are discarded (if there are too many) when using `async=true`.
|
Thank you for your response. How long is...
...? The amount of 10,000 only makes sense together with a timespan :-) I checked my logs (with `async=true` enabled); I have up to 2000 log lines per second. Is that too much? I will try using the BufferingWrapper then. |
Well it runs every 100ms, but it could be delayed by other processes. |
You mean the `AsyncWrapper`? |
No, BufferingWrapper. Why would I favor the AsyncWrapper over the BufferingWrapper?
And I'd really be interested in how many messages per second NLog can handle before dropping messages. Is this environment dependent? Isn't this something you can put scientifically reliable numbers on? |
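For reference, this is roughly what switching to the BufferingWrapper looks like: a sketch only, wrapping the file target from this thread. The target names here are illustrative; `bufferSize` and `flushTimeout` are the parameters documented on the NLog BufferingWrapper wiki page.

```xml
<targets>
  <!-- Loggers write to the wrapper; it flushes to the inner File target
       once 100 events are buffered or 500 ms have passed. -->
  <target xsi:type="BufferingWrapper" name="bufferedTrace"
          bufferSize="100" flushTimeout="500">
    <target xsi:type="File" name="tracelog"
            fileName="${basedir}/logs/trace.log" keepFileOpen="true"
            layout="${longdate} ${level} ${logger} ${message}" />
  </target>
</targets>
<rules>
  <logger name="*" minlevel="Trace" writeTo="bufferedTrace" />
</rules>
```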
Because the AsyncWrapper already buffers, you are using 'async', and the proposal is to replace it with the AsyncWrapper. Please fully... |
OK, I was referring to...
See the Remarks on BufferingWrapper. We basically have a buffering AsyncWrapper and an asynchronous BufferingWrapper. |
Well, I think the recommendation is: if you need buffering (only), then use the BufferingWrapper. The implementations of the BufferingWrapper and AsyncWrapper are really alike. The difference is that async writes every X milliseconds and buffering writes every X messages. They both use...
I don't think there is a technical quality difference, only a functional one, which is small. |
Is this clearer now? Could we close this one? |
Well, to me it is still unclear when...
...is OK to work with and when to use a different approach. Some numbers would probably help, or at least a notification (in the NLog internal log?) that messages are being dropped. Apart from that, I'm using the BufferingWrapper now, which works just fine. |
That should be the case already. You could have the same blocking or queueing behaviour in the AsyncWrapper, BUT the shorthand (just `async=true`) uses the defaults. The advice is to:
|
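Spelled out, the advice amounts to replacing the `async="true"` shorthand (which applies the AsyncWrapper with its defaults, including `overflowAction="Discard"`) with an explicit AsyncWrapper target so the overflow behaviour can be changed. A sketch with illustrative target names; the wrapper parameters (`queueLimit`, `overflowAction`, `batchSize`, `timeToSleepBetweenBatches`) are from the NLog AsyncWrapper documentation, with the defaults mentioned later in this thread:

```xml
<targets>
  <!-- Explicit AsyncWrapper: like async="true", but overflowAction
       is changed from the default "Discard" to "Block". -->
  <target xsi:type="AsyncWrapper" name="asyncTrace"
          queueLimit="10000" overflowAction="Block"
          batchSize="100" timeToSleepBetweenBatches="50">
    <target xsi:type="File" name="tracelog"
            fileName="${basedir}/logs/trace.log" keepFileOpen="true"
            layout="${longdate} ${level} ${logger} ${message}" />
  </target>
</targets>
<rules>
  <logger name="*" minlevel="Trace" writeTo="asyncTrace" />
</rules>
```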
Can only agree that it is very confusing that `async=true` means messages are lost, especially since `async=true` is the recommended way to enable async behavior. It would be nice if `async=true` meant adaptive: an initial queue limit of 1000 (batchSize = 1000) and a sleep time of 100 ms, allowed to grow to a queue limit of 10000; if it reaches the queue limit of 10000, it should change to blocking with batchSize = 2500 and sleep time = 0 ms. |
Agree, but there is a good reason for it. We think that logging should never block your program (in CPU and memory), so by default it should not throw exceptions or take too much CPU/memory. We have multiple complaints on this subject, but most of the time those are artificial tests. Also, we don't have any complaints that NLog is blocking their program ;)
That's a nice approach, but I still think NLog could block your program (by taking too much CPU/memory), right?
Maybe it's an improvement to set the queueLimit and overflowAction on `<targets>` like async? e.g.

```xml
<targets async="true" async-queuelimit="1000" async-overflow-action="Block">
  ... your targets go here ...
</targets>
```
|
Well, if using the default setup without `async=true`, then NLog will "block your program" in the synchronous call to the file stream, and will make sure log messages are not lost. So it is strange that the recommended way to improve performance (`async=true`) changes the behavior so that log messages are lost. As a software developer, I spend a lot of time tweaking the log messages so the amount of logging is just right, so that when an accident occurs I have the ability to diagnose the cause. If random log messages are discarded, I would be completely lost. I think it is bad enough that async logging can introduce a "gap" at program crash, so one doesn't have the latest log messages available in the log file. But luckily program crashes are usually solved with crash dumps, without needing the log files. |
Do you mean that it's bad that log messages could be lost if you write them async and your application got killed?
True, but blocking is also less of an issue in the non-async case. It's the same thread and synchronous, so it's far more transparent where the blocking occurs. This is not the case with a thread running in the background. Anyway, I think we should improve something here, but I don't think we should move from discard to blocking (in some way); that could be an issue for current configurations. Also, isn't there something wrong when writing more than 10,000 events within a second? |
Well, the default is a 50 ms sleep time and a batch size of 100, so it becomes a max limit of 2000 msgs/sec. Which is a lot of log messages :), but it still doesn't warrant the behavior change when using `async=true`. It's not the first time I've had to diagnose an accident where I had gigabytes of log files, because the program logic had exercised the logging as part of the accident. Still, it was useful to have the correct logging and not randomly picked log messages. I would think that when a program has the time to generate 10,000 msgs/sec, it also has the time to wait for them to be written, especially when I have seen that NLog can easily handle 1,000,000 msgs/sec. |
I was thinking the other way around: if the application does 10,000 msgs/sec, it probably needs a lot of CPU and memory, so NLog should not take all the CPU and memory ;) I think there is no best default setting; it depends. And therefore it should be easy to understand and configure, agreed? |
I think that those rare creatures that need to log 10,000 msgs/sec and are okay with losing log messages should not use `async=true`. Instead they should change their nlog-config to use discard as the logging behavior. While doing this exercise, they might also consider whether an incomplete log file has any value. I think the default behavior of `async=true` should be conservative and valid. If a 50 ms sleep time, a 10,000 queue size and a batch size of 1000 are not enough, then the application or nlog-config should be tweaked, not cutting corners by discarding random messages. |
I'm not happy this issue is still open, so I have created a new one to separate the issues: #1789. This report is currently by design, so I will close it. |
After discovering the huge number of different NLog targets (many of which depend on unstable network traffic), I think it is best to keep the default at discard. Only stable targets (like the file target) can actually promise to behave nicely when blocking. I suggest that we close this issue as resolved. Improvements have been made to the AsyncWrapper, so instead of 2000 msgs/sec it can handle 40,000 msgs/sec in the default config. I suggest that we push the bar a little further: #2890 |
@arunIITB You have an invalid config. Guess we need to add more validation.
I recommend that you just use <targets async="true">
See also https://github.com/NLog/NLog/wiki/Performance if you want better control of how to configure default-wrapper.
Btw. please create a new issue instead of waking up old giants.
|
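The "better control of the default-wrapper" mentioned above refers to NLog's `<default-wrapper>` element: it keeps the brevity of `async="true"` while letting you override the wrapper's settings for every target in the section. A sketch, assuming a reasonably current NLog version as documented on the Performance wiki:

```xml
<targets>
  <!-- Every target in this section is automatically wrapped in this
       AsyncWrapper, with Block instead of the default Discard. -->
  <default-wrapper xsi:type="AsyncWrapper" overflowAction="Block" queueLimit="10000" />
  <target xsi:type="File" name="tracelog" fileName="${basedir}/logs/trace.log" />
</targets>
```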
@snakefoot Thanks for your quick response. Sorry for not providing the full context. We have been using NLog with ..., but we started seeing missing log entries. Then we learned that setting `async="true"` will swallow log entries due to the default value of `overflowAction="Discard"`, as per the document https://github.com/NLog/NLog/wiki/AsyncWrapper-target. I thought that because I set overflowAction to Grow, NLog takes time to increase the internal buffer size; just to confirm my assumption, I reset `overflowAction="Discard"` (the default value), assuming this is as good as setting ... But again, this is also very slow. Now, rephrasing my first question: why is the expanded form of ... very slow? The reason I have to use the expanded form is to override the default overflowAction from Discard to Grow. Hope the question is clear. P.S.: why wake up an old giant? |
Type (choose one): Bug
NLog version: 4.3.8
Platform: .NET 4.6.2
Current NLog config (xml or C#, if relevant)
ALSO SEE UPDATES IN MY FIRST ANSWER
I already tried redirecting some trace output to separate files. But typically I'm dumping it all into one file (tracelog).
Bug description:
I'm always referring to the file target (the tracelog target) and the resulting log files.
I made a very interesting observation. Through setting an index in my database I drastically optimized the query performance. One "run" is now taking 3 minutes as opposed to 20 minutes before the optimization.
What I would expect is exactly the same log messages. A different message order is expected, because it's a producer-consumer driven asynchronous application. However, my optimized application (different query + index) outputs far fewer log messages than its slower predecessor.
Naturally I first checked the changed search query does indeed return the same amount of data to process - and it does. I can also narrow down NLog verbosity to "Fatal" messages only and only write successfully processed messages to a Fatal endpoint (for testing purposes). If I only have this target enabled I get exactly the same result for both the slow and the fast version of my application.
But as soon as I start logging on Trace-level verbosity again the log files differ significantly. 350MB vs 65MB.
I also realized that even in my "slow" version, not all Trace messages are present in the log file.
For example I can see data is processed in my processing pipeline but I never got a corresponding message from my producer or from any pipeline filters further up the consumers' pipeline stack.
Before digging deeper I just wanted to make sure this is not some kind of expected behavior. Is it?