Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Async: NLog is swallowing log messages #1652

Closed
naymore opened this issue Sep 19, 2016 · 25 comments
Closed

Async: NLog is swallowing log messages #1652

naymore opened this issue Sep 19, 2016 · 25 comments

Comments

@naymore
Copy link

naymore commented Sep 19, 2016

Type (choose one): Bug

NLog version: 4.3.8

Platform: .NET 4.6.2

Current NLog config (xml or C#, if relevant)

ALSO SEE UPDATES IN MY FIRST ANSWER

<?xml version="1.0" encoding="utf-8" ?>
<nlog xmlns="http://www.nlog-project.org/schemas/NLog.xsd"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.nlog-project.org/schemas/NLog.xsd NLog.xsd"
      autoReload="true"
      throwExceptions="false"
      internalLogLevel="Warn" internalLogFile="logs\nlog-internal.log" >

  <targets async="true">
    <target xsi:type="ColoredConsole" name="c_stats" layout="${time} ${uppercase:${level:padding=-6:fixedLength=true}} ${threadid:padding=-2:fixedLength=true} ${message}">
      <highlight-row condition="true" foregroundColor="DarkGray"/>
    </target>
    <target xsi:type="ColoredConsole" name="c_darkgreen" layout="${time} ${uppercase:${level:padding=-6:fixedLength=true}} ${threadid:padding=-2:fixedLength=true} ${message}">
      <highlight-row backgroundColor="NoChange" condition="true" foregroundColor="Green"/>
    </target>
    <target xsi:type="ColoredConsole" name="c_highlight" layout="${time} ${uppercase:${level:padding=-6:fixedLength=true}} ${threadid:padding=-2:fixedLength=true} ${message}">
      <highlight-row backgroundColor="NoChange" condition="contains('${message}','Consumer0001')" foregroundColor="White"/>
      <highlight-row backgroundColor="NoChange" condition="contains('${message}','Consumer0002')" foregroundColor="Green"/>
      <highlight-row backgroundColor="NoChange" condition="contains('${message}','Consumer0003')" foregroundColor="Blue"/>
      <highlight-row backgroundColor="NoChange" condition="contains('${message}','Consumer0004')" foregroundColor="Cyan"/>
      <highlight-row backgroundColor="NoChange" condition="contains('${message}','Consumer0005')" foregroundColor="Magenta"/>
      <highlight-row backgroundColor="NoChange" condition="contains('${message}','Consumer0006')" foregroundColor="Yellow"/>
    </target>
    <target xsi:type="File" name="tracelog" fileName="${basedir}/logs/trace.log"
          layout="${longdate} ${uppercase:${level:padding=-6:fixedLength=true}} ${threadid:padding=-2:fixedLength=true} ${logger} ${message} ${exception:format=tostring}"
      />
    <target xsi:type="File" name="tracelog_consumer" fileName="${basedir}/logs/trace-consumer.log"
          layout="${longdate} ${uppercase:${level:padding=-6:fixedLength=true}} ${threadid:padding=-2:fixedLength=true} ${logger} ${message} ${exception:format=tostring}"
      />
    <target xsi:type="File" name="tracelog_producer" fileName="${basedir}/logs/trace-producer.log"
          layout="${longdate} ${uppercase:${level:padding=-6:fixedLength=true}} ${threadid:padding=-2:fixedLength=true} ${logger} ${message} ${exception:format=tostring}"
      />
  </targets>

  <rules>
    <logger name="MyApplication.Client.Producer" minlevel="Warn" writeTo="c_darkgreen" final="false" />
    <logger name="MyApplication.Client.Consumer" minlevel="Warn" writeTo="c_highlight" final="false" />
    <logger name="MyApplication.Client.Producer" minlevel="Trace" writeTo="tracelog_consumer" final="true" />
    <logger name="MyApplication.Client.Consumer" minlevel="Trace" writeTo="tracelog_producer" final="true" />

    <logger name="MyApplication.Client.SourceRepoStats" minlevel="Info" writeTo="c_stats" final="false" />
    <logger name="MyApplication.Client.SourceRepoStats" minlevel="Info" writeTo="tracelog" final="true" />
    <logger name="MyApplication.Client.SourceRepoStats" maxlevel="Debug" final="true" /> <!-- dump -->

    <logger name="MyApplication.Client.Configuration.ConfigurationManager" minlevel="Info" writeTo="tracelog" final="true" />
    <logger name="MyApplication.Client.Configuration.ConfigurationManager" maxlevel="Debug" final="true" /> <!-- dump -->

    <!--<logger name="MyApplication.DataProcessor.DefaultDataProcessor" minlevel="Debug" writeTo="tracelog" final="true" />-->
    <!--<logger name="MyApplication.Client.Producer" minlevel="Trace" writeTo="tracelog" final="true"/>-->
    <!--<logger name="MyApplication.CustomTransformers.EsmMachineData" minlevel="Trace" writeTo="tracelog" final="true"/>-->

    <logger name="*" minlevel="Trace" writeTo="tracelog" />
  </rules>

</nlog>
// I'm using this pattern in all my classes.
private readonly NLog.Logger _logger = NLog.LogManager.GetCurrentClassLogger();

I already tried redirecting some trace output to separate files. But typically I'm dumping it all into one file (tracelog).

Bug description:

I'm always refering to the file target (tracelog target) and resulting log files.

I made a very interesting observation. Through setting an index in my database I drastically optimized the query performance. One "run" is now taking 3 minutes as opposed to 20 minutes before the optimization.

What I would expect is exactly the same log messages. A different message order is expected, because it's a producer-consumer driven asynchronous application. However my optimized application (different query + with index) outputs much less log messages than its slower predecessor.

Naturally I first checked the changed search query does indeed return the same amount of data to process - and it does. I can also narrow down NLog verbosity to "Fatal" messages only and only write successfully processed messages to a Fatal endpoint (for testing purposes). If I only have this target enabled I get exactly the same result for both the slow and the fast version of my application.
But as soon as I start logging on Trace-level verbosity again the log files differ significantly. 350MB vs 65MB.

Also I realized even in my "slow" version not all Trace messages are present in the log file.
For example I can see data is processed in my processing pipeline but I never got a corresponding message from my producer or from any pipeline filters further up the consumers' pipeline stack.

Before digging deeper I just wanted to make sure this is not some kind of expected behavior. Is it?

@naymore
Copy link
Author

naymore commented Sep 19, 2016

To rule out any misconfiguration issues I switched to the following NLog.Config

<?xml version="1.0" encoding="utf-8" ?>
<nlog xmlns="http://www.nlog-project.org/schemas/NLog.xsd"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.nlog-project.org/schemas/NLog.xsd NLog.xsd"
      autoReload="true"
      throwExceptions="false"
      internalLogLevel="Warn" internalLogFile="logs\nlog-internal.log" >

  <targets async="true">
    <target xsi:type="File" name="tracelog" fileName="${basedir}/logs/trace.log" keepFileOpen="true"
          layout="${longdate} ${uppercase:${level:padding=-6:fixedLength=true}} ${logger} ${message} ${exception:format=tostring}"
      />
  </targets>

  <rules>
    <logger name="*" minlevel="Trace" writeTo="tracelog" />
  </rules>

</nlog>

Slow(er) application: The application ran for approximately 18 minutes, the log file size is 270MB.
Optimized application: The application ran for approximately 1.5minutes, the log file size is 25MB.

Apart from the database optimization no application changes or NLog configuration changes were made. Also I made sure the exact same data has been processed in both scenarios.

In my application I have a Timer which reports statistics over the remaining work every 5000ms.
In my fast application this log entry appears only ONCE. Although the application ran for about 1.5 minutes.

2016-09-19 11:55:14.2147 INFO MyApplicationClient.SourceRepoStats (STATS) [Unprocessed: 6029] [Failed: 0] [Invalid: 155]

When the application started there were 50.000 unprocessed records.
So 6029 indicates it was pretty close to the end of the whole processing when the statistics message first (and solely) appeared.

Given the fact the "slow application" log file was larger last Friday, I assume some messages are still missing even from the slow application's tracelog file.

Update 2: Further tests conclude it's the async=true causing these kind of issues.
Without the async="true" attribute all the messages are in the log file.

@304NotModified
Copy link
Member

304NotModified commented Sep 19, 2016

Hi,

This is by design. By default log messages are discarded (if there are too many) when using async=true. See https://github.com/NLog/NLog/wiki/AsyncWrapper-target#async-attribute-will-discard-by-default

if you write a lot of messages (more then 10000) in a short time, it's possible that messages will be lost.

If you need all the log messages, do use the AsyncWrapper instead of the async attribute.

@naymore
Copy link
Author

naymore commented Sep 19, 2016

Thank you for your response.

How long is...

in a short time

? The amount of 10.000 only makes sense together with a timespan :-)

I checked my logs (with async true enabled) I have up to 2000 log lines per second. Is that too much?

I will try using the BufferingWrapper then.

@304NotModified
Copy link
Member

Well it runs every 100ms, but it could be delayed by other processes.

@304NotModified 304NotModified changed the title NLog is swallowing log messages Async: NLog is swallowing log messages Sep 19, 2016
@304NotModified
Copy link
Member

I will try using the BufferingWrapper then.

you mean AsyncWrapper ?

@naymore
Copy link
Author

naymore commented Sep 20, 2016

No, BufferingWrapper. Why would I favor the AsyncWrapper over the BufferingWrapper?

The BufferingWrapper can write asynchronously by itself. No need to use the async attribute or AsyncWrapper. See remarks at the BufferingWrapper.

And I'd really be interested in how many messages per second NLog can handle before dropping messages. Is this environment dependent? Isn't this something you can put scientifically reliable numbers on?

@304NotModified 304NotModified self-assigned this Sep 20, 2016
@304NotModified
Copy link
Member

No, BufferingWrapper. Why would I favor the AsyncWrapper over the BufferingWrapper?

because asyncWrapper already buffers, and your are using 'async' and the proposal is to replace it with the AsyncWrapper. Please fully
read https://github.com/NLog/NLog/wiki/AsyncWrapper-target

@304NotModified 304NotModified removed their assignment Sep 21, 2016
@naymore
Copy link
Author

naymore commented Sep 21, 2016

Ok, I was refering to

Using the slidingTimeout (remark: "on the BufferingWrapper") is preferred over the async attribute and AsyncWrapper.

See Remarks on BufferingWrapper

We basically have a buffering AsyncWrapper and an asynchronous BufferingWrapper.
Any recommendations? From the documentation my understanding was the AsyncWrapper is inferior to the BufferingWrapper.

@304NotModified
Copy link
Member

304NotModified commented Sep 21, 2016

Any recommendations? From the documentation my understanding was the Async(target)Wrapper is inferior to the Buffering(target)Wrapper.

Well I think the recommendation is, if you need buffering (only), then use the bufferWrapper.

The implementation of the buffer- and asyncwrapper are really alike. The difference it that async is writing every X miliseconds and buffering every X messages. They both use Timers.

rom the documentation my understanding was the AsyncWrapper is inferior to the BufferingWrapper.

I don't there is a technical quality difference. Only a functional one, which is small

@304NotModified
Copy link
Member

Is this now more clear? Could be close this one?

@naymore
Copy link
Author

naymore commented Oct 5, 2016

Well to me it is still unclear when....

async="true"

...is ok to work with and when to use a different approach. Some numbers would probably help. Or at least a notification (NLog internal log?) that messages are being dropped.

Apart from that I'm using the BufferingWrapper now which works just fine.

@304NotModified
Copy link
Member

Or at least a notification (NLog internal log?) that messages are being dropped.

That should be the case already.

You could have the same blocking or queing behaviour in the asyncwrapper. BUT the shorthand (just async=true) does pick the discared by default.

Too advice is to:

  • You need a fast application and logging should never slow down your application, use async=true. Discarding should be OK.
  • Discarding is not OK, or you like to tune the Async settings. Use the AsyncWrapper (don't forget to write to the wrapper in your <rules>)
  • You like to combine some messages, like the last 10 messages in one mail. Use the BufferWrapper.

@snakefoot
Copy link
Contributor

snakefoot commented Oct 31, 2016

Can only agree that it is very confusing that async=true means messages are lost. Especially since the async=true is the recommended way to enable async-behavior.

Would be nice that that async=true meant adaptive. So it would have an initial queue-limit of 1000 (batchsize = 1000), and a sleep-time of 100 ms. It should be allowed to grow to queue-limit of 10000. If it reaches the queue-limit of 10000, then it should change to blocking with batchsize = 2500 and sleep-time = 0 ms.

@304NotModified
Copy link
Member

Can only agree that it is very confusing that async=true means messages are lost.

Agree. But there is a good reason for it. We think that logging should never block your program (in CPU and memory) and so by default it should not throw exceptions or take to much CPU/memory.

We have multiple complains onthis subject, but most of the time those are artificial tests. Also, we don't have any complaints that NLog is blocking their program ;)

Would be nice that that async=true meant adaptive. So it would have an initial queue-limit of 1000 (batchsize = 1000), and a sleep-time of 100 ms. It should be allowed to grow to queue-limit of 10000. If it reaches the queue-limit of 10000, then it should change to blocking with batchsize = 2500 and sleep-time = 0 ms.

That's a nice approach, but I still think NLog could block your program (by taking too much CPU/memory), right?

Especially since the async=true is the recommended way to enable async-behavior.

Maybe it's an improvement to set the queueLimit and overflowAction on targets like async?

e.g.

<targets async="true" async-queuelimit=1000 async-overflow-action="Block" > 
  ... your targets go here ...
</targets>

@snakefoot
Copy link
Contributor

snakefoot commented Nov 1, 2016

We think that logging should never block your program (in CPU and memory)

Well if using the default setup without async=true, then NLog will "block your program" in the synchronous call to the file-stream and will make sure log-messages are not lost.

So it is strange that the recommended way to improve performance (async=true), makes a behavior change that log-message are lost. As a software developer then I spend a lot of time on tweaking the log-messages so the amount of logging is just right, so when an accident occurs then I have the ability to diagnose the cause. If random log-messages are discarded then I would be completely lost.

I think it is bad enough that async-logging can introduce a "gap" at program crash, so one doesn't have the latest log-messages available in the log-file. But luckily program crashes are usually solved with crash-dumps without needing the log-files.

@304NotModified
Copy link
Member

I think it is bad enough that async-logging can introduce a "gap" at program crash, so one doesn't have the latest log-messages available in the log-file.

Do you mean that it's bad that log messages could be lost if you write them async and your application got killed?

So it is strange that the recommended way to improve performance (async=true), makes a behavior change that log-message are lost.

True, but also the blocking is less an issue in the non-async way. It;s the same thread and sync, so it's far more transparent where the blocking occurs. This is not the case with a thread running in the background.

Any way, I think we should improve something here, but I don't think we should move from discard to blocking (in some way). That could be an issue for current configurations.

Also, is there not something wrong when writing more then 10.000 events within a second?

@snakefoot
Copy link
Contributor

Well the default is 50 ms sleep-time and batch-size of 100. So it becomes a max-limit of 2000 msgs/sec.

Which is a lot of log-messages :), but it still don't warrant the behavior change when using async=true. Not the first time that I had to diagnose an accident where I had gigabytes of log-files, because the program-logic had exercised the logging as part of the accident. Still it was useful to have the correct logging and not randomly picked log-messages.

Would think that when a program has the time to generate 10000 msgs/sec, then it also has the time to wait for them to be written. Especially when I have seen that NLog can easily handle 1.000.000 msgs/sec.

@304NotModified
Copy link
Member

Would think that when a program has the time to generate 10000 msgs/sec, then it also has the time to wait for them to be written. Especially when I have seen that NLog can easily handle 1.000.000 msgs/sec.

I was thinking the other way around, if the application is 10000 msgs/sec, it probably needs a lot of CPU and memory, so NLog should not take all the CPU and memory ;)

I think there is no best default setting. It depends. And therefor it should be easy to understand and configure, agree?

@snakefoot
Copy link
Contributor

snakefoot commented Nov 1, 2016

I think there is no best default setting. It depends. And therefor it should be easy to understand and configure, agree?

I think that these rare creatures that need to log 10000 msgs/sec and are okay with loosing log-messages should not use async=true. Instead they should change their nlog-config to use discard as logging behavior. While doing this exercise, then they might also consider if an incomplete log-file has any value.

I think the default value of async=true should be conservative and valid. If 50 ms sleep-time, 10000 queue-size and batch-size of 1000 is not enough, then application or nlog-config should be tweaked. Not cutting corners with discarding random messages.

@304NotModified
Copy link
Member

I'm not happy this issue is still open, so I have a created a new one to separate the issues. #1789

This report is currently by-design and so I will close it.

@snakefoot
Copy link
Contributor

After discovering the huge amount of different NLog targets (Where many depends on unstable network traffic). Then I think it is best to keep the default at discard.

It is only stable targets (like the file-target) that can actually promise to behave nicely when blocking. I suggest that we close this issue as resolved. There have been made improvements to AsyncWrapper so instead of 2000 msgs/sec then it can handle 40000 msgs/sec in default config. I suggest that we push the bar a little further: #2890

@arunIITB
Copy link

arunIITB commented Jun 11, 2019

#1789 I am using latest Nlog 4.6.4. Still I am seeing slow writing with following config ( overflowAction="Grow").
I am seeing same writing speed whether I set overflowAction="Discard" or overflowAction="Grow".

nlog.txt

Is the above configuration looks fine?.

@snakefoot
Copy link
Contributor

snakefoot commented Jun 11, 2019 via email

@arunIITB
Copy link

@snakefoot Thanks for your quick response.

Sorry for not providing full context.

We have been using Nlog with , but we started seeing missing log entries. Then we learned that setting async="true" will swallow log entries due to default value of OverflowAction="Discard" as per the document https://github.com/NLog/NLog/wiki/AsyncWrapper-target.
So I tried using overflowAction="Grow", log writing speed drastically reduced something in the order of 50 times slower.
My config entry now is,

nlog_Grow.txt

I thought because of I set OverflowAction to Grow, Nlog takes time to increase internal buffer size, just to confirm my assumption, I reset overflowAction="Discard" ( default value) assuming this is as good as setting . But again this is also very slow.

Now rephrasing my first question again,

Why expanded form of is very slow, the reason I have to use expanded form is to override default overflowaction from Discard to Grow.

Hope the question is clear.

P.S: why to wakup old giant?.
I thought 9 months is not very old. The issue I am facing is similar to issue discussed on this thread. I am happy to open a new issue as well, if you want.

@snakefoot
Copy link
Contributor

snakefoot commented Jun 12, 2019

@arunIITB You still have an invalid config (performance drop comes from no longer using async). Please create new issue, as requested (This one is 3 years old). Your issue is about not able to read the manual (see link I gave you before about using <default-wrapper>).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants