
Log lines merged to single message #107

Open
chris-vest opened this issue Jan 14, 2019 · 8 comments

Comments

@chris-vest

On certain occasions the logs coming from our services are not parsed correctly. As a consequence, thousands of log lines are crammed together into one big message sent to Sumo Logic. Because the message is so large, it exceeds Sumo Logic's limit for an individual message and the content is split across multiple messages. As a result, it loses its JSON formatting and becomes "invisible" to many queries that assume the log entries are structured JSON.

This seems to be occurring only with Kafka Connect; when certain operations are performed it logs very noisily.

I have noticed that sumologic-fluentd is being OOM killed. I'm wondering if these log lines are being stuffed into a single message in order to 'catch up' after sumologic-fluentd is restarted?

@bin3377
Contributor

bin3377 commented Jan 22, 2019

Hi @chris-vest, thanks for reporting this issue.
It sounds like the multiline detection is creating this issue. What is your MULTILINE_START_REGEXP setting, and do you have any sample log lines that failed to be parsed?

@chris-vest
Author

@bin3377

MULTILINE_START_REGEXP = /^\w{3} \d{1,2}, \d{4}/

Example log: https://pastebin.com/SPr2rxFZ

Just to clarify, this only seems to happen after we enable a new connector. That causes Connect to log a huge number of lines very quickly, which seems to overload fluentd somehow and results in the massive, squashed log message ending up in Sumo Logic. Restarting the fluentd pods then fixes the issue.
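For context, my working assumption (I haven't inspected the config the chart actually generates from MULTILINE_START_REGEXP) is that it ends up in a fluent-plugin-concat filter along these lines, so any burst of lines that don't start with a "MMM d, yyyy" timestamp just keeps getting appended to the current record:

    # Hypothetical wiring of MULTILINE_START_REGEXP; the tag pattern here is a guess.
    <filter containers.**>
      @type concat
      key log
      # a new record starts on lines like "Jan 14, 2019 ..."; anything else is appended
      multiline_start_regexp /^\w{3} \d{1,2}, \d{4}/
      separator "\n"
    </filter>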

@bin3377
Contributor

bin3377 commented Feb 13, 2019

I think there is still something not very clear.
After the lines are merged into one "huge" message, what does it look like? Is each line a valid JSON object ending with \n?
On the other side, we use the fluentd memory buffer by default when streaming records out to the Sumo backend, so there is an 8MB chunk size limitation. And the Sumo HTTP endpoint has a suggested payload size of 100KB to 1MB. So I would suggest never generating a single payload over 1MB.
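As a sketch only (adjust to however your deployment templates the sumologic output), the chunk size can be capped in the output plugin's buffer section:

    <match **>
      @type sumologic
      # ... endpoint and formatting options as in your existing config ...
      <buffer>
        # keep each chunk, and therefore each HTTP payload, at or under 1MB
        chunk_limit_size 1m
      </buffer>
    </match>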

@chris-vest
Author

So Kafka Connect is logging very frequently after a new Connector is configured, which ends up overloading fluentd. The resulting log in SumoLogic looks like this - https://pastebin.com/SPr2rxFZ

The initial logs from Kafka Connect are no different from the logs it normally creates; the only difference is the frequency of those logs.

@chris-vest
Author

I am still seeing this issue, this is what I'm getting from the sumologic-fluentd pods:

sumologic-fluentd-sumologic-fluentd-m8d54 sumologic-fluentd-sumologic-fluentd 2019-05-13 09:42:52 +0000 [warn]: #0 emit transaction failed: error_class=Fluent::Plugin::Buffer::BufferChunkOverflowError error="a 13726192bytes record is larger than buffer chunk limit size" location="/var/lib/gems/2.3.0/gems/fluentd-1.2.6/lib/fluent/plugin/buffer.rb:677:in `block in write_step_by_step'" tag="containers.mnt.log.containers.msk-connect-7c74548dcf-jrv7g_strimzi_msk-connect-c2c8a2ddf227310d7f3cf27ac8ff38c7188e025292e99c42a60d8a57c35e1434.log"

@chris-vest
Author

@bin3377 I've upgraded to 2.3.0 to see if this is mitigated at all.

Looking at the buffer settings, I'm a little confused as to where to put them when configuring fluentd. I know I need to set chunk_limit_size, but which file should that go in? sumo.out.conf?

@chris-vest
Author

chris-vest commented Jun 7, 2019

@bin3377 I've tried adding this:

        out.sumo.buffer.conf: |-
          <match **>
            @type sumologic
            log_key "log"
            endpoint xxxxxx
            verify_ssl true
            log_format "json"
            flush_interval 5s
            num_threads 1
            open_timeout 60
            add_timestamp true
            timestamp_key "timestamp"
            proxy_uri ""
            <buffer>
              chunk_limit_size 1m
              flush_thread_count 1
              flush_interval 5s
            </buffer>
          </match>

But I'm still seeing log lines merged into single, large log messages in Sumo Logic, and once this happens, fluentd stops sending messages to Sumo Logic.

@chris-vest
Author

I think one of your original comments was correct: when I trigger the event in my cluster and the application logs busily, the logs are no longer picked up by the regex. However, shouldn't setting chunk_limit_size mitigate fluentd crashing?
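Or maybe chunk_limit_size can't help here, because the merged lines form a single record that is already bigger than the chunk limit (that is what the BufferChunkOverflowError above says), so the record would need to be capped earlier, at the multiline/concat stage. A sketch of what I mean, assuming fluent-plugin-concat is what's doing the merging (the tag pattern and values are my guesses, not the chart's actual generated config):

    <filter containers.**>
      @type concat
      key log
      multiline_start_regexp /^\w{3} \d{1,2}, \d{4}/
      # flush a partially-assembled record after 5 seconds so a burst can't grow without bound
      flush_interval 5
    </filter>

fluent-plugin-concat also has a timeout_label option for routing the records flushed by that timeout, if I'm reading its docs right.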
