High volume JSON-formatted logs collected by agent are truncated in DataDog #5764

Open
ian-axelrod opened this issue Jun 18, 2020 · 16 comments

@ian-axelrod

ian-axelrod commented Jun 18, 2020

Output of the info page (if this is a bug)


===============
Agent (v6.17.1)
===============

  Status date: 2020-06-18 19:20:18.561964 UTC
  Agent start: 2020-06-12 15:31:00.513581 UTC
  Pid: 350
  Go Version: go1.12.9
  Python Version: 2.7.17
  Build arch: amd64
  Check Runners: 6
  Log Level: INFO

  Paths
  =====
    Config File: /etc/datadog-agent/datadog.yaml
    conf.d: /etc/datadog-agent/conf.d
    checks.d: /etc/datadog-agent/checks.d

  Clocks
  ======
    System UTC time: 2020-06-18 19:20:18.561964 UTC

  Host Info
  =========
    bootTime: 2020-02-10 23:02:23.000000 UTC
    kernelVersion: 4.14.158-129.185.amzn2.x86_64
    os: linux
    platform: debian
    platformFamily: debian
    platformVersion: bullseye/sid
    procs: 70
    uptime: 2944h28m39s

  Hostnames
  =========
    ec2-hostname: **REDACTED**
    hostname: **REDACTED**
    instance-id: **REDACTED**
    socket-fqdn: **REDACTED**
    socket-hostname: **REDACTED**
    host tags:
      env:stg
    hostname provider: configuration

  Metadata
  ========
    cloud_provider: AWS
    hostname_source: configuration

=========
Collector
=========

  Running Checks
  ==============
    
    cpu
    ---
      Instance ID: cpu [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/cpu.d/conf.yaml.default
      Total Runs: 35,477
      Metric Samples: Last Run: 6, Total: 212,856
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s
      
    
    disk (2.6.0)
    ------------
      Instance ID: disk:e5dffb8bef24336f [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/disk.d/conf.yaml.default
      Total Runs: 35,477
      Metric Samples: Last Run: 196, Total: 1 M
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 106ms
      
    
    docker
    ------
      Instance ID: docker [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/docker.d/docker.yaml
      Total Runs: 35,478
      Metric Samples: Last Run: 165, Total: 1 M
      Events: Last Run: 0, Total: 1,067
      Service Checks: Last Run: 1, Total: 36,091
      Average Execution Time : 67ms
      
    
    file_handle
    -----------
      Instance ID: file_handle [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/file_handle.d/conf.yaml.default
      Total Runs: 35,477
      Metric Samples: Last Run: 5, Total: 177,385
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s
      
    
    io
    --
      Instance ID: io [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/io.d/conf.yaml.default
      Total Runs: 35,477
      Metric Samples: Last Run: 78, Total: 1 M
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s
      
    
    load
    ----
      Instance ID: load [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/load.d/conf.yaml.default
      Total Runs: 35,477
      Metric Samples: Last Run: 6, Total: 212,862
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s
      
    
    memory
    ------
      Instance ID: memory [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/memory.d/conf.yaml.default
      Total Runs: 35,477
      Metric Samples: Last Run: 17, Total: 603,109
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s
      
    
    network (1.14.0)
    ----------------
      Instance ID: network:e0204ad63d43c949 [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/network.d/conf.yaml.default
      Total Runs: 35,477
      Metric Samples: Last Run: 37, Total: 1 M
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 1ms
      
    
    ntp
    ---
      Instance ID: ntp:d884b5186b651429 [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/ntp.d/conf.yaml.default
      Total Runs: 592
      Metric Samples: Last Run: 0, Total: 0
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 1, Total: 592
      Average Execution Time : 20.014s
      
    
    tm_docker_thinpool (1.0.0)
    --------------------------
      Instance ID: tm_docker_thinpool:d884b5186b651429 [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/tm_docker_thinpool.yaml
      Total Runs: 35,477
      Metric Samples: Last Run: 0, Total: 0
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s
      
    
    tm_kafka-connect (1.0.2)
    ------------------------
      Instance ID: tm_kafka-connect:e69530c8133401db [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/tm_kafka-connect.yaml
      Total Runs: 35,437
      Metric Samples: Last Run: 102, Total: 1 M
      Events: Last Run: 0, Total: 139
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 642ms
      
    
    tm_kafka_consumer_lag (1.0.0)
    -----------------------------
      Instance ID: tm_kafka_consumer_lag:e69530c8133401db [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/tm_kafka_consumer_lag.yaml
      Total Runs: 35,477
      Metric Samples: Last Run: 138, Total: 1 M
      Events: Last Run: 0, Total: 193
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 554ms
      
    
    tm_rqworkers (1.0.2)
    --------------------
      Instance ID: tm_rqworkers:dccdd5fa6ffb8a61 [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/tm_rqworkers.yaml
      Total Runs: 35,477
      Metric Samples: Last Run: 0, Total: 0
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s
      
    
    uptime
    ------
      Instance ID: uptime [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/uptime.d/conf.yaml.default
      Total Runs: 35,477
      Metric Samples: Last Run: 1, Total: 35,477
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s
      
    
    zk (2.4.0)
    ----------
      Instance ID: zk:1f25e659fc9a1d9 [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/zk.d/auto_conf.yaml
      Total Runs: 4,243
      Metric Samples: Last Run: 30, Total: 127,236
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 1, Total: 4,243
      Average Execution Time : 7ms
      metadata:
        version.major: 3
        version.minor: 4
        version.patch: 14
        version.raw: 3.4.14-4c25d480e66aadd371de8bd2fd8da255ac140bcf
        version.release: 4
        version.scheme: semver
      
  Python 3 Linter Warnings
  =======================
      tm_kafka-connect
      ----------------
        tm_kafka-connect.py:116:8 : print statement used (print-statement, E1601)
        tm_kafka-connect.py:119:12 : print statement used (print-statement, E1601)
        tm_kafka-connect.py:120:8 : print statement used (print-statement, E1601)
    
      tm_kafka_consumer_lag
      ---------------------
        tm_kafka_consumer_lag.py:127:8 : print statement used (print-statement, E1601)
        tm_kafka_consumer_lag.py:130:12 : print statement used (print-statement, E1601)
        tm_kafka_consumer_lag.py:131:8 : print statement used (print-statement, E1601)
    
      tm_pagespeed
      ------------
        tm_pagespeed.py:80:8 : print statement used (print-statement, E1601)
        tm_pagespeed.py:83:12 : print statement used (print-statement, E1601)
        tm_pagespeed.py:84:8 : print statement used (print-statement, E1601)
    
      tm_rqworkers
      ------------
        tm_rqworkers.py:59:8 : print statement used (print-statement, E1601)
        tm_rqworkers.py:62:12 : print statement used (print-statement, E1601)
        tm_rqworkers.py:63:8 : print statement used (print-statement, E1601)
    
  Loading Errors
  ==============
    tm_craft
    --------
      Core Check Loader:
        Check tm_craft not found in Catalog
        
      JMX Check Loader:
        check is not a jmx check, or unable to determine if it's so
        
      Python Check Loader:
        unable to import module 'tm_craft': No module named tm_craft
        
    tm_data_etl_pipeline
    --------------------
      Core Check Loader:
        Check tm_data_etl_pipeline not found in Catalog
        
      JMX Check Loader:
        check is not a jmx check, or unable to determine if it's so
        
      Python Check Loader:
        unable to import module 'tm_data_etl_pipeline': invalid syntax (tm_data_etl_pipeline.py, line 52)
        
    tm_zk
    -----
      Core Check Loader:
        Check tm_zk not found in Catalog
        
      JMX Check Loader:
        check is not a jmx check, or unable to determine if it's so
        
      Python Check Loader:
        unable to import module 'tm_zk': No module named tm_zk
        
========
JMXFetch
========

  Initialized checks
  ==================
    kafka
      instance_name : **REDACTED**
      message : 
      metric_count : 110
      service_check_count : 0
      status : OK
      instance_name : **REDACTED**
      message : 
      metric_count : 260
      service_check_count : 0
      status : OK
  Failed checks
  =============
    no checks
    
=========
Forwarder
=========

  Transactions
  ============
    CheckRunsV1: 35,477
    Dropped: 0
    DroppedOnInput: 0
    Events: 0
    HostMetadata: 0
    IntakeV1: 4,988
    Metadata: 0
    Requeued: 0
    Retried: 0
    RetryQueueSize: 0
    Series: 0
    ServiceChecks: 0
    SketchSeries: 0
    Success: 75,942
    TimeseriesV1: 35,477

  API Keys status
  ===============
    API key ending with **REDACTED**: API Key valid

==========
Endpoints
==========
  https://app.datadoghq.com - API Key ending with:
      - **REDACTED**

==========
Logs Agent
==========
    Sending uncompressed logs in SSL encrypted TCP to agent-intake.logs.datadoghq.com on port 10516
    BytesSent: 4.6498322631e+10
    EncodedBytesSent: 4.6498322631e+10
    LogsProcessed: 4.3108137e+07
    LogsSent: 4.3108137e+07

  docker
  ------
    Type: docker
    Status: OK
    Inputs: b69e4e8c2066af5bc1e778a931ec2f7b8b94775fead5b066ef611e3e2a220713 
    Type: docker
    Status: OK
    Inputs: 735042e629cdc59ff49156cb8c6039bb416dae71cebdd3a7fa611e001f2e7ee6 
    Type: docker
    Status: OK
    Inputs: 3998c92f3f32a3536a8d8d143e9e81b8d1febd3b5f41f28c14c178b876e69785 
    Type: docker
    Status: OK
    Inputs: 490cb6968cb9b573085822bbff9d2e1c39e0027f24d8bcfc305ee7fb39d14b26 

  container_collect_all
  ---------------------
    Type: docker
    Status: OK
    Inputs: 1a9fe7ac6d8a0e7e4ad5bdb8b9d79bc2e239c256d432c2240f5f635ae3e1336e f1c98a998a2325766f95a785d6ae382159882d6d02bfe4bfb003233b77d4891e 

  tm_zk
  -----
    Type: file
    Path: /etc/zookeeper/logs/zookeeper.log
    Status: OK
    Inputs: /etc/zookeeper/logs/zookeeper.log 

=========
Aggregator
=========
  Checks Metric Sample: 29.6 M
  Dogstatsd Metric Sample: 9.3 M
  Event: 1,400
  Events Flushed: 1,400
  Number Of Flushes: 35,477
  Series Flushed: 31.3 M
  Service Check: 630,639
  Service Checks Flushed: 666,112

=========
DogStatsD
=========
  Event Packets: 0
  Event Parse Errors: 0
  Metric Packets: 9.3 M
  Metric Parse Errors: 0
  Service Check Packets: 60,254
  Service Check Parse Errors: 0
  Udp Bytes: 4.2 G
  Udp Packet Reading Errors: 0
  Udp Packets: 4.5 M
  Uds Bytes: 0
  Uds Origin Detection Errors: 0
  Uds Packet Reading Errors: 0
  Uds Packets: 0

Describe what happened:
I recently converted The Muse's Kafka Connect service log output from standard log4j console output to JSON-formatted output using logback. I confirmed this works for other services, such as Kafka brokers. With Kafka Connect it works fine for INFO-level logs and above, but the majority of DEBUG logs end up with the message 'TRUNCATED' preceding the log data, and the log data itself is the concatenation of dozens of unconverted JSON log records. The more logs there are in a short period of time, the more truncation I see. See the screenshots below.

[Screenshot: Screen Shot 2020-06-18 at 1.58.55 PM]

[Screenshot: Screen Shot 2020-06-18 at 2.02.43 PM]

Describe what you expected:
Logs are output as expected, that is, one deserialized log record per row in the DataDog Log UI.

Steps to reproduce the issue:

  1. Start an Agent 6.17.1 container configured to collect all container logs.
  2. On the same host, run another container that generates a massive amount of JSON-encoded logs (a minimal generator sketch is shown below). Allow this to run for at least 2 minutes.
  3. Observe the truncated logs in your DataDog UI.

NOTE: I can create a sample Kafka Connect setup for you to test, if that will be helpful. I would just need to modify my existing setup to remove any employer-specific information.
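
For step 2, a sketch along these lines is enough to generate the load (a hypothetical generator added purely for illustration, not the actual Kafka Connect workload): it writes one JSON record per line to stdout, which the json-file driver captures and the Agent then tails.

```python
#!/usr/bin/env python3
"""Hypothetical flood generator: one JSON log record per line on stdout."""
import json
import sys
import time


def main(records_per_second: int = 5000, duration_s: int = 120) -> None:
    deadline = time.time() + duration_s
    seq = 0
    while time.time() < deadline:
        for _ in range(records_per_second):
            record = {
                "timestamp": time.time(),
                "level": "DEBUG",
                "logger": "repro.generator",
                "message": f"synthetic log record {seq}",
                "seq": seq,
            }
            sys.stdout.write(json.dumps(record) + "\n")
            seq += 1
        sys.stdout.flush()  # burst once per second, like a chatty DEBUG logger
        time.sleep(1)


if __name__ == "__main__":
    main()
```

Run it in a container on the same host as the Agent and watch the Log Explorer for 'TRUNCATED' entries and concatenated records.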

Additional environment details (Operating System, Cloud provider, etc):
Cloud Provider: AWS
Cloud Service: ECS
Agent version: 6.17.1
Agent containerized? Yes.
Log output driver: json-file [non-blocking, 4m intermediate ring, 5 file rotation, 500kb per file] (see the config sketch below)
Log collection method: Agent [SSL encrypted TCP, uncompressed]
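
The bracketed json-file settings above map roughly to the following Docker log-driver options (a sketch of the setup, not the exact ECS task definition; on ECS the same options go under the container's logConfiguration):

```json
{
  "log-driver": "json-file",
  "log-opts": {
    "mode": "non-blocking",
    "max-buffer-size": "4m",
    "max-size": "500k",
    "max-file": "5"
  }
}
```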

@dmitrykarachentsev

I faced the same issue. Is there any update?

@alexandru-iacob

This issue is still present.

@sgt10

sgt10 commented Jan 3, 2022

We are having the same issue with:

$ dotnet --version
6.0.100

and Serilog. Can we get an update regarding this issue?

@olivielpeau
Member

Hi all, please make sure that:

If you're still having issues, please contact our support team and send them an Agent flare along with the raw logs that are affected (after removing any sensitive information from them).

@benknight

I'm having this issue as well.

@MattMyers3

Also having this issue. Any updates?

@ssrahul96

ssrahul96 commented Nov 15, 2022

Facing the same issue: a single JSON log line of mine is more than 400 kB.

This needs to go in as a config: https://github.com/DataDog/datadog-agent/blob/main/pkg/logs/internal/decoder/decoder.go#L23

@kostakostyukov1

I'm having the same issue.
Is there a solution?

@sameeraksc

sameeraksc commented Mar 23, 2023

This is still happening. Three years and still no solution.

@souenzzo

Having the same issue.

All my JSON logs have fewer than 400 characters.

@Benbentwo

Benbentwo commented Jun 16, 2023

> This needs to go in as a config: https://github.com/DataDog/datadog-agent/blob/main/pkg/logs/internal/decoder/decoder.go#L23

@DataDog Team, why was this magic number chosen? Shouldn't this be a very simple PR (which I'd happily open) to fix? What is the hold up on this?

@sameeraksc

@DataDog Any update on this? :-(

@koosha-t

same issue! Seems nobody cares!

@annieTechno

Experiencing the same. How to fix it?

@i524

i524 commented Jul 9, 2024

Quoting #5764 (comment):

> This needs to go in as a config: https://github.com/DataDog/datadog-agent/blob/main/pkg/logs/internal/decoder/decoder.go#L23
>
> @DataDog Team, why was this magic number chosen? Shouldn't this be a very simple PR (which I'd happily open) to fix? What is the hold up on this?

It seems it's been fixed here:
#19055

@emmercm

emmercm commented Oct 8, 2024

Even when maxing out the max_message_size_bytes setting @i524 mentioned, I still have this issue.

Keeping log files smaller than max_message_size_bytes does solve the problem, but I have found times that I need max_message_size_bytes to be larger than 1MB.

For my use case with Apache Flink, I can have Log4j2 rotate files based on size to ensure they're smaller than max_message_size_bytes; but then I encounter this issue with tailing filename wildcards which will drop some log lines.
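
For anyone else landing here: the setting discussed above is configured in datadog.yaml (my understanding from #19055 is that it sits under logs_config; treat the exact key placement and the default below as things to verify against your Agent version):

```yaml
# Sketch only - verify the key and limits against your Agent version's docs.
logs_config:
  # Raise the per-message limit the logs Agent applies before truncating.
  # The default has historically been 256000 bytes (256 kB).
  max_message_size_bytes: 1000000
```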
