dd-trace-java v1.11.0 crashes the JVM #4978

Closed
nantoniazzi opened this issue Mar 28, 2023 · 18 comments

@nantoniazzi

nantoniazzi commented Mar 28, 2023

Our server automatically downloaded the latest tracer agent, v1.11.0.
After a few minutes, our servers started crashing and rebooting in a loop. After some investigation, it appears that the JVM was crashing. It may be more of a JVM issue, but your recent changes seem to have introduced this behaviour.

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f171904b992, pid=23811, tid=24534
#
# JRE version: OpenJDK Runtime Environment Corretto-17.0.6.10.1 (17.0.6+10) (build 17.0.6+10-LTS)
# Java VM: OpenJDK 64-Bit Server VM Corretto-17.0.6.10.1 (17.0.6+10-LTS, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# C  [libjavaProfiler16018171302964888844.so+0x7992]  Buffer::putVar64(unsigned long long)+0x102
#
# No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# JFR recording file will be written. Location: /usr/share/tomcat/hs_err_pid23811.jfr
#
# If you would like to submit a bug report, please visit:
#   https://github.com/corretto/corretto-17/issues/
#

---------------  S U M M A R Y ------------

---------------  T H R E A D  ---------------

Current thread (0x00007f1755ab1900):  JavaThread "dd-trace-processor" daemon [_thread_in_Java, id=24534, stack(0x00007f16fc328000,0x00007f16fc429000)]

Stack: [0x00007f16fc328000,0x00007f16fc429000],  sp=0x00007f16fc4266d8,  free space=1017k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [libjavaProfiler16018171302964888844.so+0x7992]  Buffer::putVar64(unsigned long long)+0x102
C  [libjavaProfiler16018171302964888844.so+0x1b956]  Profiler::recordSample(void*, unsigned long long, int, int, Event*)+0x256
C  [libjavaProfiler16018171302964888844.so+0x1c2c6]  PerfEvents::signalHandler(int, siginfo_t*, void*)+0x116


siginfo: si_signo: 11 (SIGSEGV), si_code: 2 (SEGV_ACCERR), si_addr: 0x00007f17081c3673
@jbachorik jbachorik self-assigned this Mar 28, 2023
@jbachorik jbachorik added the comp: profiling Profiling label Mar 28, 2023
@claude-peon-nyt

claude-peon-nyt commented Mar 29, 2023

We are seeing similar crashes with Java 11 on CentOS. Identical application instances running version 1.10.0~c545cdc5a3 do not experience this issue.

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f9d35f84992, pid=6088, tid=14890
#
# JRE version: OpenJDK Runtime Environment (Red_Hat-11.0.18.0.10-1.el7_9) (11.0.18+10) (build 11.0.18+10-LTS)
# Java VM: OpenJDK 64-Bit Server VM (Red_Hat-11.0.18.0.10-1.el7_9) (11.0.18+10-LTS, mixed mode, sharing, tiered, compressed oops, g1 gc, linux-amd64)
# Problematic frame:
# C  [libjavaProfiler15813639711709130809.so+0x7992]  Buffer::putVar64(unsigned long long)+0x102
#
# No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# If you would like to submit a bug report, please visit:
#   https://bugzilla.redhat.com/enter_bug.cgi?product=Red%20Hat%20Enterprise%20Linux%207&component=java-11-openjdk
#

---------------  S U M M A R Y ------------
Command Line: -D[Standalone] -Xms2048m -Xmx10240m -XX:MaxPermSize=1024m .....
Host: Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz, 4 cores, 15G, CentOS Linux release 7.9.2009 (Core)

---------------  T H R E A D  ---------------

Current thread (0x0000555d8a4d4800):  JavaThread "default task-16" [_thread_in_vm, id=14890, stack(0x00007f9d31a39000,0x00007f9d31b3a000)]

Stack: [0x00007f9d31a39000,0x00007f9d31b3a000],  sp=0x00007f9d31b35ad8,  free space=1010k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [libjavaProfiler15813639711709130809.so+0x7992]  Buffer::putVar64(unsigned long long)+0x102
C  [libjavaProfiler15813639711709130809.so+0x1b956]  Profiler::recordSample(void*, unsigned long long, int, int, Event*)+0x256
C  [libjavaProfiler15813639711709130809.so+0x1c2c6]  PerfEvents::signalHandler(int, siginfo_t*, void*)+0x116
C  [libpthread.so.0+0xf630]
V  [libjvm.so+0xd411cc]  SharedRuntime::montgomery_square(int*, int*, int, long, int*)+0x15c


siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x0000555dfaa25f48

@skagedal

Seeing the same issue with the Temurin base image and Java 11.0.18.

@mcculls
Contributor

mcculls commented Mar 29, 2023

Just in case people haven't seen the workaround Jaroslav posted on Slack:

As a quick remedy, you can add -Ddd.profiling.ddprof.enabled=false to disable the native profiler library. This will disable code hotspots, but if you need to use 1.11.0 for other fixes/features, this is the fastest way to get back on track.
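A minimal sketch, assuming you want the application to report at startup whether the workaround flag actually reached the JVM. The class name `DdprofFlagCheck` and its messages are hypothetical; it only reads the `dd.profiling.ddprof.enabled` system property named above and does not inspect the agent itself.

```java
/**
 * Hypothetical startup check: reports whether the native profiler workaround
 * (-Ddd.profiling.ddprof.enabled=false) was passed to this JVM.
 * It only reads the system property; it does not query dd-java-agent.
 */
public final class DdprofFlagCheck {
    // System property name as given in the workaround.
    static final String FLAG = "dd.profiling.ddprof.enabled";

    /** True only when the property is explicitly set to "false" (case-insensitive, our own choice). */
    static boolean nativeProfilerDisabled() {
        return "false".equalsIgnoreCase(System.getProperty(FLAG));
    }

    public static void main(String[] args) {
        if (nativeProfilerDisabled()) {
            System.out.println(FLAG + "=false: native profiler library should be disabled");
        } else {
            System.out.println(FLAG + " is not set to false: the native profiler may be loaded");
        }
    }
}
```

Run it with the same JVM arguments as the application, for example `java -Ddd.profiling.ddprof.enabled=false DdprofFlagCheck`, to confirm the flag is not being dropped by a wrapper script or container entrypoint.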

@jbachorik
Contributor

We are validating the fix now and will do a patch release once we are sure the root cause is fixed.
Will keep you posted.

@rylectro

rylectro commented Mar 29, 2023

Also experienced this today:

Problematic frame:
C [libjavaProfiler1836949094165478159.so+0x7992] Buffer::putVar64(unsigned long long)+0x102

@jeevangali-agi

This reproduces even on Java 8 (8.0_352-b08), at the same frame.

@jbachorik jbachorik added this to the 1.11.1 milestone Mar 29, 2023
@jbachorik
Contributor

Fixed by #4981

@bootswithdefer

1.11.1 didn't solve the issue for us:

[dd.trace 2023-03-29 16:13:56:302 -0700] [main] INFO com.datadog.appsec.AppSecSystem - AppSec is ENABLED_INACTIVE with powerwaf(libddwaf: 1.8.2) no rules loaded
[dd.trace 2023-03-29 16:13:56:456 -0700] [dd-task-scheduler] INFO datadog.trace.agent.core.StatusLogger - DATADOG TRACER CONFIGURATION {"version":"1.11.1~514d7ebfe1","os_name":"Linux","os_version":"5.4.228-131.415.amzn2.x86_64","architecture":"amd64","lang":"jvm","lang_version":"1.8.0_362","jvm_vendor":"Amazon.com Inc.","jvm_version":"25.362-b08","java_class_version":"52.0","http_nonProxyHosts":"null","http_proxyHost":"null","enabled":true,"service":"warapps-myasu","agent_url":"http://10.120.43.39:8126","agent_error":false,"debug":false,"trace_propagation_style_extract":["datadog"],"trace_propagation_style_inject":["datadog"],"analytics_enabled":false,"sampling_rules":[{},{}],"priority_sampling_enabled":true,"logs_correlation_enabled":true,"profiling_enabled":true,"remote_config_enabled":true,"debugger_enabled":false,"appsec_enabled":"ENABLED_INACTIVE","telemetry_enabled":true,"dd_version":"ci-1680128715949","health_checks_enabled":true,"configuration_file":"no config file present","runtime_id":"e50c8cdd-1856-45cd-a69e-eb0bf3eebd6e","logging_settings":{"levelInBrackets":false,"dateTimeFormat":"'[dd.trace 'yyyy-MM-dd HH:mm:ss:SSS Z']'","logFile":"System.err","configurationFile":"simplelogger.properties","showShortLogName":false,"showDateTime":true,"showLogName":true,"showThreadName":true,"defaultLogLevel":"INFO","warnLevelString":"WARN","embedException":false},"cws_enabled":false,"cws_tls_refresh":5000,"datadog_profiler_enabled":true,"datadog_profiler_safe":true}
VM settings:
    Max. Heap Size (Estimated): 2.88G
    Ergonomics Machine Class: server
    Using VM: OpenJDK 64-Bit Server VM
29-Mar-2023 16:13:57.108 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Server version name:   Apache Tomcat/8.5.87
29-Mar-2023 16:13:57.110 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Server built:          Feb 27 2023 19:32:33 UTC
29-Mar-2023 16:13:57.110 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Server version number: 8.5.87.0
29-Mar-2023 16:13:57.110 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log OS Name:               Linux
29-Mar-2023 16:13:57.110 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log OS Version:            5.4.228-131.415.amzn2.x86_64
29-Mar-2023 16:13:57.110 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Architecture:          amd64
29-Mar-2023 16:13:57.110 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Java Home:             /usr/lib/jvm/java-1.8.0-amazon-corretto/jre
29-Mar-2023 16:13:57.110 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log JVM Version:           1.8.0_362-b08
29-Mar-2023 16:13:57.110 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log JVM Vendor:            Amazon.com Inc.
29-Mar-2023 16:13:57.110 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log CATALINA_BASE:         /usr/local/tomcat
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f82c658c8b2, pid=7, tid=0x00007f825d9ea700
#
# JRE version: OpenJDK Runtime Environment (8.0_362-b08) (build 1.8.0_362-b08)
# Java VM: OpenJDK 64-Bit Server VM (25.362-b08 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C  [libjavaProfiler3165790072185085033.so+0x78b2]  Buffer::putVar64(unsigned long long)+0x102
#
# Core dump written. Default location: /usr/local/tomcat/core or core.7
#
# An error report file with more information is saved as:
# /usr/local/tomcat/hs_err_pid7.log
#
# If you would like to submit a bug report, please visit:
#   https://github.com/corretto/corretto-8/issues/
#
[error occurred during error reporting (null), id 0xb]

@petercsiba

petercsiba commented Mar 29, 2023

+1 to the above.
We ended up pinning to the latest stable version we observed, 1.10.1:

RUN wget -O dd-java-agent.jar 'https://github.com/DataDog/dd-trace-java/releases/download/v1.10.1/dd-java-agent.jar'

instead of

RUN wget -O dd-java-agent.jar 'https://dtdg.co/latest-java-tracer'

@jeevangali-agi

Thank you, but this does not seem to be fixed yet. We are on Java 8, and it still crashes with the latest agent.

SIGSEGV (0xb) at pc=0x00007fc0f07a28b2, pid=45, tid=0x00007fc078b5f700

@Misocainea

+1 to not fixed, experiencing the same crash on 1.11.1

@richardstartin
Member

We acknowledge that this crash is still possible, so we are reopening the issue, and we are working on getting 1.11.2 out with a full mitigation.

@richardstartin richardstartin modified the milestones: 1.11.1, 1.11.2 Mar 31, 2023
@richardstartin
Member

We managed to reproduce the issue and verified a fix, released in 1.11.2. If the crash persists after upgrading to 1.11.2, please report back with the backtrace from the hs_err file, the JDK version, and the Docker base image being used (or, otherwise, the Linux and libc versions).
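For the JDK and OS details requested above, a small sketch like the following prints the relevant standard Java system properties. The class name `CrashReportEnv` is hypothetical. It cannot report the Docker base image or the libc version; those still have to be collected outside the JVM.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical helper: collects JDK and OS details for a crash report
 * using only standard Java system properties.
 */
public final class CrashReportEnv {
    // Standard system property keys defined by the Java SE platform.
    static final String[] KEYS = {
        "java.version", "java.vendor", "java.vm.name", "java.vm.version",
        "os.name", "os.version", "os.arch"
    };

    /** Returns the properties in a stable, insertion-ordered map. */
    static Map<String, String> collect() {
        Map<String, String> info = new LinkedHashMap<>();
        for (String key : KEYS) {
            info.put(key, System.getProperty(key, "unknown"));
        }
        return info;
    }

    public static void main(String[] args) {
        // One "key: value" line per property, ready to paste into the issue.
        collect().forEach((k, v) -> System.out.println(k + ": " + v));
    }
}
```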

@Misocainea

1.11.2 has resolved this for me. Thank you.
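To check deployments for the affected releases, here is a hypothetical helper that treats 1.11.0 and 1.11.1 as affected and 1.11.2 as fixed, based only on what is reported in this thread. It assumes version strings shaped like the `version` field in the tracer's startup log (e.g. `1.11.1~514d7ebfe1`), and it says nothing about crashes with other causes.

```java
/**
 * Hypothetical check for the dd-java-agent versions reported in this thread
 * as crashing (1.11.0 and 1.11.1); 1.11.2 is reported as fixed.
 */
public final class AgentVersionCheck {
    /** Parses "major.minor.patch", ignoring any "~commit" or "-SNAPSHOT" suffix. */
    static int[] parse(String version) {
        String core = version.trim().split("[~-]", 2)[0];
        String[] parts = core.split("\\.");
        int[] v = new int[3];
        for (int i = 0; i < 3 && i < parts.length; i++) {
            v[i] = Integer.parseInt(parts[i]);
        }
        return v;
    }

    /** Lexicographic comparison of two parsed versions. */
    static int compare(int[] a, int[] b) {
        for (int i = 0; i < 3; i++) {
            if (a[i] != b[i]) {
                return Integer.compare(a[i], b[i]);
            }
        }
        return 0;
    }

    /** True for 1.11.0 <= version < 1.11.2, the range reported as affected here. */
    static boolean affectedByIssue4978(String version) {
        int[] v = parse(version);
        return compare(v, new int[] {1, 11, 0}) >= 0
            && compare(v, new int[] {1, 11, 2}) < 0;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"1.10.1", "1.11.0", "1.11.1~514d7ebfe1", "1.11.2"}) {
            System.out.println(s + " affected: " + affectedByIssue4978(s));
        }
    }
}
```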

@ahululu

ahululu commented Nov 16, 2023

We encountered the same problem; the error looks just like the ones above.
Kubernetes: AWS EKS
Node: Amazon Linux 2023 (arm)
OS: Rocky Linux 9.2 (arm)
Datadog Agent install: Helm

@richardstartin
Member

Hi @ahululu could you provide more details please?

  1. The version of dd-java-agent.jar
  2. The backtrace and siginfo from the hs_err_pid file
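As a sketch of pulling out just the requested parts, the hypothetical `HsErrExtract` class below keeps the "Problematic frame" entry, the `Native frames:` backtrace and the `siginfo:` line from an hs_err_pid<N>.log file. It assumes the layout seen in the reports quoted in this thread and ignores the Java frames section.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical extractor for the parts of an hs_err file requested in bug reports. */
public final class HsErrExtract {
    static List<String> extract(List<String> lines) {
        List<String> out = new ArrayList<>();
        boolean inFrames = false;
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            if (line.startsWith("# Problematic frame:") && i + 1 < lines.size()) {
                out.add(line);
                out.add(lines.get(i + 1)); // the frame itself is on the following line
            } else if (line.startsWith("Native frames:")) {
                inFrames = true;
                out.add(line);
            } else if (inFrames) {
                if (line.trim().isEmpty()) {
                    inFrames = false; // the native backtrace ends at the first blank line
                } else {
                    out.add(line);
                }
            } else if (line.startsWith("siginfo:")) {
                out.add(line);
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java HsErrExtract <path/to/hs_err_pidN.log>");
            return;
        }
        Path path = Paths.get(args[0]);
        // ISO-8859-1 maps every byte, so stray non-UTF-8 bytes in the log cannot fail the read.
        extract(Files.readAllLines(path, StandardCharsets.ISO_8859_1)).forEach(System.out::println);
    }
}
```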

@ahululu

ahululu commented Nov 27, 2023

Hi @ahululu could you provide more details please?

  1. The version of dd-java-agent.jar
  2. The backtrace and siginfo from the hs_err_pid file
    1. Version: "1.25.0-SNAPSHOT~a1f05ad311"
    2. Backtrace and siginfo from the hs_err_pid file:
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x0000ffffa809aa64, pid=7, tid=0x0000ffff11b7c120
#
# JRE version: Java(TM) SE Runtime Environment (8.0_391) (build 1.8.0_391-b13)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.391-b13 mixed mode linux-aarch64 compressed oops)
# Problematic frame:
# j  org.apache.http.pool.PoolEntry.updateExpiry(JLjava/util/concurrent/TimeUnit;)V+26
#
# Core dump written. Default location: /var/xxxx/xxx/core or core.7
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
#

---------------  T H R E A D  ---------------

Current thread (0x0000ffff4400c800):  JavaThread "xxl-job, JobThread-6-1700728575017" [_thread_in_Java, id=943, stack(0x0000ffff11afd000,0x0000ffff11b7d000)]

siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x000000079800e440
Stack: [0x0000ffff11afd000,0x0000ffff11b7d000],  sp=0x0000ffff11b79450,  free space=497k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
j  org.apache.http.pool.PoolEntry.updateExpiry(JLjava/util/concurrent/TimeUnit;)V+26
J 25012 C1 org.apache.http.impl.conn.PoolingHttpClientConnectionManager.releaseConnection(Lorg/apache/http/HttpClientConnection;Ljava/lang/Object;JLjava/util/concurrent/TimeUnit;)V (395 bytes) @ 0x0000ffffabb82aa4 [0x0000ffffabb822c0+0x7e4]
J 22703 C1 org.apache.http.impl.execchain.ConnectionHolder.releaseConnection(Z)V (176 bytes) @ 0x0000ffffab5e0350 [0x0000ffffab5dff40+0x410]
J 22587 C1 org.apache.http.impl.execchain.ResponseEntityProxy.streamClosed(Ljava/io/InputStream;)Z (77 bytes) @ 0x0000ffffab572d24 [0x0000ffffab572880+0x4a4]
J 20414 C1 org.apache.http.conn.EofSensorInputStream.checkClose()V (54 bytes) @ 0x0000ffffaae9e170 [0x0000ffffaae9e000+0x170]
J 20413 C1 org.apache.http.conn.EofSensorInputStream.close()V (10 bytes) @ 0x0000ffffaae98600 [0x0000ffffaae98540+0xc0]
J 25081 C1 java.util.zip.GZIPInputStream.close()V (22 bytes) @ 0x0000ffffaa2ac0f0 [0x0000ffffaa2abe80+0x270]
J 25080 C1 org.apache.http.client.entity.LazyDecompressingInputStream.close()V (35 bytes) @ 0x0000ffffa9f4bcf8 [0x0000ffffa9f4bb80+0x178]
J 25046 C1 org.apache.http.util.EntityUtils.toString(Lorg/apache/http/HttpEntity;Lorg/apache/http/entity/ContentType;)Ljava/lang/String; (184 bytes) @ 0x0000ffffabbb2b58 [0x0000ffffabbb14c0+0x1698]
J 20452 C1 com.xxxx.xxxx.xxxxx.xxxxx.RestTemplateHttpClient.getResponseResult(Lorg/apache/http/HttpResponse;)Lcom/xxxx/xxxxx/xxx/entity/HttpRequestResult; (204 bytes) @ 0x0000ffffaaebc3bc [0x0000ffffaaebc240+0x17c]
J 22605 C1 com.xxx.xxx.xxx.xxx.RestTemplateHttpClient.doGetTLS13(Ljava/lang/String;Ljava/util/Map;Ljava/util/Map;)Lcom/xxx/xxx/xxx/xxxx/HttpRequestResult; (244 bytes) @ 0x0000ffffab57dc58 [0x0000ffffab57b780+0x24d8]
J 24088 C1 com.xxxxx.xxxx.xxxxx.xxxx.service.impl.xxxxx.inquireByRefNo(Ljava/lang/String;Ljava/util/Map;Ljava/util/Map;)Lcom/xxx/xxx/xxx/xxxx/Response; (151 bytes) @ 0x0000ffffab9b0dbc [0x0000ffffab9b0d00+0xbc]
J 22662 C1 com.xxxxx.xxx.xxx.xxx.service.impl.xxxx.compensatexxx(Lcom/xxx/xxx/entity/xxxEntity;)Lcom/xxx/xxx/xxxx/entity/Response; (443 bytes) @ 0x0000ffffab5a8474 [0x0000ffffab5a7d00+0x774]
J 24838 C1 com.xxxxx.xxxx.xxx.xxxx.service.impl.xxxxxx.compensatexxx(Lcom/xxxx/xxx/entity/xxxx;)V (236 bytes) @ 0x0000ffffabab4364 [0x0000ffffabab4040+0x324]
J 25085 C1 com.xxx.xx.xxx.xxxx.service.impl.xxxx$$FastClassBySpringCGLIB$$9316ee24.invoke(ILjava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (1682 bytes) @ 0x0000ffffabc1ccd8 [0x0000ffffabc10bc0+0xc118]
J 22609 C1 org.springframework.cglib.proxy.MethodProxy.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (72 bytes) @ 0x0000ffffab581ad4 [0x0000ffffab581940+0x194]
J 22076 C1 org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint()Ljava/lang/Object; (36 bytes) @ 0x0000ffffab3a7140 [0x0000ffffab3a7040+0x100]
J 19581 C1 org.springframework.aop.framework.ReflectiveMethodInvocation.proceed()Ljava/lang/Object; (126 bytes) @ 0x0000ffffaac506f0 [0x0000ffffaac4fc40+0xab0]
J 19580 C1 org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed()Ljava/lang/Object; (47 bytes) @ 0x0000ffffaac518f0 [0x0000ffffaac51840+0xb0]
J 23204 C1 org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(Ljava/lang/reflect/Method;Ljava/lang/Class;Lorg/springframework/transaction/interceptor/TransactionAspectSupport$InvocationCallback;)Ljava/lang/Object; (522 bytes) @ 0x0000ffffab739b74 [0x0000ffffab739300+0x874]
J 22109 C1 org.springframework.transaction.interceptor.TransactionInterceptor.invoke(Lorg/aopalliance/intercept/MethodInvocation;)Ljava/lang/Object; (44 bytes) @ 0x0000ffffab3e8124 [0x0000ffffab3e7c40+0x4e4]
J 19581 C1 org.springframework.aop.framework.ReflectiveMethodInvocation.proceed()Ljava/lang/Object; (126 bytes) @ 0x0000ffffaac50628 [0x0000ffffaac4fc40+0x9e8]
J 19580 C1 org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed()Ljava/lang/Object; (47 bytes) @ 0x0000ffffaac518f0 [0x0000ffffaac51840+0xb0]
J 20313 C1 org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(Ljava/lang/Object;Ljava/lang/reflect/Method;[Ljava/lang/Object;Lorg/springframework/cglib/proxy/MethodProxy;)Ljava/lang/Object; (229 bytes) @ 0x0000ffffaae45484 [0x0000ffffaae445c0+0xec4]
J 22685 C1 com.xxxx.xxxx.xxx.xxxx.service.impl.xxxServiceImpl$$EnhancerBySpringCGLIB$$3fd9963c.compensatexxx(Lcom/xx/xxx/entity/xxxx;)V (48 bytes) @ 0x0000ffffab5d05e4 [0x0000ffffab5d0200+0x3e4]
j  com.xxx.xxx.xxx.xxx.schedule.xxxxx.executexxxRecovery(Ljava/lang/String;)V+278
j  com.xxx.xxx.xxxx.xxxxx.schedule.xxxxx.execute()V+27
v  ~StubRoutines::call_stub
V  [libjvm.so+0x5f2104]  JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*, Thread*)+0xe3c
V  [libjvm.so+0x8df00c]  Reflection::invoke(instanceKlassHandle, methodHandle, Handle, bool, objArrayHandle, BasicType, objArrayHandle, bool, Thread*)+0xa4c
V  [libjvm.so+0x8e0664]  Reflection::invoke_method(oopDesc*, Handle, objArrayHandle, Thread*)+0x12c
V  [libjvm.so+0x6693bc]  JVM_InvokeMethod+0x104
J 3468  sun.reflect.NativeMethodAccessorImpl.invoke0(Ljava/lang/reflect/Method;Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (0 bytes) @ 0x0000ffffa89fdce8 [0x0000ffffa89fdc40+0xa8]
V  [libjvm.so+0xb5b000]

@richardstartin
Member

Hi @ahululu, this is indeed a JVM crash, but it is not related to this issue, and it doesn't look like it could have been caused by the profiler. It may help to post this information to the other crash report you are commenting on.
