Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CI test cycles slow to a crawl with 1.18.0+ #5696

Closed
internetstaff opened this issue Aug 8, 2023 · 9 comments · Fixed by #5726, #5727 or #5761
Closed

CI test cycles slow to a crawl with 1.18.0+ #5696

internetstaff opened this issue Aug 8, 2023 · 9 comments · Fixed by #5726, #5727 or #5761
Assignees
Labels
comp: ci visibility Continuous Integration Visibility type: bug

Comments

@internetstaff
Copy link

Upgrading past 1.17.0 causes our test cycles to slow down so much they time out (~7 minutes to 60++).

We're on Java 17, Spring Boot 2.7, Junit 5. We haven't dug too deeply yet. What would help?

@nikita-tkachenko-datadog nikita-tkachenko-datadog added the comp: ci visibility Continuous Integration Visibility label Aug 8, 2023
@nikita-tkachenko-datadog
Copy link
Contributor

nikita-tkachenko-datadog commented Aug 8, 2023

Hi @internetstaff,

As a quick fix I can suggest disabling code coverage in the tracer. This can be done by specifying the system property (-Ddd.civisibility.code.coverage.enabled=false) or the environment variable (export DD_CIVISIBILITY_CODE_COVERAGE_ENABLED=false).
This should fix the issue.

On the other hand, I suspect that you might be using an older way of injecting the tracer: manually configuring Maven Surefire plugin or Gradle test task. If that is the case, my other suggestion would be for you to try the newer configuration way: recent versions of the tracer work best when the build system itself is instrumented, which requires a slightly different way of attaching the tracer. Instructions on how to do it can be found in CI Visibility docs (in short, it boils down to specifying MAVEN_OPTS/org.gradle.jvmargs instead of modifying the project build files).

Let me know if this helps.
Thanks!

@internetstaff
Copy link
Author

Disabling code coverage does appear to resolve the issue.

You're also correct in that we are still configuring surefire argLine.

Attempting the "new" way of integration we first ran into the requirement to enumerate all our annotation processors, which was frustrating enough that we eventually discovered dd.civisibility.compiler.plugin.auto.configuration.enabled=false.

However, getting past the compile stage, our tests then began crashing.

18:31:01,596 [ERROR] The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
18:31:01,596 [ERROR] Error occurred in starting fork, check output in log
18:31:01,596 [ERROR] Process Exit Code: 134
18:31:01,596 [ERROR] org.apache.maven.surefire.booter.SurefireBooterForkException: The forked VM terminated without properly saying goodbye. VM crash or System.exit called?

There's not really much else to go on. We are on bellsoft/liberica-openjdk-alpine:17.0.8-7-x86_64 which is a slightly unusual JDK container build (alpine + glibc), but ...

Thanks.

@nikita-tkachenko-datadog
Copy link
Contributor

Regarding the "The forked VM terminated without properly saying goodbye." could you please share the command that Maven Surefire was using to start the JVM? It's usually written to console along with the error (should be right before the error message, as far as I recall)

@internetstaff
Copy link
Author

Here's a slightly redacted version:

Command was /bin/sh -c cd '/builds/redacted/javarepo/common/common' && '/usr/lib/jvm/jdk-17.0.8-bellsoft-x86_64/bin/java' '-javaagent:/builds/redacted/javarepo/.m2/repository/org/jacoco/org.jacoco.agent/0.8.10/org.jacoco.agent-0.8.10-runtime.jar=destfile=/builds/redacted/javarepo/common/common/target/jacoco-it.exec,excludes=**/generated/**/*' '-Ddd.civisibility.compiler.plugin.auto.configuration.enabled=false' '-Ddd.iast.enabled=false' '-Ddd.appsec.enabled=false' '-Ddd.civisibility.agent.jar.uri=file:/datadog/dd-java-agent.jar' '-Ddd.remote_config.enabled=false' '-Ddd.trace.enabled=false' '-Ddd.jmxfetch.enabled=false' '-Ddd.profiling.enabled=false' '-Ddd.prioritization.type=ENSURE_TRACE' '-Ddd.service=java' '-Ddd.civisibility.code.coverage.enabled=false' '-Ddd.civisibility.enabled=true' '-Ddd.cws.enabled=false' '-javaagent:/datadog/dd-java-agent.jar' '-javaagent:/builds/redacted/javarepo/.m2/repository/org/jacoco/org.jacoco.agent/0.8.10/org.jacoco.agent-0.8.10-runtime.jar=destfile=/builds/redacted/javarepo/common/common/target/jacoco-it.exec,excludes=**/generated/**/*' '-Duser.timezone=UTC' '-Duser.language=en' '-Duser.region=US' '-Dspring.cloud.bootstrap.enabled=false' '-XX:-TieredCompilation' '-XX:TieredStopAtLevel=1' '-jar' '/builds/redacted/javarepo/common/common/target/surefire/surefirebooter-20230811183016149_6.jar' '/builds/redacted/javarepo/common/common/target/surefire' '2023-08-11T18-30-14_790-jvmRun2' 'surefire-20230811183016149_4tmp' 'surefire_1-20230811183016149_5tmp'

@nikita-tkachenko-datadog
Copy link
Contributor

I believe I have identified and fixed the issue that was causing this failure. Could you please try the newer instrumentation way once more with release v1.19.1?

I also think I've solved the problem that was causing your annotation processors to "disappear", so it'd be very nice if you could give it a try without the dd.civisibility.compiler.plugin.auto.configuration.enabled=false property (i.e. with the compiler plugin configuration enabled).

Thanks!

@internetstaff
Copy link
Author

We've tried 1.19.2.

It seems like we can enable code coverage again without the slow down, and we did not have to do any finagling of annotation processors nor disable the compiler plugin.

However, test runs fail in a different way now:

    at datadog.trace.civisibility.events.BuildEventsHandlerImpl.onTestModuleFinish (BuildEventsHandlerImpl.java:125)
    at datadog.trace.instrumentation.maven3.MavenExecutionListener.mojoSucceeded (MavenExecutionListener.java:157)
    at jdk.internal.reflect.GeneratedMethodAccessor85.invoke (Unknown Source)
    at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke (Method.java:568)
    at datadog.trace.instrumentation.maven3.MavenLifecycleParticipant.lambda$afterSessionStart$0 (MavenLifecycleParticipant.java:70)
    at jdk.proxy3.$Proxy48.mojoSucceeded (Unknown Source)
    at org.apache.maven.lifecycle.internal.DefaultExecutionEventCatapult.fire (DefaultExecutionEventCatapult.java:80)
    at org.apache.maven.lifecycle.internal.DefaultExecutionEventCatapult.fire (DefaultExecutionEventCatapult.java:38)
    at org.apache.maven.lifecycle.internal.MojoExecutor.doExecute2 (MojoExecutor.java:336)
    at org.apache.maven.lifecycle.internal.MojoExecutor.doExecute (MojoExecutor.java:316)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:212)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:174)
    at org.apache.maven.lifecycle.internal.MojoExecutor.access$000 (MojoExecutor.java:75)
    at org.apache.maven.lifecycle.internal.MojoExecutor$1.run (MojoExecutor.java:162)
    at org.apache.maven.plugin.DefaultMojosExecutionStrategy.execute (DefaultMojosExecutionStrategy.java:39)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:159)
    at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:105)
    at org.apache.maven.lifecycle.internal.builder.multithreaded.MultiThreadedBuilder$1.call (MultiThreadedBuilder.java:193)
    at org.apache.maven.lifecycle.internal.builder.multithreaded.MultiThreadedBuilder$1.call (MultiThreadedBuilder.java:180)
    at java.util.concurrent.FutureTask.run (FutureTask.java:264)
    at java.util.concurrent.Executors$RunnableAdapter.call (Executors.java:539)
    at java.util.concurrent.FutureTask.run (FutureTask.java:264)
    at java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1136)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:635)
    at java.lang.Thread.run (Thread.java:833)```

@nikita-tkachenko-datadog
Copy link
Contributor

I suspect you might have a module in your project that runs multiple test executions bound to the same lifecycle phase. If that is the case, could you please try running the build with v1.19.3? The tracer was not handling this scenario correctly, but the issue is fixed in the newest release.

And if that doesn't help, could you please provide some details wrt the setup in the module that is failing? Namely the Surefire and Failsafe plugins configs (anything non-standard would be of interest, such as having multiple <execution> or <goal> declarations, binding to a non-default lifecycle phase, etc).

Thank you!

@internetstaff
Copy link
Author

1.19.3 did not fix the session not found problem. However, I noticed it's caused by having two modules with the exact same <name> configured. In our case, this was a copy-and-paste error we were happy to fix, but I'm wondering if that couldn't be a problem for others.

We seem to be able to compile and test now with this integration method. Thanks!

@nikita-tkachenko-datadog
Copy link
Contributor

Great, I'm glad it works for you now!
And thanks for highlighting the identical <name> issue, I'll see if I can make the tracer respond in a more obvious way to cases like this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment