model checking hangs in 2.16 on jdk 19 #130

Closed
ben-manes opened this issue Nov 23, 2022 · 8 comments
Labels
bug Something isn't working

Comments

@ben-manes

After upgrading from 2.15, a previously passing test hangs and times out on CI. I was able to reproduce this locally on JDK 19 (JDK 11 and 17 both pass). Using jps to find the pid and jstack to capture the stack trace, I could see that it was always doing work within Lincheck methods. Sorry that I cannot provide more insight.

JAVA_VERSION=19 ./gradlew caffeine:lincheckTest

https://github.com/ben-manes/caffeine/actions/runs/3528817208/jobs/5919342189

@ben-manes
Author

I profiled and found that it is throwing NoClassDefFoundError and, I presume, retrying indefinitely. The reason appears to be a failed transform:

Could not initialize class org.jetbrains.kotlinx.lincheck.tran$f*rmed.java.util.concurrent.ForkJoinPool

void java.lang.Error.<init>(String)
void java.lang.LinkageError.<init>(String)
void java.lang.NoClassDefFoundError.<init>(String)
void com.github.benmanes.caffeine.cache.BoundedLocalCache.performCleanUp(Runnable)
void com.github.benmanes.caffeine.cache.BoundedLocalCache$PerformCleanupTask.run()
void com.github.benmanes.caffeine.lincheck.AbstractLincheckCacheTest$$Lambda$3254+0x00000008029f7140.74472028.execute(Runnable)

If I remove the offending code, the test passes. The code is an optimization that more aggressively resubmits onto the common pool if work remains, and otherwise waits for the next triggering action. I don't know why this breaks only on JDK 19 with 2.16 and not on other JDK versions.

/**
 * Performs the maintenance work, blocking until the lock is acquired.
 *
 * @param task an additional pending task to run, or {@code null} if not present
 */
void performCleanUp(@Nullable Runnable task) {
  evictionLock.lock();
  try {
    maintenance(task);
  } finally {
    evictionLock.unlock();
  }
  if ((drainStatusOpaque() == REQUIRED) && (executor == ForkJoinPool.commonPool())) {
    scheduleDrainBuffers();
  }
}

@ben-manes
Author

Oh, I missed this initialization exception from the static load:

Exception java.lang.IllegalAccessError:
class org.jetbrains.kotlinx.lincheck.tran$f*rmed.java.util.concurrent.ForkJoinPool (in unnamed module @0x1add17cc) cannot access class jdk.internal.vm.SharedThreadContainer (in module java.base) because module java.base does not export jdk.internal.vm to unnamed module @0x1add17cc [in thread "FixedActiveThreadsExecutor@849727439-1"] 1

I can make the test fail eagerly by calling ForkJoinPool.commonPool() in the test class constructor. Opening jdk.internal.vm and jdk.internal.access then lets the tests pass.

ben-manes added a commit to ben-manes/caffeine that referenced this issue Nov 27, 2022
Lincheck's classloader will rewrite usages of ForkJoin's commonPool to an
instrumented instance. This requires opening additional modules for access,
or else a static initialization failure is thrown and the class is left in
an invalid state. This results in the test rerunning indefinitely. Once
this module restriction is lifted it is able to proceed as expected and
validate that the cache passes linearization tests.

JetBrains/lincheck#130
@alefedor
Contributor

Hi @ben-manes !

So, in short, it seems the error is caused by lazy static initialization in concurrent threads leading to a deadlock.
Interesting. This behaviour is likely due to cyclic class dependencies, though it is not yet clear exactly why this happens.

@ben-manes
Author

Hi @alefedor

The summary is that ForkJoinPool now imports classes from jdk.internal.access and jdk.internal.vm, which likely has something to do with virtual threads. When ASM rewrites usages to a shaded instance, the module system restricts access to these classes, and that fails the initialization of the common pool. Because the common pool is a static instance, the failure happens during class loading: the entire ForkJoinPool class fails to load and a linkage error is thrown. Unfortunately, Lincheck blindly swallows every Throwable, whereas Error types should always be propagated and never handled directly. As a result the test retries forever and never reports what went wrong.
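To illustrate the failure mode described here, the following is a minimal sketch (not Lincheck or Caffeine code) of how a failed static initializer poisons a class. The names `StaticInitFailureDemo`, `Broken`, and `describeAccess` are hypothetical, and the `IllegalAccessError` is thrown by hand to stand in for the module access failure.

```java
final class StaticInitFailureDemo {

  /** Hypothetical stand-in for a class whose static state cannot be created. */
  static final class Broken {
    // Simulates a static singleton (like a common pool) whose creation fails.
    static final Object INSTANCE = create();

    private static Object create() {
      // Assumption: thrown manually here; in the issue the JVM raised it
      // because java.base does not export jdk.internal.vm.
      throw new IllegalAccessError("simulated module access failure");
    }

    static void touch() {}
  }

  /** Triggers class initialization and reports which Throwable type surfaced. */
  static String describeAccess() {
    try {
      Broken.touch();
      return "ok";
    } catch (Throwable t) {
      return t.getClass().getName();
    }
  }

  public static void main(String[] args) {
    // The first access runs the static initializer and surfaces its Error.
    System.out.println(describeAccess());
    // Later accesses find the class in an erroneous state and fail with
    // NoClassDefFoundError ("Could not initialize class ...").
    System.out.println(describeAccess());
  }
}
```

Only the first access reports the underlying cause. Every later access reports just the `NoClassDefFoundError`, which is why a loop that swallows the first failure and retries can hide the real problem.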

The important fix is to always rethrow java.lang.Error types so that the test terminates early and reports the failure, as this exception type is meant to indicate that the JVM is in an unrecoverable state. Then if additional modules are required in future runtimes, users can quickly diagnose and make the necessary changes.
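A minimal sketch of that rethrow policy is shown below. It is not Lincheck's actual code; `ErrorAwareRetry` and `runWithRetry` are hypothetical names for a generic retry loop that retries ordinary exceptions but lets any `java.lang.Error` propagate immediately.

```java
import java.util.concurrent.Callable;

final class ErrorAwareRetry {

  /**
   * Runs the task up to maxAttempts times. Exceptions are retried, but any
   * java.lang.Error is rethrown at once so the caller sees the real failure.
   */
  static <T> T runWithRetry(Callable<T> task, int maxAttempts) {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be at least 1");
    }
    Throwable last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return task.call();
      } catch (Throwable t) {
        if (t instanceof Error) {
          // Never swallow Errors such as NoClassDefFoundError or IllegalAccessError.
          throw (Error) t;
        }
        last = t; // Remember the most recent recoverable failure, then retry.
      }
    }
    throw new IllegalStateException("Task failed after " + maxAttempts + " attempts", last);
  }
}
```

With this shape, a task that throws `NoClassDefFoundError` runs exactly once and the error reaches the test runner, while a task that fails with an ordinary exception is still retried up to the limit.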

The user-side change is to include add-opens directives for jdk.internal.vm and jdk.internal.access. As ForkJoinPool is listed in your transformations, it seems that this should be in your README for Java 9+. This is less pressing, as it is merely outdated documentation.
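As a rough aid for diagnosing this, the sketch below checks whether those packages are already open and prints the JVM flags that would open them. `AddOpensCheck`, `addOpensFlag`, and `missingFlags` are hypothetical names. The check uses the calling class's own module, whereas Lincheck's transformed classes live in a different unnamed module, so treat the result as an approximation. How the flags reach the test JVM (for example through the build tool's test JVM arguments) depends on the project.

```java
import java.util.ArrayList;
import java.util.List;

final class AddOpensCheck {

  /** Formats the JVM flag that opens a java.base package to all unnamed modules. */
  static String addOpensFlag(String pkg) {
    return "--add-opens=java.base/" + pkg + "=ALL-UNNAMED";
  }

  /** Returns the flags for the packages that java.base has not opened to this class's module. */
  static List<String> missingFlags(List<String> packages) {
    Module javaBase = Object.class.getModule();
    Module caller = AddOpensCheck.class.getModule();
    List<String> flags = new ArrayList<>();
    for (String pkg : packages) {
      if (!javaBase.isOpen(pkg, caller)) {
        flags.add(addOpensFlag(pkg));
      }
    }
    return flags;
  }

  public static void main(String[] args) {
    // The two packages named in this thread.
    for (String flag : missingFlags(List.of("jdk.internal.vm", "jdk.internal.access"))) {
      System.out.println(flag);
    }
  }
}
```

Running this in the test JVM prints nothing once both packages have been opened.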

@ndkoval
Collaborator

ndkoval commented Jan 18, 2024

The problem should be resolved with the changes for #136. I've also created an issue about propagating Errors (#258).

@ndkoval
Collaborator

ndkoval commented Apr 29, 2024

Hi, @ben-manes! It has taken a while to address the issue, but the recent 2.30 release should've fixed it. Could you please check?

@ben-manes
Author

Oh yes, this is fixed. I am on 2.29 for now, as I ran into a bug where Lincheck self-terminates when it is hung:

Gradle suite > Gradle test > com.github.benmanes.caffeine.lincheck.CaffeineLincheckTest$BoundedLincheckTest > modelCheckingTest FAILED
    org.jetbrains.kotlinx.lincheck.LincheckAssertionError:
    = The execution has hung, see the thread dump =

I reported it under #311 but my simplification was a misdirect, so I'll update that issue title.
