Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ValidateLayerConsistencyTest is flaky #4211

Open
matthiasblaesing opened this issue Jun 8, 2022 · 10 comments
Open

ValidateLayerConsistencyTest is flaky #4211

matthiasblaesing opened this issue Jun 8, 2022 · 10 comments
Labels
kind:bug Bug report or fix tests

Comments

@matthiasblaesing
Copy link
Contributor

matthiasblaesing commented Jun 8, 2022

Apache NetBeans version

Apache NetBeans latest daily build

What happened

The ValidateLayerConsistencyTest seems to be flaky:

https://github.com/apache/netbeans/blob/2bdbb637653cf19179963a3150add8f99d8b6977/platform/o.n.core/test/qa-functional/src/org/netbeans/core/validation/ValidateLayerConsistencyTest.java

Sometimes fails in an assertion:

2022-06-08T18:30:24.0613903Z     [junit] FINE [org.openide.util.lookup.implspi.ActiveQueue]: Got dequeued reference org.openide.util.lookup.AbstractLookup$ReferenceToResult@597289e4
2022-06-08T18:30:24.0614562Z     [junit] FINE [org.openide.util.lookup.implspi.ActiveQueue]: Got dequeued reference org.openide.util.lookup.AbstractLookup$ReferenceToResult@67cb9a30
2022-06-08T18:30:27.1650822Z     [junit] java.lang.AssertionError: Has to be NbRepository: org.openide.filesystems.Repository@5cf749fa Initialization stack: java.lang.Exception: initialized
2022-06-08T18:30:27.1651908Z     [junit] 	at org.openide.filesystems.Repository.<init>(Repository.java:326)
2022-06-08T18:30:27.1652502Z     [junit] 	at org.openide.filesystems.Repository$3.call(Repository.java:373)
2022-06-08T18:30:27.1652987Z     [junit] 	at org.openide.filesystems.Repository$3.call(Repository.java:368)
2022-06-08T18:30:27.1653709Z     [junit] 	at org.openide.filesystems.Repository.delayFileSystemAttachImpl(Repository.java:1018)
2022-06-08T18:30:27.1654297Z     [junit] 	at org.openide.filesystems.Repository.getDefault(Repository.java:368)
2022-06-08T18:30:27.1654794Z     [junit] 	at org.netbeans.core.startup.Main.start(Main.java:298)
2022-06-08T18:30:27.1655764Z     [junit] 	at org.netbeans.core.startup.TopThreadGroup.run(TopThreadGroup.java:98)
2022-06-08T18:30:27.1656203Z     [junit] 	at java.lang.Thread.run(Thread.java:750)
2022-06-08T18:30:27.1656523Z     [junit] 
2022-06-08T18:30:27.1656885Z     [junit] 	at org.netbeans.core.startup.Main.start(Main.java:298)
2022-06-08T18:30:27.1657532Z     [junit] 	at org.netbeans.core.startup.TopThreadGroup.run(TopThreadGroup.java:98)
2022-06-08T18:30:27.1657948Z     [junit] 	at java.lang.Thread.run(Thread.java:750)

The full log of the run is zipped: fulllog.txt.zip (The quoted part start in line 109186).

The listed stack trace was retrieved using an idea from @jtulach, by creating a fake exception in the Repository constructor, fetching the stacktrace from it and storing it into a public field of that class.

This seems to be a race when initializing the META-INF based services. The NbRepository that is expected, is registered as such a service and thus should be available from the default lookup.

This is the code in master (at time of writing):

public static Repository getDefault() {
final Lookup lkp = Lookup.getDefault();
synchronized (Repository.class) {
if (repository != null) {
return repository;
}
}
try {
Repository newRepo = delayFileSystemAttachImpl(new Callable<Repository>() {
public Repository call() throws Exception {
Repository r = lkp.lookup(Repository.class);
if (r == null) {
// if not provided use default one
r = new Repository(new MainFS());
}
return r;
}
});
synchronized (Repository.class) {
if (repository == null) {
repository = newRepo;
}
return repository;
}
} catch (IOException ex) {
throw new IllegalStateException(ex);
}
}

and the wrong Repository is created in line 362.

Did this work correctly in an earlier version?

No

Operating System

Linux

JDK

JDK 8 - 19 all show behavior sometimes

Apache NetBeans packaging

Other

@jglick
Copy link
Contributor

jglick commented Jun 9, 2022

WARNING [org.openide.util.lookup.MetaInfServicesLookup]: org.openide.filesystems.Repository is not the real McCoy! Actually found it in org.netbeans.MainImpl$BootClassLoader@7a7b0070 (from SystemClassLoader[807 modules]) but searched for from org.netbeans.MainImpl$BootClassLoader@7ac7a4e4

While the race condition is under investigation, perhaps the test could detect the corruption and skip itself. Not sure how, given that it seems to have crashed the JVM rather than failing properly.

@matthiasblaesing
Copy link
Contributor Author

If I read the sequence correctly, the error happens in the startup sequenze of the tested NetBeans instance. The error roots in an assert in that sequenze. We can't change that to a JUnit assumption.

What is more, the error seems to only manifest on the CI/CD infrastructure, but not on my laptop.

@vieiro
Copy link
Contributor

vieiro commented Jul 4, 2022

Hi, this failed today in #4327 .

Quoting from the build log before it's deleted we can read (1)

2022-07-03T20:40:22.2367088Z     [junit] WARNING [org.openide.util.lookup.MetaInfServicesLookup]: org.openide.filesystems.Repository is not the real McCoy! Actually found it in org.netbeans.MainImpl$BootClassLoader@77e9807f (from SystemClassLoader[820 modules]) but searched for from org.netbeans.MainImpl$BootClassLoader@1a677343
2022-07-03T20:40:22.2368142Z     [junit] WARNING [org.openide.util.lookup.MetaInfServicesLookup]: org.openide.filesystems.spi.ArchiveRootProvider is not the real McCoy! Actually found it in org.netbeans.MainImpl$BootClassLoader@77e9807f (from SystemClassLoader[820 modules]) but searched for from org.netbeans.MainImpl$BootClassLoader@1a677343

Question is, if org.openide.filesystems.Repository is not "the real McCoy", how is the synchronized(Repository.class) block that @matthiasblaesing quoted above expected to behave?

I mean, if Repository.class is not "the real McCoy" then the synchronized(Repository.class) must be synchronizing in a "fake McCoy" and not in the "real McCoy", right?

BTW, who is McCoy?

Cheers,
Antonio

(1)

2022-07-03T20:40:22.2342797Z     [junit] java.lang.AssertionError: Has to be NbRepository: org.openide.filesystems.Repository@51b9c375
2022-07-03T20:40:22.2353885Z     [junit] 	at org.netbeans.core.startup.Main.start(Main.java:298)
2022-07-03T20:40:22.2354312Z     [junit] 	at org.netbeans.core.startup.TopThreadGroup.run(TopThreadGroup.java:98)
2022-07-03T20:40:22.2354723Z     [junit] 	at java.base/java.lang.Thread.run(Thread.java:829)
2022-07-03T20:40:22.2355358Z     [junit] -------------------------------------------------------------------------------
2022-07-03T20:40:22.2355740Z     [junit] >Log Session: Sunday, July 3, 2022 at 8:40:21 PM Coordinated Universal Time
2022-07-03T20:40:22.2356013Z     [junit] >System Info: 
2022-07-03T20:40:22.2356487Z     [junit]   Product Version         = Apache NetBeans Platform Dev (Build dev-53a6ce54c0cf1cae704977185237f5125c3aa361)
2022-07-03T20:40:22.2356976Z     [junit]   Operating System        = Linux version 5.13.0-1031-azure running on amd64
2022-07-03T20:40:22.2357445Z     [junit]   Java; VM; Vendor        = 11.0.15; OpenJDK 64-Bit Server VM 11.0.15+10-LTS; Azul Systems, Inc.
2022-07-03T20:40:22.2357895Z     [junit]   Runtime                 = OpenJDK Runtime Environment 11.0.15+10-LTS
2022-07-03T20:40:22.2358317Z     [junit]   Java Home               = /opt/hostedtoolcache/Java_Zulu_jdk/11.0.15-10/x64
2022-07-03T20:40:22.2358666Z     [junit]   System Locale; Encoding = en (nb); UTF-8
2022-07-03T20:40:22.2358932Z     [junit]   Home Directory          = /home/runner
2022-07-03T20:40:22.2359232Z     [junit]   Current Directory       = /home/runner/work/netbeans/netbeans
2022-07-03T20:40:22.2359744Z     [junit]   User Directory          = /home/runner/work/netbeans/netbeans/platform/o.n.core/build/test/qa-functional/work/userdir1
2022-07-03T20:40:22.2360337Z     [junit]   Cache Directory         = /home/runner/work/netbeans/netbeans/platform/o.n.core/build/test/qa-functional/work/userdir1/var/cache
2022-07-03T20:40:22.2360747Z     [junit]   Installation            = /home/runner/work/netbeans/netbeans/nbbuild/netbeans/ide
2022-07-03T20:40:22.2361112Z     [junit]                             /home/runner/work/netbeans/netbeans/nbbuild/netbeans/platform
2022-07-03T20:40:22.2361512Z     [junit]   Boot & Ext. Classpath   = 
2022-07-03T20:40:22.2362980Z     [junit]   Application Classpath   = loaded by URLClassLoader file:/home/runner/work/netbeans/netbeans/nbbuild/netbeans/platform/lib/org-openide-util-ui.jar file:/home/runner/work/netbeans/netbeans/nbbuild/netbeans/platform/lib/boot.jar file:/home/runner/work/netbeans/netbeans/nbbuild/netbeans/platform/lib/org-openide-util.jar file:/home/runner/work/netbeans/netbeans/nbbuild/netbeans/platform/lib/org-openide-util-lookup.jar file:/home/runner/work/netbeans/netbeans/nbbuild/netbeans/platform/lib/org-openide-modules.jar file:/opt/hostedtoolcache/Java_Zulu_jdk/11.0.15-10/x64/lib/jrt-fs.jar
2022-07-03T20:40:22.2365166Z     [junit]   Startup Classpath       = /home/runner/work/netbeans/netbeans/nbbuild/netbeans/platform/core/core-base.jar:/home/runner/work/netbeans/netbeans/nbbuild/netbeans/platform/core/org-openide-filesystems-compat8.jar:/home/runner/work/netbeans/netbeans/nbbuild/netbeans/platform/core/asm-9.3.jar:/home/runner/work/netbeans/netbeans/nbbuild/netbeans/platform/core/org-netbeans-libs-asm.jar:/home/runner/work/netbeans/netbeans/nbbuild/netbeans/platform/core/org-openide-filesystems.jar:/home/runner/work/netbeans/netbeans/nbbuild/netbeans/platform/core/asm-commons-9.3.jar:/home/runner/work/netbeans/netbeans/nbbuild/netbeans/platform/core/asm-tree-9.3.jar:/home/runner/work/netbeans/netbeans/nbbuild/netbeans/platform/core/core.jar
2022-07-03T20:40:22.2366364Z     [junit] -------------------------------------------------------------------------------
2022-07-03T20:40:22.2367088Z     [junit] WARNING [org.openide.util.lookup.MetaInfServicesLookup]: org.openide.filesystems.Repository is not the real McCoy! Actually found it in org.netbeans.MainImpl$BootClassLoader@77e9807f (from SystemClassLoader[820 modules]) but searched for from org.netbeans.MainImpl$BootClassLoader@1a677343
2022-07-03T20:40:22.2368142Z     [junit] WARNING [org.openide.util.lookup.MetaInfServicesLookup]: org.openide.filesystems.spi.ArchiveRootProvider is not the real McCoy! Actually found it in org.netbeans.MainImpl$BootClassLoader@77e9807f (from SystemClassLoader[820 modules]) but searched for from org.netbeans.MainImpl$BootClassLoader@1a677343
2022-07-03T20:40:22.5039755Z     [junit] Testsuite: org.netbeans.core.validation.ValidateLayerConsistencyTest
2022-07-03T20:40:22.5108340Z     [junit] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0 sec
2022-07-03T20:40:22.5109036Z     [junit] 
2022-07-03T20:40:22.5109530Z     [junit] Testcase: org.netbeans.core.validation.ValidateLayerConsistencyTest:testActionInstancesOnlyInActionsFolder:	Caused an ERROR
2022-07-03T20:40:22.5110096Z     [junit] Forked Java VM exited abnormally. Please note the time in the report does not reflect the time until the VM exit.
2022-07-03T20:40:22.5110632Z     [junit] junit.framework.AssertionFailedError: Forked Java VM exited abnormally. Please note the time in the report does not reflect the time until the VM exit.
2022-07-03T20:40:22.5111165Z     [junit] 	at jdk.internal.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
2022-07-03T20:40:22.5113034Z     [junit] 	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
2022-07-03T20:40:22.5113714Z     [junit] 	at java.base/java.util.Vector.forEach(Vector.java:1388)
2022-07-03T20:40:22.5114263Z     [junit] 	at jdk.internal.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
2022-07-03T20:40:22.5114848Z     [junit] 	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
2022-07-03T20:40:22.5115435Z     [junit] 	at jdk.internal.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
2022-07-03T20:40:22.5117187Z     [junit] 	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
2022-07-03T20:40:22.5117766Z     [junit] 	at jdk.internal.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
2022-07-03T20:40:22.5119554Z     [junit] 	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
2022-07-03T20:40:22.5120205Z     [junit] 	at java.base/java.util.Vector.forEach(Vector.java:1388)
2022-07-03T20:40:22.5120739Z     [junit] 	at jdk.internal.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
2022-07-03T20:40:22.5121244Z     [junit] 	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
2022-07-03T20:40:22.5121926Z     [junit] 	at jdk.internal.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
2022-07-03T20:40:22.5122469Z     [junit] 	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
2022-07-03T20:40:22.5122965Z     [junit] 	at jdk.internal.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
2022-07-03T20:40:22.5123470Z     [junit] 	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
2022-07-03T20:40:22.5123843Z     [junit] 
2022-07-03T20:40:22.5124022Z     [junit] 
2022-07-03T20:40:22.5554527Z     [junit] Test org.netbeans.core.validation.ValidateLayerConsistencyTest FAILED (crashed)
2022-07-03T20:40:22.7809459Z     [junit] WARNING: package com.apple.eio not in java.desktop
2022-07-03T20:40:22.8727524Z     [junit] Testsuite: org.netbeans.core.validation.ValidateModulesTest
2022-07-03T20:40:23.4814479Z     [junit] System class loader patches: /home/runner/work/netbeans/netbeans/platform/o.n.core/build/test/qa-functional/classes

@vieiro
Copy link
Contributor

vieiro commented Jul 4, 2022

Ah, I now see @jglick already posted about McCoy before...

@ebresie
Copy link
Contributor

ebresie commented Jul 6, 2022

When creating the new repository instance (i.e. when null) should that code be synchronized as well?

@jglick
Copy link
Contributor

jglick commented Jul 6, 2022

https://en.wikipedia.org/wiki/The_real_McCoy FYI

@matthiasblaesing
Copy link
Contributor Author

@vieiro

I mean, if Repository.class is not "the real McCoy" then the synchronized(Repository.class) must be synchronizing in a "fake McCoy" and not in the "real McCoy", right?

synchronized(Repository.class) means, that the synchronisation happens on the class Repository, not on a single instance of it. The problem happens in the code path, that returns the default repository and in there is a race condition. I suspect, that there is some Lookup code involved, that does some postprocessing of the results and delays the return, so that the default Repository is created.

@ebresie

When creating the new repository instance (i.e. when null) should that code be synchronized as well?

See my answer to @vieiro you can create as many repositories as you like. The problem is that only one Repository is used as the system wide repository and that needs to be the right one.

@vieiro
Copy link
Contributor

vieiro commented Jul 6, 2022

@matthiasblaesing

synchronized(Repository.class) means, that the synchronisation happens on the class Repository,

Well, of course. I was wondering if the class Repository may be coming from different ClassLoaders at different times (first for "the real McCoy" and later "for the fake McCoy", or viceversa). If so then that could be the race condition we're looking for, right? If this is possible then maybe we could replace the synchronized(Repository.class) , such as this: master...vieiro:netbeans:hotfix/4211

@jglick
Copy link
Contributor

jglick commented Jul 6, 2022

c0313ec does not look like it would change anything.

Anyway I doubt the issue is about the synchronized statements. lkp.lookup(Repository.class) is returning null because there are two BootClassLoaders in play, somehow, and thus two loaded copies of platform/core/org-openide-filesystems.jar.

Normal behavior for the test would be for the class loader to be created like this

at org.netbeans.MainImpl$BootClassLoader.<init>(MainImpl.java:248)
at org.netbeans.MainImpl.execute(MainImpl.java:154)
at org.netbeans.MainImpl.main(MainImpl.java:60)
at org.netbeans.Main.main(Main.java:58)
at …
at java.base/java.lang.reflect.Method.invoke(Method.java:568)
at org.netbeans.junit.NbModuleSuite$S.runInRuntimeContainer(NbModuleSuite.java:905)
at org.netbeans.junit.NbModuleSuite$S.access$100(NbModuleSuite.java:660)
at org.netbeans.junit.NbModuleSuite$S$1.protect(NbModuleSuite.java:681)
at junit.framework.TestResult.runProtected(TestResult.java:142)
at org.netbeans.junit.NbModuleSuite$S.run(NbModuleSuite.java:677)
at …

and the NbRepository to be created like this

at org.openide.filesystems.Repository.<init>(Repository.java:319)
at org.netbeans.core.startup.NbRepository.<init>(NbRepository.java:54)
at …
at java.base/java.lang.Class.newInstance(Class.java:645)
at org.openide.util.lookup.implspi.SharedClassObjectBridge.newInstance(SharedClassObjectBridge.java:41)
at org.openide.util.lookup.MetaInfServicesLookup$Item.getInstance(MetaInfServicesLookup.java:489)
at org.openide.util.lookup.AbstractLookup.lookup(AbstractLookup.java:397)
at org.openide.util.lookup.ProxyLookup.lookup(ProxyLookup.java:281)
at org.openide.filesystems.Repository$3.call(Repository.java:361)
at org.openide.filesystems.Repository$3.call(Repository.java:359)
at org.openide.filesystems.Repository.delayFileSystemAttachImpl(Repository.java:1009)
at org.openide.filesystems.Repository.getDefault(Repository.java:359)
at org.netbeans.core.startup.Main.start(Main.java:281)
at org.netbeans.core.startup.TopThreadGroup.run(TopThreadGroup.java:98)
at java.base/java.lang.Thread.run(Thread.java:833)

I suspect the problem somehow pertains to JUnit or at least NbModuleSuite: that the failing test run is actually the second time NetBeans has been loaded in the JVM. The first time (possibly from a different test case), the default Lookup got an NbRepository as usual and cached it. The second time, the same Lookup class was used; when the new BootClassLoader was created, a new copy of Repository was loaded, and Repository.getDefault was called, the old NbRepository was found but the “real McCoy” defense detected that this was not actually assignable to the Repository class being queried, so it bailed out. Of course this is all not supposed to be possible, because each test run is supposed to load Lookup and everything else from scratch:

// loader that does not see our current classloader
JUnitLoader junit = new JUnitLoader(config.parentClassLoader, NbModuleSuite.class.getClassLoader());
URLClassLoader loader = new URLClassLoader(bootCP.toArray(new URL[0]), junit);
Class<?> main = loader.loadClass("org.netbeans.Main"); // NOI18N
Assert.assertEquals("Loaded by our classloader", loader, main.getClassLoader());

You can see from the build log that Log Session is printed twice for ValidateLayerConsistencyTest, implying that TopThreadGroup is being run twice.

@ebresie
Copy link
Contributor

ebresie commented Jul 9, 2022

Is some sort of additional tear down logic needed in some context (after test or before test)to prevent or something to setup this correctly?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind:bug Bug report or fix tests
Projects
None yet
Development

No branches or pull requests

4 participants