Bug 577508 - Fixing regression #38

Merged
merged 1 commit into from Apr 28, 2022


jarthana
Member

Signed-off-by: Jay Arthanareeswaran <jarthana@in.ibm.com>

@stephan-herrmann
Contributor

Hi Jay, you seem to be hitting the same freeze that has been blocking me for some days already in #28.

I added a bunch of sysouts to debug this (which doesn't happen in the IDE), see result in https://ci.eclipse.org/jdt/job/eclipse.jdt.core-Github/view/change-requests/job/PR-28/10/consoleText

All I know by now is: the execution stalls when ASTConverterBugsTest.setUpSuite() calls waitUntilIndexesReady(), and inside that method it's engine.searchAllTypeNames() that never returns.

@iloveeclipse , @gayanper I believe the two of you know best what might have changed in indexing / searching. Can you help?

@jarthana
Member Author

Thanks, Stephan, for pointing that out. I was wondering if that was something about publishing the test results, since I assumed that the model tests had completed.

@jarthana
Member Author

Stephan, I did a comparison between one of the passed runs [1] and the last build for this PR [2], and I don't see the tests failing or freezing. I suspect things go wrong after the attempt at "Recording test results". Also, I see this being printed in the failed run: "script returned exit code 143".

I wonder if the test result is too big and that causes something to blow up. I have seen something like this in the past with Gerrit when I had forgotten to remove the DEBUG AUTOMATON flag in the parser. But that's not the case here.

[1] https://ci.eclipse.org/jdt/job/eclipse.jdt.core-Github/view/change-requests/job/PR-34/5/console
[2] https://ci.eclipse.org/jdt/job/eclipse.jdt.core-Github/view/change-requests/job/PR-38/4/console

@sravanlakkimsetti
Member

sravanlakkimsetti commented Apr 26, 2022

Can you try running with Tycho 2.7.0? You will need to add "-Dtycho.version=2.7.0" to the Maven command in the Jenkinsfile in the repo root.

@stephan-herrmann
Contributor

Stephan, I did a comparison between one of the passed runs [1] and the last build for this PR [2], and I don't see the tests failing or freezing. I suspect things go wrong after the attempt at "Recording test results". Also, I see this being printed in the failed run: "script returned exit code 143".

Please note that I already narrowed it down to a freeze within engine.searchAllTypeNames(), as invoked indirectly from ASTConverterBugsTest.setUpSuite(). Since suite set-up happens outside any individual test, it just looks as if no testing is active, when in fact test set-up is active.

@iloveeclipse
Member

Stephan, sorry, I had no time yet to look into it, but don't we use a timer thread in JDT to report thread dumps on deadlocks/hangs? I'm pretty sure we do this in JDT or platform debug, so if JDT core misses that, we should add it asap so we don't need to guess why the build is hanging.

@stephan-herrmann
Contributor

After jumping through some more maven hoops, I managed to get stacktraces from the freeze: freeze-jstack.txt

two interesting portions extracted here:

"main" #1 prio=5 os_prio=0 cpu=32921,05ms elapsed=181,43s tid=0x00007fdd680261e0 nid=0x35536 in Object.wait()  [0x00007fdd6fb78000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
	at java.lang.Object.wait(java.base@17/Native Method)
	- waiting on <no object reference available>
	at org.eclipse.jdt.internal.core.search.processing.JobManager.performConcurrentJob(JobManager.java:284)
	- locked <0x00000000c05bbbe0> (a org.eclipse.jdt.internal.core.search.indexing.IndexManager)
	at org.eclipse.jdt.internal.core.search.BasicSearchEngine.searchAllTypeNames(BasicSearchEngine.java:1923)
	at org.eclipse.jdt.internal.core.search.BasicSearchEngine.searchAllTypeNames(BasicSearchEngine.java:1751)
	at org.eclipse.jdt.core.search.SearchEngine.searchAllTypeNames(SearchEngine.java:1100)
	at org.eclipse.jdt.core.tests.model.AbstractJavaModelTests.waitUntilIndexesReady(AbstractJavaModelTests.java:3967)
	at org.eclipse.jdt.core.tests.dom.ASTConverterBugsTest.setUpSuite(ASTConverterBugsTest.java:41)
...
"Java indexing" #31 daemon prio=5 os_prio=0 cpu=871,25ms elapsed=178,36s tid=0x00007fdd68640200 nid=0x355bb in Object.wait()  [0x00007fdd402f9000]
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(java.base@17/Native Method)
	- waiting on <no object reference available>
	at java.lang.Object.wait(java.base@17/Object.java:338)
	at org.eclipse.jdt.internal.core.search.processing.JobManager.run(JobManager.java:419)
	- locked <0x00000000c05bbbe0> (a org.eclipse.jdt.internal.core.search.indexing.IndexManager)
	at java.lang.Thread.run(java.base@17/Thread.java:833)

I read this as: search is waiting for a background job, but no thread has any matching activity.
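
For readers less used to thread dumps: the two states shown above (TIMED_WAITING for "main", WAITING for "Java indexing") differ only in whether Object.wait was called with a timeout. The following is a minimal, self-contained sketch of that distinction. WaitStatesDemo and its methods are hypothetical names made up for this illustration; the code does not reproduce any JobManager logic and says nothing about the root cause of the freeze.

```java
/** Illustration only: shows how TIMED_WAITING and WAITING arise from Object.wait. */
final class WaitStatesDemo {

    /** Starts a daemon thread that waits on a private lock; timeoutMillis <= 0 means no timeout. */
    static Thread startWaiter(String name, long timeoutMillis) {
        final Object lock = new Object();
        Thread t = new Thread(() -> {
            synchronized (lock) {
                try {
                    if (timeoutMillis > 0) {
                        lock.wait(timeoutMillis); // reported as TIMED_WAITING
                    } else {
                        lock.wait();              // reported as WAITING
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, name);
        t.setDaemon(true); // nobody ever notifies the lock, so do not keep the JVM alive
        t.start();
        return t;
    }

    /** Polls until the thread is observed in the expected state, or gives up after maxMillis. */
    static boolean reaches(Thread t, Thread.State expected, long maxMillis)
            throws InterruptedException {
        long deadline = System.nanoTime() + maxMillis * 1_000_000L;
        while (System.nanoTime() - deadline < 0) {
            if (t.getState() == expected) {
                return true;
            }
            Thread.sleep(10);
        }
        return t.getState() == expected;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread timed = startWaiter("timed-waiter", 60_000);
        Thread untimed = startWaiter("untimed-waiter", 0);
        System.out.println(timed.getName() + " reached TIMED_WAITING: "
                + reaches(timed, Thread.State.TIMED_WAITING, 2_000));
        System.out.println(untimed.getName() + " reached WAITING: "
                + reaches(untimed, Thread.State.WAITING, 2_000));
    }
}
```

Neither state alone indicates a bug; what makes the dump suspicious is that no other thread appears to be doing the work both waiters depend on.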

@stephan-herrmann
Contributor

Stephan, sorry, I had no time yet to look into it, but don't we use a timer thread in JDT to report thread dumps on deadlocks/hangs? I'm pretty sure we do this in JDT or platform debug, so if JDT core misses that, we should add it asap so we don't need to guess why the build is hanging.

I don't know how or where these timers are implemented. But I can see that a timeout during an individual test execution produces stack traces, while a timeout during suite set-up does not.

@iloveeclipse
Member

OK, found it: org.eclipse.jdt.core.tests.model.FreezeMonitor is used in JDT model tests only, see org.eclipse.jdt.core.tests.model.SuiteOfTestCases.setUp() / teardown.
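
The general idea of such a timer-based thread dump can be sketched as follows. This is not the FreezeMonitor source, which is not quoted in this thread; HangWatchdog, dumpAllThreads and expectCompletionIn are hypothetical names, and the sketch only assumes the standard java.util.concurrent scheduler plus Thread.getAllStackTraces().

```java
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical watchdog sketch; NOT the actual FreezeMonitor implementation. */
final class HangWatchdog {

    // Single daemon timer thread, so the watchdog never keeps the JVM alive.
    private static final ScheduledExecutorService TIMER =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "hang-watchdog");
                t.setDaemon(true);
                return t;
            });

    private HangWatchdog() {
    }

    /** Formats name, state and stack frames of every live thread. */
    static String dumpAllThreads() {
        StringBuilder sb = new StringBuilder("Possible frozen test case\n");
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            sb.append('"').append(t.getName()).append("\": ").append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    /**
     * Schedules a thread dump to System.err after the given timeout.
     * The caller cancels the returned future once the guarded phase has finished.
     */
    static ScheduledFuture<?> expectCompletionIn(long timeout, TimeUnit unit) {
        return TIMER.schedule(() -> System.err.print(dumpAllThreads()), timeout, unit);
    }
}
```

Under these assumptions, a suite-level set-up would call expectCompletionIn(...) before its slow work and cancel(false) on the returned future afterwards, so that a hang during suite set-up is covered as well as a hang inside a single test.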

@stephan-herrmann
Contributor

Can you try running with Tycho 2.7.0? You will need to add "-Dtycho.version=2.7.0" to the Maven command in the Jenkinsfile in the repo root.

Thanks for the hint. Tried this locally. No difference.

At this point my time for investigation is up.

@jarthana
Member Author

Can you try running with Tycho 2.7.0? You will need to add "-Dtycho.version=2.7.0" to the Maven command in the Jenkinsfile in the repo root.

Tried, but didn't help. I can still see this being stuck.

@iloveeclipse
Member

I don't know what is wrong with this and the other PR that hang, but my PR was done in 30 minutes, see #40 and https://ci.eclipse.org/jdt/job/eclipse.jdt.core-Github/job/PR-40/1/

I will merge #40 now, so please rebase this change to see if it will properly report the freeze (assuming the freeze is in the set-up of one of the model test suites).

@iloveeclipse
Member

@jarthana: please undo bcd1997, it is not related and we want to use the "default" Tycho.

Interestingly, your last PR build shows the code that uses my #40 change, even though your PR is NOT rebased on it (at least not in GitHub). So does Jenkins automatically rebase PRs??? Why is that???

image

@iloveeclipse
Member

So with the new thread dump reported in https://ci.eclipse.org/jdt/job/eclipse.jdt.core-Github/job/PR-38/6/console
we see that we have here #41

15:27:28  Possible frozen test case
15:27:28  "main": TIMED_WAITING
15:27:28      java.base@17.0.1/java.lang.Object.wait(Native Method)
15:27:28      org.eclipse.jdt.internal.core.search.processing.JobManager.performConcurrentJob(JobManager.java:284)
15:27:28      org.eclipse.jdt.internal.core.search.BasicSearchEngine.searchAllTypeNames(BasicSearchEngine.java:1923)
15:27:28      org.eclipse.jdt.internal.core.search.BasicSearchEngine.searchAllTypeNames(BasicSearchEngine.java:1751)
15:27:28      org.eclipse.jdt.core.search.SearchEngine.searchAllTypeNames(SearchEngine.java:1100)
15:27:28      org.eclipse.jdt.core.tests.model.AbstractJavaModelTests.waitUntilIndexesReady(AbstractJavaModelTests.java:3967)
15:27:28      org.eclipse.jdt.core.tests.dom.ASTConverterBugsTest.setUpSuite(ASTConverterBugsTest.java:41)
15:27:28      org.eclipse.jdt.core.tests.model.SuiteOfTestCases$Suite.runTest(SuiteOfTestCases.java:105)
15:27:28      junit.framework.TestSuite.run(TestSuite.java:236)
15:27:28      org.eclipse.jdt.core.tests.model.SuiteOfTestCases$Suite.superRun(SuiteOfTestCases.java:98)
15:27:28      org.eclipse.jdt.core.tests.model.SuiteOfTestCases$Suite$1.protect(SuiteOfTestCases.java:86)
15:27:28      junit.framework.TestResult.runProtected(TestResult.java:142)
15:27:28      org.eclipse.jdt.core.tests.model.SuiteOfTestCases$Suite.run(SuiteOfTestCases.java:95)
15:27:28      junit.framework.TestSuite.runTest(TestSuite.java:241)
15:27:28      junit.framework.TestSuite.run(TestSuite.java:236)
15:27:28      junit.framework.TestSuite.runTest(TestSuite.java:241)
15:27:28      junit.framework.TestSuite.run(TestSuite.java:236)
15:27:28      org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:90)
15:27:28  "Java indexing": WAITING
15:27:28      java.base@17.0.1/java.lang.Object.wait(Native Method)
15:27:28      java.base@17.0.1/java.lang.Object.wait(Object.java:338)
15:27:28      org.eclipse.jdt.internal.core.search.processing.JobManager.run(JobManager.java:419)
15:27:28      java.base@17.0.1/java.lang.Thread.run(Thread.java:833)

But AttachedJavadocTests code is still running too ???

15:27:28  "Thread-4": TIMED_WAITING
15:27:28      java.base@17.0.1/java.lang.Thread.sleep(Native Method)
15:27:28      org.eclipse.jdt.core.tests.model.AttachedJavadocTests$1.run(AttachedJavadocTests.java:770)
15:27:28  

@iloveeclipse
Member

It is insane what happens in jenkins.

I've merged a fix for Jenkins to cancel parallel builds (#45), and I see that after this merge all open PRs started automatic rebuilds in Jenkins:

image

If I understand it right, all open PRs that aren't rebased on #45 yet will continue with automatic rebuilds??? OMG.

@iloveeclipse
Member

I believe all the trouble here could be caused by an unfortunate Jenkins setup, or really by the actual code changes.
I haven't seen any hangs on the PRs I've created today, and I believe parallel job execution should be fixed now.

I've also tried #46, which enables JobManager.VERBOSE mode for tests. All tests were green there, plus one gets an idea of what the indexer does (but beware: the console output is about 60 MB).

I do not plan to merge that, because the console will explode, but feel free to try that change too in case the hang isn't solved after rebasing on master here.

@stephan-herrmann
Contributor

So with the new thread dump reported in https://ci.eclipse.org/jdt/job/eclipse.jdt.core-Github/job/PR-38/6/console we see that we have here #41
...
But AttachedJavadocTests code is still running too ???

15:27:28  "Thread-4": TIMED_WAITING
15:27:28      java.base@17.0.1/java.lang.Thread.sleep(Native Method)
15:27:28      org.eclipse.jdt.core.tests.model.AttachedJavadocTests$1.run(AttachedJavadocTests.java:770)
15:27:28  

That's a very interesting observation! Here's what AttachedJavadocTests$1 is doing (in a thread of its own):

	public void run() {
		Object javadocContent = projectInfo.javadocCache.get(type);
		while(javadocContent == null || javadocContent == BinaryType.EMPTY_JAVADOC) {
			try {
				Thread.sleep(50);
				javadocContent = projectInfo.javadocCache.get(type);
			} catch (InterruptedException e) {
			}
			synchronized (varThis) {
				varThis.notify();
			}
		}
	}

Interestingly, the enclosing test testBug329671() succeeded even though its temp thread never got the expected result. OTOH, my stacktrace in #38 (comment) does not show this test, so it can't be the root cause here.
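
The loop quoted above has no upper bound, which is why its thread can outlive the test that started it. A bounded variant of the same polling idea might look like the sketch below. This is not a proposed patch for AttachedJavadocTests: BoundedPoll and await are hypothetical names, the timeout values are arbitrary, and unlike the quoted code this version propagates InterruptedException instead of swallowing it.

```java
import java.util.function.Predicate;
import java.util.function.Supplier;

/** Hypothetical helper: polls a value source, but gives up after a deadline. */
final class BoundedPoll {

    private BoundedPoll() {
    }

    /**
     * Re-reads source every intervalMillis until ready accepts the value
     * or timeoutMillis has elapsed. Returns the last value read, which may
     * still be "not ready" if the deadline was hit.
     */
    static <T> T await(Supplier<T> source, Predicate<T> ready,
                       long timeoutMillis, long intervalMillis) throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        T value = source.get();
        while (!ready.test(value) && System.nanoTime() - deadline < 0) {
            Thread.sleep(intervalMillis);
            value = source.get();
        }
        return value;
    }
}
```

With the names from the quoted snippet, a call could read BoundedPoll.await(() -> projectInfo.javadocCache.get(type), c -> c != null && c != BinaryType.EMPTY_JAVADOC, 5_000, 50), where the five-second limit is only an example value.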

@stephan-herrmann
Contributor

@jarthana you may try again after rebase, since #48 fixed this issue for me, and thus hopefully also for this PR (remember to revert the change of tycho version :) ).

I briefly tried to retrigger builds, but (a) Jenkins did not find the required artifacts, and (b) I got confused by the difference between ".../jenkins/pr-head" and ".../jenkins/pr-merge", perhaps a recent configuration change by @iloveeclipse?

Signed-off-by: Jay Arthanareeswaran <jarthana@in.ibm.com>
@jarthana
Member Author

@jarthana you may try again after rebase, since #48 fixed this issue for me, and thus hopefully also for this PR (remember to revert the change of tycho version :) ).

I briefly tried to retrigger builds, but (a) Jenkins did not find the required artifacts, and (b) I got confused by the difference between ".../jenkins/pr-head" and ".../jenkins/pr-merge", perhaps a recent configuration change by @iloveeclipse?

Still run into this. I have no clue what's going on.

@iloveeclipse
Member

Still run into this. I have no clue what's going on

@jarthana : you have to rebase your branch on the top of master & force push again.

  1. checkout master branch
  2. pull from origin repository to master branch
  3. checkout bug578244 branch
  4. select master in history view, right click and "rebase HEAD on"
  5. bug578244 branch should be now in rebased state and can be force pushed to jarthana/eclipse.jdt.core repo

@iloveeclipse
Member

iloveeclipse commented Apr 27, 2022

Still run into this. I have no clue what's going on

@jarthana : you have to rebase your branch on the top of master & force push again.

OK, I see that the branch was rebased already, but now we have an infrastructure error, not a hang:

15:45:45  [ERROR] The build could not read 1 project -> [Help 1]
15:45:45  [ERROR]   
15:45:45  [ERROR]   The project org.eclipse.jdt:eclipse.jdt.core:4.24.0-SNAPSHOT (/home/jenkins/agent/workspace/eclipse.jdt.core-Github_PR-38/pom.xml) has 1 error
15:45:45  [ERROR]     Unresolveable build extension: Plugin org.eclipse.tycho:tycho-maven-plugin:3.0.0-SNAPSHOT or one of its dependencies could not be resolved: Failed to collect dependencies at org.eclipse.tycho:tycho-maven-plugin:jar:3.0.0-SNAPSHOT -> org.eclipse.tycho:tycho-core:jar:3.0.0-SNAPSHOT -> org.eclipse.tycho:p2-maven-plugin:jar:3.0.0-SNAPSHOT -> org.eclipse.platform:org.eclipse.equinox.p2.director:jar:2.5.200 -> org.eclipse.platform:org.eclipse.core.jobs:jar:3.10.0: Failed to read artifact descriptor for org.eclipse.platform:org.eclipse.core.jobs:jar:3.10.0: Could not transfer artifact org.eclipse.platform:org.eclipse.core.jobs:pom:3.10.0 from/to cbi-releases (https://repo.eclipse.org/content/repositories/cbi-releases/): transfer failed for https://repo.eclipse.org/content/repositories/cbi-releases/org/eclipse/platform/org.eclipse.core.jobs/3.10.0/org.eclipse.core.jobs-3.10.0.pom: repo.eclipse.org:443 failed to respond -> [Help 2]

I assume it's something like https://gitlab.eclipse.org/eclipsefdn/helpdesk/-/issues/1223, reported today.
I will open a new issue.
See https://gitlab.eclipse.org/eclipsefdn/helpdesk/-/issues/1225

@jarthana force-pushed the bug578244 branch 2 times, most recently from 644a1a8 to 8af9fa5 on April 27, 2022 17:13
@jarthana merged commit 767bffa into eclipse-jdt:master on Apr 28, 2022
4 participants