cmdLineTester_callsitedbgddrext_openj9 fail with jdk8,11,15 on linux and Macos #50

Open
sophia-guo opened this issue Dec 10, 2020 · 18 comments
Labels
bug Something isn't working upstream bug

@sophia-guo
Contributor

https://github.com/AdoptOpenJDK/run-aqa/runs/1526577429?check_suite_focus=true

This failure has happened before. It looks intermittent.

Testing: Run !printallcallsites
Test start time: 2020/12/09 21:47:46 Coordinated Universal Time
Running command: /opt/hostedtoolcache/jdk-11-openj9/1.0.0/x64/bin/jdmpview -core j9core.dmp
Time spent starting: 2 milliseconds
Time spent executing: 1546 milliseconds
Test result: FAILED
Output from test:
[OUT] DTFJView version 4.29.5, using DTFJ version 1.12.29003
[OUT] Loading image from DTFJ...
[OUT]
[OUT] Could not load dump file and/or could not load XML file: null
[OUT] For a list of commands, type "help"; for how to use "help", type "help help"
[OUT] > DDR is not enabled for this core file, '!' commands are disabled
[OUT] >

Success condition was not found: [Output match: jvminit.c]
Failure condition was not found: [Output match: DDRInteractiveCommandException]
Failure condition was not found: [Output match: no shared cache]
Failure condition was not found: [Output match: unable to read]
Failure condition was not found: [Output match: could not read]
Failure condition was not found: [Output match: dump event]

Testing: Run !findallcallsites
Test start time: 2020/12/09 21:47:48 Coordinated Universal Time
Running command: /opt/hostedtoolcache/jdk-11-openj9/1.0.0/x64/bin/jdmpview -core j9core.dmp
Time spent starting: 4 milliseconds
Time spent executing: 914 milliseconds
Test result: FAILED
Output from test:
[OUT] DTFJView version 4.29.5, using DTFJ version 1.12.29003
[OUT] Loading image from DTFJ...
[OUT]
[OUT] Could not load dump file and/or could not load XML file: null
[OUT] For a list of commands, type "help"; for how to use "help", type "help help"
[OUT] > DDR is not enabled for this core file, '!' commands are disabled
[OUT] >

Success condition was not found: [Output match: jvminit.c]
Failure condition was not found: [Output match: DDRInteractiveCommandException]
Failure condition was not found: [Output match: no shared cache]
Failure condition was not found: [Output match: unable to read]
Failure condition was not found: [Output match: could not read]
Failure condition was not found: [Output match: dump event]
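
The pass/fail rule implied by the two logs above can be sketched in shell. This is only an illustration, not the real cmdLineTester code: the function name `check_output` is hypothetical, and it assumes the success string is required, that any failure string fails the test, and that matching is case-sensitive.

```shell
# Sketch of the rule implied by the log (hypothetical helper, not the
# real cmdLineTester implementation).
check_output() {
    output=$1
    # Success condition from the log: "jvminit.c" must appear.
    case $output in
        *"jvminit.c"*) ;;
        *) echo "FAILED: success condition not found"; return 1 ;;
    esac
    # Failure conditions from the log: any match fails the test.
    for bad in "DDRInteractiveCommandException" "no shared cache" \
               "unable to read" "could not read" "dump event"; do
        case $output in
            *"$bad"*) echo "FAILED: failure condition found: $bad"; return 1 ;;
        esac
    done
    echo "PASSED"
}
```

With the jdmpview output shown above, no failure string matches, so the test fails only because `jvminit.c` never appears.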

@karianna karianna added this to To do in run-aqa via automation Dec 10, 2020
@karianna karianna added the bug Something isn't working label Dec 10, 2020
@smlambert smlambert removed the bug Something isn't working label Dec 10, 2020
@smlambert
Contributor

Removing bug label, as that would indicate there is a defect in the run-aqa repo (yet TBD).

The same issue is seen in the newly added OpenJ9 Jenkins _cm builds (mentioned today in their general Slack channel). It may be related to a recent change, eclipse-openj9/openj9#11085.

@sophia-guo
Contributor Author

sophia-guo commented Dec 11, 2020

@sophia-guo
Contributor Author

It looks unrelated to eclipse-openj9/openj9#11085. I tried the JDK from Oct 30 and got the same issue with run-aqa.

At Adopt, the tests pass on Linux: https://ci.adoptopenjdk.net/view/Test_functional/job/Test_openjdk11_j9_sanity.functional_x86-64_linux/
and fail on Mac: https://ci.adoptopenjdk.net/view/Test_functional/job/Test_openjdk11_j9_sanity.functional_x86-64_mac/ with the message shown in the attached screenshot (Screen Shot 2021-01-04 at 5 13 16 PM), which means the core file was not created.

With Grinder, the problem can also be reproduced consistently on https://ci.adoptopenjdk.net/computer/test-aws-rhel8-x64-1/

https://ci.adoptopenjdk.net/view/work-in-progress/job/grinder_sandbox/192/consoleFull

15:44:54   [OUT] DTFJView version 4.29.5, using DTFJ version 1.12.29003
15:44:54   [OUT] Loading image from DTFJ...
15:44:54   [OUT] 
15:44:54   [OUT] Could not load dump file and/or could not load XML file: Image file '/home/jenkins/workspace/grinder_sandbox/openjdk-tests/TKG/test_output_16097930785750/cmdLineTester_callsitedbgddrext_openj9_0/j9core.dmp' not found.
15:44:54   [OUT] For a list of commands, type "help"; for how to use "help", type "help help"
15:44:54   [OUT] > DDR is not enabled for this core file, '!' commands are disabled
15:44:54   [OUT] > 
15:44:54  >> Success condition was not found: [Output match: jvminit.c]
15:44:54  >> Failure condition was not found: [Output match: DDRInteractiveCommandException]
15:44:54  >> Failure condition was not found: [Output match: no shared cache]
15:44:54  >> Failure condition was not found: [Output match: unable to read]
15:44:54  >> Failure condition was not found: [Output match: could not read]
15:44:54  >> Failure condition was not found: [Output match: dump event]

@sophia-guo
Contributor Author

On Ubuntu runners the failure is Could not load dump file and/or could not load XML file: null. Checking the uploaded test results, we can see that the file j9core.dmp exists but is empty (size zero).

On macOS runners the failure is Image file '/Users/runner/work/runaqaTest/runaqaTest/openjdk-tests/TKG/test_output_16097963089553/cmdLineTester_callsitedbgddrext_openj9_0/j9core.dmp' not found.

Combined with #50 (comment), it looks like the "Create core file" test is what actually goes wrong: no valid core file is created, yet that test passes when it should fail.
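
The two symptoms (file missing on macOS, file present but zero-size on Ubuntu) could be told apart before jdmpview is run. A minimal sketch, assuming a hypothetical helper named `core_file_status` that is not part of the existing test:

```shell
# Classify a dump file before handing it to jdmpview.
# core_file_status is a hypothetical helper, not part of the test suite.
core_file_status() {
    if [ ! -e "$1" ]; then
        echo "missing"   # the macOS runner symptom: file not found
    elif [ ! -s "$1" ]; then
        echo "empty"     # the Ubuntu runner symptom: zero-size file
    else
        echo "ok"
    fi
}

# Possible use in a "Create core file" step:
#   [ "$(core_file_status j9core.dmp)" = "ok" ] || exit 1
```

A check like this would make the core-creation step fail on its own instead of surfacing later as a jdmpview error.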

The root question is why the file was not created correctly. It looks related to machine configuration, since the tests pass on most Linux machines and fail on https://ci.adoptopenjdk.net/computer/test-aws-rhel8-x64-1/.

Are there any special machine requirements for creating the core file? @pshipton

@pshipton

pshipton commented Jan 5, 2021

@sophia-guo you can try running java -Xcheck:dump on the machine. https://www.eclipse.org/openj9/docs/xcheck/#dump

@sophia-guo
Contributor Author

Thanks @pshipton. Yes, I got the following messages when running with -Xcheck:dump on the Ubuntu runner:

**JVMJ9VM135W /proc/sys/kernel/core_pattern setting "|/usr/share/apport/apport %p %s %c %d %P %E" specifies that core dumps are to be piped to an external program. The JVM may be unable to locate core dumps and rename them**.
Beginning puzzle.  Solving for 2 disks.
Moved disk 0 to 1
Moved disk 0 to 2
Moved disk 1 to 2
Puzzle solved!
JVMDUMP039I Processing dump event "vmstop", detail "#0000000000000000" at 2021/01/05 20:44:33 - please wait.
JVMDUMP032I JVM requested System dump using '/home/runner/work/runaqaTest/runaqaTest/openjdk-tests/TKG/test_output_16098794625591/cmdLineTester_callsitedbgddrext_openj9_test_0/j9core.dmp' in response to an event
**JVMPORT030W /proc/sys/kernel/core_pattern setting "|/usr/share/apport/apport %p %s %c %d %P %E" specifies that the core dump is to be piped to an external program.  Attempting to rename either core or core.5623**.

So I will try disabling apport.
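
The condition the JVM warns about can also be checked directly: a core_pattern that starts with `|` means the kernel pipes the dump to a program rather than writing a file. A small sketch, where `core_pattern_kind` is a hypothetical helper name:

```shell
# Report whether a core_pattern value pipes dumps to an external program.
# core_pattern_kind is a hypothetical helper for illustration.
core_pattern_kind() {
    case $1 in
        "|"*) echo "piped" ;;   # leading "|" means pipe to a program
        *)    echo "file" ;;    # anything else is a file name template
    esac
}

# On Linux the live setting could be checked with:
#   core_pattern_kind "$(cat /proc/sys/kernel/core_pattern)"
```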

@sophia-guo
Contributor Author

On mac runner:

create the core file:

Beginning puzzle.  Solving for 2 disks.
Moved disk 0 to 1
Moved disk 0 to 2
Moved disk 1 to 2
Puzzle solved!
JVMDUMP039I Processing dump event "vmstop", detail "#0000000000000000" at 2021/01/05 20:48:10 - please wait.
JVMDUMP032I JVM requested System dump using '/Users/runner/work/runaqaTest/runaqaTest/openjdk-tests/TKG/test_output_16098796374024/cmdLineTester_callsitedbgddrext_openj9_test_0/j9core.dmp' in response to an event
**JVMDUMP012E Error in System dump: The core file created by child process with pid = 1662 was not found. Expected to find core file with name "/cores/core.1662"**
JVMDUMP013I Processed dump event "vmstop", detail "#0000000000000000".
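
One way the core-creation step could notice this situation is to scan the captured JVM output for the message IDs quoted in this thread. This is a sketch only: `dump_problem_reported` is a hypothetical name, and the list of IDs is limited to the ones seen here, so other relevant IDs may exist.

```shell
# Succeed (exit status 0) if the captured JVM output reports a dump problem.
# Only the message IDs quoted in this thread are checked.
dump_problem_reported() {
    printf '%s\n' "$1" | grep -E -q 'JVMDUMP012E|JVMPORT030W|JVMJ9VM135W'
}

# Possible use, with the JVM output captured in a variable named jvm_output:
#   if dump_problem_reported "$jvm_output"; then exit 1; fi
```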

@pshipton

pshipton commented Jan 5, 2021

Seems to me that Mac signed builds (as produced by Adopt) can't create core files when running on newer OSes (10.15+ ?)

@pshipton

pshipton commented Jan 5, 2021

For Ubuntu, the doc suggests checking /var/log/apport.log

@sophia-guo
Contributor Author

> Seems to me that Mac signed builds (as produced by Adopt) can't create core files when running on newer OSes (10.15+ ?)

All mac runners are 10.15+ https://docs.github.com/en/free-pro-team@latest/actions/reference/specifications-for-github-hosted-runners.

However, the same issue happened at AdoptOpenJDK: https://ci.adoptopenjdk.net/job/Test_openjdk8_j9_sanity.functional_x86-64_mac/74/consoleFull and that Mac is labeled https://ci.adoptopenjdk.net/label/macos10.10/ , which suggests it is running 10.10.

@pshipton

pshipton commented Jan 5, 2021

OpenJ9 doesn't have any 10.10 machines, so we can't compare; they are all 10.11, 10.13, or 10.14.

@jdekonin are you aware of any configuration settings that need to be done on Mac in order to get proper core files?
Similar question for Ubuntu where the core file is zero size. In theory it's not the ulimit since -Xcheck:dump should have flagged that.

@smlambert
Contributor

not sure if related: adoptium/infrastructure#1282

@jdekonin

jdekonin commented Jan 6, 2021

As @smlambert mentioned in the previous comment and related issue, I had found that the user needed to be granted permissions to write to the /cores folder on osx.

https://developer.apple.com/library/archive/technotes/tn2124/_index.html; search Core Dumps

@sophia-guo
Contributor Author

@pshipton The file /var/log/apport.log is empty. So I tried disabling apport, and the result looks good: https://github.com/sophia-guo/runaqaTest/runs/1657422047?check_suite_focus=true
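
The thread does not record the exact commands used to disable apport. One plausible approach on an Ubuntu runner, shown here purely as an assumption, is to stop the apport service and reset core_pattern to a plain file name. The helpers `run` and `disable_apport` and the `DRY_RUN` variable are hypothetical.

```shell
# Hypothetical helper: print each command when DRY_RUN=1, otherwise run it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Assumed steps, not the commands actually used in the linked workflow run.
disable_apport() {
    # Stop the apport service so it no longer intercepts crashes.
    run sudo systemctl stop apport
    # Write core files directly instead of piping them to a program.
    run sudo sysctl -w kernel.core_pattern=core
}
```

Setting `DRY_RUN=1` before calling `disable_apport` prints the two commands without changing the machine.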

@sophia-guo
Contributor Author

It looks like there are two different issues here on macOS.

In the Adopt environment there is no permission issue. I modified the tests to create the dump file directly at /cores/j9core.dmp and run jdmpview on it, and it looks good on both Mac 10.10 and Mac 10.14.

https://ci.adoptopenjdk.net/view/work-in-progress/job/grinder_sandbox/197/console
https://ci.adoptopenjdk.net/view/work-in-progress/job/grinder_sandbox/199/console

The problem is that the dump file cannot be written to any directory other than /cores (the /cores directory is writable under the current configuration). @jdekonin are there any configuration settings that allow dump files to be written to a user-preferred location?

@sophia-guo
Contributor Author

sophia-guo commented Jan 7, 2021

On the GitHub Mac runners it's different. I enabled the permissions and tried to create the dump file at /cores/j9core.dmp, but the file still cannot be found.

sudo chown root:admin /cores
sudo chmod 777 /cores
sudo  ulimit -c unlimited

The message is the same as before: The core file created by child process with pid = 1662 was not found. Expected to find core file with name "/cores/core.1662". Uploading the /cores directory shows that it is empty.
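
Two of the preconditions touched on above can be checked from the same shell that starts the JVM. Note that ulimit is a shell builtin, so a limit raised through sudo does not carry over to the shell that later launches java. The sketch below uses hypothetical helper names (`cores_dir_status`, `run_with_cores`) and does not address the signed-build question raised earlier in the thread.

```shell
# Report whether a directory can receive core files.
cores_dir_status() {
    if [ ! -d "$1" ]; then
        echo "missing"
    elif [ ! -w "$1" ]; then
        echo "not-writable"
    else
        echo "writable"
    fi
}

# Raise the core size limit in the same (sub)shell that starts the
# command, so the child process inherits it.
run_with_cores() {
    (
        ulimit -c unlimited 2>/dev/null ||
            echo "warning: could not raise core limit" >&2
        "$@"
    )
}

# Possible use on a runner:
#   cores_dir_status /cores
#   run_with_cores java -Xcheck:dump -version
```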

@sophia-guo sophia-guo added the bug Something isn't working label Jan 7, 2021
@sophia-guo
Contributor Author

sophia-guo commented Jan 19, 2021

5 participants