jdk19 alinux build with gcc11.2 fails - Failed to get next dwarf CU header #15390

Open
pshipton opened this issue Jun 21, 2022 · 29 comments

@pshipton
Member

https://openj9-jenkins.osuosl.org/job/Build_JDK19_aarch64_linux_gcc11_Personal/4/console

18:07:04  [ 50%] Generating /home/jenkins/workspace/Build_JDK19_aarch64_linux_gcc11_Personal/build/linux-aarch64-server-release/vm/ddr_info/sets/j9ddr.macros
18:07:04  [100%] Generating /home/jenkins/workspace/Build_JDK19_aarch64_linux_gcc11_Personal/build/linux-aarch64-server-release/vm/runtime/j9ddr.dat, /home/jenkins/workspace/Build_JDK19_aarch64_linux_gcc11_Personal/build/linux-aarch64-server-release/vm/superset.dat

18:07:40  Error: /home/jenkins/workspace/Build_JDK19_aarch64_linux_gcc11_Personal/omr/ddr/lib/ddr-scanner/dwarf/DwarfScanner.cpp:1509 traverse_cu_in_debug_section - Failed to get next dwarf CU header.
18:07:40  Error: /home/jenkins/workspace/Build_JDK19_aarch64_linux_gcc11_Personal/omr/ddr/lib/ddr-scanner/dwarf/DwarfScanner.cpp:1596 startScan - Failure scanning /home/jenkins/workspace/Build_JDK19_aarch64_linux_gcc11_Personal/build/linux-aarch64-server-release/vm/runtime/libj9gc29.debuginfo
18:07:40  
18:07:40  CMakeFiles/j9ddr.dir/build.make:73: recipe for target '/home/jenkins/workspace/Build_JDK19_aarch64_linux_gcc11_Personal/build/linux-aarch64-server-release/vm/runtime/j9ddr.dat' failed
@keithc-ca
Contributor

#15004 should mean the build is using -gdwarf-4, but the symptoms are like eclipse/omr#6135 which suggests otherwise.
I'll need access to a suitable machine to investigate further.

@pshipton
Member Author

I've provided machine details.

@keithc-ca
Contributor

keithc-ca commented Jun 22, 2022

gcc-11 seems to be generating debuginfo that even objdump -g doesn't like; here is a small sample of its complaints:

objdump: Error: Invalid location list entry type 8
objdump: Error: Invalid location list entry type 21
objdump: Error: Invalid range list entry type 19

objdump: Error: LEB value too large

objdump: Warning: Corrupt offset (0x00000030) in range entry 2
objdump: Warning: Corrupt offset (0x00000060) in range entry 3
objdump: Warning: Corrupt offset (0x000000b0) in range entry 4

objdump: Warning: Location lists in .debug_loclists section start at 0x0

objdump: Warning: Offset 0x10004f is bigger than .debug_loc section size.
objdump: Warning: Offset 0x100085 is bigger than .debug_loc section size.
objdump: Warning: Offset 0x1000bd is bigger than .debug_loc section size.

objdump: Warning: There are 5545406 unused bytes at the end of section .debug_loc

objdump: Warning: There is a hole [0x0 - 0x103] in .debug_loc section.
objdump: Warning: There is a hole [0x10004f - 0x100085] in .debug_loc section.
objdump: Warning: There is a hole [0x100085 - 0x1000bd] in .debug_loc section.

objdump: Warning: There are 13 unused bytes at the end of section .debug_loclists

objdump: Warning: There is an overlap [0x56e091 - 0xc] in .debug_loc section.
objdump: Warning: There is an overlap [0x56e0b5 - 0xc] in .debug_loc section.
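The sample above mixes several distinct kinds of complaint. As a rough aid for comparing runs (for example gcc 10.3 against gcc 11.2), here is a minimal sketch that tallies captured `objdump -g` stderr by message shape. The function name `tally_objdump_diagnostics` and the rule of replacing every number with `N` are assumptions made for illustration; this is not part of binutils or of the OpenJ9 build.

```python
import re
from collections import Counter

# Hex literals are tried first so "0x103" collapses to a single N.
_NUMBER = re.compile(r"0x[0-9a-fA-F]+|\b\d+\b")

def tally_objdump_diagnostics(stderr_text):
    """Count objdump diagnostics, grouping lines that differ only in numbers."""
    counts = Counter()
    for line in stderr_text.splitlines():
        if not line.startswith("objdump: "):
            continue  # skip blank lines and ordinary dump output
        message = line[len("objdump: "):]
        counts[_NUMBER.sub("N", message)] += 1
    return counts

# A few lines copied from the output quoted in this comment.
sample = """\
objdump: Error: Invalid location list entry type 8
objdump: Error: Invalid location list entry type 21
objdump: Warning: There is a hole [0x0 - 0x103] in .debug_loc section.
objdump: Warning: There is a hole [0x10004f - 0x100085] in .debug_loc section.
"""

for message, count in tally_objdump_diagnostics(sample).most_common():
    print(count, message)
```

Grouping by shape makes it easier to see whether a compiler or flag change removes a whole class of complaint or only changes how often it appears.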

@pshipton
Member Author

@vsebe do you know how gcc 11.2 was created for the alinux docker container? Perhaps it's a bad build.

@keithc-ca
Contributor

Telling gcc to generate DWARF version 3 yields fewer issues, but doesn't work either; note in particular the version 5 compilation units:

objdump: Warning: There is a hole [0xe744f - 0xe748b] in .debug_loc section.
objdump: Warning: There is a hole [0xee23f - 0xee27b] in .debug_loc section.
objdump: Warning: There is a hole [0x10097d - 0x1009b9] in .debug_loc section.
objdump: Warning: There is a hole [0x10c78e - 0x10c7ca] in .debug_loc section.
objdump: Warning: There is a hole [0x124b01 - 0x124b3d] in .debug_loc section.

objdump: Warning: Invalid pointer size (13) in compunit header, using 4 instead
objdump: Warning: CU at offset 2286526 contains corrupt or unsupported version number: 5.
objdump: Warning: Invalid pointer size (13) in compunit header, using 4 instead
objdump: Warning: CU at offset 2286526 contains corrupt or unsupported version number: 5.
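For context on the "Invalid pointer size (13)" warning: DWARF 5 reordered the compilation unit header, so a reader that assumes the DWARF 2 to 4 layout picks up `address_size` from the wrong byte. The sketch below decodes a header both ways. It assumes 32-bit DWARF and little-endian data; the function name, the `force_legacy_layout` flag and the sample header bytes are made up for illustration. It shows one plausible way such a warning can arise and is not a diagnosis of the actual .debuginfo files.

```python
import struct

def parse_cu_header(data, offset=0, force_legacy_layout=False):
    """Decode one 32-bit-DWARF, little-endian CU header from .debug_info bytes."""
    unit_length, version = struct.unpack_from("<IH", data, offset)
    if unit_length == 0xFFFFFFFF:
        raise ValueError("64-bit DWARF is not handled in this sketch")
    if version >= 5 and not force_legacy_layout:
        # DWARF 5: unit_type, address_size, then debug_abbrev_offset.
        unit_type, address_size, abbrev_offset = struct.unpack_from(
            "<BBI", data, offset + 6)
    else:
        # DWARF 2-4: debug_abbrev_offset, then address_size.
        abbrev_offset, address_size = struct.unpack_from("<IB", data, offset + 6)
        unit_type = None
    return {
        "unit_length": unit_length,
        "version": version,
        "unit_type": unit_type,
        "address_size": address_size,
        "abbrev_offset": abbrev_offset,
    }

# Hypothetical version 5 header: DW_UT_compile (0x01), 8-byte addresses,
# abbreviation table at offset 0xD1234.
header = struct.pack("<IHBBI", 0x100, 5, 0x01, 8, 0x000D1234)

correct = parse_cu_header(header)
legacy = parse_cu_header(header, force_legacy_layout=True)
print(correct["address_size"], legacy["address_size"])
```

With this sample header, the legacy-layout read takes its address size from the third byte of the abbreviation offset, which is why the value it reports is nonsense even though the header itself is well formed.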

@vsebe
Contributor

vsebe commented Jun 22, 2022

The failure occurs on internal machines too. Internal builds do not use docker containers, thus they use a different gcc11 instance/build.

@pshipton
Member Author

@vsebe can you please give gcc 11.3 a try on aarch64 to see if it works any better?

@keithc-ca
Contributor

Attempting to use DWARF version 5 doesn't look promising; there are numerous messages like this in the output of objdump -W:

objdump: Warning: CU at offset 75a8d contains corrupt or unsupported version number: 5.

It's unclear whether that's a deficiency of objdump or whether our use of libdwarf via ddrgen will have similar woes.

@pshipton
Member Author

Using 11.3 didn't produce anything different.

16:10:40  Error: /home/jenkins/workspace/Build_JDK19_aarch64_linux_Personal/omr/ddr/lib/ddr-scanner/dwarf/DwarfScanner.cpp:1509 traverse_cu_in_debug_section - Failed to get next dwarf CU header.
16:10:40  Error: /home/jenkins/workspace/Build_JDK19_aarch64_linux_Personal/omr/ddr/lib/ddr-scanner/dwarf/DwarfScanner.cpp:1596 startScan - Failure scanning /home/jenkins/workspace/Build_JDK19_aarch64_linux_Personal/build/linux-aarch64-server-release/vm/runtime/libj9gc29.debuginfo

@pshipton
Member Author

pshipton commented Jun 23, 2022

Trying a build from levels ~April 19.
https://openj9-jenkins.osuosl.org/job/Build_JDK19_aarch64_linux_Personal/4 with gcc 10.3 works
https://openj9-jenkins.osuosl.org/job/Build_JDK19_aarch64_linux_gcc11_Personal/6 with gcc11 failed the same way

@keithc-ca
Contributor

eclipse/omr#6457 seems a significant change that might have triggered the compiler misbehavior in producing the GC shared library. Can you try a consistent set of versions from before that change?

@pshipton
Member Author

Seems I picked some inconsistent SHAs. I'll try again later.

@pshipton
Member Author

Ah, I pushed the openj9 branch to the wrong repo.
https://openj9-jenkins.osuosl.org/job/Pipeline-Build-Test-Personal/185/

@keithc-ca
Contributor

In https://openj9-jenkins.osuosl.org/job/Build_JDK19_aarch64_linux_gcc11_Personal/8/ it seems to have failed even earlier (on the first .debuginfo file instead of the second):

18:13:00  Error: /home/jenkins/workspace/Build_JDK19_aarch64_linux_gcc11_Personal/omr/ddr/lib/ddr-scanner/dwarf/DwarfScanner.cpp:1509 traverse_cu_in_debug_section - Failed to get next dwarf CU header.
18:13:00  Error: /home/jenkins/workspace/Build_JDK19_aarch64_linux_gcc11_Personal/omr/ddr/lib/ddr-scanner/dwarf/DwarfScanner.cpp:1596 startScan - Failure scanning /home/jenkins/workspace/Build_JDK19_aarch64_linux_gcc11_Personal/build/linux-aarch64-server-release/vm/runtime/libj9ddr_misc29.debuginfo

pshipton added a commit to pshipton/openj9 that referenced this issue Sep 13, 2023
alinux can't move to 11.2 due to
eclipse-openj9#15390

Signed-off-by: Peter Shipton <Peter_Shipton@ca.ibm.com>
@keithc-ca
Contributor

I suspect the alinux build systems need to be updated; omr_ddrgen running on an xlinux system had no trouble reading the .debuginfo files produced in an alinux build. Do all the alinux build systems have up-to-date DWARF support libraries?

@pshipton
Member Author

@AdamBrousseau can you please check an internal machine?

Externally, attempting a build in #18279 using the latest container (3 days old).

@pshipton
Member Author

Still a problem

17:44:07  Error: /home/jenkins/workspace/Build_JDK21_aarch64_linux_Personal/omr/ddr/lib/ddr-scanner/dwarf/DwarfScanner.cpp:1509 traverse_cu_in_debug_section - Failed to get next dwarf CU header.
17:44:07  Error: /home/jenkins/workspace/Build_JDK21_aarch64_linux_Personal/omr/ddr/lib/ddr-scanner/dwarf/DwarfScanner.cpp:1596 startScan - Failure scanning /home/jenkins/workspace/Build_JDK21_aarch64_linux_Personal/build/linux-aarch64-server-release/vm/runtime/libj9gc29.debuginfo

@keithc-ca
Contributor

> Still a problem

That doesn't refute my suggestion that libdwarf.so, libelf.so or some other code used by omr_ddrgen needs to be updated. @AdamBrousseau Did you get a chance to check the build machines?

@pshipton
Member Author

The "build machine" is a docker container built by Adoptium. It was built fresh 3 days before I tried it. It is CentOS 7, so the latest packages there may not be the latest available elsewhere. My build used adoptopenjdk/centos7_build_image@sha256:a5c3801ed73e8c68f9edc200387a094e5ca4de4a93cc8cab76bd459ff941fd6b

which is still the latest version at the moment.
https://hub.docker.com/r/adoptopenjdk/centos7_build_image/tags

The playbook used to create the container is https://github.com/adoptium/infrastructure/blob/master/ansible/playbooks/AdoptOpenJDK_Unix_Playbook/roles/Common/vars/CentOS.yml
You can see libdwarf-devel listed. I assume it would get the latest available version when the image is created.

@pshipton
Member Author

I believe this job creates the container.
https://ci.adoptium.net/job/centos7_docker_image_updater/

r30shah pushed a commit to r30shah/openj9-jit-debug-agent that referenced this issue Oct 27, 2023
alinux can't move to 11.2 due to
eclipse-openj9/openj9#15390

Signed-off-by: Peter Shipton <Peter_Shipton@ca.ibm.com>
@knn-k
Contributor

knn-k commented Oct 30, 2023

I can build JDK 21 for AArch64 Linux in my local environment, Ubuntu 22.04.

  • gcc 11.4.0
  • libdwarf-dev 20210528-1
  • libelf-dev 0.186-1build1

@keithc-ca
Contributor

Is this still an issue? All the jobs (that are still available) listed at https://openj9-jenkins.osuosl.org/job/Build_JDK21_aarch64_linux_Nightly/ are green.

@pshipton
Member Author

That's because they are using gcc 10.3

@keithc-ca
Contributor

> That's because they are using gcc 10.3

Right, I forgot about that.

@keithc-ca
Contributor

objdump from that image still complains about more .debuginfo files from recent build attempts.
The date on libdwarf.so* isn't reassuring:

# ls -l /lib64/libdwarf.so*
lrwxrwxrwx 1 root root     24 Oct  9 12:31 /lib64/libdwarf.so -> libdwarf.so.0.20130207.0*
lrwxrwxrwx 1 root root     24 Oct  9 12:31 /lib64/libdwarf.so.0 -> libdwarf.so.0.20130207.0*
-rwxr-xr-x 1 root root 246640 Mar  6  2015 /lib64/libdwarf.so.0.20130207.0*

All this is consistent with a docker image that's not self-consistent. It's not clear to me that CentOS:7 will ever be a reasonable platform for such recent tools.
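The file name in the listing appears to embed a release date (20130207). A minimal sketch of turning that into a check follows, assuming the `libdwarf.so.<major>.<YYYYMMDD>.<n>` naming seen above; the helper name `libdwarf_release_date` and the constant are hypothetical. DWARF 5 was published in February 2017, so a library stamped earlier than that predates the published standard.

```python
import re
from datetime import date

# DWARF 5 was published in February 2017; the day used here is a placeholder.
DWARF5_PUBLISHED = date(2017, 2, 1)

def libdwarf_release_date(filename):
    """Extract the YYYYMMDD stamp from a name like libdwarf.so.0.20130207.0."""
    match = re.search(r"libdwarf\.so\.\d+\.(\d{4})(\d{2})(\d{2})\.", filename)
    if match is None:
        return None  # no date stamp in this name (e.g. a bare symlink name)
    year, month, day = map(int, match.groups())
    return date(year, month, day)

released = libdwarf_release_date("/lib64/libdwarf.so.0.20130207.0")
if released is not None and released < DWARF5_PUBLISHED:
    print("libdwarf build dated", released, "predates the DWARF 5 standard")
```

A date stamp newer than the standard would not by itself prove that DWARF 5 is handled correctly; the check only flags libraries that are clearly too old.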

@pshipton
Member Author

We need to build against glibc 2.17 in order to run on CentOS 7. The options are either to find another way to do that, or to update the image with more recent versions of the DWARF libraries.

@keithc-ca
Contributor

I don't remember the circumstances where I encountered this error:

Error: dwarf/DwarfScanner.cpp:239 excludedDie - Getting attr value decl_file: DW_DLE_ATTR_FORM_BAD: In function formudata (internal function) on seeing form 0x21 (DW_FORM_implicit_const)

but even with the latest code in libdwarf, dwarf_formudata() doesn't support form DW_FORM_implicit_const. So it would seem that DwarfScanner.cpp needs to be updated to handle attribute DW_AT_decl_file with form DW_FORM_implicit_const.
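Some background on why this form is special: with DW_FORM_implicit_const (new in DWARF 5) the attribute's value is stored once in the abbreviation declaration in .debug_abbrev, as a signed LEB128, and not in the DIE itself. The sketch below reads the attribute specifications of one abbreviation declaration at the byte level. The function names and the sample bytes are made up for illustration; it does not use libdwarf and is not a description of how DwarfScanner.cpp handles the form.

```python
DW_AT_decl_file = 0x3A
DW_AT_decl_line = 0x3B
DW_FORM_data1 = 0x0B
DW_FORM_implicit_const = 0x21

def read_uleb128(data, pos):
    """Decode an unsigned LEB128 value; return (value, next position)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return result, pos

def read_sleb128(data, pos):
    """Decode a signed LEB128 value; return (value, next position)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            if byte & 0x40:  # sign bit of the final byte
                result -= 1 << shift
            return result, pos

def read_attribute_specs(data, pos=0):
    """Read (attribute, form, implicit_value) triples up to the 0,0 terminator."""
    specs = []
    while True:
        attr, pos = read_uleb128(data, pos)
        form, pos = read_uleb128(data, pos)
        if attr == 0 and form == 0:
            return specs, pos
        implicit = None
        if form == DW_FORM_implicit_const:
            # The constant lives here, in the abbreviation, not in the DIE.
            implicit, pos = read_sleb128(data, pos)
        specs.append((attr, form, implicit))

# Hypothetical attribute list: decl_file as implicit_const 1, decl_line as data1.
abbrev = bytes([0x3A, 0x21, 0x01, 0x3B, 0x0B, 0x00, 0x00])
specs, end = read_attribute_specs(abbrev)
print(specs, end)
```

A reader that decodes DIE bytes by form alone finds nothing to read for such an attribute, so it has to take the value from the abbreviation instead.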

@keithc-ca
Contributor

Those circumstances were building with gcc 11.4 on Ubuntu and allowing the compiler to generate DWARF version 5 debuginfo: eclipse/omr#7272 fixes those problems.
