[SR-5935] llvm-link: Missing Dwarf DIE references #48494

Closed
swift-ci opened this issue Sep 19, 2017 · 35 comments

@swift-ci swift-ci commented Sep 19, 2017

Previous ID SR-5935
Radar rdar://problem/34526036
Original Reporter jackcarter (JIRA User)
Type Bug
Status Resolved
Resolution Done

Attachment: Download

Environment

Mac Sierra 10.12.6 (occurs on other versions as well)

Xcode 9.0 (earlier versions also demonstrate the error)

llvm-swift 4.0

Additional Detail from JIRA
Votes 0
Component/s Compiler
Labels Bug
Assignee @adrian-prantl
Priority Medium

md5: 07b0b879f83f54a4032f7b7f50bd2679

Issue Description:

I am hitting a problem when combining several bitcode files into a single bitcode file. I would appreciate any pointers to help me debug this; perhaps it has been seen before and a fix is either being worked on or is already done.

I am using the Xcode 9.0 compiler. I believe the Swift code is 3.x. I have reproduced this using the top-of-tree (tot) llvm-link.

The input modules are from Swift (LucidDreams) and have been compiled -O. The problem doesn't seem to exist when they are compiled -Onone.

llvm-link completes without error and the subsequent compilation also seems to go fine, but when I run llvm-dwarfdump -verbose -verify on the resulting object file I get a bunch of the following errors:

warning: could not find referenced DIE
in DIE:

0x0000a33f: DW_TAG_inlined_subroutine [20] *
DW_AT_abstract_origin [DW_FORM_ref4] (cu + 0x0c84 => {0x0000a40d})
DW_AT_ranges [DW_FORM_sec_offset] (0x00021960
[0x0000000000001878 - 0x000000000000187c)
[0x00000000000018b0 - 0x0000000000001910)
[0x0000000000001980 - 0x00000000000019e0))
DW_AT_call_file [DW_FORM_data1] ("<mypath>/testprogram_lucidDreams/iOS_APP/LucidDreams/DreamListViewControllerModel.swift")
DW_AT_call_line [DW_FORM_data1] (61)
while processing <mypath>/testprogram_lucidDreams/iOS_APP/DerivedData/iOS_APP/Build/Intermediates.noindex/iOS_APP.build/Debug-iphoneos/iOS_APP.build/Objects-normal/arm64/iOS_APP.bc.o:

All the errors reference DreamListViewControllerModel.swift and have to do with inlining, but if I remove enough of the input bitcode objects from the llvm-link command the error goes away, even though DreamListViewControllerModel is still included.

Here are the commands I used to generate the error (omitting the original Swift compile):

########################
########################
llvm-link \
-o <mypath>/iOS_APP.bc \
<mypath>/DreamListViewController.o \
<mypath>/TextEntryCollectionViewCell.bc \
<mypath>/ImageDrawable.bc \
<mypath>/DreamScene.bc \
<mypath>/DreamListViewControllerModel.bc \
<mypath>/CreatureCollectionViewCell.bc \
<mypath>/RangeReplaceableCollection+IndexSet.bc \
<mypath>/DreamPreviewHeaderReusableView.bc \
<mypath>/Rendering.bc

########################
########################
xcrun \
--sdk iphoneos \
<mypath>/Xcode_9.0.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/clang++ \
-fembed-bitcode \
<mypath>/iOS_APP.bc \
-arch arm64 \
-O0 \
-c \
-o <mypath>/iOS_APP.bc.o

########################
########################
llvm-dwarfdump -verbose -verify small.o > small.o.dwarfdump

Any insights would be appreciated. Input bitcode files are attached.

Thanks,
Jack

@adrian-prantl adrian-prantl commented Sep 19, 2017

I can reproduce this.

@adrian-prantl adrian-prantl commented Sep 19, 2017

@swift-ci create

@adrian-prantl adrian-prantl commented Sep 19, 2017

This looks like a bug in LLVM, not the swift frontend. I will investigate.

@swift-ci swift-ci commented Sep 28, 2017

Comment by JACK C CARTER (JIRA)

We first hit this issue while Xcode was running dsymutil (llvm-dsymutil) on our bits. Dsymutil was crashing with a segv. With the patch from https://reviews.llvm.org/D38078 applied, dsymutil no longer crashes, but now complains about dangling references to missing DIE records:

while processing /Users/jackcarter/test/eit_3107/iOS_APP/DSYM_40/iOS_APP/DerivedData/iOS_APP/Build/Intermediates.noindex/iOS_APP.build/Debug-iphoneos/iOS_APP.build/Objects-normal/arm64/iOS_APP.bc.o:
warning: could not find referenced DIE

This is consistent with what I found with tot LLVM's version of llvm-dwarfdump. The odd thing is that swift-llvm's version of llvm-dwarfdump does not show the DIE as missing: with it I not only don't get the errors, but I also get the symbolic reference:

SWIFT-LLVM:

0x00003298:             DW_TAG_inlined_subroutine [28] *
                          DW_AT_abstract_origin [DW_FORM_ref4]  (cu + 0x14cb => {0x000014cb} "_T0s22_ContiguousArrayBufferV5countSifg10Foundation9IndexPathV_Tg5")
                          DW_AT_ranges [DW_FORM_sec_offset] (0x00005120
                             [0x0000000000012990 - 0x0000000000012a6c)
                             [0x0000000000012dac - 0x0000000000012db0))
                          DW_AT_call_file [DW_FORM_data1]   ("/Users/jackcarter/depot/products/rabbit/test/testrun/testframework-swift3.0/dSYM_40/testprogram_lucidDreams/iOS_APP/LucidDreams/DreamListViewController.swift")
                          DW_AT_call_line [DW_FORM_data1]   (0)

LLVM:

error: invalid DIE reference 0x000014cb. Offset is in between DIEs:

0x00003298: DW_TAG_inlined_subroutine [28] *
              DW_AT_abstract_origin [DW_FORM_ref4]  (cu + 0x14cb => {0x000014cb})
              DW_AT_ranges [DW_FORM_sec_offset] (0x00005120
                 [0x0000000000012990 - 0x0000000000012a6c)
                 [0x0000000000012dac - 0x0000000000012db0))
              DW_AT_call_file [DW_FORM_data1]   ("/Users/jackcarter/depot/products/rabbit/test/testrun/testframework-swift3.0/dSYM_40/testprogram_lucidDreams/iOS_APP/LucidDreams/DreamListViewController.swift")
              DW_AT_call_line [DW_FORM_data1]   (0)

This only happens (we believe) when combining Swift bitcode that was compiled at an optimization level higher than -Onone. I am wondering whether the swift-llvm version of dwarfdump differs from the vanilla LLVM dwarfdump; if so, maybe that change needs to be incorporated into dsymutil as well.

I'll keep digging into this, but any insights are appreciated.

@swift-ci swift-ci commented Oct 2, 2017

Comment by JACK C CARTER (JIRA)

This may be another red herring, but I have a DW_TAG_subprogram with 4 DW_TAG_inlined_subroutine DIEs that I get the complaint about in dwarfdump. I compiled the one with the complaint with direct object output and another that did not have the complaint with -S and then -c.

The difference between the two was that the one that complained about the missing reference, (cu + 0x2646 => {0x00002646}), had a forward reference. It should have had an additive value of 0x00019c07. The one that didn't complain, DW_AT_abstract_origin [DW_FORM_ref4] (cu + 0x10b8 => {0x0001997d} "_T0s2eeoiSbxSg_ABts9EquatableRzlFSo7UITouchC_Tg5"), had a backward reference from the referring DW_TAG_inlined_subroutine (0x00019b5b).

For some reason "cu" is 0 for the bad one.
Is there a rule that the referenced DIE addresses need to be backward? I would have thought that

Could it be that the Dwarf reader is confusing the base address of the referencing DW_TAG_subprogram with that of the inlined function it is referencing?

BAD:

DW_TAG_subprogram :0x00008085

DW_TAG_inlined_subroutine: (cu + 0x10b0 => {0x000010b0}) (should be 0x00019c07)

GOOD

DW_TAG_subprogram :0x00019b40

DW_TAG_inlined_subroutine: (cu + 0x10b8 => {0x0001997d}) (correct)

@swift-ci swift-ci commented Oct 2, 2017

Comment by JACK C CARTER (JIRA)

The referencing function is defined in 2 bitcode files (ButtonOverlay.ll and PadOverlay.ll) and is marked as "linkonce_odr". The referenced inlined function is also defined in both files.

I am wondering if there is some confusion if the referencing function gets the referenced function's definition from the other .bc file when the bc files are combined into a new aggregate bc file. If it thinks that it is in the same original CU, but isn't, that could be the problem.

@JDevlieghere JDevlieghere commented Oct 3, 2017

Jack, can you walk me through what you did to reach these conclusions? I figure you compiled the two bitcode files separately and compared the debug info with that in the object file for the linked bitcode file?

In particular I'm curious about why you say "It should have had an additive value of 0x00019c07". I'm also not sure what you mean by the "cu" being zero.

> The referencing function is defined in 2 bitcode files (ButtonOverlay.ll and PadOverlay.ll) and is marked as "linkonce_odr". The referenced inlined function is also defined in both files.

Is the function really defined in both files, or is it declared in both and defined in one? The last sentence would also mean that the referenced function is not yet inlined in the bitcode modules, is that correct?

PS: I think the two bitcode files you're looking at are not (yet) attached. Could you add them so we can both look at the same files?

@swift-ci swift-ci commented Oct 3, 2017

Comment by JACK C CARTER (JIRA)

I am putting together the sample. I want to rerun what I package before sending to make sure I don't leave anything out.

@swift-ci swift-ci commented Oct 4, 2017

Comment by JACK C CARTER (JIRA)

dwarf.tar.gz

This is a different example that fails with both a direct object compile and one that produces a .s and then .o. The complaint is different as well and may point to a more obvious answer. It seems that minor differences alter the outcome and when I go back to reproduce the problem it has shifted or gone away.

I am very motivated to get this solved and am willing to help in any way possible.

@swift-ci swift-ci commented Oct 4, 2017

Comment by JACK C CARTER (JIRA)

The bitcode files in the obj directory are mostly built from the sources of a Swift demo program. They are compiled -O. If we compile them -Onone the problem seems to go away. The sources were compiled to bitcode using Xcode 9.0.

We use the same LLVM methods to create a unified bitcode file as llvm-link. The situation I was discussing yesterday was from bitcode generated from our tool. The one I just attached is combined purely from llvm-link. It seems that if I sneeze the results change.

My suspicion is that the large size of the combined bitcode, along with inlined functions from modules that are located far from the inlined subroutines, is causing the problem(s). The earlier instance of the problem (which I struggle to reproduce) was dealing with a subroutine that was defined in 2 modules and marked as linkonce_odr, which I believe means the compiler can remove one of them. Also, the inlined function was defined in each of the modules and only one of them ended up in the combined object file. The function from one of the modules was inlined in the subroutine of the other module. I think that is the gist of the problem.

But I keep drawing wrong conclusions from my observations. I am reading up on Dwarf to try to be a bit more intelligent in this conversation and not mislead. I would say, see if you can reproduce a dwarfdump error from my attached example and if you can, ignore my guesses and look at the facts.

Let me know what I can do to make this easier.

@JDevlieghere JDevlieghere commented Oct 4, 2017

Thanks Jack, your help is very much appreciated!

So far I was able to reduce the problem to two files from your last attachment: GameController.ll and Swift4.ll:

llvm-link GameController.ll Swift4.ll -o linked.bc

xcrun --sdk iphoneos clang++ linked.bc -arch arm64 -O0 -c -o linked.o

llvm-dwarfdump -verify linked.o

One interesting observation is that changing the order of the input files for LLVM link changes the outcome. This might explain why you sometimes end up with different results?
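For anyone who wants to check the order dependence themselves, here is a minimal sketch that wraps the three commands above in a function. The function name `verify_order` and the environment-variable overrides are made up for illustration, and counting lines that start with `error:` or `warning:` is only an assumption about how to summarize the verifier output, based on the messages quoted earlier in this thread.

```sh
# Sketch only: tool names default to the commands used above and can be
# overridden through the environment (a convenience assumed by this sketch).
: "${LLVM_LINK:=llvm-link}"
: "${LLVM_DWARFDUMP:=llvm-dwarfdump}"
: "${CLANGXX:=xcrun --sdk iphoneos clang++}"

# verify_order TAG INPUT...
# Links the inputs in the given order, compiles the result, runs the DWARF
# verifier, and prints how many output lines start with "error:" or "warning:".
# Writes TAG.bc, TAG.o and TAG.log in the current directory.
verify_order() {
    tag=$1; shift
    "$LLVM_LINK" "$@" -o "$tag.bc" || return 1
    # CLANGXX is left unquoted on purpose so "xcrun --sdk iphoneos clang++"
    # is split into separate words.
    $CLANGXX "$tag.bc" -arch arm64 -O0 -c -o "$tag.o" || return 1
    "$LLVM_DWARFDUMP" -verify "$tag.o" > "$tag.log" 2>&1
    # grep exits non-zero when the count is 0; that is not a failure here.
    grep -c -E '^(error|warning):' "$tag.log" || true
}
```

Running `verify_order ab GameController.ll Swift4.ll` and then `verify_order ba Swift4.ll GameController.ll` gives two counts to compare, with the full verifier output kept in `ab.log` and `ba.log`.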

Anyway, the problematic subroutines are inlined in two functions:

  • 'T0s10DictionaryVAByxq_Gqd20uniqueKeysWithValues_tcs8SequenceRd_x_q_t7ElementRtd__lufCSS_Sds04Zip2F0VySaySSGSaySdGGTg5Tf4gXd_n'

  • 'T0s10DictionaryVAByxq_Gqd_q_qq_tKc16uniquingKeysWithtKcs8SequenceRdx_q_t7ElementRtd_lufCSS_Sis04Zip2E0VySaySSGs8RepeatedVySiGGTg5Tf4nnd_n'

Using llvm-extract I was able to reduce the testcase further (Swift4_reduced.ll) and figure out the subroutines that should have been referenced.

  • '_T0s37_HashableTypedNativeDictionaryStorageCySSSdGMa'

  • '_T0s23_NativeDictionaryBufferVAByxq_GSi8capacity_s04_RawaB7StorageC7storagetcfCSS_SdTg5'

  • '_T0s23_NativeDictionaryBufferVAByxq_GSi8capacity_s04_RawaB7StorageC7storagetcfCSS_SdTg5Tf4nnd_n'

  • '_T0Sp10initializeyx2to_Si5counttFSu_Tgq5'

The offsets are indeed correct if you compile this file and check the debug info with dwarfdump.

I used the same approach to re-extract these two functions from the linked bitcode file. Just like the original file, the offsets are correct. This tells me that it's not the bitcode linker that is changing some property that is causing the invalid offsets. I'll have to debug the backend to figure out why these particular offsets are emitted for the linked file.
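The exact llvm-extract invocation is not given in this thread. As a rough sketch, assuming the standard `-func=<name>`, `-S` and `-o` options and using a hypothetical helper name `extract_funcs`, pulling a set of functions out of a module could look like this:

```sh
# Sketch only: extract_funcs is a made-up helper, not part of LLVM.
: "${LLVM_EXTRACT:=llvm-extract}"

# extract_funcs INPUT OUTPUT NAME...
# Writes textual IR containing only the named functions to OUTPUT.
# NAMEs must be spelled exactly as they appear in the IR.
extract_funcs() {
    in=$1; out=$2; shift 2
    args=
    for name in "$@"; do
        args="$args -func=$name"
    done
    # Mangled Swift names contain no whitespace, so letting the shell
    # split $args into words is safe here.
    $LLVM_EXTRACT $args "$in" -S -o "$out"
}
```

For instance, `extract_funcs linked.bc pair.ll <name1> <name2>` would re-extract two functions from the linked module so the result can be compiled and checked with llvm-dwarfdump on its own.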

@swift-ci swift-ci commented Oct 4, 2017

Comment by JACK C CARTER (JIRA)

What explains my different results is a messy programming environment. Mixing TOT LLVM, TOT Swift-LLVM, our flavor of Swift-LLVM and our bitcode engine doesn't help, and neither does having to test multiple ongoing releases of Xcode.
I need to be narrower and more rigorous in my approach going forward.

I am so happy you are looking at this.

@swift-ci swift-ci commented Oct 4, 2017

Comment by JACK C CARTER (JIRA)

I guess I don't know what is really meant by CU.

If the address for the DW_AT_abstract_origin DIE is CU+offset, is the CU the original computational unit compiled (file_1.swift -> file_1.bc) or something else like the full aggregate object (file_1.bc+file_2.bc)? If it is the former then the CU may be different from the parent DW_TAG_subprogram.

@JDevlieghere JDevlieghere commented Oct 5, 2017

There is one CU per compilation unit, and a bitcode module can have multiple (look for DICompileUnit). GameController.ll for example has 1 Swift CU and 3 Objective-C CUs.

For example, for the function '_T0Sp10initializeyx2to_Si5counttFSu_Tgq5', which appears in both GameController.ll and Swift4_reduced.ll, we have two DISubprograms in the linked LLVM-assembly file, each pointing to a different CU. If you compile the linked file and check the dwarfdump output you can see that there's a CU for Swift4.swift and one for GameController.swift, each with their corresponding instance of the DW_TAG_subprogram for the inlined function.

0x00000000: Compile Unit: length = 0x0000b55f version = 0x0004 abbr_offset = 0x0000 addr_size = 0x08 (next unit at 0x0000b563)

0x0000000b: DW_TAG_compile_unit
DW_AT_producer ("Apple Swift version 4.0 (swiftlang-900.0.65 clang-900.0.37)")
DW_AT_language (DW_LANG_Swift)
DW_AT_name ("/Users/jackcarter/test/eit_3107/Plan_A/testprogram_fox2/iOS_APP/Swift/Shared/GameController.swift")

...

0x00007432: DW_TAG_subprogram
DW_AT_linkage_name ("_T0Sp10initializeyx2to_Si5counttFSu_Tgq5")
DW_AT_name ("_T0Sp10initializeyx2to_Si5counttFSu_Tgq5")

...

0x0000b6c0: Compile Unit: length = 0x00000196 version = 0x0004 abbr_offset = 0x0000 addr_size = 0x08 (next unit at 0x0000b85a)

0x0000b6cb: DW_TAG_compile_unit
DW_AT_producer ("Apple Swift version 4.0 (swiftlang-900.0.65 clang-900.0.37)")
DW_AT_language (DW_LANG_Swift)
DW_AT_name ("/Users/jackcarter/test/eit_3107/Plan_A/testprogram_fox2/iOS_APP/Swift4.swift")

...

0x0000b71e: DW_TAG_subprogram
DW_AT_linkage_name ("_T0Sp10initializeyx2to_Si5counttFSu_Tgq5")
DW_AT_name ("_T0Sp10initializeyx2to_Si5counttFSu_Tgq5")

So while writing this up I noticed something interesting. So far I've been assuming that the offsets have been wrong, but 0x0000b71e - 0x0000b6c0 = 0x5e, which is exactly the CU-relative offset we see in one of the invalid inlined subroutines, if it weren't for the fact that the offending DIE is in another CU.

0x0000a7ce: DW_TAG_inlined_subroutine
DW_AT_abstract_origin (cu + 0x005e)

The same is true for the other offsets. I didn't notice this earlier because I was focussing on the function that actually appears in both CUs, but it looks like the referencing function (and as a consequence its inlined functions) is just ending up in the wrong CU. In the linked LLVM-assembly file this is still correct (DISubprogram !7025 -> DICompileUnit !74 -> DIFile !75 = Swift4.swift).
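The arithmetic behind this observation can be sketched in shell. A DW_FORM_ref4 value is an offset relative to the start of the CU that contains the referencing DIE, so the same value resolves to different absolute offsets depending on which CU the reference ends up in. The helper names below are made up for illustration; the offsets are the ones quoted in this comment.

```sh
# Sketch only: resolve_ref4 and cu_relative are hypothetical helper names.

# Absolute .debug_info offset of a DW_FORM_ref4 target:
# start of the containing CU + CU-relative offset.
resolve_ref4() {
    printf '0x%08x\n' $(( $1 + $2 ))
}

# CU-relative offset of a DIE: absolute DIE offset - start of its CU.
cu_relative() {
    printf '0x%04x\n' $(( $1 - $2 ))
}

cu_relative  0x0000b71e 0x0000b6c0   # subprogram DIE inside the Swift4 CU -> 0x005e
resolve_ref4 0x0000b6c0 0x005e       # resolved against the Swift4 CU -> 0x0000b71e
resolve_ref4 0x00000000 0x005e       # resolved against the GameController CU -> 0x0000005e
```

The last line shows why the verifier complains: the reference is only meaningful relative to the Swift4 CU, but the referencing DIE sits in the CU that starts at 0x00000000.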

@swift-ci swift-ci commented Oct 5, 2017

Comment by JACK C CARTER (JIRA)

So are you saying that the problem is in the code generator, or are the Dwarf reader routines making wrong assumptions and printing incorrect "facts"?

@JDevlieghere JDevlieghere commented Oct 5, 2017

The backend is indeed the problem, the DWARF is claiming it belongs to the GameController CU while it really belongs to the Swift4 one. In the assembly file the DIE already shows up under the wrong CU. I've been looking at the DWARF backend and I have a rough idea of where it's likely going wrong, but I still haven't been able to pinpoint it.

@swift-ci swift-ci commented Oct 10, 2017

Comment by JACK C CARTER (JIRA)

Are you closer to an answer? If not, is there anything I can do to help at this point?

@adrian-prantl adrian-prantl commented Oct 10, 2017

Yes. Our working theory is that this is a bug in Swift's IRGenDebugInfo. The methods of the Dictionary class have a FwdDecl of Dictionary as their scope instead of the full definition, which has odd side effects when llvm-link does type-uniquing.

@swift-ci swift-ci commented Oct 10, 2017

Comment by JACK C CARTER (JIRA)

If this is the culprit, does it mean that the Swift compiler that produces the original bitcode would need to change? Do you need me to send the original sources or do you have an example already?

@adrian-prantl adrian-prantl commented Oct 10, 2017

If it is easy for you to share the source code, this would save me a lot of time! Thanks

@swift-ci swift-ci commented Oct 10, 2017

Comment by JACK C CARTER (JIRA)

This is overkill, but hopefully just GameController.swift and Swift4.swift will do it.

swift_files.tar.gz

swift4.compile.txt

Swift and Xcode are never simple. If I need to send you the full Xcode project let me know and I will see about getting permission.

@adrian-prantl adrian-prantl commented Oct 11, 2017

Compiling the following with optimizations works:

func use<T>(_ t: T) {}

public func useDict() {
  let names = ["adrian"]
  let numbers = [23]
  let dict = Dictionary(uniqueKeysWithValues: zip(names, numbers))
  use(dict)
}

@adrian-prantl adrian-prantl commented Oct 11, 2017

This is a bug in the generation of debug info for specialized functions.

The fundamental problem seems to be that when the scope for a SILFunction is created for the debug info, we look at the DeclContext. From the DeclContext in this case we get to the unbound generic type Dictionary<KeyT,ValueT> or to the bound generic type Dictionary<Archetype0, Archetype1>, but not to the bound generic type of Dictionary<String, Int> (because all instances of Dictionary share the same DeclContext). We need to find a way to pass the actual (for lack of a better word) parent Swift type of the SILFunction to IRGenDebugInfoImpl::emitFunction(). We probably need to replace SILFunction::getDeclContext() with a mechanism that returns the parent type rather than the lexical context.

@swift-ci swift-ci commented Oct 11, 2017

Comment by JACK C CARTER (JIRA)

So from our (Arxan's) point of view I think this means that until there is an Xcode with a Swift compiler that fixes this, we will have to make do with a dsymutil that incorporates the change (https://reviews.llvm.org/D38078) to warn about the bad Dwarf instead of crashing.

Is that a fair representation?

@adrian-prantl adrian-prantl commented Oct 11, 2017

Yes, that workaround is your best option at this point.

@swift-ci swift-ci commented Oct 11, 2017

Comment by JACK C CARTER (JIRA)

I want to thank you, Adrian and Jonas, for getting to the heart of this. I was going down rabbit holes based on my incorrect analysis of the symptoms.

@swift-ci swift-ci commented Oct 13, 2017

Comment by JACK C CARTER (JIRA)

A couple of questions:

1) Do I need to sign up for another list to see if and when a fix for this occurs? That way I can patch it in, build the Swift compiler and see if that solves our problems or determine if there are others hiding under this one.

2) Do you see a fix for this getting into an Xcode 9.1 beta?

Thanks

@adrian-prantl adrian-prantl commented Oct 14, 2017

  1. I will make a note here.

  2. I cannot comment on future Xcode releases. It is clearly something that has to be fixed, but it only affects debug information of optimized code, so other bugs may have a higher priority in the short term. I'm definitely open to collaborating with an interested party from the community for writing a patch for this.

@adrian-prantl adrian-prantl commented Jul 7, 2018

I finally got around to designing proper debug info for specialized functions.

#17810

@swift-ci swift-ci commented Sep 26, 2019

Comment by Jason Holajter (JIRA)

We had an increase in reports of this issue (specifically `warning: could not find referenced DIE`) when using Xcode 11.0. Has there been any more progress toward getting the underlying issue resolved?

@adrian-prantl adrian-prantl commented Sep 26, 2019

This particular bug should be resolved.

@adrian-prantl adrian-prantl commented Sep 26, 2019

Resolved back in summer 2018.

@adrian-prantl adrian-prantl commented Sep 26, 2019

jholajter (JIRA User) can you file a separate bug with a reproducer for your issue?

@swift-ci swift-ci commented Sep 26, 2019

Comment by Jason Holajter (JIRA)

Thank you for the quick reply. I'll put together a reproduction that I can share and open a separate bug tomorrow. I'll post that bug number here for reference once opened.

@swift-ci swift-ci commented Sep 27, 2019

Comment by Jason Holajter (JIRA)

I opened https://bugs.swift.org/browse/SR-11539 to track the new issue we are seeing.

@swift-ci swift-ci transferred this issue from apple/swift-issues Apr 25, 2022
This issue was closed.