Skip to content
This repository has been archived by the owner on Nov 1, 2020. It is now read-only.

[WIP] Fix PIC relocation issues #6847

Closed
wants to merge 1 commit into from

Conversation

MichalStrehovsky
Copy link
Member

Progress towards #4988.

The relocation issues we're having are due to two things: we emit all symbols as global, and we don't do the dance that is necessary to access symbols someone else could redefine in a different module and make them "too far away" to fit in a 32bit relocation. Ideally, we should find a way to ban this because I think it's breaking our ability to have static libraries built by different versions of CoreRT in the same process (they're going to redefine each other's symbols and that sounds bad).

I ran out of time I allocated for myself for this. This is enough to actually build SharedLibrary.so, but the test is failing (because dlopen fails for reasons that I didn't investigate) and I didn't validate this doesn't break Windows or other (non-dynamic library) scenarios.

I'll need to read up more on ELF relocs at some point.

Putting it up in case someone would like to pick this up. I don't know when I'll be able to make time for this again.

Cc @tonerdo @tim241 @sebastianulm

@tonerdo
Copy link
Contributor

tonerdo commented Jan 21, 2019

This is enough to actually build SharedLibrary.so, but the test is failing (because dlopen fails for reasons that I didn't investigate)

Will take some time to investigate dlopen failing, hopefully it's something simple enough for me to fix. Happy to know that it's building at least

@MichalStrehovsky
Copy link
Member Author

Will take some time to investigate dlopen failing

It's possible something else fails. I fell for the trap of thinking that different exit codes mean different failures in the test, but the SharedLibrary test exits with 1 for all failures.

@hc4
Copy link
Contributor

hc4 commented Mar 26, 2019

I've tried it in my project and got error from dlopen(): undefined symbol: g_compilerEmbeddedSettingsBlob

@hc4
Copy link
Contributor

hc4 commented Mar 26, 2019

I've creatred simple stub in C with just this exported variable:

extern struct {
unsigned char data0[4];

 } g_compilerEmbeddedSettingsBlob = {
                {0x00,0x00,0x00,0x00},
                };

Then added this stub to linking and got working .so library, which was sucsessfully loaded.
Also I can call methods exported from managed library (however some of them causes sigfaults)

@josephmoresena
Copy link
Contributor

josephmoresena commented Mar 26, 2019

I've creatred simple stub in C with just this exported variable:

extern struct {
unsigned char data0[4];

 } g_compilerEmbeddedSettingsBlob = {
                {0x00,0x00,0x00,0x00},
                };

Then added this stub to linking and got working .so library, which was sucsessfully loaded.
Also I can call methods exported from managed library (however some of them causes sigfaults)

Why this struct? May the nature of that symbol is CompilerEmbeddedSettingsBlob defined at https://github.com/dotnet/corert/blob/master/src/Native/Runtime/RhConfig.cpp

@hc4
Copy link
Contributor

hc4 commented Mar 26, 2019

Why this struct?

I've just took it from generated cpp files (like tests\src\Simple\Delegates\obj\Release\x64\native\Delegates.cpp)
Also I've trying this patched objwriter with latest corert and it looks like it already generates this exoprt, so my stub is not required.

@hc4
Copy link
Contributor

hc4 commented Mar 27, 2019

On master I also get SIGABRT.
A little debugging shows, that error comes from GetCodeManager()->UnwindStackFrame()

   0x00007ffffd0b037c in Assert (expr=0x7ffffdc97468 "ASSERT_UNCONDITIONALLY",
    file=0x7ffffdc9927e "/mnt/x/Projects/corert/src/Native/Runtime/StackFrameIterator.cpp", line_num=1331,
    message=0x7ffffdc99844 "GetCodeManager()->UnwindStackFrame(&m_methodInfo, &m_RegDisplay, &pPreviousTransitionFrame)")
    at /mnt/x/Projects/corert/src/Native/Runtime/rhassert.cpp:25
#3  0x00007ffffcfcda54 in StackFrameIterator::NextInternal (this=0x7ffffffea1f8) at /mnt/x/Projects/corert/src/Native/Runtime/StackFrameIterator.cpp:1331
#4  0x00007ffffcfcd8a9 in StackFrameIterator::Next (this=0x7ffffffea1f8) at /mnt/x/Projects/corert/src/Native/Runtime/StackFrameIterator.cpp:1303
#5  0x00007ffffcfd1661 in Thread::GcScanRootsWorker (this=0x61c940, pfnEnumCallback=0x7ffffd039470 <WKS::GCHeap::Promote(Object**, ScanContext*, unsigned int)>,
    pvCallbackData=0x7ffffffea708, frameIterator=...) at /mnt/x/Projects/corert/src/Native/Runtime/thread.cpp:540
#6  0x00007ffffcfd13b4 in Thread::GcScanRoots (this=0x61c940, pfnEnumCallback=0x7ffffd039470 <WKS::GCHeap::Promote(Object**, ScanContext*, unsigned int)>,
    pvCallbackData=0x7ffffffea708) at /mnt/x/Projects/corert/src/Native/Runtime/thread.cpp:412
#7  0x00007ffffcfbe2c5 in GCToEEInterface::GcScanRoots (fn=0x7ffffd039470 <WKS::GCHeap::Promote(Object**, ScanContext*, unsigned int)>, condemned=0, max_gen=2,
    sc=0x7ffffffea708) at /mnt/x/Projects/corert/src/Native/Runtime/gcrhscan.cpp:85
#8  0x00007ffffd08a789 in GCScan::GcScanRoots (fn=0x7ffffd039470 <WKS::GCHeap::Promote(Object**, ScanContext*, unsigned int)>, condemned=0, max_gen=2,
    sc=0x7ffffffea708) at /mnt/x/Projects/corert/src/Native/gc/gcscan.cpp:170
#9  0x00007ffffd0150ac in WKS::gc_heap::mark_phase (condemned_gen_number=0, mark_only_p=0) at /mnt/x/Projects/corert/src/Native/Runtime/Full/../../gc/gc.cpp:19600
#10 0x00007ffffd00ca37 in WKS::gc_heap::gc1 () at /mnt/x/Projects/corert/src/Native/Runtime/Full/../../gc/gc.cpp:15270
#11 0x00007ffffd029a33 in WKS::gc_heap::garbage_collect (n=0) at /mnt/x/Projects/corert/src/Native/Runtime/Full/../../gc/gc.cpp:16861

@@ -859,7 +859,7 @@ public void EmitSymbolDefinition(int currentOffset)
AppendExternCPrefix(_sb);
name.AppendMangledName(_nodeFactory.NameMangler, _sb);

EmitSymbolDef(_sb);
EmitSymbolDef(_sb, false);
Copy link
Contributor

@hc4 hc4 Mar 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least g_compilerEmbeddedSettingsBlob BlobNode must be global

@hc4
Copy link
Contributor

hc4 commented Mar 27, 2019

After making BlobNode Global I can compile valid .so without stub.

But now I geting SegFault in switch:

#0  0x00007ffffd452b35 in S_P_CoreLib_System_Marvin__ComputeHash32_0 (data=@0x7fffe300e064: 83 'S', count=44,
    p0=1969575079, p1=3617844451) at /mnt/x/Projects/corert/src/System.Private.CoreLib/shared/System/Marvin.cs:47
47                  switch (ucount)
(gdb) l
42
43                      byteOffset += 8;
44                      ucount -= 8;
45                  }
46
47                  switch (ucount)
48                  {
49                      case 4:
50                          p0 += Unsafe.ReadUnaligned<uint>(ref Unsafe.AddByteOffset(ref data, byteOffset));
51                          Block(ref p0, ref p1);

@MichalStrehovsky
Copy link
Member Author

But now I geting SegFault in switch:

If I had to venture a guess, it's a problem with how RyuJIT-generated code refers to the jump table data. We're probably not doing the right reloc for it with this change. I would look at how the reloc referenced from there looks like in assembly and what it's pointing at.

@hc4
Copy link
Contributor

hc4 commented Mar 27, 2019

Maybe disassemble of ComputeHash32, which faults will help:

   0x00007ffffd452ae5 <+293>:   mov    %rdi,-0x48(%rbp)
   0x00007ffffd452ae9 <+297>:   mov    -0x48(%rbp),%rdi
   0x00007ffffd452aed <+301>:   mov    %rdi,-0x80(%rbp)
   0x00007ffffd452af1 <+305>:   mov    -0x80(%rbp),%rdi
   0x00007ffffd452af5 <+309>:   mov    %rdi,-0x88(%rbp)
   0x00007ffffd452afc <+316>:   mov    $0x7,%edi
   0x00007ffffd452b01 <+321>:   movslq %edi,%rdi
   0x00007ffffd452b04 <+324>:   cmp    %rdi,-0x48(%rbp)
   0x00007ffffd452b08 <+328>:   jbe    0x7ffffd452b10 <S_P_CoreLib_System_Marvin__ComputeHash32_0+336>
   0x00007ffffd452b0a <+330>:   nop
   0x00007ffffd452b0b <+331>:   jmpq   0x7ffffd452de3 <S_P_CoreLib_System_Marvin__ComputeHash32_0+1059>
   0x00007ffffd452b10 <+336>:   cmpl   $0x7,-0x88(%rbp)
   0x00007ffffd452b17 <+343>:   ja     0x7ffffd452b37 <S_P_CoreLib_System_Marvin__ComputeHash32_0+375>
   0x00007ffffd452b19 <+345>:   mov    -0x88(%rbp),%edi
   0x00007ffffd452b1f <+351>:   mov    %edi,%edi
   0x00007ffffd452b21 <+353>:   lea    -0x4b35c8(%rip),%rsi        # 0x7ffffcf9f560 <__readonlydata_S_P_CoreLib_System_Marvin__ComputeHash32_0@plt>
   0x00007ffffd452b28 <+360>:   mov    (%rsi,%rdi,4),%esi
   0x00007ffffd452b2b <+363>:   lea    -0x135(%rip),%rax        # 0x7ffffd4529fd <S_P_CoreLib_System_Marvin__ComputeHash32_0+61>
   0x00007ffffd452b32 <+370>:   add    %rax,%rsi
=> 0x00007ffffd452b35 <+373>:   jmpq   *%rsi
   0x00007ffffd452b37 <+375>:   nop
   0x00007ffffd452b38 <+376>:   jmpq   0x7ffffd452de3 <S_P_CoreLib_System_Marvin__ComputeHash32_0+1059>
   0x00007ffffd452b3d <+381>:   mov    -0x20(%rbp),%edi
   0x00007ffffd452b40 <+384>:   mov    %edi,-0xc0(%rbp)
   0x00007ffffd452b46 <+390>:   mov    -0x18(%rbp),%rdi
   0x00007ffffd452b4a <+394>:   mov    -0x38(%rbp),%rsi
   0x00007ffffd452b4e <+398>:   callq  0x7ffffd8c6b60 <S_P_CoreLib_Internal_Runtime_CompilerServices_Unsafe__AddByteOff

SigFault caused by 0x00007ffffd452b35 <+373>: jmpq *%rsi where rsi contains 0x8000526f4ffc

@hc4
Copy link
Contributor

hc4 commented Mar 28, 2019

Maybe I'm wrong, but it looks like __readonlydata_S_P_CoreLib_System_Marvin__ComputeHash32_0 reloc type must be R_X86_64_GLOB_DAT, but currently it is generated as
00000000015dea98 000005b700000007 R_X86_64_JUMP_SLOT 0000000000fb78e0 __readonlydata_S_P_CoreLib_System_Marvin__ComputeHash32_0 + 0

@@ -391,10 +396,12 @@ int ObjectWriter::EmitSymbolRef(const char *SymbolName,
case RelocType::IMAGE_REL_BASED_REL32:
Size = 4;
IsPCRel = true;
Kind = MCSymbolRefExpr::VK_PLT;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

VK_PLT is for code symbols, and for data symbols it should be VK_GOTPCREL

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for looking into this! You probably know more about ELF relocs than me at this point.

Is there any way to express "this reloc points to something that is part of the ELF module and will for sure fit in the 32 bit reloc" because it's in the same file?

Part of the reason why I marked this pull request WIP is because I don't particularly like we have to go through jump stubs to access symbols that are defined within the same module (some of these relocs literally point to things that are in the same .o file, the others point to things that are in libRuntime.so). GOTPCREL will similarly require an indirection to access things that are right there.

If this is something that we have to live with, we can probably introduce a --pic command line option for ILC and have it generate the necessary indirections under that switch.

Copy link
Contributor

@hc4 hc4 Apr 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just studied all this relocation tricks for ELF :)
But, as I understand - this indirections is a standard way of making shared libs in *nix.
There is question on StackOverflow about that:
here https://stackoverflow.com/questions/53086636/linking-a-static-library-into-a-shared-library-without-fpic
and here https://stackoverflow.com/questions/51417700/relocation-errors-when-linking-a-static-library-with-shared-library

Copy link
Contributor

@hc4 hc4 Apr 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe symbol type R_X86_64_PC32 means "32 bit offset relative to procedure call".
But LD will allow this type only for local symbols.
I've tried to make all symbols LOCAL, except ExternSymbolNode. Everything get linked, but still have a problem with stack unwinding. It can't find FDE in getInfoFromDwarfSection()

@jkotas
Copy link
Member

jkotas commented Apr 1, 2019

I've tried to make all symbols LOCAL, except ExternSymbolNode.

Yes, making symbols to be local is the way to go. The equivalent C/C++ construct is __attribute__ ((visibility ("hidden"))) or -fvisibility=hidden command line option - https://gcc.gnu.org/wiki/Visibility .

We may need to have a flag on ExternSymbolNode to distinguish non-exported symbol with extern C name (that should still be local) and symbol that is exported from the .so (that should be global).

@hc4
Copy link
Contributor

hc4 commented Apr 2, 2019

Hm. I've found 2 code branches, which searches for UnwindSection by PC.
One is LocalAddressSpace::findUnwindSections, and another is UnwindHelpers.LocateUnwindSections

And it looks like second one (which is used in GC) has searched wrong UnwindSection.

@hc4
Copy link
Contributor

hc4 commented Apr 2, 2019

It's funny, but function UnwindHelpers.LocateSectionsCallback just didn't check result :)

Fixed it and got working lib, Hurray!
I think, that second function must be removed.

@jkotas
Copy link
Member

jkotas commented Apr 2, 2019

LocateUnwindSections was added as a optimization to avoid performance issues with LocalAddressSpace::findUnwindSections.

@hc4 Thank you for looking into this. This is great work. Do you plan to PR the fixes you have made?

@hc4
Copy link
Contributor

hc4 commented Apr 2, 2019

I still have SIGFAULT in RhpUniversalTransition_DebugStepTailCall when library loaded with RTLD_LAZY option.
Also I didn't made flag in ExternSymbolNode.

@hc4
Copy link
Contributor

hc4 commented Apr 2, 2019

I've got it working with help of hack at

jmp EXTERNAL_C_FUNC(RhpUniversalTransition_DebugStepTailCall)

replaced with

jmp qword ptr [rip + RhpUniversalTransition_DebugStepTailCall@GOTPCREL]

to avoid PLT relocationat at runtime

Now it works in RTLD_LAZY mode

@hc4
Copy link
Contributor

hc4 commented Apr 3, 2019

Should I create new PR into coret/master or just my fixes into MichalStrehovsky:exper?

@MichalStrehovsky
Copy link
Member Author

Should I create new PR into coret/master or just my fixes into MichalStrehovsky:exper?

Please make a new one - I'll just close this one then.

@MichalStrehovsky
Copy link
Member Author

Closing in favor of #7249.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants