Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

Fix #4496 #4503

Merged
merged 3 commits into from May 2, 2016
Merged

Fix #4496 #4503

merged 3 commits into from May 2, 2016

Conversation

parjong
Copy link

@parjong parjong commented Apr 22, 2016

This commit revises legecy codegen to insert a bkpt instruction if the current BB
and next BB is not in the same exception handling region.

This code is backported from ARM64 codegen to fix #4496.

This commit revises legecy codegen to inserts a bkpt instruction if the current BB
and next BB is not in the same exception handling region.

This code is backported from ARM64 codegen.
@parjong
Copy link
Author

parjong commented Apr 26, 2016

\cc @myungjoo @leemgs @janvorli

@janvorli
Copy link
Member

@dotnet/jit-contrib - can someone familiar with the jit take a look?

@russellhadley
Copy link

@JosephTremoulet @BruceForstall Can you take a peek?

@JosephTremoulet
Copy link

From the comment and the doc, this sounds like something needed for the funclet-based unwinder, and I gather from this and #4496 that 32-bit arm is using the funclet-based unwinder. Why, then, is FEATURE_EH_FUNCLETS, which already has the check, not #defined on 32-bit arm?

@BruceForstall
Copy link
Member

I wonder if this failure exists/existed in our older ARM32 Windows builds/products. I believe this failure wouldn't occur then because the unwinder always subtracted 2 from LR when unwinding, which would put the PC address after unwind in the middle of the "bl " instruction, which would be in the correct EH region. The AMD64 OS unwinder didn't do this, hence this is required on AMD64.

It's not clear why this code you're copying is required on ARM64; from my software archaeology it looks like it might have been put there unconditionally (that is, the #ifdef _TARGET_AMD64_ removed) by accident.

@BruceForstall
Copy link
Member

(BTW, it looks like the CLR ABI doc could be improved to be more explicit and detailed about the various padding requirements)

@BruceForstall
Copy link
Member

@jkotas @rahku Do you remember where the logic is/was for ARM to "always subtract two from PC after unwind to a frame"? I don't immediately see it in the unwinder_arm.cpp in the repo, and I'm not sure where I would look for this in the VM itself. I also don't have access to Windows sources to see if it really is there (which I assume would be copied in unwinder_arm.cpp).

@rahku
Copy link

rahku commented Apr 26, 2016

I don't recall seeing this

@BruceForstall
Copy link
Member

Ok, the code is in ExceptionTracker::InitializeCrawlFrame():

    // If the OS indicated that the IP is a callsite, then adjust the ControlPC by decrementing it
    // by two. This is done because unwinding at callsite will make ControlPC point to the
    // instruction post the callsite. If a protected region ends "at" the callsite, then
    // not doing this adjustment will result in a one-off error that can result in us not finding
    // a handler.
    //
    // For async exceptions (e.g. AV), this will be false.
    //
    // We decrement by two to be in accordance with how the kernel does as well.
    if (pDispatcherContext->ControlPcIsUnwound)
    {
        ControlPCForEHSearch -= STACKWALK_CONTROLPC_ADJUST_OFFSET;

For ARM, it's defined in src\debug\inc\arm\primitives.h:

// Given a return address retrieved during stackwalk,
// this is the offset by which it should be decremented to lend somewhere in a call instruction.
#define STACKWALK_CONTROLPC_ADJUST_OFFSET 2

@parjong
Copy link
Author

parjong commented Apr 26, 2016

@BruceForstall When I checked the execution trace, the code that you mentioned is not executed since pDispatcherContext->ControlPcIsUnwond is 0.

Here is the content of pDispatcherContext at that point:

(gdb) p/x *pDispatcherContext
$5 = {ControlPc = 0x71c2d8be, ImageBase = 0x71c2d010, FunctionEntry = 0x71c4537c, EstablisherFrame = 0x7effe950, TargetPc = 0x0, ContextRecord = 0x7effe398, LanguageHandler = 0x0, HandlerData = 0x0,
  HistoryTable = 0x0, ScopeIndex = 0x0, ControlPcIsUnwound = 0x0, NonVolatileRegisters = 0x0, Reserved = 0x0}

@parjong
Copy link
Author

parjong commented Apr 26, 2016

@BruceForstall I tried to find when ControlPcIsUnwound becomes true, but it seems that there is no code for that:

coreclr/src$ grep -nR ControlPcIsUnwound
inc/crosscomp.h:168:    BOOLEAN ControlPcIsUnwound;
inc/crosscomp.h:308:    BOOLEAN ControlPcIsUnwound;
pal/inc/rt/palrt.h:1646:    BOOLEAN ControlPcIsUnwound;
pal/inc/rt/palrt.h:1664:    BOOLEAN ControlPcIsUnwound;
vm/exceptionhandling.cpp:1438:    if (pDispatcherContext->ControlPcIsUnwound)
vm/exceptionhandling.cpp:5309:    // note, also clear out the ControlPcIsUnwound field. Post discussion with
vm/exceptionhandling.cpp:5318:    pDispatcherContext->ControlPcIsUnwound = FALSE;

@parjong
Copy link
Author

parjong commented Apr 26, 2016

I looks like that the change of UnwindManagedExceptionPass1 also resolves this issue.

@@ -4541,6 +4541,7 @@ VOID DECLSPEC_NORETURN UnwindManagedExceptionPass1(PAL_SEHException& ex, CONTEXT

             dispatcherContext.EstablisherFrame = establisherFrame;
             dispatcherContext.ContextRecord = frameContext;
+            dispatcherContext.ControlPcIsUnwound = TRUE;

             // Find exception handler in the current frame
             disposition = ProcessCLRException(&ex.ExceptionRecord,

@BruceForstall
Copy link
Member

BruceForstall commented Apr 27, 2016

I general, I think this later fix is closer to what should be done. It looks like both UnwindManagedExceptionPass1 and UnwindManagedExceptionPass2 call ProcessCLRException, so both probably need to be changed. I don't think you want to change it where you specify, though, because I think it only needs to be TRUE after the first unwind (and I'm not sure the RtlVirtualUnwind that is done to find the caller-sp counts). You'll presumably need to #ifdef this for ARM. I see that this is PAL-only code, which is why we probably didn't see it on Windows, which sets up the DISPATCHER_CONTEXT correctly before calling the per-frame personality routine during a RtlUnwind, that the PAL code is simulating. @janvorli - what do you think?

@janvorli
Copy link
Member

When we get to one of the UnwindManagedExceptionPassX, we have already unwound the context to the first managed frame in the DispatchManagedException and so the PC would point right after a call except for the case when the exception was a hardware exception.
So it seems we should set it in both UnwindManagedExceptionPass1 and UnwindManagedExceptionPass2 right before we call the RtlVirtualUnwind to the result of (context->ContextFlags & CONTEXT_EXCEPTION_ACTIVE) == 0, which means that frame was not a source of hardware exception. That should cover both the case where we came there from a hardware exception and the case where we have walked through a frame with hardware exception that was still on the stack e.g. due to the fact that a hardware exception catch block has thrown another exception.

@parjong
Copy link
Author

parjong commented Apr 27, 2016

@BruceForstall @janvorli I would like to ask your opinion on the following patch:

@@ -1435,7 +1435,7 @@ void ExceptionTracker::InitializeCrawlFrame(CrawlFrame* pcfThisFrame, Thread* pT
     // For async exceptions (e.g. AV), this will be false.
     //
     // We decrement by two to be in accordance with how the kernel does as well.
-    if (pDispatcherContext->ControlPcIsUnwound)
+    if ( (pDispatcherContext->ContextRecord->ContextFlags & CONTEXT_UNWOUND_TO_CALL) != 0 )
     {
         ControlPCForEHSearch -= STACKWALK_CONTROLPC_ADJUST_OFFSET;
         if (fAdjustRegdisplayControlPC == true)

In my understanding, PC should be adjusted if the frame comes from any function call, and CONTEXT_UNWOUND_TO_CALL seems to be the flag that indicates such a case.

When I checked, the above patch also resolves this issue (without the first patch that inserts bkpt after throw instruction).

@BruceForstall
Copy link
Member

I like the change suggested by @janvorli better, because it isolates it to the PAL case, and doesn't affect the Windows side, and also makes PAL unwind work "more" like Windows already does.

@parjong
Copy link
Author

parjong commented Apr 27, 2016

@BruceForstall @janvorli I revised the patch per feedback. It also resolves this issue when I checked.

Enable the patch only for ARM32 in order to fix build break in other
architecture.
@BruceForstall
Copy link
Member

Looks good to me, but @janvorli should have the final say.

@janvorli
Copy link
Member

LGTM

@janvorli
Copy link
Member

@mmitche something weird started to happen for the Ubuntu tests - bunch of them started to fail due to an attempt to load windows dll api-ms-win-core-file-l1-1-0.dll. So it seems that we're somehow using windows assemblies instead of the Linux ones for the test. Do you have any idea how that could happen?

@parjong
Copy link
Author

parjong commented Apr 28, 2016

@dotnet-bot test Ubuntu x64 Checked Build and Test please

@parjong
Copy link
Author

parjong commented Apr 28, 2016

@dotnet-bot test OSX x64 Checked Build and Test please

@parjong
Copy link
Author

parjong commented Apr 29, 2016

@janvorli Is there any issue with Ubuntu x64 Checked Build and Test? I tried to find the details why it fails, but I can't find. Please let me know how to check the issue.

@jkotas
Copy link
Member

jkotas commented Apr 29, 2016

Yes, we have issue with failing tests on Ubuntu and MacOSX. To find more info about the failures, click: Details / Console Output / . You will see that there are about 20 tests failing because of they are trying to load Windows .dlls - it is the known issue.

@parjong
Copy link
Author

parjong commented Apr 29, 2016

@jkotas Oh, thank you. I finally find the log that shows why it fails.

@dnfclas
Copy link

dnfclas commented Apr 29, 2016

@parjong, Thanks for signing the contribution license agreement so quickly! Actual humans will now validate the agreement and then evaluate the PR.

Thanks, DNFBOT;

@parjong
Copy link
Author

parjong commented May 1, 2016

@dotnet-bot test Ubuntu x64 Checked Build and Test please

@parjong
Copy link
Author

parjong commented May 2, 2016

@dotnet-bot test OSX x64 Checked Build and Test please

@parjong
Copy link
Author

parjong commented May 2, 2016

@janvorli @jkotas It looks like that CI works for Ubuntu x64 Checked, but does not work for OSX x64. For OSX x64, it seems that CI failed to access required artifacts:

Unable to access upstream artifacts area /jenkins/jobs/dotnet_coreclr/jobs/master/jobs/checked_windows_nt_bld_prtest/builds/2126/archive. Does source project archive artifacts?

Is there any update on OSX CI?

@jkotas
Copy link
Member

jkotas commented May 2, 2016

The OS X failures look like infrastructure issue. Sorry about that... .

@jkotas jkotas merged commit 0a88936 into dotnet:master May 2, 2016
@parjong
Copy link
Author

parjong commented May 2, 2016

@jkotas Thank you for kind answer 👍

@parjong parjong deleted the fix/issue_4496 branch May 2, 2016 02:25
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
* Fix dotnet/coreclr#4496

This commit revises legecy codegen to inserts a bkpt instruction if the current BB
and next BB is not in the same exception handling region.

This code is backported from ARM64 codegen.

* Update ControlPcIsUnwound in UnwindManagedExceptionPassX

* Enable only for ARM32

Enable the patch only for ARM32 in order to fix build break in other
architecture.


Commit migrated from dotnet/coreclr@0a88936
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
9 participants