This repository was archived by the owner on Jan 23, 2023. It is now read-only.

mikem8361

The attached flag was being set asynchronously relative to DebugActiveProcess
returning. This could cause a race where the initial module load notification is
missed or not sent to the debugger.

This fix sets the attached flag before any notifications are sent during launch, if the runtime was
launched/attached using the startup handshake, after dbgshim tells the runtime to "continue"
when the runtime startup API callback returns.

Issue: https://github.com/dotnet/coreclr/issues/8606
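
The ordering problem described above can be modeled in a few lines. This is a minimal sketch under stated assumptions, not the runtime's code: `RuntimeModel`, `Notify`, `LaunchWithLateFlag`, and `LaunchWithFlagFirst` are hypothetical names, and the asynchronous flag update is modeled simply as happening after the first notification has been attempted.

```cpp
#include <string>
#include <vector>

// Hypothetical model of the runtime side: a notification is delivered only
// while a debugger is marked as attached; otherwise it is dropped.
struct RuntimeModel
{
    bool attached = false;
    std::vector<std::string> delivered;

    void Notify(const std::string& name)
    {
        if (attached)
            delivered.push_back(name);
    }
};

// Racy ordering: the flag is set "late", modeled here as after the initial
// module load notification has already been attempted.
RuntimeModel LaunchWithLateFlag()
{
    RuntimeModel rt;
    rt.Notify("LoadModule");   // dropped: attached is still false
    rt.attached = true;
    return rt;
}

// Fixed ordering: the flag is set before any notification is sent.
RuntimeModel LaunchWithFlagFirst()
{
    RuntimeModel rt;
    rt.attached = true;
    rt.Notify("LoadModule");   // delivered
    return rt;
}
```

In the model, the first sequence loses the module load notification and the second one delivers it, which is the behavior change the description is aiming for.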

@mikem8361 mikem8361 added this to the 2.0.0 milestone Jan 13, 2017
@mikem8361 mikem8361 self-assigned this Jan 13, 2017
@mikem8361 mikem8361 requested a review from noahfalk January 13, 2017 23:52
@mikem8361
Author

FYI @gregg-miskelly. If you could somehow verify this fixes your race, that would be great. The change isn't that big (only two files and localized); you could apply it to your VS copy/branch/version of coreclr.

@mikem8361
Author

@dotnet-bot test Ubuntu x64 Checked Build and Test

@mikem8361
Author

@dotnet-bot test OSX x64 Checked Build and Test

@gregg-miskelly

@mikem8361 It will take me a little while to get set up again, but I will try to do that next week.

@mikem8361
Author

@noahfalk could you review?

@noahfalk
Member

I haven't had time to give this deep scrutiny, but it looks like a correct implementation of what we discussed and I'd say go ahead and check it in.

@mikem8361
Author

@gregg-miskelly @noahfalk I just pushed another commit that addresses the launch race condition Gregg pointed out for another issue they ran into. Could you both look at the new changes?

@gregg-miskelly

                        _ASSERTE(coreclrExists);

Should we change this from an assert to some sort of real failure? Presumably this is what will happen if we ever can't get the continue event for some reason.


Refers to: src/dlls/dbgshim/dbgshim.cpp:551 in 3eb0e03.

@gregg-miskelly gregg-miskelly left a comment

Otherwise LGTM

{
break;
}
}

If we don't find anything, would it make more sense to wait for the startup event to get set? For example:

        CloseCLREnumeration(*ppHandleArrayOut, *ppStringArrayOut, *pdwArrayLengthOut);
        *ppHandleArrayOut = NULL;
        *ppStringArrayOut = NULL;
        *pdwArrayLengthOut = 0;

        return S_OK;

If we do decide to retry, we need to free the memory before doing so or we will leak.
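
The leak concern can be sketched as follows. This is only an illustration under stated assumptions: `ClrEnumeration`, `AllocateEnumeration`, `CloseEnumeration`, `FoundAnything`, `EnumerateWithRetry`, and the `LiveEnumerations` counter are hypothetical stand-ins, not the dbgshim functions. The point is just that each failed attempt is closed before the next one starts.

```cpp
#include <cstddef>
#include <functional>

// Hypothetical stand-in for the handle array returned by an enumeration call.
struct ClrEnumeration
{
    void** handles = nullptr;
    std::size_t length = 0;
};

// Number of enumerations allocated but not yet closed (illustration only).
inline int& LiveEnumerations()
{
    static int count = 0;
    return count;
}

ClrEnumeration AllocateEnumeration(std::size_t length)
{
    ClrEnumeration e;
    e.handles = new void*[length]();   // value-initialized: every entry is null
    e.length = length;
    ++LiveEnumerations();
    return e;
}

void CloseEnumeration(ClrEnumeration& e)
{
    if (e.handles != nullptr)
    {
        delete[] e.handles;
        --LiveEnumerations();
    }
    e.handles = nullptr;
    e.length = 0;
}

bool FoundAnything(const ClrEnumeration& e)
{
    for (std::size_t i = 0; i < e.length; ++i)
    {
        if (e.handles[i] != nullptr)
            return true;
    }
    return false;
}

// Retries while nothing usable was found, closing each failed result first so
// that a retry never abandons (and leaks) the previous arrays.
ClrEnumeration EnumerateWithRetry(const std::function<ClrEnumeration()>& enumerate,
                                  int maxTries)
{
    for (int attempt = 0; attempt < maxTries; ++attempt)
    {
        ClrEnumeration result = enumerate();
        if (FoundAnything(result))
            return result;             // the caller owns this one and must close it
        CloseEnumeration(result);
        // A real retry loop would sleep or wait on an event here.
    }
    return ClrEnumeration{};           // nothing found: empty result
}
```

After a successful call exactly one enumeration is still live, and it is released when the caller closes it.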

@gregg-miskelly

BTW: Not sure if it matters, but we might want to port the dbgshim part of the fix to some release branch to produce a new dbgshim that we can ship with VS 2017 RTW.

@mikem8361
Author

mikem8361 commented Jan 18, 2017 via email

@gregg-miskelly

Definitely. If you can port this whole PR to the release branch, I will definitely not complain :) But since the ship vehicles are somewhat different (.NET Core vs. VS), I figured I would call it out.

@mikem8361
Author

mikem8361 commented Jan 18, 2017 via email

@gregg-miskelly

StartupHelperThread calls InvokeStartupCallback up to twice.

The first call is good, as you said: it will just wait for the startup event.
The second call is good as long as coreclrExists comes back true. But if we have some as-yet-unknown bug where we can't get the continue event (or are otherwise unable to find a coreclr), dbgshim will currently assert, but otherwise just silently exit without invoking the callback. See line 551.
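
One way to read this suggestion, as a hedged sketch: report the outcome to the callback in every case instead of relying on a debug-only assert. `StartupResult`, `StartupCallback`, and `ReportStartup` are hypothetical names chosen for illustration; the real callback signature and error codes are not shown in this thread.

```cpp
#include <functional>

// Hypothetical result codes, used here in place of real error values.
enum class StartupResult { Ok, RuntimeNotFound };

using StartupCallback = std::function<void(StartupResult)>;

// Invokes the callback on both paths. An assert alone disappears in builds
// where asserts are compiled out, which would leave the caller waiting for a
// callback that never arrives.
void ReportStartup(bool coreclrExists, const StartupCallback& callback)
{
    if (coreclrExists)
    {
        callback(StartupResult::Ok);
    }
    else
    {
        callback(StartupResult::RuntimeNotFound);
    }
}
```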

@mikem8361
Author

mikem8361 commented Jan 18, 2017 via email

HANDLE h = (*ppHandleArray)[i];
if (h != NULL && h != INVALID_HANDLE_VALUE)
{
break;
Member

I think you intended to break out of the while loop? This is going to break out of the for loop only. This version looks like it loops 25 times and then returns S_OK, assuming it was successful on the last iteration.

I think you also wanted to retry if any handle was NULL, but if the break applied to the outer while loop this code retries only if all handles are NULL.
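
The pitfall can be avoided by moving the scan into a helper that returns a bool, so the outer retry loop has one explicit exit condition. The sketch below is an illustration only: `Handle`, `InvalidHandle`, `AllHandlesReady`, and `RetryUntilReady` are hypothetical stand-ins (for Win32 `HANDLE`, `INVALID_HANDLE_VALUE`, and the loop under review), and the sleep between attempts is omitted.

```cpp
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-ins for Win32 HANDLE and INVALID_HANDLE_VALUE.
using Handle = void*;
inline Handle InvalidHandle()
{
    return reinterpret_cast<Handle>(static_cast<std::intptr_t>(-1));
}

// True only if every handle is usable. Returning from a helper sidesteps the
// question of which loop a nested break leaves.
bool AllHandlesReady(const std::vector<Handle>& handles)
{
    for (Handle h : handles)
    {
        if (h == nullptr || h == InvalidHandle())
            return false;
    }
    return true;
}

// Calls enumerate up to maxTries times. Returns the 1-based attempt that
// produced a fully usable set of handles, or 0 if no attempt did.
int RetryUntilReady(const std::function<std::vector<Handle>()>& enumerate,
                    int maxTries)
{
    for (int attempt = 1; attempt <= maxTries; ++attempt)
    {
        if (AllHandlesReady(enumerate()))
            return attempt;
        // A real loop would sleep here before trying again.
    }
    return 0;   // exhausted: the caller should not treat this as success
}
```

With this shape, running out of attempts is reported distinctly rather than falling through to a success code.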

Author

Thanks for catching that. I've redone the changes and they should be correct now.

@mikem8361
Author

mikem8361 commented Jan 18, 2017 via email

for (int i = 0; i < (int)arrayLength; i++)
{
HANDLE h = handleArray[i];
if (h == NULL)

Should this code treat NULL and INVALID_HANDLE_VALUE the same?

Author

I wasn't really sure if it should check for INVALID_HANDLE_VALUE. It looks to me like the only way it could be INVALID_HANDLE_VALUE is if CreateEvent fails on the runtime side. For that unlikely condition (the runtime has a CONSISTENCY_CHECK for it, which I think is DEBUG only), dbgshim should probably fail instead of sleep/retry.

I am not sure, but the comment says that it could. Fix the comment?

            // fill in event handle -- if GetContinueStartupEvent fails, it will still return 
            // INVALID_HANDLE_VALUE in hContinueStartupEvent, which is what we want.  we don't
            // want to bail out of the enumeration altogether if we can't get an event from
            // one telesto.

            HANDLE hContinueStartupEvent = INVALID_HANDLE_VALUE;
            HRESULT hr = GetContinueStartupEvent(debuggeePID, pStringArray[idx], &hContinueStartupEvent);
            _ASSERTE(SUCCEEDED(hr) == (hContinueStartupEvent != INVALID_HANDLE_VALUE));

            pEventArray[idx] = hContinueStartupEvent;

Actually, it looks like there are at least some code paths where failure would be INVALID_HANDLE_VALUE (unless I am missing something)

// If EnumerateCLRs succeeded but any of the handles are null, then sleep and retry also. This fixes a race
// condition where dbgshim catches the coreclr module just being loaded but before g_hContinueStartupEvent
// has been initialized.
if (NoNullHandles(*ppHandleArray, *pdwArrayLength))

Reading EnumerateCLRs, it looks like we just set the handles to NULL on Unix. So should this code be ifdefed to always return hr; on Unix?

#ifndef FEATURE_PAL
            // fill in event handle -- if GetContinueStartupEvent fails, it will still return 
            // INVALID_HANDLE_VALUE in hContinueStartupEvent, which is what we want.  we don't
...
#else
            pEventArray[idx] = NULL;
#endif // FEATURE_PAL

Author

On unix, this function isn't used by the RegisterForRuntimeStartup and used in general. The only thing we can return is the array of module name strings.

Ah, okay.

@mikem8361
Author

mikem8361 commented Jan 18, 2017 via email

@mikem8361
Author

It looks like I have to back out the "if any handle is null, wait/retry" logic because it will get hit for a late attach (the runtime NULLs the continue event after the "check if launched"/wait logic). This will cause delays for late attach.

And it seems to be causing almost all of the debugger tests to fail. They are launching their debuggees and the logging I've added shows that everything is normal (no null handles/no retries). I'm still trying to come up with an explanation.

{
HANDLE h = handleArray[i];
- if (h == NULL)
+ if (h == INVALID_HANDLE_VALUE)

In the case of older CoreCLRs, this will still be NULL, so should we check both?

@mikem8361
Author

mikem8361 commented Jan 19, 2017 via email

@gregg-miskelly

Ah. Makes sense. Thanks for the clarification.

@mikem8361
Author

@noahfalk (and @gregg-miskelly optionally). This is ready for a final code review now. I decided to leave initializing g_hContinueStartupEvent = INVALID_HANDLE_VALUE (-1) because it looks like it does get loaded that way by the loader. It should be just all the changes in dbgshim.cpp and in debugger.cpp.

{
return hr;
}
// If EnumerateCLRs succeeded but any of the handles are null, then sleep and retry also. This fixes a race
Member

"null"

I think you meant to say INVALID_HANDLE_VALUE

@noahfalk
Member

        _ASSERTE(SUCCEEDED(hr) == (hContinueStartupEvent != INVALID_HANDLE_VALUE));

This assert is no longer true


Refers to: src/dlls/dbgshim/dbgshim.cpp:1166 in 383cc0c.

@noahfalk
Member

:shipit:

The attached flag was being set asynchronously relative to DebugActiveProcess
returning. This could cause a race where the initial module load notification is
missed or not sent to the debugger.

This fix sets the attached flag before any notifications are sent during launch, if the runtime was
launched/attached using the startup handshake, after dbgshim tells the runtime to "continue"
when the runtime startup API callback returns.

Also fixes another race condition in dbgshim where EnumerateCLRs returns a NULL continue event
handle because the coreclr module was loaded but g_hContinueStartupEvent wasn't initialized
on the runtime side yet. Changed the static initialization of g_hContinueStartupEvent to
INVALID_HANDLE_VALUE and the InternalEnumerateCLRs sleep/retry loop to retry when any of the
handles are INVALID_HANDLE_VALUE. This fixes the race only when you have the latest dbgshim
and coreclr binaries; old/new mixes still function but don't fix the race.
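
The resulting three-way distinction can be sketched as below. This is an illustration under stated assumptions, not the merged code: `Handle`, `InvalidHandle`, `EventState`, `Classify`, and `ShouldRetry` are hypothetical names. The reading of a null handle (late attach, or an older runtime) follows the review discussion above.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for Win32 HANDLE and INVALID_HANDLE_VALUE.
using Handle = void*;
inline Handle InvalidHandle()
{
    return reinterpret_cast<Handle>(static_cast<std::intptr_t>(-1));
}

enum class EventState
{
    NotInitialized,  // sentinel still in place: the event has not been created yet
    NoEvent,         // null: late attach, or an older runtime without the sentinel
    Ready            // a usable continue event
};

EventState Classify(Handle h)
{
    if (h == InvalidHandle())
        return EventState::NotInitialized;
    if (h == nullptr)
        return EventState::NoEvent;
    return EventState::Ready;
}

// Retry only while some runtime still reports the sentinel. Null handles do
// not trigger a retry, so a late attach is not delayed.
bool ShouldRetry(const std::vector<Handle>& handles)
{
    for (Handle h : handles)
    {
        if (Classify(h) == EventState::NotInitialized)
            return true;
    }
    return false;
}
```

Because an older runtime never reports the sentinel, a new dbgshim paired with it simply never retries, which matches the note that old/new mixes keep working without fixing the race.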
@mikem8361
Author

@dotnet-bot test Windows_NT x64 Debug Build and Test

@mikem8361 mikem8361 merged commit 3d76888 into dotnet:master Jan 21, 2017
@mikem8361 mikem8361 deleted the fixlaunchrace branch January 21, 2017 01:46
mikem8361 pushed a commit to mikem8361/coreclr that referenced this pull request Jan 23, 2017
…#8951)

mikem8361 pushed a commit to mikem8361/coreclr that referenced this pull request Jan 23, 2017
…#8951)

mikem8361 pushed a commit to mikem8361/coreclr that referenced this pull request Jan 23, 2017
…#8951)

mikem8361 pushed a commit to mikem8361/coreclr that referenced this pull request Jan 23, 2017
…#8951)

mikem8361 pushed a commit to mikem8361/coreclr that referenced this pull request Jan 23, 2017
…#8951)

mikem8361 pushed a commit that referenced this pull request Feb 1, 2017
…#9062)

mikem8361 pushed a commit that referenced this pull request Feb 1, 2017
…#9060)

picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
…/coreclr#8951)

Commit migrated from dotnet/coreclr@3d76888