Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

segfault in InteractionProtocolHandler_Session_Connect in debug builds on failed connections. #502

Closed
dantraMSFT opened this issue Mar 28, 2018 · 17 comments · Fixed by #540

Comments

@dantraMSFT
Copy link
Contributor

Connecting through PowerShell/PSRP using basic auth with invalid credentials causes a segfault in debug builds.

The problem is due to an invalid pointer dereference.

At the lowest level, when the error is posted, PostResult has the following lines at the end:

Message_AddRef(&errorMsg->base);
Strand_ScheduleAux(&self->strand, PROTOCOLSOCKET_STRANDAUX_POSTMSG);
PostResultMsg_Release(errorMsg);

After calling Strand_ScheduleAux, the operation that was passed into InteractionProtocolHandler_Session_Connect has been freed (in debug builds, the memory block is filled with 0xdddddddddddddddd). This causes the seg fault for the following reason:

1: MI_RESULT_ACCESS_DEFINED gets returned to WsmanClient_New_Connector

2: WsmanClient_New_Connector jumps to finished2 and posts another error result but then returns MI_RESULT_OK to the caller.

3: InteractionProtocolHandler_Session_Connect receives the MI_RESULT_OK and attempts to continue but it is referencing a stale memory pointer; it checks for a non-null pointer in one of the fields. In release builds, this check succeeds, the memory block is zero filled. In debug builds, the field is non-null (0xdddddddddddddddd) and the pointer is dereferenced causing a segfault.

Returning MI_RESULT_OK appears to be suspect since it requires dereferencing a stale memory pointer. Alternatively, the operation is getting deleted when it should not be.

@JumpingYang001
Copy link
Contributor

I will check it today.

@JumpingYang001
Copy link
Contributor

we repro the issue today, will continue to investigate it tomorrow.

@JumpingYang001
Copy link
Contributor

I have fixed the previous issue cause this issue, and now it will never hit or go into this issue, so just close it.

@JumpingYang001
Copy link
Contributor

PowerShell/psl-omi-provider#117 can repro this issue, will fix it soon.

@JumpingYang001 JumpingYang001 reopened this Apr 3, 2018
@JumpingYang001
Copy link
Contributor

JumpingYang001 commented Apr 3, 2018

Found the root cause now: DEBUG build enable ENABLE_CLEARING
https://github.com/Microsoft/omi/blob/edbe231042173018c52903e4170dd8b782fc4f8a/Unix/pal/alloc.c#L383-L385
https://github.com/Microsoft/omi/blob/edbe231042173018c52903e4170dd8b782fc4f8a/Unix/miapi/InteractionProtocolHandler.c#L560

debug stack:

InteractionProtocolHandler_Operation_Strand_Finish (self_=0x7f77940d35d8) at InteractionProtocolHandler.c:561
561         PAL_Free(operation);
(gdb) p operation->protocolConnection
$23 = (InteractionProtocolHandler_ProtocolConnection *) 0x7f77940d35c8
(gdb) s
__PAL_Free (file=0x7f7776fbd786 "InteractionProtocolHandler.c", line=561,
    func=0x7f7776fbdac0 <__FUNCTION__.19210> "InteractionProtocolHandler_Operation_Strand_Finish",
    ptr=0x7f77940d5e68) at alloc.c:377
377     {
(gdb) finish
Run till exit from #0  __PAL_Free (file=0x7f7776fbd786 "InteractionProtocolHandler.c", line=561,
    func=0x7f7776fbdac0 <__FUNCTION__.19210> "InteractionProtocolHandler_Operation_Strand_Finish",
    ptr=0x7f77940d5e68) at alloc.c:377
InteractionProtocolHandler_Operation_Strand_Finish (self_=0x7f77940d35d8) at InteractionProtocolHandler.c:562
562     }
(gdb) p operation->protocolConnection
$24 = (InteractionProtocolHandler_ProtocolConnection *) 0xdddddddddddddddd

@JumpingYang001
Copy link
Contributor

@paulcallen @palladia , just remove the ENABLE_CLEARING flag to fix this issue?

@dantraMSFT
Copy link
Contributor Author

That flag is a good thing, it's telling us we're using a stale memory pointer. If you remove the flag, it simply masks the problems.
This is what is occurring in release mode. right now, its a stale pointer problem and the code is getting away with it. It could easily turn into memory corruption problem later on if the code changes.

@JumpingYang001
Copy link
Contributor

@dantraMSFT , so we needn't do anything for this debug crash issue? and so I can close this issue now?

@JumpingYang001
Copy link
Contributor

@dantraMSFT , can I close this? since we have nothing need to fix for the flag.

@dantraMSFT
Copy link
Contributor Author

@JumpingYang001: no, the flag is indicating there is a bug in the code because it's using a stale memory pointer. This needs to be fixed.

@JumpingYang001
Copy link
Contributor

JumpingYang001 commented Apr 9, 2018

@dantraMSFT but that only occurs on debug build, the flag cause the pointer to stale,am I misunderstanding?
If we disable the flag, the stale pointer issue will disappear and everything works fine for debug build.
operation->protocolConnection become 0xdd when it was in PAL_Free() function.
We only use that flag on this function for the whole project.

@JumpingYang001
Copy link
Contributor

@dantraMSFT we have verified omi 1.4.1-0 exist this crash issue and it is fixed on omi 1.4.2-2, so I just close it, thank you!
Please update the submodule of PSRP to omi 1.4.2-2, thanks.

@dantraMSFT
Copy link
Contributor Author

What is the PR that fixes this?

Regarding the flag, disabling the flag does not make the stale pointer issue go away. it's there in both release and debug builds. the reason you see it with the flag is the flag explicit fills memory at free time with a value that is guaranteed to be an invalid pointer. That's why the segfault only occurs in debug build. There's still a stale memory reference in both release and debug builds.

As I mentioned above, this appears to only occur creating the connection fails. The operation passed by InteractionProtocolHandler_Session_Connect is getting deleted but InteractionProtocolHandler_Session_Connect has no way of knowing this.

@JumpingYang001
Copy link
Contributor

@dantraMSFT yes, you are right, disable the flag doesn't fix the crash, we just verify 1.4.2-2 fixed the issue, Let me check the PR, thanks.

@JumpingYang001
Copy link
Contributor

@dantraMSFT We fixed a message processing issue on this PR: ba1958e that fixed the issue, too.

@dantraMSFT
Copy link
Contributor Author

dantraMSFT commented May 18, 2018

@JumpingYang001 and @palladia: This issue needs to be reactivated. I'm seeing a double free fault in release builds that are failing our CI builds and can also repro the above invalid pointer segfault in 100% of the time in debug builds in private builds. I'm building against omi label 1.4.2-2

The repro for debug builds is connsistent when using an invalid computer name:
$session = new-pssession -ComputerName invalidcomputername -Credential $cred -Authentication Negotiate (or Kerberos)

I have a core dump if needed.

@JumpingYang001
Copy link
Contributor

Requested by Dan to re-open it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants