Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make sure winexec is 16 byte aligned and add nCmdShow option #2308

Merged
merged 9 commits into from Dec 14, 2023

Conversation

kizzx2
Copy link
Contributor

@kizzx2 kizzx2 commented Nov 26, 2023

Pwntools Pull Request

Thanks for contributing to Pwntools! Take a moment to look at CONTRIBUTING.md to make sure you're familiar with Pwntools development.

WinExec currently errors out when tested on Windows 11 x64. After some diagnosis it appears that pushstr aligns to 8 bytes but Windows calling convention requires 16 byte. Perhaps this was not enforced in previous versions of Windows but apparently that is the standard requirement.

Example programs to reproduce the issue:

import pwn

pwn.context.arch = "amd64"
sc = pwn.asm(pwn.shellcraft.amd64.windows.winexec("calc.exe")+"ret")
print("""unsigned char buf[] = {""" + ", ".join(map(hex, sc)) + """};""")
#include <Windows.h>
#include <stdio.h>

unsigned char buf[] = {}; // ... paste the shell code here

int main() {
    void *exec = VirtualAlloc(0, sizeof(buf), MEM_COMMIT, PAGE_EXECUTE_READWRITE);
    memcpy(exec, buf, sizeof(buf));
    ((void(*)())exec)();
    printf("Done!\n");
    return 0;
}

Screenshot:

image

As seen above, rsp is not 16 byte aligned causes movaps to error out

Additionally, nCmdShow argument support is added

Testing

Pull Requests that introduce new code should try to add doctests for that code. See TESTING.md for more information.

Target Branch

Depending on what the PR is for, it needs to target a different branch.

You can always change the branch after you create the PR if it's against the wrong branch.

Branch Type of PR
dev New features, and enhancements
dev Documentation fixes and new tests
stable Bug fixes that affect the current stable branch
beta Bug fixes that affect the current beta branch, but not stable
dev Bug fixes for code that has never been released

Changelog

After creating your Pull Request, please add and push a commit that updates the changelog for the appropriate branch.
You can look at the existing changelog for examples of how to do this.

@Arusekk
Copy link
Member

Arusekk commented Nov 26, 2023 via email

@kizzx2
Copy link
Contributor Author

kizzx2 commented Nov 26, 2023

Thanks for the reply!

as there are better tools for that (metasploit for instance)

My experience suggests otherwise. It is extremely convoluted, if not impossible (I never managed to get it work after some trials and errors) to get any Metaploit shellcode to work on modern Windows without Defender flagging it, due to its popularity. (That's how I landed here ;)

Metasploit's address loader (to get position independence) with its customized "hashing algorithm" is probably too popular. Even something like msfvenom -p windows/x64/exec CMD=calc.exe just wouldn't work even with layers of encoding / encryptions that bloat a 900 byte shellcode into 5000 (Happy to learn if there's something I missed. I guess it might work if the msf payload is broken into parts with some tricky jmps in between -- but then might as well handcode the whole thing)

I would prefer the stack move to be in a separate variable, and then the extra alignment to be just stack_move & 8, because the tons of math involved seem too convoluted to read.

Anyway, regarding this PR -- happy to change it in a way that's more obvious but not sure what you meant by a "separate variable"? Anyway to shed some clarity this is what the math is basically trying to do:

  1. align(8, len(cmd) + 1) This how much amd64.pushstr would have advanced the stack. I believe the original code was also bugged that the null byte was not accounted for. pushstr in the default param would have added a null byte so we need to account for it here as well, that's why +1.
  2. align(8, len(cmd) + 1) // 8 % 2

To see what it does:

align(8, len(cmd)+1) align(8,len(cmd)+1)//8%2
8 1
16 0
24 1
32 0
  1. Basically, if pushstr moved us forward an 8-byte aligned value on the stack (instead of 16), then the above would evaluate to 1

  2. Now, before pushstr, the stack is already 8-byte aligned due to the shellcraft not doing the push rbp "function prolog". Therefore, we can say that the existing code is also bugged in that way that it does not align the stack before calling WinExec. (for handcoded x64 windows assembly withotu function prologs, people sometimes manually sub rsp, 8 before function calls) (see below for an explanation of why this bug was not noticed earlier)

  3. Thus the logic is

    • If pushstr pushed us through a 8-byte multiple, then we'll actually be 16-byte aligned, good to go
    • If pushstr pushed us through an 16-byte multiple, then we'll actually be 8-byte aligned. This will cause an error in WinExec -- we need to push another 8 byte to align ourselves
  4. From 5, we can see that we need to basically invert the result of step 3. We can do that by ^ 1 on step 3's result. So the final expression is

pad = align(8, len(cmd) + 1) // 8 % 2 ^ 1 * 8 # * 8 because we do 8 bytes multiple. Surprisingly the operator precedences just worked out without extra parentheses
  1. Now, we can see why the existing code probably worked for the original use case even though it was "bugged" -- cmd.exe plus null byte is exactly 8 bytes, and probably that was what this function was tested on. amd64.pushstr would have pushed the stack 8 bytes thus "accidentally" aligning the stack to 16-byte alignment. However, the existing implementation would only work for a command that is 1-7 characters long (or 16-23 characters long, etc.). calc.exe plus null byte is incidentally 9 bytes long, thus not accidentally aligning the stack.

and there is no easy way to test the shellcraft payloads on the CI

Yes, thus tried to add screenshots and sufficient PoC codes so this can be hopefully easier to verify.

Copy link
Member

@peace-maker peace-maker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, thank you!
The shellcode should be null and newline free and so those dynamic stack shifts feel bad. Could we get away with some and rsp, -32 to align it down?
https://github.com/masthoon/pwintools/blob/master/pwintools.py#L951

Edit: I guess it's not about the sub with the 0x30 or 0x38 constants, but more about the add if it exceeds 0x7f?

pwnlib/shellcraft/templates/amd64/windows/winexec.asm Outdated Show resolved Hide resolved
@kizzx2
Copy link
Contributor Author

kizzx2 commented Dec 8, 2023

Sure, just pushed that please take a look

@peace-maker
Copy link
Member

Thanks, this looks good now. The only problem I still see with this shellcode (while we're at it) is the add instruction inserting null bytes for longer commands. How is this solved in similar places in shellcraft? The easiest solution would be to do

${amd64.mov('rcx', stack_frame + stack_frame_align)}
add rsp, rcx

to let mov handle the value.

@peace-maker peace-maker merged commit c7649c9 into Gallopsled:dev Dec 14, 2023
10 of 12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants