# XORSTR Generic String Decryption
> Writing a generic string decryptor for this open source library

- toc: true 
- badges: true
- categories: [xorstr,decryption,python]

## Overview
The open source string encryption library [xorstr](https://github.com/JustasMasiulis/xorstr) has been adopted by multiple malware developers, and slight variations on the same technique are also in use. The string encryptor makes use of the `xmm`/`ymm` registers and the `pxor`/`vpxor` instructions to decrypt stack strings. In the words of the developer...

- All keys are 64bit and generated during compile time.
- Data blocks go in increments of 16 bytes so some space may be wasted.
- The code has been crafted so that all the data would be embedded directly into code and not stored on .rdata and such.
- The entirety of string encryption and decryption will be inlined.
 
The following is an example of the library in use.

```
.text:00411E14 C7 44 24 08 25 7B 87 92                       mov     [esp+60h+var_58], 92877B25h
.text:00411E1C 0F 57 C0                                      xorps   xmm0, xmm0
.text:00411E1F C7 44 24 0C B6 10 A7 1F                       mov     [esp+60h+var_54], 1FA710B6h
.text:00411E27 8B 44 24 08                                   mov     eax, [esp+60h+var_58]
.text:00411E2B 8B 4C 24 0C                                   mov     ecx, [esp+60h+var_54]
.text:00411E2F 89 44 24 10                                   mov     dword ptr [esp+60h+var_50], eax
.text:00411E33 89 4C 24 14                                   mov     dword ptr [esp+60h+var_50+4], ecx
.text:00411E37 C7 44 24 08 D1 77 20 5B                       mov     [esp+60h+var_58], 5B2077D1h
.text:00411E3F C7 44 24 0C C5 36 32 7E                       mov     [esp+60h+var_54], 7E3236C5h
.text:00411E47 8B 44 24 08                                   mov     eax, [esp+60h+var_58]
.text:00411E4B 8B 4C 24 0C                                   mov     ecx, [esp+60h+var_54]
.text:00411E4F 89 44 24 18                                   mov     dword ptr [esp+60h+var_50+8], eax
.text:00411E53 89 4C 24 1C                                   mov     dword ptr [esp+60h+var_50+0Ch], ecx
.text:00411E57 C7 44 24 08 6D 1E EB FE                       mov     [esp+60h+var_58], 0FEEB1E6Dh
.text:00411E5F C7 44 24 0C D9 3C 87 48                       mov     [esp+60h+var_54], 48873CD9h
.text:00411E67 8B 44 24 08                                   mov     eax, [esp+60h+var_58]
.text:00411E6B 8B 4C 24 0C                                   mov     ecx, [esp+60h+var_54]
.text:00411E6F C7 44 24 08 BE 05 4C 3F                       mov     [esp+60h+var_58], 3F4C05BEh
.text:00411E77 89 44 24 40                                   mov     dword ptr [esp+60h+var_20], eax
.text:00411E7B C7 44 24 0C E4 36 32 7E                       mov     [esp+60h+var_54], 7E3236E4h
.text:00411E83 8B 44 24 08                                   mov     eax, [esp+60h+var_58]
.text:00411E87 89 4C 24 44                                   mov     dword ptr [esp+60h+var_20+4], ecx
.text:00411E8B 8B 4C 24 0C                                   mov     ecx, [esp+60h+var_54]
.text:00411E8F 89 44 24 48                                   mov     dword ptr [esp+60h+var_20+8], eax
.text:00411E93 8D 44 24 10                                   lea     eax, [esp+60h+var_50]
.text:00411E97 89 4C 24 4C                                   mov     dword ptr [esp+60h+var_20+0Ch], ecx
.text:00411E9B 8D 50 01                                      lea     edx, [eax+1]
.text:00411E9E 0F 28 4C 24 40                                movaps  xmm1, [esp+60h+var_20]
.text:00411EA3 66 0F EF 4C 24 10                             pxor    xmm1, [esp+60h+var_50]
.text:00411EA9 0F 29 4C 24 10                                movaps  [esp+60h+var_50], xmm1
.text:00411EAE 0F 29 44 24 20                                movaps  [esp+60h+var_40], xmm0
.text:00411EB3 C7 44 24 30 00 00 00 00                       mov     [esp+60h+var_30], 0
.text:00411EBB C7 44 24 34 00 00 00 00                       mov     [esp+60h+var_2C], 0
```
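To make the listing concrete, the sketch below redoes the final `pxor` by hand. The two 16-byte buffers are built from the DWORD immediates that the listing writes to `var_50` and `var_20`; the helper names `pack_dwords` and `xor_blocks` are ours and are not part of the xorstr library.

```python
import struct

def pack_dwords(dwords):
    # Each immediate is stored little-endian on the stack
    return struct.pack("<4I", *dwords)

def xor_blocks(a, b):
    # Byte-wise XOR of two equal-length buffers, the same result pxor produces
    return bytes(x ^ y for x, y in zip(a, b))

# Immediates copied from the listing above
var_50 = pack_dwords([0x92877B25, 0x1FA710B6, 0x5B2077D1, 0x7E3236C5])
var_20 = pack_dwords([0xFEEB1E6D, 0x48873CD9, 0x3F4C05BE, 0x7E3236E4])

plain = xor_blocks(var_20, var_50)
# Strip the null padding that fills out the 16-byte block
print(plain.rstrip(b"\x00").decode("ascii"))
```

Running this prints `Hello, World!`, followed in the raw buffer by three null bytes of padding, which matches the "increments of 16 bytes" note from the developer.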

### Samples

- maybe cheat
  - `92394d5c170060b09ba4ffba450f44d5d4387693a00ee1aba910a818fa387b31`
- metastealer
  - `6cf8bfba1b221effcb1eccec0c91fb0906d0b8996932167f654680cb3ac53aac`
- meduza stealer x64
  - `2ad84bfff7d5257fdeb81b4b52b8e0115f26e8e0cdaa014f9e3084f518aa6149`
- meduza stealer x32
  - `29cf1ba279615a9f4c31d6441dd7c93f5b8a7d95f735c0daa3cc4dbb799f66d4`
- mpress unpacked risepro
  - `16ae203879efe1912bb8b97ceb0f4645abcde27a987e98a171d59f9c1ec3f764`
- privateloader
  - `1aa2d32ab883de5d4097a6d4fe7718a401f68ce95e0d2aea63212dd905103948`
- rise pro
  - `2cd2f077ca597ad0ef234a357ea71558d5e039da9df9958d0b8bd0efa92e74c9`
  

## Prior Work

There have been a few attempts at decrypting strings from malware that use this library (or a variation of the technique). While these are good references, none of them covers all of the cases in a generic way, which is our goal. There is also an IDA plugin that attempts to decrypt the library but we didn't have any success with it.

### X-Junior IDA Script
[X-Junior](https://github.com/X-Junior) has a script that we can try in IDA to decrypt these strings: [GitHub Repo](https://github.com/X-Junior/Malware-IDAPython-Scripts/tree/main/PivateLoader).

### Andre Tavares Python Script
[andretavare5](https://gist.github.com/andretavare5) has a python script using capstone to decrypt the strings: [Script Gist](https://gist.github.com/andretavare5/66ec413cdb4c7c39d35c22d38c7067a8#file-privateloader_str_decrypt-py). We have created our own hybrid of the two, which uses capstone for disassembly, but implements the logic from the IDA script.

### IDA Xorstr Decryption Plugin
[ida-jm-xorstr-decrypt-plugin](https://github.com/yubie-re/ida-jm-xorstr-decrypt-plugin). This plugin was made to directly attack the xorstr library but it is x64 only and in testing we could not get it to work reliably.

### aimware_deobf_str
[aimware_deobf_str](https://github.com/unknowntrojan/aimware_deobf_str) is a similar approach using RUST! lol.


## Regex and Disassembly Approach

Our first attempt at decryption uses a regex to identify the `pxor`/`vpxor` instructions, then disassembles a small chunk of the preceding code, which is then scanned for the stack string `mov` immediate pattern. We also use a recursive search for `mov` to handle cases where an intermediate register or temporary memory location is used to store the immediate before it is moved into place.


### Decryption Algorithm
- select the first section in the PE file, assume this is the code
- scan code for final `pxor` instruction and truncate at this instruction to remove extra code from scanning (handle packers with large first sections)
- linear disassemble the prior 0x400 bytes of code - **not efficient**
- traverse assembly until `pxor` instruction is located
- scan backwards until all immediate data is located for the `xmm` registers
- decrypt xmm data, this is the string chunk
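The first steps above can be sketched without a disassembler. This is a minimal illustration, assuming the code section is already available as a `bytes` object; `find_pxor_windows` is a hypothetical helper name and the toy buffer reuses three instructions from the listing in the Overview.

```python
import re

# Opcode prefix for `pxor xmm, xmm/m128` (66 0F EF), as seen in the Overview listing
PXOR_EGG = re.compile(rb"\x66\x0F\xEF")

def find_pxor_windows(code, window=0x400):
    """Yield (pxor_offset, window_start) for every candidate pxor in code."""
    for m in PXOR_EGG.finditer(code):
        start = max(0, m.start() - window)  # clamp at the start of the section
        yield m.start(), start

# Toy buffer: movaps xmm1,[esp+40h] ; pxor xmm1,[esp+10h] ; movaps [esp+10h],xmm1
code = bytes.fromhex("0F284C2440" "660FEF4C2410" "0F294C2410")
hits = list(find_pxor_windows(code))
print(hits)
```

A regex hit is only a candidate. The same three bytes can appear inside another instruction or in data, so each hit still has to be confirmed by disassembling at that offset.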

### Limitations
- In some cases we end up re-disassembling the same code again and again
- The 0x400 bytes is an arbitrary number and might miss some of the string setup if there is a re-used chunk in a register (risepro)
- The disassembly can sometimes be misaligned
- The DWORD chunks are sometimes moved into place out of order; without tracking displacement we cannot correct this

### Possible Improvements
- Run full regex scan upfront and split binary into chunks based on the largest amount of code that covers all `pxor` instructions and only disassemble each once. 
- Use the `pxor` instruction offset to check for alignment when disassembling (still prone to errors... maybe try the x64dbg test for max valid instructions, could be slow)
- Instead of just scanning for `mov` instructions to collect the DWORD chunks use a recursive scan to track the displacement for each `mov`
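As a rough illustration of the last improvement, the sketch below rebuilds a 16-byte block from DWORD writes that arrive out of order by keying each write on its displacement. `assemble_block` is a hypothetical helper; the displacements and immediates are the `var_50` writes from the Overview listing, shuffled on purpose.

```python
import struct

def assemble_block(writes, base_disp, size=16):
    """Rebuild a block from (displacement, imm32) writes given in any order."""
    block = bytearray(size)
    seen = set()
    for disp, imm in writes:
        offset = disp - base_disp
        if 0 <= offset <= size - 4:
            struct.pack_into("<I", block, offset, imm & 0xFFFFFFFF)
            seen.add(offset)
    # Require every DWORD slot to be filled, otherwise report failure
    if seen != set(range(0, size, 4)):
        return None
    return bytes(block)

# Writes relative to esp, deliberately out of order
writes = [(0x18, 0x5B2077D1), (0x10, 0x92877B25), (0x1C, 0x7E3236C5), (0x14, 0x1FA710B6)]
block = assemble_block(writes, base_disp=0x10)
print(block.hex())
```

If the same slot is written more than once, the last write in the list wins, so the writes should be supplied in program order.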

**TODO**
- We still need to implement the optimization of only disassembling big chunks of code that contain the strings instead of re-disassembling the same chunks again and again
- We need to implement EBP shift tracing
- We need to check for other types of ESP shift and handle them
- We need to expand the xor types to handle more than `pxor`
- We need to expand the xor operands to handle more than just a register and a memory address
- **BUGS** Currently we are still missing a few strings for rise.bin (check the SQL statements)
- **BUGS** Our string builder sometimes joins strings that are actually separate; look for a better solution (tighter address spacing maybe, or something in the code???)
- **BUGS** Doesn't work for priv.bin; we likely just need to handle more xor operands

In [None]:
def unhex(hex_string):
    import binascii
    if type(hex_string) == str:
        return binascii.unhexlify(hex_string.encode('utf-8'))
    else:
        return binascii.unhexlify(hex_string)

def tohex(data):
    import binascii
    if type(data) == str:
        return binascii.hexlify(data.encode('utf-8'))
    else:
        return binascii.hexlify(data)


In [124]:
import pefile
import struct
from capstone import *
from capstone.x86 import *
import re
import time
import logging

log_level = logging.ERROR

# Create logger
logger = logging.getLogger()
logger.setLevel(log_level)



# Create console handler and set level to error
ch = logging.StreamHandler()
ch.setLevel(log_level)

# Create formatter
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')

# Add formatter to console handler
ch.setFormatter(formatter)

# Add console handler to logger
logger.addHandler(ch)

# Hack to clear handlers for jupyter notebook
# Iterate over a copy, removing from the list being iterated would skip entries
for h in list(logger.handlers):
    logger.removeHandler(h)


def is_ascii(s):
    return all(c < 128 or c == 0 for c in s)


def string_builder(strings):
    out = []
    last_addr = 0
    last_string = ""
    for s in strings[::-1]:
        diff = last_addr - s[0]
        if diff <= 98 and last_string is not None:
            last_string = s[1] + last_string
        else:
            out.append((last_addr,last_string))
            last_string = s[1]
        last_addr = s[0]
    out.append((last_addr,last_string))
    return out[::-1]


def xor(data, key):
    out = []
    for i in range(len(data)):
        out.append(data[i] ^ key[i % len(key)])
    return bytes(out)


def print_unique_strings(strings):
    string_dict = {}  
    last_string = ''
    for s in strings:
        if last_string != s[1]:
            string_dict[s[0]] = s[1]
        last_string = s[1]
    print(f"Found strings: {len(string_dict.keys())}\n")
    for o in string_dict.keys():
        print(f"{hex(o)} {string_dict[o]}")


def get_imm_data_recursive(instructions, opr, displacement=0):
    logger.debug(f"get_imm_data_recursive: {opr}, displacement={displacement}")
    # Sanity check
    if opr.type != X86_OP_MEM and opr.type != X86_OP_REG:
        logger.debug(f"ERROR Operand type is not X86_OP_MEM or X86_OP_REG")
        return None

    # Determine the operand type
    # If the operand is a memory address search with displacement else search with register name
    instruction_count = len(instructions)
    for ptr in range(instruction_count):
        inst = instructions[ptr]
        logger.debug(f"recursive testing {inst} at {hex(inst.address)} ")

        # If the opr memory we are searching for is ESP then we need to check for stack changes
        if opr.type == X86_OP_MEM and opr.value.mem.base == X86_REG_ESP and inst.mnemonic == 'sub' and inst.operands[0].type == X86_OP_REG and inst.operands[0].reg == X86_REG_ESP:
            if ptr+1 < instruction_count and instructions[ptr+1].mnemonic == 'call':
                logger.debug(f"Assuming stack change is for call at {hex(inst.address)}")
            else:
                if inst.operands[1].type != X86_OP_IMM:
                    logger.debug(f"ERROR: Expected immediate value for stack change")
                    return None
                displacement -= inst.operands[1].value.imm
                logger.debug(f"{hex(inst.address)}: sub esp New displacement: {hex(displacement)}")
        
        if opr.type == X86_OP_MEM and opr.value.mem.base == X86_REG_ESP and inst.mnemonic == 'add' and inst.operands[0].type == X86_OP_REG and inst.operands[0].reg == X86_REG_ESP:
            if inst.operands[1].type != X86_OP_IMM:
                logger.debug(f"ERROR: Expected immediate value for stack change")
                return None
            displacement += inst.operands[1].value.imm
            logger.debug(f"{hex(inst.address)}: add esp New displacement: {hex(displacement)}")
        
        if opr.type == X86_OP_MEM and opr.value.mem.base == X86_REG_ESP and inst.mnemonic == 'push':
            displacement -= 4
            logger.debug(f"{hex(inst.address)}: push New displacement: {hex(displacement)}")
        
        if opr.type == X86_OP_MEM and opr.value.mem.base == X86_REG_ESP and inst.mnemonic == 'pop':
            displacement += 4
            logger.debug(f"{hex(inst.address)}: pop New displacement: {hex(displacement)}")
        
        # Track the movs to see if they have the memory/register we are looking for
        if inst.mnemonic == 'mov':
            if opr.type == X86_OP_MEM and inst.operands[0].type == X86_OP_MEM:
                
                inst_op_mem = inst.operands[0].value.mem
                if inst_op_mem.disp == opr.value.mem.disp + displacement and inst_op_mem.base == opr.value.mem.base and inst_op_mem.index == opr.value.mem.index and inst_op_mem.scale == opr.value.mem.scale:
                    logger.debug(f"Found mov mem for our mem at {hex(inst.address)}")
                    if inst.operands[1].type == X86_OP_IMM:
                        return inst.operands[1].value.imm
                    else:
                        return get_imm_data_recursive(instructions[ptr:], inst.operands[1])
            elif opr.type == X86_OP_REG and inst.operands[0].type == X86_OP_REG:
                if inst.operands[0].reg == opr.reg:
                    logger.debug(f"Found mov reg for our reg at {hex(inst.address)}")
                    if inst.operands[1].type == X86_OP_IMM:
                        return inst.operands[1].value.imm
                    else:
                        return get_imm_data_recursive(instructions[ptr:], inst.operands[1])
        elif inst.mnemonic == 'lea':
            if opr.type == X86_OP_REG and inst.operands[0].type == X86_OP_REG:
                if inst.operands[0].reg == opr.reg:
                    logger.debug(f"Found lea reg for our reg at {hex(inst.address)}")
                    if inst.operands[1].type == X86_OP_MEM:
                        return get_imm_data_recursive(instructions[ptr:], inst.operands[1])
    return None
            



def get_string_from_pxor_ex(instructions, xor_reg, xor_mem):
    # Get the memory address moved into the xor_reg
    xor_reg_mem = None
    xor_reg_mem_offset = 0
    for ptr in range(len(instructions)):
        inst = instructions[ptr]
        logger.debug(f"Testing {hex(inst.address)}")
        if inst.mnemonic[:3] == 'mov' and inst.operands[0].type == X86_OP_REG and inst.operands[0].reg == xor_reg.reg:
            if inst.operands[1].type != X86_OP_MEM:
                logger.debug(f"Error mov xor_reg is not a memory address at {hex(inst.address)}: {inst.op_str}")
                return None
            else:
                # Get the memory address moved into the xor_mem
                logger.debug(f"Found mov xor_reg at {hex(inst.address)}: {inst.op_str}")
                xor_reg_mem = inst.operands[1]
                xor_reg_mem_offset = ptr
                break
    if xor_reg_mem is None:
        logger.debug(f"Error mov xor_reg not found")
        return None
    # Break xor_reg_mem memory base into 4 DWORD chunks
    logger.debug("Getting data chunks for xor_reg_mem")
    op_mem = xor_reg_mem.value.mem
    op_disp = op_mem.disp
    op_base = op_mem.base
    op_index = op_mem.index
    op_scale = op_mem.scale
    logger.debug(f"op_disp:{op_disp} op_base:{op_base} op_index:{op_index} op_scale:{op_scale}")

    data1 = b''
    for i in [0, 4, 8, 12]:
        tmp_chunk = get_imm_data_recursive(instructions[xor_reg_mem_offset:], xor_reg_mem, displacement=i)
        if tmp_chunk is None:
            logger.debug(f"Error no imm found for xor_reg_mem {xor_reg_mem} at displacement {i}")
            return None
        logger.debug(f"data1 chunk: {hex(tmp_chunk)}")
        data1 += struct.pack('<I', tmp_chunk)   

    # Recursive scan for each chunk and get the imm value
    data2 = b''
    for i in [0, 4, 8, 12]:
        tmp_chunk = get_imm_data_recursive(instructions, xor_mem, displacement=i)
        if tmp_chunk is None:
            logger.debug(f"Error no imm found for xor_mem {xor_mem} at displacement {i}")
            return None
        logger.debug(f"data2 chunk: {hex(tmp_chunk)}")
        data2 += struct.pack('<I', tmp_chunk)   
    
    out = xor(data1, data2)
    logger.debug(out)
    out = out.replace(b'\x00',b'')
    if len(out) == 0:
        return None
    #print(out.decode('utf-8'))

    if not is_ascii(out):
        return None
    
    return out.decode('utf-8')


def get_strings(filename):
    # Capstone setup
    md = Cs(CS_ARCH_X86, CS_MODE_32) 
    md.detail = True
    md.skipdata = True

    # Get code from PE
    pe = pefile.PE(filename)
    # Assume the first section is code
    txt = pe.sections[0]

    image_base = pe.OPTIONAL_HEADER.ImageBase
    section_rva = txt.VirtualAddress
    section_offset = txt.PointerToRawData
    section_data = txt.get_data()


    strings = []
        
    pxor_vpxor_vxorps_egg = rb'(\x66\x0F\xEF|\xC5\xFD\xEF|\xC5\xF8\x57)'

    for m in re.finditer(pxor_vpxor_vxorps_egg, section_data, re.DOTALL):
        xor_start = m.start() 

        # Determine the instruction length
        xor_instruction = list(md.disasm(section_data[xor_start:xor_start+0x10], image_base + section_rva + xor_start))[0]
        xor_instruction_address = image_base + section_rva + xor_start

        # if xor_instruction_address != 0x09910F2:
        #     continue

        if xor_instruction.mnemonic != 'pxor' and xor_instruction.mnemonic != 'vpxor' and xor_instruction.mnemonic != 'vxorps':
            #print(f"Found {xor_instruction.mnemonic} instead of pxor")
            continue

        xor_len = xor_instruction.size
        if xor_instruction.operands[0].type == X86_OP_REG and xor_instruction.operands[1].type == X86_OP_REG:
            # Skip xor reg, reg
            continue
            
       
        if xor_instruction.operands[0].type != X86_OP_REG or xor_instruction.operands[1].type != X86_OP_MEM:
            # Skip anything that is not xor reg, [mem]
            continue
        
        scan_length = 0x2000
        if scan_length > xor_start:
            scan_length = xor_start
        instructions = []
        for inst in md.disasm(section_data[xor_start-scan_length:xor_start], xor_instruction_address - scan_length):
            instructions.append(inst)
        logger.debug(f"checking {len(instructions)} instructions")
        if xor_instruction.mnemonic == 'pxor':
            logger.debug(f"Testing pxor at {hex(xor_instruction_address)}, {xor_instruction}")
            tmp_string = get_string_from_pxor_ex(instructions[::-1], xor_instruction.operands[0], xor_instruction.operands[1])
            #tmp_string = get_string_from_pxor_xmm(instructions[::-1], xor_instruction.operands[0], xor_instruction.operands[1])
            if tmp_string is not None:
                strings.append((xor_instruction_address,tmp_string))

    return strings





filename = '/tmp/xorstr/work/rise.bin'

t = time.time()
strings = get_strings(filename)
# Benchmark 50.47584390640259 metastealer    
print(f"Benchmark {time.time() - t}")




ss= string_builder(strings)
print(f"Strings recovered: {len(ss)}")
print_unique_strings(ss)
       

print("done")


Benchmark 40.022075176239014
Strings recovered: 739
Found strings: 661

0x4010ec 0.1
0x4011ab 50500
0x4064ef RisePro
Telegram: https://t.me/RiseProSUPPORT
0x458e88 winhttp.dll
0x458f88 wininet.dll
0x459060 LocalSimbl
0x459123 LocalSimba
0x459463 grab_screen
0x4594d4 grab_tg
0x459542 grab_ds
0x4595b9 grab_wallets
0x459631 grab_ihistory
0x459717 logins
0x45977b Vault_IE
0x45982f logins
0x4598a4 WindowsCredentials
0x459adb .zip_
0x459bc0 .zip
0x459df4 \screenshot.png
0x459e97 \Files
0x459f2f \FileZilla
0x45a0eb \Plugins
0x45a30a \
0x45a526 IndexedDB
0x45a59c Sync
0x45a606 Local
0x45a66e \
0x45a823 \Wallets
0x45aa28 \
0x45ac41 IndexedDB
0x45acb7 Sync
0x45ad21 Local
0x45ad98 \
0x45afb0 \History
0x45b10f _
0x45b198 wb.txt
0x45b25f \
0x45b4ed url
0x45b5ea time
0x45b7bc \CC
0x45b944 _
0x45b9d6 wb.txt
0x45ba9d \
0x45be5d nickname
0x45bed6 name_on_card
0x45bfd0 card_number
0x45c0f5 last_four
0x45c246 **** **** **** 
0x45c393 billing_address_id
0x45c46c -
0x45c555 exp_month
0x45c5ce exp_year
0x45

### Speed Optimizations

Instead of iteratively scanning with a regex and disassembling for each `pxor` instruction, we can perform one full scan and then split the binary into chunks that contain all the `pxor` instructions. These chunks can then be disassembled a single time and the results re-used for each `pxor` hit.
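The chunking step on its own is an interval merge. The sketch below isolates it from the PE parsing and disassembly; `merge_windows` is a hypothetical name, and the `0x2000` window and 10-byte tail are the same values used in the notebook cell.

```python
def merge_windows(xor_offsets, scan_length=0x2000, tail=10):
    """Merge the backward scan window of each pxor hit into disjoint chunks."""
    chunks = []
    for off in sorted(xor_offsets):
        start = max(0, off - scan_length)
        end = off + tail  # leave room for the pxor instruction itself
        if chunks and start < chunks[-1][1]:
            # Overlaps the previous chunk, so just extend it
            chunks[-1] = (chunks[-1][0], end)
        else:
            chunks.append((start, end))
    return chunks

chunks = merge_windows([0x100, 0x180, 0x5000])
print([(hex(s), hex(e)) for s, e in chunks])
```

The first two hits share a single chunk, while the third is far enough away to get its own, so each region is disassembled only once.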

In [125]:
def get_strings_fast(filename):
    # Capstone setup
    md = Cs(CS_ARCH_X86, CS_MODE_32) 
    md.detail = True
    md.skipdata = True

    # Get code from PE
    pe = pefile.PE(filename)
    # Assume the first section is code
    txt = pe.sections[0]

    image_base = pe.OPTIONAL_HEADER.ImageBase
    section_rva = txt.VirtualAddress
    section_offset = txt.PointerToRawData
    section_data = txt.get_data()


    strings = []
        
    pxor_vpxor_vxorps_egg = rb'(\x66\x0F\xEF|\xC5\xFD\xEF|\xC5\xF8\x57)'

    chunk_offsets = []
    last_end = 0
    for m in re.finditer(pxor_vpxor_vxorps_egg, section_data, re.DOTALL):
        xor_start = m.start() 

        # Determine the instruction length
        xor_instruction = list(md.disasm(section_data[xor_start:xor_start+0x10], image_base + section_rva + xor_start))[0]
        xor_instruction_address = image_base + section_rva + xor_start

        if xor_instruction.mnemonic == 'pxor' and xor_instruction.operands[0].type == X86_OP_REG and xor_instruction.operands[1].type == X86_OP_MEM:
            scan_length = 0x2000
            if scan_length > xor_start:
                scan_length = xor_start
            if xor_start - scan_length < last_end:
                # Update last chunk with new end
                chunk_offsets[-1] = (chunk_offsets[-1][0], xor_start + 10)
            else:
                chunk_offsets.append((xor_start - scan_length, xor_start + 10))

            last_end = xor_start + 10

    chunks = []
    for chunk_offset in chunk_offsets:
        chunk_data = section_data[chunk_offset[0]:chunk_offset[1]]
        chunk_instruction_address = image_base + section_rva + chunk_offset[0]
        instructions = []
        for inst in md.disasm(chunk_data, chunk_instruction_address):
            instructions.append(inst)
        chunks.append(instructions)


    for chunk in chunks:
        chunk_len = len(chunk)
        for i in range(chunk_len):
            inst = chunk[i]
            if inst.mnemonic == 'pxor' and inst.operands[0].type == X86_OP_REG and inst.operands[1].type == X86_OP_MEM:
                tmp_string = get_string_from_pxor_ex(chunk[i::-1][:0x2000], inst.operands[0], inst.operands[1])
                if tmp_string is not None:
                    strings.append((inst.address,tmp_string))

    return strings

filename = '/tmp/xorstr/work/rise.bin'

t = time.time()
strings = get_strings_fast(filename)
# Benchmark 50.47584390640259 metastealer    
print(f"Benchmark {time.time() - t}")




ss= string_builder(strings)
print(f"Strings recovered: {len(ss)}")
print_unique_strings(ss)
       

print("done")

Benchmark 15.065591096878052
Strings recovered: 743
Found strings: 665

0x4010ec 0.1
0x4011ab 50500
0x4064ef RisePro
Telegram: https://t.me/RiseProSUPPORT
0x458e88 winhttp.dll
0x458f88 wininet.dll
0x459060 LocalSimbl
0x459123 LocalSimba
0x459463 grab_screen
0x4594d4 grab_tg
0x459542 grab_ds
0x4595b9 grab_wallets
0x459631 grab_ihistory
0x459717 logins
0x45977b Vault_IE
0x45982f logins
0x4598a4 WindowsCredentials
0x459adb .zip_
0x459bc0 .zip
0x459df4 \screenshot.png
0x459e97 \Files
0x459f2f \FileZilla
0x45a0eb \Plugins
0x45a30a \
0x45a526 IndexedDB
0x45a59c Sync
0x45a606 Local
0x45a66e \
0x45a823 \Wallets
0x45aa28 \
0x45ac41 IndexedDB
0x45acb7 Sync
0x45ad21 Local
0x45ad98 \
0x45afb0 \History
0x45b10f _
0x45b198 wb.txt
0x45b25f \
0x45b4ed url
0x45b5ea time
0x45b7bc \CC
0x45b944 _
0x45b9d6 wb.txt
0x45ba9d \
0x45be5d nickname
0x45bed6 name_on_card
0x45bfd0 card_number
0x45c0f5 last_four
0x45c246 **** **** **** 
0x45c393 billing_address_id
0x45c46c -
0x45c555 exp_month
0x45c5ce exp_year
0x45

### Hard Limitations

The current approach will only work if the memory taint/trace is built in order in the code. For samples other than RisePro, compiler optimizations can lead to the "base" of the memory being located after the contents of the memory are filled, making a reverse linear search impossible. See the following example with PrivateLoader.

```
.text:00991077 B8 2D 33 69 8B                          mov     eax, 8B69332Dh
.text:0099107C C7 45 B8 05 4F 11 7C                    mov     dword ptr [ebp-48h], 7C114F05h
.text:00991083 89 45 BC                                mov     [ebp-44h], eax
.text:00991086 8B 4D B8                                mov     ecx, [ebp-48h]
.text:00991089 8B 55 BC                                mov     edx, [ebp-44h]
.text:0099108C 89 4D A0                                mov     [ebp-60h], ecx  ; base offset
.text:0099108F 89 55 A4                                mov     [ebp-5Ch], edx  ; base offset + 4 (next chunk)
```
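For contrast, a forward pass over the same seven instructions resolves both chunks without any special handling. This is only a toy model, assuming each `mov` has been reduced to a `(dst, src)` pair by hand; `run_forward` and the string keys for memory slots are our own illustration.

```python
# Each entry is (dst, src); memory slots are written as "[ebp-48h]" style strings
trace = [
    ("eax", 0x8B69332D),
    ("[ebp-48h]", 0x7C114F05),
    ("[ebp-44h]", "eax"),
    ("ecx", "[ebp-48h]"),
    ("edx", "[ebp-44h]"),
    ("[ebp-60h]", "ecx"),
    ("[ebp-5Ch]", "edx"),
]

def run_forward(trace):
    """Apply each mov in program order, tracking registers and memory together."""
    state = {}
    for dst, src in trace:
        # An immediate is stored as-is, anything else is looked up in the state
        state[dst] = src if isinstance(src, int) else state[src]
    return state

state = run_forward(trace)
print(hex(state["[ebp-60h]"]), hex(state["[ebp-5Ch]"]))
```

Because values are propagated in program order, it does not matter whether the base slot is written before or after the slots that feed it.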

The following is our failed attempt at a reverse linear search. This code may be of interest for future projects but it cannot accomplish the task above.


In [115]:

def get_xmm_data_recursive(instructions, opr, displacement=0):
    logger.debug(f"get_xmm_data_recursive: {opr}, displacement={displacement}")
    # Sanity check
    if opr.type != X86_OP_MEM and opr.type != X86_OP_REG:
        logger.debug(f"ERROR Operand type is not X86_OP_MEM or X86_OP_REG")
        return None
    
    # Track the opr memory as it moves between register and memory
    instruction_count = len(instructions)
    for ptr in range(instruction_count):
        inst = instructions[ptr]
        #logger.debug(f"recursive testing {inst} at {hex(inst.address)} ")

        # If the opr memory we are searching for is ESP then we need to check for stack changes
        if opr.type == X86_OP_MEM and opr.value.mem.base == X86_REG_ESP and inst.mnemonic == 'sub' and inst.operands[0].type == X86_OP_REG and inst.operands[0].reg == X86_REG_ESP:
            if ptr+1 < instruction_count and instructions[ptr+1].mnemonic == 'call':
                logger.debug(f"Assuming stack change is for call at {hex(inst.address)}")
            else:
                if inst.operands[1].type != X86_OP_IMM:
                    logger.debug(f"ERROR: Expected immediate value for stack change")
                    return None
                displacement -= inst.operands[1].value.imm
                logger.debug(f"{hex(inst.address)}: sub esp New displacement: {hex(displacement)}")
        
        if opr.type == X86_OP_MEM and opr.value.mem.base == X86_REG_ESP and inst.mnemonic == 'add' and inst.operands[0].type == X86_OP_REG and inst.operands[0].reg == X86_REG_ESP:
            if inst.operands[1].type != X86_OP_IMM:
                logger.debug(f"ERROR: Expected immediate value for stack change")
                return None
            displacement += inst.operands[1].value.imm
            logger.debug(f"{hex(inst.address)}: add esp New displacement: {hex(displacement)}")
        
        if opr.type == X86_OP_MEM and opr.value.mem.base == X86_REG_ESP and inst.mnemonic == 'push':
            displacement -= 4
            logger.debug(f"{hex(inst.address)}: push New displacement: {hex(displacement)}")
        
        if opr.type == X86_OP_MEM and opr.value.mem.base == X86_REG_ESP and inst.mnemonic == 'pop':
            displacement += 4
            logger.debug(f"{hex(inst.address)}: pop New displacement: {hex(displacement)}")
        
        # Track the movs to see if they have the memory/register we are looking for
        if inst.mnemonic[:3] == 'mov':
            if opr.type == X86_OP_MEM and inst.operands[0].type == X86_OP_MEM:
                inst_op_mem = inst.operands[0].value.mem
                if inst_op_mem.disp == opr.value.mem.disp + displacement and inst_op_mem.base == opr.value.mem.base and inst_op_mem.index == opr.value.mem.index and inst_op_mem.scale == opr.value.mem.scale:
                    if inst.operands[1].type == X86_OP_IMM:
                        # This might be the base with the first immediate move; test for the missing 3 chunks
                        logger.debug(f"Test new base at with imm at  {hex(inst.address)}")
                        out_data = struct.pack('<I', inst.operands[1].value.imm)
                        flag_failed = False
                        for i in [4, 8, 12]:
                            # Search from the memory destination; the immediate operand would fail the sanity check
                            tmp_chunk = get_imm_data_recursive(instructions[ptr:], inst.operands[0], displacement=i)
                            if tmp_chunk is None:
                                logger.debug(f"Error no xmm found for {opr} at displacement {i}")
                                flag_failed = True
                                break
                            logger.debug(f"Found chunk {tmp_chunk}")
                            out_data += struct.pack('<I', tmp_chunk)   

                        if not flag_failed:
                            return out_data
                        logger.debug("Base incorrect continuing search...")
                    if inst.operands[1].type == X86_OP_REG:
                        logger.debug(f"Found memory switch at {hex(inst.address)} {inst} recursive search...")
                        return get_xmm_data_recursive(instructions[ptr:], inst.operands[1])
            elif opr.type == X86_OP_REG and inst.operands[0].type == X86_OP_REG:
                if inst.operands[0].reg == opr.reg:
                    if inst.operands[1].type == X86_OP_MEM:
                        if inst.operands[1].value.mem.base == X86_REG_ESP or inst.operands[1].value.mem.base == X86_REG_EBP:
                            # This is as the new base if it doesn't work keep tracking with the new opr
                            logger.debug(f"Test new base at {hex(inst.address)} {inst}")
                            out_data = b''
                            flag_failed = False
                            for i in [0, 4, 8, 12]:
                                tmp_chunk = get_imm_data_recursive(instructions[ptr:], inst.operands[1], displacement=i)
                                if tmp_chunk is None:
                                    logger.debug(f"Error no xmm found for {opr} at displacement {i}")
                                    flag_failed = True
                                    break
                                logger.debug(f"Found chunk {tmp_chunk}")
                                out_data += struct.pack('<I', tmp_chunk)   

                            if not flag_failed:
                                return out_data
                            logger.debug("Base incorrect continuing search...")
                        else:
                            new_op = X86Op()
                            new_op.value = X86OpValue(inst.operands[1].value.mem.base)
                            new_op.type = X86_OP_REG
                            logger.debug(f"Found memory switch with new register at {hex(inst.address)} {inst} recursive search...")
                            return get_xmm_data_recursive(instructions[ptr:], new_op)
                        logger.debug(f"Found memory switch at {hex(inst.address)} {inst} recursive search...")
                        return get_xmm_data_recursive(instructions[ptr:], inst.operands[1])
    return None


def get_string_from_pxor_xmm(instructions, xmm0, xmm1):
    data0 = get_xmm_data_recursive(instructions, xmm0, displacement=0)
    if data0 is None:
        logger.debug(f"Error no data found for xmm0 {xmm0}")
        return None
    logger.debug(f"Found data for xmm0 {data0.hex()}")
    data1 = get_xmm_data_recursive(instructions, xmm1, displacement=0)  
    if data1 is None:   
        logger.debug(f"Error no data found for xmm1 {xmm1}")
        return None 
    logger.debug(f"Found data for xmm1 {data1.hex()}")
    out = xor(data0, data1)
    logger.debug(out)
    out = out.replace(b'\x00',b'')
    if len(out) == 0:
        return None

    if not is_ascii(out):
        return None
    
    return out.decode('utf-8')


## Memory-Only Emulation Trace (@mishap)

A more comprehensive approach may be the novel memory-only emulation technique by [@mishap](https://github.com/oopsmishap).

**[pxor_string_decrypt_wip.py](https://gist.github.com/oopsmishap/d8f72e3f2324691f0067ed473278dff3)**

### Algorithm
- Disassemble the target code block
- Set up a global stack and registers (maintain state as we "emulate")
- Iterate through each instruction
- Implement a handler for each instruction that manipulates memory
- When we hit a `pxor` instruction, the operands should already hold the correct data thanks to our memory emulation
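
The steps above can be sketched without a disassembler. This is only a minimal illustration, not @mishap's implementation: the instruction tuples (`mov_stack_imm`, `movaps_load`, `pxor`) and the `emulate` and `demo` names are hypothetical stand-ins for decoded Capstone instructions, and the test string and key are made up.

```python
import struct


def emulate(instructions):
    """Replay a tiny, hypothetical instruction list and collect pxor results."""
    stack = {}    # stack displacement -> byte value
    regs = {}     # register name -> 16-byte value
    results = []

    def store(disp, data):
        for i, b in enumerate(data):
            stack[disp + i] = b

    def load(disp, size):
        # Unwritten stack bytes read back as zero
        return bytes(stack.get(disp + i, 0) for i in range(size))

    for op, *args in instructions:
        if op == "mov_stack_imm":      # mov dword ptr [esp+disp], imm32
            disp, imm = args
            store(disp, struct.pack("<I", imm & 0xFFFFFFFF))
        elif op == "movaps_load":      # movaps xmm, xmmword ptr [esp+disp]
            reg, disp = args
            regs[reg] = load(disp, 16)
        elif op == "pxor":             # pxor xmm, xmmword ptr [esp+disp]
            reg, disp = args
            key = load(disp, 16)
            current = regs.get(reg, bytes(16))
            regs[reg] = bytes(a ^ b for a, b in zip(current, key))
            results.append(regs[reg])
    return results


def demo():
    # Made-up plaintext and key, encrypted here so the round trip is checkable
    plain = b"kernel32.dll".ljust(16, b"\x00")
    key = bytes(range(0x10, 0x20))
    enc = bytes(a ^ b for a, b in zip(plain, key))
    prog = []
    for base, blob in ((0x10, enc), (0x20, key)):
        for i in range(0, 16, 4):
            imm = struct.unpack_from("<I", blob, i)[0]
            prog.append(("mov_stack_imm", base + i, imm))
    prog.append(("movaps_load", "xmm0", 0x10))
    prog.append(("pxor", "xmm0", 0x20))
    return emulate(prog)[0].rstrip(b"\x00")
```

Because every handler only touches the emulated stack and registers, by the time the `pxor` handler runs both operands are already in place and the decrypted block falls out of a plain byte-wise XOR.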

## Ideas for Improvement 
- For the stack use a dict where ESP is the key 
- Implement instructions that have an effect on ESP and track this register in the instruction handlers
- Hack - track calls and don't adjust ESP if done directly after a call (assume this is stack cleanup)
- Make sure instruction handlers are an exact match for the instructions 
- Reset the stack and registers when a `ret` or ??? instruction is hit
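
A rough sketch of the first two ideas and the reset, assuming a sparse dict keyed by the emulated address (`ESP` plus displacement) in place of a fixed-size buffer. The `SparseEnv` class and its method names are hypothetical and are not part of the gist. The call-tracking hack is not covered here.

```python
class SparseEnv:
    """Hypothetical dict-backed stack that tracks ESP explicitly."""

    def __init__(self):
        self.reset()

    def reset(self):
        # Intended to be called on `ret` (or another block boundary)
        # so stale values do not leak into the next code block
        self.esp = 0
        self.mem = {}     # emulated address -> byte value
        self.regs = {}    # register id -> value

    def write(self, disp, value, size):
        # Mask to the operand width so negative immediates store cleanly
        data = (value & ((1 << (size * 8)) - 1)).to_bytes(size, "little")
        for i, b in enumerate(data):
            self.mem[self.esp + disp + i] = b

    def read(self, disp, size):
        # Unwritten addresses read back as zero
        raw = bytes(self.mem.get(self.esp + disp + i, 0) for i in range(size))
        return int.from_bytes(raw, "little")

    def push(self, value, size=4):
        self.esp -= size
        self.write(0, value, size)

    def pop(self, size=4):
        value = self.read(0, size)
        self.esp += size
        return value
```

Since addresses are plain dict keys, negative or very large `ESP`-relative offsets need no bounds handling, and a value written at `[esp+0x10]` before a `push` is found again at `[esp+0x14]` afterwards.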



import time
from typing import List
import pefile
from capstone import *
from capstone.x86 import *
import re
import struct

STACK_SIZE = 0x10000
CHUNK_SIZE = 0x400

# 128 bit pack/unpack
def pack_128(val):
    a = val & 0xFFFFFFFFFFFFFFFF
    b = (val >> 64) & 0xFFFFFFFFFFFFFFFF
    return struct.pack('<QQ', a, b)

def unpack_128(val):
    try:
        a, b = struct.unpack('<QQ', val)
        return a | (b << 64)
    except struct.error:
        return 0

class Env:
    def __init__(self):
        self.stack = bytearray(STACK_SIZE)

        # Create a list that will represent the registers
        # The list index will be the register number as defined in X86_REG_*
        # Example.  self.reg[X86_REG_EAX] = 0x0
        self.reg = [0]*X86_REG_ENDING

    def clear(self):
        self.stack = bytearray(STACK_SIZE)
        self.reg = [0] * X86_REG_ENDING

    # save data to the stack as little endian at the given offset
    def save_stack(self, offset, data, size):
        offset = offset + STACK_SIZE//2
        print(f"adjusting stack for ESP {hex(self.reg[X86_REG_ESP])} and offset {hex(offset)} for stack size {hex(len(self.stack))}")
        # Crazy hack to add in a stack pointer
        offset += self.reg[X86_REG_ESP]
        print(f"saving data to stack: {data} {size}")
        # if offset is negative, wrap around
        # if offset < 0:
        #     offset = STACK_SIZE + offset
        # if offset + size > STACK_SIZE:
        #     offset = offset % STACK_SIZE

        # Mask to the operand width so negative or oversized values pack cleanly
        data &= (1 << (size * 8)) - 1
        if size == 1:
            self.stack[offset] = data
        elif size == 2:
            self.stack[offset:offset+2] = struct.pack('<H', data)
        elif size == 4:
            self.stack[offset:offset+4] = struct.pack('<I', data)
        elif size == 8:
            self.stack[offset:offset+8] = struct.pack('<Q', data)
        elif size == 16:
            self.stack[offset:offset+16] = pack_128(data)

    # load data from the stack as little endian at the given offset
    def load_stack(self, offset, size):
        offset = offset + STACK_SIZE//2
        print(f"adjusting stack for ESP {hex(self.reg[X86_REG_ESP])} and offset {hex(offset)} for stack size {hex(len(self.stack))}")
        # Crazy hack to add in a stack pointer
        offset += self.reg[X86_REG_ESP]
       
        # if offset < 0:
        #     offset = STACK_SIZE + offset
        # if offset + size > STACK_SIZE:
        #     offset = offset % STACK_SIZE
        try:
            if size == 1:
                return self.stack[offset]
            elif size == 2:
                return struct.unpack('<H', self.stack[offset:offset+2])[0]
            elif size == 4:
                return struct.unpack('<I', self.stack[offset:offset+4])[0]
            elif size == 8:
                return struct.unpack('<Q', self.stack[offset:offset+8])[0]
            elif size == 16:
                return unpack_128(self.stack[offset:offset+16])
        except (IndexError, struct.error):
            return 0
        


def setup_capstone():
    md = Cs(CS_ARCH_X86, CS_MODE_32)
    md.detail = True
    md.skipdata = True
    md.syntax = CS_OPT_SYNTAX_INTEL
    return md

# find all pxor instructions using regex, then goes up chunk size and disassembles
def find_all_pxor(md: Cs, pe: pefile.PE):
    txt_section = pe.sections[0]
    txt_data = txt_section.get_data()
    image_base = pe.OPTIONAL_HEADER.ImageBase
    section_rva = txt_section.VirtualAddress

    pxor_egg = b'\x66\x0F\xEF'
    pxor_size = 6

    scan_end = txt_data.rfind(pxor_egg)
    txt_data = txt_data[:scan_end+pxor_size]

    # get a chunk of instructions starting from the given offset
    def get_chunk(start, size):
        instructions = []
        for inst in md.disasm(txt_data[start-size:start+pxor_size], image_base + section_rva + start - size):
            # we only care about pxor and mov instructions
            if inst.mnemonic == 'pxor' or inst.mnemonic == 'mov' or inst.mnemonic == 'movaps':
                instructions.append(inst)
        # skip if no instructions
        if len(instructions) == 0:
            return []
        # skip if the last instruction is not pxor
        if instructions[-1].mnemonic != 'pxor':
            return []
        return instructions

    # get pxor chunks
    chunks = []
    for m in re.finditer(pxor_egg, txt_data, re.DOTALL):
        scan_end = m.start()
        chunks.append(get_chunk(scan_end, CHUNK_SIZE))
    return chunks

#simple xor function
def xor(data, key):
    out = []
    for i in range(len(data)):
        out.append(data[i] ^ key[i % len(key)])
    return bytes(out)

def emulate_chunk(chunk : List[CsInsn], env: Env):
    inst : CsInsn  # auto complete
    strings_out = []
    for inst in chunk:

        regs_read, regs_write = inst.regs_access()
        print(f'{hex(inst.address)}: {inst.mnemonic} {inst.op_str}, regs_read: {regs_read}, regs_write: {regs_write}')
        if inst.mnemonic == 'mov':

            # Skip array access
            if len(regs_read) == 3:
                continue 
            # if first op is stack pointer and second op is register
            if len(regs_read) == 2 and inst.operands[1].type == X86_OP_REG and (regs_read[0] == X86_REG_ESP or regs_read[0] == X86_REG_EBP):
                #print(f'0x{inst.address:x}: write {hex(env.reg[regs_read[1]])} to stack at {hex(inst.disp)}')
                env.save_stack(inst.disp, env.reg[regs_read[1]], inst.operands[1].size)

            # if first op is register and second op is stack pointer
            elif len(regs_read) == 1 and inst.operands[0].type == X86_OP_REG and (regs_read[0] == X86_REG_ESP or regs_read[0] == X86_REG_EBP) and len(regs_write) == 1:
                #print(f'0x{inst.address:x}: read {hex(env.load_stack(inst.disp, inst.operands[0].size))} from stack at {hex(inst.disp)}')
                env.reg[regs_write[0]] = env.load_stack(inst.disp, inst.operands[0].size)

            # if first op is stack pointer and second op is immediate
            elif len(regs_read) == 1 and (regs_read[0] == X86_REG_ESP or regs_read[0] == X86_REG_EBP) and inst.operands[1].type == X86_OP_IMM:
                #print(f'0x{inst.address:x}: write {hex(inst.operands[1].imm)} to stack at {hex(inst.disp)}')
                env.save_stack(inst.disp, inst.operands[1].imm, inst.operands[1].size)

            # if first op is register and second op is immediate
            elif len(regs_write) == 1 and inst.operands[1].type == X86_OP_IMM:
                #print(f'0x{inst.address:x}: write {hex(inst.operands[1].imm)} to {hex(inst.operands[0].reg)}')
                env.reg[regs_write[0]] = inst.operands[1].imm

            # # if first op is stack pointer and second op is register
            # elif len(regs_read) == 2 and inst.operands[1].type == X86_OP_REG and (regs_read[0] == X86_REG_ESP or regs_read[0] == X86_REG_EBP):
            #     #print(f'0x{inst.address:x}: write {hex(env.reg[regs_read[1]])} to stack at {hex(inst.disp)}')
            #     env.save_stack(inst.disp, env.reg[regs_read[1]], inst.operands[1].size)

        elif inst.mnemonic == 'movaps':
            # Skip array access
            if len(regs_read) == 3:
                continue 
            # if first op is stack pointer and second op is register
            if len(regs_read) == 2 and inst.operands[1].type == X86_OP_REG  and (regs_read[0] == X86_REG_ESP or regs_read[0] == X86_REG_EBP):
                #print(f'0x{inst.address:x}: write {hex(env.reg[regs_read[1]])} to stack at {hex(inst.disp)}')
                env.save_stack(inst.disp, env.reg[regs_read[1]], inst.operands[1].size)

            # if first op is register and second op is stack pointer
            elif len(regs_read) == 1 and inst.operands[0].type == X86_OP_REG  and (regs_read[0] == X86_REG_ESP or regs_read[0] == X86_REG_EBP) and len(regs_write) == 1:
                #print(f'0x{inst.address:x}: read {hex(env.load_stack(inst.disp, inst.operands[0].size))} from stack at {hex(inst.disp)}')
                env.reg[regs_write[0]] = env.load_stack(inst.disp, inst.operands[0].size)

        # looking for pxor with a stack memory operand, e.g.
        # "pxor xmm0, xmmword ptr [esp+0x10]"
        # "pxor xmm0, xmmword ptr [ebp-0x10]"
        elif inst.mnemonic == 'pxor':
            # grab the two operand values
            val1 = env.reg[regs_read[0]]
            val2 = env.load_stack(inst.disp, inst.operands[1].size)

            #print(f'0x{inst.address:x}: xor {hex(val1)} with {hex(val2)} to get {hex(val1 ^ val2)}')

            if val1 == 0 or val2 == 0:
                continue
            # pack them into 128 bit values
            data = pack_128(val1)
            key = pack_128(val2)
            out = xor(data, key)

            #print(f'0x{inst.address:x}: xor {data} with {key} to get {out}')

            env.reg[regs_write[0]] = unpack_128(out)
            strings_out.append((inst.address, out))
        
        # Handle push by updating the stack pointer
        elif inst.mnemonic == 'push':
            env.reg[X86_REG_ESP] -= 4

        elif inst.mnemonic == 'pop':
            env.reg[X86_REG_ESP] += 4
            

        else:
            raise Exception(f'Unknown instruction {inst.mnemonic}')
    # strings_out = b''.join(strings_out)
    # strings_out = strings_out.split(b'\x00')
    return strings_out



# SAMPLE_PATH = '/tmp/xorstr/work/rise.bin'

# pe = pefile.PE(SAMPLE_PATH)
# md = setup_capstone()

# t = time.time()

# chunks = find_all_pxor(md, pe)

# print(f"found {len(chunks)} chunks")

# env = Env()
# strings = []

# # loop through each chunk and "emulate" it
# for chunk in chunks:
#     if len(chunk) == 0:
#         continue
#     # extend the strings list with the strings found in this chunk
#     strings.extend(emulate_chunk(chunk, env))
#     env.clear()

# print(f"Benchmark {time.time() - t}")

# # hack to remove duplicates
# # strings = list(dict.fromkeys(strings))

# for a, s in strings:
#     # tidy up some of the strings
#     s = s.rstrip(b'\x00')
#     s = s.decode('utf-8', 'ignore')
#     if len(s) == 0:
#         continue
#     print(f'0x{a:08x}: {s}')


SAMPLE_PATH = '/tmp/xorstr/work/rise.bin'
# Capstone setup
md = Cs(CS_ARCH_X86, CS_MODE_32) 
md.detail = True
md.skipdata = True

# Get code from PE
pe = pefile.PE(SAMPLE_PATH)
# Assume the first section is code
txt = pe.sections[0]

image_base = pe.OPTIONAL_HEADER.ImageBase
section_rva = txt.VirtualAddress
section_offset = txt.PointerToRawData
section_data = txt.get_data()
    
pxor_vpxor_vxorps_egg = rb'(\x66\x0F\xEF|\xC5\xFD\xEF|\xC5\xF8\x57)'

chunk_offsets = []
last_end = 0
for m in re.finditer(pxor_vpxor_vxorps_egg, section_data, re.DOTALL):
    xor_start = m.start() 

    # Determine the instruction length
    xor_instruction = list(md.disasm(section_data[xor_start:xor_start+0x10], image_base + section_rva + xor_start))[0]
    xor_instruction_address = image_base + section_rva + xor_start

    if xor_instruction.mnemonic == 'pxor' and xor_instruction.operands[0].type == X86_OP_REG and xor_instruction.operands[1].type == X86_OP_MEM:
        scan_length = 0x2000
        if scan_length > xor_start:
            scan_length = xor_start
        if xor_start - scan_length < last_end:
            # Update last chunk with new end
            chunk_offsets[-1] = (chunk_offsets[-1][0], xor_start + 10)
        else:
            chunk_offsets.append((xor_start - scan_length, xor_start + 10))

        last_end = xor_start + 10

chunks = []
for chunk_offset in chunk_offsets:
    chunk_data = section_data[chunk_offset[0]:chunk_offset[1]]
    chunk_instruction_address = image_base + section_rva + chunk_offset[0]
    instructions = []
    for inst in md.disasm(chunk_data, chunk_instruction_address):
        if inst.mnemonic == 'pxor' or inst.mnemonic == 'mov' or inst.mnemonic == 'movaps' or inst.mnemonic == 'push' or inst.mnemonic == 'pop':
            instructions.append(inst)
    chunks.append(instructions)


t = time.time()


print(f"found {len(chunks)} chunks")

env = Env()
strings = []

# loop through each chunk and "emulate" it
for chunk in chunks:
    if len(chunk) == 0:
        continue
    # extend the strings list with the strings found in this chunk
    strings.extend(emulate_chunk(chunk, env))
    env.clear()

print(f"Benchmark {time.time() - t}")

# hack to remove duplicates
# strings = list(dict.fromkeys(strings))


def string_builder(strings):
    out = []
    last_addr = 0
    last_string = ""
    for s in strings[::-1]:
        diff = last_addr - s[0]
        if diff <= 88 and last_string is not None:
            last_string = s[1] + last_string
        else:
            out.append((last_addr,last_string))
            last_string = s[1]
        last_addr = s[0]
    out.append((last_addr,last_string))
    return out[::-1]

tmp_strings = []
for a, s in strings:
    # tidy up some of the strings
    s = s.rstrip(b'\x00')
    s = s.decode('utf-8', 'ignore')
    if len(s) == 0:
        continue
    tmp_strings.append((a,s))

strings = string_builder(tmp_strings)

for a, s in strings:
    print(f'0x{a:08x}: {s}')


found 7 chunks
0x401000: push 0x4dc1c0, regs_read: [30], regs_write: [30]
0x401005: mov ecx, 0x4f0598, regs_read: (), regs_write: [22]
0x40100f: push 0x4c7caa, regs_read: [30], regs_write: [30]
0x401019: pop ecx, regs_read: [30], regs_write: [30, 22]
0x40101b: push 0x4c7cb4, regs_read: [30], regs_write: [30]
0x401025: pop ecx, regs_read: [30], regs_write: [30, 22]
0x401027: push 2, regs_read: [30], regs_write: [30]
0x401029: push 0x4f064c, regs_read: [30], regs_write: [30]
0x401033: push 0x4c7cbe, regs_read: [30], regs_write: [30]
0x401041: push 0, regs_read: [30], regs_write: [30]
0x401043: mov ecx, 0x4f0620, regs_read: (), regs_write: [22]
0x40104d: push 0x4c7d1e, regs_read: [30], regs_write: [30]
0x401057: pop ecx, regs_read: [30], regs_write: [30, 22]
0x401059: push 0, regs_read: [30], regs_write: [30]
0x40105b: mov ecx, 0x4f06a8, regs_read: (), regs_write: [22]
0x401065: push 0x4c7d02, regs_read: [30], regs_write: [30]
0x40106f: pop ecx, regs_read: [30], regs_write: [30, 22]
0x401

IndexError: bytearray index out of range

'0x80000000'