# The Refinery Files 0x03: SedUpLoader C2s

This is a tutorial about extracting the C2 domains from [SedUpLoader] samples. We will be working with the following one:
```
2396c9dac2184405f7d1f127bec88e56391e4315d4d2e5b951c795fdc1982d59
```
As always, remember that this is **malware**: do not execute it unless you know exactly what you are doing. For instructions on how to set up [refinery], see the main page and documentation.

[refinery]: https://github.com/binref/refinery/
[SedUpLoader]: https://malpedia.caad.fkie.fraunhofer.de/details/win.seduploader

In [1]:
from tutorials import boilerplate
boilerplate.store_sample(
    name='a.bin',
    hash='2396c9dac2184405f7d1f127bec88e56391e4315d4d2e5b951c795fdc1982d59'
)

In [1]:
%ls

42.496 kB 2396c9dac2184405f7d1f127bec88e56391e4315d4d2e5b951c795fdc1982d59 a.bin


## String Decryption

After some reverse engineering, you discover that the function at `0x403FBA` implements the string decryption, which is an XOR with the following 13-byte sequence, stored at the virtual address `0x408b78`:
```
5f19362c533e6f1a0c6a202e34
```
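
As a minimal sketch of what this routine computes (not the malware's actual code), the following Python snippet XORs a buffer with the repeating 13-byte key. The helper name `xor_decrypt` is hypothetical; the 14 encrypted bytes are the first buffer that the pipeline further below extracts from the sample.

```python
from itertools import cycle

# 13-byte XOR key stored at virtual address 0x408b78 in the sample.
KEY = bytes.fromhex('5f19362c533e6f1a0c6a202e34')


def xor_decrypt(data, key=KEY):
    """XOR every byte of data with the key, repeating the key as needed."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


# Encrypted buffer of length 14; the key wraps around for the last byte.
encrypted = bytes.fromhex('187C427C21510C7F7F19684B552F')
decrypted = xor_decrypt(encrypted)
print(decrypted.decode('ascii'))
```

This prints `GetProcessHeap`. Because XOR is its own inverse, applying `xor_decrypt` to the result yields the encrypted bytes again.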
Most calls to the string decryption function decrypt a constant string. Let us first decrypt those constant strings. The string decryption function receives its two arguments (the encrypted string buffer and its length) on the stack, and the opcodes for such a call look similar to this:
```
00404a8f  6a XX                  PUSH  X
00404a91  68 YY YY YY YY         PUSH  Y
00404a96  ...
00404a98  e8 1d f5 ff ff         CALL  STRING_DECRYPT
```
where `X` is the length and `Y` is the string address. We will first try to find all these call sequences. To do so, we use [emit] to output the contents of the malware sample and then the [rex] unit to search for the opcode sequence that pushes a nonzero byte and a 32-bit integer address to the stack:
```
rex "\x6A([^\0])\x68(.{4})" {1}{2}
```
The second argument to [rex] is the format string `{1}{2}` which means to simply concatenate the first and second match group - in this case, this will be the single byte encoding the string length and the four bytes encoding its address. We then use the [struct] unit to parse the integers from the opcode sequence; the struct format `{n:B}{a:L}` contains two format fields: `{n:B}` to read the one-byte string length value into the variable `n`, and `{a:L}` to read the 4-byte string address value into the variable `a`. Finally, we use [pf] to pretty-print the output.
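
For readers who prefer to see the same idea outside of a pipeline, here is a rough Python equivalent of the [rex] and [struct] steps. It is only a sketch: the input bytes are synthetic (they are not copied from the sample), and it assumes that the integers are little-endian, as is usual for x86 code.

```python
import re
import struct

# Synthetic code bytes (an assumption for illustration, not copied from the
# sample): PUSH 0x0E; PUSH 0x00408150; followed by two filler bytes.
code = bytes.fromhex('6a0e68508140009090')

# Same expression as in the rex command; DOTALL lets "." match any byte.
pattern = re.compile(rb'\x6A([^\0])\x68(.{4})', re.DOTALL)

candidates = []
for match in pattern.finditer(code):
    # "<BL": little-endian, one unsigned byte, then one unsigned 32-bit integer.
    length, address = struct.unpack('<BL', match[1] + match[2])
    candidates.append((address, length))
    print(f'address=0x{address:08X}, length={length}')
```

For the synthetic input, this prints `address=0x00408150, length=14`.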

[emit]: https://binref.github.io/#refinery.emit
[rex]: https://binref.github.io/#refinery.rex
[struct]: https://binref.github.io/#refinery.struct
[pf]: https://binref.github.io/#refinery.pf

In [1]:
%emit a.bin | rex "\x6A([^\0])\x68(.{4})" {1}{2} [| struct {n:B}{a:L} | pf address=0x{a:08X}, length={n} ]]

address=0x1FC0EAEE, length=7
address=0xA48D6762, length=43
address=0x000000A9, length=1
address=0x00408150, length=14
address=0x00000093, length=1
address=0x00000244, length=1
address=0x00408160, length=9
address=0x00000094, length=1


address=0x0040816C, length=8
address=0x00000097, length=1
address=0x00000095, length=1
address=0x00408144, length=12
address=0x000001F2, length=9
address=0x00000239, length=1
address=0x0000017E, length=4
address=0x0000017F, length=4
address=0x000001A4, length=5
address=0x000001A2, length=5
address=0x00408AEC, length=1
address=0x00408AF0, length=1
address=0x00408AEC, length=1
address=0x01010101, length=255
address=0x40000000, length=2
address=0x00408B88, length=12
address=0x19F78C90, length=92
address=0x5BC1D14F, length=94
address=0xC930EA1E, length=93
address=0x0D89AD05, length=75
address=0x00408BA4, length=43
address=0x00408D84, length=4
address=0x00408D6C, length=5
address=0x00408D74, length=6
address=0x00408D7C, length=6
address=0x00408BE2, length=1
address=0x00408BE4, length=12
address=0x00408D8C, length=14
address=0x00408D84, length=4
address=0x00408E04, length=12
address=0x00408E10, length=2
address=0x00408DA0, length=67
address=0x00408DE4, length=6
address=0x00408BF0, length=44


It is already quite clear that some of these are probably false positives; for example, the "address" `0x000000A9` lies far below the region around `0x00408000` where the plausible string addresses are located. That should not be a problem for our next step, though. We now want to adjust the pipeline so that we actually extract the encrypted strings rather than just their addresses. At the end of our current pipeline, we are working on a stream of 5-byte sequences which encode a length (as one byte) and an address. The data of the original sample was already lost when we ran the [rex] command. To correct this, we will first use [put] to store a backup of the sample data in a variable called `bin`. This variable will still be attached to the results of [rex] when they pass to the [struct] unit. We then alter the [struct] command as follows:
```
struct {n:B}{a:L} {bin}
```
We will still parse out the string length and address as variables `n` and `a`, respectively. The second argument of struct is an optional string format expression that defines the output body. In this case, we are instructing it to output the contents of the previously defined variable `bin`. After this command, the output will be several copies of the malware sample, each of which has meta variables `a` and `n` defined, specifying the virtual address and length of what is potentially an encrypted string. To extract the actual strings, we use the [vsnip] unit, which can extract data from executable formats based on virtual addresses; the argument `a:n` reads `n` bytes starting at the virtual address `a`. We specify the `--quiet` flag (`-Q` in the command below) for [vsnip] because we already know that some addresses will be bogus and we want to simply ignore those warnings.
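
To illustrate what reading from a virtual address involves, the following sketch translates a virtual address to a file offset using a section table. This is not how [vsnip] is implemented; the names `Section` and `va_to_offset` are hypothetical, and the image base and section layout are made up for the example rather than read from the sample.

```python
from typing import List, NamedTuple, Optional


class Section(NamedTuple):
    name: str
    virtual_address: int   # RVA of the section in memory
    virtual_size: int      # size of the section in memory
    raw_offset: int        # offset of the section data in the file


def va_to_offset(va: int, image_base: int, sections: List[Section]) -> Optional[int]:
    """Translate a virtual address to a file offset, or None if unmapped."""
    rva = va - image_base
    for section in sections:
        start = section.virtual_address
        if start <= rva < start + section.virtual_size:
            return section.raw_offset + (rva - start)
    return None


# Hypothetical layout for illustration only; NOT read from the sample.
IMAGE_BASE = 0x400000
SECTIONS = [
    Section('.text', 0x1000, 0x6000, 0x0400),
    Section('.data', 0x8000, 0x1000, 0x6800),
]

print(va_to_offset(0x00408150, IMAGE_BASE, SECTIONS))  # mapped by .data
print(va_to_offset(0x000000A9, IMAGE_BASE, SECTIONS))  # bogus address: None
```

A bogus address such as `0x000000A9` simply maps to nothing, which is why the false positives from the previous step do no harm. The sketch ignores details such as sections whose raw data is shorter than their virtual size.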

[peek]: https://binref.github.io/#refinery.peek
[put]: https://binref.github.io/#refinery.put
[rex]: https://binref.github.io/#refinery.rex
[struct]: https://binref.github.io/#refinery.struct
[vsnip]: https://binref.github.io/#refinery.vsnip

In [1]:
%emit a.bin [| put bin | rex "\x6A([^\0])\x68(.{4})" {1}{2} | struct {n:B}{a:L} {bin} | vsnip -Q a:n | peek -b ]

00.014 kB: 18 7C 42 7C 21 51 0C 7F 7F 19 68 4B 55 2F                                         .|B|!Q....hKU/             
00.009 kB: 17 7C 57 5C 12 52 03 75 6F                                                        .|W\.R.uo                  
00.008 kB: 17 7C 57 5C 15 4C 0A 7F                                                           .|W\.L..                   
00.012 kB: 13 76 57 48 1F 57 0D 68 6D 18 59 6F                                               .vWH.W.hm.Yo               
00.001 kB: 32                                                                                2                          
00.001 kB: 34                                                                                4                          
00.001 kB: 32                                                                                2                          
00.012 kB: 2D 6C 58 48 3F 52 5C 28 22 0F 58 4B                                               -lXH?R\(".XK               
00.043 kB: 0C 40 65 78 16 73 33 

This looks promising already. Now all we have to do is to apply the actual [xor] operation to decrypt the strings:

[xor]: https://binref.github.io/#refinery.xor

In [1]:
%emit a.bin [| put bin | rex "\x6A([^\0])\x68(.{4})" {1}{2} | struct {n:B}{a:L} {bin} | vsnip -Q a:n | xor h:5f19362c533e6f1a0c6a202e34 | peek -be ]]

00.014 kB: GetProcessHeap
00.009 kB: HeapAlloc
00.008 kB: HeapFree
00.012 kB: LoadLibraryA
00.001 kB: m
00.001 kB: k
00.001 kB: m
00.012 kB: rundll32.exe
00.043 kB: SYSTEM\\CurrentControlSet\\Services\\Disk\\Enum
00.004 kB: POST
00.005 kB: disk=
00.006 kB: build=
00.006 kB: inject
00.001 kB: w
00.012 kB: 0Wf9896@2?91
00.014 kB: /%s%s%s/?%s=%s
00.004 kB: POST
00.012 kB: s3j3hj4g5gy3
00.002 kB: ==
00.067 kB: Software\\Microsoft\\Windows\\CurrentVersion\\Internet Settings\\Servers
00.006 kB: Domain
00.044 kB: google.com\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0
00.014 kB: www.google.com
00.006 kB: search
00.003 kB: GET
00.002 kB: q=
00.044 kB: google.com\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0
00.044 kB: google.com\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0
00.002 kB: id
00.011 kB: \xf4\xfa\xfb\x1d\t\xc1\xfa\xebx\x80\v
00.067 kB: Software\\Microsoft\\Windows\\CurrentVersion\\Internet Settings\\Servers
00.006 kB

That is a little disappointing; this doesn't look like we found the C2 servers. Looks like we will have to do a little more digging.

## C2 Servers

After looking around some more, it turns out that there is a single call to the string decryption function that does not receive a constant argument. The call is at `0x405837` and it is used to decrypt four chunks of size `44` each, starting at the virtual address `0x408bf0`. No need to be coy about it - this is indeed the C2 server list, except for the first entry, which is a domain used for connectivity checks (it's `google.com` in this sample). Decrypting the C2 servers is now fairly straightforward:

[push]: https://binref.github.io/#refinery.push
[pop]: https://binref.github.io/#refinery.pop

In [1]:
%emit a.bin | vsnip 0x408bf0:4*44 | chop 44 [| xor h:5f19362c533e6f1a0c6a202e34 | trim h:00 | defang | peek -be ]]

00.012 kB: google[.]com
00.027 kB: microsoftstoreservice[.]com
00.017 kB: servicetlnt[.]net
00.019 kB: windowsdefltr[.]net


We have again used the [vsnip] unit to read data from a virtual address. In this case, we read `4*44` bytes (the expression evaluates to `176`) from the address where the encrypted C2 array is stored, and then we [chop] this buffer into 4 buffers, each of which has length `44`. Then, we use a frame to decrypt each of these buffers with the XOR key. To make the output prettier, we [trim] trailing null bytes and [defang] the network indicators before we have a [peek].
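
The same steps can be sketched in plain Python. To keep the example independent of the sample file, the 176-byte buffer is rebuilt here by encrypting the domains from the output above; the helper functions are simplified stand-ins and not refinery's implementations (in particular, this `defang` only brackets the dots).

```python
from itertools import cycle

KEY = bytes.fromhex('5f19362c533e6f1a0c6a202e34')
CHUNK = 44


def xor(data, key):
    """XOR data with a repeating key."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


def chop(data, size):
    """Split data into consecutive blocks of the given size."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def defang(indicator):
    """Simplified defanging: only the dots are bracketed."""
    return indicator.replace('.', '[.]')


# Stand-in for the 4*44 bytes at 0x408bf0. Each entry is padded with null
# bytes to 44 bytes and encrypted on its own, so the key restarts per entry.
domains = [b'google.com', b'microsoftstoreservice.com',
           b'servicetlnt.net', b'windowsdefltr.net']
blob = b''.join(xor(d.ljust(CHUNK, b'\0'), KEY) for d in domains)

results = []
for chunk in chop(blob, CHUNK):
    plain = xor(chunk, KEY).rstrip(b'\0')   # decrypt, then trim the padding
    results.append(defang(plain.decode('ascii')))

print('\n'.join(results))
```

Note that the key is applied to each 44-byte chunk separately, which mirrors the fact that [xor] runs inside the frame opened by [chop].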

[chop]: https://binref.github.io/#refinery.chop
[defang]: https://binref.github.io/#refinery.defang
[peek]: https://binref.github.io/#refinery.peek
[trim]: https://binref.github.io/#refinery.trim
[vsnip]: https://binref.github.io/#refinery.vsnip

This is nice and all, but let's use this opportunity to learn about [push] and [pop]. The goal is to avoid having to hard-code the key into the pipeline, so we would like to use [vsnip] to first extract the key, and then again to extract the C2 server list. The finished pipeline looks as follows:

[push]: https://binref.github.io/#refinery.push
[pop]: https://binref.github.io/#refinery.pop
[vsnip]: https://binref.github.io/#refinery.vsnip

In [1]:
%%emit a.bin | push [
    | vsnip 0x408b78:13
    | pop key
    | peek -l5
    | vsnip 0x408bf0:4*44
    | chop 44 [
        | xor var:key
        | trim h:00
        | defang
        | peek -be ]]

------------------------------------------------------------------------------------------------------------------------
42.496 kB; 79.26% entropy; PE32 executable (GUI) Intel 80386, for MS Windows
  key = h:5f19362c533e6f1a0c6a202e34
------------------------------------------------------------------------------------------------------------------------
00000: 4D 5A 90 00 03 00 00 00 04 00 00 00 FF FF 00 00 B8 00 00 00 00 00 00 00 40 00 00 00  MZ......................@...
0001C: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ............................
00038: 00 00 00 00 D8 00 00 00 0E 1F BA 0E 00 B4 09 CD 21 B8 01 4C CD 21 54 68 69 73 20 70  ................!..L.!This.p
00054: 72 6F 67 72 61 6D 20 63 61 6E 6E 6F 74 20 62 65 20 72 75 6E 20 69 6E 20 44 4F 53 20  rogram.cannot.be.run.in.DOS.
00070: 6D 6F 64 65 2E 0D 0D 0A 24 00 00 00 00 00 00 00 29 9B A4 67 6D FA CA 34 6D FA CA 34  mode....$.......)..gm..4m..4
---------------------------------------

The [push] unit creates a hidden copy of the current chunk and inserts it at the end of the current frame. Hence, after the [push] instruction, the frame contains a visible copy of the sample data, and one invisible copy. Invisible chunks are passed on along the frame, but refinery units do not operate on them. Hence, the first [vsnip] command is only executed on the visible chunk, extracting the 13 key bytes from their known address. The invocation of [pop] does two things: It takes the first visible chunk in the current frame and associates it with the variable `key`. Then, it makes all remaining chunks visible again and attaches the variable `key` to them. In this case, we end up with a copy of the original sample, with a variable named `key`, containing the decryption key. Everything after that is identical to the previous pipeline, with the exception that we can now pass the variable `key` to the [xor] unit rather than the hardcoded value.
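
The following toy model may help to picture these semantics. It is emphatically not refinery's implementation: the `Chunk` class and the functions `push`, `apply`, and `pop` are hypothetical and only mimic the behavior described above on a list of chunks.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    data: bytes
    visible: bool = True
    meta: dict = field(default_factory=dict)


def push(frame):
    """Append one hidden copy of every visible chunk to the end of the frame."""
    hidden = [Chunk(c.data, False, dict(c.meta)) for c in frame if c.visible]
    return frame + hidden


def apply(frame, operation):
    """Run an operation on visible chunks only; hidden ones pass through."""
    return [Chunk(operation(c.data), True, c.meta) if c.visible else c
            for c in frame]


def pop(frame, name):
    """Turn the first visible chunk into a variable on all remaining chunks."""
    index = next(i for i, c in enumerate(frame) if c.visible)
    value = frame[index].data
    rest = frame[:index] + frame[index + 1:]
    return [Chunk(c.data, True, {**c.meta, name: value}) for c in rest]


sample = bytes(range(32))                  # stand-in for the sample data
frame = [Chunk(sample)]
frame = push(frame)                        # one visible and one hidden copy
frame = apply(frame, lambda d: d[8:12])    # stand-in for the first vsnip
frame = pop(frame, 'key')                  # key := visible chunk; unhide rest

print(frame)
```

After the last step, the frame holds a single visible chunk with the full stand-in data, and its `meta` dictionary carries the four extracted bytes under the name `key`.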

[push]: https://binref.github.io/#refinery.push
[pop]: https://binref.github.io/#refinery.pop
[vsnip]: https://binref.github.io/#refinery.vsnip
[xor]: https://binref.github.io/#refinery.xor

## Automatic C2 Extraction

Finally, let us combine the techniques we have seen into a pipeline that can (in some cases) extract the C2 configuration data from SedUpLoader samples:

In [1]:
%%emit a.bin [
    | push
    | put bin
    | rex "\xc7\x45(.)(.\0{3}).{0,4}\xf7\x75\1\x8a\x82(....)\x32\x04\x0F" {2}{3}
    | struct {kl:L}{ka:L} {bin}
    | vsnip ka:kl
    | pop key
    | put bin
    | rex "\xB8(....).{0,10}\x6A(.)\x50\xE8" {1}{2}
    | struct {a:L}{n:B} {bin}
    | vsnip a
    | chop n [
        | xor var:key
        | trim h:00
        | iffp domain
        | defang
        | peek -be ]]

00.012 kB: google[.]com
00.027 kB: microsoftstoreservice[.]com
00.017 kB: servicetlnt[.]net
00.019 kB: windowsdefltr[.]net


This combines the techniques from the previous two sections. Here is a quick overview of how the pipeline works. The first regular expression looks for the following opcode sequence from the string decryption function, where `X` is the stack offset of the key length variable, `Y` is the key length (parsed as `kl` in the pipeline), and `Z` is the address of the key (parsed as `ka`):
```
00403fe0  c7 45 XX YY YY YY YY   MOV   dword ptr [EBP + X], Y
          ...
00403fe9  f7 75 XX               DIV   dword ptr [EBP + X]
00403fec  8a 82 ZZ ZZ ZZ ZZ      MOV   AL, byte ptr [EDX + Z]
00403ff2  32 04 0f               XOR   AL, byte ptr [EDI + ECX*0x1]
```
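
As a sketch of how this expression behaves, the snippet below runs it with Python's `re` module against synthetic bytes that mirror the listing above. The stack offset `0xF8` and the two filler bytes between the instructions are assumptions made for the example; only the key length `13` and the key address `0x408b78` are taken from this tutorial.

```python
import re
import struct

# Synthetic bytes mirroring the listing; offset 0xF8 and the filler bytes
# (33 D2) are assumptions made for this example.
code = bytes.fromhex(
    'c745f80d000000'     # MOV dword ptr [EBP + X], 13
    '33d2'               # filler between the two instructions
    'f775f8'             # DIV dword ptr [EBP + X]
    '8a82788b4000'       # MOV AL, byte ptr [EDX + 0x408b78]
    '32040f'             # XOR AL, byte ptr [EDI + ECX]
)

# Same expression as in the pipeline. The backreference \1 requires that MOV
# and DIV use the same stack offset; (.\0{3}) only accepts lengths below 256.
pattern = re.compile(
    rb'\xc7\x45(.)(.\0{3}).{0,4}\xf7\x75\1\x8a\x82(....)\x32\x04\x0F',
    re.DOTALL)

match = pattern.search(code)
# Two little-endian 32-bit integers: the key length and the key address.
key_length, key_address = struct.unpack('<LL', match[2] + match[3])
print(f'key length={key_length}, key address=0x{key_address:08X}')
```

The division by the key length leaves the remainder in `EDX`, which is then used as the index into the key; this is how the repeating XOR key is implemented in the listing.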
We use [push]/[pop] as in the previous pipeline, except that we do not hard-code the address of the key buffer, but instead search for a characteristic opcode sequence to determine it. The second regular expression looks for the following opcode sequence from the code that decrypts the C2 servers:
```
0040582a  b8 XX XX XX XX         MOV   EAX, X
          ...
00405834  6a YY                  PUSH  Y
00405836  50                     PUSH  EAX
00405837  e8 7e e7 ff ff         CALL  STRING_DECRYPT
```
The value `X` is the address of the list and `Y` contains the size of each chunk. The former is then stored in the variable `a`, the latter in the variable `n`. After having determined these values, we can again proceed as in the previous pipeline with a few modifications: We [vsnip] _all_ memory starting at `a`, then [chop] it into chunks of size `n` and decrypt them. Now, we have likely extracted and decrypted quite a few chunks that are not actually C2 domains. To filter them out, we use the [iffp] unit: It takes as its parameter the name of any pattern known to [carve] and [xtp] and removes any chunk from the frame that does not match this pattern. In this case, we will only forward chunks that look like a domain. And that's it - a somewhat automatic SedUpLoader config extractor in refinery!
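
The filtering step can be approximated in Python as follows. This is a sketch under stated assumptions: the `DOMAIN` expression is a deliberately simple stand-in (the pattern that [iffp] actually uses is not reproduced here), the sketch requires the whole chunk to match, and the "memory" is synthetic data built from two of the domains above plus some unrelated bytes.

```python
import re
from itertools import cycle

KEY = bytes.fromhex('5f19362c533e6f1a0c6a202e34')

# Deliberately simple stand-in for a domain pattern (an assumption; this is
# not the pattern that iffp uses).
DOMAIN = re.compile(
    rb'(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z]{2,}', re.IGNORECASE)


def xor(data, key):
    """XOR data with a repeating key."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


def looks_like_domain(chunk):
    """Keep a chunk only if the entire chunk matches the domain pattern."""
    return DOMAIN.fullmatch(chunk) is not None


# Stand-in for "all memory starting at a": two encrypted 44-byte entries,
# followed by unrelated bytes that decrypt to garbage.
n = 44
memory = b''.join(
    xor(d.ljust(n, b'\0'), KEY) for d in (b'google.com', b'servicetlnt.net'))
memory += bytes(range(1, n + 1))

kept = []
for i in range(0, len(memory), n):
    plain = xor(memory[i:i + n], KEY).rstrip(b'\0')
    if looks_like_domain(plain):
        kept.append(plain.decode('ascii'))

print(kept)
```

The garbage chunk is dropped because it does not match the pattern, while both domains are kept.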

[push]: https://binref.github.io/#refinery.push
[pop]: https://binref.github.io/#refinery.pop
[vsnip]: https://binref.github.io/#refinery.vsnip
[chop]: https://binref.github.io/#refinery.chop
[iffp]: https://binref.github.io/#refinery.iffp
[carve]: https://binref.github.io/#refinery.carve
[xtp]: https://binref.github.io/#refinery.xtp
[xor]: https://binref.github.io/#refinery.xor