# The Refinery Files 0x06: Qakbot Decoder

This is a short tutorial on how to extract the configuration from an unpacked Qakbot sample.
We will be working with the following sample:

```
from tutorials import boilerplate as bp
bp.store_sample('84669a2a67b9dda566a1d8667b1d40f1ea2e65f06aa80afb6581ca86d56981e7', 'q.bot')
```

The full pipeline to extract the C2 configuration from this sample is the following.
The tutorial will step through each part and explain what is going on.

```
%%emit q.bot [[
        | put backup [
            | rex yara:5168([4])BA([2]0000)B9([4])E8 {1}{3}{2}
            | struct {ka:L}{da:L}{dl:L}
            | put key vsnip[ka:128]:var:backup
            | emit vsnip[da:dl]:var:backup
            | xor var:key ]
        | resplit h:00
        | swap key
        | swap backup
        | perc RCDATA [| max size ]
        | rc4 sha1:var:key
        | put required x::20
        | put computed sha1:c:
        | iff required -eq computed
        | rc4 x::20
        | snip 20:
        | rex '(\x01.{7})+' ]
    | struct -m !xBBBBHx {1}.{2}.{3}.{4}:{5} [| sep ]
    | peek -d ]
```

```
------------------------------------------------------------------------------------------------------------------------
02.222 kB; 44.01% entropy; ASCII text
---------------------------------------------------------------------------------------------------------------[utf8]---
181.118.183.103:443
92.239.81.124:443
174.58.146.57:443
73.223.248.31:443
86.129.13.178:2222
47.34.30.133:443
89.216.114.179:443
41.44.11.227:995
66.180.227.170:2222
46.229.194.17:443
------------------------------------------------------------------------------------------------------------------------
```


## Configuration Resource Format

Qakbot contains two configuration resources which are doubly encrypted with RC4:

```
%emit q.bot | perc RCDATA [| peek -l2 ]
```

```
------------------------------------------------------------------------------------------------------------------------
01.020 kB; 97.77% entropy; data
    lcid = Neutral Locale Language
  offset = 0x2878C
    path = RCDATA/3C91E539/0
------------------------------------------------------------------------------------------------------------------------
00000: 32 16 4A 40 17 69 8D CB 73 9A 6B D7 86 C7 06 0F 2A 5C 16 EE E4 12 7E 3E 56 5D BB C7  2.J@.i..s.k.....*\....~>V]..
0001C: 5E 42 C9 E2 23 D5 98 74 57 62 89 7B 19 1F 90 35 BD 6B 10 47 71 F9 76 0E E6 CE 84 22  ^B..#..tWb.{...5.k.Gq.v...."
------------------------------------------------------------------------------------------------------------------------
00.083 kB; 75.02% entropy; data
    lcid = Neutral Locale Language
  offset = 0x28B88
    path = RCDATA/89290AF9/0
------------------------------------------------------------------------------------------------------------------------
00000: 64 2D C5 1A 99 AF E3 A4 23 2D 1C 72 5
```

The key in this sample is derived from the following secret:
```
bUdiuy81gYguty@4frdRdpfko(eKmudeuMncueaN
```
The RC4 key is derived from this secret with a single round of SHA1; in other words, the key is the SHA1 digest of the secret. After decrypting a resource with the derived key, the data has the format
```
[checksum][key2][data]
```
where `[checksum]` is the 20-byte SHA1 hash of `[key2][data]`, and `[key2]` is the 20-byte RC4 key for the second decryption layer. The pipelines in this tutorial also discard the first 20 bytes of the plaintext produced by the second decryption.
Some Qakbot variants use a different format for this second layer, which requires a different second decoding step. However, the purpose of this tutorial is not to be a comprehensive Qakbot overview but rather a demonstration of how you can build configuration decoders in refinery, so we will assume that we are only dealing with samples that use the format described above.
When the secret is known, decrypting the resources is straightforward:

```
%%emit q.bot [
    | perc RCDATA
    | rc4 sha1:bUdiuy81gYguty@4frdRdpfko(eKmudeuMncueaN
    | snip 20:
    | rc4 x::20
    | snip 20:
    | peek -d ]
```

```
------------------------------------------------------------------------------------------------------------------------
00.960 kB; 72.57% entropy; data
    lcid = Neutral Locale Language
  offset = 0x2878C
    path = RCDATA/3C91E539/0
------------------------------------------------------------------------------------------------------------------------
00000: 01 B5 76 B7 67 01 BB 00 01 5C EF 51 7C 01 BB 01 01 AE 3A 92 39 01 BB 00 01 49 DF F8  ..v.g....\.Q|.....:.9....I..
0001C: 1F 01 BB 00 01 56 81 0D B2 08 AE 00 01 2F 22 1E 85 01 BB 00 01 59 D8 72 B3 01 BB 00  .....V......./"......Y.r....
00038: 01 29 2C 0B E3 03 E3 01 01 42 B4 E3 AA 08 AE 00 01 2E E5 C2 11 01 BB 00 01 BE 4A F8  .),......B................J.
00054: 88 01 BB 01 01 58 7A D0 C5 7D 64 00 01 4E A1 26 F2 01 BB 01 01 59 73 C4 63 01 BB 00  .....Xz..}d..N.&.....Ys.c...
00070: 01 AE 00 E0 D6 01 BB 01 01 AF CD 02 36 01 BB 01 01 88 E8 B8 86 03 E3 01 01 D5 C2 EA  ............6...............
0008C: 4B 03 E3 00 01 69 9A 70 4D 01 B
```
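The two layers can also be peeled off outside of refinery. The following is a minimal Python sketch that assumes the resource follows the `[checksum][key2][data]` format described above. The function names `rc4` and `decrypt_resource` are our own, RC4 is implemented by hand to keep the example dependency-free, and, unlike the pipeline above, the sketch also verifies the checksum. Dropping the first 20 bytes of the second-layer plaintext only mirrors the second `snip 20:`; we make no claim about what those bytes contain.

```python
import hashlib


def rc4(key: bytes, data: bytes) -> bytes:
    """Plain RC4: key scheduling followed by the keystream generator."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) % 256])
    return bytes(out)


def decrypt_resource(resource: bytes, secret: bytes) -> bytes:
    # Layer 1: the RC4 key is the SHA1 digest of the secret.
    layer1 = rc4(hashlib.sha1(secret).digest(), resource)
    checksum, body = layer1[:20], layer1[20:]
    if hashlib.sha1(body).digest() != checksum:
        raise ValueError('checksum mismatch, wrong secret?')
    # body = [key2][data], where key2 is the 20-byte RC4 key of layer 2.
    key2, data = body[:20], body[20:]
    layer2 = rc4(key2, data)
    # Mirror the final `snip 20:` of the pipeline.
    return layer2[20:]
```

Raising on a checksum mismatch instead of returning garbage is what makes it possible to tell a correct secret from a wrong one.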

The large resource contains entries of the following form, where each cell represents a byte:
```
    0   1   2   3   4   5   6   7   8
    .---.---.---.---.---.---.---.---.
    | A |   IP ADDRESS  |  PORT | B |
    '---'---'---'---'---'---'---'---'
```
The byte `A` specifies the C2 type. If its value is `1`, the C2 entry is an IPv4 address; a value of `2` indicates an IPv6 entry, and type `3` is some unknown entry that is 20 bytes long. The final byte `B` is an identifier for an internal priority list. Since we currently only observe IPv4 C2s in Qakbot, we restrict ourselves to parsing those. To reduce the risk that this pipeline extracts junk from future samples, we use the regular expression `(\x01.{7})+` to filter the C2 buffer down to runs of IPv4 entries:

```
%%emit q.bot [[
        | perc RCDATA
        | rc4 sha1:bUdiuy81gYguty@4frdRdpfko(eKmudeuMncueaN
        | snip 20:
        | rc4 x::20
        | snip 20:
        | rex '(\x01.{7})+' ]
    | struct -m !xBBBBHx {1}.{2}.{3}.{4}:{5} [| sep ]
    | peek -d ]
```

```
------------------------------------------------------------------------------------------------------------------------
02.222 kB; 44.01% entropy; ASCII text
---------------------------------------------------------------------------------------------------------------[utf8]---
181.118.183.103:443
92.239.81.124:443
174.58.146.57:443
73.223.248.31:443
86.129.13.178:2222
47.34.30.133:443
89.216.114.179:443
41.44.11.227:995
66.180.227.170:2222
46.229.194.17:443
------------------------------------------------------------------------------------------------------------------------
```
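For comparison, here is a Python sketch of the same filtering step using the standard `re` module. The helper name `ipv4_records` is hypothetical. The pattern is compiled with `re.DOTALL` so that `.` also matches the byte `0x0A`, which can legitimately occur inside an address or port; we assume that the `rex` unit behaves equivalently here. Each matched run is then cut into 8-byte records.

```python
import re

# A run of IPv4 records: type byte 0x01 followed by 7 arbitrary bytes, repeated.
IPV4_RUN = re.compile(rb'(?:\x01.{7})+', re.DOTALL)


def ipv4_records(buffer: bytes) -> list[bytes]:
    """Collect the 8-byte records from every run of IPv4 entries in the buffer."""
    records = []
    for match in IPV4_RUN.finditer(buffer):
        run = match.group(0)
        records.extend(run[k:k + 8] for k in range(0, len(run), 8))
    return records
```

A trailing entry of another type (for example one starting with `0x03`) ends the run, so it never reaches the parsing step.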


The [struct][] command might need a little explanation here: With the `-m` switch, the unit does not parse a single struct but multiple consecutive structs. Each struct is parsed with the format `!xBBBBHx`: the exclamation mark `!` [specifies big-endian byte order][struct-byte-order-size-and-alignment], and the remaining [struct format characters][struct-format-characters] describe one C2 record. The first `x` skips the type byte `A`, `BBBB` reads the four bytes of the IP address, `H` reads the 16-bit port, and the final `x` skips the byte `B`. The second (and optional) argument of [struct][] is a format string expression that can be used to format the parsed data. In this case, the format expression `{1}.{2}.{3}.{4}:{5}` prints the four octets of the parsed IPv4 address separated by dots, followed by a colon and the port number.
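Python's own `struct` module accepts the same format string, so the multi-struct parsing can be sketched with `Struct.iter_unpack`, which applies one format repeatedly across a buffer. The helper name `format_c2s` is ours. Python's `str.format` numbers positional fields from zero, so the sketch uses plain `{}` placeholders instead of `{1}` to `{5}`. The test bytes are taken from the decrypted resource shown earlier.

```python
import struct

# Skip A, read four octets, read a big-endian 16-bit port, skip B: 8 bytes total.
RECORD = struct.Struct('!xBBBBHx')


def format_c2s(records: bytes) -> list[str]:
    """Format consecutive 8-byte C2 records as ip:port strings."""
    # iter_unpack requires the buffer length to be a multiple of RECORD.size.
    return ['{}.{}.{}.{}:{}'.format(*fields) for fields in RECORD.iter_unpack(records)]
```

For the first three records of the sample, this yields `181.118.183.103:443`, `92.239.81.124:443` and `86.129.13.178:2222`.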

[struct-byte-order-size-and-alignment]: https://docs.python.org/3/library/struct.html#byte-order-size-and-alignment
[struct-format-characters]: https://docs.python.org/3/library/struct.html#format-characters
[struct]: https://binref.github.io/#refinery.struct

## Decrypting The Strings

The previous section explains the bottom part of the full pipeline, namely how to parse and format the configuration resource once it has been decrypted. To decrypt the resource, however, we need the key, and the secret from which it is derived is stored as an encrypted string. Therefore, we first have to decrypt the strings. After some reversing, you will notice that the string table decryption is invoked with the following opcode pattern:
```
  51                 PUSH  ECX
  51                 PUSH  ECX
  68 50 F0 0E 00     PUSH  TABLE_DECRYPTION_KEY
  BA B1 05 00 00     MOV   EDX, 0x5b1
  B9 D8 F0 0E 00     MOV   ECX, TABLE_ENCRYPTED_DATA
  E8 98 7F 00 00     CALL  _DECRYPT
  83 C4 0C           ADD   ESP, 0xc
  C3                 RET
```
This converts to the following [yara][]-esque pattern:
```
  51 68 [4] BA [2]0000 B9 [4] E8
```
The regular expression argument for the [rex][] unit has a special [yara handler][yara-handler] which can be useful to convert the above type of pattern into a regular expression (it saves you a bunch of `\x`); we put capture group parentheses around the address of the table decryption key, the size of the table, and the address of the encrypted table data:
```
  5168([4])BA([2]0000)B9([4])E8
```
We then run [rex][] as follows:
```
  rex yara:5168([4])BA([2]0000)B9([4])E8 {1}{3}{2}
```
This searches for matches of the above opcode pattern. The format string `{1}{3}{2}` in the second argument means that each match found by [rex][] leads to one output chunk that contains, in this order:
1. the bytes that constitute the table decryption key address
2. the bytes that constitute the table data address
3. the four bytes of the little-endian table size, whose upper two bytes the pattern requires to be zero
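To make the pattern translation concrete, here is a hand-written Python equivalent of the regular expression, assuming the yara handler maps `[n]` to a wildcard of exactly `n` bytes. The function name `extract_chunks` is hypothetical; it concatenates the capture groups in the order `{1}{3}{2}`, just like the format argument passed to `rex`. The test input consists of the opcode bytes from the listing above.

```python
import re

# Hand translation of 5168([4])BA([2]0000)B9([4])E8; DOTALL lets '.' match any byte.
PATTERN = re.compile(rb'\x51\x68(.{4})\xBA(.{2}\x00\x00)\xB9(.{4})\xE8', re.DOTALL)


def extract_chunks(code: bytes) -> list[bytes]:
    """Emulate `rex ... {1}{3}{2}`: key address, table address, table size."""
    return [m.group(1) + m.group(3) + m.group(2) for m in PATTERN.finditer(code)]
```

For the listing above, the single match produces the 12-byte chunk `50F00E00 D8F00E00 B1050000`.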

We can compose this unit with [struct][] to store each of these values as an integer meta variable on the current chunk:

[yara]: https://yara.readthedocs.io/en/stable/index.html
[rex]: https://binref.github.io/#refinery.rex
[yara-handler]: https://binref.github.io/lib/argformats.html#refinery.lib.argformats.DelayedRegexpArgument.yara

```
%%emit q.bot [
    | rex yara:5168([4])BA([2]0000)B9([4])E8 {1}{3}{2}
    | struct {ka:L}{da:L}{dl:L}
    | peek -l0 ]
```

```
------------------------------------------------------------------------------------------------------------------------
      da = 0xEF0D8
      dl = 0x5B1
      ka = 0xEF050
  offset = 0x1081
------------------------------------------------------------------------------------------------------------------------
      da = 0xEF7A8
      dl = 0x1107
      ka = 0xEF720
  offset = 0xA9C8
------------------------------------------------------------------------------------------------------------------------
```
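The [struct][] call above reads three unsigned 32-bit integers. Assuming little-endian byte order, which matches the values in the output, the equivalent Python is a single `struct.unpack` call; the helper name `parse_addresses` is hypothetical. Note that `dl` is a full 32-bit value, which is why the second capture group in the pattern includes the two zero bytes.

```python
import struct


def parse_addresses(chunk: bytes) -> dict[str, int]:
    """Split a 12-byte chunk into key address, data address and data length."""
    # Three little-endian unsigned 32-bit integers, in the order produced by {1}{3}{2}.
    ka, da, dl = struct.unpack('<3L', chunk)
    return {'ka': ka, 'da': da, 'dl': dl}
```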


Having extracted these addresses, we first want to read the key, then read the encrypted table data and decrypt it with that key.

- First, we will use [put][] to store the input data in a meta variable called `backup`. 
- Then, we use the above pipeline to populate the variables that contain all the necessary addresses and sizes. 
- Next, we use [put][] again to populate a variable named `key` with the key bytes. This is done with the [multibin expression][multibins] `vsnip[ka:128]:var:backup`, which first uses the [var handler][var-handler] to retrieve the contents of `backup` and then pipes that data to the [vsnip][] unit with the argument `ka:128`. This extracts `128` bytes from the input executable, starting at the file offset that corresponds to the virtual address `ka`, which is where the decryption key is stored.
- We then [emit][] the contents of another [multibin expression][multibins], `vsnip[da:dl]:var:backup`, which extracts the `dl` bytes of the encrypted table that start at the virtual address `da`.
- Finally, we use [xor][] to decrypt the string table with the key that is stored in the meta variable `key`.
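The [vsnip][] unit takes care of translating a virtual address into a file offset. We have not reproduced its implementation; the sketch below only shows the standard PE arithmetic such a translation involves, using a simplified, hypothetical `Section` record in place of a real PE parser. The image base is subtracted to obtain a relative virtual address, the section containing it is located, and the distance from the section start is added to the section's raw data offset.

```python
from typing import List, NamedTuple


class Section(NamedTuple):
    """Hypothetical, simplified view of a PE section header."""
    virtual_address: int  # RVA at which the section is mapped
    virtual_size: int     # size of the section in memory
    raw_offset: int       # PointerToRawData: file offset of the section data
    raw_size: int         # SizeOfRawData: number of bytes stored in the file


def va_to_offset(va: int, image_base: int, sections: List[Section]) -> int:
    """Translate a virtual address into a file offset."""
    rva = va - image_base
    for section in sections:
        if section.virtual_address <= rva < section.virtual_address + section.virtual_size:
            delta = rva - section.virtual_address
            if delta >= section.raw_size:
                raise ValueError(f'VA {va:#x} is not backed by file data')
            return section.raw_offset + delta
    raise ValueError(f'no section contains VA {va:#x}')


def read_virtual(image: bytes, va: int, size: int, image_base: int, sections: List[Section]) -> bytes:
    """Read `size` bytes starting at virtual address `va` (hypothetical vsnip stand-in)."""
    offset = va_to_offset(va, image_base, sections)
    return image[offset:offset + size]
```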

The result is two buffers of null-separated strings, one per string table. We use [resplit][] to split them at the null bytes and, to avoid flooding this notebook, we only [pick][] the first 10 decrypted strings and have a [peek][] at them:

[emit]: https://binref.github.io/#refinery.emit
[put]: https://binref.github.io/#refinery.put
[pick]: https://binref.github.io/#refinery.pick
[peek]: https://binref.github.io/#refinery.peek
[resplit]: https://binref.github.io/#refinery.resplit
[xor]: https://binref.github.io/#refinery.xor
[vsnip]: https://binref.github.io/#refinery.vsnip
[multibins]: https://binref.github.io/lib/argformats.html
[var-handler]: https://binref.github.io/lib/argformats.html#refinery.lib.argformats.DelayedArgument.var

```
%%emit q.bot [
        | put backup [
            | rex yara:5168([4])BA([2]0000)B9([4])E8 {1}{3}{2}
            | struct {ka:L}{da:L}{dl:L}
            | put key vsnip[ka:128]:var:backup
            | emit vsnip[da:dl]:var:backup
            | xor var:key ]
    | resplit h:00
    | pick :10
    | peek -be ]
```

```
00.011 kB: ProgramData
00.072 kB: ERROR: GetModuleFileNameW() failed with error: ERROR_INSUFFICIENT_BUFFER
00.081 kB: schtasks.exe /Create /RU "NT AUTHORITY\\SYSTEM" /SC ONSTART /TN %u /TR "%s" /NP /F
00.011 kB: route print
00.033 kB: powershell.exe -encodedCommand %S
00.040 kB: bUdiuy81gYguty@4frdRdpfko(eKmudeuMncueaN
00.056 kB: SOFTWARE\\Microsoft\\Windows NT\\CurrentVersion\\ProfileList
00.065 kB:  /c ping.exe -n 6 127.0.0.1 &  type "%s\\System32\\calc.exe" > "%s"
00.009 kB: net share
00.033 kB: nltest /domain_trusts /all_trusts
```
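The decryption and splitting steps are equally simple to express in Python. This sketch assumes, consistent with the output above, that [xor][] repeats the key across the whole table. The helper names are hypothetical, and dropping empty pieces after the split is our own choice rather than a documented property of [resplit][].

```python
from itertools import cycle


def xor_decrypt(data: bytes, key: bytes) -> bytes:
    """Repeating-key XOR: data byte i is combined with key[i % len(key)]."""
    return bytes(d ^ k for d, k in zip(data, cycle(key)))


def split_strings(table: bytes) -> list[str]:
    """Split a decrypted table at null bytes, skipping empty pieces."""
    # latin-1 maps every byte to a character, so decoding never fails.
    return [piece.decode('latin-1') for piece in table.split(b'\x00') if piece]
```

Since XOR is its own inverse, the same function both encrypts and decrypts, which makes it easy to test on synthetic tables.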


## Finding The Key

We are able to decrypt the strings, and we know that one of them is the secret from which the key is derived. How do we find the right one? Brute force, obviously. We simply try to decrypt the configuration resource with every single string we found. Fortunately, the Qakbot authors provided us with a straightforward way to check whether a decryption result is valid: The first 20 bytes are a SHA1 checksum of the remaining contents. This gives us the final Qakbot config extraction pipeline. Have another look; you should already recognize some of its components:

```
%%emit q.bot [[
        | put backup [
            | rex yara:5168([4])BA([2]0000)B9([4])E8 {1}{3}{2}
            | struct {ka:L}{da:L}{dl:L}
            | put key vsnip[ka:128]:var:backup
            | emit vsnip[da:dl]:var:backup
            | xor var:key ]
        | resplit h:00
        | swap key
        | swap backup
        | perc RCDATA [| max size ]
        | rc4 sha1:var:key
        | put required x::20
        | put computed sha1:c:
        | iff required -eq computed
        | rc4 x::20
        | snip 20:
        | rex '(\x01.{7})+' ]
    | struct -m !xBBBBHx {1}.{2}.{3}.{4}:{5} [| sep ]
    | peek -d ]
```

```
------------------------------------------------------------------------------------------------------------------------
02.222 kB; 44.01% entropy; ASCII text
---------------------------------------------------------------------------------------------------------------[utf8]---
181.118.183.103:443
92.239.81.124:443
174.58.146.57:443
73.223.248.31:443
86.129.13.178:2222
47.34.30.133:443
89.216.114.179:443
41.44.11.227:995
66.180.227.170:2222
46.229.194.17:443
------------------------------------------------------------------------------------------------------------------------
```


After extracting all the strings, each chunk in the frame contains a potential secret. We use [swap][] to move this data into a variable called `key`; after this operation, the chunk body is empty. We use [swap][] again to move the contents of the previously populated `backup` variable back into the chunk body, so that we can use [perc][] and [max][] to extract the larger of the two configuration resources. Afterwards, we use [rc4][] to attempt decryption with the SHA1 hash of the current `key`. We then compute two variables, `required` and `computed`: the former holds the first 20 bytes of the chunk, which the `x::20` expression cuts off, and the latter holds the SHA1 hash of the remaining bytes. The [iff][] unit filters out all chunks where the two values differ.

After this, the only chunk that should remain is the one that was decrypted with the correct secret. The pipeline then removes the second RC4 layer and parses the C2 entries as before.
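The brute-force search in the pipeline boils down to the loop below. It is a minimal Python sketch, assuming the candidate secrets are available as byte strings; `find_secret` is a hypothetical helper, and RC4 is again implemented by hand so that the block is self-contained. The comparison mirrors `put required x::20`, `put computed sha1:c:` and `iff required -eq computed`.

```python
import hashlib
from typing import Iterable, Optional, Tuple


def rc4(key: bytes, data: bytes) -> bytes:
    """Plain RC4: key scheduling followed by the keystream generator."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) % 256])
    return bytes(out)


def find_secret(resource: bytes, candidates: Iterable[bytes]) -> Optional[Tuple[bytes, bytes]]:
    """Return (secret, verified plaintext) for the first candidate whose checksum matches."""
    for secret in candidates:
        plain = rc4(hashlib.sha1(secret).digest(), resource)
        required, rest = plain[:20], plain[20:]  # like `put required x::20`
        computed = hashlib.sha1(rest).digest()   # like `put computed sha1:c:`
        if required == computed:                 # like `iff required -eq computed`
            return secret, rest
    return None
```

A wrong candidate passes the check only if a random 20-byte value happens to equal the SHA1 digest, so in practice exactly one string survives.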

[swap]: https://binref.github.io/#refinery.swap
[perc]: https://binref.github.io/#refinery.perc
[max]: https://binref.github.io/#refinery.max
[iff]: https://binref.github.io/#refinery.iff
[rc4]: https://binref.github.io/#refinery.rc4