# The Refinery Files 0x0A: Layer Cake

There is a [Malcat blog post][src] discussing an infection chain from an equation editor exploit document to a Formbook payload.
It seemed like a very good candidate for a refinery tutorial.
I showed a glimpse of this on [video][yyt]; this tutorial details every step from the first stage to the final payload using refinery.

[src]: https://malcat.fr/blog/exploit-steganography-and-delphi-unpacking-dbatloader/
[yyt]: https://www.youtube.com/live/-B072w0qjNk

In [1]:
import tutorials.boilerplate as bp

## Stage 1 - Exploit Document

We begin our journey with the following sample:

In [1]:
bp.store_sample(
    '13063a496da7e490f35ebb4f24a138db4551d48a1d82c0c876906a03b8e83e05', 'eqn.doc')

Let's have a very first [peek][] at it, only displaying some metadata and no hex dump of the contents.

[peek]: https://binref.github.io/#refinery.peek

In [1]:
%emit eqn.doc | peek -mml0

------------------------------------------------------------------------------------------------------------------------
    crc32 = 36d72a79
  entropy = 99.55%
    magic = CDFV2 Encrypted
   sha256 = 13063a496da7e490f35ebb4f24a138db4551d48a1d82c0c876906a03b8e83e05
     size = 00.271 MB
------------------------------------------------------------------------------------------------------------------------


This is an encrypted office document that can be decrypted using the [officecrypt][] unit:

[officecrypt]: https://binref.github.io/#refinery.officecrypt

In [1]:
%emit eqn.doc | officecrypt | peek

------------------------------------------------------------------------------------------------------------------------
00.265 MB; 97.92% entropy; Microsoft Excel 2007+
------------------------------------------------------------------------------------------------------------------------
00000: 50 4B 03 04 14 00 06 00 08 00 00 00 21 00 21 5D 2F 7E 2F 02 00 00 EE 09 00 00 13 00  PK..........!.!]/~/.........
0001C: E4 01 5B 43 6F 6E 74 65 6E 74 5F 54 79 70 65 73 5D 2E 78 6D 6C 20 A2 E0 01 28 A0 00  ..[Content_Types].xml....(..
00038: 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ............................
00054: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ............................
.....:                                     15 repetitions
00214: 00 C4 56 4D 6F DA 40 10 BD 57 EA 7F B0 7C 8D EC 05 2A 55 55 05 E4 10 92 53 D5 44 4A  ..VMo.@..W...|...*UU....S.DJ
00230: FA 03 96 DD 01 36 EC 57 77 16 02 FF BE 

The first impulse might be to inspect the contents using [xlxtr][] or look for VBA macros using the [xtvba][] unit,
but there's nothing there:

[xlxtr]: https://binref.github.io/#refinery.xlxtr
[xtvba]: https://binref.github.io/#refinery.xtvba

In [1]:
%emit eqn.doc | officecrypt | xtvba | peek -m

--------------------------------------------------------------------------------------------------------[empty chunk]---
  entropy = 00.00%
    magic = empty
     size = 00.000 kB
------------------------------------------------------------------------------------------------------------------------


In [1]:
%emit eqn.doc | officecrypt | xlxtr | peek -m

--------------------------------------------------------------------------------------------------------[empty chunk]---
  entropy = 00.00%
    magic = empty
     size = 00.000 kB
------------------------------------------------------------------------------------------------------------------------


Frustrated, we turn to simply extracting the document contents to look for something interesting.
The [xt][] unit aims to extract most archive formats that refinery can handle.
All archive extraction units follow a common interface:
the `--list` parameter (or `-l` for short) causes these units to list the paths of all items the unit is able to extract from the input:

[xt]: https://binref.github.io/#refinery.xt

In [1]:
%emit eqn.doc | officecrypt | xt -l

[Content_Types].xml
_rels/.rels
xl/diagrams/data1.xml
xl/_rels/workbook.xml.rels
xl/workbook.xml
xl/styles.xml
xl/media/image6.emf
xl/diagrams/colors1.xml
xl/diagrams/quickStyle1.xml
xl/diagrams/layout1.xml
xl/worksheets/sheet3.xml
xl/worksheets/sheet2.xml
xl/worksheets/_rels/sheet1.xml.rels
xl/worksheets/_rels/sheet2.xml.rels
xl/drawings/_rels/drawing1.xml.rels
xl/drawings/_rels/vmlDrawing2.vml.rels
xl/theme/theme1.xml
xl/media/image5.jpeg
xl/media/image4.png
xl/drawings/vmlDrawing1.vml
xl/embeddings/oleObject1.bin
xl/drawings/drawing1.xml
xl/worksheets/sheet1.xml
xl/drawings/vmlDrawing2.vml
xl/media/image1.png
xl/media/image3.png
xl/media/image2.png
xl/embeddings/Microsoft_Office_Word_Macro-Enabled_Document1.docm
xl/printerSettings/printerSettings2.bin
xl/printerSettings/printerSettings1.bin
docProps/core.xml
docProps/app.xml


The `oleObject1.bin` stands out among these files.
The [xt][] unit expects filename pattern expressions as positional arguments, which specify the items to extract.
Each pattern is matched against all available paths with increasingly fuzzy logic; the search stops at the first level that produces a match, and the last level is a plain substring search.
For example, we can find `oleObject1.bin` by extracting any item matching `ole`:

[xt]: https://binref.github.io/#refinery.xt

In [1]:
%emit eqn.doc | officecrypt | xt ole -l

xl/embeddings/oleObject1.bin


And without the `-l` switch, the unit extracts the corresponding item from the archive:

In [1]:
%emit eqn.doc | officecrypt | xt ole | peek

------------------------------------------------------------------------------------------------------------------------
03.584 kB; 52.27% entropy; Composite Document File V2 Document, Cannot read section info
------------------------------------------------------------------------------------------------------------------------
00000: D0 CF 11 E0 A1 B1 1A E1 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 3E 00 03 00  ........................>...
0001C: FE FF 09 00 06 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 01 00 00 00 00 00 00 00  ............................
00038: 00 10 00 00 02 00 00 00 01 00 00 00 FE FF FF FF 00 00 00 00 00 00 00 00 FF FF FF FF  ............................
00054: FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF  ............................
.....:                                     14 repetitions
001F8: FF FF FF FF FF FF FF FF FD FF FF FF FE FF FF FF FE FF FF FF 04 00 00 00 05 00 00 00  ............................
00214:

The result is another OLE object:

In [1]:
%emit eqn.doc | officecrypt | xt ole | xt -l

[1]Ole
[1]oLE10NATive


The pattern matching logic of [xt][] is case sensitive only if there are two paths among the extractible items that would conflict otherwise. In this case, we can extract `[1]oLE10NATive` simply by matching, e.g., `native`:

[xt]: https://binref.github.io/#refinery.xt

In [1]:
%emit eqn.doc | officecrypt | xt ole | xt -l native

[1]oLE10NATive


In [1]:
%emit eqn.doc | officecrypt | xt ole | xt native | peek

------------------------------------------------------------------------------------------------------------------------
01.215 kB; 94.42% entropy; data
------------------------------------------------------------------------------------------------------------------------
00000: 00 2F 1E 02 03 7E 01 EB 47 0A 01 05 75 63 A3 EC 00 00 00 00 00 00 00 00 00 00 00 00  ./...~..G...uc..............
0001C: 00 00 00 00 00 00 00 00 00 00 00 00 00 50 06 45 00 00 00 00 00 00 00 00 00 00 00 00  .............P.E............
00038: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 29 C3 44 00 00 00 00 57 5F EB 75  .................).D....W_.u
00054: 81 C7 84 01 00 00 8D AF 6D 02 00 00 EB 57 EB 28 EB EE EB 1C 69 C0 C7 3C 00 7C EB 05  ........m....W.(....i..<.|..
00070: B4 73 38 66 E1 05 11 50 CA 57 EB 12 EB 3B EB E4 50 58 50 58 EB 31 EB DC EB 46 EB 70  .s8f...P.W...;..PXPX.1...F.p
0008C: EB 48 EB 71 31 07 9C 53 57 53 81 C3 BC 5F 00 00 81 C3 B7 44 00 00 81 EB 85 2F 00 00  .H.q1..SWS..._.....D...../..


This does look a lot like an exploit document and the bytes down there look like shellcode.
I wouldn't bother to recover the exact logic here since there is a quicker way to get what we are looking for:
The first few bytes of what might be the beginning of the shellcode are `57 5F EB 75`.
We can extract this data using the regular expression unit [rex][].
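
To make explicit what such a pattern matches, here is a small Python sketch that applies the expression `W_.u.*` (the one used in the pipeline below) to a few bytes typed in from the hex dump above. It assumes that the dot matches every byte value, including line breaks, which is why `re.DOTALL` is set; the `blob` variable is only a short excerpt for illustration and not the whole stream.

```python
import re

# Excerpt from the hex dump of the OLE native stream, starting shortly
# before the suspected shellcode (illustration only, not the full stream).
blob = bytes.fromhex(
    '29C344' '00000000'
    '575FEB75' '81C784010000' '8DAF6D020000'
)

# 'W' is 0x57, '_' is 0x5F and 'u' is 0x75; the dot absorbs the 0xEB byte.
match = re.search(rb'W_.u.*', blob, flags=re.DOTALL)
shellcode = match.group(0)
print(shellcode[:4].hex(' '))
```

Everything from the first hit to the end of the input is returned, which is exactly the slice we want to hand to an emulator.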
Feeding this data to the stack string extractor [vstack][] with a very liberal `--wait` parameter already produces some interesting results:

[rex]: https://binref.github.io/#refinery.rex
[vstack]: https://binref.github.io/#refinery.vstack

In [1]:
%emit eqn.doc | officecrypt | xt ole | xt native | rex 'W_.u.*' | vstack -w100 [| peek -N ]

---------------------------------------------------------------------------------------------------------------------
00.008 kB; 13.27% entropy; Cracklib password index, big endian ("64-bit")
---------------------------------------------------------------------------------------------------------------------
0D 02 00 00 00 00 00 00                                                                 ........                     
---------------------------------------------------------------------------------------------------------------------
00.624 kB; 72.65% entropy; data
---------------------------------------------------------------------------------------------------------------------
81 EC 2C 02 00 00 E8 12 00 00 00 6B 00 65 00 72 00 6E 00 65 00 6C 00 33 00 32 00 00 00  ..,........k.e.r.n.e.l.3.2...
E8 67 01 00 00 89 C3 E8 0D 00 00 00 4C 6F 61 64 4C 69 62 72 61 72 79 57 00 53 E8 C6 01  .g..........LoadLibraryW.S...
00 00 89 C7 E8 0F 00 00 00 47 65 74 50 72 6F 63 41 64 64 72 65 73 73

There is a URL visible at the end of the emulated memory dump, and we can extract it using [xtp][]:

[xtp]: https://binref.github.io/#refinery.xtp

In [1]:
%emit eqn.doc | officecrypt | xt ole | xt native | rex 'W_.u.*' | vstack -w100 [| xtp url | defang ]]

http[:]//104.168.32[.]50/009/vbc.exe


## Stage 2 - Delphi Downloader

The URL is already offline at the time of writing.
Luckily, we know the file that was served at the time when it was active: 

In [1]:
bp.store_sample(
    '3045902d7104e67ca88ca54360d9ef5bfe5bec8b575580bc28205ca67eeba96d', 'vbc.exe')

Some strings and specifically the existence of a very characteristic Linker timestamp betray this sample as having been written in Delphi,
most likely compiled in October 2021:

In [1]:
%emit vbc.exe | pemeta -tT

TimeStamp.Linker : 1992-06-19 22:22:17
TimeStamp.Delphi : 2021-10-24 13:25:36
TimeStamp.RsrcTS : 2014-04-24 01:38:58


The binary unpacks an embedded payload in the function at address `0x46C8F8`.
The payload is hidden in the pixel data of an image stored in the `BBTREX` resource: the function picks a small number of bits from each pixel's red, green, and blue channel value.
The number of bits taken is computed from the first pixel by the following formula, where `r`, `g`, and `b` represent that pixel's red, green, and blue values, respectively:

    (b % 4) + ((g % 2) * 4) + ((r % 2) * 8)

The [stego][] unit can be used to extract pixel color values from various image formats.
Since the malware is using Delphi's `GetScanline` function which reads the raw bytes from the bitmap,
we have to extract the color channels in reverse (i.e. blue, green, red):

[stego]: https://binref.github.io/#refinery.stego

In [1]:
%emit vbc.exe | perc BBTREX | stego BGR | snip :3 | pack -R

3
2
2


We can tell from the data that the number of bits that will be taken from each channel is `3`.
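
For reference, the arithmetic can be checked with a few lines of Python. This is only a direct transcription of the formula given earlier; the function name `bits_per_channel` is made up for this sketch.

```python
def bits_per_channel(r: int, g: int, b: int) -> int:
    """Number of bits taken from each channel, per the formula above."""
    return (b % 4) + ((g % 2) * 4) + ((r % 2) * 8)

# The first three values printed above arrive in blue, green, red order.
b, g, r = 3, 2, 2
print(bits_per_channel(r, g, b))  # 3
```

Since green and red are both even here, only the blue term contributes. The formula can yield at most `3 + 4 + 8 = 15`.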
The [bitsnip][] unit allows us to extract the payload:

[bitsnip]: https://binref.github.io/#refinery.bitsnip

In [1]:
%emit vbc.exe | perc BBTREX | stego BGR | snip 3: | bitsnip :3 | peek -l4

------------------------------------------------------------------------------------------------------------------------
00.106 MB; 82.63% entropy; data
------------------------------------------------------------------------------------------------------------------------
00000: 00 66 01 00 4D 5A 90 00 03 00 00 00 04 00 00 00 FF FF 00 00 B8 00 00 00 00 00 00 00  .f..MZ......................
0001C: 40 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  @...........................
00038: 00 00 00 00 00 00 00 00 00 01 00 00 0E 1F BA 0E 00 B4 09 CD 21 B8 01 4C CD 21 54 68  ....................!..L.!Th
00054: 69 73 20 70 72 6F 67 72 61 6D 20 63 61 6E 6E 6F 74 20 62 65 20 72 75 6E 20 69 6E 20  is.program.cannot.be.run.in.
------------------------------------------------------------------------------------------------------------------------


The output has a length prefix and we can use the [struct][] unit to extract the PE file.
In this case, [struct][] is instructed to first read a 32-bit integer named `n` via `{n:I}`; the `I` is the Python struct format character for an unsigned 32-bit integer.
Next, it is instructed to read `n` bytes, i.e. as many bytes as the prefix value indicates.
By default, [struct][] then emits the last byte string field that was parsed.

[struct]: https://binref.github.io/#refinery.struct
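
The same parsing step looks roughly as follows in plain Python. This is a sketch rather than what refinery does internally: the helper name `read_length_prefixed` is made up, and little-endian byte order is an assumption, although it is consistent with the size of 91.648 kB reported for the extracted file.

```python
import struct

def read_length_prefixed(data: bytes) -> bytes:
    # '<I' reads a 4-byte unsigned integer in little-endian byte order.
    (n,) = struct.unpack_from('<I', data, 0)
    # The body consists of the n bytes that follow the prefix.
    return data[4:4 + n]

# The prefix seen in the dump above is 00 66 01 00.
(n,) = struct.unpack('<I', bytes.fromhex('00660100'))
print(n)  # 91648
```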

In [1]:
%emit vbc.exe | perc BBTREX | stego BGR | snip 3: | bitsnip :3 | struct {n:I}{:n} | peek -mm | dump ldr.exe

------------------------------------------------------------------------------------------------------------------------
    crc32 = b7be8c6d
  entropy = 81.48%
    magic = PE32 executable (DLL) (GUI) Intel 80386, for MS Windows
   sha256 = e232e1cd61ca125fbb698cb32222a097216c83f16fe96e8ea7a8b03b00fe3e40
     size = 91.648 kB
------------------------------------------------------------------------------------------------------------------------
00000: 4D 5A 90 00 03 00 00 00 04 00 00 00 FF FF 00 00 B8 00 00 00 00 00 00 00 40 00 00 00  MZ......................@...
0001C: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ............................
00038: 00 00 00 00 00 01 00 00 0E 1F BA 0E 00 B4 09 CD 21 B8 01 4C CD 21 54 68 69 73 20 70  ................!..L.!This.p
00054: 72 6F 67 72 61 6D 20 63 61 6E 6E 6F 74 20 62 65 20 72 75 6E 20 69 6E 20 44 4F 53 20  rogram.cannot.be.run.in.DOS.
00070: 6D 6F 64 65 2E 0D 0D 0A 24 00 00 00 00 00 00 00 00 00 00 00 

Finally, if we want to create a pipeline that works more generically against other samples of this kind, we can also use [struct][] to parse out the first three color channels and compute the number of bits to extract programmatically:

[struct]: https://binref.github.io/#refinery.struct

In [1]:
%%emit vbc.exe
 | perc BBTREX
 | stego BGR
 | struct {b:B}{g:B}{r:B}{} [
   | bitsnip :(b%4)+((g%2)*4)+((r%2)*8) ]
 | struct {n:I}{:n}
 | peek -mml0

------------------------------------------------------------------------------------------------------------------------
    crc32 = b7be8c6d
  entropy = 81.48%
    magic = PE32 executable (DLL) (GUI) Intel 80386, for MS Windows
   sha256 = e232e1cd61ca125fbb698cb32222a097216c83f16fe96e8ea7a8b03b00fe3e40
     size = 91.648 kB
------------------------------------------------------------------------------------------------------------------------


As it turns out, the extracted payload is a downloader that reads its payload URL from the parent sample.
The URL can be found encoded between two occurrences of the magic string `^^Nc`. We can easily extract it using [rex][].
Since `^` is awkward to escape, I opted for a less restrictive regular expression which still works:

[rex]: https://binref.github.io/#refinery.rex

In [1]:
%emit vbc.exe | rex ..Nc(.*?)..Nc {1:add[7]:defang}

https[:]//cdn.discordapp[.]com/attachments/902132472924479511/902136733435592744/Wbjhzkbevojgqfhfalbqxnykvunmobi




Here, we also make use of [rex][]'s powerful formatter:
The second argument `{1:add[7]:defang}` instructs [rex][] to compute its output in the following way:

- take the first match group: `{1`
- [add][] `7` to every byte value: `{1:add[7]`
- [defang][] the result: `{1:add[7]:defang}`

The suffixes that are supported here are the same as the prefixes supported by all [multibin][] expressions,
except that they are applied left to right instead of right to left.

[rex]: https://binref.github.io/#refinery.rex
[add]: https://binref.github.io/#refinery.add
[defang]: https://binref.github.io/#refinery.defang
[multibin]: https://binref.github.io/lib/argformats.html
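
Outside of refinery, the extraction and the `add[7]` step could be sketched in Python as shown below. The input is synthetic: the URL `https://example.com/payload` and the surrounding junk bytes are made up for the test, and the function name `extract_url` is hypothetical. The sketch also assumes that the addition wraps around at 256.

```python
import re

def extract_url(data: bytes) -> bytes:
    # Two arbitrary bytes stand in for the "^^" of the marker, as in the
    # rex expression above; DOTALL lets the dot match any byte value.
    m = re.search(rb'..Nc(.*?)..Nc', data, flags=re.DOTALL)
    if m is None:
        raise ValueError('marker not found')
    # Add 7 to every byte, wrapping around at 256.
    return bytes((c + 7) & 0xFF for c in m.group(1))

# Synthetic test input: a made-up URL, encoded by subtracting 7.
url = b'https://example.com/payload'
blob = b'\x00junk^^Nc' + bytes((c - 7) & 0xFF for c in url) + b'^^Nctail'
print(extract_url(blob))
```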

## Stage 3 - DBatLoader

The file from the malicious link is no longer available, but here's what it served at the time when it was active:

In [1]:
bp.store_sample(
    'bb41df67b503fef9bfd8f74757adcc50137365fbc25b92933573a64c7d419c1b', 'obi.bin')

The function to decode the payload is at `0x413b14` in `ldr.exe` and gets called with the hard-coded key value `328`.
When substituting this key value in the decoder function, a simple expression for the decoding operation can be deduced:
Each encoded byte is subjected to an affine transformation: it is multiplied by `0x81F6` and then increased by `0xF3C7`.
The high 8 bits of this 16-bit operation yield the XOR key for the next byte.
Finally, the resulting byte array is reversed.
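
As a cross-check of this description, here is a Python sketch of the decoder together with a matching encoder for round-trip testing. It follows the prose above rather than the disassembly, so treat it as an approximation: the initial key value of `64` is an assumption borrowed from the seed passed to `alu` in the pipeline that follows, and both function names are made up.

```python
def decode(data: bytes, seed: int = 64) -> bytes:
    out = bytearray()
    key = seed
    for e in data:
        out.append((e ^ key) & 0xFF)
        # 16-bit affine update of the *encoded* byte; keep the high 8 bits.
        key = ((e * 0x81F6 + 0xF3C7) & 0xFFFF) >> 8
    out.reverse()
    return bytes(out)

def encode(data: bytes, seed: int = 64) -> bytes:
    """Inverse of decode, used here only to check the round trip."""
    out = bytearray()
    key = seed
    for p in reversed(data):
        e = (p ^ key) & 0xFF
        out.append(e)
        key = ((e * 0x81F6 + 0xF3C7) & 0xFFFF) >> 8
    return bytes(out)

sample = b'MZ\x90\x00 round trip'
print(decode(encode(sample)) == sample)
```

Because the key for each position depends only on the previous encoded byte, decoding needs no knowledge of the plaintext.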
This sort of simple encoding is best reverted using the [alu][] unit, and [rev][] to reverse the order of bytes:

[alu]: https://binref.github.io/#refinery.alu
[rev]: https://binref.github.io/#refinery.rev

In [1]:
%emit obi.bin | alu B@S -P2 -s64 -e=R(E*0x81F6+0xF3C7,8) | rev | peek | dump yak.{ext}

------------------------------------------------------------------------------------------------------------------------
00.276 MB; 94.63% entropy; PE32 executable (DLL) (GUI) Intel 80386, for MS Windows
------------------------------------------------------------------------------------------------------------------------
00000: 4D 5A 90 00 03 00 00 00 04 00 00 00 FF FF 00 00 B8 00 00 00 00 00 00 00 40 00 00 00  MZ......................@...
0001C: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ............................
00038: 00 00 00 00 00 01 00 00 0E 1F BA 0E 00 B4 09 CD 21 B8 01 4C CD 21 54 68 69 73 20 70  ................!..L.!This.p
00054: 72 6F 67 72 61 6D 20 63 61 6E 6E 6F 74 20 62 65 20 72 75 6E 20 69 6E 20 44 4F 53 20  rogram.cannot.be.run.in.DOS.
00070: 6D 6F 64 65 2E 0D 0D 0A 24 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  mode....$...................
0008C: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 

In [1]:
%ls

00.271 MB 13063a496da7e490f35ebb4f24a138db4551d48a1d82c0c876906a03b8e83e05 eqn.doc
00.959 MB 3045902d7104e67ca88ca54360d9ef5bfe5bec8b575580bc28205ca67eeba96d vbc.exe
91.648 kB e232e1cd61ca125fbb698cb32222a097216c83f16fe96e8ea7a8b03b00fe3e40 ldr.exe
00.276 MB bb41df67b503fef9bfd8f74757adcc50137365fbc25b92933573a64c7d419c1b obi.bin
00.276 MB f8fc925d89baa140c9cb436f158ec91209789e9f8e82a0b7252f05587ce8e06f yak.dll


We named the payload `yak.dll` after its characteristic PE resource named `YAK`:

In [1]:
%emit yak.dll | perc -l

RCDATA/YAK/0


This resource is a known artifact of the DBatLoader malware and contains the encoded payload.
The following is a refinery pipeline to unpack it; we will discuss the details below:

In [1]:
%%emit yak.dll
 | perc
 | alu '[B,((B+14)%94)+33][32<B<127]'
 | rex '(.*?)\d{3,}\1(.*)' {1} {2} [
    | pop sep
    | resplit rx:v:sep
    | pop key
    | max size
    | put k len(key)
    | alu 7+B@N@A@k v:key ]
 | rev
 | alu '[B,((B+14)%94)+33][32<B<127]'
 | pemeta -t

Header.Machine    : I386
Header.Subsystem  : Windows GUI
Header.MinimumOS  : Windows XP
Header.RICH[0x0]  : [00ab9d1b] 76 STDLIB Visual Studio 2010 10.10 SP1
Header.RICH[0x1]  : [009e9d1b] 03   MASM Visual Studio 2010 10.10 SP1
Header.RICH[0x2]  : [009d9d1b] 01 LINKER Visual Studio 2010 10.10 SP1
Header.Type       : EXE
Header.ImageBase  : 0x00400000
Header.ImageSize  : 167707
Header.Bits       : 32
Header.EntryPoint : 0x00429000
TimeStamp.Linker  : 2010-01-13 06:39:15


The first decoding step is to mutate every byte `B` in the printable range as `((B+14)%94)+33`.
We use the [alu][] unit for this with the following expression:

    [B,((B+14)%94)+33][32<B<127]

This is a list with two elements that is indexed by a boolean, so it evaluates to `((B+14)%94)+33` whenever `32<B<127` (i.e. when `B` is printable) and to `B` otherwise.

The resulting data is separated by a 36-byte delimiter, and the first field is an unknown large number `7826546`. We assume here that the first field will always be a decimal integer, and based on this assumption we devise the following regular expression to determine the string separator:

    (.*?)\d{3,}\1(.*)

The first match group here should contain the separator, followed by a sequence of at least 3 decimal digits, followed by the same separator string. We let [rex][] emit the separator string as the first output, followed by the remaining data.
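
The back-reference trick is easy to try out in Python. The sketch below runs the same expression against synthetic data; the separator `<<SEP>>`, the key, and the payload are made up, and `split_fields` is a hypothetical helper that also performs the split at the recovered separator.

```python
import re

def split_fields(data: bytes):
    # Group 1 grows lazily until it is followed by 3+ digits and then by
    # an exact repetition of itself; group 2 is everything after that.
    m = re.match(rb'(.*?)\d{3,}\1(.*)', data, flags=re.DOTALL)
    if m is None:
        raise ValueError('no separator found')
    sep, rest = m.group(1), m.group(2)
    return sep, rest.split(sep)

# Synthetic input with a made-up separator, key, and payload.
blob = b'<<SEP>>7826546<<SEP>>secret<<SEP>>payload-bytes'
sep, fields = split_fields(blob)
print(sep, fields)
```

Note that `bytes.split` treats the separator as a literal string, so no regular-expression escaping is needed in this variant.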

Within the newly opened frame, we use [pop][] to transfer the separator string (which is the first chunk in the frame) into a variable named `sep`. We then use [resplit][] to split the remaining data at this separator. Here, the multibin expression `v:sep` represents the contents of the variable `sep`, and the `rx:` handler performs regular-expression escaping so that we split at the verbatim string stored in `sep`.

From the remaining fields, all we need to know is that the first one is the decryption key and the largest field is the payload, so we can [pop][] the key, use the [max][] unit to filter down to only the largest chunk, and then decrypt the payload. Decryption works by XOR-ing the data with the key, with the length of the data, and with the length of the key, and finally adding the value `7`.
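
A Python sketch of this decryption step follows. It rests on a few assumptions that the text does not spell out: the key is repeated cyclically over the data, the "length of the data" is the length of the chunk being decrypted, and the result is truncated to 8 bits. Both function names are made up, and `encrypt` exists only so that the sketch can be verified on test data.

```python
from itertools import cycle

def decrypt(payload: bytes, key: bytes) -> bytes:
    n = len(payload)   # length of the data
    k = len(key)       # length of the key
    return bytes(
        (7 + (b ^ n ^ a ^ k)) & 0xFF
        for b, a in zip(payload, cycle(key))
    )

def encrypt(plain: bytes, key: bytes) -> bytes:
    """Inverse of decrypt, only used to verify the sketch on test data."""
    n, k = len(plain), len(key)
    return bytes(
        ((p - 7) & 0xFF) ^ (n & 0xFF) ^ a ^ (k & 0xFF)
        for p, a in zip(plain, cycle(key))
    )

plain = bytes(range(256)) * 3
print(decrypt(encrypt(plain, b'secret'), b'secret') == plain)
```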

After that is done, we have to reverse the payload one more time and apply DBatLoader's custom decoding scheme, and out comes the payload.
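
The custom decoding scheme itself is short enough to write out in Python. Since `((B+14)%94)+33` equals `33+((B-33+47)%94)`, it is the same mapping as the classic ROT47 substitution, which is its own inverse. The function name `remap` is made up for this sketch.

```python
def remap(data: bytes) -> bytes:
    # Printable, non-space bytes are rotated by 47 positions within the
    # range 33..126; every other byte is left untouched.
    return bytes(
        ((b + 14) % 94) + 33 if 32 < b < 127 else b
        for b in data
    )

text = b'Hello, World!'
print(remap(text))          # b'w6==@[ (@C=5P'
print(remap(remap(text)))   # b'Hello, World!'
```

Being self-inverse means that applying the mapping a second time restores the original bytes, so the same routine serves for both encoding and decoding.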

[alu]: https://binref.github.io/#refinery.alu
[rex]: https://binref.github.io/#refinery.rex
[pop]: https://binref.github.io/#refinery.pop
[max]: https://binref.github.io/#refinery.max
[resplit]: https://binref.github.io/#refinery.resplit