# The Refinery Files 0x01: NetWalker Dropper

This is the first tutorial on how to use the [binary refinery][refinery] (or binref for short). It is a command-line toolkit inspired by [CyberChef][], where Unix-style [pipelines][pipeline] are used to combine various transformations. The intended use case is malware triage and analysis. We will be looking at the file with the following SHA-256 hash:
```
ccd495bae43f026e05f00ebc74f989d5657e010854ce4d8870e7b9371b0222b9
```
Spoiler Alert: It contains a NetWalker Ransomware sample. This is **malware**; do not execute it unless you know exactly what you are doing.

[refinery]: https://github.com/binref/refinery/
[cyberchef]: https://github.com/gchq/CyberChef
[pipeline]: https://en.wikipedia.org/wiki/Pipeline_(Unix)

## Installation

You can set up binary refinery in a temporary virtual environment like this:
```
$ python3 -m venv br
$ source ./br/bin/activate
(br) $ pip3 install -U git+https://github.com/binref/refinery.git
 ... PIP MAKES WAR UPON THE FORCES OF DEPENDENCY HELL ...
(br) $ 
```
This Jupyter notebook uses [dark magic](boilerplate.py) to simulate working in a directory with a single file named `nl.ps1`, which has the SHA-256 hash mentioned above. Running this Jupyter notebook locally will cache all files in memory, and none of the malware samples will actually be written to your hard drive. The reason this file is a Jupyter notebook is primarily so that it can be re-run, making sure that the output of the below refinery commands accurately reflects what you would see when using the most recent version of the toolkit.

In [1]:
from tutorials import boilerplate
boilerplate.store_sample('ccd495bae43f026e05f00ebc74f989d5657e010854ce4d8870e7b9371b0222b9', 'nl.ps1')

In [1]:
%ls

00.926 MB ccd495bae43f026e05f00ebc74f989d5657e010854ce4d8870e7b9371b0222b9 nl.ps1


## Extracting The Payload

Our guest today is a PowerShell sample. A brief look into the file reveals that it contains large buffers encoded as arrays of hexadecimal integers, likely byte values. Because we assume that these buffers contain some sort of payload, we'll go ahead and use the [carve][] unit to get them out. The main documentation of refinery units is in their `-h` or `--help` output on the command line. The [carve][] unit has a lot of options, but we will only use two:
```
carve -s intarray
```
The flag `-s` is a shorthand for `--single` which instructs the unit to carve only the largest buffer it can find. The only required argument is the word `intarray`, which denotes the format that we want to carve. The `intarray` format represents a pattern for arrays of integers. We will pipe the result of this operation to the [peek][] unit, which gives us a brief preview of what was extracted. We use the `-d` (aka `--decode`) switch for [peek][] because the result should be plaintext:

[peek]: https://binref.github.io/#refinery.peek
[carve]: https://binref.github.io/#refinery.carve

In [1]:
%emit nl.ps1 | carve -s intarray | peek -dd

------------------------------------------------------------------------------------------------------------------------
00.594 MB; 40.34% entropy; ASCII text, with very long lines, with no line terminators
---------------------------------------------------------------------------------------------------------------[utf8]---
0xfd,0xea,0x20,0xb0,0xb3,0xb0,0xb0,0xb0,0xb4,0xb0,0xb0,0xb0,0x4f,0x4f,0xb0,0xb0,0x08,0xb0,0xb0,0xb0,0xb0,0xb0,0xb0,0xb0,
0xf0,0xb0,0xb0,0xb0,0xb0,0xb0,0xb0,0xb0,0xb0,0xb0,0xb0,0xb0,0xb0,0xb0,0xb0,0xb0,0xb0,0xb0,0xb0,0xb0,0xb0,0xb0,0xb0,0xb0,
0xb0,0xb0,0xb0,0xb0,0xb0,0xb0,0xb0,0xb0,0xb0,0xb0,0xb0,0xb0,0x70,0xb0,0xb0,0xb0,0xbe,0xaf,0x0a,0xbe,0xb0,0x04,0xb9,0x7d,
0x91,0x08,0xb1,0xfc,0x7d,0x91,0xe4,0xd8,0xd9,0xc3,0x90,0xc0,0xc2,0xdf,0xd7,0xc2,0xd1,0xdd,0x90,0xd3,0xd1,0xde,0xde,0xdf,
0xc4,0x90,0xd2,0xd5,0x90,0xc2,0xc5,0xde,0x90,0xd9,0xde,0x90,0xf4,0xff,0xe3,0x90,0xdd,0xdf,0xd4,0xd5,0x9e,0xbd,0xbd,0xba,
0x94,0xb0,0xb0,0xb0,0xb0,0xb0,0xb0,0xb0,0xb0,0xb0,0xb0,0xb0,0xb0,0x

Alright, this looks exactly like the buffer we are interested in. Let us now decode this. The unit to turn textual representations of integers to bytes is called [pack][]. I will no longer use the `-d` switch for [peek][] because I don't expect the result to be printable any more:

[pack]: https://binref.github.io/#refinery.pack
[peek]: https://binref.github.io/#refinery.peek

In [1]:
%emit nl.ps1 | carve -s intarray | pack | peek

------------------------------------------------------------------------------------------------------------------------
00.119 MB; 80.32% entropy; data
------------------------------------------------------------------------------------------------------------------------
00000: FD EA 20 B0 B3 B0 B0 B0 B4 B0 B0 B0 4F 4F B0 B0 08 B0 B0 B0 B0 B0 B0 B0 F0 B0 B0 B0  ............OO..............
0001C: B0 B0 B0 B0 B0 B0 B0 B0 B0 B0 B0 B0 B0 B0 B0 B0 B0 B0 B0 B0 B0 B0 B0 B0 B0 B0 B0 B0  ............................
00038: B0 B0 B0 B0 70 B0 B0 B0 BE AF 0A BE B0 04 B9 7D 91 08 B1 FC 7D 91 E4 D8 D9 C3 90 C0  ....p..........}....}.......
00054: C2 DF D7 C2 D1 DD 90 D3 D1 DE DE DF C4 90 D2 D5 90 C2 C5 DE 90 D9 DE 90 F4 FF E3 90  ............................
00070: DD DF D4 D5 9E BD BD BA 94 B0 B0 B0 B0 B0 B0 B0 B0 B0 B0 B0 B0 B0 B0 B0 B0 B0 B0 B0  ............................
0008C: B0 B0 B0 B0 B0 B0 B0 B0 B0 B0 B0 B0 B0 B0 B0 B0 B0 B0 B0 B0 B0 B0 B0 B0 B0 B0 B0 B0  ............................


The trained malware analyst easily spots the repeated byte `0xB0` and suspects a single-byte XOR encryption. To XOR the entire extracted buffer with `0xB0`, we use, well, the [xor][] unit:

[peek]: https://binref.github.io/#refinery.peek
[xor]: https://binref.github.io/#refinery.xor

In [1]:
%emit nl.ps1 | carve -s intarray | pack | xor 0xB0 | peek

------------------------------------------------------------------------------------------------------------------------
00.119 MB; 80.32% entropy; PE32+ executable (DLL) (GUI) x86-64, for MS Windows
------------------------------------------------------------------------------------------------------------------------
00000: 4D 5A 90 00 03 00 00 00 04 00 00 00 FF FF 00 00 B8 00 00 00 00 00 00 00 40 00 00 00  MZ......................@...
0001C: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ............................
00038: 00 00 00 00 C0 00 00 00 0E 1F BA 0E 00 B4 09 CD 21 B8 01 4C CD 21 54 68 69 73 20 70  ................!..L.!This.p
00054: 72 6F 67 72 61 6D 20 63 61 6E 6E 6F 74 20 62 65 20 72 75 6E 20 69 6E 20 44 4F 53 20  rogram.cannot.be.run.in.DOS.
00070: 6D 6F 64 65 2E 0D 0D 0A 24 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  mode....$...................
0008C: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0

Alright, this looks like the payload, and we can now [dump][] it to disk:

[dump]: https://binref.github.io/#refinery.dump

In [1]:
%emit nl.ps1 | carve -s intarray | pack | xor 0xB0 | dump payload.dll

In [1]:
%ls

00.926 MB ccd495bae43f026e05f00ebc74f989d5657e010854ce4d8870e7b9371b0222b9 nl.ps1
00.119 MB 419ab9eaa1c64eed1d6d005ebc0c30bdc4e949ea7ee2cfee5dd34e6b3915bc02 payload.dll
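
For readers who want to see what the pipeline does conceptually, the following is a rough Python sketch of the three steps (carve, pack, xor). It is not how refinery implements these units: the regular expression `INT_ARRAY` and the helper names are assumptions made for illustration, and the `script` string is a shortened, made-up stand-in for the real sample.

```python
import re

# Assumed stand-in for the `intarray` format: comma-separated hex or decimal integers.
INT_ARRAY = re.compile(r"(?:0x[0-9A-Fa-f]+|\d+)(?:\s*,\s*(?:0x[0-9A-Fa-f]+|\d+))+")


def carve_largest_int_array(text: str) -> str:
    """Return the longest integer array in the text (roughly `carve -s intarray`)."""
    matches = INT_ARRAY.findall(text)
    if not matches:
        raise ValueError("no integer array found")
    return max(matches, key=len)


def pack(array: str) -> bytes:
    """Turn textual integers into bytes; base 0 accepts both 0x41 and 65."""
    return bytes(int(token.strip(), 0) for token in array.split(","))


def xor(data: bytes, key: int) -> bytes:
    """XOR every byte of the data with a single-byte key."""
    return bytes(b ^ key for b in data)


# Made-up script fragment that reuses the first six bytes shown in the output above.
script = "$a = 1,2\n[byte[]] $buf = 0xfd,0xea,0x20,0xb0,0xb3,0xb0\n"
payload = xor(pack(carve_largest_int_array(script)), 0xB0)
print(payload[:2])  # b'MZ'
```

The short array `1,2` is ignored because only the longest match is kept, which mirrors the effect of the `-s` switch.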


## Extracting The Other Payload

Looking at the extracted executable, it's only 119 kB in size. The hexadecimal encoding in the PowerShell script constitutes a blowup by a factor of 5: The byte `0` becomes `0x00` plus a comma character. However, even considering that, we only account for 594 kB even though the loader script is 926 kB in size. What of the remaining 330 kB or so? Scrolling through the file, it becomes obvious that there is another buffer in there.
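
The factor of 5 is easy to verify with a few lines of Python. This is only a sanity check on the arithmetic; the helper name `encode_as_hex_array` is made up for this example, and the input is arbitrary test data rather than the actual payload.

```python
def encode_as_hex_array(data: bytes) -> str:
    """Encode bytes the way the loader script stores them: 0x00,0x01,..."""
    return ",".join(f"0x{b:02x}" for b in data)


sample = bytes(range(256)) * 4  # 1024 arbitrary bytes
encoded = encode_as_hex_array(sample)

# Each byte costs four characters plus a comma, except the last one.
print(len(sample), len(encoded), round(len(encoded) / len(sample), 2))  # 1024 5119 5.0
```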

In this section, we will use the framing syntax to extract both buffers. All refinery units can, in principle, produce multiple outputs for one given input. By default, multiple outputs are separated by line break characters. For example, we can use [carve][] with the `printable` format option to extract all printable strings from the payload. I use the options `--min` and `--max` to only return strings of length at least `20` and at most `100`:

[carve]: https://binref.github.io/#refinery.carve

In [1]:
%emit payload.dll | carve --min=20 --max=100 printable

!This program cannot be run in DOS mode.
$
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/
 !"#$%&'()*+,-./0123
expand 32-byte kexpand 16-byte k
Launcher.SystemSettings


The [carve][] unit has carved 5 printable substrings and they were separated by line breaks in the output. Since we have used the `-s` switch to carve the largest payload, the [carve][] unit only had a single output in our previous example. However, it can easily be used to extract the two longest matching patterns by specifying the arguments `--longest` and `--take=2`, or `-lt2` for short. We are certainly not interested in having those buffers be printed to the command line, and we do not want them separated by line breaks for any other reason either. We would like to do additional processing on each one of them individually. Let's find out how to do that.

By adding the symbol `[` as the last argument to a refinery unit, you instruct all subsequent refinery units to work on each of the outputs individually and in sequence. Such a stream of multiple items is called a **frame**, and the items themselves are referred to as **chunks**. Internally, this simply means that when a unit receives the `[` argument, the output is serialized in a refinery-specific format so that subsequent units can understand it as a stream of multiple outputs rather than just a single blob. The last unit that performs processing inside the frame should receive the symbol `]` as its last argument: This instructs the unit to concatenate all chunks. When chunks are merged at the end of a frame, no line breaks or other separators are inserted. See also the [module documentation for the frame module][frame].
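
As a mental model only, a frame can be pictured as a Python generator of chunks: every unit inside the frame is applied to each chunk separately, and closing the frame joins the results without any separator. The names `open_frame` and `close_frame` are hypothetical, and this sketch ignores both the serialization format that refinery uses between processes and the fact that a unit may produce several outputs per input.

```python
from typing import Callable, Iterable, Iterator

Unit = Callable[[bytes], bytes]


def open_frame(chunks: Iterable[bytes], *units: Unit) -> Iterator[bytes]:
    """Apply each unit to every chunk individually, in sequence."""
    for chunk in chunks:
        for unit in units:
            chunk = unit(chunk)
        yield chunk


def close_frame(chunks: Iterable[bytes]) -> bytes:
    """Merge all chunks; no line breaks or other separators are inserted."""
    return b"".join(chunks)


def pack(chunk: bytes) -> bytes:
    """Toy version of pack: textual integers to bytes."""
    return bytes(int(token, 0) for token in chunk.decode("ascii").split(","))


carved = [b"0x41,0x42", b"0x43,0x44,0x45"]  # two chunks, as if carved from a file
processed = list(open_frame(carved, pack))
merged = close_frame(processed)
print(processed)  # [b'AB', b'CDE']
print(merged)     # b'ABCDE'
```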

The following example reads our sample, then [carve][]s the two largest integer array buffers from it, converts them to binary, and then [peek][]s at the results:

[carve]: https://binref.github.io/#refinery.carve
[peek]: https://binref.github.io/#refinery.peek
[frame]: https://binref.github.io/lib/frame.html

In [1]:
%emit nl.ps1 | carve -lt2 intarray [| pack | peek ]

------------------------------------------------------------------------------------------------------------------------
60.928 kB; 83.07% entropy; data
------------------------------------------------------------------------------------------------------------------------
00000: 0A 1D D7 47 44 47 47 47 43 47 47 47 B8 B8 47 47 FF 47 47 47 47 47 47 47 07 47 47 47  ...GDGGGCGGG..GG.GGGGGGG.GGG
0001C: 47 47 47 47 47 47 47 47 47 47 47 47 47 47 47 47 47 47 47 47 47 47 47 47 47 47 47 47  GGGGGGGGGGGGGGGGGGGGGGGGGGGG
00038: 47 47 47 47 FF 47 47 47 49 58 FD 49 47 F3 4E 8A 66 FF 46 0B 8A 66 13 2F 2E 34 67 37  GGGG.GGGIX.IG.N.f.F..f./.4g7
00054: 35 28 20 35 26 2A 67 24 26 29 29 28 33 67 25 22 67 35 32 29 67 2E 29 67 03 08 14 67  5(.5&*g$&))(3g%"g52)g.)g...g
00070: 2A 28 23 22 69 4A 4A 4D 63 47 47 47 47 47 47 47 47 47 47 47 47 47 47 47 47 47 47 47  *(#"iJJMcGGGGGGGGGGGGGGGGGGG
0008C: 47 47 47 47 47 47 47 47 47 47 47 47 47 47 47 47 47 47 47 47 47 47 47 47 47 47 47 47  GGGGGGGGGGGGGGGGGGGGGGGGGGGG


We have a bit of a problem here: It looks like different keys were used to encrypt the two payloads. The first buffer was encrypted using the byte `0x47` while the second one (the one we already saw before) was encrypted using `0xB0`. There are several ways to handle the differing keys in transit, but we will get to that later. For now, let's just [dump][] the two buffers to disk and deal with them individually:

[dump]: https://binref.github.io/#refinery.dump

In [1]:
%emit nl.ps1 | carve -lt2 intarray [| pack | dump encrypted-0x47.bin encrypted-0xB0.bin ]

In [1]:
%ls

00.926 MB ccd495bae43f026e05f00ebc74f989d5657e010854ce4d8870e7b9371b0222b9 nl.ps1
00.119 MB 419ab9eaa1c64eed1d6d005ebc0c30bdc4e949ea7ee2cfee5dd34e6b3915bc02 payload.dll
60.928 kB 120101d5f020c8810074fc65aa2b75c237b3535d16a220e52af108dba9f40f85 encrypted-0x47.bin
00.119 MB 285709f0c66b0d33154bcad6d8e43860dde7bcc63945fc53aeca1cb76d71b18d encrypted-0xB0.bin


In [1]:
%emit encrypted-0x47.bin | xor 0x47 | dump payload1.dll

In [1]:
%emit encrypted-0xB0.bin | xor 0xB0 | dump payload2.dll

A quick sanity check to make sure we used the right keys:

In [1]:
%emit payload1.dll payload2.dll [| peek -ml0 ]

------------------------------------------------------------------------------------------------------------------------
  entropy = 83.07%
    magic = PE32 executable (DLL) (GUI) Intel 80386, for MS Windows
     size = 60.928 kB
------------------------------------------------------------------------------------------------------------------------
  entropy = 80.32%
    magic = PE32+ executable (DLL) (GUI) x86-64, for MS Windows
     size = 00.119 MB
------------------------------------------------------------------------------------------------------------------------


The [emit][] unit emits one chunk for each file that it reads from disk; in this case, it reads the two DLL files and produces two chunks. We used the `--lines=0` (aka `-l0`) option of [peek][] to get only a brief summary of their metadata and check that they decrypted to valid PE files. Only a few hours of reverse engineering later, you will be able to confirm your suspicion that `payload1.dll` is the 32-bit variant of `payload2.dll`. The loader will deploy one or the other depending on the system architecture.
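
If you want to double-check the architecture without relying on the magic string, the PE header can be read directly. The sketch below is independent of refinery and of how [peek][] determines its output; the function name `pe_machine` is made up, and the header used in the example is a fabricated minimal stub rather than one of the real payloads.

```python
import struct

# Machine field values from the COFF file header.
MACHINE_NAMES = {0x014C: "x86 (32-bit)", 0x8664: "x86-64 (64-bit)"}


def pe_machine(data: bytes) -> str:
    """Return the target architecture recorded in a PE file header."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # The DOS header stores the offset of the PE header at 0x3C.
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    (machine,) = struct.unpack_from("<H", data, e_lfanew + 4)
    return MACHINE_NAMES.get(machine, hex(machine))


# Fabricated stub: MZ magic, PE header at 0xB8, machine type 0x14C.
stub = bytearray(0x100)
stub[:2] = b"MZ"
struct.pack_into("<I", stub, 0x3C, 0xB8)
stub[0xB8:0xBC] = b"PE\0\0"
struct.pack_into("<H", stub, 0xBC, 0x014C)
print(pe_machine(bytes(stub)))  # x86 (32-bit)
```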

[emit]: https://binref.github.io/#refinery.emit
[peek]: https://binref.github.io/#refinery.peek

## Extracting Both Payloads

We will now show how to decrypt the two payloads in transit, i.e. without temporarily writing the encrypted buffers to disk. This is a great opportunity to illustrate a powerful feature of refinery. As a spoiler, here's a way to decrypt the two buffers without dumping them to disk:

In [1]:
%emit nl.ps1 | carve -lt2 intarray [| pack | xor copy:3 | peek ]

------------------------------------------------------------------------------------------------------------------------
60.928 kB; 83.07% entropy; PE32 executable (DLL) (GUI) Intel 80386, for MS Windows
------------------------------------------------------------------------------------------------------------------------
00000: 4D 5A 90 00 03 00 00 00 04 00 00 00 FF FF 00 00 B8 00 00 00 00 00 00 00 40 00 00 00  MZ......................@...
0001C: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ............................
00038: 00 00 00 00 B8 00 00 00 0E 1F BA 0E 00 B4 09 CD 21 B8 01 4C CD 21 54 68 69 73 20 70  ................!..L.!This.p
00054: 72 6F 67 72 61 6D 20 63 61 6E 6E 6F 74 20 62 65 20 72 75 6E 20 69 6E 20 44 4F 53 20  rogram.cannot.be.run.in.DOS.
00070: 6D 6F 64 65 2E 0D 0D 0A 24 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  mode....$...................
0008C: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 

As you can see, we are passing the argument `copy:3` to the [xor][] unit. It does work, but we'll have to dig a little deeper to understand what is going on and why it works.

[xor]: https://binref.github.io/#refinery.xor

### Multibin Arguments

In previous examples, we have called [xor][] with the arguments `0x47` and `0xB0`, which refinery interpreted as an integer representing a byte value with which to xor every byte in the input stream. However, you could also write any of the following:

- `xor h:dff4f503bb` - xor with the hexadecimal encoded byte sequence `DFF4F503BB`
- `xor s:terrordome` - xor with the utf8-encoded string `terrordome`
- `xor 7,3,12,120,8` - xor with the given sequence of values, i.e. `07030C7808` in hexadecimal

These are all examples of so-called **multibin** arguments. A multibin argument starts with a number of **handlers**. A **handler** is a short identifier separated from the rest of the expression by a colon. In the above examples, `h` (for **hex**) and `s` (for **string**) are handlers. Most handlers will process the remaining expression as a multibin again, but both `h` and `s` are **final** handlers, which means that the remaining expression will not be parsed any further. This gives you two very certain ways to pass data to a refinery unit in case you are uncertain about potential multibin parsing:
```
(br) $ emit h:
(br) $ emit s:h:
h:
(br) $ emit h:s:
usage: emit [-h] [-L] [-Q] [-0] [-v] [data [data ...]]
emit: error: argument data: invalid multibin value: 'h:s:'
```
The first emits the result of decoding an empty hexadecimal string (which is empty), the second emits the utf8-encoded string `h:`, and the third tries to emit the hexadecimal string `s:`, which is nonsense because neither `s` nor `:` is a hexadecimal character. We get a well-deserved error. When no handlers are given, a multibin value is evaluated based on its default handler:

- Most units use the standard default handler: It first attempts to interpret the given argument as a file name and will use the contents of that file if it exists. If that fails, it will encode the string to a byte sequence using UTF8.
- Arithmetic and bitwise block operations (like [xor][], [sub][], [add][], [shr][], [shl][], [rotr][], [rotl][], [neg][]) will attempt to interpret the given argument as a Python expression representing an integer or a sequence of integers. Only when this fails do they revert to the standard default handler.
- The regular expression units [rex][], [resub][], and [resplit][] do not try to open any files, and they also provide a few additional handlers.

The module documentation of the [argformats][] module contains all handlers and documents their purpose.
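
To make the idea of final handlers and default handlers more tangible, here is a toy parser written for this tutorial. It is a simplified illustration and not refinery's actual implementation: `parse_multibin` is a hypothetical name, only the `h` and `s` handlers are supported, handler chaining and the file name lookup are left out, and the fallback order mimics what is described above for the arithmetic units.

```python
def parse_multibin(expression: str) -> bytes:
    """Toy multibin parser: final handlers h: and s:, then integers, then UTF-8."""
    handler, colon, rest = expression.partition(":")
    if colon and handler == "h":
        # Final handler: the rest is hex and is not parsed any further.
        return bytes.fromhex(rest)
    if colon and handler == "s":
        # Final handler: the rest is taken literally as a UTF-8 string.
        return rest.encode("utf8")
    try:
        # Default for arithmetic units: an integer or a sequence of integers.
        return bytes(int(token, 0) for token in expression.split(","))
    except ValueError:
        # Otherwise fall back to encoding the string itself.
        return expression.encode("utf8")


print(parse_multibin("h:dff4f503bb").hex())  # dff4f503bb
print(parse_multibin("s:h:"))                # b'h:'
print(parse_multibin("7,3,12,120,8").hex())  # 07030c7808
print(parse_multibin("0x47"))                # b'G'
```

As in the shell session above, `parse_multibin("h:s:")` raises an error because `s:` is not valid hexadecimal.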

[argformats]: https://binref.github.io/lib/argformats.html

[emit]: https://binref.github.io/#refinery.emit
[rotl]: https://binref.github.io/#refinery.rotl
[rotr]: https://binref.github.io/#refinery.rotr
[rex]: https://binref.github.io/#refinery.rex
[shl]: https://binref.github.io/#refinery.shl
[shr]: https://binref.github.io/#refinery.shr
[sub]: https://binref.github.io/#refinery.sub
[add]: https://binref.github.io/#refinery.add
[xor]: https://binref.github.io/#refinery.xor
[neg]: https://binref.github.io/#refinery.neg
[drp]: https://binref.github.io/#refinery.drp
[resplit]: https://binref.github.io/#refinery.resplit
[resub]: https://binref.github.io/#refinery.resub

### The Copy Handler

In our example, we used the `copy` handler for the argument to [xor][]. This handler is final, just like `s` and `h`. It also has the short version `c`, so you could just as well write the following to decrypt both buffers:

[xor]: https://binref.github.io/#refinery.xor

In [1]:
%emit nl.ps1 | carve -lt2 intarray [| pack | xor c:3 | peek ]

------------------------------------------------------------------------------------------------------------------------
60.928 kB; 83.07% entropy; PE32 executable (DLL) (GUI) Intel 80386, for MS Windows
------------------------------------------------------------------------------------------------------------------------
00000: 4D 5A 90 00 03 00 00 00 04 00 00 00 FF FF 00 00 B8 00 00 00 00 00 00 00 40 00 00 00  MZ......................@...
0001C: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ............................
00038: 00 00 00 00 B8 00 00 00 0E 1F BA 0E 00 B4 09 CD 21 B8 01 4C CD 21 54 68 69 73 20 70  ................!..L.!This.p
00054: 72 6F 67 72 61 6D 20 63 61 6E 6E 6F 74 20 62 65 20 72 75 6E 20 69 6E 20 44 4F 53 20  rogram.cannot.be.run.in.DOS.
00070: 6D 6F 64 65 2E 0D 0D 0A 24 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  mode....$...................
0008C: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 

The `copy` handler parses the remaining expression as `[offset]:[length]:[step]` with each part optional. This is similar to a Python slice expression, with the difference that the second part is a length instead of the end of the slice. The value of `copy:0x112:45` will then be a copy of the `45` bytes following the offset `0x112` in the input data. In our example, we simply want the **fourth byte of the input** (i.e. the one at index `3`) to be used as the XOR key. Just to demonstrate, we could equally well have copied the bytes at indices 5, 6, and 7 (all of which decrypt to zero bytes):

In [1]:
%emit nl.ps1 | carve -lt2 intarray [| pack | xor c:5:3 | peek ]

------------------------------------------------------------------------------------------------------------------------
60.928 kB; 83.07% entropy; PE32 executable (DLL) (GUI) Intel 80386, for MS Windows
------------------------------------------------------------------------------------------------------------------------
00000: 4D 5A 90 00 03 00 00 00 04 00 00 00 FF FF 00 00 B8 00 00 00 00 00 00 00 40 00 00 00  MZ......................@...
0001C: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ............................
00038: 00 00 00 00 B8 00 00 00 0E 1F BA 0E 00 B4 09 CD 21 B8 01 4C CD 21 54 68 69 73 20 70  ................!..L.!This.p
00054: 72 6F 67 72 61 6D 20 63 61 6E 6E 6F 74 20 62 65 20 72 75 6E 20 69 6E 20 44 4F 53 20  rogram.cannot.be.run.in.DOS.
00070: 6D 6F 64 65 2E 0D 0D 0A 24 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  mode....$...................
0008C: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 

Here, `5:3` is the slice starting at index `5` and ranging over `3` bytes.
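
The slicing rule can be mimicked in Python as follows. This is a sketch under assumptions rather than the handler's real code: `copy_slice` is a made-up name, the treatment of a bare offset as a single byte is inferred from the `copy:3` example in this tutorial, and the step is simply passed on to a Python slice. The sample bytes are the first eight bytes of the first encrypted buffer shown earlier.

```python
def copy_slice(data: bytes, expression: str) -> bytes:
    """Evaluate an [offset]:[length]:[step] expression against the input data."""
    parts = expression.split(":")
    offset = int(parts[0], 0) if parts[0] else 0
    if len(parts) == 1:
        # Assumption: a bare offset selects exactly one byte, as with copy:3.
        return data[offset:offset + 1]
    # The second part is a length, not an end index.
    length = int(parts[1], 0) if parts[1] else len(data) - offset
    step = int(parts[2], 0) if len(parts) > 2 and parts[2] else 1
    return data[offset:offset + length:step]


head = bytes.fromhex("0A 1D D7 47 44 47 47 47")
print(copy_slice(head, "3").hex())    # 47
print(copy_slice(head, "5:3").hex())  # 474747
print(copy_slice(head, ":4").hex())   # 0a1dd747
```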

### Unit Handlers

Every available refinery unit can also be used as a handler. Using `copy:3` as the decryption key works very well, but we can express the heuristic that we used more succinctly. The [drp][] unit detects frequently repeating patterns in its input data. Hence, if you suspect a single byte XOR to have been used on a buffer that contains a lot of zero bytes (like a PE file), the following will work:

[drp]: https://binref.github.io/#refinery.drp

In [1]:
%emit nl.ps1 | carve -lt2 intarray [| pack | xor drp:c::100 | peek ]

------------------------------------------------------------------------------------------------------------------------
60.928 kB; 83.07% entropy; PE32 executable (DLL) (GUI) Intel 80386, for MS Windows
------------------------------------------------------------------------------------------------------------------------
00000: 4D 5A 90 00 03 00 00 00 04 00 00 00 FF FF 00 00 B8 00 00 00 00 00 00 00 40 00 00 00  MZ......................@...
0001C: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ............................
00038: 00 00 00 00 B8 00 00 00 0E 1F BA 0E 00 B4 09 CD 21 B8 01 4C CD 21 54 68 69 73 20 70  ................!..L.!This.p
00054: 72 6F 67 72 61 6D 20 63 61 6E 6E 6F 74 20 62 65 20 72 75 6E 20 69 6E 20 44 4F 53 20  rogram.cannot.be.run.in.DOS.
00070: 6D 6F 64 65 2E 0D 0D 0A 24 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  mode....$...................
0008C: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 

The argument to [xor][] now first copies the first 100 bytes from the input using `c::100`. These bytes are passed to the [drp][] unit, which will extract the most frequent repeating byte pattern from it. In our example, the patterns are just single bytes, but this method can also work for longer XOR keys.
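
For the single-byte case, the heuristic boils down to picking the most frequent byte in the first 100 bytes. The following Python sketch shows only that special case; it is a simplified stand-in, not the algorithm used by [drp][], and `guess_single_byte_key` is a hypothetical helper. The sample data is the first row of the hex dump of the first encrypted buffer shown earlier.

```python
from collections import Counter


def guess_single_byte_key(data: bytes, window: int = 100) -> int:
    """Guess a single-byte XOR key as the most frequent byte in a leading window."""
    counts = Counter(data[:window])
    if not counts:
        raise ValueError("cannot guess a key from empty data")
    return counts.most_common(1)[0][0]


head = bytes.fromhex(
    "0A 1D D7 47 44 47 47 47 43 47 47 47 B8 B8 47 47 "
    "FF 47 47 47 47 47 47 47 07 47 47 47"
)
key = guess_single_byte_key(head)
decrypted = bytes(b ^ key for b in head)
print(hex(key), decrypted[:2])  # 0x47 b'MZ'
```

This only works because the plaintext is dominated by zero bytes, which encrypt to the key itself.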

[xor]: https://binref.github.io/#refinery.xor
[drp]: https://binref.github.io/#refinery.drp

## Extracting The Configuration

The NetWalker configuration is stored as an RC4 encrypted buffer in a resource called `31337`, which is usually the only PE resource of the file. The buffer starts with a 32-bit integer specifying the key length, followed by the key, followed by the encrypted data:

In [1]:
%emit nl.ps1 | carve -ds intarray | xor c:3 | perc | peek -l4

------------------------------------------------------------------------------------------------------------------------
05.434 kB; 99.57% entropy; data
------------------------------------------------------------------------------------------------------------------------
00000: 05 00 00 00 73 23 44 6F 38 8D 3E 4C 31 50 31 BE 51 16 7B 33 81 7A 34 2F 77 50 44 6F  ....s#Do8.>L1P1.Q.{3.z4/wPDo
0001C: 8B DB 55 0A 1D BC F4 5D 23 C6 E1 26 D4 FB FF FD 0D E1 34 4F 08 F5 2C A1 2D C4 7C 04  ..U....]#..&......4O..,.-.|.
00038: D4 BC 70 BB 47 CA 6C 2D E5 3A 45 B6 92 52 74 85 58 69 52 CB 9E 70 C2 26 32 0D 5A 0C  ..p.G.l-.:E..Rt.XiR..p.&2.Z.
00054: 0A D6 65 1F 8E 87 90 77 5E 4A C8 AA EA 56 FD A4 94 FF BB 9F 16 83 4B A7 16 33 00 9E  ..e....w^J...V........K..3..
------------------------------------------------------------------------------------------------------------------------


The decrypted configuration is in JSON format. The following is how we can extract the NetWalker configuration from this dropper without ever writing a single intermediate result to disk:

In [1]:
%emit nl.ps1 [| carve -ds intarray | xor c:3 | perc | put k le:x::4 | rc4 x::k ]| ppjson | peek -d

------------------------------------------------------------------------------------------------------------------------
08.867 kB; 55.91% entropy; ASCII text, with very long lines
---------------------------------------------------------------------------------------------------------------[utf8]---
{
    "mpk": "kzo1XdPfYBYrIPNqwr7YxsVS2rzbhlHusvwLlbNVowc=",
    "mode": 0,
    "spsz": 4,
    "thr": 1500,
    "namesz": 8,
    "idsz": 6,
    "pers": false,
    "onion1": "pb36hu4spl6cyjdfhing7h3pw6dhpk32ifemawkujj4gp33ejzdq3did.onion",
    "onion2": "rnfdsgm6wb6j6su5txkekw4u4y47kp2eatvu7d6xhyn5cs4lt4pdrqqd.onion",
------------------------------------------------------------------------------------------------------------------------


A lot of this already makes sense to us, but a few new things are happening, too. Firstly, we have used the `-d` (short for `--decode`) flag of [carve][]. For most patterns, there is an obvious decoding algorithm, and [carve][] can apply this decoding automatically. In the case of the `intarray` format, the [pack][] unit is invoked. After decrypting the payload, we use the [perc][] unit to extract all PE resources. We can use the `--list` option to get a list of all PE resources in the buffer:

[perc]: https://binref.github.io/#refinery.perc
[pack]: https://binref.github.io/#refinery.pack
[carve]: https://binref.github.io/#refinery.carve

In [1]:
%emit nl.ps1 | carve -ds intarray | xor c:3 | perc -l

1337/31337/0


Since there is only one, we can simply continue processing the complete output of [perc][]. In general, [perc][] can be given a wildcard expression to select only the resources you are interested in, and [perc][] will then extract each of those as one output chunk. The next unit is where it gets interesting. We run the [put][] unit with the parameters `k` and `le:x::4`, and then we process the result using the [rc4][] unit with the argument `x::k`. You may have already guessed it: `k` is a variable containing the length of the RC4 key.

Chunks in a refinery frame can carry a dictionary of metadata, also referred to as **meta variables**. As usual, it is recommended to also read [the official documentation about meta variables][meta]. There are a few units that can generate meta variables, and [put][] is likely the most straightforward way to do so. The [put][] unit takes as its first argument the name of the variable and as its second argument some multibin expression to store in that variable. In this case, we store `le:x::4`, which cuts out the first 4 bytes and decodes them to an integer using little-endian encoding (that's what the `le` handler does). From this point on, the variable `k` is available in the frame and can be used as part of multibin expressions. The [peek][] unit displays the contents of all meta variables that are present on a chunk; in this case, `k` appears alongside the other variables present on the chunk:

[put]: https://binref.github.io/#refinery.put
[rc4]: https://binref.github.io/#refinery.rc4
[perc]: https://binref.github.io/#refinery.perc
[peek]: https://binref.github.io/#refinery.peek

[meta]: https://binref.github.io/lib/meta.html

In [1]:
%emit nl.ps1 | carve -ds intarray | xor c:3 | perc [| put k le:x::4 | peek -l0 ]

------------------------------------------------------------------------------------------------------------------------
       k = 5
    lcid = Neutral Locale Language
  offset = 0x1B858
    path = 1337/31337/0
------------------------------------------------------------------------------------------------------------------------


Here you can see that the [perc][] unit has also attached metadata to the chunk, including the path of the resource that it extracted.
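
For comparison, the effect of `put k le:x::4 | rc4 x::k` can be written out in plain Python. This is a sketch of the buffer layout described above (32-bit little-endian key length, key, encrypted data) with a textbook RC4 implementation; it does not use refinery, and `decrypt_resource` is a name chosen for this example. The test blob is built from the well-known RC4 test vector with the key `Key`, not from the sample's actual resource.

```python
import struct


def rc4(key: bytes, data: bytes) -> bytes:
    """Textbook RC4: key scheduling followed by keystream generation."""
    state = list(range(256))
    j = 0
    for i in range(256):
        j = (j + state[i] + key[i % len(key)]) & 0xFF
        state[i], state[j] = state[j], state[i]
    i = j = 0
    out = bytearray()
    for byte in data:
        i = (i + 1) & 0xFF
        j = (j + state[i]) & 0xFF
        state[i], state[j] = state[j], state[i]
        out.append(byte ^ state[(state[i] + state[j]) & 0xFF])
    return bytes(out)


def decrypt_resource(resource: bytes) -> bytes:
    """Split off the key length and the key, then decrypt the remainder."""
    (k,) = struct.unpack_from("<I", resource, 0)  # like: put k le:x::4
    key = resource[4:4 + k]                       # like: x::k
    return rc4(key, resource[4 + k:])


# Length-prefixed blob built from a public RC4 test vector (key "Key").
blob = struct.pack("<I", 3) + b"Key" + bytes.fromhex("BBF316E8D940AF0AD3")
print(decrypt_resource(blob))  # b'Plaintext'
```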

[perc]: https://binref.github.io/#refinery.perc

## Conclusion

Congratulations, you made it! This tutorial has introduced **framing syntax**, **multibin handlers**, and **meta variables**, which are the core concepts of the binary refinery toolkit. In combination, they can perform a fairly broad range of data transformations. Future tutorials will focus on extending the binary refinery with custom units and using refinery units within Python code. Stay tuned!