
EMOTET: a State-Machine reversing exercise


Around the 20th of December 2020, one of the "usual" EMOTET email campaigns hit several countries. I was able to get some samples, and I decided to write this short analysis to take a deep dive into some specific aspects of the malware itself.

In particular, I looked at how the malware is written and analyzed the interesting techniques it uses.

There is a very good analysis published by Fortinet in 2019, in which the first stage is also analyzed. My exercise focuses on the second stage of a recent sample.

In this repository you will find all the DLLs, scripts and tools used for the analysis, together with the annotated Ghidra project file, which maps all my findings (API calls, program logic, etc.). You can use it as a starting point for further investigation. Enjoy ;-)

The Tools

The infection chain

EMOTET is usually spread through e-mail campaigns (in this case, in Italian).


This particular sample comes from what we can call the usual infection chain:

  1. delivery of an e-mail with a malicious zipped document
  2. once opened, the document runs an obfuscated PowerShell script that downloads the 2nd stage
  3. the 2nd stage (in the form of a DLL) is then executed
  4. the 2nd stage establishes persistence and tries to connect to a C2

The initial triage

All the files used for this analysis are in the repository. The "dangerous" ones are password protected (with the usual pwd).

The DLL (sg.dll) has the following characteristics:

File Name: sg.dll    
Size:      340480 
SHA1:      b08e07b1d91f8724381e765d695601ea785d8276

This DLL exports a single function named RunDLL: once executed, it decrypts "in-memory" an additional DLL. This one, dumped as dump_1_0418.bin, is the target of my analysis:

File Name:  dump_1_0418.bin
Size:       122880
SHA1:       57cd8eac09714effa7b6f70b34039bbace4a3e23


An initial look at the dumped DLL immediately shows that it has no visible strings and no imports, and a first glance at the disassembly reveals heavily obfuscated code. We need to do some work here.

I fired up Ghidra and started to snoop around. Starting from the only exported function, RunDLL, you quickly end up in FUN_10009716, where you can spot a main loop with a kind of "State-Machine":


It looks like a given double-word (stored in ECX) controls what the program does. But the logic is convoluted and not easy to unroll, since nothing is in the clear. For example, if you try to isolate a library API call in x64dbg, you will face something like this:


Every single API call is made in this way: a series of MOV, XOR, SHIFT and PUSH instructions is followed by a call to xxx606F (first red box), which decodes the address of the function into EAX; that address is then called (second red box). The PUSH instructions just before the CALL EAX are the parameters of the API call, which can be worth inspecting.

The same "state" approach is also used in several sub-functions, not only in the main loop. So analyzing everything by hand looked time consuming, and I wanted a way to get the high-level picture.
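To make the idea concrete, here is a minimal Python sketch of a loop driven by a state value, similar in spirit to what the decompiler shows. The two state constants are borrowed from the emulation output reported in this write-up, but the handlers, the transitions between them and the exit value are invented for illustration and are not taken from the sample.

```python
# Toy model of a dispatcher loop: one integer (ECX in the sample) selects
# the block of code to run, and each block chooses the next state.
# Handlers, transitions and STATE_EXIT are hypothetical.
STATE_A = 0x1DE2D3E5
STATE_B = 0x05C80354
STATE_EXIT = 0x0


def handler_a(log):
    log.append("allocate context")
    return STATE_B


def handler_b(log):
    log.append("load libraries")
    return STATE_EXIT


HANDLERS = {STATE_A: handler_a, STATE_B: handler_b}


def run(state, max_steps=100):
    """Run the loop from `state`; return the visited states and the action log."""
    log = []
    visited = []
    for _ in range(max_steps):
        if state == STATE_EXIT:
            break
        visited.append(state)
        state = HANDLERS[state](log)
    return visited, log
```

Calling `run(STATE_B)` skips the first handler entirely, which is the property that makes forcing ECX to a chosen value useful during analysis.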


This tool is a little gem: Speakeasy can emulate the execution of user and kernel mode malware, allowing you to interact with the emulated code by using quick Python scripts. What I wanted to do was map every single state of the machine (the ECX value in the main loop) to something more meaningful, like DLL API calls.

I had to work a bit to get what I wanted:

  • the emulation was failing at more than one point with invalid reads. I investigated the reason and saw that the CALL EAX at some locations was not valid (EAX set to 0). I took the easy way and simply skipped these calls
  • I had to modify the call to a specific API (CryptStringToBinary)
  • I mapped the machine state
  • I added a --state switch to control the flow of the emulation. You can use it to explore all the states (e.g. --state 0x167196bc). You may encounter errors if required parts are not initialized, but you can reconstruct the proper flow by looking at the Ghidra decompilation
  • in a second iteration, knowing where the strings are decrypted, I added a dump of all the decrypted strings (see the following sections)
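As an illustration of the --state switch, the sketch below shows one way such an option could be parsed with argparse so that a hexadecimal state value arrives as an integer. It is not the script from this repository: only the -f and --state options come from the text, while the long name --file, the help strings and the function names are assumptions.

```python
import argparse


def parse_state(text):
    """Convert a hexadecimal state such as '0x167196bc' into an integer."""
    return int(text, 16)


def build_parser():
    parser = argparse.ArgumentParser(description="Emulation driver (sketch)")
    # '--file' is a hypothetical long name for the -f option.
    parser.add_argument("-f", "--file", required=True, help="DLL to emulate")
    parser.add_argument(
        "--state",
        type=parse_state,
        default=None,
        help="state value to force at the start of the main loop",
    )
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["-f", "sg.dll", "--state", "0x167196bc"])
print(hex(args.state))
```

Leaving --state unset keeps the value at None, which a driver script could treat as "run the normal flow".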

Running the final script (python -f sg.dll) then gave me something very interesting: the list of the loaded DLLs (with their addresses):

0x10017a4c: 'kernel32.LoadLibraryW("advapi32.dll")' -> 0x78000000
0x10017a4c: 'kernel32.LoadLibraryW("crypt32.dll")' -> 0x58000000
0x10017a4c: 'kernel32.LoadLibraryW("shell32.dll")' -> 0x69000000
0x10017a4c: 'kernel32.LoadLibraryW("shlwapi.dll")' -> 0x67000000
0x10017a4c: 'kernel32.LoadLibraryW("urlmon.dll")' -> 0x54500000
0x10017a4c: 'kernel32.LoadLibraryW("userenv.dll")' -> 0x76500000
0x10017a4c: 'kernel32.LoadLibraryW("wininet.dll")' -> 0x7bc00000
0x10017a4c: 'kernel32.LoadLibraryW("wtsapi32.dll")' -> 0x63000000

and a lot of API calls, mapped to the machine state:

[+] State: 1de2d3e5
0x10010ba0: 'kernel32.GetProcessHeap()' -> 0x7280
0x10018080: 'kernel32.HeapAlloc(0x7280, 0x8, 0x4c)' -> 0x72a0
[+] State: 5c80354
0x10010ba0: 'kernel32.GetProcessHeap()' -> 0x7280
0x10018080: 'kernel32.HeapAlloc(0x7280, 0x8, 0x20)' -> 0x72f0
0x10017a4c: 'kernel32.LoadLibraryW("advapi32.dll")' -> 0x78000000
0x10010ba0: 'kernel32.GetProcessHeap()' -> 0x7280
0x10014b3a: 'kernel32.HeapFree(0x7280, 0x0, 0x72f0)' -> 0x1
0x10010ba0: 'kernel32.GetProcessHeap()' -> 0x7280

This list is not complete (I skipped some failing calls on purpose, and some calls were probably not intercepted correctly), but it gave me an overall picture of what was going on. Thanks FireEye!
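A trace in this form is easy to post-process. The following sketch groups the API names by state, assuming the log uses exactly the two line shapes shown above; the function name and the regular expressions are mine, and calls logged before the first "State" line are simply ignored.

```python
import re

STATE_RE = re.compile(r"\[\+\] State: ([0-9a-fA-F]+)")
CALL_RE = re.compile(r"(0x[0-9a-fA-F]+): '(.*)' -> (0x[0-9a-fA-F]+)")


def group_calls_by_state(lines):
    """Return {state: [api names in call order]} from trace lines."""
    states = {}
    current = None
    for raw in lines:
        line = raw.strip()
        match = STATE_RE.fullmatch(line)
        if match:
            current = int(match.group(1), 16)
            states.setdefault(current, [])
            continue
        match = CALL_RE.fullmatch(line)
        if match and current is not None:
            # Keep only 'module.Function', dropping the argument list.
            states[current].append(match.group(2).split("(", 1)[0])
    return states


SAMPLE = """\
[+] State: 1de2d3e5
0x10010ba0: 'kernel32.GetProcessHeap()' -> 0x7280
0x10018080: 'kernel32.HeapAlloc(0x7280, 0x8, 0x4c)' -> 0x72a0
[+] State: 5c80354
0x10017a4c: 'kernel32.LoadLibraryW("advapi32.dll")' -> 0x78000000
""".splitlines()

for state, calls in group_calls_by_state(SAMPLE).items():
    print(hex(state), calls)
```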


With the help of the Speakeasy output and a combination of dynamic and static analysis (done with x64dbg and Ghidra), I was able to reconstruct the main flows of the malware. Note that these flows are not complete: they are high-level snapshots of what is going on for some (not all) of the "states". I'm sure something is missing. This is the "main" flow:


Then we have the "Persistency" flow (the yellow boxes are the interesting ones):


And the initial "C2" communication flow:


Not all the states were explored. I focused on persistence and initial C2. The great thing about this approach is that you can now alter the execution flow by setting ECX to the state you want to explore or execute.

I added a lot of details to the Ghidra file by renaming the API calls and inserting comments. Every number reported in the graphs (e.g. 19a) is in the comments, so you can easily track the corresponding code section.


I renamed the functions using this convention:

  • a single underscore in front of API calls
  • a double underscore in front of internal function calls

Interesting findings: encrypted strings

All the strings are encrypted in a BLOB located, in this particular dumped sample, at 0x1C800.


The green box is the XOR key and the yellow one is the length of the string. The functions used to perform the decryption are __decrypt_buffer_string_FUN_10006aba and __decrypt_headers_footer_FUN_100033f4.


Every single string is decrypted and then removed from memory after usage. This is true even for C format strings. So you will not find anything in memory if you try to inspect the mapped sections at runtime.

As said before, I added a specific section in the Speakeasy script to dump those strings.
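The decryption can be sketched in a few lines of Python. The layout used here is an assumption, not something verified against the two functions named above: a 4-byte little-endian XOR key, then a 4-byte length that is itself XORed with the key, then the string bytes XORed with the repeating key. The helper encrypt_string exists only to build test data for the round trip.

```python
import struct


def _xor_with_key(data, key):
    key_bytes = struct.pack("<I", key)
    return bytes(c ^ key_bytes[i % 4] for i, c in enumerate(data))


def decrypt_string(blob, offset=0):
    """Decrypt one entry located at `offset` (assumed layout, see lead-in)."""
    key, masked_len = struct.unpack_from("<II", blob, offset)
    length = key ^ masked_len
    start = offset + 8
    data = blob[start:start + length]
    if len(data) != length:
        raise ValueError("entry runs past the end of the blob")
    return _xor_with_key(data, key)


def encrypt_string(plain, key):
    """Hypothetical inverse, used only to produce demo input."""
    return struct.pack("<II", key, key ^ len(plain)) + _xor_with_key(plain, key)


entry = encrypt_string(b"%s.%s", 0x11223344)
print(decrypt_string(entry))
```

If the real length field turns out to be stored in clear, only the line computing `length` needs to change.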

Interesting findings: list of C2 servers

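The C2 IPs are dumped from the same BLOB (in this case at 0x1CA00) just after the decryption in step 20a.


As stated in the Fortinet analysis, this list is made of IPs (green box) and ports (yellow box). You can decode the whole list by passing this part of the binary to the following Python code:

import sys
import struct

# Assumption: the byte literal of the original snippet is truncated here, so
# the raw C2 table is read from the file given as the first argument.
with open(sys.argv[1], 'rb') as f:
    b = bytearray(f.read())

# Each 8-byte entry: IP bytes in reverse order, then a little-endian port.
for x in range(0, len(b) - 7, 8):
    port = struct.unpack('<H', bytes(b[x+4:x+6]))[0]
    print('%u.%u.%u.%u:%u' % (b[x+3], b[x+2], b[x+1], b[x], port))

You can find the full extracted list in the IoC section.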

Interesting findings: persistence

This particular sample obtains persistence by installing a system service. This campaign deployed different versions of the DLL that also use different techniques: a Run registry key is one of them.

The section that installs the service is 20a (state 0x204C3E9E). The high-level steps are the following:

  • decrypt the format string %s.%s

  • generate random chars to build the service name (which results in something like xzyw.qwe)

  • pick one random "Service Description" among the existing ones and use it as the description of the new service
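The naming steps above can be sketched as follows. The format string comes from the sample, while the lowercase alphabet and the 4 and 3 character lengths are guesses based on the example name xzyw.qwe; the function names are mine and the description list is placeholder data.

```python
import random
import string

FORMAT = "%s.%s"  # format string decrypted by the sample


def random_service_name(rng, left_len=4, right_len=3):
    """Build a name like 'xzyw.qwe' (alphabet and lengths are assumptions)."""
    left = "".join(rng.choice(string.ascii_lowercase) for _ in range(left_len))
    right = "".join(rng.choice(string.ascii_lowercase) for _ in range(right_len))
    return FORMAT % (left, right)


def pick_description(existing_descriptions, rng):
    """Reuse the description of a randomly chosen existing service."""
    return rng.choice(existing_descriptions)


rng = random.Random()
name = random_service_name(rng)
description = pick_description(["Placeholder description A", "Placeholder description B"], rng)
print(name, "-", description)
```

Reusing a description that already exists on the machine makes the new service blend in with legitimate ones when listed.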

Interesting findings: encrypted communications with C2

In section 8a (state 0x1C904052) we can spot the loading of an RSA public key:


After this there is a call to CryptGenKey with the algorithm CALG_AES_128, so it looks like the sample is going to use a symmetric key to encrypt the communication.

In section 20a (state 0x386459ce) we see how the communication is encrypted:

  • CryptGenKey
  • CryptEncrypt of the buffer to send, with the previous key
  • CryptExportKey encrypted with the RSA public key
  • the exported and encrypted symmetric key is then prepended to the buffer sent via HTTP

Wrap up

The analysis is far from complete: there are a lot of unexplored parts of the sample. In the end, my goal was to build a procedure that makes the analysis easier, even for different or future samples, so that the overall picture can be understood faster.

Appendix: IoC

C2 IP list:

