# Emotet Deobfuscation
> Removing CFF Obfuscation From Emotet Using Angr and Symbolic Execution

- toc: true 
- badges: true
- categories: [emotet,malware,angr,symbolic execution,deobfuscation,research]

## Overview

Sample: `c7574aac7583a5bdc446f813b8e347a768a9f4af858404371eae82ad2d136a01`

Unpacked: `eeb13cd51faa7c23d9a40241d03beb239626fbf3efe1dbbfa3994fc10dea0827`

References:
- [Malshare Sample (Unpacked)](https://malshare.com/sample.php?action=detail&hash=eeb13cd51faa7c23d9a40241d03beb239626fbf3efe1dbbfa3994fc10dea0827)

Research:
- [DFS and BFS graph traversal tutorial](https://www.youtube.com/watch?v=vf-cxgUXcMk)

## Approach For Identifying Original Basic Blocks (OBB) - Assembly/IDA Only

Shout out to [@mrexodia](https://github.com/sponsors/mrexodia); full credit goes to him for this approach!

![](https://i.imgur.com/KTCpxRx.png)

We do a breadth-first search (BFS) through the basic blocks (bb). The transition rule is specific to the binary we are analyzing: we noticed that `jz`/`jnz` is used for the dispatcher control flow (cf), and this will differ for other binaries. Our search algorithm relies on this to mark a transition from a cf block to an obb.

The generic algorithm.
- Assume two states (not the same as the CFF states): **in obb** and **in cf**.
- Walk the graph in a breadth-first search (BFS) and track your current state.
  - When you are **in cf** you can transition to **in obb** on the positive (taken) branch of a `jz` or the negative (fall-through) branch of a `jnz`.
  - If you are **in obb** you don't exit until you hit a bb that has already been identified as **in cf**.
- Mark each bb as you go.

This works because we are doing a BFS and the CFF forces a loop back to the dispatcher, so we are guaranteed to have already seen the dispatcher (**in cf**) bb before we reach the end of the first **in obb** run.
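The two-state walk above can be sketched outside of IDA on a toy graph. This is only an illustration under stated assumptions: the `TOY_CFG` layout, the block names, and the `classify_blocks` helper are hypothetical and not taken from the sample. Each block records its last mnemonic, its taken target, and its fall-through target, and a block keeps the first label it is given.

```python
from collections import deque

# Hypothetical toy CFG (not from the sample):
# block name -> (last mnemonic, taken target, fall-through target)
TOY_CFG = {
    "d0": ("jz",  "a0", "d1"),    # dispatcher entry
    "d1": ("jnz", "d2", "b0"),    # dispatcher
    "d2": ("jz",  "c0", "d0"),    # dispatcher, falls back to the entry
    "a0": ("jz",  "a1", "a2"),    # obb containing its own conditional
    "a1": ("jmp", "d0", None),    # obb tail, loops back to the dispatcher
    "a2": ("jmp", "d0", None),    # obb tail, loops back to the dispatcher
    "b0": ("jmp", "d0", None),    # obb, loops back to the dispatcher
    "c0": ("ret", None, None),    # obb that leaves the function
}


def classify_blocks(cfg, dispatcher_entry):
    """Label every reachable block as 'cf' (dispatcher) or 'obb' using BFS."""
    states = {dispatcher_entry: "cf"}
    queue = deque([dispatcher_entry])
    while queue:
        name = queue.popleft()
        mnem, taken, fall = cfg[name]
        for target, is_taken in ((taken, True), (fall, False)):
            # Skip missing edges and blocks that already have a label
            if target is None or target in states:
                continue
            if states[name] == "cf" and mnem in ("jz", "jnz"):
                # jz: taken branch enters an obb; jnz: fall-through does
                to_obb = is_taken if mnem == "jz" else not is_taken
                states[target] = "obb" if to_obb else "cf"
            else:
                # Otherwise the label simply propagates to the successor
                states[target] = states[name]
            queue.append(target)
    return states


labels = classify_blocks(TOY_CFG, "d0")
for name in sorted(labels):
    print(name, labels[name])
```

Note that the conditional jump inside `a0` does not trigger a transition, because the `jz`/`jnz` rule is only applied while the walk is **in cf**.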

The specific algorithm.
- Start at the dispatcher entry; this is the first dispatcher block.
- For each next bb, if it doesn't end in a `jz`/`jnz` then we mark it as dispatcher and search forward.
- If we are in a dispatcher and end with a `jz`, the TRUE branch points to an obb and the FALSE branch points to another dispatcher.
- If we are in a dispatcher and end with a `jnz`, the FALSE branch points to an obb and the TRUE branch points to another dispatcher.
- Mark the blocks and continue our BFS.
- If we are in an obb, mark every next bb as an obb until we see a dispatcher block (then end that trace).



```python
import idaapi
import idautils
import idc
from queue import Queue
import struct

# Basic blocks for dispatcher and obb
# bb_states[start_address] = {'obb': True/False} (True = obb, False = dispatcher)
bb_states = {}
bb_visited = set()


fn_start = 0x10008784
fn_end = 0x100099D2 

dispatcher_start = 0x1000953A 


function = idaapi.get_func(fn_start)
flowchart = idaapi.FlowChart(function)


# Get the dispatcher entry bb; this assumes the first successor of the
# function's entry block is the dispatcher entry
dispatcher_flowchart = list(flowchart[0].succs())[0]

# Use a queue for BFS 
q = Queue()

# Push dispatcher start onto queue and add info
q.put(dispatcher_flowchart)
bb_states[dispatcher_flowchart.start_ea] = {'obb':False }

    

# Walk through bb
while not q.empty():
    bb_flowchart = q.get()
    bb_start = bb_flowchart.start_ea
    # Get bb_info
    bb_info = bb_states[bb_start]
    
    #print(f"-> {hex(bb_start)} {bb_info}")
    
    if bb_start in bb_visited:
        # We don't need to re-process this just continue
        continue
    else:
        bb_visited.add(bb_start)
    
    # Check if there are successors
    if len(list(bb_flowchart.succs())) == 0:
        continue
        
    # Check if the bb is conditional 
    if len(list(bb_flowchart.succs())) > 1:
        # Address of the last instruction in the bb (the conditional jump)
        bb_end = idc.prev_head(bb_flowchart.end_ea)
        if not bb_info.get('obb') and idc.print_insn_mnem(bb_end) == 'jz':
            # The true (taken) branch indicates an obb
            # The false (fall-through) branch indicates more dispatcher
            # We are going to check which next bb matches the
            # true condition for the jz
            true_bb_address = idc.get_operand_value(bb_end, 0)
            for next_bb_flowchart in bb_flowchart.succs():
                # Get the next bb address
                next_bb_start = next_bb_flowchart.start_ea
                # If we have already visited it ignore
                if next_bb_start in bb_visited:
                    continue
                if next_bb_start == true_bb_address:
                    # Put next bb onto the queue
                    q.put(next_bb_flowchart)
                    # Mark the bb as an obb
                    bb_states[next_bb_start] = {'obb':True}
                else:
                    # This is another dispatcher bb
                    # Put next bb onto the queue
                    q.put(next_bb_flowchart)
                    # Mark the bb as a dispatcher
                    bb_states[next_bb_start] = {'obb':False}
        elif not bb_info.get('obb') and idc.print_insn_mnem(bb_end) == 'jnz':
            # The true (taken) branch indicates more dispatcher
            # The false (fall-through) branch indicates an obb
            # We are going to check which next bb matches the
            # true condition for the jnz
            true_bb_address = idc.get_operand_value(bb_end, 0)
            for next_bb_flowchart in bb_flowchart.succs():
                # Get the next bb address
                next_bb_start = next_bb_flowchart.start_ea
                # If we have already visited it ignore
                if next_bb_start in bb_visited:
                    continue
                if next_bb_start == true_bb_address:
                    # This is another dispatcher bb
                    # Put next bb onto the queue
                    q.put(next_bb_flowchart)
                    # Mark the bb as a dispatcher
                    bb_states[next_bb_start] = {'obb':False}
                else:
                    # Put next bb onto the queue
                    q.put(next_bb_flowchart)
                    # Mark the bb as an obb
                    bb_states[next_bb_start] = {'obb':True}
        else:
            # We can treat all next bb as if there is no condition
            # and propagate the bb type
            for next_bb_flowchart in bb_flowchart.succs():
                # Get the next bb address
                next_bb_start = next_bb_flowchart.start_ea
                # If we have already visited it ignore
                if next_bb_start in bb_visited:
                    continue
                # Add it to the queue and add info same as current block
                q.put(next_bb_flowchart)
                # Set bb type based on this bb
                bb_states[next_bb_flowchart.start_ea] = {'obb':bb_info.get('obb')}
    else:
        # No condition 
        next_bb_flowchart = list(bb_flowchart.succs())[0]
        # If not visited
        if next_bb_flowchart.start_ea not in bb_visited:
            # Push next block on queue and add info
            q.put(next_bb_flowchart)
            # Set bb type based on this bb
            bb_states[next_bb_flowchart.start_ea] = {'obb':bb_info.get('obb')}

  
#### ALL this for debugging 

# Add color to bb just for debugging

    
def set_bb_color(ea, flowchart, color_value):
    for block in flowchart:
        if block.start_ea <= ea and block.end_ea > ea:
            # Loop and add color
            ptr = block.start_ea
            while ptr <= idc.prev_head(block.end_ea):
                idc.set_color(ptr, idc.CIC_ITEM, color_value)
                ptr = idc.next_head(ptr)
            break
        
# Verification conditions:
#  - all bb should be in the visited set
#  - each bb should have a type associated with it in the bb_states 

for bb_addr in bb_states:
    print(f"{hex(bb_addr)}: {bb_states[bb_addr]}")
    if bb_states[bb_addr].get('obb'):
        # Make green for obb
        set_bb_color(bb_addr, flowchart, 0x00ff00)
    else:
        # Make orange for dispatcher
        set_bb_color(bb_addr, flowchart, 0x00A5ff)
```

## Approach For Identifying Original Basic Blocks (OBB) Using Symbolic Execution
The drawback of the assembly approach is that the analyst must first identify which condition causes a transition between the cf and obb blocks. This is a manual process. If we want to fully automate this in a generic way, we need to use symbolic execution to identify which bb are cf and which are obb.

For this approach the analyst must still identify how the **state** is tracked (which register holds it), although heuristics could be used to do this automatically. Once we have identified the state we can use the same algorithm as above, but instead of using a `jz`/`jnz` conditional jump to test for a transition between a cf and an obb block, we check our symbolic execution result to see whether the state at the end of the bb is still its unmodified input value or a new expression (i.e. does the state change in the bb or not). If the state is unchanged, we know it was not modified, so this is a cf block; if the state can change, this is an obb.
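The check described above can be illustrated with a tiny hand-rolled symbolic evaluator. This is a minimal sketch and not angr: the tuple-based instruction format, the `exit_expr` and `classify_block` helpers, and the choice of `esi` as the state register are all assumptions made for illustration. Every register starts the block as a fresh symbol, and a block is treated as cf when the state register still holds its entry symbol at the end.

```python
# Hypothetical mini-IR: (op, dst, src) where src is a register name or an int.

def exit_expr(block, state_reg):
    """Return the symbolic expression held by state_reg at the end of block."""
    env = {}

    def value(operand):
        # Integers are constants; unseen registers are entry symbols
        if isinstance(operand, int):
            return operand
        return env.get(operand, ("sym", operand))

    for op, dst, src in block:
        if op == "mov":
            env[dst] = value(src)
        elif op in ("add", "sub", "xor"):
            env[dst] = (op, value(dst), value(src))
        elif op == "cmp":
            continue  # only affects flags, no register is written
        else:
            raise ValueError(f"unsupported op: {op}")
    return env.get(state_reg, ("sym", state_reg))


def classify_block(block, state_reg="esi"):
    """'cf' if the state register is unchanged across the block, else 'obb'."""
    unchanged = exit_expr(block, state_reg) == ("sym", state_reg)
    return "cf" if unchanged else "obb"


# A dispatcher-style block only compares the state
dispatcher_bb = [("cmp", "esi", 0x1234)]
# An obb-style block does some work and then sets the next state
original_bb = [("mov", "eax", 1), ("mov", "esi", 0x5678)]
# A block that writes the state register without changing its value
shuffle_bb = [("mov", "eax", "esi"), ("mov", "esi", "eax")]

print(classify_block(dispatcher_bb))
print(classify_block(original_bb))
print(classify_block(shuffle_bb))
```

The last block shows why tracking the expression matters: a purely syntactic check for writes to the state register would flag it, while the symbolic value shows the state is unchanged.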

TODO...

## Deobfuscation With Symbolic Execution
Once we have identified the cf and obb blocks we can start using the familiar symbolic execution approach we used for Pandora Ransomware CFF.

TODO....