# Luwak TTP Extractor - Extracting TTPs from unstructured threat reports

## Overview
This notebook uses the [ERNIE 2.0]() English pretrained model to infer MITRE ATT&CK Enterprise TTPs from unstructured, descriptive threat intelligence text (e.g. blog posts). It can currently infer common Enterprise Techniques.

## Environment

### Key Requirements

Python (64 bit) >= 3.7  
pip (64 bit) >= 20.2.2  
PaddlePaddle (64 bit) >= 2.0   
paddlenlp  
nltk

Processor architecture: x86_64 (arm64 is not supported)
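
The interpreter and processor requirements above can be checked before installing anything. The following is a minimal sketch using only the standard library; `check_environment` is a hypothetical helper name, not part of this repository, and it does not check the pip, PaddlePaddle, paddlenlp or nltk versions.

```python
import platform
import struct
import sys


def check_environment(min_python=(3, 7)):
    """Return a list of problems found on this host; empty if none were found."""
    problems = []

    # Compare only (major, minor) against the minimum Python version.
    if sys.version_info[:2] < tuple(min_python):
        problems.append(
            "Python >= {}.{} is required, found {}.{}".format(
                min_python[0], min_python[1],
                sys.version_info[0], sys.version_info[1],
            )
        )

    # A 64-bit interpreter uses 8-byte pointers.
    if struct.calcsize("P") * 8 != 64:
        problems.append("A 64-bit Python interpreter is required")

    # x86_64 is reported as 'x86_64' on Linux/macOS and 'AMD64' on Windows.
    machine = platform.machine().lower()
    if machine not in ("x86_64", "amd64"):
        problems.append(
            "Unsupported processor architecture: {}".format(machine or "unknown")
        )

    return problems


for problem in check_environment():
    print("WARNING:", problem)
```

An empty list means the interpreter and architecture match the requirements listed above.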

### 0. Setup Virtual Env
Clone the repository and set up a virtual environment with virtualenv on Linux:
```
pip3 install virtualenv
mkdir <VENV_NAME>
cd <VENV_NAME>
virtualenv -p python3 .
source ./bin/activate
cd <path to this notebook>
```

### 1. Install Requirements

#### CPU

**NOT recommended: CPU inference is time-consuming, especially with large reports.**

Refer to the [official site](https://www.paddlepaddle.org.cn/en/install/quick?docurl=/documentation/docs/en/install/pip/linux-pip_en.html) to install PaddlePaddle.

For example, to install nltk, paddlenlp and the current stable version (2.4.2) of PaddlePaddle for CPU with pip:
```
python -m pip install paddlepaddle==2.4.2 -i https://mirror.baidu.com/pypi/simple
pip install paddlenlp
pip install nltk
```

#### GPU
Refer to the [official site](https://www.paddlepaddle.org.cn/en/install/quick?docurl=/documentation/docs/en/install/pip/linux-pip_en.html) to install PaddlePaddle.

For example, to install nltk, paddlenlp and the current stable version (2.4.2) of PaddlePaddle for CUDA 11.2 on Linux with pip:
```
python -m pip install paddlepaddle-gpu==2.4.2.post112 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html
pip install paddlenlp
pip install nltk
```

Alternatively, install the requirements from *requirements.txt*. Note that this *requirements.txt* targets PaddlePaddle 2.4.2 with CUDA 11.2 on Linux.
```
pip install -r requirements.txt
```
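
Whichever installation route is used, it can help to confirm that the packages are importable before running the notebook. The sketch below assumes the import names `paddle`, `paddlenlp` and `nltk`; `missing_modules` is a hypothetical helper, not part of this repository. When nothing is missing, it calls `paddle.utils.run_check()`, PaddlePaddle's own installation self-test.

```python
import importlib.util

# Import names (not pip package names) of the key requirements.
REQUIRED_MODULES = ("paddle", "paddlenlp", "nltk")


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules()
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    import paddle

    print("PaddlePaddle version:", paddle.__version__)
    # Built-in self-test; reports whether PaddlePaddle works on CPU/GPU.
    paddle.utils.run_check()
```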

### 2. Download and Merge Model
Download the NLTK punkt tokenizer data and merge the pretrained model:
```
python -c "import os, nltk; nltk_data_path = os.path.join(os.getcwd(),'nltk_data'); nltk.download('punkt', nltk_data_path);"

./merge_model.sh
```

Restart the kernel and run the notebook from the **RUN** section.
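
Before restarting, the punkt download can be verified with a short check. This is a sketch under the assumption that `nltk.download('punkt', <dir>)` unpacks the data into `<dir>/tokenizers/punkt`, which is NLTK's usual layout; `punkt_is_downloaded` is a hypothetical helper, and the output of `merge_model.sh` is not checked here.

```python
import os


def punkt_is_downloaded(nltk_data_path):
    """Return True if the punkt tokenizer directory exists under nltk_data_path."""
    return os.path.isdir(os.path.join(nltk_data_path, "tokenizers", "punkt"))


# Same location as in the download command: ./nltk_data in the working directory.
nltk_data_path = os.path.join(os.getcwd(), "nltk_data")
print("punkt found:", punkt_is_downloaded(nltk_data_path))
```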

## RUN

### 1. Load the model

In [None]:
import inference
from inference import predict_text

### 2. Infer TTPs from text

Put the text from which to infer TTPs in the `text` variable:

In [None]:
# text to be predicted
text = """ACNSHELL is sideloaded by a legitimate executable. 
It will then create a reverse shell via ncat.exe to the server closed.theworkpc.com
"""

Call the `predict_text` function to infer related TTPs.

In [None]:
o_text, ttps = predict_text(text)

### 3. Result

The result includes two parts:

- `o_text`: The original text.
- `ttps`: TTPs inferred from the text, as a list of `dict` objects. Each `dict` contains one sentence from `o_text` (`sent`) and the Tactics and Techniques inferred for that sentence (`tts`).

Example `ttps` for the `o_text` above:
```
[{'sent': 'ACNSHELL is sideloaded by a legitimate executable.',
  'tts': [{'tactic_name': 'Persistence',
    'tactic_id': 'TA0003',
    'technique_name': 'DLLSide-Loading',
    'technique_id': 'T1574.002',
    'score': 0.9799978}]},
 {'sent': 'It will then create a reverse shell via ncat.exe to the server closed.theworkpc.com',
  'tts': [{'tactic_name': 'Execution',
    'tactic_id': 'TA0002',
    'technique_name': 'WindowsCommandShell',
    'technique_id': 'T1059.003',
    'score': 0.98543674}]}]
```

In [None]:
print("Original text:")
print(o_text)
print("\n")
print("Predict result:")
for idx, ttp in enumerate(ttps):
    sent = ttp['sent']
    tts = ttp['tts']
    print('idx: {}, sent: {}'.format(idx, sent))
    print('tts: {}'.format(tts))
    print("\n")
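
For longer reports, a per-technique summary is often easier to read than the per-sentence list. The sketch below groups results by `technique_id`, based only on the `ttps` structure shown in the example above. `summarize_ttps` and the `min_score` threshold of 0.5 are illustrative assumptions, not part of the `inference` module, and `example_ttps` is the example output copied from above so the block runs without the model.

```python
def summarize_ttps(ttps, min_score=0.5):
    """Group predictions by technique ID, keeping the best score and sentences."""
    summary = {}
    for entry in ttps:
        for tt in entry["tts"]:
            # Convert to a plain float in case the model returns NumPy scalars.
            score = float(tt["score"])
            if score < min_score:
                continue
            item = summary.setdefault(tt["technique_id"], {
                "technique_name": tt["technique_name"],
                "tactic_ids": set(),
                "max_score": 0.0,
                "sentences": [],
            })
            item["tactic_ids"].add(tt["tactic_id"])
            item["max_score"] = max(item["max_score"], score)
            if entry["sent"] not in item["sentences"]:
                item["sentences"].append(entry["sent"])
    return summary


# Example output from the Result section, used here as stand-in data.
example_ttps = [
    {"sent": "ACNSHELL is sideloaded by a legitimate executable.",
     "tts": [{"tactic_name": "Persistence", "tactic_id": "TA0003",
              "technique_name": "DLLSide-Loading",
              "technique_id": "T1574.002", "score": 0.9799978}]},
    {"sent": "It will then create a reverse shell via ncat.exe to the server closed.theworkpc.com",
     "tts": [{"tactic_name": "Execution", "tactic_id": "TA0002",
              "technique_name": "WindowsCommandShell",
              "technique_id": "T1059.003", "score": 0.98543674}]},
]

for technique_id, item in sorted(summarize_ttps(example_ttps).items()):
    print(technique_id, item["technique_name"], sorted(item["tactic_ids"]),
          len(item["sentences"]))
```

In the notebook, pass the `ttps` returned by `predict_text` instead of `example_ttps`; raising `min_score` drops low-confidence predictions.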