This repository hosts the datasets used for model training and evaluation as part of our cyber threat intelligence (CTI) research:
Threat Behavior Textual Search by Attention Graph Isomorphism (Bae et al., EACL 2024)
The dataset consists of a pretraining dataset, threat reports grouped by APT group, and a collector tool (which I used for all of this collection, and which is needed to add reports published after our work).
- Textual corpus of threat reports
- Collected from 8 vendors
- A collection of threat reports by APT group
- Our evaluation set is a well-filtered, manually verified set
- We also provide lists copied from two public websites (Malpedia, ThaiCERT).
- Our dataset is current as of June 2022. We will release our collector as a tool (work in progress; to be uploaded soon).
- Copyrights of all datasets belong to the original authors or their vendors.
- Any misuse of attack information is strictly prohibited.
- Please contact us (Chanwoo Bae, bae68@purdue.edu) with any questions.
- We kindly request that you cite our paper if you use the dataset.