Cyber Threat Intelligence Dataset

This repository hosts the datasets used for model training and evaluation in our CTI research:

Threat Behavior Textual Search by Attention Graph Isomorphism (Bae et al., EACL 2024)

The dataset consists of a pretraining dataset, threat reports grouped by APT group, and a collector tool (used to gather everything in this collection, and needed to add new reports published after our work).

Large-scale Pretraining, Threat Reports Corpus Dataset

Textual corpus of threat reports
Collected from 8 vendors

Threat Reports, Classified by APT Groups

A collection of threat reports organized by APT group
Our evaluation set is a well-filtered, manually verified set
We also provide the lists copied from two public websites (Malpedia, ThaiCERT).
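
As a quick sanity check after cloning, the per-group collection can be summarized with a short script. This is only a minimal sketch: it assumes, hypothetically, that the threat-reports-per-group folder contains one subdirectory per APT group with the report files inside. The README does not document the actual layout, and the function name count_reports_per_group is illustrative, so adjust both to what you find in the repository.

```python
from collections import Counter
from pathlib import Path


def count_reports_per_group(root):
    """Count regular files under each top-level subdirectory of `root`.

    Assumption (not confirmed by the README): each top-level
    subdirectory corresponds to one APT group.
    """
    root = Path(root)
    counts = Counter()
    if not root.is_dir():
        # Missing folder: return an empty summary instead of raising.
        return counts
    for group_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        # Count files at any depth below the group directory.
        counts[group_dir.name] = sum(
            1 for f in group_dir.rglob("*") if f.is_file()
        )
    return counts


if __name__ == "__main__":
    # Folder name taken from the repository listing; layout is assumed.
    summary = count_reports_per_group("threat-reports-per-group")
    for group, n in summary.most_common():
        print(f"{group}\t{n}")
```

Run from the repository root, this prints one line per subdirectory with its file count, largest first. Files placed directly in the top-level folder are ignored.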

Threat Report Collector

Our dataset is current as of 2022.06. We will be releasing our collector as a tool (work in progress; it will be uploaded soon).

MISC

Copyright of all datasets belongs to the original authors or their vendors.
Any misuse of attack information is strictly prohibited.
Please contact us (Chanwoo Bae, bae68@purdue.edu) with any questions.
We kindly request that you cite our paper if you use the dataset.

Folders and files: threat-reports-large, threat-reports-per-group, README.md

cwbae10-purdue/CTI-EACL24
