the-stix-intern

the-stix-intern is a framework developed as part of my Masters' thesis "From Threat-Report to STIX-Bundle".

The framework implements 4 modules to extract STIX Domain Objects (SDOs) and STIX Relationship Objects (SROs) from unstructured threat-reports. It leverages open-source LLMs in various sizes and the library DSPy for automated module optimization. The performance of the implemented modules is on par with comparable approaches from literature that are based on closed-source LLMs from "Open"-AI. The performance of the implemented modules is on par with comparable approaches from literature (e.g. Time for aCTIon) that are based on closed-source LLMs like GPT-3.5:

Features

Threat-Report to STIX-Bundle: Automatically extracts SDOs like Malware, Threat-Actor, Attack-Pattern, Targets and some SROs from Threat-Reports and bundles them as STIX-Bundle
Expanded Dataset: The evaluation and optimization is based on the LADDER-dataset presented in Looking Beyond IoCs. The LADDER-dataset was enriched with the original Threat-Reports (*.html files) and the annotations were converted into STIX-Bundles.
Open-Source LLMs: While the Framework can be universally used with any LLM, the development was focused on Open-Source LLMs that are cost-efficient and privacy-preserving.
DSPy Optimization: The Framework comes with various modules optimized with MIPROv2.
CTI-Metrics: The evaluation is based on CTI-specific metrics that can accurately assess the correctness of the generated STIX bundles, rather than universal NLP metrics, which are not well-suited for this task.

Installation & Requirements

The framework itself is compatible with Python > 3.11. However if you want to load the optimized DSPy-Modules it is recommended to use Python 3.11.9 as cloudpickle may raise exceptions if the versions mismatch. Working on a solution to provide the modules in cloudpickle and more independent format for easy use.

Clone GitHub repository:

git clone https://github.com/cr4kn4x/the-stix-intern
cd the-stix-intern

Install requirements:

python -m venv venv 
.\venv\Scripts\activate
pip install -r requirements.txt

Setup environment variables (API-Keys)

Create a .env file in the root directory and add your API-Keys if required
The example uses Deepinfra as cloud-service for inference but feel free to use any other provider or self hosted LLMs. Inference is based on LiteLLM and compatiblity is versatile.

Pick your Module

The Framework comes with various optimized Zero-Shot and Few-Shot Modules. Every Module is optimized for one specific LLM! The following should give you first impression about the achieved performance.

Basic usage and how to load stored DSPy-Modules can be found in the the-stix-intern.ipynb notebook. The notebook also presents the usage with the HTML-Parser and the final parsing to STIX-Bundle using the STIX 2 Python API. Optional you can use the webscraper which enables the whole Workflow: URL --> Webscraper --> HTML-Parser --> LLMs --> STIX-Bundle.

The optimization and benchmark considers following LLMs:

Malware-Extractor

Threat-Actor-Extractor

Attack-Pattern-Extractor (includes SRO extraction)

Targets-Extractor (includes SRO extraction)

Acknowledgments

Looking Beyond IoCs: Automatically Extracting Attack Patterns from External CTI - For the foundation research and LADDER Dataset
Time for aCTIon: Automated CTI Report Processing for Knowledge Graph Creation - For providing a solid framework for comparison
DSPy - For the optimization framework used in this research
STIX 2.1 - For the structured threat intelligence format

This research was conducted as part of a Master's thesis.

Name		Name	Last commit message	Last commit date
Latest commit History 24 Commits
LADDER-Converter		LADDER-Converter
LADDER-Dataset		LADDER-Dataset
Modules		Modules
STIXnet-Converter		STIXnet-Converter
ThreatReportScraper		ThreatReportScraper
other		other
res		res
.gitignore		.gitignore
README.md		README.md
requirements.txt		requirements.txt
the-stix-intern.ipynb		the-stix-intern.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

the-stix-intern

Features

Installation & Requirements