Merged
61 commits
46973ea
Commit first draft
maldwg Jun 27, 2025
c685af1
Finish first draft for integrating zeek
maldwg Jun 27, 2025
3c9d5d4
made kafka work with external addresses also
maldwg Jun 27, 2025
3a047b1
complete and fix tests
maldwg Jun 30, 2025
f37b9ec
fix zeek handler for multiple kafka topics
maldwg Jul 1, 2025
a5b26e6
add configuration for dns payload size
maldwg Jul 1, 2025
0f83370
adapt logserver to work with multiple topics
maldwg Jul 1, 2025
ea0acef
Adapt logcollector and batch handler to work with new topic design
maldwg Jul 10, 2025
8070f02
Fix some remaining timestamp key issues
maldwg Jul 10, 2025
cff6d16
begin work on prefilter
maldwg Jul 15, 2025
3f3bf73
fixed collector again
maldwg Jul 21, 2025
7a7abc8
added prefilter for new layout with some technical debt
maldwg Jul 21, 2025
7282b65
add first draft until detector
maldwg Jul 21, 2025
250323d
finish all modules first drafts
maldwg Jul 25, 2025
9ac2678
Add kafka topic exporter
maldwg Jul 28, 2025
ee4164a
finish first draft for zeek!
maldwg Jul 29, 2025
92a514e
first draft for batch-tree
maldwg Aug 5, 2025
f652435
fix dashboard for latencies
maldwg Aug 5, 2025
f368e85
update dashboard and fix loglines count again
maldwg Aug 5, 2025
977fc2b
finish monitoring metrics for latencies
maldwg Aug 5, 2025
75e8d30
format everything
maldwg Aug 8, 2025
5385eda
fix kafka tests
maldwg Aug 11, 2025
79e4a27
begin reworking the logserver teste
maldwg Aug 11, 2025
13e1ba7
finish logserver tests
maldwg Aug 11, 2025
513be82
fix log collector instance tests
maldwg Aug 12, 2025
c6cb54a
adapt prefilter tests for now
maldwg Aug 13, 2025
40096c5
fix loglinehandler and add relevance handler tests
maldwg Aug 13, 2025
aa70188
add prefilter tests
maldwg Aug 13, 2025
a503d36
finish correcting tests for now
maldwg Aug 14, 2025
8c69182
correct last logcollector test
maldwg Aug 14, 2025
f92ea5d
Create abstract detector base class
maldwg Aug 15, 2025
ea68d32
fix inspector tests
maldwg Aug 18, 2025
c050ecb
Update documentation for loglinehandler and check for untested code
maldwg Aug 18, 2025
d84eda5
finish zeek tests and documentation
maldwg Aug 18, 2025
dbb21f4
finish logserver tests
maldwg Aug 18, 2025
3545087
Add comments for utils and finish tests
maldwg Aug 19, 2025
93c301a
Finish documentation and test implementation for the logcollector and…
maldwg Aug 19, 2025
e75c81f
Fix prefilter nd document code
maldwg Aug 19, 2025
4c784c9
Finish inspector tests and documentation
maldwg Aug 20, 2025
0b5188e
Add code documentation for the detector
maldwg Aug 20, 2025
48c9542
adapt the grafana dashboard
maldwg Aug 20, 2025
8195ebc
Finish tests for detector stage
maldwg Aug 20, 2025
3b8f6e4
first adjustments in documentation
maldwg Aug 27, 2025
2b308ea
finish first part of documentation
maldwg Aug 27, 2025
fb117dc
Integrate scaler changes: TODO: correct tests
maldwg Aug 27, 2025
1c38488
Finish documentation adpatations
maldwg Aug 29, 2025
d71a8eb
Finish tests for detector
maldwg Sep 11, 2025
9aef480
fix small zeek configs and detector cheksum
maldwg Sep 12, 2025
6555e81
Run code formatting
maldwg Sep 24, 2025
1d79c02
update gitignore to allow updates to requirements files
maldwg Sep 26, 2025
c21c198
Merge branch 'main' of github.com:Hamstring-NDR/hamstring into featur…
maldwg Oct 27, 2025
dc622d6
Fix unit test errors and adapt detector modules to work with new heib…
maldwg Nov 4, 2025
740d505
Refactor timestamp fields to "ts"
maldwg Nov 4, 2025
6fffa48
Refactor timestamp fields to "ts"
maldwg Nov 4, 2025
a8b425c
Adapt configuration and environment to be able to differentiate bette…
maldwg Nov 6, 2025
77018f0
Fix integration of zeek changes
maldwg Nov 6, 2025
0545dc7
restructure docker compsoe files using profiles now
maldwg Nov 6, 2025
5a647e9
code formatting
maldwg Nov 6, 2025
cffbf56
Merge branch 'main' of github.com:Hamstring-NDR/hamstring into featur…
maldwg Nov 6, 2025
73ce00e
Fix tests for prefilter again
maldwg Nov 6, 2025
937d42e
Adapt detector and detectopr tests to be state of the art
maldwg Nov 7, 2025
3 changes: 3 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -325,3 +325,6 @@ cython_debug/
# Others
docs/api/
!/docs/api/index.rst

# requirements.txt
!*/requirements.*.txt
38 changes: 35 additions & 3 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -56,19 +56,24 @@

## About the Project

![Pipeline overview](https://raw.githubusercontent.com/stefanDeveloper/heiDGAF/main/docs/media/heidgaf_overview_detailed.drawio.png?raw=true)
![Pipeline overview](./assets/heidgaf_architecture.svg)

## Getting Started

#### Run **heiDGAF** using Docker Compose:

```sh
HOST_IP=127.0.0.1 docker compose -f docker/docker-compose.yml up
HOST_IP=127.0.0.1 docker compose -f docker/docker-compose.yml --profile prod up
```
<p align="center">
<img src="https://raw.githubusercontent.com/stefanDeveloper/heiDGAF/main/assets/terminal_example.gif?raw=true" alt="Terminal example"/>
</p>

#### Use the dev profile to test changes in Docker containers:
```sh
HOST_IP=127.0.0.1 docker compose -f docker/docker-compose.yml --profile dev up
```


#### Or run the modules locally on your machine:
```sh
python -m venv .venv
Expand All @@ -87,6 +92,8 @@ python src/inspector/inspector.py
<p align="right">(<a href="#readme-top">back to top</a>)</p>




## Usage

### Configuration
Expand Down Expand Up @@ -276,6 +283,31 @@ This will create a `rules.txt` file containing the innards of the model, explain
<p align="right">(<a href="#readme-top">back to top</a>)</p>


### Data

> [!IMPORTANT]
> We support custom schemes.

Depending on your data and use case, you can customize the data scheme to fit your needs.
The configuration below is part of the [main configuration file](./config.yaml), which is detailed in our [documentation](https://heidgaf.readthedocs.io/en/latest/usage.html#id2).

```yml
loglines:
  fields:
    - [ "timestamp", RegEx, '^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z$' ]
    - [ "status_code", ListItem, [ "NOERROR", "NXDOMAIN" ], [ "NXDOMAIN" ] ]
    - [ "src_ip", IpAddress ]
    - [ "dns_server_ip", IpAddress ]
    - [ "domain_name", RegEx, '^(?=.{1,253}$)((?!-)[A-Za-z0-9-]{1,63}(?<!-)\.)+[A-Za-z]{2,63}$' ]
    - [ "record_type", ListItem, [ "A", "AAAA" ] ]
    - [ "response_ip", IpAddress ]
    - [ "size", RegEx, '^\d+b$' ]
```
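
To illustrate what the three field kinds in the scheme above (`RegEx`, `ListItem`, `IpAddress`) check, here is a minimal Python sketch. It is not the project's loader: the `FIELDS` tuple layout and the helpers `is_valid` and `invalid_fields` are hypothetical, only a subset of the fields is mirrored, and the optional fourth element of a `ListItem` entry is left out.

```python
import ipaddress
import re

# Hypothetical, simplified mirror of some field definitions above.
# Each entry is (name, kind, argument); the real config loader may differ.
FIELDS = [
    ("timestamp", "RegEx", r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z$"),
    ("status_code", "ListItem", ["NOERROR", "NXDOMAIN"]),
    ("src_ip", "IpAddress", None),
    ("record_type", "ListItem", ["A", "AAAA"]),
    ("size", "RegEx", r"^\d+b$"),
]


def is_valid(kind, arg, value):
    """Return True if value satisfies a single field rule."""
    if kind == "RegEx":
        return re.fullmatch(arg, value) is not None
    if kind == "ListItem":
        return value in arg
    if kind == "IpAddress":
        try:
            ipaddress.ip_address(value)  # accepts IPv4 and IPv6 literals
            return True
        except ValueError:
            return False
    raise ValueError(f"unknown field kind: {kind}")


def invalid_fields(logline):
    """Return the names of fields that are missing or fail their rule."""
    return [
        name
        for name, kind, arg in FIELDS
        if name not in logline or not is_valid(kind, arg, logline[name])
    ]
```

Calling `invalid_fields` on a parsed logline dictionary returns an empty list when every mirrored field passes, which makes it easy to try out a custom scheme on a few sample lines before editing `config.yaml`.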



<p align="right">(<a href="#readme-top">back to top</a>)</p>

<!-- CONTRIBUTING -->
## Contributing

Expand Down
4 changes: 4 additions & 0 deletions assets/heidgaf_architecture.svg
4 changes: 4 additions & 0 deletions assets/heidgaf_cicd.svg
81 changes: 58 additions & 23 deletions config.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -20,26 +20,42 @@ pipeline:
logserver:
input_file: "/opt/file.txt"



log_collection:
collector:
logline_format:
- [ "timestamp", Timestamp, "%Y-%m-%dT%H:%M:%S.%fZ" ]
- [ "status_code", ListItem, [ "NOERROR", "NXDOMAIN" ], [ "NXDOMAIN" ] ]
- [ "client_ip", IpAddress ]
- [ "dns_server_ip", IpAddress ]
- [ "domain_name", RegEx, '^(?=.{1,253}$)((?!-)[A-Za-z0-9-]{1,63}(?<!-)\.)+[A-Za-z]{2,63}$' ]
- [ "record_type", ListItem, [ "A", "AAAA" ] ]
- [ "response_ip", IpAddress ]
- [ "size", RegEx, '^\d+b$' ]
batch_handler:
batch_size: 10000
default_batch_handler_config:
batch_size: 2000
batch_timeout: 30.0
subnet_id:
ipv4_prefix_length: 24
ipv6_prefix_length: 64
collectors:
- name: "dga_collector"
protocol_base: dns
required_log_information:
- [ "ts", Timestamp, "%Y-%m-%dT%H:%M:%S" ]
- [ "status_code", ListItem, [ "NOERROR", "NXDOMAIN" ], [ "NXDOMAIN" ] ]
- [ "src_ip", IpAddress ]
- [ "dns_server_ip", IpAddress ]
- [ "domain_name", RegEx, '^(?=.{1,253}$)((?!-)[A-Za-z0-9-]{1,63}(?<!-)\.)+[A-Za-z]{2,63}$' ]
- [ "record_type", ListItem, [ "A", "AAAA" ] ]
# - [ "response_ip", IpAddress ]
- [ "size", RegEx, '^\d+$' ]
batch_handler_config_override:
batch_timeout: 30.1

log_filtering:
- name: "dga_filter"
# Relevance method used for rule-based prefiltering
relevance_method: check_dga_relevance
collector_name: dga_collector


data_inspection:
inspector:
- name: dga_inspector
inspector_module_name: "stream_ad_inspector"
inspector_class_name: "StreamADInspector"
prefilter_name: dga_filter
mode: univariate # multivariate, ensemble
# Only used when mode is set to ensemble
ensemble:
Expand All @@ -51,32 +67,51 @@ pipeline:
module: streamad.model
model_args:
is_global: false
anomaly_threshold: 0.01
score_threshold: 0.5
anomaly_threshold: 0.0001
score_threshold: 0.005
time_type: ms
time_range: 20

data_analysis:
detector:
model: rf # XGBoost
checksum: 021af76b2385ddbc76f6e3ad10feb0bb081f9cf05cff2e52333e31040bbf36cc
- name: "RF-dga_detector"
detector_module_name: "dga_detector"
detector_class_name: "DGADetector"
model: rf
checksum: 5db8bfb617e80361362c33b1d1afc6d762c28e9fa9275fb11514a3bdef76bb88
base_url: https://heibox.uni-heidelberg.de/d/0d5cbcbe16cd46a58021/
threshold: 0.5
threshold: 0.005
inspector_name: dga_inspector

monitoring:
clickhouse_connector:
batch_size: 50 # do not set higher
batch_timeout: 2.0


zeek:
sensors:
zeek1:
static_analysis: true
protocols:
- dns
interfaces:
- enp2s0f0

environment:
kafka_brokers:
- hostname: kafka1
port: 19092
internal_port: 19092
external_port: 8097
node_ip: 127.0.0.1
- hostname: kafka2
port: 19093
internal_port: 19093
external_port: 8098
node_ip: 127.0.0.1
- hostname: kafka3
port: 19094
kafka_topics:
internal_port: 19094
external_port: 8099
node_ip: 127.0.0.1
kafka_topics_prefix:
pipeline:
logserver_in: "pipeline-logserver_in"
logserver_to_collector: "pipeline-logserver_to_collector"
Expand Down
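
The `subnet_id` block in the `config.yaml` hunk above sets prefix lengths of 24 (IPv4) and 64 (IPv6). As a rough illustration of what such prefixes mean, the following sketch groups an address into its containing network using Python's standard `ipaddress` module. The function name `subnet_of` is hypothetical, and how the pipeline actually derives or formats its subnet IDs is not shown in this diff.

```python
import ipaddress

# Prefix lengths taken from the subnet_id block in config.yaml.
IPV4_PREFIX_LENGTH = 24
IPV6_PREFIX_LENGTH = 64


def subnet_of(ip):
    """Return the network containing ip, truncated to the configured prefix."""
    address = ipaddress.ip_address(ip)
    prefix = IPV4_PREFIX_LENGTH if address.version == 4 else IPV6_PREFIX_LENGTH
    # strict=False masks off the host bits instead of raising ValueError.
    return ipaddress.ip_network(f"{address}/{prefix}", strict=False)
```

With these settings, all clients in the same /24 (or /64) map to the same network object, so they would be batched together under any scheme that keys on the subnet.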
Binary file added data/test_pcaps/cic-ids-2017-sample.pcap2
Binary file not shown.
Binary file added data/test_pcaps/ctu-sample.pcap
Binary file not shown.
Binary file added data/test_pcaps/unsw-sample.pcap2
Binary file not shown.
4 changes: 2 additions & 2 deletions docker/create_tables/alerts.sql
Original file line number Diff line number Diff line change
@@ -1,10 +1,10 @@
CREATE TABLE IF NOT EXISTS alerts (
client_ip String NOT NULL,
src_ip String NOT NULL,
alert_timestamp DateTime64(6) NOT NULL,
suspicious_batch_id UUID NOT NULL,
overall_score Float32 NOT NULL,
domain_names String NOT NULL,
result String,
)
ENGINE = MergeTree
PRIMARY KEY(client_ip, alert_timestamp);
PRIMARY KEY(src_ip, alert_timestamp);
2 changes: 2 additions & 0 deletions docker/create_tables/batch_timestamps.sql
Original file line number Diff line number Diff line change
@@ -1,10 +1,12 @@
CREATE TABLE IF NOT EXISTS batch_timestamps (
batch_id UUID NOT NULL,
instance_name String NOT NULL,
stage String NOT NULL,
status String NOT NULL,
timestamp DateTime64(6) NOT NULL,
message_count UInt32,
is_active Bool NOT NULL
)
ENGINE = MergeTree
-- keep the PK as the UUID even though it is not unique, for indexing reasons
PRIMARY KEY (batch_id);
14 changes: 14 additions & 0 deletions docker/create_tables/batch_tree.sql
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
-- Table for reconstructing the path a batch took through the pipeline;
-- used in Grafana to calculate the elapsed time between stages
CREATE TABLE IF NOT EXISTS batch_tree (
batch_row_id String NOT NULL,
batch_id UUID NOT NULL,
parent_batch_row_id Nullable(String), -- Default of Null indicates a root element
instance_name String NOT NULL,
stage String NOT NULL,
status String NOT NULL,
timestamp DateTime64(6) NOT NULL,
)
ENGINE = MergeTree
-- keep the PK as the UUID even though it is not unique, for indexing reasons
PRIMARY KEY (batch_row_id);
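
The `batch_tree` table above links each row to its parent through `parent_batch_row_id`, with `NULL` marking a root. The sketch below shows, on in-memory data, how elapsed time between stages could be derived from that link. The rows, stage names, and the `stage_latencies` helper are made up for illustration; the actual Grafana queries are not part of this diff.

```python
from datetime import datetime

# Hypothetical rows shaped like the batch_tree columns; values are invented.
ROWS = [
    {"batch_row_id": "r1", "parent_batch_row_id": None,
     "stage": "log_collection", "timestamp": datetime(2025, 8, 5, 12, 0, 0)},
    {"batch_row_id": "r2", "parent_batch_row_id": "r1",
     "stage": "log_filtering", "timestamp": datetime(2025, 8, 5, 12, 0, 2)},
    {"batch_row_id": "r3", "parent_batch_row_id": "r2",
     "stage": "data_inspection", "timestamp": datetime(2025, 8, 5, 12, 0, 7)},
]


def stage_latencies(rows):
    """Map (parent stage, child stage) to seconds elapsed between the two rows."""
    by_id = {row["batch_row_id"]: row for row in rows}
    latencies = {}
    for row in rows:
        parent_id = row["parent_batch_row_id"]
        if parent_id is None:  # a NULL parent marks a root element
            continue
        parent = by_id[parent_id]
        delta = row["timestamp"] - parent["timestamp"]
        latencies[(parent["stage"], row["stage"])] = delta.total_seconds()
    return latencies
```

Keying on the row ID rather than on `batch_id` is what lets a batch that fans out into several children be traced along each branch separately.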
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
CREATE TABLE IF NOT EXISTS failed_dns_loglines (
CREATE TABLE IF NOT EXISTS failed_loglines (
message_text String NOT NULL,
timestamp_in DateTime64(6) NOT NULL,
timestamp_failed DateTime64(6) NOT NULL,
Expand Down
Original file line number Diff line number Diff line change
@@ -1,10 +1,8 @@
CREATE TABLE IF NOT EXISTS dns_loglines (
CREATE TABLE IF NOT EXISTS loglines (
logline_id UUID NOT NULL,
subnet_id String NOT NULL,
timestamp DateTime64(6) NOT NULL,
status_code String NOT NULL,
client_ip String NOT NULL,
record_type String NOT NULL,
subnet_id String NOT NULL,
src_ip String NOT NULL,
additional_fields String
)
ENGINE = MergeTree
Expand Down
4 changes: 3 additions & 1 deletion docker/create_tables/suspicious_batch_timestamps.sql
Original file line number Diff line number Diff line change
@@ -1,11 +1,13 @@
CREATE TABLE IF NOT EXISTS suspicious_batch_timestamps (
suspicious_batch_id UUID NOT NULL,
client_ip String NOT NULL,
src_ip String NOT NULL,
instance_name String NOT NULL,
stage String NOT NULL,
status String NOT NULL,
timestamp DateTime64(6) NOT NULL,
message_count UInt32,
is_active Bool NOT NULL
)
ENGINE = MergeTree
-- keep the PK as the UUID even though it is not unique, for indexing reasons
PRIMARY KEY (suspicious_batch_id);