COSSAS/dgad

Hunt domains generated by Domain Generation Algorithms to identify malware traffic

All COSSAS projects are hosted on GitLab with a push mirror to GitHub. For issues and contributions, see CONTRIBUTING.md.

What is it?

Domain generation algorithms (DGAs) are typically used by attackers to create fast-changing domains for command & control channels. DGA Detective uses a Temporal Convolutional Network to tell whether a domain was created by such an algorithm. For example, a domain like wikipedia.org is not generated by an algorithm, whereas ksadjfhlasdkjfsakjdf.com is.

Domain Classification
wikipedia.org OK
ksadjfhlasdkjfsakjdf.com DGA
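
To make the idea of a DGA concrete, here is a minimal toy sketch of how one might work: both the malware and its operator derive the same domains from a shared seed and the current date. This is an illustrative assumption, not the scheme of any specific malware family and not part of DGA Detective; the function name `toy_dga` and its parameters are hypothetical.

```python
import hashlib
from datetime import date


def toy_dga(seed: str, day: date, count: int = 3, length: int = 16) -> list[str]:
    """Derive pseudo-random domains from a shared seed and a date.

    Illustrative only: real malware families use their own schemes.
    """
    domains = []
    for index in range(count):
        material = f"{seed}-{day.isoformat()}-{index}".encode()
        digest = hashlib.sha256(material).hexdigest()
        # Map each hex character (0-15) onto a letter a-p so the label
        # looks like a random hostname.
        label = "".join(chr(ord("a") + int(ch, 16)) for ch in digest[:length])
        domains.append(f"{label}.com")
    return domains


if __name__ == "__main__":
    for domain in toy_dga("example-seed", date(2022, 7, 24)):
        print(domain)
```

Because the output depends only on the seed and the date, the domains change every day without any communication between the two sides, which is why such names are hard to block with static lists.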

More information can be found on cossas-project.org.

Installation

Python

To install the DGA Detective locally, we recommend using a virtual environment:

# recommended: use a virtual environment
python -m venv .venv
source .venv/bin/activate
pip install dgad

Docker

Alternatively, you can pull the Docker image:

docker pull registry.gitlab.com/cossas/dgad:4.1.1

Usage

CLI

You can use the dgad CLI directly (if installed with pip) or through Docker.

# list available commands with
$ dgad --help
Usage: dgad [OPTIONS] COMMAND [ARGS]...

  DGA Detective can predict if a domain name has been generated by a Domain
  Generation Algorithm

Options:
  --help  Show this message and exit.

Commands:
  client  classify domains from cli args or csv/jsonl files
  server  deploy a DGA Detective server

# list options with
# dgad client --help
# dgad server --help

CLI Examples

$ dgad client -d kajsdfhasdlkjfh.com
$ docker run -i registry.gitlab.com/cossas/dgad:4.1.1 client -d kdsjhfalksdjf.com
$ dgad client -d dsfjkhalsdkfj.com -o json
$ dgad client -d wikipedia.org -d anotherdomain.com
$ dgad client -f tests/data/domains_todo.csv --format csv
$ cat tests/data/domains_todo.csv | dgad client -fmt csv -f -
$ dgad client -f tests/data/domains_todo.csv --format csv --verbosity DEBUG

CLI input/output

The dgad CLI can read domains from txt, csv, and jsonl files. You can find examples of these files in the top-level demo directory. For csv and jsonl files, dgad needs to know which column contains the domains. Set this with the --domains_column option, which defaults to domain.
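
As a rough sketch of what such input could look like, the helpers below render a list of domains as csv and jsonl text using the default `domain` column. The exact layout of the files in the demo directory is not shown here, so treat the single-column shape as an assumption; the helper names `to_csv_text` and `to_jsonl_text` are hypothetical.

```python
import csv
import io
import json


def to_csv_text(domains, column="domain"):
    """Render domains as csv text with a header row naming the column."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[column], lineterminator="\n")
    writer.writeheader()
    for name in domains:
        writer.writerow({column: name})
    return buffer.getvalue()


def to_jsonl_text(domains, column="domain"):
    """Render domains as jsonl text: one JSON object per line."""
    return "".join(json.dumps({column: name}) + "\n" for name in domains)


if __name__ == "__main__":
    sample = ["wikipedia.org", "ksadjfhlasdkjfsakjdf.com"]
    print(to_csv_text(sample))
    print(to_jsonl_text(sample))
```

If your data uses a different column name, pass it as `column` here and give the same name to --domains_column.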

# you can pipe input data to the flag -f from another command
$ cat tests/data/domains_todo.csv | dgad client -fmt csv -f -

# this is especially useful when using the CLI through Docker
$ cat demo/domains.jsonl | docker run -i registry.gitlab.com/cossas/dgad:4.1.1 client -fmt jsonl -f -
{
  "domain": "python.org",
  "is_dga": false
}
(...)

# dgad outputs plain json, so you can easily pipe stdout to another command
$ dgad client -f tests/data/domains_todo.csv -fmt csv | jq '{domain: .[0].raw, is_dga: .[0].is_dga}'
{
  "domain": "wikipedia.org",
  "is_dga": false
}
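
If you prefer Python to jq, the same projection can be sketched with the standard library. This assumes, based on the jq filter above, that dgad prints a JSON array whose items carry `raw` and `is_dga` fields; the function name `summarize` is hypothetical.

```python
import json


def summarize(raw_json: str) -> list[dict]:
    """Reduce dgad's JSON output to domain/is_dga pairs."""
    results = json.loads(raw_json)
    return [{"domain": item["raw"], "is_dga": item["is_dga"]} for item in results]


if __name__ == "__main__":
    # Stand-in for text captured from dgad's stdout.
    captured = '[{"raw": "wikipedia.org", "is_dga": false}]'
    print(summarize(captured))
```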

Production deployment with gRPC API

In production you may want to split client and server. DGA Detective ships with a performant gRPC API. You can then scale the number of servers to handle as many domains as you need.

Server

# see dgad server --help for all options
# run
dgad server
2022-07-24 13:37:12,097 INFO     started dga detective classifier ce8f8efe-8272-44dd-a0be-cc34a0df752b

Client

# use the -r flag for remote analysis
# for example you can reach a dgad server instance deployed at https://dgad.mydomain.com
dgad client -r -h dgad.mydomain.com -p 443 -f tests/data/domains_todo.csv -fmt csv | jq -r '.[] | {domain: .raw, is_dga: .is_dga}'
{
  "domain": "klajsdfiuweakjvnzslkvjneaiuvbkjbre.ru",
  "is_dga": true
}
{
  "domain": "aksdjhflkajsdhflka.com",
  "is_dga": true
}
{
  "domain": "wikipedia.org",
  "is_dga": false
}
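
The same remote call can be driven from a script. The sketch below only wraps the CLI flags shown above (-r, -h, -p, -f, -fmt) with `subprocess`; it assumes dgad is on your PATH, that a server is reachable at the given host, and that stdout is a JSON array. The helper names are hypothetical.

```python
import json
import subprocess


def build_remote_command(host, port, path, fmt="csv"):
    """Assemble the dgad client invocation for remote analysis."""
    return ["dgad", "client", "-r", "-h", host, "-p", str(port),
            "-f", path, "-fmt", fmt]


def classify_remote(host, port, path, fmt="csv"):
    """Run the dgad client against a remote server and parse its JSON output."""
    completed = subprocess.run(
        build_remote_command(host, port, path, fmt),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if dgad exits non-zero
    )
    return json.loads(completed.stdout)


if __name__ == "__main__":
    print(" ".join(build_remote_command(
        "dgad.mydomain.com", 443, "tests/data/domains_todo.csv")))
```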

As a Python package in your code

# demo/demo.py
from dgad.prediction import Detective
from dgad.utils import pretty_print

mydomains = ["adslkfjhsakldjfhasdlkf.com"]
detective = Detective()
# convert mydomains strings into dgad.schema.Domain
mydomains, _ = detective.prepare_domains(mydomains)
# classify them
detective.investigate(mydomains)
# view result, drops padded_token_vector for pretty printing
pretty_print(mydomains, output_format="json")

$ python demo.py
[
  {
    "raw": "adslkfjhsakldjfhasdlkf.com",
    "words": [
      {
        "value": "adslkfjhsakldjfhasdlkf",
        "padded_length": 120,
        "binary_score": 0.992063581943512,
        "binary_label": "DGA",
        "family_score": 0.34756162762641907,
        "family_label": "necurs"
      }
    ],
    "suffix": "com",
    "is_dga": true,
    "family_label": "necurs",
    "padded_length": 120
  }
]
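
Once you have results in this shape, you can post-process them as plain dictionaries. The sketch below groups flagged domains by `family_label`, keeping only those whose best word-level `binary_score` clears a threshold. It relies solely on the fields visible in the output above; the function name `group_by_family` and the 0.9 cut-off are illustrative assumptions, not recommendations from the project.

```python
from collections import defaultdict


def group_by_family(results, min_score=0.9):
    """Group DGA-flagged domains by family label, filtered by binary_score."""
    families = defaultdict(list)
    for domain in results:
        if not domain["is_dga"]:
            continue
        # Use the highest word-level score as the domain's confidence.
        top_score = max(
            (word["binary_score"] for word in domain["words"]), default=0.0
        )
        if top_score >= min_score:
            families[domain["family_label"]].append(domain["raw"])
    return dict(families)


if __name__ == "__main__":
    sample = [{
        "raw": "adslkfjhsakldjfhasdlkf.com",
        "words": [{"value": "adslkfjhsakldjfhasdlkf",
                   "binary_score": 0.992063581943512,
                   "binary_label": "DGA"}],
        "is_dga": True,
        "family_label": "necurs",
    }]
    print(group_by_family(sample))
```

Note that the family_score in the sample output is low (about 0.35) while the binary_score is high, so it may be sensible to treat family labels with more caution than the DGA verdict itself.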

Contributing

Contributions to DGA Detective are highly appreciated and more than welcome. Please read CONTRIBUTING.md for more information about our contribution process.

Setup development environment

To set up a development environment for contributing, follow these steps:

Requirements

Setup

# checkout this repository
git clone git@gitlab.com:cossas/dgad.git
cd dgad

# install project, poetry will spawn a new venv
poetry install

# gRPC code generation
make protoc

About

DGA Detective is developed by TNO in the SOCCRATES innovation project. SOCCRATES has received funding from the European Union’s Horizon 2020 Research and Innovation program under Grant Agreement No. 833481.