# Generating Batches for Adversarial Training
This notebook contains example code for generating batches that could be used for adversarially training a classifier against our attacks.

**Note**: We deliberately describe the individual steps required for adversarial training rather than providing a pre-defined adversarial training loop as part of this library. In our experience, a pre-defined loop is usually not flexible enough to accommodate varying training schemes and architectures.

In [1]:
import torch
import os
from robust_dga_detection.models import CNNResNetWithEmbedding
from robust_dga_detection.utils import domains, reproduceability

DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
reproduceability.setup_deterministic_environment()

## Step 1: Load a model to generate adversarial training batches for

In [2]:
model = CNNResNetWithEmbedding().to(DEVICE)

In [3]:
model.load_state_dict(
    # map_location lets a checkpoint saved on a GPU load on a CPU-only machine
    torch.load(os.getenv("TRAINED_MODEL_PATH"), map_location=DEVICE)
)

<All keys matched successfully>

## Step 2: Set up data for the adversarial batch

Our batch generation functions apply a wide range of adversarial attacks to a single batch of examples by splitting the batch across the attack configurations. You therefore need at least 42 input samples so that one domain can be generated per attack configuration. We nevertheless recommend larger batch sizes. Internally, the attack hyperparameters are sampled randomly for every batch.
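To make the minimum batch size concrete, here is a minimal sketch of how a batch could be divided among 42 attack configurations. It is an illustration in plain Python, not the library's internal implementation; the helper name `split_batch_indices` and the even-split strategy are assumptions.

```python
def split_batch_indices(n_samples, n_configs=42):
    """Assign sample indices to attack configurations as evenly as possible.

    Hypothetical helper for illustration only; the library may split differently.
    """
    if n_samples < n_configs:
        raise ValueError(f"need at least {n_configs} samples, got {n_samples}")
    base, extra = divmod(n_samples, n_configs)
    pieces, start = [], 0
    for config_id in range(n_configs):
        # The first `extra` configurations receive one additional sample.
        size = base + (1 if config_id < extra else 0)
        pieces.append(list(range(start, start + size)))
        start += size
    return pieces


pieces = split_batch_indices(100)
print(len(pieces), min(map(len, pieces)), max(map(len, pieces)))  # 42 2 3
```

With exactly 42 samples every configuration receives a single domain, which is why larger batches give each attack more examples to work with.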

In the following, we generate example domains as $\mathrm{MD5}(i)$; names like these could just as well be produced by a real DGA. In a real use case, you would replace them with the portion of the mini-batch for which you want to create adversarial examples (e.g., $50\%$ of the samples, as in our paper).
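As a rough sketch of that replacement step, the snippet below picks a random fraction of a mini-batch to attack and leaves the rest untouched. The helper `select_attack_subset`, the placeholder domain names, and uniform random selection are assumptions made for illustration; the notebook does not prescribe how the portion is chosen.

```python
import random


def select_attack_subset(batch, fraction=0.5, seed=None):
    """Split a mini-batch into samples to attack and samples to keep as-is.

    Hypothetical helper; uniform random selection is an assumption.
    """
    rng = random.Random(seed)
    n_attack = int(len(batch) * fraction)
    chosen = set(rng.sample(range(len(batch)), n_attack))
    to_attack = [x for i, x in enumerate(batch) if i in chosen]
    untouched = [x for i, x in enumerate(batch) if i not in chosen]
    return to_attack, untouched


# Placeholder domains standing in for one real training mini-batch.
mini_batch = [f"domain-{i}.example" for i in range(128)]
to_attack, untouched = select_attack_subset(mini_batch, fraction=0.5, seed=0)
print(len(to_attack), len(untouched))  # 64 64
```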

In [4]:
import hashlib

def get_hash_domain(i):
    m = hashlib.md5()
    m.update(str(i).encode('ascii'))
    return m.hexdigest()

input_domains_to_generate_adversarial_examples_for = [
    get_hash_domain(i) for i in range(42)
]

Next, we encode the domains for the classifier and check the baseline predictions before applying adversarial attacks to them.

In [5]:
encoded_domains = torch.stack(
    [domains.encode_domain(domain) for domain in input_domains_to_generate_adversarial_examples_for]
).to(DEVICE)
labels = torch.ones(len(input_domains_to_generate_adversarial_examples_for), device=DEVICE)

In [8]:
model.eval()
with torch.no_grad():
    print(torch.sigmoid(model(encoded_domains)))

tensor([0.9999, 0.9949, 1.0000, 0.9912, 1.0000, 0.9997, 1.0000, 1.0000, 0.9997,
        1.0000, 1.0000, 0.9980, 0.9992, 1.0000, 0.9987, 1.0000, 0.9999, 0.9997,
        1.0000, 0.9938, 1.0000, 1.0000, 0.9965, 0.9997, 0.9997, 0.9999, 0.9997,
        1.0000, 1.0000, 0.9956, 1.0000, 0.9999, 0.9999, 1.0000, 0.9999, 0.9999,
        0.9999, 0.9918, 0.9991, 1.0000, 0.9999, 0.9999], device='cuda:0')


## Step 3: Apply Alternating Adversarial Training
In our paper, we recommend alternating between generating adversarial latent space vectors and generating adversarial domains for adversarial training.

In the following, we show how to generate both kinds of batches with this library, which gives you the most flexibility in how you combine them.

In [9]:
from robust_dga_detection import defenses

You can configure our batch attack functions to use only a subset of the implemented attacks (e.g., for LOGO evaluations).
Therefore, you must specify which attacks you want to use.

In [10]:
embedding_space_attacks_to_use = {x for x in defenses.EmbeddingSpaceAttackCatalogue}
discretization_schemes_to_use = {x for x in defenses.EmbeddingSpaceDiscretizationCatalogue}
nlp_attacks_to_use = {x for x in defenses.NLPAttackCatalogue}

### Option 1 - Generate adversarial embedding space vectors
In this setting, we generate adversarial embedding vectors. Therefore, we must first embed the domains for which we want to create adversarial samples.

In [11]:
model.eval()
with torch.no_grad():
    embedded_domains = model.embedding(encoded_domains)

Afterward, we can apply the attack function to the entire batch.

In [12]:
embedding_space_batch_attack = defenses.EmbeddingSpaceBatchAttack(model, embedding_space_attacks_to_use)
embedding_space_adversarial_batch = embedding_space_batch_attack.attack(embedded_domains, labels)

In [13]:
torch.sigmoid(model.net(embedding_space_adversarial_batch).ravel())

tensor([0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
        0.0000e+00, 0.0000e+00, 0.0000e+00, 1.4840e-11, 1.6182e-11, 1.3771e-11,
        1.8895e-11, 1.7746e-11, 1.9103e-11, 1.8332e-11, 1.9157e-11, 1.5997e-11,
        0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
        0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
        0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
        0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00],
       device='cuda:0', grad_fn=<SigmoidBackward0>)

We can observe that the attack succeeded: every adversarial embedding vector now receives a score close to 0. The generated adversarial embedding space vectors can now be re-combined with the benign samples and used as an adversarial training batch!

### Option 2 - Generate adversarial domain names
In this setting, we generate adversarial domain names by combining embedding space attacks with discretization or by directly using discrete attacks.

In [15]:
discrete_batch_attack = defenses.DiscreteDomainbatchAttack(
    model,
    embedding_space_attacks_to_use,
    discretization_schemes_to_use,
    nlp_attacks_to_use
)
discrete_adversarial_batch = discrete_batch_attack.attack(encoded_domains, labels)

In [16]:
torch.sigmoid(model(discrete_adversarial_batch).ravel())

tensor([5.4454e-10, 9.9487e-01, 0.0000e+00, 0.0000e+00, 2.9638e-04, 2.9467e-06,
        9.9999e-01, 9.9999e-01, 8.8217e-03, 1.5801e-01, 2.1580e-01, 9.9805e-01,
        3.8359e-08, 1.0000e+00, 0.0000e+00, 0.0000e+00, 3.8751e-04, 9.9606e-08,
        1.0594e-09, 1.9512e-01, 6.9340e-03, 1.5028e-12, 7.5694e-06, 3.5234e-10,
        1.2796e-07, 1.1281e-01, 0.0000e+00, 0.0000e+00, 4.0686e-26, 2.0022e-09,
        5.1103e-27, 3.2958e-34, 1.8424e-27, 3.6021e-29, 3.2991e-27, 1.7414e-28,
        4.1944e-25, 1.7541e-27, 4.8127e-24, 1.4819e-24, 7.5002e-22, 2.4673e-23],
       device='cuda:0', grad_fn=<SigmoidBackward0>)

Again, we observe that the adversarial domain generation succeeded (in most cases, as seen below). The generated adversarial domains can now be re-combined with the benign samples and used as a batch for adversarial training.
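The re-combination step can be sketched as follows. This is a plain-Python illustration under stated assumptions: the helper `recombine` is hypothetical, the string items stand in for encoded tensors, and the label convention (0 for benign, 1 for DGA, matching the all-ones `labels` used earlier in this notebook) is assumed for the benign side. The key point is that adversarial samples keep their malicious label.

```python
import random


def recombine(benign, adversarial, seed=None):
    """Merge benign and adversarial samples into one shuffled training batch.

    Hypothetical helper; adversarial samples keep label 1.0 (malicious),
    benign samples are assumed to carry label 0.0.
    """
    samples = list(benign) + list(adversarial)
    labels = [0.0] * len(benign) + [1.0] * len(adversarial)
    # Shuffle samples and labels with the same permutation.
    order = list(range(len(samples)))
    random.Random(seed).shuffle(order)
    return [samples[i] for i in order], [labels[i] for i in order]


batch, batch_labels = recombine(
    benign=["example.org", "python.org"],
    adversarial=["adv-0", "adv-1", "adv-2"],
    seed=0,
)
print(list(zip(batch, batch_labels)))
```

With tensors, the same idea would typically use concatenation followed by a shared random permutation of samples and labels.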

In our paper, we randomly alternate between Option 1 and Option 2. We do not apply both options to the same batch. We do so here only for demonstration purposes.
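A minimal sketch of such a per-batch alternation is shown below. The helper `choose_batch_mode` is hypothetical, and the 50/50 probability is an assumption; the text above only states that the choice is random.

```python
import random


def choose_batch_mode(rng, p_embedding=0.5):
    """Pick which kind of adversarial batch to generate for one mini-batch.

    Hypothetical helper; p_embedding=0.5 is an assumed default.
    """
    return "embedding" if rng.random() < p_embedding else "discrete"


rng = random.Random(0)
for step in range(5):
    mode = choose_batch_mode(rng)
    if mode == "embedding":
        # Here you would run the Option 1 attack on the embedded domains.
        pass
    else:
        # Here you would run the Option 2 attack on the encoded domains.
        pass
    print(step, mode)
```

Each mini-batch uses exactly one of the two options, so the training loop needs a forward path for embedded inputs and one for encoded domains.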

In [17]:
_ < 0.5

tensor([ True, False,  True,  True,  True,  True, False, False,  True,  True,
         True, False,  True, False,  True,  True,  True,  True,  True,  True,
         True,  True,  True,  True,  True,  True,  True,  True,  True,  True,
         True,  True,  True,  True,  True,  True,  True,  True,  True,  True,
         True,  True], device='cuda:0')
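The mask above can be condensed into a single attack success rate. The sketch below does this in plain Python using the scores printed by cell `In [16]`; the variable names are illustrative, and the 0.5 threshold is the one used in the cell above.

```python
# Scores copied from the printed output of cell In [16].
scores = [
    5.4454e-10, 9.9487e-01, 0.0000e+00, 0.0000e+00, 2.9638e-04, 2.9467e-06,
    9.9999e-01, 9.9999e-01, 8.8217e-03, 1.5801e-01, 2.1580e-01, 9.9805e-01,
    3.8359e-08, 1.0000e+00, 0.0000e+00, 0.0000e+00, 3.8751e-04, 9.9606e-08,
    1.0594e-09, 1.9512e-01, 6.9340e-03, 1.5028e-12, 7.5694e-06, 3.5234e-10,
    1.2796e-07, 1.1281e-01, 0.0000e+00, 0.0000e+00, 4.0686e-26, 2.0022e-09,
    5.1103e-27, 3.2958e-34, 1.8424e-27, 3.6021e-29, 3.2991e-27, 1.7414e-28,
    4.1944e-25, 1.7541e-27, 4.8127e-24, 1.4819e-24, 7.5002e-22, 2.4673e-23,
]

threshold = 0.5
# A domain evades detection when its score falls below the threshold.
evaded = [score < threshold for score in scores]
still_detected = [i for i, ok in enumerate(evaded) if not ok]
success_rate = sum(evaded) / len(scores)

print(f"{sum(evaded)}/{len(scores)} = {success_rate:.1%}")  # 37/42 = 88.1%
print(still_detected)  # [1, 6, 7, 11, 13]
```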

During the adversarial training evaluations in our paper, we generate fresh adversarial domains for every training mini-batch.

We hope this notebook helped you start hardening your classifier against our attacks.