# Combining an Embedding-Space Attack with Controllable Discretization
This notebook contains example code for generating fully valid adversarial e2LDs by combining
an embedding-space adversarial attack with a discretization scheme.

The concepts are explained step by step, allowing you to apply these attacks to your own models!

In [1]:
import torch
import os
from robust_dga_detection.models import CNNResNetWithEmbedding
from robust_dga_detection.utils import domains, reproduceability

DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
reproduceability.setup_deterministic_environment()

## Step 1: Load a model to attack

In [2]:
model = CNNResNetWithEmbedding().to(DEVICE)

In [3]:
model.load_state_dict(
    # map_location lets a checkpoint saved on a GPU load on a CPU-only machine.
    torch.load(os.getenv("TRAINED_MODEL_PATH"), map_location=DEVICE)
)

<All keys matched successfully>

## Step 2: Set up an embedding-space attack

Because most adversarial attack libraries expect image-shaped inputs, we introduce a transparent translation layer that presents the model in that form.

**NOTE**: This adaptation layer caches some values when it is created. Re-create it whenever the model is updated in any way; otherwise the results may be inconsistent.

In [4]:
from robust_dga_detection.attacks.embedding_space import ImageEmulationAdapter
from foolbox import PyTorchModel

image_model = ImageEmulationAdapter(model).eval()
foolbox_model = PyTorchModel(image_model, bounds=(0, 1), device=DEVICE)

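In this framework, an embedding-space attack is a function that maps a scaled batch of embedded domain names, together with their labels, to
a scaled batch of adversarial examples of the same shape. In the signature below, $n$ is the batch size, $w$ the number of character positions per domain, and $d$ the embedding dimension.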

$$
    \mathrm{atk}: [0, 1]^{n \times w \times d} \times \{0, 1\}^n \rightarrow [0, 1]^{n \times w \times d}
$$
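
The $[0, 1]$ range in this signature matches the `bounds=(0, 1)` passed to foolbox above. How `ImageEmulationAdapter` rescales the embeddings is not shown in this notebook; the sketch below simply assumes min-max scaling with one global minimum and maximum, and uses nested lists instead of tensors so that it runs without dependencies. The helper names `scale_batch` and `unscale_batch` are hypothetical.

```python
def scale_batch(batch, lo, hi):
    """Map every embedding value from [lo, hi] to [0, 1] (assumed min-max scaling)."""
    span = hi - lo
    return [[[(v - lo) / span for v in vec] for vec in domain] for domain in batch]


def unscale_batch(batch, lo, hi):
    """Invert scale_batch: map values from [0, 1] back to [lo, hi]."""
    span = hi - lo
    return [[[v * span + lo for v in vec] for vec in domain] for domain in batch]


# Toy batch: n = 1 domain, w = 2 character positions, d = 3 embedding dimensions.
embedded = [[[-2.0, 0.0, 2.0], [1.0, -1.0, 0.5]]]
lo, hi = -2.0, 2.0  # assumed global bounds of the embedding values

scaled = scale_batch(embedded, lo, hi)      # what an attack function would receive
restored = unscale_batch(scaled, lo, hi)    # back in the model's embedding space
```

An attack function then only ever sees and returns values in $[0, 1]$, which is why the `eps` budgets in the cells that follow are expressed in the scaled space.
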
Below, we provide example attack functions for all attacks we used in our paper. **Please selectively execute only the cell for the attack you want to use.**

### Binary AutoAttack $L_2$
Modified version of AutoAttack with an $L_2$ norm bound. AutoAttack was introduced in the paper

Francesco Croce and Matthias Hein. “Reliable evaluation of adversarial robustness
with an ensemble of diverse parameter-free attacks”. In: Proceedings
of the 37th International Conference on Machine Learning. Ed. by Hal Daumé III
and Aarti Singh. Vol. 119. Proceedings of Machine Learning Research. PMLR,
2020, pp. 2206–2216. url: https://proceedings.mlr.press/v119/croce20b.html.

In [5]:
from robust_dga_detection.attacks.embedding_space import BinaryAutoAttack
eps_l2 = 50
attack_function = BinaryAutoAttack(model=image_model, eps=eps_l2, norm="L2")

### Binary AutoAttack $L_\infty$
Modified version of AutoAttack with an $L_\infty$ norm bound. AutoAttack was introduced in the paper

Francesco Croce and Matthias Hein. “Reliable evaluation of adversarial robustness
with an ensemble of diverse parameter-free attacks”. In: Proceedings
of the 37th International Conference on Machine Learning. Ed. by Hal Daumé III
and Aarti Singh. Vol. 119. Proceedings of Machine Learning Research. PMLR,
2020, pp. 2206–2216. url: https://proceedings.mlr.press/v119/croce20b.html.

In [None]:
from robust_dga_detection.attacks.embedding_space import BinaryAutoAttack
eps_linf = 0.5
attack_function = BinaryAutoAttack(model=image_model, eps=eps_linf, norm="Linf")

### Projected Gradient Descent $L_2$ attack
The PGD attack with an $L_2$ norm bound. PGD was introduced in the paper

Aleksander Madry et al. “Towards Deep Learning Models Resistant to Adversarial
Attacks”. In: 6th International Conference on Learning Representations,
ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings.
OpenReview.net, 2018. url: https://openreview.net/forum?id=rJzIBfZAb.

In [None]:
import torchattacks
eps_l2 = 50
attack_function = torchattacks.PGDL2(model=image_model, eps=eps_l2, steps=50, random_start=True)

### Projected Gradient Descent $L_\infty$ attack

The PGD attack with an $L_\infty$ norm bound. PGD was introduced in the paper

Aleksander Madry et al. “Towards Deep Learning Models Resistant to Adversarial
Attacks”. In: 6th International Conference on Learning Representations,
ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings.
OpenReview.net, 2018. url: https://openreview.net/forum?id=rJzIBfZAb.

In [None]:
import torchattacks
eps_linf = 0.5
attack_function = torchattacks.PGD(model=image_model, eps=eps_linf, steps=50, random_start=True)
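
To make the PGD update rule concrete, here is a minimal, dependency-free sketch of the $L_\infty$ variant on a toy linear classifier. It is not the `torchattacks` implementation: the model (`w`, `b`), the helper names, and the step size are assumptions chosen for illustration, and the random start used in the cell above is omitted. Each step moves along the sign of the loss gradient, then projects back into the $\epsilon$-ball around the original input and into the $[0, 1]$ bounds.

```python
import math


def sign(v):
    """Return -1, 0 or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)


def predict(x, w, b):
    """Toy classifier: sigmoid of a linear score."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def pgd_linf(x0, w, b, label, eps, step_size, steps):
    """Maximise the binary cross-entropy loss within an L_inf ball around x0."""
    x = list(x0)
    for _ in range(steps):
        # Gradient of the BCE loss w.r.t. the input: (sigmoid(z) - label) * w.
        residual = predict(x, w, b) - label
        grad = [residual * wi for wi in w]
        # Signed gradient ascent step.
        x = [xi + step_size * sign(gi) for xi, gi in zip(x, grad)]
        # Project back into the eps-ball around x0 ...
        x = [min(max(xi, oi - eps), oi + eps) for xi, oi in zip(x, x0)]
        # ... and into the valid input range [0, 1].
        x = [min(max(xi, 0.0), 1.0) for xi in x]
    return x


# Hypothetical toy setup: three scaled input values and a fixed linear model.
x0 = [0.8, 0.2, 0.5]
w, b = [4.0, -3.0, 2.0], -1.0
eps = 0.5

x_adv = pgd_linf(x0, w, b, label=1, eps=eps, step_size=0.1, steps=50)
print(f"before: {predict(x0, w, b):.3f}, after: {predict(x_adv, w, b):.3f}")
```

The $L_2$ variant differs only in the step direction (normalised gradient instead of its sign) and in the projection (rescaling the perturbation to norm at most $\epsilon$).
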

### Carlini & Wagner $L_2$ Attack

The C&W attack with an $L_2$ norm bound, introduced in the paper

Nicholas Carlini and David Wagner. “Towards Evaluating the Robustness
of Neural Networks”. In: 2017 IEEE Symposium on Security and Privacy (SP).
May 2017, pp. 39–57. doi: 10.1109/SP.2017.49.

In [None]:
import foolbox

cw_confidence = 0
cw = foolbox.attacks.L2CarliniWagnerAttack(steps=50, confidence=cw_confidence)

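def cw_attack_fun(inputs, labels):
    # Count an example as adversarial once the model's prediction differs from its label.
    criterion = foolbox.Misclassification(labels)
    # foolbox returns (raw, clipped, is_adversarial); keep the examples clipped to epsilons.
    _, adv_examples, _ = cw(
        foolbox_model,
        inputs,
        criterion,
        epsilons=128,
    )
    return adv_examples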

attack_function = cw_attack_fun

## Step 3: Set up a discretization scheme
A discretization scheme translates the generated adversarial embedding vectors back to adversarial domain names. In our paper, we develop six different discretization schemes with individual strengths and weaknesses.
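
As an illustration of the core idea only, the sketch below rounds each adversarial embedding vector to the character whose embedding is nearest under an $L_2$, $L_\infty$, or cosine distance. This is a simplified assumption about what rounding under a norm means, not the library's implementation: the toy embedding table, the function names, and the tie-breaking are hypothetical, and the handling of output length (brute force or cutoff) is not modelled at all.

```python
import math


def l2_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def linf_distance(a, b):
    return max(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # Treat a zero vector as maximally dissimilar to avoid dividing by zero.
    return 1.0 if norms == 0.0 else 1.0 - dot / norms


DISTANCES = {"l2": l2_distance, "linf": linf_distance, "cos": cosine_distance}


def round_to_characters(vectors, embedding_table, norm):
    """Replace every vector by the character with the closest embedding."""
    dist = DISTANCES[norm]
    return "".join(
        min(embedding_table, key=lambda ch: dist(vec, embedding_table[ch]))
        for vec in vectors
    )


# Hypothetical 2-dimensional embedding table for three characters.
TABLE = {"a": [1.0, 0.0], "b": [0.0, 1.0], "-": [3.0, 3.0]}

for norm in DISTANCES:
    print(norm, round_to_characters([[0.9, 0.2], [1.6, 1.4]], TABLE, norm))
```

In this toy example the second vector is rounded to `a` under the $L_2$ and $L_\infty$ distances but to `-` under the cosine distance, because cosine distance ignores vector length. Differences of this kind are one reason the choice of norm can change the recovered domain name.
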

In [6]:
from robust_dga_detection.attacks.discretization import (
    E2lDDiscretizerWithLengthBruteForce,
    E2lDDiscretizerWithLengthCutoff,
    RoundingNorm,
)

In [7]:
discretization_schemes = {
    "len_bf_l2_min_7": E2lDDiscretizerWithLengthBruteForce(
        model, norm=RoundingNorm.L_2, minimum_output_length=7
    ),
    "len_bf_linf_min_7": E2lDDiscretizerWithLengthBruteForce(
        model, norm=RoundingNorm.L_INF, minimum_output_length=7
    ),
    "len_bf_cos_min_7": E2lDDiscretizerWithLengthBruteForce(
        model, norm=RoundingNorm.COS, minimum_output_length=7
    ),
    "len_cutoff_l2_min_7": E2lDDiscretizerWithLengthCutoff(
        model, norm=RoundingNorm.L_2, minimum_output_length=7
    ),
    "len_cutoff_linf_min_7": E2lDDiscretizerWithLengthCutoff(
        model, norm=RoundingNorm.L_INF, minimum_output_length=7
    ),
    "len_cutoff_cos_min_7": E2lDDiscretizerWithLengthCutoff(
        model, norm=RoundingNorm.COS, minimum_output_length=7
    ),
}

## Step 4: Putting it all together to generate adversarial domain names

In [8]:
input_domain = "lmlabssssssssentasdasdasdasdasdasdasd"

Measure the model's prediction on the input domain.

In [10]:
encoded_domains = torch.unsqueeze(domains.encode_domain(input_domain), dim=0).to(DEVICE)
labels = torch.tensor([1], dtype=torch.long, device=DEVICE)
print(f"Baseline model value: {torch.sigmoid(model(encoded_domains)).item()}")

Baseline model value: 0.9897634983062744


Apply the embedding-space attack to obtain an embedding vector that results in a strong negative prediction.

In [11]:
with torch.no_grad():
    embedded_domains = model.embedding(encoded_domains)

adversarial_embedding_vectors = image_model.apply_attack(attack_function, embedded_domains, labels)
print(f"Model value on adversarial embedding vector: {torch.sigmoid(model.net(adversarial_embedding_vectors)).item()}")

Model value on adversarial embedding vector: 0.0


Use the discretization schemes to recover domain names that (hopefully) retain the strong negative prediction.

In [12]:
for disc_name, discretization_scheme in discretization_schemes.items():
    discrete_adversarial_examples = discretization_scheme(encoded_domains, adversarial_embedding_vectors)
    decoded_domain = domains.decode_domains(discrete_adversarial_examples)[0]
    model_value = torch.sigmoid(model(discrete_adversarial_examples)).item()
    print(f"{disc_name} generated the domain:\n\t'{decoded_domain}'\n\twith model value {model_value}\n")

len_bf_l2_min_7 generated the domain:
	'080gvxjsiulisugqqqvxqrmlmlae-ssai--------89s----------app0-0'
	with model value 0.0

len_bf_linf_min_7 generated the domain:
	'ckdkvvngpuourojujjavmrmlecassssuaeea8ykgkw9mvaak8vkdsea0vea2'
	with model value 4.922889318415002e-10

len_bf_cos_min_7 generated the domain:
	'1--gvxjmxulisugqqjvvkrmlmla--ssai--------89s----------a-p0-0'
	with model value 0.0

len_cutoff_l2_min_7 generated the domain:
	'iapp0-0'
	with model value 4.6235862782850745e-07

len_cutoff_linf_min_7 generated the domain:
	'ea0vea2'
	with model value 0.12092402577400208

len_cutoff_cos_min_7 generated the domain:
	'0a-p0-0'
	with model value 2.1017700913006365e-09
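
To double-check that generated names are usable as DNS labels, a quick syntax check can be run on the decoded strings. The sketch below assumes that "valid" means the usual letters-digits-hyphen rule (1 to 63 characters, no leading or trailing hyphen, leading digit allowed as in RFC 1123); the library's own notion of validity may be stricter or different, and the function name `is_valid_ldh_label` is hypothetical.

```python
import re

# One label: starts and ends with a letter or digit, hyphens only in between,
# at most 63 characters in total.
_LDH_LABEL = re.compile(r"[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?")


def is_valid_ldh_label(label: str) -> bool:
    """Return True if `label` satisfies the assumed letters-digits-hyphen rule."""
    return _LDH_LABEL.fullmatch(label.lower()) is not None


for candidate in ["iapp0-0", "0a-p0-0", "-app0", "app0-", "a_b"]:
    print(f"{candidate!r}: {is_valid_ldh_label(candidate)}")
```

The check is purely syntactic: it says nothing about whether a name is registered or how the classifier scores it.
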



We hope this notebook provides the details you need to test your own models against our attacks.
If you have any questions, do not hesitate to reach out.