# Generating Adversarial Domain Names with HotFlip
This notebook contains example code for generating fully valid adversarial effective second-level domains (e2LDs) using our modified version of the HotFlip adversarial attack.

The original HotFlip attack was introduced in the paper:

Javid Ebrahimi et al. “HotFlip: White-Box Adversarial Examples for Text
Classification”. In: Proceedings of the 56th Annual Meeting of the Association
for Computational Linguistics (Volume 2: Short Papers). Melbourne, Australia:
Association for Computational Linguistics, July 2018, pp. 31–36. doi: 10.18653/v1/P18-2006. url: https://aclanthology.org/P18-2006.

Note that we did not implement every idea proposed in the original HotFlip paper. For example, we did not implement encoding insertion and deletion as a single operation.

The concepts are explained step by step, allowing you to apply these attacks to your own models!

In [1]:
import torch
import os
from robust_dga_detection.models import CNNResNetWithEmbedding
from robust_dga_detection.utils import domains, reproduceability

DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
reproduceability.setup_deterministic_environment()

## Step 1: Load a model to attack

In [2]:
model = CNNResNetWithEmbedding().to(DEVICE)

In [3]:
model.load_state_dict(
    # map_location lets a checkpoint saved on a GPU load on a CPU-only machine.
    torch.load(os.getenv("TRAINED_MODEL_PATH"), map_location=DEVICE)
)

<All keys matched successfully>

## Step 2: Set up the HotFlip Attack

Our implementation of the HotFlip attack internally uses a one-hot representation of domain names, which makes gradient computation more convenient.
The `OneHotModelInputWrapper` wraps the model so that it accepts such one-hot inputs.

In [4]:
from robust_dga_detection.attacks.nlp import OneHotModelInputWrapper, HotFlip

onehot_model = OneHotModelInputWrapper(model)
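
To illustrate why a one-hot representation is convenient: multiplying a one-hot row by an embedding matrix selects the same vector as an index lookup, so the loss can be differentiated with respect to every (position, character) entry. HotFlip then estimates the loss change of replacing character `a` at position `i` with `b` as `grad[i][b] - grad[i][a]`. The plain-Python sketch below shows both facts on a toy alphabet with a hand-written gradient. `ALPHABET`, `EMBEDDING`, `one_hot`, and `best_flip` are illustrative names chosen here and are not part of `robust_dga_detection`.

```python
# Toy alphabet (assumption); the real encoding lives in robust_dga_detection.utils.domains.
ALPHABET = "abc-"

# Toy embedding matrix: one 2-dimensional vector per alphabet symbol.
EMBEDDING = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]


def one_hot(text):
    """Encode text as a list of one-hot rows over ALPHABET."""
    return [[1.0 if ch == sym else 0.0 for sym in ALPHABET] for ch in text]


def matmul(rows, matrix):
    """Multiply a list of row vectors by a matrix (lists of lists)."""
    return [[sum(r * m for r, m in zip(row, col)) for col in zip(*matrix)]
            for row in rows]


def embed_by_lookup(text):
    """Embed text the usual way, by indexing into the embedding matrix."""
    return [EMBEDDING[ALPHABET.index(ch)] for ch in text]


def best_flip(text, grad):
    """Return (gain, position, new_char) with the largest first-order gain.

    grad[i][j] is the derivative of the loss with respect to one_hot[i][j].
    """
    best = None
    for i, ch in enumerate(text):
        current = ALPHABET.index(ch)
        for j, sym in enumerate(ALPHABET):
            if j == current:
                continue
            gain = grad[i][j] - grad[i][current]
            if best is None or gain > best[0]:
                best = (gain, i, sym)
    return best


# One-hot rows times the embedding matrix equal a plain index lookup.
print(matmul(one_hot("ab"), EMBEDDING) == embed_by_lookup("ab"))

# Hand-written gradient for the two positions of "ab" (illustrative values).
toy_grad = [[0.0, 0.5, -0.2, 0.1], [0.3, 0.0, 0.9, -0.4]]
print(best_flip("ab", toy_grad))
```

In a real attack the gradient comes from backpropagation through the wrapped model rather than from hand-written numbers.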

In [5]:
beam_width = 10
n_flips = 5

attack = HotFlip(onehot_model, beam_width, n_flips)
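
The parameter names suggest a beam search: in each of `n_flips` rounds, every candidate in the beam is extended by single-character flips, and only the `beam_width` best candidates are kept. The sketch below shows that generic search pattern in plain Python. It is an assumption about the overall shape of the search, not the internals of the `HotFlip` class; `toy_score` is a stand-in for the attack objective (the actual attack ranks flips using gradients), and all function names are hypothetical.

```python
def single_flips(text, alphabet):
    """Yield every string that differs from text in exactly one position."""
    for i, ch in enumerate(text):
        for sym in alphabet:
            if sym != ch:
                yield text[:i] + sym + text[i + 1:]


def beam_search_flips(start, score, alphabet, beam_width, n_flips):
    """Run n_flips rounds, keeping the beam_width best candidates each round."""
    beam = [start]
    for _ in range(n_flips):
        candidates = {c for text in beam for c in single_flips(text, alphabet)}
        # Highest score first; ties are broken alphabetically for determinism.
        beam = sorted(candidates, key=lambda c: (-score(c), c))[:beam_width]
    return beam[0]


def toy_score(text):
    """Toy objective: count hyphens that are not at the first or last position."""
    return text[1:-1].count("-")


result = beam_search_flips("asdasd", toy_score, "asd-", beam_width=3, n_flips=2)
print(result)
```

Because a later round may flip a position that was already changed, the result differs from the start string in at most `n_flips` positions. A larger `beam_width` explores more alternatives per round at the cost of more model evaluations.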

## Step 3: Generate adversarial domain names

In [6]:
input_domain = "lmlabssssssssentasdasdasdasdasdasdasd"

Measure the model's prediction on the input domain.

In [8]:
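# Encode the domain and add a batch dimension of size 1.
encoded_domains = domains.encode_domain(input_domain).unsqueeze(0).to(DEVICE)
# Label passed to the attack for this domain.
labels = torch.tensor([1], dtype=torch.long, device=DEVICE)
print(f"Baseline model value: {torch.sigmoid(model(encoded_domains)).item()}")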

Baseline model value: 0.9897634983062744


Use HotFlip to generate an adversarial domain and measure how well the generated adversarial example (AE) evades the model.

In [10]:
adversarial_encoded_domains = attack([input_domain], encoded_domains, labels)
generated_domain = domains.decode_domains(adversarial_encoded_domains)[0]

print(f"Generated adversarial domain {generated_domain}")
print(f"Model-Value on adversarial domain {torch.sigmoid(model(adversarial_encoded_domains)).item()}")

Generated adversarial domain lmlabssssssssentasdasda--a--a-dasdasd
Model-Value on adversarial domain 1.8483806152325144e-10
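
A quick sanity check on this result can confirm three things: the score dropped below the decision threshold, the generated string still satisfies hostname label syntax, and the number of changed characters stays within `n_flips`. The sketch below assumes a threshold of 0.5 and the common letters-digits-hyphen label rule (1 to 63 characters, no leading or trailing hyphen). Neither assumption is stated in this notebook, and `is_valid_label` and `changed_positions` are illustrative helpers rather than library functions.

```python
import re

# Assumed label syntax: letters, digits, hyphens; 1-63 characters;
# no hyphen at the start or the end.
LABEL_RE = re.compile(r"[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?")

THRESHOLD = 0.5  # assumed decision threshold on the sigmoid output


def is_valid_label(label):
    """Check a single domain label against the assumed syntax rule."""
    return LABEL_RE.fullmatch(label.lower()) is not None


def changed_positions(original, adversarial):
    """Return the indices at which the two strings differ."""
    return [i for i, (a, b) in enumerate(zip(original, adversarial)) if a != b]


# Values copied from the cells above.
original = "lmlabssssssssentasdasdasdasdasdasdasd"
adversarial = "lmlabssssssssentasdasda--a--a-dasdasd"
adversarial_score = 1.8483806152325144e-10

print("valid label:", is_valid_label(adversarial))
print("changed characters:", len(changed_positions(original, adversarial)))
print("below threshold:", adversarial_score < THRESHOLD)
```

For the domains above, five characters differ, which matches `n_flips = 5`.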


We hope this notebook gives you the details you need to test your own models against our attacks.
Do not hesitate to reach out if you have any questions.