
Internals

Hyrum Anderson edited this page May 3, 2021 · 11 revisions

Understanding how Counterfit interacts with a target and a backend framework is crucial to running successful, repeatable assessments. This section covers the objects, properties, and data flows you need to know.

Targets

A target is a user-created class that is the interface between a target model and the attacks included in a framework. Microsoft Counterfit implements the user interface this way to handle the wide variety of ways a user could interact with a target model. There is no universal interface to a machine learning endpoint: Counterfit can interact with a model on disk or hosted behind an API. To provide a clean interface, there are some strict requirements imposed on a target class. The requirements can seem restricting at first, but once you understand how Counterfit interacts with backend frameworks through this interface, attacking new targets will become a breeze.

Creating a Target

There are two ways to create a new target: execute the new command and follow the instructions, or hack together a new target from an existing one. On start, targets are loaded from the targets folder defined in counterfit/core/config.py, which by default is counterfit/targets. Each target module is contained in its own folder, which also stores any resources the target needs, such as data, models, and other Python files.

A target class should inherit from the chosen framework's baseclass. This baseclass acts as an interface that ensures all the information and methods required to build, manage, and run attacks for the framework are present. Inside the target class, a user needs to set a few meta-properties and implement two methods. These properties and methods are the same for all targets in Counterfit.
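Because each target lives in its own folder and inherits from a framework baseclass, a loader can find targets by importing every module in those folders and keeping the subclasses. The sketch below illustrates that idea with the standard library only. It is not Counterfit's actual loader; discover_targets, its arguments, and the module naming scheme are hypothetical.

```python
import importlib.util
import inspect
from pathlib import Path


def discover_targets(targets_dir, base_class):
    """Hypothetical loader: import each .py file inside every target folder
    and keep the classes that subclass base_class, keyed by model_name."""
    found = {}
    for folder in sorted(Path(targets_dir).iterdir()):
        if not folder.is_dir():
            continue  # each target lives in its own folder
        for py_file in sorted(folder.glob("*.py")):
            spec = importlib.util.spec_from_file_location(
                f"{folder.name}.{py_file.stem}", str(py_file))
            module = importlib.util.module_from_spec(spec)
            # Executing the module runs its top-level imports, which is why
            # heavy imports in a target file slow down startup.
            spec.loader.exec_module(module)
            for _, obj in inspect.getmembers(module, inspect.isclass):
                # Keep real subclasses only, not the baseclass itself.
                if issubclass(obj, base_class) and obj is not base_class:
                    found[getattr(obj, "model_name", obj.__name__)] = obj
    return found
```

Helper classes defined in a target file are ignored, since only subclasses of the baseclass are kept.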

Required Target Information

The required properties for a target are:

Property Description
model_name Should be unique among all targets. This is used to uniquely identify a target model within Counterfit, for example in the output of list targets.
model_data_type The type of data the target model uses. This is used to attach the relevant attacks to a given target. Models for which model_data_type is text will be compatible with attacks from the TextAttack framework. Adversarial Robustness Toolbox works with numpy and image data types.
model_endpoint API route or model file location where Counterfit will collect outputs. This may be used in the __init__ function to load a model file (when referring to a filename), or the __call__ function during an attack to interact with an API (when referring to an API route).
model_input_shape The shape of the input to a target model. Backend frameworks use this to understand the shape of the sample. The __call__ function expects a batch of inputs of this shape, e.g., an input of size (batch_size, ) + model_input_shape, where batch_size is typically just 1.
model_output_classes A list of all output labels returned by the __call__ function. This is used by Counterfit's outputs_to_labels function to convert numerical outputs to labels that you define. It helps Counterfit determine whether an attack has been successful.
X The sample data, which is of shape (N, ) + model_input_shape, where N is the number of samples you included in the target definition.
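The shape relationships in the table can be checked with NumPy. The sketch below borrows model_input_shape from the CatClassifier example later on this page; N = 4 and the all-zero samples are placeholders, not real data. Selecting one sample by index while keeping the leading batch axis gives the batch of one that an attack typically passes to __call__.

```python
import numpy as np

# model_input_shape as in the CatClassifier example on this page.
model_input_shape = (3, 256, 256)
N = 4  # placeholder number of samples

# X holds every sample: shape (N,) + model_input_shape.
X = np.zeros((N,) + model_input_shape, dtype=np.float32)

# A batch of one, selected by index, keeps the leading batch axis:
# shape (batch_size,) + model_input_shape with batch_size = 1.
sample_index = 2
batch = X[sample_index:sample_index + 1]

print(X.shape)      # (4, 3, 256, 256)
print(batch.shape)  # (1, 3, 256, 256)
```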

The required methods for a target are:

Method Description
__init__(self) This function should load models and load and process input data.
__call__(self, x) This function is the primary interface between a target model and an attack algorithm. An attack algorithm uses this function to submit inputs via x and collect the output via the return value. This function must return a list of probabilities for each input sample. That is, for each row in x there should be an output row of the form [prob_class_0, prob_class_1, ..., prob_class_(K-1)], where K is the number of entries in model_output_classes.
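While developing a target, it can help to check what __call__ returns before running an attack. The helper below is a hypothetical sketch, not part of Counterfit: it verifies one row per input sample, one probability per class, and that each row sums to 1 (an assumption that the outputs are normalized probabilities). A flat list is treated as the single row of a batch of one.

```python
import numpy as np


def check_call_output(probs, batch_size, model_output_classes, atol=1e-6):
    """Hypothetical sanity check for a __call__ return value: one row per
    input sample, one probability per class, and each row summing to 1."""
    probs = np.asarray(probs, dtype=float)
    if probs.ndim == 1:
        # A flat list is treated as the single row of a batch of one.
        probs = probs[np.newaxis, :]
    expected = (batch_size, len(model_output_classes))
    if probs.shape != expected:
        raise ValueError(f"expected shape {expected}, got {probs.shape}")
    if not np.allclose(probs.sum(axis=1), 1.0, atol=atol):
        raise ValueError("each row of probabilities should sum to 1")
    return probs


classes = ["cat", "not_a_cat"]
print(check_call_output([0.8, 0.2], 1, classes).shape)  # (1, 2)
```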

Sample Target Class

At the top of the file are standard module imports and the baseclass of the selected Counterfit framework. Targets will be loaded only if they inherit from a correct baseclass. There are no limits to what you can import; however, targets are loaded on start, so importing heavy ML libraries could slow down startup.

import requests
import numpy as np
from counterfit.core.interfaces import ArtTarget

Next, the target class is created, and the required properties are defined.

class CatClassifier(ArtTarget):
    model_name = 'catclassifier'
    model_data_type = 'image'
    model_endpoint = "http://contoso.ai/predict"
    model_input_shape = (3, 256, 256)
    model_output_classes = ["cat", "not_a_cat"]
    X = []
Property Description
model_name A unique name. Counterfit references each target by name. All logs and results will be stored in this class.
model_data_type The cat classifier target, hosted at http://contoso.ai/predict, requires pictures of cats, so model_data_type indicates that the input data is an image. Counterfit uses this field to process and reshape data correctly.
model_endpoint This is where Counterfit will collect outputs from the target. This is used in the __call__ function during an attack.
model_input_shape This is the shape of our sample data. It is important to note that this is not necessarily the shape of the target model input, but rather the shape of the sample data.
model_output_classes The possible classes for the samples: the image is either a cat or not a cat.
X The sample data. This is populated in the __init__ function.

Next, the __init__ function should load the required resources. This function is not called until you interact with a target. Sample data should be loaded into self.X as a list of lists, arrays, or vectors. Sample selection for both targeted and untargeted attacks is done by referencing an index of this list. As noted earlier, the shape of each input sample must match model_input_shape. There are no limits to what you can do inside __init__: load models, process samples, execute functions written elsewhere, and so on.

def __init__(self):
    # x1, x2, x3, ... are placeholders for samples of shape model_input_shape
    self.X = [[x1], [x2], [x3], ...]
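When model_endpoint refers to a file on disk rather than an API route, __init__ is also a natural place to load the model. The sketch below is hypothetical: the pickle file is assumed to hold logistic-regression weights as {"w": [...], "b": float}, the sample data is random placeholder data, and the class is left standalone so it runs without Counterfit (in practice it would inherit from a framework baseclass such as ArtTarget).

```python
import pickle

import numpy as np


class LocalModelTarget:
    """Hypothetical target whose model_endpoint points at a file on disk."""
    model_name = "local_example"
    model_data_type = "numpy"
    model_endpoint = "model.pkl"  # hypothetical path to pickled weights
    model_input_shape = (4,)
    model_output_classes = [0, 1]
    X = []

    def __init__(self):
        # Load the model once, when the target is first used.
        with open(self.model_endpoint, "rb") as f:
            params = pickle.load(f)  # assumed format: {"w": [...], "b": float}
        self.w = np.asarray(params["w"], dtype=float)
        self.b = float(params["b"])
        # Placeholder samples: N = 5 rows, each of model_input_shape.
        self.X = np.random.default_rng(0).random((5,) + self.model_input_shape)
        # Fail fast if the samples do not match the declared shape.
        if self.X.shape[1:] != tuple(self.model_input_shape):
            raise ValueError("samples do not match model_input_shape")

    def __call__(self, x):
        # Logistic regression: one [p(class 0), p(class 1)] row per sample.
        p1 = 1.0 / (1.0 + np.exp(-(np.asarray(x) @ self.w + self.b)))
        return np.stack([1.0 - p1, p1], axis=1).tolist()
```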

Finally, the __call__ function. This function is used by an attack algorithm to send a query to the model_endpoint. x is the input the algorithm has provided, of shape (1,) + model_input_shape, or (1, 3, 256, 256) in this example. Conventionally, ML frameworks use "batches" of inputs, and it is best practice for the __call__ function to handle an entire batch, e.g., by sending each sample in the batch to an API that may handle only a single query at a time. However, since most attacks in Counterfit do not require a batch size greater than one, in this example we use x[0] to reference the first sample in the batch.

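def __call__(self, x):
    # x has shape (1,) + model_input_shape; send the first (only) sample
    sample = x[0].tolist()
    # The JSON payload format depends on the target API
    response = requests.post(self.model_endpoint, json={"input": sample})
    response.raise_for_status()
    results = response.json()

    # The API returns the probability of "cat"; derive the other class
    cat_proba = results["confidence"]
    not_a_cat_proba = 1 - cat_proba

    # Same length and order as model_output_classes
    return [cat_proba, not_a_cat_proba]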
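To follow the best practice of handling an entire batch, __call__ can loop over the leading batch axis and query a single-sample API once per sample. In the hypothetical sketch below, _query_api stands in for the requests.post call and fakes the "cat" confidence from the mean pixel value so that the example runs offline.

```python
import numpy as np


class BatchedCatClassifier:
    """Hypothetical variant of CatClassifier whose __call__ handles a whole
    batch by querying a single-sample API once per sample."""
    model_output_classes = ["cat", "not_a_cat"]

    def _query_api(self, sample):
        # Stand-in for one HTTP request to model_endpoint; the "cat"
        # confidence is faked from the mean pixel value.
        return float(np.clip(np.mean(sample), 0.0, 1.0))

    def __call__(self, x):
        rows = []
        for sample in x:  # iterate over the leading batch axis
            cat_proba = self._query_api(sample)
            rows.append([cat_proba, 1.0 - cat_proba])
        return rows  # one row per input sample
```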

The __call__ function MUST return a list of probabilities that is the same length and in the same order as model_output_classes. Backend frameworks use both at attack runtime, and if they are not the same length, an error will be thrown. If the ordering of the returned probabilities is incorrect, the attack will misread the model's outputs and alter the input incorrectly. There are no limits to what you can do inside __call__: this includes reshaping arrays to images, executing webhooks, or additional logging. Learn more about the flexibility of Counterfit targets in [Advanced Use].
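If a target API returns one score per label, for example {"not_a_cat": 0.3, "cat": 0.7}, building the list by iterating over model_output_classes guarantees the required ordering regardless of how the response is ordered. The helper name and response format below are hypothetical; the CatClassifier API in this page returns a single "confidence" value instead.

```python
def to_ordered_probs(response, model_output_classes):
    """Hypothetical helper: return the probabilities in exactly the order
    of model_output_classes, failing loudly if a label is missing."""
    missing = [label for label in model_output_classes if label not in response]
    if missing:
        raise KeyError(f"response lacks labels: {missing}")
    return [float(response[label]) for label in model_output_classes]


print(to_ordered_probs({"not_a_cat": 0.3, "cat": 0.7}, ["cat", "not_a_cat"]))
# [0.7, 0.3]
```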

Note: Pay attention to the channels of an image. By default, backend framework wrappers and Counterfit are configured to use channels-first rather than channels-last ordering, that is, (3, 256, 256) rather than (256, 256, 3). This can be overridden by adding self.channels_first=False to the target class.
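If the model behind the endpoint expects channels-last images while Counterfit passes channels-first batches, __call__ can transpose each sample before sending it. The sketch assumes a hypothetical API that takes a single (H, W, C) image; the zero-filled array is a placeholder.

```python
import numpy as np

# What the attack passes to __call__ with channels first: (1, C, H, W).
x = np.zeros((1, 3, 256, 256), dtype=np.float32)

# Hypothetical API that wants a single channels-last image: (H, W, C).
image_hwc = np.transpose(x[0], (1, 2, 0))

# The reverse, e.g. when preparing self.X from channels-last image files.
image_chw = np.transpose(image_hwc, (2, 0, 1))

print(image_hwc.shape)  # (256, 256, 3)
print(image_chw.shape)  # (3, 256, 256)
```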

The Final Target Class

import requests
import numpy as np
from counterfit.core.interfaces import ArtTarget

class CatClassifier(ArtTarget):
    model_name = 'catclassifier'
    model_data_type = 'image'
    model_endpoint = "http://contoso.ai/predict"
    model_input_shape = (3, 256, 256)
    model_output_classes = ["cat", "not_a_cat"]
    X = []

    def __init__(self):
        self.X = [[x1], [x2], [x3], ...]

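    def __call__(self, x):
        # x has shape (1,) + model_input_shape; send the first (only) sample
        sample = x[0].tolist()
        # The JSON payload format depends on the target API
        response = requests.post(self.model_endpoint, json={"input": sample})
        response.raise_for_status()
        results = response.json()

        # The API returns the probability of "cat"; derive the other class
        cat_proba = results["confidence"]
        not_a_cat_proba = 1 - cat_proba

        # Same length and order as model_output_classes
        return [cat_proba, not_a_cat_proba]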

Frameworks

Counterfit uses existing adversarial ML frameworks for attack algorithms. Some of them are heavy to load, and loading them all on start would make for a slow experience. Instead, Counterfit loads a framework when requested with the load command. Each framework has its own baseclass that handles the information coming from a target class. You will notice that not every attack from each framework is available: some are missing because they are whitebox attacks; others are missing due to incompatibility with Counterfit.
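The on-demand loading described above follows a common lazy-import pattern: import a module the first time it is requested, then reuse the cached module. FrameworkLoader below is a hypothetical illustration, not Counterfit's implementation; stdlib modules stand in for heavy packages so the sketch runs anywhere.

```python
import importlib


class FrameworkLoader:
    """Hypothetical registry that imports a framework module only when it is
    first requested, then reuses the cached module."""

    def __init__(self, module_names):
        # framework name -> importable module name
        self._module_names = dict(module_names)
        self._loaded = {}

    def load(self, name):
        if name not in self._loaded:
            # The import cost is paid here, on demand, not at startup.
            self._loaded[name] = importlib.import_module(self._module_names[name])
        return self._loaded[name]

    def is_loaded(self, name):
        return name in self._loaded


# Stdlib modules stand in for heavy packages such as art or textattack.
loader = FrameworkLoader({"art": "json", "textattack": "csv"})
```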

TextAttack

Counterfit includes a number of blackbox attacks against text models from the TextAttack framework. These attacks have no parameters and the user need only set a target_sample when running an attack. TextAttack requires that self.X be a list of sentences to be used as input to a model.

When implementing a target for TextAttack, note that TextAttack currently expects model_output_classes to be a list of ordered integers beginning at 0 (e.g., [0, 1, 2]) rather than a list of labels (e.g., ['cat', 'dog', 'horse']). This is because it uses the class label as an index.
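A text target can keep integer classes for TextAttack while still storing readable names elsewhere. The sketch below is hypothetical: label_names is not a Counterfit property, the keyword scorer is a toy stand-in for a real model, and other required properties such as model_endpoint and model_input_shape are omitted. The important part is that position i of each output row holds the probability for class i.

```python
class SentimentTarget:
    """Hypothetical text target shaped for TextAttack (other required
    properties omitted for brevity)."""
    model_name = "sentiment_example"
    model_data_type = "text"
    model_output_classes = [0, 1]           # ordered integers from 0
    label_names = ["negative", "positive"]  # readable names, kept separately
    X = []

    POSITIVE_WORDS = {"good", "great", "love"}

    def __init__(self):
        # self.X is a list of sentences, as TextAttack requires.
        self.X = ["I love this movie", "The plot was dull"]

    def __call__(self, x):
        # Toy scorer: each positive word raises p(positive).
        rows = []
        for sentence in x:
            hits = sum(w in self.POSITIVE_WORDS for w in sentence.lower().split())
            p_pos = min(0.9, 0.2 + 0.6 * hits)
            rows.append([1.0 - p_pos, p_pos])  # position i holds class i
        return rows
```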

Adversarial Robustness Toolbox

Counterfit includes a number of blackbox evasion attacks suitable for targets of the 'numpy' or 'image' data type using the Adversarial Robustness Toolbox (ART). ART expects self.X to be a list of lists: each row in the list corresponds to an input sample, and each input sample is a list of numbers or an image. Thus, self.X is typically an array of dimensions (N, dim) for numpy, (N, channels, height, width) for image with channels_first=True (the default), or (N, height, width, channels) for image with channels_first=False.

ART attacks have parameters that may be set to adjust how the algorithm interacts with a target model. Detailing the parameters for each algorithm is out of the scope of this document. Users may use show info to learn more about an attack algorithm and its parameters.

creditfraud>hop_skip_jump> show info --attack hop_skip_jump

Attack Information
-----------------------------------------------------------------------------------------------------------
              attack name  hop_skip_jump
              attack type  evasion
          attack category  blackbox
              attack tags  ['image', 'numpy']
         attack framework  art
              attack docs   Implementation of the HopSkipJump attack from Jianbo et al. (2019). This is a
                           powerful black-box attack that   only requires final class prediction, and is an
                           advanced version of the boundary attack. | Paper link:
                           https://arxiv.org/abs/1904.02144

Attack Parameter (type)      Default
---------------------------------------
          targeted (bool)  False
               norm (int)  2
           max_iter (int)  50
           max_eval (int)  10000
          init_eval (int)  100
          init_size (int)  100
       sample_index (int)  0
       target_class (int)  0

In this case, the attack docs point to the academic paper for more information about hop_skip_jump.

Note that sample_index and target_class are properties of all attacks. In particular, target_class is only used by algorithms that support targeted attacks, and only when targeted is set to True.
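The interplay between defaults, user settings, and targeted can be illustrated with a small dictionary merge. The defaults below are copied from the hop_skip_jump show info output above; resolve_params is a hypothetical helper, not a Counterfit function, and dropping target_class for untargeted runs is one way of expressing that the parameter is ignored in that case.

```python
# hop_skip_jump defaults, as printed by show info above.
DEFAULTS = {
    "targeted": False, "norm": 2, "max_iter": 50, "max_eval": 10000,
    "init_eval": 100, "init_size": 100, "sample_index": 0, "target_class": 0,
}


def resolve_params(defaults, overrides):
    """Hypothetical helper: apply user overrides to the defaults, reject
    unknown names, and drop target_class unless the attack is targeted."""
    unknown = set(overrides) - set(defaults)
    if unknown:
        raise KeyError(f"unknown parameters: {sorted(unknown)}")
    params = {**defaults, **overrides}  # new dict; defaults stay untouched
    if not params["targeted"]:
        params.pop("target_class", None)  # only meaningful when targeted
    return params


print(resolve_params(DEFAULTS, {"targeted": True, "target_class": 1})["target_class"])
# 1
```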

Commands

Commands in Counterfit provide the functionality that allows objects to interact. The commands are structured to provide a workflow similar to other offensive security tools, where you typically interact with one target at a time and execute actions against that target. That said, thanks to cmd2, you can also script actions against multiple targets: to drop into a scripting environment, run ipy from the terminal.

Counterfit maintains a state that tracks all objects available in the session. A command can access these objects by importing CFState from counterfit.core.state and querying the state via CFState.get_instance(). Commands use cmd2 for command categorization and argparse for argument handling. For example, here is the interact command:

import argparse
import cmd2

from counterfit.core.state import CFState

parser = argparse.ArgumentParser()
parser.add_argument("target", choices=CFState.get_instance().loaded_targets.keys())

@cmd2.with_argparser(parser)
@cmd2.with_category("Counterfit Commands")
def do_interact(self, args):
    """Sets the active target."""

    CFState.get_instance().set_active_target(args.target)
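CFState.get_instance() returns one shared state object, so every command sees the same loaded targets and active target. The class below is a hypothetical, simplified sketch of that singleton pattern; it is not CFState's actual implementation, and only the names loaded_targets, active_target, and set_active_target are taken from the snippets on this page.

```python
class State:
    """Hypothetical, simplified session state in the style of CFState:
    a single shared instance holding loaded targets and the active one."""
    _instance = None

    def __init__(self):
        self.loaded_targets = {}  # target name -> target object
        self.active_target = None

    @classmethod
    def get_instance(cls):
        # Create the shared instance on first use, then always return it.
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance

    def set_active_target(self, name):
        if name not in self.loaded_targets:
            raise KeyError(f"unknown target: {name}")
        self.active_target = self.loaded_targets[name]
```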

Adding a New Command

Adding a new command is simple. Create a new file in the counterfit/core/commands/ folder and set up the command structure:

import argparse
import cmd2

from counterfit.core.state import CFState

parser = argparse.ArgumentParser()
parser.add_argument(…)

You could change the category or keep it the same. Changing the category will cause the command to display separately from Counterfit commands. Next, write the function and use the objects to provide information or change the state.

@cmd2.with_argparser(parser)
@cmd2.with_category("Custom Commands")
def do_thing(self, args):
    """Do things with active target."""

    active_target = CFState.get_instance().active_target
    print(active_target.model_name)
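The do_ prefix is what turns a method into a command: cmd2.Cmd builds on the standard library's cmd.Cmd, and both dispatch a typed line such as "interact catclassifier" to do_interact. The sketch below uses cmd.Cmd as a stdlib stand-in so it runs without cmd2; MiniShell and its targets argument are hypothetical, and the cmd2 decorators shown above are replaced by parsing the argument string with argparse by hand.

```python
import argparse
import cmd
import shlex


class MiniShell(cmd.Cmd):
    """Stdlib stand-in for a cmd2 application with two commands."""
    prompt = "counterfit> "

    def __init__(self, targets):
        super().__init__()
        self.targets = targets  # hypothetical: target name -> target object
        self.active = None

    def do_interact(self, line):
        """Sets the active target."""
        parser = argparse.ArgumentParser(prog="interact")
        parser.add_argument("target", choices=list(self.targets))
        try:
            args = parser.parse_args(shlex.split(line))
        except SystemExit:  # argparse exits after printing a usage error
            return
        self.active = args.target

    def do_thing(self, line):
        """Do things with the active target."""
        print(self.active if self.active is not None else "no active target")
```

Calling MiniShell({...}).cmdloop() starts an interactive prompt; onecmd("interact catclassifier") runs a single line programmatically.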

Quality of Life Commands

While attacking targets is fun, an attack comes after the target has been written by the user. Because this is something of a development process, there are some convenience commands that will make life a little easier when writing new targets.

Command Description
new This command will create a new target in the targets folder, and then load it into the session.
reload When editing a target, this command will reload the target to reflect the changes made.
predict Send a single query to the target model.
back Exit the active attack or active target.

For example, the target creation workflow is as follows: execute new to create a fresh target, open the new target's Python file in your favorite code editor, make changes to the code, and execute reload. Use the predict command to ensure inputs and outputs are as expected.
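A similar check can be scripted while developing a target: send one sample through __call__ and map the most probable index back to a label, in the spirit of outputs_to_labels. predict_labels is a hypothetical helper, not Counterfit's predict implementation; it assumes the target exposes X, model_output_classes, and __call__ as described on this page.

```python
import numpy as np


def predict_labels(target, sample_index=0):
    """Hypothetical check in the spirit of the predict command: send one
    sample to the target and map its probabilities to class labels."""
    batch = np.asarray(target.X[sample_index:sample_index + 1])
    # atleast_2d also accepts a flat list returned for a batch of one.
    probs = np.atleast_2d(np.asarray(target(batch), dtype=float))
    # argmax picks the most probable class per row; the index is looked up
    # in model_output_classes, which must share the same ordering.
    return [target.model_output_classes[i] for i in probs.argmax(axis=1)]
```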

Informational Commands

These commands gather and present relevant information about the current session, and relevant information about targets and attacks.

Command Description
list This command prints the objects loaded in the session.
show This command displays information about the session, targets, and attacks; for example, show info describes an attack and its parameters.