Implement Deep Partition Aggregation #1397

Merged Dec 8, 2021 (54 commits).

Commits
7747a88 Deep Partition Aggregation files (keykholt, Nov 8, 2021)
0309b76 Renamed notebook (keykholt, Nov 8, 2021)
8b733c9 Deep Partition Aggregation files (keykholt, Nov 8, 2021)
5b527e0 Whitespace fixes (keykholt, Nov 8, 2021)
32c9d13 Signature fixes (keykholt, Nov 8, 2021)
bade115 Style fixes (keykholt, Nov 9, 2021)
c48c55c Fixing some weird rewind issue that reverted to old version (keykholt, Nov 9, 2021)
72ef90a style fixes (keykholt, Nov 9, 2021)
a96b4a6 Style fixes and exceptions (keykholt, Nov 9, 2021)
b9bc8a1 Style fixes and exceptions (keykholt, Nov 9, 2021)
3509011 Style fixes and exceptions (keykholt, Nov 9, 2021)
6664be5 Fixing data as suggested (keykholt, Nov 9, 2021)
14670fd Update art/estimators/classification/deep_partition_ensemble.py (keykholt, Nov 9, 2021)
d500720 Update art/estimators/classification/deep_partition_ensemble.py (keykholt, Nov 9, 2021)
711b445 Update art/estimators/classification/deep_partition_ensemble.py (keykholt, Nov 9, 2021)
78c631c Update art/estimators/classification/deep_partition_ensemble.py (keykholt, Nov 9, 2021)
294fc6f Update art/estimators/classification/deep_partition_ensemble.py (keykholt, Nov 9, 2021)
ae381a0 Update art/estimators/classification/deep_partition_ensemble.py (keykholt, Nov 9, 2021)
f64932d Fixing data as suggested (keykholt, Nov 9, 2021)
6196a96 Merge branch 'DPA' of https://github.com/keykholt/adversarial-robustn… (keykholt, Nov 9, 2021)
caf7a8c Update README.md (keykholt, Nov 9, 2021)
b878531 Fixing import (keykholt, Nov 10, 2021)
3a76a2d Merge branch 'DPA' of https://github.com/keykholt/adversarial-robustn… (keykholt, Nov 10, 2021)
874752e Updated ensemble.py and changed hash function init (keykholt, Nov 10, 2021)
dca9f59 Fixing import (keykholt, Nov 10, 2021)
d775c22 Merge remote-tracking branch 'upstream/dev_1.9.0' into DPA (keykholt, Nov 11, 2021)
859d316 pylint warning disabled (keykholt, Nov 11, 2021)
704e707 line length (keykholt, Nov 11, 2021)
5ac7d88 Fixes according to Flake (keykholt, Nov 17, 2021)
4131d6b test fix (keykholt, Nov 17, 2021)
75ec7c2 line length (keykholt, Nov 17, 2021)
07c3611 Merge branch 'dev_1.9.0' into DPA (beat-buesser, Nov 19, 2021)
dc675aa remove model call (keykholt, Nov 22, 2021)
32f3f97 Merge branch 'DPA' of https://github.com/keykholt/adversarial-robustn… (keykholt, Nov 22, 2021)
d9495fa test fixes (keykholt, Nov 22, 2021)
a3d7bd2 line length (keykholt, Nov 22, 2021)
05cf550 Changes due to cloning bug (keykholt, Nov 24, 2021)
0661989 Import missing (keykholt, Nov 24, 2021)
00372a7 style fixes (keykholt, Nov 24, 2021)
eb7b5f3 style fixes (keykholt, Nov 24, 2021)
07beb2b style fixes (keykholt, Nov 24, 2021)
ad93bb9 Update deep_partition_ensemble.py (keykholt, Nov 29, 2021)
a4a0742 Update deep_partition_ensemble.py (keykholt, Nov 29, 2021)
5478b5a Update deep_partition_ensemble.py (keykholt, Nov 30, 2021)
15c1434 Update deep_partition_ensemble.py (keykholt, Nov 30, 2021)
e102c30 Merge branch 'Trusted-AI:main' into DPA (keykholt, Nov 30, 2021)
02839ac line length (Nov 30, 2021)
f6a402c line length (keykholt, Dec 1, 2021)
1419f24 model var (keykholt, Dec 2, 2021)
3bfa485 typo (keykholt, Dec 3, 2021)
bb91fc8 Merge branch 'Trusted-AI:main' into DPA (keykholt, Dec 7, 2021)
57bd009 merge conflict (keykholt, Dec 7, 2021)
846116f Version update (keykholt, Dec 7, 2021)
6acda54 Merge branch 'dev_1.9.0' into DPA (beat-buesser, Dec 7, 2021)
1 change: 1 addition & 0 deletions art/estimators/classification/__init__.py
@@ -9,6 +9,7 @@

from art.estimators.classification.blackbox import BlackBoxClassifier, BlackBoxClassifierNeuralNetwork
from art.estimators.classification.catboost import CatBoostARTClassifier
from art.estimators.classification.deep_partition_ensemble import DeepPartitionEnsemble
from art.estimators.classification.detector_classifier import DetectorClassifier
from art.estimators.classification.ensemble import EnsembleClassifier
from art.estimators.classification.GPy import GPyGaussianProcessClassifier
197 changes: 197 additions & 0 deletions art/estimators/classification/deep_partition_ensemble.py
@@ -0,0 +1,197 @@
# MIT License
#
# Copyright (C) The Adversarial Robustness Toolbox (ART) Authors 2021
#
# Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated
# documentation files (the "Software"), to deal in the Software without restriction, including without limitation the
# rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit
# persons to whom the Software is furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all copies or substantial portions of the
# Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE
# WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
# TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
"""
Creates a Deep Partition Aggregation ensemble classifier.
"""
from __future__ import absolute_import, division, print_function, unicode_literals

import logging
import warnings
from typing import List, Optional, Union, Callable, Dict, TYPE_CHECKING

import copy
import numpy as np

from art.estimators.classification.ensemble import EnsembleClassifier

if TYPE_CHECKING:
from art.utils import CLIP_VALUES_TYPE, PREPROCESSING_TYPE, CLASSIFIER_NEURALNETWORK_TYPE
from art.defences.preprocessor import Preprocessor
from art.defences.postprocessor import Postprocessor

logger = logging.getLogger(__name__)


class DeepPartitionEnsemble(EnsembleClassifier):
"""
    Implementation of the Deep Partition Aggregation defense. Training data is partitioned into
    disjoint buckets based on a hash function, and a separate classifier is trained on each bucket.

| Paper link: https://arxiv.org/abs/2006.14768
"""

estimator_params = EnsembleClassifier.estimator_params + [
"hash_function",
"ensemble_size",
]

def __init__(
self,
classifiers: Union["CLASSIFIER_NEURALNETWORK_TYPE", List["CLASSIFIER_NEURALNETWORK_TYPE"]],
hash_function: Optional[Callable] = None,
ensemble_size: int = 50,
channels_first: bool = False,
clip_values: Optional["CLIP_VALUES_TYPE"] = None,
preprocessing_defences: Union["Preprocessor", List["Preprocessor"], None] = None,
postprocessing_defences: Union["Postprocessor", List["Postprocessor"], None] = None,
preprocessing: "PREPROCESSING_TYPE" = (0.0, 1.0),
) -> None:
"""
:param classifiers: The base model definition to use for defining the ensemble.
If a list, the list must be the same size as the ensemble size.
        :param hash_function: The function used to partition the training data. If None, the hash
               function will use the sum of the input values modulo the ensemble size for partitioning.
:param ensemble_size: The number of models in the ensemble.
:param channels_first: Set channels first or last.
:param clip_values: Tuple of the form `(min, max)` of floats or `np.ndarray` representing the minimum and
maximum values allowed for features. If floats are provided, these will be used as the range of all
features. If arrays are provided, each value will be considered the bound for a feature, thus
the shape of clip values needs to match the total number of features.
:param preprocessing_defences: Preprocessing defence(s) to be applied by the classifier. Not applicable
in this classifier.
:param postprocessing_defences: Postprocessing defence(s) to be applied by the classifier.
:param preprocessing: Tuple of the form `(subtrahend, divisor)` of floats or `np.ndarray` of values to be
used for data preprocessing. The first value will be subtracted from the input. The input will then
be divided by the second one. Not applicable in this classifier.
"""
self.can_fit = False # self.fit() cannot be used with models loaded from disk
        if not isinstance(classifiers, list):
            warnings.warn(
                "A single classifier was passed; it should not have been loaded from disk, as "
                "models loaded from disk cannot be cloned reliably. If you are using pre-trained "
                "model(s), create a list of Estimator objects the same length as the ensemble size."
            )
self.can_fit = True

if hasattr(classifiers, "clone_for_refitting"):
# Initialize the ensemble based on the provided architecture
# Use ART's cloning if possible
try:
classifiers = [classifiers.clone_for_refitting() for _ in range(ensemble_size)] # type: ignore
except ValueError as error:
warnings.warn("Switching to deepcopy due to ART Cloning Error: " + str(error))
classifiers = [copy.deepcopy(classifiers) for _ in range(ensemble_size)] # type: ignore
else:
classifiers = [copy.deepcopy(classifiers) for _ in range(ensemble_size)]
elif isinstance(classifiers, list) and len(classifiers) != ensemble_size:
raise ValueError("The length of the classifier list must be the same as the ensemble size")

super().__init__(
classifiers=classifiers,
clip_values=clip_values,
channels_first=channels_first,
preprocessing_defences=preprocessing_defences,
postprocessing_defences=postprocessing_defences,
preprocessing=preprocessing,
)

if hash_function is None:

def default_hash(x):
return int(np.sum(x)) % ensemble_size

self.hash_function = default_hash
else:
self.hash_function = hash_function

self.ensemble_size = ensemble_size

def predict( # pylint: disable=W0221
self, x: np.ndarray, batch_size: int = 128, raw: bool = False, max_aggregate: bool = True, **kwargs
) -> np.ndarray:
"""
        Perform prediction for a batch of inputs. If max_aggregate is True, an aggregated vote is
        taken over the top-1 prediction of each classifier. Otherwise, the predicted probabilities
        are summed. For models that output logits, set max_aggregate=True, as logits are not
        comparable between models and should not be aggregated with a sum.

:param x: Input samples.
:param batch_size: Size of batches.
:param raw: Return the individual classifier raw outputs (not aggregated).
        :param max_aggregate: Aggregate the predicted classes of each classifier if True. If False,
               aggregation is done by summing the predicted probabilities. If raw is True, this
               argument is ignored.
:return: Array of predictions of shape `(nb_inputs, nb_classes)`, or of shape
`(nb_classifiers, nb_inputs, nb_classes)` if `raw=True`.
"""

if raw:
return super().predict(x, batch_size=batch_size, raw=True, **kwargs)

# Aggregate based on top-1 prediction from each classifier
if max_aggregate:
preds = super().predict(x, batch_size=batch_size, raw=True, **kwargs)
aggregated_preds = np.zeros_like(preds, shape=preds.shape[1:]) # pylint: disable=E1123
for i in range(preds.shape[0]):
aggregated_preds[np.arange(len(aggregated_preds)), np.argmax(preds[i], axis=1)] += 1
return aggregated_preds

# Aggregate based on summing predictions from each classifier
return super().predict(x, batch_size=batch_size, raw=False, **kwargs)

def fit( # pylint: disable=W0221
self,
x: np.ndarray,
y: np.ndarray,
batch_size: int = 128,
nb_epochs: int = 20,
        train_dict: Optional[Dict] = None,
**kwargs
) -> None:
"""
        Fit the classifier on the training set `(x, y)`. Each classifier will be trained with the
        same parameters unless train_dict is provided. If train_dict is provided, the model ids
        specified will use the training parameters in train_dict instead.

:param x: Training data.
:param y: Target values (class labels) one-hot-encoded of shape (nb_samples, nb_classes).
:param batch_size: Size of batches.
:param nb_epochs: Number of epochs to use for training.
        :param train_dict: A dictionary of training args if certain models need specialized arguments.
               The key should be the model's partition id, and its value will override any default
               training parameters, including batch_size and nb_epochs.
:param kwargs: Dictionary of framework-specific arguments.
"""
if self.can_fit:
# First, partition the data using the hash function
partition_ind = [[] for _ in range(self.ensemble_size)] # type: List[List[int]]
for i, p_x in enumerate(x):
partition_id = int(self.hash_function(p_x))
partition_ind[partition_id].append(i)

# Then, train each model on its assigned partition
for i in range(self.ensemble_size):
current_x = x[np.array(partition_ind[i])]
current_y = y[np.array(partition_ind[i])]

if train_dict is not None and i in train_dict.keys():
self.classifiers[i].fit(current_x, current_y, **train_dict[i])
else:
self.classifiers[i].fit(current_x, current_y, batch_size=batch_size, nb_epochs=nb_epochs, **kwargs)
else:
warnings.warn("Cannot call fit() for an ensemble of pre-trained classifiers.")
4 changes: 2 additions & 2 deletions art/estimators/classification/ensemble.py
@@ -29,7 +29,7 @@
from art.estimators.estimator import NeuralNetworkMixin

if TYPE_CHECKING:
from art.utils import CLIP_VALUES_TYPE, PREPROCESSING_TYPE
from art.utils import CLIP_VALUES_TYPE, PREPROCESSING_TYPE, CLASSIFIER_NEURALNETWORK_TYPE
from art.data_generators import DataGenerator
from art.defences.preprocessor import Preprocessor
from art.defences.postprocessor import Postprocessor
@@ -50,7 +50,7 @@ class EnsembleClassifier(ClassifierNeuralNetwork):

def __init__(
self,
classifiers: List[ClassifierNeuralNetwork],
classifiers: List["CLASSIFIER_NEURALNETWORK_TYPE"],
classifier_weights: Union[list, np.ndarray, None] = None,
channels_first: bool = False,
clip_values: Optional["CLIP_VALUES_TYPE"] = None,
2 changes: 1 addition & 1 deletion examples/inverse_gan_author_utils.py
@@ -296,7 +296,7 @@ def __len__(self):
return len(self.images)

def load(self, split, lazy=True, randomize=True):
""" Abstract function specific to each dataset."""
"""Abstract function specific to each dataset."""
pass


3 changes: 3 additions & 0 deletions notebooks/README.md
@@ -207,6 +207,9 @@ demonstrates the generation and detection of backdoors in neural networks via Ac
<img src="../utils/data/images/poisoning.png?raw=true" width="200" title="poisoning">
</p>

[poisoning_defense_deep_partition_aggregation.ipynb](poisoning_defense_deep_partition_aggregation.ipynb) [[on nbviewer](https://nbviewer.jupyter.org/github/Trusted-AI/adversarial-robustness-toolbox/blob/main/notebooks/poisoning_defense_deep_partition_aggregation.ipynb)]
demonstrates a defense against poisoning attacks via partitioning the data into disjoint subsets and training an ensemble model.

[poisoning_defense_neural_cleanse.ipynb](poisoning_defense_neural_cleanse.ipynb) [[on nbviewer](https://nbviewer.jupyter.org/github/Trusted-AI/adversarial-robustness-toolbox/blob/main/notebooks/poisoning_defense_neural_cleanse.ipynb)]
demonstrates a defense against poisoning attacks that generates the suspected backdoor and applies runtime mitigation methods on the classifier.
