Adversarial Robustness Toolbox notebooks

Expectation over Transformation (EoT)

expectation_over_transformation_classification_rotation.ipynb [on nbviewer] shows how to use Expectation over Transformation (EoT) sampling to make adversarial examples robust against rotation for image classification.
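As a rough illustration of the EoT idea (not the notebook's code and not ART's API), the sketch below averages the gradient of a toy linear score over randomly sampled rotations and takes one step against that expected gradient. The 2-D point, the linear `score`, and all function names are assumptions made for this example.

```python
import math
import random

def rotate(x, theta):
    """Rotate a 2-D point x by angle theta (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return [c * x[0] - s * x[1], s * x[0] + c * x[1]]

def score(w, x):
    """Toy linear score; a positive value means the point is assigned class 1."""
    return w[0] * x[0] + w[1] * x[1]

def eot_gradient(w, thetas):
    """Average the gradient of score(w, rotate(x, theta)) w.r.t. x over sampled angles.

    For a linear score this gradient is R(theta)^T w, i.e. w rotated by -theta.
    """
    grads = [rotate(w, -t) for t in thetas]
    n = len(grads)
    return [sum(g[0] for g in grads) / n, sum(g[1] for g in grads) / n]

def eot_attack(w, x, eps, max_angle, n_samples=64, seed=0):
    """One L2 step of size eps against the expected gradient over random rotations."""
    rng = random.Random(seed)
    thetas = [rng.uniform(-max_angle, max_angle) for _ in range(n_samples)]
    g = eot_gradient(w, thetas)
    norm = math.hypot(g[0], g[1])
    return [x[0] - eps * g[0] / norm, x[1] - eps * g[1] / norm]

w = [1.0, 0.0]
x = [1.0, 0.0]
x_adv = eot_attack(w, x, eps=2.0, max_angle=math.pi / 6)
# The perturbed point should keep a negative score across the sampled rotation range.
print([round(score(w, rotate(x_adv, t)), 3) for t in (-math.pi / 6, 0.0, math.pi / 6)])
```

Because the step follows the gradient averaged over the transformation distribution rather than the gradient at a single pose, the perturbation is chosen to work for the whole range of rotations instead of one fixed orientation.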

Video Action Recognition

adversarial_action_recognition.ipynb [on nbviewer] shows how to create an adversarial attack on a video action recognition classification task with ART. Experiments in this notebook show how to modify a video sample by employing a Fast Gradient Method attack so that the modified video sample gets misclassified.
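To make the Fast Gradient Method step concrete, here is a minimal sketch on a toy logistic-regression model rather than a video classifier. It uses the L-infinity (sign) variant, `x_adv = x + eps * sign(grad)`. The model, weights, and function names are assumptions for illustration and are not taken from the notebook or from ART.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, b, x):
    """Probability of class 1 under a toy logistic-regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def loss_gradient(w, b, x, y):
    """Gradient of the binary cross-entropy loss with respect to the input x."""
    p = predict_proba(w, b, x)
    return [(p - y) * wi for wi in w]

def sign(v):
    return (v > 0) - (v < 0)

def fast_gradient_method(w, b, x, y, eps):
    """Single step of size eps in the direction that increases the loss (L-inf variant)."""
    grad = loss_gradient(w, b, x, y)
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad)]

w, b = [2.0, -1.0], 0.0
x, y = [0.5, 0.2], 1
x_adv = fast_gradient_method(w, b, x, y, eps=0.5)
print(predict_proba(w, b, x) > 0.5, predict_proba(w, b, x_adv) > 0.5)
```

For a real video model the same update would be applied to every pixel of every frame, with the gradient obtained from the framework's automatic differentiation.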

Audio

adversarial_audio_examples.ipynb [on nbviewer] shows how to create adversarial examples of audio data with ART. Experiments in this notebook show how the waveform of a spoken digit of the AudioMNIST dataset can be modified with almost imperceptible changes so that the waveform gets misclassified as a different digit.

poisoning_attack_backdoor_audio.ipynb [on nbviewer] demonstrates the dirty-label backdoor attack on a TensorFlow v2 estimator for speech classification.

Adversarial training

adversarial_retraining.ipynb [on nbviewer] shows how to load and evaluate the MNIST and CIFAR-10 models synthesized and adversarially trained by Sinn et al., 2019.

adversarial_training_mnist.ipynb [on nbviewer] demonstrates adversarial training of a neural network to harden the model against adversarial samples using the MNIST dataset.

TensorFlow v2

art-for-tensorflow-v2-callable.ipynb [on nbviewer] shows how to use ART with TensorFlow v2 in eager execution mode with a model in the form of a callable class or Python function.

art-for-tensorflow-v2-keras.ipynb [on nbviewer] demonstrates ART with TensorFlow v2 using tensorflow.keras without eager execution.

Attacks

attack_feature_adversaries_pytorch.ipynb [on nbviewer] or attack_feature_adversaries_tensorflow_v2.ipynb [on nbviewer] show how to use ART to create feature adversaries (Sabour et al., 2016).

attack_adversarial_patch.ipynb [on nbviewer] shows how to use ART to create real-world adversarial patches that fool real-world object detection and classification models. attack_adversarial_patch_TensorFlowV2.ipynb [on nbviewer] provides a TensorFlow v2 specific implementation of the attack, and attack_adversarial_patch_pytorch_yolo.ipynb [on nbviewer] covers a YOLO v3 and v5 specific attack.

attack_adversarial_patch_faster_rcnn.ipynb [on nbviewer] shows how to set up a TFv2 Faster R-CNN object detector with ART and create an adversarial patch attack that fools the detector.

attack_adversarial_patch_detr.ipynb [on nbviewer] shows how to set up the DEtection TRansformer (DETR) with ART for object detection and create an adversarial patch attack that fools the detector.

attack_decision_based_boundary.ipynb [on nbviewer] demonstrates the Decision-Based Adversarial Attack (Boundary attack). This is a black-box attack that only requires class predictions.

attack_decision_tree.ipynb [on nbviewer] shows how to compute adversarial examples on decision trees (Papernot et al., 2016). By traversing the structure of a decision tree classifier, adversarial examples can be computed without explicit gradients.

attack_defence_imagenet.ipynb [on nbviewer] explains the basic workflow of using ART with defences and attacks on a neural network classifier for ImageNet.

attack_hopskipjump.ipynb [on nbviewer] demonstrates the HopSkipJumpAttack. This is a black-box attack that only requires class predictions. It is an advanced version of the Boundary attack.
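Both decision-based attacks above query only class labels. The sketch below shows one building block of such attacks, a binary search along the line between the original input and an already misclassified point, which finds a point close to the decision boundary. It is a simplified illustration under assumed names (`predict_label`, `boundary_binary_search`), not the full Boundary or HopSkipJump algorithm and not ART's implementation.

```python
def predict_label(x):
    """Hypothetical black-box model: returns only a class label, no scores or gradients."""
    return 1 if x[0] + x[1] > 1.0 else 0

def boundary_binary_search(predict, x_orig, x_adv, tol=1e-6):
    """Move x_adv toward x_orig while it stays misclassified, using label queries only."""
    y_orig = predict(x_orig)
    if predict(x_adv) == y_orig:
        raise ValueError("starting point must already be adversarial")
    lo, hi = 0.0, 1.0  # interpolation weight: 0 gives x_orig, 1 gives x_adv
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        x_mid = [(1 - mid) * o + mid * a for o, a in zip(x_orig, x_adv)]
        if predict(x_mid) != y_orig:
            hi = mid  # still adversarial, so move closer to the original
        else:
            lo = mid  # crossed back into the original class
    return [(1 - hi) * o + hi * a for o, a in zip(x_orig, x_adv)]

x_boundary = boundary_binary_search(predict_label, [0.2, 0.2], [2.0, 2.0])
print(x_boundary, predict_label(x_boundary))
```

The full attacks alternate a projection like this with steps that explore along the boundary, so that the adversarial point keeps moving closer to the original input.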

attack_membership_inference.ipynb [on nbviewer] demonstrates the MembershipInferenceBlackBoxRuleBased and MembershipInferenceBlackBox membership inference attacks on a classifier model with only black-box access.

attack_membership_inference_regressor.ipynb [on nbviewer] demonstrates the MembershipInferenceBlackBox membership inference attack on a regressor model with only black-box access.

attack_attribute_inference.ipynb [on nbviewer] demonstrates the AttributeInferenceBlackBox, AttributeInferenceWhiteBoxLifestyleDecisionTree and AttributeInferenceWhiteBoxDecisionTree attribute inference attacks on a classifier model.

attack_attribute_inference_regressor.ipynb [on nbviewer] demonstrates the AttributeInferenceBlackBox attribute inference attack on a regressor model.

attack_membership_inference_shadow_models.ipynb [on nbviewer] demonstrates a MembershipInferenceBlackBox membership inference attack using shadow models on a classifier.

label_only_membership_inference.ipynb [on nbviewer] demonstrates a LabelOnlyDecisionBoundary membership inference attack on a PyTorch classifier for the MNIST dataset.

composite-adversarial-attack.ipynb [on nbviewer] shows how to launch the Composite Adversarial Attack (CAA) on a PyTorch-based model (Hsiung et al., 2023). CAA composes perturbations in the Lp-ball and in semantic space (i.e., hue, saturation, rotation, brightness, and contrast), and is able to optimize the attack sequence and each attack component, thereby enhancing the efficiency and efficacy of adversarial examples.

Metrics

privacy_metric.ipynb [on nbviewer] demonstrates how to apply both the PDTP and the SHAPr privacy metrics to random forest and decision tree classifiers trained on the nursery dataset.

Classifiers

classifier_blackbox.ipynb [on nbviewer] demonstrates BlackBoxClassifier, the most general and versatile classifier of ART requiring only a single predict function definition without any additional assumptions or requirements. The notebook shows how to use BlackBoxClassifier to attack a remote, deployed model (in this case on IBM Watson Machine Learning, https://cloud.ibm.com) using the HopSkipJump attack.

classifier_blackbox_lookup_table.ipynb [on nbviewer] demonstrates using BlackBoxClassifier when the adversary does not have access to the model for making predictions, but does have a set of existing predictions produced before losing access. The notebook shows how to use BlackBoxClassifier to attack a model using only a table of samples and their labels, using a membership inference black-box attack.
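The lookup-table setting can be pictured as a predict function that answers only from previously recorded predictions. The sketch below is a plain-Python illustration of that idea; the helper name `make_lookup_predict` and the one-hot output format are assumptions for this example, and the way ART handles the table internally may differ.

```python
def make_lookup_predict(samples, labels, nb_classes):
    """Build a predict function backed only by previously recorded (sample, label) pairs."""
    table = {tuple(s): y for s, y in zip(samples, labels)}

    def predict(batch):
        outputs = []
        for x in batch:
            key = tuple(x)
            if key not in table:
                # No live model is available, so unseen inputs cannot be answered.
                raise KeyError("no recorded prediction for this sample")
            one_hot = [0.0] * nb_classes
            one_hot[table[key]] = 1.0
            outputs.append(one_hot)
        return outputs

    return predict

recorded_samples = [[0.0, 1.0], [1.0, 0.0]]
recorded_labels = [1, 0]
predict = make_lookup_predict(recorded_samples, recorded_labels, nb_classes=2)
print(predict([[1.0, 0.0]]))
```

An attack that only needs predictions for the recorded samples, such as a membership inference attack evaluated on those samples, can then run against this function as if it were the model.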

classifier_blackbox_tesseract.ipynb [on nbviewer] demonstrates a black-box attack on Tesseract OCR. It uses BlackBoxClassifier and HopSkipJump attack to change the image of one word into the image of another word and shows how to apply pre-processing defences.

classifier_catboost.ipynb [on nbviewer] shows how to use ART with CatBoost models. It demonstrates and analyzes Zeroth Order Optimisation attacks using the Iris and MNIST datasets.

classifier_gpy_gaussian_process.ipynb [on nbviewer] shows how to create adversarial examples for the Gaussian Process classifier of GPy. It crafts adversarial examples with the HighConfidenceLowUncertainty (HCLU) attack (Grosse et al., 2018), specifically targeting Gaussian Process classifiers, and compares it to Projected Gradient Descent (PGD) (Madry et al., 2017).

classifier_lightgbm.ipynb [on nbviewer] shows how to use ART with LightGBM models. It demonstrates and analyzes Zeroth Order Optimisation attacks using the Iris and MNIST datasets.

classifier_scikitlearn_AdaBoostClassifier.ipynb [on nbviewer] shows how to use ART with Scikit-learn AdaBoostClassifier. It demonstrates and analyzes Zeroth Order Optimisation attacks using the Iris and MNIST datasets.

classifier_scikitlearn_BaggingClassifier.ipynb [on nbviewer] shows how to use ART with Scikit-learn BaggingClassifier. It demonstrates and analyzes Zeroth Order Optimisation attacks using the Iris and MNIST datasets.

classifier_scikitlearn_DecisionTreeClassifier.ipynb [on nbviewer] shows how to use ART with Scikit-learn DecisionTreeClassifier. It demonstrates and analyzes Zeroth Order Optimisation attacks using the Iris and MNIST datasets.

classifier_scikitlearn_ExtraTreesClassifier.ipynb [on nbviewer] shows how to use ART with Scikit-learn ExtraTreesClassifier. It demonstrates and analyzes Zeroth Order Optimisation attacks using the Iris and MNIST datasets.

classifier_scikitlearn_GradientBoostingClassifier.ipynb [on nbviewer] shows how to use ART with Scikit-learn GradientBoostingClassifier. It demonstrates and analyzes Zeroth Order Optimisation attacks using the Iris and MNIST datasets.

classifier_scikitlearn_LogisticRegression.ipynb [on nbviewer] shows how to use ART with Scikit-learn LogisticRegression. It demonstrates and analyzes Projected Gradient Descent attacks using the MNIST dataset.

classifier_scikitlearn_pipeline_pca_cv_svc.ipynb [on nbviewer] contains an example of generating adversarial examples using a black-box attack against a scikit-learn pipeline consisting of principal component analysis (PCA), cross validation (CV) and a support vector machine classifier (SVC), but any other valid pipeline would work too. The pipeline is optimised using grid search with cross validation. The adversarial samples are created with black-box HopSkipJump attack. The training data is MNIST, because of its intuitive visualisation, but any other dataset including tabular data would be suitable too.

classifier_scikitlearn_RandomForestClassifier.ipynb [on nbviewer] shows how to use ART with Scikit-learn RandomForestClassifier. It demonstrates and analyzes Zeroth Order Optimisation attacks using the Iris and MNIST datasets.

classifier_scikitlearn_SVC_LinearSVC.ipynb [on nbviewer] shows how to use ART with Scikit-learn SVC and LinearSVC support vector machines. It demonstrates and analyzes Projected Gradient Descent attacks using the Iris and MNIST datasets for binary and multiclass classification for linear and radial-basis-function kernels.

classifier_xgboost.ipynb [on nbviewer] shows how to use ART with XGBoost models. It demonstrates and analyzes Zeroth Order Optimisation attacks using the Iris and MNIST datasets.

Detectors

detection_adversarial_samples_cifar10.ipynb [on nbviewer] demonstrates the detection of adversarial examples using ART. The classifier model is a neural network of a ResNet architecture in Keras for the CIFAR-10 dataset.

Model stealing / model theft / model extraction

model-stealing-demo.ipynb [on nbviewer] demonstrates model stealing attacks and a reverse sigmoid defense against them.

Poisoning

poisoning_attack_svm.ipynb [on nbviewer] demonstrates a poisoning attack on a Support Vector Machine.

hidden_trigger_backdoor/poisoning_attack_hidden_trigger_pytorch.ipynb [on nbviewer] demonstrates the Hidden Trigger Backdoor attack on a PyTorch estimator.

hidden_trigger_backdoor/poisoning_attack_hidden_trigger_keras.ipynb [on nbviewer] demonstrates the Hidden Trigger Backdoor attack on a Keras estimator.

hidden_trigger_backdoor/poisoning_attack_hidden_trigger_tf.ipynb [on nbviewer] demonstrates the Hidden Trigger Backdoor attack on a TensorFlow v2 estimator.

poisoning_defense_activation_clustering.ipynb [on nbviewer] demonstrates the generation and detection of backdoors in neural networks via Activation Clustering.

poisoning_defense_deep_partition_aggregation.ipynb [on nbviewer] demonstrates a defense against poisoning attacks via partitioning the data into disjoint subsets and training an ensemble model.
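The partition-and-vote idea can be sketched in a few lines. In the example below each training sample is assigned to one of `k` disjoint subsets by a deterministic hash, one base model is trained per subset, and predictions are combined by majority vote. The 1-D integer data and the nearest-centroid base learner are assumptions chosen to keep the sketch self-contained; the notebook trains neural networks instead.

```python
from collections import Counter, defaultdict

def partition(samples, labels, k):
    """Assign each sample to one of k disjoint subsets using a deterministic hash of the sample."""
    parts = defaultdict(list)
    for x, y in zip(samples, labels):
        parts[hash(x) % k].append((x, y))
    return parts

def train_centroids(pairs):
    """Toy base learner: per-class mean of 1-D samples."""
    by_class = defaultdict(list)
    for x, y in pairs:
        by_class[y].append(x)
    return {y: sum(xs) / len(xs) for y, xs in by_class.items()}

def predict_one(centroids, x):
    """Predict the class whose centroid is nearest to x."""
    return min(centroids, key=lambda y: abs(x - centroids[y]))

def dpa_predict(models, x):
    """Majority vote over the base models; ties go to the smaller label."""
    votes = Counter(predict_one(m, x) for m in models)
    top = max(votes.values())
    return min(y for y, c in votes.items() if c == top)

samples = list(range(0, 9)) + list(range(20, 29))
labels = [0] * 9 + [1] * 9
models = [train_centroids(p) for p in partition(samples, labels, 3).values()]
print(dpa_predict(models, 5), dpa_predict(models, 25))
```

Since the subset of a sample depends only on the sample itself, a single poisoned sample can influence at most one base model, which is what bounds its effect on the vote.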

poisoning_defense_dp_instahide.ipynb [on nbviewer] demonstrates a defense against poisoning attacks using the DP-InstaHide training method which uses data augmentation and additive noise.

poisoning_defense_neural_cleanse.ipynb [on nbviewer] demonstrates a defense against poisoning attacks that generates the suspected backdoor and applies runtime mitigation methods on the classifier.

poisoning_defence_strip.ipynb [on nbviewer] demonstrates a defense against input-agnostic backdoor attacks that filters suspicious inputs at runtime.

poisoning_attack_witches_brew.ipynb [on nbviewer] demonstrates the gradient matching poisoning attack (a.k.a. Witches' Brew) that adds noise to align the training gradient to a specific direction that can poison the target model.

poisoning_attack_feature_collision.ipynb [on nbviewer] demonstrates a working Poison Frog (Feature Collision) poisoning attack implemented in the Keras framework on the CIFAR10 dataset as described in the paper. This is a targeted clean-label attack, which does not require the attacker to have any control over the labeling of training data and controls the behavior of the classifier on a specific test instance without degrading overall classifier performance.

poisoning_attack_feature_collision-pytorch.ipynb [on nbviewer] demonstrates a working Poison Frog (Feature Collision) poisoning attack implemented in the PyTorch framework on the CIFAR10 dataset as described in the paper. This is a targeted clean-label attack, which does not require the attacker to have any control over the labeling of training data and controls the behavior of the classifier on a specific test instance without degrading overall classifier performance.

poisoning_attack_sleeper_agent_pytorch.ipynb [on nbviewer] demonstrates a working Sleeper Agent poisoning attack implemented in the PyTorch framework on the CIFAR10 dataset as described in the paper. Sleeper Agent is a hidden trigger attack that employs gradient matching, data selection, and target model re-training during the crafting process. It is the first hidden trigger backdoor attack to be effective against neural networks trained from scratch.

poisoning_attack_bad_det.ipynb [on nbviewer] demonstrates using the BadDet poisoning attacks to insert backdoors and create poisoned samples for object detector models. This is a dirty label attack where a trigger is inserted into a bounding box and the classification labels are changed accordingly.

Certification and Verification

output_randomized_smoothing_mnist.ipynb [on nbviewer] shows how to achieve certified adversarial robustness for neural networks via Randomized Smoothing.
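The prediction step of Randomized Smoothing can be sketched as a majority vote of a base classifier over Gaussian-perturbed copies of the input. The example below is a Monte Carlo illustration on a hypothetical 1-D classifier; the names and parameters are assumptions, and the statistical certification of a robustness radius performed in the notebook is omitted.

```python
import random
from collections import Counter

def base_classifier(x):
    """Hypothetical base classifier on a 1-D input."""
    return 1 if x > 0.0 else 0

def smoothed_predict(classifier, x, sigma, n_samples=1000, seed=0):
    """Monte Carlo estimate of the smoothed classifier: majority vote under Gaussian noise.

    Returns the winning label and the fraction of noisy samples that voted for it.
    """
    rng = random.Random(seed)
    votes = Counter(classifier(x + rng.gauss(0.0, sigma)) for _ in range(n_samples))
    label, count = votes.most_common(1)[0]
    return label, count / n_samples

label, fraction = smoothed_predict(base_classifier, 1.0, sigma=0.5)
print(label, fraction)
```

The vote fraction is the quantity from which a certified radius is derived: the more consistently the base classifier agrees under noise, the larger the perturbation the smoothed prediction is guaranteed to withstand.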

robustness_verification_clique_method_tree_ensembles_gradient_boosted_decision_trees_classifiers.ipynb [on nbviewer] demonstrates the verification of adversarial robustness in decision tree ensemble classifiers (Gradient Boosted Decision Trees, Random Forests, etc.) using XGBoost, LightGBM and Scikit-learn.

certification_deepz.ipynb [on nbviewer] demonstrates using DeepZ to compute certified robustness for neural networks.

Certified Training

certified_adversarial_training.ipynb [on nbviewer] demonstrates training a neural network for certified robustness using bound propagation techniques.

certification_interval_domain.ipynb [on nbviewer] demonstrates using interval bound propagation for certification of neural network robustness.
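Interval bound propagation pushes a box of possible inputs through the network layer by layer and checks whether the output bounds still separate the predicted class from the others. The sketch below does this for a tiny two-layer ReLU network with hand-picked weights; the network, the weights, and the function names are assumptions for illustration and do not come from the notebook.

```python
def linear_bounds(W, b, lower, upper):
    """Propagate an input box [lower, upper] through y = W x + b."""
    out_lo, out_hi = [], []
    for row, bias in zip(W, b):
        lo = hi = bias
        for w, l, u in zip(row, lower, upper):
            if w >= 0:
                lo += w * l
                hi += w * u
            else:
                # A negative weight swaps which end of the interval gives the minimum.
                lo += w * u
                hi += w * l
        out_lo.append(lo)
        out_hi.append(hi)
    return out_lo, out_hi

def relu_bounds(lower, upper):
    """ReLU is monotone, so it can be applied to both bounds directly."""
    return [max(0.0, l) for l in lower], [max(0.0, u) for u in upper]

def certify(x, eps, layers, true_class):
    """Return True if every input within L-inf distance eps provably keeps true_class on top."""
    lo = [xi - eps for xi in x]
    hi = [xi + eps for xi in x]
    for i, (W, b) in enumerate(layers):
        lo, hi = linear_bounds(W, b, lo, hi)
        if i < len(layers) - 1:
            lo, hi = relu_bounds(lo, hi)
    return all(lo[true_class] > hi[j] for j in range(len(lo)) if j != true_class)

layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),
    ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),
]
print(certify([1.0, 0.0], 0.1, layers, true_class=0))
print(certify([1.0, 0.0], 0.5, layers, true_class=0))
```

The bounds are sound but not tight: a `False` result means the check could not prove robustness for that radius, not that an adversarial example necessarily exists.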

smoothed_vision_transformers.ipynb [on nbviewer] demonstrates training a neural network using smoothed vision transformers for certified performance against patch attacks.

MNIST

fabric_for_deep_learning_adversarial_samples_fashion_mnist.ipynb [on nbviewer] shows how to use ART with deep learning models trained with the Fabric for Deep Learning (FfDL).

Hugging Face

huggingface_notebook.ipynb [on nbviewer] shows how to use ART with the Hugging Face API for image classification tasks.

hugging_face_evasion.ipynb [on nbviewer] shows how to use ART to perform evasion attacks on Hugging Face image classification models and defend them using adversarial training.

hugging_face_poisoning.ipynb [on nbviewer] shows how to use ART to poison Hugging Face image classification models and defend them using poisoning defenses.