Final Year Thesis Project (COMP4981H) for Computer Science Students in HKUST
This repository is no longer maintained or updated (as of May 2020).
Full experimental results and reports were stored on HKUST lab servers and may not be accessible here.
Adversarial attacks
We randomly extract 15 samples from the MNIST dataset and generate adversarial samples for each of them with various adversarial attacks (the original sample appears in the leftmost column). Except for the leftmost column, each of the remaining 14 columns shows a digit that could be misclassified by machine learning models (if the cell is not all black).
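For reference, a minimal sketch of the iterative FGSM (i-FGSM) attack in PyTorch, assuming a trained classifier `model` and MNIST inputs scaled to [0, 1]; the radius, step size, and iteration count below are illustrative defaults, not the exact settings used in our experiments.

```python
import torch
import torch.nn.functional as F

def i_fgsm(model, x, y, eps=0.2, eps_iter=0.02, n_iter=10):
    """Iterative FGSM: repeatedly take signed-gradient steps and project the
    perturbation back into an L-infinity ball of radius eps (illustrative values)."""
    x_adv = x.clone().detach()
    for _ in range(n_iter):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv + eps_iter * grad.sign()         # gradient-sign step
            x_adv = x + torch.clamp(x_adv - x, -eps, eps)  # project into the eps-ball
            x_adv = torch.clamp(x_adv, 0.0, 1.0)           # keep a valid pixel range
    return x_adv.detach()
```

Usage is simply `x_adv = i_fgsm(model, x, y)` on a batch of MNIST images and their labels.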
Layer signatures
Notations & Expressions
- LP_i: Layer Provenance of the i-th hidden layer
- y: ground-truth label, y': predicted label
- S: training set, P: provenance set
- S1: subset (first class) within training set, S2: subset (second class) within training set
- P1: subset (first class) within provenance set, P2: subset (second class) within provenance set
- TPR: True Positive Rate (A -> A)
- TNR: True Negative Rate (B -> B)
- FPR: False Positive Rate (B -> A)
- FNR: False Negative Rate (A -> B)
- h: Number of hidden layers (specifically for ReLU neural networks)
- adv_a: adversarial attack
- i_FGSM: Iterative Fast Gradient Sign Method, JSMA: Jacobian Saliency Map Attack, CWL2: CarliniWagner L2 Attack
- Values in parentheses indicate standard deviation. (A short sketch showing how TPR/TNR/FPR/FNR are computed follows this list.)
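A minimal sketch of how these rates are computed, under the assumption (consistent with the a->a / b->b usage later in this README) that adversarial (A) is the positive class and benign (B) the negative class:

```python
import numpy as np

def detection_rates(is_adv_true, is_adv_pred):
    """TPR/TNR/FPR/FNR with adversarial (A) as the positive class and
    benign (B) as the negative class (an assumption, see the lead-in)."""
    t = np.asarray(is_adv_true, dtype=bool)
    p = np.asarray(is_adv_pred, dtype=bool)
    tp = np.sum(t & p)       # A -> A
    tn = np.sum(~t & ~p)     # B -> B
    fp = np.sum(~t & p)      # B -> A
    fn = np.sum(t & ~p)      # A -> B
    return {"TPR": tp / (tp + fn), "TNR": tn / (tn + fp),
            "FPR": fp / (fp + tn), "FNR": fn / (fn + tp)}
```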
Common rules
- For Tables 1 to 3 and Experiments 1 to 4, TPR, TNR, FPR, and FNR are evaluated on 100 samples.
- For Tables 1 to 3 and Experiments 1 to 4, the task is to classify the digits 5 and 7 (a subset of MNIST).
- For Tables 1 to 3 and Experiments 1 to 4, if more than one LP is used, all LPs are concatenated into a single LP (see the extraction and concatenation sketch after this list).
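The precise definition of an LP is given in the report; assuming LP_i is the binary on/off activation pattern of the i-th ReLU layer, here is a minimal sketch of extracting and concatenating LPs for a feed-forward model (`model` is assumed to be an `nn.Sequential` of Linear/ReLU layers and `x` a batch of flattened MNIST images):

```python
import torch
import torch.nn as nn

def layer_provenances(model, x):
    """Return one binary signature per hidden layer (assumed definition:
    which ReLU units fire), for a feed-forward nn.Sequential model."""
    lps, h = [], x
    for layer in model:
        h = layer(h)
        if isinstance(layer, nn.ReLU):
            lps.append((h > 0).flatten(1).to(torch.int8))  # LP_i: on/off pattern
    return lps

def concat_lps(lps, indices=None):
    """Concatenate the selected LP_i into a single LP, as described above."""
    chosen = lps if indices is None else [lps[i] for i in indices]
    return torch.cat(chosen, dim=1)
```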
Experimental data collections (ReLU)
Table 1: TPR & TNR by LP_1 (adv_a=i_FGSM)
Table 2: TPR & TNR by LP_i combinations (adv_a=i_FGSM, h=4, y/y'=y)
Table 3: TPR & TNR by input augmentation (adv_a=i_FGSM, LPs=1, y/y'=y)
Experiments (ReLU)
Exp 1: Relationship between h and FPR & FNR (adv_a=i_FGSM, LPs=1, y/y'=y)
Exp 2: Relationship between |S| and FPR & FNR (adv_a=i_FGSM, LPs=1, y/y'=y)
Exp 3: Relationship between single LP_i and FPR & FNR (adv_a=i_FGSM, y/y'=y)
Exp 4: Relationship between LP_i combinations and FPR & FNR (adv_a=i_FGSM, y/y'=y)
Observations (ReLU)
- The position of a layer influences detection capability: when the LP is closer to the output, TPR increases and TNR decreases. One possible explanation is that, for LPs closer to the output, more samples (both benign and adversarial) are likely to fall into the same provenance.
- Different types of layers also have different detection capabilities.
- We do not need to leverage all LPs; a single LP can achieve similar adversarial detection capability.
- If LP_i is matched, LP_i+1 is extremely likely to be matched.
- An adversarial sample belongs to neither the provenance set of its ground-truth label nor the provenance set of its predicted label (an illustrative per-layer membership check is sketched after this list).
  - For the y' class, both benign and adversarial samples on a 4-hidden-layer ReLU network → [A, B, B, B] or [A, A, B, B].
  - For the y class, mostly [B, B, B, B] or [A, A, A, A].
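The exact matching rule behind these [A, B, ...] patterns is defined in the report. Purely as an illustration, assuming A and B name the two classes' per-layer provenance sets (P1 and P2, stored here as sets of hashable LP tuples), a sketch that records which set, if either, a sample's LP falls into at each layer (it reuses `layer_provenances` from the sketch above):

```python
def match_pattern(model, x, P1, P2):
    """For each hidden layer i, record 'A' if the single sample's LP_i appears
    in P1[i] (class A's provenance set), 'B' if it appears in P2[i], and '-'
    if it matches neither (set representations are assumptions)."""
    lps = layer_provenances(model, x)        # sketch from the Common rules section
    pattern = []
    for i, lp in enumerate(lps):
        key = tuple(lp[0].tolist())          # single-sample signature as a hashable key
        if key in P1[i]:
            pattern.append("A")
        elif key in P2[i]:
            pattern.append("B")
        else:
            pattern.append("-")
    return pattern                           # e.g. ['A', 'B', 'B', 'B']
```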
Experiments (CNN)
Exp 5: Potential Method 1 & Integrated LPs judgement (adv_a=i_FGSM, y/y'=y', model=CNN)
Exp 6: Potential Method 2 & Integrated LPs judgement (adv_a=i_FGSM, y/y'=y', model=CNN)
Exp 7: Potential Method 3 & Integrated LPs judgement (adv_a=i_FGSM, y/y'=y', model=CNN)
Exp 8: Potential Method 4 & Integrated LPs judgement (adv_a=i_FGSM, y/y'=y', model=CNN)
Observations (CNN)
- Inserting a simple dropout layer actually makes it harder to distinguish the propagation patterns of benign samples from those of adversarial samples (the opposite of our anticipation).
- By setting a differentiation line (a per-layer threshold) for each layer, we already have a strong capability to separate benign samples from adversarial samples (b->b and a->a are both > 0.9); a sketch of such a per-layer judgement follows this list. However, JSMA and CWL2 attacks have not yet been tested.
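The exact form of the per-layer differentiation line is specified in the report. As an illustration only, here is a sketch in which each layer votes "adversarial" when the sample's LP_i is farther, in Hamming distance, from every benign training LP_i than that layer's threshold, and the integrated judgement is a majority vote over layers; the distance measure and thresholds are assumptions, not the method's actual parameters.

```python
import torch

def judge_with_thresholds(lps, benign_lps, thresholds):
    """Per-layer 'differentiation line' judgement (illustrative assumption).
    lps[i]: the sample's LP_i, shape (1, n_units); benign_lps[i]: benign
    training LPs for layer i, shape (N, n_units); thresholds[i]: a scalar."""
    votes = []
    for lp, ref, thr in zip(lps, benign_lps, thresholds):
        # Hamming distance from the sample's signature to every benign signature
        dists = (lp.to(torch.float32) - ref.to(torch.float32)).abs().sum(dim=1)
        votes.append(bool(dists.min() > thr))  # this layer's vote: adversarial?
    return sum(votes) > len(votes) / 2         # integrated judgement: majority vote
```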
Experiments (PPRD)
Exp 15: PPRD training process (to_int=False, train=all, test=all, model=CNN)
Exp 16: PPRD training process (to_int=False, train=ENL1/CWL2/LINFPGD, test=all types, model=CNN)
Exp 17: one-PPRD training process (to_int=False, train=all, test=all, model=CNN)
Appendix 1.2 Original Rule-based Method & 4 potential improvements