Authors:

1. Siyuan Shi, netID: ss13376
2. Haotian Yi, netID: hy1651

Date: 2020-12-21

Conducted on **Colab**, following the STRIP strategy [3].

Currently we use the original data from CSAW-HackML-2020.

What the user should provide:

*Please also upload the original files from CSAW-HackML-2020, in case some variables are missing,* and modify the code as below if you want to test with another model and data:

**I**. To train a detection boundary (threshold) (in the **Preprocessing** section):
1. a clean validation set: modify the validation set path at *clean_validation_data_filename*
2. the badnet model to test: modify the badnet model paths at *model_filename* and *model_weights*

**II**. Runtime detection:

uncomment *result = test(bd_model, x_user, x_valid, threshold)* in the last block and comment out the current one (in the **Test** section)  
1. your input test set: uncomment and modify the *x_user, y_user* part (in the **Preprocessing** section)  
2. the badnet model to test: already loaded in part **I**

# Preprocessing

In [1]:
!unzip data.zip
!unzip models.zip

Archive:  data.zip
   creating: data/
  inflating: data/clean_test_data.h5  
  inflating: data/clean_validation_data.h5  
  inflating: data/data.txt           
  inflating: data/sunglasses_poisoned_data.h5  
Archive:  models.zip
   creating: models/
  inflating: models/anonymous_bd_net.h5  
  inflating: models/anonymous_bd_weights.h5  
  inflating: models/multi_trigger_multi_target_bd_net.h5  
  inflating: models/multi_trigger_multi_target_bd_weights.h5  
  inflating: models/sunglasses_bd_net.h5  
  inflating: models/sunglasses_bd_weights.h5  


In [None]:
# !python strip_eval.py ./data/clean_test_data.h5 ./data/sunglasses_poisoned_data.h5 ./models/sunglasses_bd_net.h5 quick

Getting clean data...
Clean data finished successful!
Getting test data...
Test data finished successful!
Loading model...

In [2]:
import keras
import sys
import h5py
import tqdm.notebook as tq
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
import scipy
import scipy.stats

In [3]:
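def data_loader(filepath):
    # The .h5 files store images channels-first (N, C, H, W) under 'data' and labels under 'label'
    with h5py.File(filepath, 'r') as data:
        x_data = np.array(data['data'])
        y_data = np.array(data['label'])
    # Convert to channels-last (N, H, W, C), the layout the Keras model expects
    x_data = x_data.transpose((0,2,3,1))

    return x_data, y_data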

def data_preprocess(x_data):
    return x_data/255
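The transpose in `data_loader` implies that the `.h5` files store images channels-first, as `(N, C, H, W)`, while the model consumes channels-last `(N, H, W, C)`. A minimal sketch on a synthetic array (the name `x_raw` and the shape `(2, 3, 55, 47)` are assumptions, chosen to match the `(200, 55, 47, 3)` shape printed later) shows the layout change and the scaling to `[0, 1]`:

```python
import numpy as np

# Synthetic stand-in for data['data']: 2 images, channels-first (N, C, H, W), uint8 pixels
x_raw = np.random.randint(0, 256, size=(2, 3, 55, 47), dtype=np.uint8)

# Same operations as data_loader + data_preprocess
x = x_raw.transpose((0, 2, 3, 1)) / 255   # -> (N, H, W, C), float64 in [0, 1]

print(x.shape)  # (2, 55, 47, 3)
```

Only the axis order changes: channel `c` of image `n` in `x_raw` becomes `x[n, :, :, c]`.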

Paths

In [4]:
clean_validation_data_filename = "data/clean_validation_data.h5"
poisoned_data_filename = "data/sunglasses_poisoned_data.h5"
clean_test_data_filename = "data/clean_test_data.h5"
model_filename = "models/sunglasses_bd_net.h5"
model_weights = "models/sunglasses_bd_weights.h5"

Load data set

In [5]:
# x_valid: clean validation data
# y_valid: ground-truth labels
x_valid, y_valid = data_loader(clean_validation_data_filename)
x_valid = data_preprocess(x_valid)

# x_test_poison: poisoned test data
# y_test_poison: poisoned labels (the attack's target label), which is 0
# e.g. y_test_clean[2] is 823, y_test_poison[2] is 0
x_test_poison, y_test_poison = data_loader(poisoned_data_filename)
x_test_poison = data_preprocess(x_test_poison)

# x_test_clean: clean test data
# y_test_clean: ground-truth labels
x_test_clean, y_test_clean = data_loader(clean_test_data_filename)
x_test_clean = data_preprocess(x_test_clean)

# user test data: a mix of clean and poisoned images
# x_user, y_user = data_loader("user_test_data_file_path")
# x_user = data_preprocess(x_user)


Load model and weights

In [6]:
bd_model = keras.models.load_model(model_filename)
bd_model.load_weights(model_weights)

# Superimpose and calculate entropy
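This section defines the functions used to superimpose two images and to calculate the entropy of the model's predictions; they are used in the next section. STRIP relies on the observation that a trojaned input keeps being classified as the target class when clean images are superimposed on it, so its predictions have low entropy, whereas the predictions on a superimposed clean input vary and have higher entropy.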

In [21]:
def superImpose(overlay_img, origin_img, overlay_weight, back_weight):
    """
    Used to superimpose two images of format `numpy.ndarray`.
    The shape of the input should be exactly the same.
    Usage:
    >>> imgCombine = superImpose(x[0], x[1], 0.8, 0.8)
    Arguments:
        overlay_img: image used to perturb.
        origin_img: image to be tested.
        overlay_weight: weight of overlay_img.
        back_weight: weight of origin_img.
    Returns:
        ret: the linearly combined image.
    """
    ret = overlay_weight * overlay_img + back_weight * origin_img
    ret = np.clip(ret, 0.0, 1.0)
    return ret
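Because the weights used later (`overlay_weight=0.5`, `back_weight=0.9`) sum to 1.4, bright pixels in both images can exceed 1.0, and `np.clip` saturates them. A small numeric check on 1-pixel arrays (the values are illustrative, not from the dataset), using a local copy `super_impose` (a hypothetical name) of the same formula:

```python
import numpy as np

def super_impose(overlay_img, origin_img, overlay_weight, back_weight):
    # Same linear blend and clipping as superImpose above
    return np.clip(overlay_weight * overlay_img + back_weight * origin_img, 0.0, 1.0)

dark = np.array([0.2])
bright = np.array([1.0])
print(super_impose(dark, dark, 0.5, 0.9))      # about 0.28: no clipping needed
print(super_impose(bright, bright, 0.5, 0.9))  # 1.4 before clipping, saturated to 1.0
```

So the perturbed images are not a convex mix of the two inputs; bright regions are slightly washed out.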

In [22]:
def entropyCal(background, clean_set, model, overlay_weight=0.5, back_weight=0.9, n_sample=10):
    """
    Used to calculate the mean entropy of the predictions on `background`
    superimposed with each of `n_sample` randomly chosen images from `clean_set`.
    Usage:
    >>> H = entropyCal(x[0], x_valid, model, 0.5, 0.9)
    Arguments:
        background: origin input (the image to be tested)
        clean_set: clean images to superimpose
        model: model used to predict
        overlay_weight: weight of the clean overlay image.
        back_weight: weight of the tested image.
        n_sample: number of perturbed copies to average over.
    Returns:
        H: mean entropy.
    """
    # Randomly pick n_sample clean images and superimpose each on the tested image
    index_overlay = np.random.randint(0, clean_set.shape[0], size=n_sample)
    x_perturb = np.stack([superImpose(clean_set[k], background, overlay_weight, back_weight)
                          for k in index_overlay])
    # Predict all perturbed copies in one batch
    predictions = model(x_perturb).numpy()
    H = 0.0
    for pred in predictions:
        p = pred[pred > 0]  # skip zero probabilities: 0 * log2(0) would give nan in H
        H += -np.sum(p * np.log2(p))
    H /= n_sample
    return H
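The inner computation of `entropyCal` is the base-2 Shannon entropy of one softmax vector, H = -Σ p log2 p, with zero probabilities skipped. Two limiting cases show why low entropy flags a trojan: if the trigger forces the same class whatever is overlaid, predictions are close to one-hot and H is near 0, while a clean input blended with random faces gives spread-out predictions and a larger H. A standalone sketch (the function name `shannon_entropy` is hypothetical; it uses the equivalent form Σ p log2(1/p)):

```python
import numpy as np

def shannon_entropy(p):
    """Base-2 entropy of a probability vector, ignoring zero entries (log2(0) is undefined)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(np.sum(p * np.log2(1.0 / p)))

print(shannon_entropy([1.0, 0.0, 0.0, 0.0]))  # one-hot: 0 bits
print(shannon_entropy([0.25] * 4))            # uniform over 4 classes: 2 bits
print(shannon_entropy([0.5, 0.5]))            # two equally likely classes: 1 bit
```

With 1283 output classes the maximum possible entropy is log2(1283), roughly 10.3 bits, so the benign mean of about 0.58 printed below is still far from uniform.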

# Train Detection Boundary

## Define functions needed

In [23]:
def getEntropyList(x_test, x_valid, model, overlay_weight=0.5, back_weight=0.9):
    """
    Used to compute the list of entropies of each image in `x_test` superimposed with images in `x_valid`.
    Usage:
    >>> entropy = getEntropyList(x_test_clean, x_valid, bd_model, overlay_weight, back_weight)
    Arguments:
        x_test: images to be tested
        x_valid: clean validation data
        model: model to be tested
        overlay_weight: weight of the clean overlay image.
        back_weight: weight of the tested image.
    Returns:
        entropy: list of entropy values, one per image in `x_test`.
    """
    entropy = []
    for x_background in tq.tqdm(x_test):
        entropy.append(entropyCal(x_background, x_valid, model, overlay_weight, back_weight))
    return entropy

In [24]:
def computeThreshold(entropy_benign, frr=0.07):
    """
    Used to compute the detection threshold.
    A test image with entropy below the threshold is considered a backdoor image,
    otherwise a benign (clean) image.
    Usage:
    >>> threshold = computeThreshold(entropy_benigh, 0.05)
    Arguments:
        entropy_benign: list of entropies of clean inputs superimposed with clean inputs
        frr: preset False Rejection Rate (fraction of clean images flagged as backdoored)
    Returns:
        threshold: the computed threshold.
    """
    # Fit a normal distribution to the benign entropies (sigma is the standard deviation)
    (mu, sigma) = scipy.stats.norm.fit(entropy_benign)
    print(f"Clean image: Mean={mu}, Std={sigma}")
    
    # frr-quantile of the fitted normal: about a fraction frr of clean inputs fall below it
    threshold = scipy.stats.norm.ppf(frr, loc = mu, scale = sigma)
    print(f"Computed threshold is {threshold}")
    return threshold

## Calculate threshold & evaluate on given poison dataset

In [25]:
overlay_weight = 0.5
back_weight = 0.9

In [26]:
entropy_benigh = getEntropyList(x_valid, x_valid, bd_model, overlay_weight, back_weight)
threshold = computeThreshold(entropy_benigh)


Clean image: Mean=0.5797917308969354, Std=0.31299767755266367
Computed threshold is 0.11787256652379735
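`computeThreshold` fits a normal distribution to the benign entropies and returns its `frr` quantile. As a cross-check, a sketch using only the standard library's `statistics.NormalDist` (swapped in for `scipy.stats.norm.ppf`; the helper name `threshold_from_fit` is hypothetical) should reproduce the printed threshold, up to floating-point error, from the `mu` and `sigma` values printed by `computeThreshold`:

```python
from statistics import NormalDist, fmean, pstdev

def threshold_from_fit(entropies, frr=0.07):
    # Maximum-likelihood normal fit: mean and population std (ddof=0), like scipy.stats.norm.fit
    mu, sigma = fmean(entropies), pstdev(entropies)
    return NormalDist(mu, sigma).inv_cdf(frr)

# mu and sigma as printed by computeThreshold above
mu, sigma = 0.5797917308969354, 0.31299767755266367
print(NormalDist(mu, sigma).inv_cdf(0.07))  # close to the printed threshold
```

Numerically, Φ⁻¹(0.07) ≈ -1.476, so the threshold is about 0.580 - 1.476 × 0.313 ≈ 0.118.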


In [27]:
# FAR: fraction of poisoned inputs whose entropy is above the threshold (not detected)
entropy_trojan = getEntropyList(x_test_poison, x_valid, bd_model, overlay_weight, back_weight)
FAR = sum(i > threshold for i in entropy_trojan)
print(FAR/x_test_poison.shape[0])



0.06773187840997662


In [28]:
min_benign_entropy = min(entropy_benigh)
max_trojan_entropy = max(entropy_trojan)

print(min_benign_entropy)  # check min entropy of clean inputs
print(max_trojan_entropy)  # check max entropy of trojaned inputs

2.7963723439776043e-11
0.8553319770384402


The minimum of `entropy_benigh` (about 2.8e-11) is smaller than the maximum of `entropy_trojan` (about 0.855), so the entropy distributions of benign and trojaned inputs overlap: no threshold separates them perfectly, and any choice trades the FRR against the FAR.

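The benign entropies are bounded below by 0 (the minimum printed above is about 2.8e-11), so the normal fit is only an approximation of their distribution. One possible alternative, not used in this notebook, is to take the empirical `frr` quantile of the benign entropies directly. A minimal sketch on made-up values (the function name `empirical_threshold` is hypothetical):

```python
import numpy as np

def empirical_threshold(entropy_benign, frr=0.07):
    # Value below which a fraction `frr` of the clean entropies lie (no normality assumption)
    return float(np.quantile(np.asarray(entropy_benign), frr))

toy_benign = np.linspace(0.0, 1.0, 101)   # made-up entropies 0.00, 0.01, ..., 1.00
t = empirical_threshold(toy_benign, 0.07)
print(t)                                  # about 0.07
print(np.mean(toy_benign < t))            # fraction of toy clean inputs rejected, about 0.07
```

On the real data this would be called as `empirical_threshold(entropy_benigh, 0.07)`; whether it detects more trojans than the normal-fit threshold at the same FRR would need to be measured.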
# Custom input & Test

In [29]:
# Custom test set: the first 100 images are clean, the last 100 are poisoned
cust_x_test = np.append(x_test_clean[:100], x_test_poison[:100], axis=0)
cust_y_test = np.append(y_test_clean[:100], y_test_poison[:100], axis=0)

In [30]:
cust_x_test.shape

(200, 55, 47, 3)

In [31]:
entropy = []
for img in cust_x_test:
    entropy.append(entropyCal(img, x_valid, bd_model))

In [32]:
sum(i < threshold for i in entropy)

96

In [33]:
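frr = 0  # clean images (first 100) wrongly flagged as backdoored
far = 0  # poisoned images (last 100) that pass as clean
bad_idx = []  # indices of all images flagged as backdoored
for i in range(len(entropy)):
    if entropy[i]<threshold:
        bad_idx.append(i)
    if i<100 and entropy[i]<threshold:
        frr += 1
    if i>=100 and entropy[i]>=threshold:
        far += 1
frr /= 100
far /= 100
print(f"FRR is {frr}, FAR is {far}")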

FRR is 0.03, FAR is 0.07
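The two rates computed above can be written as one reusable function over separate clean and trojaned entropy lists, which avoids hard-coding the 100/100 split of `cust_x_test`. A sketch (the function name `detection_rates` is hypothetical) that follows the convention of the cell above, flagging an input when its entropy is strictly below the threshold:

```python
import numpy as np

def detection_rates(entropy_clean, entropy_trojan, threshold):
    """Return (FRR, FAR) for a STRIP-style entropy threshold.

    FRR: fraction of clean inputs flagged as backdoored (entropy < threshold).
    FAR: fraction of trojaned inputs accepted as clean (entropy >= threshold).
    """
    clean = np.asarray(entropy_clean)
    trojan = np.asarray(entropy_trojan)
    frr = float(np.mean(clean < threshold))
    far = float(np.mean(trojan >= threshold))
    return frr, far

# Made-up entropies: 1 of 4 clean inputs is rejected, 1 of 4 trojaned inputs slips through
print(detection_rates([0.05, 0.6, 0.8, 0.9], [0.0, 0.01, 0.02, 0.3], threshold=0.118))
# (0.25, 0.25)
```

On this notebook's data the equivalent call would be `detection_rates(entropy[:100], entropy[100:], threshold)`.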


In [34]:
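prob = bd_model(cust_x_test).numpy()
# Predicted class index (0-indexed) for each image
result = np.argmax(prob, axis=1)
# Images flagged by STRIP get the extra class index N (N = number of output classes)
bad_class = prob.shape[1]
for idx in bad_idx:
    result[idx] = bad_class
# change to 1-index, so N + 1 marks a backdoored image
result = result + 1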

In [38]:
result[10:20]

array([  50,  109, 1036,   87,  538,  167,  758,  173,  262,  394])
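The marking step in the cell above (and in `test` below) reserves one extra label for rejected inputs: with N output classes, `argmax` yields 0..N-1, flagged inputs get index N, and adding 1 shifts everything to 1..N+1, so label N+1 means "backdoored". A toy sketch with N = 3 classes and made-up probabilities:

```python
import numpy as np

prob = np.array([[0.7, 0.2, 0.1],    # made-up softmax outputs for 3 inputs, N = 3 classes
                 [0.1, 0.8, 0.1],
                 [0.0, 0.1, 0.9]])
bad_idx = [2]                        # suppose STRIP flagged input 2

result = np.argmax(prob, axis=1)     # 0-indexed predictions: [0, 1, 2]
result[bad_idx] = prob.shape[1]      # flagged inputs get the extra index N = 3
result = result + 1                  # 1-indexed labels; N + 1 = 4 marks a backdoor
print(result)                        # [1 2 4]
```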

# Test

In [36]:
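def test(model, x_test, x_valid_clean, threshold):
    """
    Runtime STRIP detection on `x_test`.
    Arguments:
        model: badnet model to test
        x_test: images to classify, preprocessed to [0, 1]
        x_valid_clean: clean validation images used for superimposing
        threshold: entropy threshold returned by computeThreshold
    Returns:
        1-indexed predicted labels; label N + 1 (N = number of classes) marks a detected backdoored image.
    """
    # STRIP entropy of every test image
    entropy = []
    for img in x_test:
        entropy.append(entropyCal(img, x_valid_clean, model))
    print("Entropy calculation finished...")
    
    # Entropy below the threshold: the prediction barely changes under perturbation, so flag it
    bad_idx = []
    for i in range(len(entropy)):
        if entropy[i]<threshold:
            bad_idx.append(i)
    print(f"{len(bad_idx)} backdoored image(s) found...")

    print("Start marking attacked predictions...")
    prob = model(x_test).numpy()
    result = []
    for p in prob:
        result.append(np.argmax(p))
    # Flagged images get the extra class index N = number of output classes
    bad_class = prob[0].shape[0]
    for idx in bad_idx:
        result[idx] = bad_class
    print("Finish!")
    # Convert to 1-indexed labels
    return np.array(result) + 1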

In [37]:
# result = test(bd_model, x_user, x_valid, threshold)
result = test(bd_model, cust_x_test, x_valid, threshold)

Entropy calculation finished...
98 backdoored image(s) found...
Start marking attacked predictions...
Finish!


Label 1284 (N + 1, where N = 1283 is the number of output classes) is the backdoor class: it marks inputs detected as backdoored.

# References
1. [CSAW HackML 2020](https://wp.nyu.edu/csaw_hackml_2020/instructions/)
2. [CSAW HackML 2020 GitHub Repo](https://github.com/csaw-hackml/CSAW-HackML-2020)
3. Gao, Yansong, Chang Xu, Derui Wang, Shiping Chen, Damith C. Ranasinghe, and Surya Nepal. “STRIP: A defence against trojan attacks on deep neural networks.” In Proceedings of the 35th Annual Computer Security Applications Conference, pp. 113-125. 2019.