# Backdoor Unlearning

## Outline

1. Experimental setup (generating configs)
2. Clean model training
3. Poisoned model training
4. First-order unlearning
5. Second-order unlearning
6. Visualizing results


## Experimental Setup

- All configurations to test are defined in the `[train|poison|unlearn].json` files (see below).
- If a parameter is passed as a list, all combinations of the listed values are tested in a grid-search manner.
- Only a single combination is provided for this demo. The original combinations are in `Applications/Poisoning/configs`.
- `gen_configs` generates a directory and configuration files for each combination. An evaluation script later uses them to run the experiments, which allows for parallelization and distributed execution.
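
As a rough illustration of the grid-search expansion described above, the sketch below turns a parameter dictionary with list-valued entries into one dictionary per combination. The helper `expand_grid` and the example keys (`budget`, `seed`, `lr`) are hypothetical and not taken from `gen_configs`, whose actual implementation may differ.

```python
from itertools import product


def expand_grid(params):
    """Expand a dict whose list-valued entries span a grid into one dict per combination."""
    keys = list(params)
    # wrap scalars into one-element lists so that every entry can be iterated
    value_lists = [v if isinstance(v, list) else [v] for v in params.values()]
    return [dict(zip(keys, combo)) for combo in product(*value_lists)]


# hypothetical poison configuration: two budgets x two seeds, one fixed value
poison_grid = expand_grid({'budget': [5000, 10000], 'seed': [42, 43], 'lr': 0.001})
for cfg in poison_grid:
    # one directory per combination, mirroring names such as 'budget-10000/seed-42'
    print(f"budget-{cfg['budget']}/seed-{cfg['seed']}", cfg)
```

Each resulting dictionary could then be written to its own directory, which is what makes the runs independent of each other and easy to distribute.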

In [8]:
import sys
sys.path.append('../')




In [9]:
# only if you are using CUDA devices
import os
os.environ["CUDA_VISIBLE_DEVICES"]="1"


In [10]:
from conf import BASE_DIR
from Applications.Poisoning.gen_configs import main as gen_configs

model_folder = BASE_DIR/'models'/'poisoning'
train_conf = BASE_DIR/'Applications'/'Poisoning'/'configs'/'demo'/'train.json'
poison_conf = BASE_DIR/'Applications'/'Poisoning'/'configs'/'demo'/'poison.json'
unlearn_conf = BASE_DIR/'Applications'/'Poisoning'/'configs'/'demo'/'unlearn.json'

gen_configs(model_folder, train_conf, poison_conf, unlearn_conf)

In [11]:
from Applications.Poisoning.poison.poison_models import train_poisoned
from Applications.Poisoning.configs.demo.config import Config

poisoned_folder = model_folder/'budget-10000'/'seed-42'
clean_folder = model_folder/'clean'
first_unlearn_folder = model_folder/'budget-10000'/'seed-42'/'first-order'
second_unlearn_folder = model_folder/'budget-10000'/'seed-42'/'second-order'


poison_kwargs = Config.from_json(poisoned_folder/'poison_config.json')
train_kwargs = Config.from_json(poisoned_folder/'train_config.json')


## Clean Model Training

- Train a clean model for reference.

## Train Poisoned Model

- Select one of the generated configurations and train a poisoned model.
- The poisoning uses an `injector` object, which can be persisted for reproducibility: given the same seed, it injects the backdoors or label noise into the same samples. Our experiments use label-noise poisoning.
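
To make the role of the seed concrete, here is a minimal sketch of seeded label-noise injection. It is not the `LabelflipInjector` API: the function `flip_labels` and its parameters are hypothetical, and labels are assumed to be integer-encoded.

```python
import numpy as np


def flip_labels(y, budget, num_classes, seed):
    """Return a poisoned copy of the integer labels `y` and the indices that were changed."""
    rng = np.random.default_rng(seed)
    # choose `budget` distinct samples to poison
    injected_idx = rng.choice(len(y), size=budget, replace=False)
    y_poisoned = y.copy()
    # shift each chosen label by a random non-zero offset so that it always changes
    offsets = rng.integers(1, num_classes, size=budget)
    y_poisoned[injected_idx] = (y[injected_idx] + offsets) % num_classes
    return y_poisoned, injected_idx


y_clean = np.arange(1000) % 10
y_a, idx_a = flip_labels(y_clean, budget=100, num_classes=10, seed=42)
y_b, idx_b = flip_labels(y_clean, budget=100, num_classes=10, seed=42)
# the same seed selects the same samples and produces the same noisy labels
print(np.array_equal(idx_a, idx_b), np.array_equal(y_a, y_b))
```

Keeping the injected indices around is what later allows the difference set `Z` to be built from exactly the poisoned samples.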

In [13]:
poisoned_weights = poisoned_folder/'best_model.hdf5'       # model that has been trained on poisoned data
fo_repaired_weights = poisoned_folder/'fo_repaired.hdf5'   # model weights after unlearning (first-order)
so_repaired_weights = poisoned_folder/'so_repaired.hdf5'   # model weights after unlearning (second-order)
injector_path = poisoned_folder/'injector.pkl'             # cached injector for reproducibility
clean_results = model_folder/'clean'/'train_results.json'  # path to reference results on clean dataset


## Unlearning

- Perform the first-order and second-order unlearning. The unlearning is wrapped in a function that
    - loads the clean data and saves the original labels
    - injects the poison (label noise)
    - creates the difference set `Z` using `injector.injected_idx`
    - runs the main unlearning in `Applications.Poisoning.unlearn.common.py:unlearn_update`, which in turn calls the `iter_approx_retraining` method
- The variable naming follows these conventions:
    - `z_x`, `z_y`: features (x) and labels (y) in set `Z`
    - `z_x_delta`, `z_y_delta`: changed features and labels (`z_x == z_x_delta` here, and `z_y_delta` contains the original (fixed) labels)
- Why the procedure is iterative:
    - The approximate retraining is configured to unlearn the desired changes in one step.
    - To avoid putting many redundant erroneous samples into the changing set `Z`, the iterative version
        - takes a sub-sample (`prio_idx`) of size `hvp_batch_size` from the delta set `Z`
        - makes one unlearning step
        - recalculates the delta set and focuses only on the remaining errors
    - The idea is that, as in learning, it is better to work iteratively in batches, since the approximation quality of the inverse Hessian matrix decreases with the number of samples included (and with the step size).
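
The iterative scheme above can be sketched on a toy linear softmax model in NumPy. This is not the repository's `iter_approx_retraining`: the function names, the parameter `batch_size` (standing in for `hvp_batch_size`), and the update rule `theta - tau * (grad on corrected data - grad on poisoned data)` are assumptions made for illustration. A second-order variant would replace the scalar `tau` with an approximate inverse Hessian-vector product.

```python
import numpy as np


def softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def grad(theta, x, y_onehot):
    """Gradient of the summed cross-entropy loss of a linear softmax model."""
    return x.T @ (softmax(x @ theta) - y_onehot)


def iter_first_order_unlearn(theta, x, y_poisoned, y_orig, delta_idx, tau, batch_size, max_steps):
    """Correct the poisoned labels batch by batch with first-order updates."""
    theta = theta.copy()
    onehot = np.eye(theta.shape[1])
    delta_idx = np.asarray(delta_idx, dtype=int)
    for _ in range(max_steps):
        # recompute the delta set: poisoned samples that the model still gets wrong
        preds = (x[delta_idx] @ theta).argmax(axis=1)
        remaining = delta_idx[preds != y_orig[delta_idx]]
        if remaining.size == 0:
            break
        # take a sub-sample of the remaining errors
        prio_idx = remaining[:batch_size]
        z_x, z_y = x[prio_idx], onehot[y_poisoned[prio_idx]]
        z_x_delta, z_y_delta = z_x, onehot[y_orig[prio_idx]]
        # one unlearning step on this batch
        diff = grad(theta, z_x_delta, z_y_delta) - grad(theta, z_x, z_y)
        theta = theta - tau * diff
    return theta


# two samples whose labels were swapped, and a model that memorized the swapped labels
x = np.array([[1.0, 0.0], [0.0, 1.0]])
y_orig = np.array([0, 1])
y_poisoned = np.array([1, 0])
theta_poisoned = np.array([[-1.0, 1.0], [1.0, -1.0]])

theta_repaired = iter_first_order_unlearn(
    theta_poisoned, x, y_poisoned, y_orig, delta_idx=[0, 1], tau=2.0, batch_size=1, max_steps=5)
print((x @ theta_repaired).argmax(axis=1))
```

With `batch_size=1` the loop fixes one sample per step and stops as soon as the delta set is empty, which is the behaviour the bullet points describe. The value of `tau` is chosen by hand for this toy problem.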

In [14]:
from Applications.Poisoning.unlearn.first_order import run_experiment as fo_experiment
from Applications.Poisoning.unlearn.second_order import run_experiment as so_experiment

fo_unlearn_kwargs = Config.from_json(poisoned_folder/'first-order'/'unlearn_config.json')
so_unlearn_kwargs = Config.from_json(poisoned_folder/'second-order'/'unlearn_config.json')


In [15]:
from Applications.Poisoning.train import main as train
from Applications.Poisoning.evaluate import evaluate

# train one clean and one poisoned model
# datasets = ['Cifar10', 'Cifar100', 'SVHN', 'FashionMnist']
datasets = ['Cifar10', 'Cifar100', 'SVHN']
modelnames = ['VGG16']
# modelnames = ['VGG16', 'RESNET50']

In [None]:
import json
import os

results = {
    'clean': {},
    'poisoned': {},
    'first_order_unlearning': {},
    'second_order_unlearning': {}
}

update_targets = ['feature_extractor', 'classifier']

for dataset in datasets:
    results['clean'][dataset] = {}
    results['poisoned'][dataset] = {}
    results['first_order_unlearning'][dataset] = {}
    results['second_order_unlearning'][dataset] = {}

    print('#' * 60)
    print(" UNLEARNING ")
    print('#' * 60)
    print('\n\n')

    for modelname in modelnames:
        # the poisoned baseline does not depend on the update target, so evaluate it once
        print(f"* Evaluating {modelname} on {dataset} poisoned model *")
        poisoned_accuracy = evaluate(model_folder=poisoned_folder, dataset=dataset, modelname=modelname, type='poisoned')
        results['poisoned'][dataset][modelname] = poisoned_accuracy
        results['first_order_unlearning'][dataset][modelname] = {}
        results['second_order_unlearning'][dataset][modelname] = {}

        for update_target in update_targets:
            # pass the loop variable through instead of a hard-coded 'feature_extractor'
            print(f"* First-order unlearning ({update_target}) {modelname} on {dataset} poisoned model *")
            fo_experiment(poisoned_folder/'first-order', train_kwargs, poison_kwargs, fo_unlearn_kwargs, dataset=dataset, modelname=modelname, update_target=update_target)
            print(f"* Evaluating {modelname} on {dataset} after first-order unlearning *")
            fo_repaired_accuracy = evaluate(model_folder=first_unlearn_folder, dataset=dataset, modelname=modelname, type='repaired')
            results['first_order_unlearning'][dataset][modelname][update_target] = fo_repaired_accuracy

            print(f"* Second-order unlearning ({update_target}) {modelname} on {dataset} poisoned model *")
            so_experiment(poisoned_folder/'second-order', train_kwargs, poison_kwargs, so_unlearn_kwargs, dataset=dataset, modelname=modelname, update_target=update_target)
            print(f"* Evaluating {modelname} on {dataset} after second-order unlearning *")
            so_repaired_accuracy = evaluate(model_folder=second_unlearn_folder, dataset=dataset, modelname=modelname, type='repaired')
            results['second_order_unlearning'][dataset][modelname][update_target] = so_repaired_accuracy

In [17]:
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import torch
import torch.nn.functional as F
from sklearn.metrics import confusion_matrix, accuracy_score
from torch.utils.data import DataLoader
import tensorflow as tf
from tensorflow.keras.models import load_model

DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")


def calculate_confidence(model, x_data):
    y_pred = model.predict(x_data)
    probs = tf.nn.softmax(y_pred, axis=1).numpy()
    max_probs = np.max(probs, axis=1)
    return max_probs


def calculate_confusion_matrix(model, x_data, y_true):
    y_pred = model.predict(x_data)
    y_pred_classes = np.argmax(y_pred, axis=1)
    y_true_classes = np.argmax(y_true, axis=1) if len(y_true.shape) > 1 else y_true
    cm = confusion_matrix(y_true_classes, y_pred_classes)

    accuracy = np.trace(cm) / np.sum(cm)
    if cm.shape == (2, 2):
        tn, fp, fn, tp = cm.ravel()
    else:
        tn, fp, fn, tp = 0, 0, 0, 0
        for i in range(cm.shape[0]):
            for j in range(cm.shape[1]):
                if i == j:
                    if i == 1:
                        tp = cm[i, j]
                    else:
                        tn += cm[i, j]
                else:
                    if i == 1:
                        fn += cm[i, j]
                    else:
                        fp += cm[i, j]
    return tn, fp, fn, tp

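def calculate_forget_score(tn_before, fp_before, fn_before, tp_before, tn_after, fp_after, fn_after, tp_after):
    delta = 0.01  # also keeps the denominators below non-zero
    tpr_before = tp_before / (tp_before + fn_before + delta)
    fpr_before = fp_before / (fp_before + tn_before + delta)
    fnr_before = fn_before / (tp_before + fn_before + delta)
    tpr_after = tp_after / (tp_after + fn_after + delta)
    fpr_after = fp_after / (fp_after + tn_after + delta)
    fnr_after = fn_after / (tp_after + fn_after + delta)

    # The logarithms need rates, not raw counts: `1 - delta - fn` is negative
    # for any false-negative count above zero, which turns the term into NaN.
    epsilon = np.nanmax([
        np.log(1 - delta - fpr_after) - np.log(tpr_after),
        np.log(1 - delta - fnr_after) - np.log(tpr_after),
        np.log(1 - delta - fpr_before) - np.log(tpr_before),
        np.log(1 - delta - fnr_before) - np.log(tpr_before)
    ])

    return epsilon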


def plot_confusion_matrix(y_true, y_pred, title):
    if len(y_true.shape) > 1:
        y_true = np.argmax(y_true, axis=1)
    cm = confusion_matrix(y_true, y_pred)
    plt.figure(figsize=(8, 6))
    sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
    plt.title(title)
    plt.ylabel('True label')
    plt.xlabel('Predicted label')
    plt.show()


def evaluate_model_accuracy(model, x_test, y_test):
    y_pred = model.predict(x_test)
    y_pred_classes = np.argmax(y_pred, axis=1)
    y_true_classes = np.argmax(y_test, axis=1) if len(y_test.shape) > 1 else y_test
    accuracy = accuracy_score(y_true_classes, y_pred_classes)
    return accuracy

def load_and_evaluate_models(datasets, models, clean_folder, unlearn_folder):
    results = {}
    
    for dataset_name, dataset in datasets.items():
        results[dataset_name] = {}
        
        # Load the dataset
        (x_train, y_train), (x_test, y_test), (x_valid, y_valid) = dataset.load()
        
        for model_name in models[dataset_name]:
            model_fn = models[dataset_name][model_name]
            results[dataset_name][model_name] = {}
            print(f"Evaluating {model_name} on {dataset_name}")
            
            try:
                # Load the clean model
                model_clean = model_fn()
                model_clean.load_weights(clean_folder / f'{model_name}_best_model.hdf5')

                # Load the unlearned model
                model_unlearned = model_fn()
                model_unlearned.load_weights(unlearn_folder / f'{model_name}_repaired_model.hdf5')
            except Exception as err:
                # report the cause instead of silently swallowing every exception
                print(f"Error loading models for {model_name} on {dataset_name}: {err}")
                continue
            # Evaluate the accuracy of the models
            accuracy_clean = evaluate_model_accuracy(model_clean, x_test, y_test)
            accuracy_unlearned = evaluate_model_accuracy(model_unlearned, x_test, y_test)
            print(f"Accuracy of the clean model: {accuracy_clean:.4f}")
            print(f"Accuracy of the unlearned model: {accuracy_unlearned:.4f}")

            # Compute confusion matrix for clean model
            y_pred_clean = model_clean.predict(x_test).argmax(axis=1)
            #plot_confusion_matrix(y_test, y_pred_clean, f'{model_name} Clean Model Confusion Matrix')

            # Compute confusion matrix for unlearned model
            y_pred_unlearned = model_unlearned.predict(x_test).argmax(axis=1)
            #plot_confusion_matrix(y_test, y_pred_unlearned, f'{model_name} Unlearned Model Confusion Matrix')


            # Compute confidence and confusion matrix for clean model
            clean_confidences = calculate_confidence(model_clean, x_test)
            tn_clean, fp_clean, fn_clean, tp_clean = calculate_confusion_matrix(model_clean, x_test, y_test)

            # Compute confidence and confusion matrix for unlearned model
            unlearning_confidences = calculate_confidence(model_unlearned, x_test)
            tn_unlearned, fp_unlearned, fn_unlearned, tp_unlearned = calculate_confusion_matrix(model_unlearned, x_test, y_test)

            # Calculate forget score
            forget_score = calculate_forget_score(tn_clean, fp_clean, fn_clean, tp_clean, tn_unlearned, fp_unlearned, fn_unlearned, tp_unlearned)
            print(f"Forget Score for {model_name} on {dataset_name}: {forget_score:.4f}")

            results[dataset_name][model_name] = {
                'clean_accuracy': accuracy_clean,
                'unlearned_accuracy': accuracy_unlearned,
                'forget_score': forget_score
            }

    return results






In [None]:
import sys
sys.path.append('../')
# import TensorDataset
from torch.utils.data import DataLoader, TensorDataset
import os
from conf import BASE_DIR
from Applications.Poisoning.gen_configs import main as gen_configs
from Applications.Poisoning.model import (
    extractfeatures_VGG16, classifier_VGG16,
    extractfeatures_RESNET50, classifier_RESNET50,
    get_VGG16_CIFAR100, get_VGG16_CIFAR10, get_VGG16_SVHN,
    get_RESNET50_CIFAR100, get_RESNET50_CIFAR10, get_RESNET50_SVHN,
    extractfeatures_RESNET50_CIFAR100, extractfeatures_VGG16_CIFAR100,
    classifier_RESNET50_CIFAR100, classifier_VGG16_CIFAR100,
)
from Applications.Poisoning.dataset import Cifar10, SVHN, FashionMnist, Cifar100


model_folder = BASE_DIR/'models'/'poisoning'

datasets = {
    'Cifar10': Cifar10,
    'SVHN': SVHN,
    'Cifar100': Cifar100
}

models = {
    'Cifar10': {
        'Cifar10_VGG16': get_VGG16_CIFAR10,
        'Cifar10_RESNET50': get_RESNET50_CIFAR10,
    },
    'SVHN': {
        'SVHN_VGG16': get_VGG16_SVHN,
        'SVHN_RESNET50': get_RESNET50_SVHN,
    },
    'Cifar100': {
        'Cifar100_VGG16': get_VGG16_CIFAR100,
        'Cifar100_RESNET50': get_RESNET50_CIFAR100,
    }
}


clean_folder = model_folder/'clean'
poisoned_folder = model_folder/'budget-10000'/'seed-42'
first_unlearn_folder = model_folder/'budget-10000'/'seed-42'/'first-order'
second_unlearn_folder = model_folder/'budget-10000'/'seed-42'/'second-order'

# Run evaluation with load_and_evaluate_models
results = load_and_evaluate_models(datasets, models, clean_folder, first_unlearn_folder)

# Print final results
for dataset_name, dataset_results in results.items():
    for model_name, model_results in dataset_results.items():
        print(f"{dataset_name} - {model_name}:")
        print(f"  Clean Accuracy: {model_results['clean_accuracy']:.4f}")
        print(f"  Unlearned Accuracy: {model_results['unlearned_accuracy']:.4f}")
        print(f"  Forget Score: {model_results['forget_score']:.4f}")

In [None]:
# import json
# import os

# results = {
#     'clean': {},
#     'poisoned': {},
#     'first_order_unlearning': {},
#     'second_order_unlearning': {}
# }

# update_targets = ['feature_extractor', 'classifier']
# for dataset in datasets:
#     for modelname in modelnames:
#         for update_target in update_targets:
#             print(f"* First-order unlearning {modelname} on {dataset} poisoned model *")
#             fo_experiment(poisoned_folder/'first-order', train_kwargs, poison_kwargs, fo_unlearn_kwargs, dataset=dataset, modelname=modelname, update_target=update_target)
#             print(f" * Second-order unlearning {modelname} on {dataset} poisoned model *")  
#             so_experiment(poisoned_folder/'second-order', train_kwargs, poison_kwargs, so_unlearn_kwargs, dataset=dataset, modelname=modelname, update_target=update_target)


In [20]:
from Applications.Poisoning.model import get_VGG16_CIFAR10
from tensorflow.keras.models import clone_model

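clean_folder = BASE_DIR/'models'/'poisoning'/'clean'

clean_model = get_VGG16_CIFAR10()
clean_model.load_weights(clean_folder/'Cifar10_VGG16_best_model.hdf5')

# clone_model copies only the architecture; the clones start with freshly
# initialized weights, so copy the loaded weights over explicitly
fo_model = clone_model(clean_model)
fo_model.set_weights(clean_model.get_weights())
so_model = clone_model(clean_model)
so_model.set_weights(clean_model.get_weights())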


# fo_experiment(poisoned_folder/'first-order', train_kwargs, poison_kwargs, fo_unlearn_kwargs, dataset="Cifar10", modelname='VGG16')

fo_model.summary()

Loading weights from None
Model: "model_5"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_6 (InputLayer)        [(None, 32, 32, 3)]       0         
                                                                 
 conv2d_35 (Conv2D)          (None, 32, 32, 64)        1792      
                                                                 
 batch_normalization_45 (Bat  (None, 32, 32, 64)       256       
 chNormalization)                                                
                                                                 
 leaky_re_lu_35 (LeakyReLU)  (None, 32, 32, 64)        0         
                                                                 
 conv2d_36 (Conv2D)          (None, 32, 32, 64)        36928     
                                                                 
 batch_normalization_46 (Bat  (None, 32, 32, 64)       256       
 chNormalization)                

In [22]:
from tensorflow.keras.layers import Flatten
from tensorflow.keras.models import Model

# Auto-generated layer names ('dense_2', 'dropout_27', ...) change between sessions and
# raised `ValueError: No such layer: dense_2` here, so split the model by layer type instead.
flatten_idx = next(i for i, layer in enumerate(fo_model.layers) if isinstance(layer, Flatten))

# feature extractor: everything up to the dropout layer in front of Flatten
fo_fe = Model(inputs=fo_model.input, outputs=fo_model.layers[flatten_idx - 1].output)

# classifier: re-apply all layers that follow Flatten (Dense/BatchNorm/ReLU/Dropout blocks)
classifier_input = fo_model.layers[flatten_idx].output
predictions = classifier_input
for layer in fo_model.layers[flatten_idx + 1:]:
    predictions = layer(predictions)

# Create a new model with only the classifier part
fo_cl = Model(inputs=classifier_input, outputs=predictions)

combined_model_fo = Model(inputs=fo_fe.input, outputs=predictions)

combined_model_fo.summary()

In [23]:
# fo_fe.summary()
# fo_cl.summary()
from tensorflow.keras.layers import Flatten
from tensorflow.keras.models import Model

# Hard-coded layer names raised `ValueError: No such layer: dropout_2` here, and the
# feature extractor was built from `fo_model` instead of `so_model`. Split by layer type.
flatten_idx = next(i for i, layer in enumerate(so_model.layers) if isinstance(layer, Flatten))

# feature extractor: everything up to the dropout layer in front of Flatten
so_fe = Model(inputs=so_model.input, outputs=so_model.layers[flatten_idx - 1].output)

# classifier: re-apply all layers that follow Flatten
classifier_input = so_model.layers[flatten_idx].output
predictions = classifier_input
for layer in so_model.layers[flatten_idx + 1:]:
    predictions = layer(predictions)

combined_model_so = Model(inputs=so_fe.input, outputs=predictions)

# combined_model_so.summary()

In [28]:
import os
from os.path import dirname as parent
from Applications.Poisoning.poison.injector import LabelflipInjector
from Applications.Poisoning.dataset import Cifar10, Mnist, SVHN, Cifar100
from util import UnlearningResult, reduce_dataset
from Applications.Poisoning.unlearn.common import evaluate_unlearning, get_delta_idx, batch_pred, unlearn_update, evaluate_model_diff
import json
from util import LoggedGradientTape, ModelTmpState, CSVLogger, measure_time, GradientLoggingContext


# modeltype = extractfeatures_VGG16
def first_unlearning(model=None):
    from tensorflow.keras.backend import clear_session
    from tensorflow.keras.models import clone_model

    # fall back to the first-order model only if no model is passed in
    if model is None:
        model = fo_model
    data = Cifar10.load()
    (x_train, y_train), _, _ = data
    y_train_orig = y_train.copy()

    injector_path = os.path.join(model_folder, 'injector.pkl')
    if os.path.exists(injector_path):
        injector = LabelflipInjector.from_pickle(injector_path)
    else:
        injector = LabelflipInjector(parent(model_folder), **poison_kwargs)
    # inject the label noise in both cases, so that a cached injector also poisons the data
    x_train, y_train = injector.inject(x_train, y_train)
    data = ((x_train, y_train), data[1], data[2])

    # prepare unlearning data
    reduction = 1.0
    x_train, y_train, idx_reduced, delta_idx = reduce_dataset(
        x_train, y_train, reduction=reduction, delta_idx=injector.injected_idx)
    print(f">> reduction={reduction}, new train size: {x_train.shape[0]}")
    # keep `delta_idx` as returned by reduce_dataset so that it matches the reduced arrays
    y_train_orig = y_train_orig[idx_reduced]
    data = ((x_train, y_train), data[1], data[2])

    # Get clean accuracy
    with open(model_folder/'clean'/'Cifar10_extractfeatures_VGG16_train_results.json', 'r') as f:
        clean_acc = json.load(f)['accuracy']

    clear_session()

    (x_train, y_train), (x_test, y_test), (x_valid, y_valid) = data
    # np.product and np.sum over a generator are deprecated; use np.prod and the built-in sum
    params = sum(int(np.prod(v.shape)) for v in model.trainable_variables)
    print(f'Nb params : {params}')

    new_theta, diverged, logs, duration_s = unlearn_update(
        x_train, y_train, y_train_orig, delta_idx, model, x_valid, y_valid, fo_unlearn_kwargs, verbose=0, cm_dir=None, log_dir=None)

    # evaluate a copy, so that `model` keeps its pre-unlearning weights for the comparison
    # (compile settings are assumed to match the ones used for `fo_model`)
    new_model = clone_model(model)
    new_model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
    new_model.set_weights(new_theta)
    new_model.save_weights('cifar10_vgg16_repaired.hdf5')

    acc_before, acc_after, diverged = evaluate_model_diff(
        model, new_model, x_valid, y_valid, diverged, 0, clean_acc)
    print(f'Accuracy before unlearning: {acc_before:.2f}')
    print(f'Accuracy after unlearning: {acc_after:.2f}')
    return new_model

from Applications.Poisoning.unlearn.core import get_gradients_diff, get_inv_hvp_lissa

def approx_retraining(model, z_x, z_y, z_x_delta, z_y_delta, order=2, hvp_x=None, hvp_y=None):
    if order == 1:
        tau = fo_unlearn_kwargs.get('tau', 1)

        # first order update
        diff = get_gradients_diff(model, z_x, z_y, z_x_delta, z_y_delta)
        d_theta = diff
        diverged = False
    elif order == 2:
        tau = 1  # tau not used by second-order

        # second order update
        diff = get_gradients_diff(model, z_x, z_y, z_x_delta, z_y_delta)
        # skip hvp if diff == 0
        if sum(np.sum(d) for d in diff) == 0:
            d_theta = diff
            diverged = False
        # elif conjugate_gradients:
        #     raise NotImplementedError('Conjugate Gradients is not implemented yet!')
        else:
            assert hvp_x is not None and hvp_y is not None
            # the LiSSA settings come from the second-order configuration
            d_theta, diverged = get_inv_hvp_lissa(model, hvp_x, hvp_y, diff, verbose=0, hvp_logger=None, **so_unlearn_kwargs)
    else:
        raise ValueError(f'Unsupported order: {order}')

    # assumption: hand the update direction, step size and divergence flag back to the caller
    return d_theta, tau, diverged


In [30]:
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.losses import categorical_crossentropy

fo_model.compile(optimizer=Adam(learning_rate=0.001, amsgrad=True), loss=categorical_crossentropy, metrics=['accuracy'])

fo_unlearner = first_unlearning(model=fo_model)
print('Unlearning with combined model')
combined_model_fo_unlearner = first_unlearning(model=combined_model_fo)



>> reduction=1.0, new train size: 50000
Nb params : 35357770



