# Breaching privacy

This notebook does the same job as the command-line tool `simulate_breach.py`, but also visualizes the user data and the reconstruction directly.

In [1]:
import torch
import hydra
from omegaconf import OmegaConf
%load_ext autoreload
%autoreload 2

import breaching
import logging, sys
logging.basicConfig(level=logging.INFO, handlers=[logging.StreamHandler(sys.stdout)], format='%(message)s')
logger = logging.getLogger()

### Initialize cfg object and system setup:

This loads the full configuration.
There are many possible options, but most of them rarely need to be changed. The cell below prints only a few of them.

Choose `shakespeare`, `wikitext`, or `stackoverflow` for `case/data=` here:

In [2]:
with hydra.initialize(config_path="config"):
    cfg = hydra.compose(config_name='cfg', overrides=["case/data=wikitext", "case/server=malicious-transformer",
                                                      "case.model=transformer3",
                                                      "attack=decepticon"])
    print(f'Investigating use case {cfg.case.name} with server type {cfg.case.server.name}.')
          
device = torch.device('cpu')
torch.backends.cudnn.benchmark = cfg.case.impl.benchmark
setup = dict(device=device, dtype=torch.float)
setup

Investigating use case single_imagenet with server type malicious_transformer_parameters.


{'device': device(type='cpu'), 'dtype': torch.float32}

### Modify config options here

You can use `.attribute` access to modify any of these configurations:

In [3]:
cfg.case.user.num_data_points = 8 # How many sentences?
cfg.case.user.user_idx = 1 # From which user?
cfg.case.data.shape = [32] # This is the sequence length

cfg.case.server.has_external_data = True  # The server has access to external/public data
cfg.case.data.tokenizer = "gpt2"

cfg.case.server.param_modification.eps = 1e-4

cfg.attack.impl.dtype = "double"  # Run the attack in double precision


# cfg.attack.token_strategy="embedding-norm"
# cfg.case.server.param_modification.v_length = 32

# cfg.case.server.param_modification.eps = 1e-16
# cfg.case.server.param_modification.imprint_sentence_position = 0
# cfg.case.server.param_modification.softmax_skew = 100000000
# cfg.case.server.param_modification.sequence_token_weight = 1
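
The same settings can also be written as Hydra-style `key=value` overrides, which is the form a command-line run would take. The sketch below is a minimal illustration using only the standard library; the helper `to_hydra_overrides` is hypothetical, and it assumes that `simulate_breach.py` accepts standard Hydra overrides for these keys.

```python
def to_hydra_overrides(settings):
    """Render {dotted.key: value} pairs as Hydra-style `key=value` strings."""
    overrides = []
    for key, value in settings.items():
        if isinstance(value, bool):
            # Check bool before anything else: bool is a subclass of int.
            rendered = "true" if value else "false"
        elif isinstance(value, (list, tuple)):
            rendered = "[" + ",".join(str(v) for v in value) + "]"
        else:
            rendered = str(value)
        overrides.append(f"{key}={rendered}")
    return overrides


# Mirrors the attribute assignments made in the cell above.
settings = {
    "case.user.num_data_points": 8,
    "case.user.user_idx": 1,
    "case.data.shape": [32],
    "case.server.has_external_data": True,
    "case.data.tokenizer": "gpt2",
}
print("python simulate_breach.py " + " ".join(to_hydra_overrides(settings)))
```

Depending on the shell, values containing brackets such as `case.data.shape=[32]` may need to be quoted.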

### Instantiate all parties

In [4]:
user, server, model, loss_fn = breaching.cases.construct_case(cfg.case, setup)
attacker = breaching.attacks.prepare_attack(server.model, server.loss, cfg.attack, setup)
breaching.utils.overview(server, user, attacker)

Reusing dataset wikitext (/home/jonas/data/wikitext/wikitext-103-v1/1.0.0/a241db52902eaf2c6aa732210bead40c090019a499ceb13bcbfa3f8ab646a126)
Reusing dataset wikitext (/home/jonas/data/wikitext/wikitext-103-v1/1.0.0/a241db52902eaf2c6aa732210bead40c090019a499ceb13bcbfa3f8ab646a126)
Model architecture transformer3 loaded with 10,800,433 parameters and 0 buffers.
Overall this is a data ratio of    2637:1 for target shape [8, 512] given that num_queries=1.
User (of type UserSingleStep) with settings:
    Number of data points: 8

    Threat model:
    User provides labels: False
    User provides buffers: False
    User provides number of data points: True

    Data:
    Dataset: wikitext
    user: 1
    
        
Server (of type MaliciousTransformerServer) with settings:
    Threat model: Malicious (Parameters)
    Number of planned queries: 1
    Has external/public data: True

    Model:
        model specification: transformer3
        model state: default
        

    Secrets: {}
    


### Simulate an attacked FL protocol

The true user data is returned only for analysis.

In [5]:
server_payload = server.distribute_payload()
shared_data, true_user_data = user.compute_local_updates(server_payload)

Found attention of shape torch.Size([288, 96]).
Computing feature distribution before the probe layer Linear(in_features=96, out_features=1536, bias=True) from external data.
Feature mean is 0.04341970011591911, feature std is 0.9403561353683472.
Computing user update in model mode: eval.
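
To make the exchange concrete, here is a toy version of one round, written with the standard library only. It is not the attack used in this notebook: it assumes a single linear unit with a bias and one private sample, in which case the input can be read off by dividing the weight gradient by the bias gradient. All names (`local_update`, `reconstruct`) are illustrative.

```python
def local_update(weights, bias, x, y):
    """User side: gradient of 0.5 * (w.x + b - y)**2 for one private sample."""
    residual = sum(w * xi for w, xi in zip(weights, x)) + bias - y
    return {"grad_w": [residual * xi for xi in x], "grad_b": residual}


def reconstruct(shared):
    """Server side: grad_w = residual * x and grad_b = residual, so x = grad_w / grad_b."""
    if shared["grad_b"] == 0:
        raise ValueError("zero residual: the gradient carries no signal")
    return [g / shared["grad_b"] for g in shared["grad_w"]]


# The server distributes parameters; the user answers with a gradient.
server_payload = {"weights": [0.5, -1.0, 2.0], "bias": 0.1}
private_x, private_y = [1.0, 2.0, 3.0], 0.0

shared_data = local_update(server_payload["weights"], server_payload["bias"], private_x, private_y)
recovered = reconstruct(shared_data)
print(recovered)  # matches private_x up to floating-point rounding
```

As in the cell above, the private sample is kept on the side only to check the reconstruction afterwards.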


In [6]:
#user.print(true_user_data)

### Reconstruct user data

In [7]:
# default hyperparams:

attacker.cfg.sentence_algorithm = "dynamic-threshold"

# Experimental hyperparameters:
attacker.cfg.recovery_order = "positions-first"

attacker.cfg.undivided = False
attacker.cfg.separation = "subtraction" # alternative: decorrelation
attacker.cfg.backfilling = "global"
attacker.cfg.backfill_removal = None

reconstructed_user_data, stats = attacker.reconstruct([server_payload], [shared_data], 
                                                      server.secrets, dryrun=cfg.dryrun)
#user.print(reconstructed_user_data)
metrics = breaching.analysis.report(reconstructed_user_data, true_user_data, [server_payload], 
                                    server.model, cfg_case=cfg.case, setup=setup)

Recovered tokens tensor([[   11,    12,    13,  ...,  3365,  3378,  3386],
        [ 3388,  3389,  3392,  ..., 19683, 19685, 19954],
        [  262,   262,   262,  ..., 49889, 50203, 50210],
        ...,
        [   11,    11,    11,  ..., 49658, 49658, 49658],
        [   11,    11,    11,  ..., 49658, 49658, 49658],
        [   11,    12,    13,  ..., 48405, 49658, 50210]]) through strategy decoder-bias.
Recovered 2677 embeddings with positional data from imprinted layer.
Assigned [364, 355, 366, 383, 292, 349, 512, 56] breached embeddings to each sentence.
METRICS: | Accuracy: 0.4272 | S-BLEU: 0.17 | FMSE: 2.3029e-01 | 
 G-BLEU: 0.21 | ROUGE1: 0.59| ROUGE2: 0.18 | ROUGE-L: 0.42| Token Acc: 99.37% | Label Acc: 99.37%
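
Since the following cells repeat this report for different settings, it can help to turn the `METRICS` lines into a dictionary before comparing runs. The parser below is a small sketch that works on the printed text shown above; `parse_metrics` is a hypothetical helper and not part of `breaching`.

```python
def parse_metrics(log_text):
    """Parse a printed METRICS report into {name: float}.

    Percentages keep their printed scale, so "99.37%" becomes 99.37.
    """
    body = log_text.replace("METRICS:", "", 1)
    metrics = {}
    for field in body.split("|"):
        name, sep, value = field.partition(":")
        if not sep:
            continue  # empty fragment between two separators
        metrics[name.strip()] = float(value.strip().rstrip("%"))
    return metrics


log = """METRICS: | Accuracy: 0.4272 | S-BLEU: 0.17 | FMSE: 2.3029e-01 |
 G-BLEU: 0.21 | ROUGE1: 0.59| ROUGE2: 0.18 | ROUGE-L: 0.42| Token Acc: 99.37% | Label Acc: 99.37%"""

metrics = parse_metrics(log)
print(metrics["Accuracy"], metrics["ROUGE-L"], metrics["Token Acc"])
```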


In [8]:
# decorrelation

attacker.cfg.sentence_algorithm = "dynamic-threshold"

# Experimental hyperparameters:
attacker.cfg.recovery_order = "positions-first"

attacker.cfg.undivided = False
attacker.cfg.separation = "decorrelation" # alternative: subtraction
attacker.cfg.backfilling = "global"
attacker.cfg.backfill_removal = None

# Implementation Details
attacker.cfg.impl.dtype = "float"

reconstructed_user_data, stats = attacker.reconstruct([server_payload], [shared_data], 
                                                      server.secrets, dryrun=cfg.dryrun)
#user.print(reconstructed_user_data)
metrics = breaching.analysis.report(reconstructed_user_data, true_user_data, [server_payload], 
                                    server.model, cfg_case=cfg.case, setup=setup)

Recovered tokens tensor([[   11,    12,    13,  ...,  3365,  3378,  3386],
        [ 3388,  3389,  3392,  ..., 19683, 19685, 19954],
        [  262,   262,   262,  ..., 49889, 50203, 50210],
        ...,
        [   11,    11,    11,  ..., 49658, 49658, 49658],
        [   11,    11,    11,  ..., 49658, 49658, 49658],
        [   11,    12,    13,  ..., 48405, 49658, 50210]]) through strategy decoder-bias.
Recovered 2677 embeddings with positional data from imprinted layer.
Assigned [364, 355, 366, 383, 292, 349, 512, 56] breached embeddings to each sentence.
METRICS: | Accuracy: 0.4265 | S-BLEU: 0.16 | FMSE: 2.2698e-01 | 
 G-BLEU: 0.21 | ROUGE1: 0.59| ROUGE2: 0.18 | ROUGE-L: 0.42| Token Acc: 99.37% | Label Acc: 99.37%


In [9]:
# optimal hyperparams

attacker.cfg.sentence_algorithm = "k-means"

# Experimental hyperparameters:
attacker.cfg.recovery_order = "tokens-first"

attacker.cfg.undivided = False
attacker.cfg.separation = "decorrelation" # alternative: subtraction
attacker.cfg.backfilling = "global"
attacker.cfg.backfill_removal = "decorrelation"

# Implementation Details
print(attacker.setup["dtype"])

reconstructed_user_data, stats = attacker.reconstruct([server_payload], [shared_data], 
                                                      server.secrets, dryrun=cfg.dryrun)
#user.print(reconstructed_user_data)
metrics = breaching.analysis.report(reconstructed_user_data, true_user_data, [server_payload], 
                                    server.model, cfg_case=cfg.case, setup=setup)

torch.float64
Recovered tokens tensor([[   11,    12,    13,  ...,  3365,  3378,  3386],
        [ 3388,  3389,  3392,  ..., 19683, 19685, 19954],
        [  262,   262,   262,  ..., 49889, 50203, 50210],
        ...,
        [   11,    11,    11,  ..., 49658, 49658, 49658],
        [   11,    11,    11,  ..., 49658, 49658, 49658],
        [   11,    12,    13,  ..., 48405, 49658, 50210]]) through strategy decoder-bias.
Recovered 2677 embeddings with positional data from imprinted layer.
Assigned [289, 330, 300, 350, 381, 329, 350, 348] breached embeddings to each sentence.
METRICS: | Accuracy: 0.5779 | S-BLEU: 0.27 | FMSE: 1.5397e-01 | 
 G-BLEU: 0.29 | ROUGE1: 0.64| ROUGE2: 0.30 | ROUGE-L: 0.54| Token Acc: 81.91% | Label Acc: 81.91%
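
The `Assigned [...]` lines give a quick way to sanity-check the sentence assignment. The sketch below copies the counts from the logs above and checks two things: that every recovered embedding was assigned, and how evenly the embeddings are spread over the 8 sentences. The helper `summarize` is illustrative only.

```python
recovered_embeddings = 2677  # from "Recovered 2677 embeddings ..."

# Counts copied from the "Assigned [...]" log lines above.
assignments = {
    "dynamic-threshold": [364, 355, 366, 383, 292, 349, 512, 56],
    "k-means": [289, 330, 300, 350, 381, 329, 350, 348],
}


def summarize(counts, total):
    """Report whether all embeddings were assigned and how uneven the split is."""
    return {"complete": sum(counts) == total, "spread": max(counts) - min(counts)}


summary = {name: summarize(counts, recovered_embeddings) for name, counts in assignments.items()}
for name, info in summary.items():
    print(name, info)
```

In these two runs, both assignments account for all 2677 embeddings, but the `k-means` split is far more even across sentences than the `dynamic-threshold` one.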


In [10]:
# optimal hyperparams?

attacker.cfg.sentence_algorithm = "k-means"

# Experimental hyperparameters:
attacker.cfg.recovery_order = "tokens-first"

attacker.cfg.undivided = False
attacker.cfg.separation = "decorrelation" # alternative: subtraction
attacker.cfg.backfilling = "global"
attacker.cfg.backfill_removal = None

# Implementation Details
print(attacker.setup["dtype"])

reconstructed_user_data, stats = attacker.reconstruct([server_payload], [shared_data], 
                                                      server.secrets, dryrun=cfg.dryrun)
#user.print(reconstructed_user_data)
metrics = breaching.analysis.report(reconstructed_user_data, true_user_data, [server_payload], 
                                    server.model, cfg_case=cfg.case, setup=setup)

torch.float64
Recovered tokens tensor([[   11,    12,    13,  ...,  3365,  3378,  3386],
        [ 3388,  3389,  3392,  ..., 19683, 19685, 19954],
        [  262,   262,   262,  ..., 49889, 50203, 50210],
        ...,
        [   11,    11,    11,  ..., 49658, 49658, 49658],
        [   11,    11,    11,  ..., 49658, 49658, 49658],
        [   11,    12,    13,  ..., 48405, 49658, 50210]]) through strategy decoder-bias.
Recovered 2677 embeddings with positional data from imprinted layer.
Assigned [350, 348, 329, 381, 300, 289, 330, 350] breached embeddings to each sentence.
METRICS: | Accuracy: 0.5723 | S-BLEU: 0.27 | FMSE: 1.6074e-01 | 
 G-BLEU: 0.29 | ROUGE1: 0.66| ROUGE2: 0.30 | ROUGE-L: 0.55| Token Acc: 85.03% | Label Acc: 85.03%


In [11]:
attacker.cfg.sentence_algorithm = "k-means"

# Experimental hyperparameters:
attacker.cfg.recovery_order = "positions-first"

attacker.cfg.undivided = False
attacker.cfg.separation = "decorrelation" # alternative: subtraction
attacker.cfg.backfilling = "local"
attacker.cfg.backfill_removal = "decorrelation"

reconstructed_user_data, stats = attacker.reconstruct([server_payload], [shared_data], 
                                                      server.secrets, dryrun=cfg.dryrun)
#user.print(reconstructed_user_data)
metrics = breaching.analysis.report(reconstructed_user_data, true_user_data, [server_payload], 
                                    server.model, cfg_case=cfg.case, setup=setup)

Recovered tokens tensor([[   11,    12,    13,  ...,  3365,  3378,  3386],
        [ 3388,  3389,  3392,  ..., 19683, 19685, 19954],
        [  262,   262,   262,  ..., 49889, 50203, 50210],
        ...,
        [   11,    11,    11,  ..., 49658, 49658, 49658],
        [   11,    11,    11,  ..., 49658, 49658, 49658],
        [   11,    12,    13,  ..., 48405, 49658, 50210]]) through strategy decoder-bias.
Recovered 2677 embeddings with positional data from imprinted layer.
Assigned [348, 350, 289, 329, 381, 300, 330, 350] breached embeddings to each sentence.
METRICS: | Accuracy: 0.5444 | S-BLEU: 0.23 | FMSE: 1.6852e-01 | 
 G-BLEU: 0.28 | ROUGE1: 0.70| ROUGE2: 0.27 | ROUGE-L: 0.52| Token Acc: 99.37% | Label Acc: 99.37%


In [12]:
attacker.cfg.sentence_algorithm = "k-means"

# Experimental hyperparameters:
attacker.cfg.recovery_order = "positions-first"

attacker.cfg.undivided = False
attacker.cfg.separation = "decorrelation" # alternative: subtraction
attacker.cfg.backfilling = "local"
attacker.cfg.backfill_removal = None

reconstructed_user_data, stats = attacker.reconstruct([server_payload], [shared_data], 
                                                      server.secrets, dryrun=cfg.dryrun)
#user.print(reconstructed_user_data)
metrics = breaching.analysis.report(reconstructed_user_data, true_user_data, [server_payload], 
                                    server.model, cfg_case=cfg.case, setup=setup)

Recovered tokens tensor([[   11,    12,    13,  ...,  3365,  3378,  3386],
        [ 3388,  3389,  3392,  ..., 19683, 19685, 19954],
        [  262,   262,   262,  ..., 49889, 50203, 50210],
        ...,
        [   11,    11,    11,  ..., 49658, 49658, 49658],
        [   11,    11,    11,  ..., 49658, 49658, 49658],
        [   11,    12,    13,  ..., 48405, 49658, 50210]]) through strategy decoder-bias.
Recovered 2677 embeddings with positional data from imprinted layer.
Assigned [348, 289, 300, 329, 330, 381, 350, 350] breached embeddings to each sentence.
METRICS: | Accuracy: 0.5608 | S-BLEU: 0.25 | FMSE: 1.6844e-01 | 
 G-BLEU: 0.30 | ROUGE1: 0.71| ROUGE2: 0.29 | ROUGE-L: 0.53| Token Acc: 99.37% | Label Acc: 99.37%


In [13]:
attacker.cfg.sentence_algorithm = "k-means"

# Experimental hyperparameters:
attacker.cfg.recovery_order = "positions-first"

attacker.cfg.undivided = False
attacker.cfg.separation = "decorrelation" # alternative: subtraction
attacker.cfg.backfilling = "global"
attacker.cfg.backfill_removal = None

reconstructed_user_data, stats = attacker.reconstruct([server_payload], [shared_data], 
                                                      server.secrets, dryrun=cfg.dryrun)
#user.print(reconstructed_user_data)
metrics = breaching.analysis.report(reconstructed_user_data, true_user_data, [server_payload], 
                                    server.model, cfg_case=cfg.case, setup=setup)

Recovered tokens tensor([[   11,    12,    13,  ...,  3365,  3378,  3386],
        [ 3388,  3389,  3392,  ..., 19683, 19685, 19954],
        [  262,   262,   262,  ..., 49889, 50203, 50210],
        ...,
        [   11,    11,    11,  ..., 49658, 49658, 49658],
        [   11,    11,    11,  ..., 49658, 49658, 49658],
        [   11,    12,    13,  ..., 48405, 49658, 50210]]) through strategy decoder-bias.
Recovered 2677 embeddings with positional data from imprinted layer.
Assigned [350, 348, 350, 381, 300, 289, 330, 329] breached embeddings to each sentence.
METRICS: | Accuracy: 0.5049 | S-BLEU: 0.20 | FMSE: 1.9429e-01 | 
 G-BLEU: 0.24 | ROUGE1: 0.63| ROUGE2: 0.22 | ROUGE-L: 0.47| Token Acc: 99.37% | Label Acc: 99.37%


In [14]:
attacker.cfg.sentence_algorithm = "k-means"

# Experimental hyperparameters:
attacker.cfg.recovery_order = "positions-first"

attacker.cfg.undivided = False
attacker.cfg.separation = "decorrelation" # alternative: subtraction
attacker.cfg.backfilling = "randn"
attacker.cfg.backfill_removal = "decorrelation"

reconstructed_user_data, stats = attacker.reconstruct([server_payload], [shared_data], 
                                                      server.secrets, dryrun=cfg.dryrun)
#user.print(reconstructed_user_data)
metrics = breaching.analysis.report(reconstructed_user_data, true_user_data, [server_payload], 
                                    server.model, cfg_case=cfg.case, setup=setup)

Recovered tokens tensor([[   11,    12,    13,  ...,  3365,  3378,  3386],
        [ 3388,  3389,  3392,  ..., 19683, 19685, 19954],
        [  262,   262,   262,  ..., 49889, 50203, 50210],
        ...,
        [   11,    11,    11,  ..., 49658, 49658, 49658],
        [   11,    11,    11,  ..., 49658, 49658, 49658],
        [   11,    12,    13,  ..., 48405, 49658, 50210]]) through strategy decoder-bias.
Recovered 2677 embeddings with positional data from imprinted layer.
Assigned [330, 329, 381, 289, 350, 300, 350, 348] breached embeddings to each sentence.
METRICS: | Accuracy: 0.5564 | S-BLEU: 0.25 | FMSE: 1.6825e-01 | 
 G-BLEU: 0.29 | ROUGE1: 0.69| ROUGE2: 0.28 | ROUGE-L: 0.53| Token Acc: 99.37% | Label Acc: 99.37%


In [15]:
# no subtraction
attacker.cfg.sentence_algorithm = "k-means"

# Experimental hyperparameters:
attacker.cfg.recovery_order = "tokens-first"

attacker.cfg.undivided = False
attacker.cfg.separation = "none" # alternative: decorrelation
attacker.cfg.backfilling = "global"
attacker.cfg.backfill_removal = "none"

# Implementation Details
attacker.cfg.impl.dtype = "float"

reconstructed_user_data, stats = attacker.reconstruct([server_payload], [shared_data], 
                                                      server.secrets, dryrun=cfg.dryrun)
#user.print(reconstructed_user_data)
metrics = breaching.analysis.report(reconstructed_user_data, true_user_data, [server_payload], 
                                    server.model, cfg_case=cfg.case, setup=setup)

Recovered tokens tensor([[   11,    12,    13,  ...,  3365,  3378,  3386],
        [ 3388,  3389,  3392,  ..., 19683, 19685, 19954],
        [  262,   262,   262,  ..., 49889, 50203, 50210],
        ...,
        [   11,    11,    11,  ..., 49658, 49658, 49658],
        [   11,    11,    11,  ..., 49658, 49658, 49658],
        [   11,    12,    13,  ..., 48405, 49658, 50210]]) through strategy decoder-bias.
Recovered 2677 embeddings with positional data from imprinted layer.
Assigned [350, 300, 348, 289, 350, 381, 330, 329] breached embeddings to each sentence.
METRICS: | Accuracy: 0.5549 | S-BLEU: 0.25 | FMSE: 1.7739e-01 | 
 G-BLEU: 0.28 | ROUGE1: 0.66| ROUGE2: 0.27 | ROUGE-L: 0.52| Token Acc: 84.96% | Label Acc: 84.96%


In [16]:
# no backfilling

attacker.cfg.sentence_algorithm = "k-means"

# Experimental hyperparameters:
attacker.cfg.recovery_order = "tokens-first"

attacker.cfg.undivided = False
attacker.cfg.separation = "decorrelation" # alternative: subtraction
attacker.cfg.backfilling = "randn"
attacker.cfg.backfill_removal = "decorrelation"

# Implementation Details
attacker.cfg.impl.dtype = "float"

reconstructed_user_data, stats = attacker.reconstruct([server_payload], [shared_data], 
                                                      server.secrets, dryrun=cfg.dryrun)
#user.print(reconstructed_user_data)
metrics = breaching.analysis.report(reconstructed_user_data, true_user_data, [server_payload], 
                                    server.model, cfg_case=cfg.case, setup=setup)

Recovered tokens tensor([[   11,    12,    13,  ...,  3365,  3378,  3386],
        [ 3388,  3389,  3392,  ..., 19683, 19685, 19954],
        [  262,   262,   262,  ..., 49889, 50203, 50210],
        ...,
        [   11,    11,    11,  ..., 49658, 49658, 49658],
        [   11,    11,    11,  ..., 49658, 49658, 49658],
        [   11,    12,    13,  ..., 48405, 49658, 50210]]) through strategy decoder-bias.
Recovered 2677 embeddings with positional data from imprinted layer.
Assigned [329, 300, 348, 381, 350, 289, 330, 350] breached embeddings to each sentence.
METRICS: | Accuracy: 0.5627 | S-BLEU: 0.22 | FMSE: 1.7328e-01 | 
 G-BLEU: 0.24 | ROUGE1: 0.53| ROUGE2: 0.26 | ROUGE-L: 0.49| Token Acc: 65.77% | Label Acc: 65.77%


In [17]:
# kmeans positions-first

attacker.cfg.sentence_algorithm = "k-means"

# Experimental hyperparameters:
attacker.cfg.recovery_order = "positions-first"

attacker.cfg.undivided = False
attacker.cfg.separation = "subtraction" # alternative: decorrelation
attacker.cfg.backfilling = "global"
attacker.cfg.backfill_removal = "subtraction"

# Implementation Details
attacker.cfg.impl.dtype = "float"

reconstructed_user_data, stats = attacker.reconstruct([server_payload], [shared_data], 
                                                      server.secrets, dryrun=cfg.dryrun)
#user.print(reconstructed_user_data)
metrics = breaching.analysis.report(reconstructed_user_data, true_user_data, [server_payload], 
                                    server.model, cfg_case=cfg.case, setup=setup)

Recovered tokens tensor([[   11,    12,    13,  ...,  3365,  3378,  3386],
        [ 3388,  3389,  3392,  ..., 19683, 19685, 19954],
        [  262,   262,   262,  ..., 49889, 50203, 50210],
        ...,
        [   11,    11,    11,  ..., 49658, 49658, 49658],
        [   11,    11,    11,  ..., 49658, 49658, 49658],
        [   11,    12,    13,  ..., 48405, 49658, 50210]]) through strategy decoder-bias.
Recovered 2677 embeddings with positional data from imprinted layer.
Assigned [330, 350, 381, 329, 289, 348, 300, 350] breached embeddings to each sentence.
METRICS: | Accuracy: 0.5518 | S-BLEU: 0.24 | FMSE: 1.7254e-01 | 
 G-BLEU: 0.28 | ROUGE1: 0.68| ROUGE2: 0.27 | ROUGE-L: 0.52| Token Acc: 99.37% | Label Acc: 99.37%


In [18]:
# no backfilling

attacker.cfg.sentence_algorithm = "k-means"

# Experimental hyperparameters:
attacker.cfg.recovery_order = "positions-first"

attacker.cfg.undivided = False
attacker.cfg.separation = "subtraction" # alternative: decorrelation
attacker.cfg.backfilling = "randn"
attacker.cfg.backfill_removal = "subtraction"

# Implementation Details
attacker.cfg.impl.dtype = "float"

reconstructed_user_data, stats = attacker.reconstruct([server_payload], [shared_data], 
                                                      server.secrets, dryrun=cfg.dryrun)
#user.print(reconstructed_user_data)
metrics = breaching.analysis.report(reconstructed_user_data, true_user_data, [server_payload], 
                                    server.model, cfg_case=cfg.case, setup=setup)

Recovered tokens tensor([[   11,    12,    13,  ...,  3365,  3378,  3386],
        [ 3388,  3389,  3392,  ..., 19683, 19685, 19954],
        [  262,   262,   262,  ..., 49889, 50203, 50210],
        ...,
        [   11,    11,    11,  ..., 49658, 49658, 49658],
        [   11,    11,    11,  ..., 49658, 49658, 49658],
        [   11,    12,    13,  ..., 48405, 49658, 50210]]) through strategy decoder-bias.
Recovered 2677 embeddings with positional data from imprinted layer.
Assigned [330, 350, 350, 289, 381, 300, 348, 329] breached embeddings to each sentence.
METRICS: | Accuracy: 0.5549 | S-BLEU: 0.24 | FMSE: 1.9183e-01 | 
 G-BLEU: 0.28 | ROUGE1: 0.69| ROUGE2: 0.28 | ROUGE-L: 0.53| Token Acc: 99.37% | Label Acc: 99.37%


In [19]:
attacker.cfg.sentence_algorithm = "dynamic-threshold-median"

# Experimental hyperparameters:
attacker.cfg.recovery_order = "tokens-first"

attacker.cfg.undivided = False
attacker.cfg.separation = "decorrelation" # alternative: subtraction
attacker.cfg.backfilling = "global"
attacker.cfg.backfill_removal = "none"

# Implementation Details
attacker.cfg.impl.dtype = "float"

reconstructed_user_data, stats = attacker.reconstruct([server_payload], [shared_data], 
                                                      server.secrets, dryrun=cfg.dryrun)
#user.print(reconstructed_user_data)
metrics = breaching.analysis.report(reconstructed_user_data, true_user_data, [server_payload], 
                                    server.model, cfg_case=cfg.case, setup=setup)

Recovered tokens tensor([[   11,    12,    13,  ...,  3365,  3378,  3386],
        [ 3388,  3389,  3392,  ..., 19683, 19685, 19954],
        [  262,   262,   262,  ..., 49889, 50203, 50210],
        ...,
        [   11,    11,    11,  ..., 49658, 49658, 49658],
        [   11,    11,    11,  ..., 49658, 49658, 49658],
        [   11,    12,    13,  ..., 48405, 49658, 50210]]) through strategy decoder-bias.
Recovered 2677 embeddings with positional data from imprinted layer.
Assigned [367, 353, 365, 387, 295, 342, 512, 56] breached embeddings to each sentence.
METRICS: | Accuracy: 0.4814 | S-BLEU: 0.21 | FMSE: 2.0202e-01 | 
 G-BLEU: 0.24 | ROUGE1: 0.60| ROUGE2: 0.23 | ROUGE-L: 0.46| Token Acc: 82.15% | Label Acc: 82.15%


In [22]:
# k-medoids instead of k-means

attacker.cfg.sentence_algorithm = "k-medoids"

# Experimental hyperparameters:
attacker.cfg.recovery_order = "tokens-first"

attacker.cfg.undivided = False
attacker.cfg.separation = "decorrelation" # alternative: subtraction
attacker.cfg.backfilling = "global"
attacker.cfg.backfill_removal = "decorrelation"


reconstructed_user_data, stats = attacker.reconstruct([server_payload], [shared_data], 
                                                      server.secrets, dryrun=cfg.dryrun)
#user.print(reconstructed_user_data)
metrics = breaching.analysis.report(reconstructed_user_data, true_user_data, [server_payload], 
                                    server.model, cfg_case=cfg.case, setup=setup)

Recovered tokens tensor([[   11,    12,    13,  ...,  3365,  3378,  3386],
        [ 3388,  3389,  3392,  ..., 19683, 19685, 19954],
        [  262,   262,   262,  ..., 49889, 50203, 50210],
        ...,
        [   11,    11,    11,  ..., 49658, 49658, 49658],
        [   11,    11,    11,  ..., 49658, 49658, 49658],
        [   11,    12,    13,  ..., 48405, 49658, 50210]]) through strategy decoder-bias.
Recovered 2677 embeddings with positional data from imprinted layer.


AssertionError: Invalid Assignment in k-medoids
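
When several settings are tried in a row, a failing configuration like this one does not have to stop the whole comparison. The sketch below shows one way to record such failures and continue. `run_sweep` and `fake_reconstruct` are hypothetical; the stand-in only imitates the error seen above and does not call `breaching`.

```python
def run_sweep(configs, run_fn):
    """Run every configuration; record assertion failures instead of stopping."""
    results, failures = {}, {}
    for name, settings in configs.items():
        try:
            results[name] = run_fn(settings)
        except AssertionError as err:
            failures[name] = str(err)
    return results, failures


def fake_reconstruct(settings):
    """Hypothetical stand-in for the reconstruct-and-report step of a cell."""
    if settings["sentence_algorithm"] == "k-medoids":
        raise AssertionError("Invalid Assignment in k-medoids")
    return {"sentence_algorithm": settings["sentence_algorithm"]}


configs = {
    "k-means": {"sentence_algorithm": "k-means"},
    "k-medoids": {"sentence_algorithm": "k-medoids"},
    "dynamic-threshold": {"sentence_algorithm": "dynamic-threshold"},
}
results, failures = run_sweep(configs, fake_reconstruct)
print(sorted(results))  # ['dynamic-threshold', 'k-means']
print(failures)         # {'k-medoids': 'Invalid Assignment in k-medoids'}
```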

In [23]:
# dynamic threshold with median normalization

attacker.cfg.sentence_algorithm = "dynamic-threshold-median-normalization"

# Experimental hyperparameters:
attacker.cfg.recovery_order = "positions-first"

attacker.cfg.undivided = False
attacker.cfg.separation = "decorrelation" # alternative: subtraction
attacker.cfg.backfilling = "global"
attacker.cfg.backfill_removal = "none"

# Implementation Details
print(attacker.setup["dtype"])

reconstructed_user_data, stats = attacker.reconstruct([server_payload], [shared_data], 
                                                      server.secrets, dryrun=cfg.dryrun)
#user.print(reconstructed_user_data)
metrics = breaching.analysis.report(reconstructed_user_data, true_user_data, [server_payload], 
                                    server.model, cfg_case=cfg.case, setup=setup)

torch.float64
Recovered tokens tensor([[   11,    12,    13,  ...,  3365,  3378,  3386],
        [ 3388,  3389,  3392,  ..., 19683, 19685, 19954],
        [  262,   262,   262,  ..., 49889, 50203, 50210],
        ...,
        [   11,    11,    11,  ..., 49658, 49658, 49658],
        [   11,    11,    11,  ..., 49658, 49658, 49658],
        [   11,    12,    13,  ..., 48405, 49658, 50210]]) through strategy decoder-bias.
Recovered 2677 embeddings with positional data from imprinted layer.
Assigned [367, 353, 365, 387, 295, 342, 512, 56] breached embeddings to each sentence.
METRICS: | Accuracy: 0.4526 | S-BLEU: 0.18 | FMSE: 2.0554e-01 | 
 G-BLEU: 0.23 | ROUGE1: 0.62| ROUGE2: 0.21 | ROUGE-L: 0.45| Token Acc: 99.37% | Label Acc: 99.37%
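
To compare the runs, the reported numbers can be collected in a small table. The sketch below copies `Accuracy` and `Token Acc` from a subset of the cells above (the cell numbers are the `In [n]` prompts) and groups them by `recovery_order`. It describes this single experiment only and is not a general statement about the attack.

```python
# (cell, sentence_algorithm, recovery_order, Accuracy, Token Acc in %)
runs = [
    (7, "dynamic-threshold", "positions-first", 0.4272, 99.37),
    (9, "k-means", "tokens-first", 0.5779, 81.91),
    (10, "k-means", "tokens-first", 0.5723, 85.03),
    (12, "k-means", "positions-first", 0.5608, 99.37),
    (13, "k-means", "positions-first", 0.5049, 99.37),
    (19, "dynamic-threshold-median", "tokens-first", 0.4814, 82.15),
]

# Run with the highest reported Accuracy.
best = max(runs, key=lambda run: run[3])

# Token accuracies observed for each recovery order.
token_acc_by_order = {}
for _, _, order, _, token_acc in runs:
    token_acc_by_order.setdefault(order, []).append(token_acc)

print("best cell:", best[0], "with", best[1], "/", best[2])
for order, values in token_acc_by_order.items():
    print(order, "token accuracy from", min(values), "to", max(values))
```

Among these runs, `positions-first` always reports a token accuracy of 99.37%, while `tokens-first` reports lower token accuracy but reaches the highest overall `Accuracy` in cell 9.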
