# Breaching privacy

This notebook does the same job as the command-line tool `simulate_breach.py`, but also directly visualizes the user data and the reconstruction.

In [32]:
import torch
import hydra
from omegaconf import OmegaConf
%load_ext autoreload
%autoreload 2

import breaching
import logging, sys
logging.basicConfig(level=logging.INFO, handlers=[logging.StreamHandler(sys.stdout)], format='%(message)s')
logger = logging.getLogger()

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [33]:
import numpy as np
from scipy.optimize import linear_sum_assignment

### Initialize cfg object and system setup:

This will print out all configuration options.
There are many possible configurations, but there is usually no need to worry about most of them. Below, a few options are printed.

Choose `shakespeare`, `wikitext`, or `stackoverflow` for `case/data=` here:

In [34]:
with hydra.initialize(config_path="config"):
    cfg = hydra.compose(config_name='cfg', overrides=["case/data=wikitext", "case/server=malicious-transformer",
                                                      "case.model=transformer3p",
                                                      "attack=decepticon"])
    print(f'Investigating use case {cfg.case.name} with server type {cfg.case.server.name}.')
          
device = torch.device('cuda:0') if torch.cuda.is_available() else torch.device('cpu')
torch.backends.cudnn.benchmark = cfg.case.impl.benchmark
setup = dict(device=device, dtype=torch.float)
setup

Investigating use case single_imagenet with server type malicious_transformer_parameters.


{'device': device(type='cpu'), 'dtype': torch.float32}

### Modify config options here

You can use `.attribute` access to modify any of these configurations:

In [35]:
cfg.case.user.num_data_points = 8 # How many sentences?
cfg.case.user.user_idx = 1 # From which user?
cfg.case.data.shape = [16] # This is the sequence length

cfg.case.data.tokenizer = "word-level"

cfg.case.server.has_external_data = True

cfg.case.server.param_modification.v_length = 6
cfg.case.server.param_modification.imprint_sentence_position = 0
cfg.case.server.param_modification.softmax_skew = 10000000
cfg.case.server.param_modification.sequence_token_weight = 1

cfg.case.server.param_modification.eps = 1e-4

### Instantiate all parties

In [36]:
user, server, model, loss_fn = breaching.cases.construct_case(cfg.case, setup)
attacker = breaching.attacks.prepare_attack(server.model, server.loss, cfg.attack, setup)
breaching.utils.overview(server, user, attacker)

Reusing dataset wikitext (/home/jonas/data/wikitext/wikitext-103-v1/1.0.0/aa5e094000ec7afeb74c3be92c88313cd6f132d564c7effd961c10fd47c76f20)
Reusing dataset wikitext (/home/jonas/data/wikitext/wikitext-103-v1/1.0.0/aa5e094000ec7afeb74c3be92c88313cd6f132d564c7effd961c10fd47c76f20)
Model architecture transformer3p loaded with 10,751,281 parameters and 0 buffers.
Overall this is a data ratio of   83994:1 for target shape [8, 16] given that num_queries=1.
User (of type UserSingleStep) with settings:
    Number of data points: 8

    Threat model:
    User provides labels: False
    User provides buffers: False
    User provides number of data points: True

    Data:
    Dataset: wikitext
    user: 1
    
        
Server (of type MaliciousTransformerServer) with settings:
    Threat model: Malicious (Parameters)
    Number of planned queries: 1
    Has external/public data: True

    Model:
        model specification: transformer3p
        model state: default
        

    Secrets: {}
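
The data ratio reported in the overview is consistent with dividing the parameter count by the number of tokens in the target shape. This is only an arithmetic check against the printed numbers, not a description of how `breaching.utils.overview` computes it:

```python
num_parameters = 10_751_281      # from the overview output
target_shape = [8, 16]           # sentences x tokens per sentence

num_tokens = target_shape[0] * target_shape[1]
data_ratio = num_parameters / num_tokens
print(f"{data_ratio:.0f}:1")  # 83994:1
```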
    

### Simulate an attacked FL protocol

The true user data is returned only for analysis.

In [37]:
server_payload = server.distribute_payload()
shared_data, true_user_data = user.compute_local_updates(server_payload)

torch.Size([96, 1536])
torch.Size([96, 1536])
torch.Size([96, 1536])
Computing feature distribution before the linear1 layer from external data.
Feature mean is 0.183049738407135, feature std is 0.9330437779426575.


In [38]:
user.print(true_user_data)

[CLS] the tower building of the little rock arsenal, also known as u. s
. arsenal building, is a building located in macarthur park in downtown little rock,
arkansas. built in 1 8 4 0, it was part of little rock '
s first military installation. since its decommissioning, the tower building has housed two museums
. it was home to the arkansas museum of natural history and antiquities from 1 9
4 2 to 1 9 9 7 and the macarthur museum of arkansas military history since
2 0 0 1. it has also been the headquarters of the little rock [UNK]
club since 1 8 9 4. [SEP] [CLS] the building receives its name from its


## Run through the initial transformer blocks "by hand":

In [39]:
inputs = true_user_data["data"]

In [40]:
trafo_inputs = user.model.pos_encoder(user.model.encoder(true_user_data["data"]))#[0, 0, :]
trafo_inputs

tensor([[[ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ...,  4.7943e-01,
          -9.4823e-02, -1.6600e-01],
         [ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ...,  6.1479e-01,
           1.9321e-01,  3.4803e-01],
         [ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ...,  7.1997e-02,
           7.6359e-01, -2.9553e-01],
         ...,
         [ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -2.0644e-01,
          -5.0040e-02,  1.4114e-01],
         [ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -2.9372e-01,
          -6.2184e-01, -8.4477e-01],
         [ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ...,  6.6921e-01,
           2.0388e-01, -1.2414e-01]],

        [[ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ...,  5.7615e-01,
          -1.0311e-02, -1.6338e-01],
         [ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ...,  4.9967e-01,
           4.5018e-02,  3.8816e-01],
         [ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ...,  2.0127e-02,
           6.7603e-01, -3.5482e-01],
         ...,
         [ 0.0000e+00,  0

In [41]:
attn_outputs, attn_weights = user.model.transformer_encoder.layers[0].self_attn(trafo_inputs, trafo_inputs, trafo_inputs)
attn_outputs

tensor([[[ 0.0402,  0.0109, -0.0007,  ...,  0.0000,  0.0000,  0.0000],
         [ 0.0402,  0.0109, -0.0007,  ...,  0.0000,  0.0000,  0.0000],
         [ 0.0402,  0.0109, -0.0007,  ...,  0.0000,  0.0000,  0.0000],
         ...,
         [ 0.0402,  0.0109, -0.0007,  ...,  0.0000,  0.0000,  0.0000],
         [ 0.0402,  0.0109, -0.0007,  ...,  0.0000,  0.0000,  0.0000],
         [ 0.0402,  0.0109, -0.0007,  ...,  0.0000,  0.0000,  0.0000]],

        [[-0.0687,  0.0345, -0.0520,  ...,  0.0000,  0.0000,  0.0000],
         [-0.0687,  0.0345, -0.0520,  ...,  0.0000,  0.0000,  0.0000],
         [-0.0687,  0.0345, -0.0520,  ...,  0.0000,  0.0000,  0.0000],
         ...,
         [-0.0687,  0.0345, -0.0520,  ...,  0.0000,  0.0000,  0.0000],
         [-0.0687,  0.0345, -0.0520,  ...,  0.0000,  0.0000,  0.0000],
         [-0.0687,  0.0345, -0.0520,  ...,  0.0000,  0.0000,  0.0000]],

        [[-0.0640, -0.0671, -0.0518,  ...,  0.0000,  0.0000,  0.0000],
         [-0.0640, -0.0671, -0.0518,  ...,  0

In [42]:
attn_weights[0]

tensor([[0.1797, 0.0547, 0.0547, 0.0547, 0.0547, 0.0547, 0.0547, 0.0547, 0.0547,
         0.0547, 0.0547, 0.0547, 0.0547, 0.0547, 0.0547, 0.0547],
        [0.1797, 0.0547, 0.0547, 0.0547, 0.0547, 0.0547, 0.0547, 0.0547, 0.0547,
         0.0547, 0.0547, 0.0547, 0.0547, 0.0547, 0.0547, 0.0547],
        [0.1797, 0.0547, 0.0547, 0.0547, 0.0547, 0.0547, 0.0547, 0.0547, 0.0547,
         0.0547, 0.0547, 0.0547, 0.0547, 0.0547, 0.0547, 0.0547],
        [0.1797, 0.0547, 0.0547, 0.0547, 0.0547, 0.0547, 0.0547, 0.0547, 0.0547,
         0.0547, 0.0547, 0.0547, 0.0547, 0.0547, 0.0547, 0.0547],
        [0.1797, 0.0547, 0.0547, 0.0547, 0.0547, 0.0547, 0.0547, 0.0547, 0.0547,
         0.0547, 0.0547, 0.0547, 0.0547, 0.0547, 0.0547, 0.0547],
        [0.1797, 0.0547, 0.0547, 0.0547, 0.0547, 0.0547, 0.0547, 0.0547, 0.0547,
         0.0547, 0.0547, 0.0547, 0.0547, 0.0547, 0.0547, 0.0547],
        [0.1797, 0.0547, 0.0547, 0.0547, 0.0547, 0.0547, 0.0547, 0.0547, 0.0547,
         0.0547, 0.0547, 0.0547, 0.05
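
The pattern in `attn_weights` (0.1797 in the first column, 0.0547 everywhere else) is what averaging over heads would give if one of 8 heads put all its weight on position 0 while the other 7 attended uniformly over the 16 positions. The sketch below is only an arithmetic consistency check. It assumes that `self_attn` returns head-averaged weights (the default behavior of PyTorch's `nn.MultiheadAttention`) and is not taken from the attack code:

```python
# Assumption: 8 heads, 16 positions, weights averaged over heads.
num_heads, seq_len = 8, 16

one_hot_head = [1.0] + [0.0] * (seq_len - 1)   # all weight on position 0
uniform_head = [1.0 / seq_len] * seq_len        # equal weight everywhere

heads = [one_hot_head] + [uniform_head] * (num_heads - 1)
averaged = [sum(h[j] for h in heads) / num_heads for j in range(seq_len)]

print([round(w, 4) for w in averaged[:3]])  # [0.1797, 0.0547, 0.0547]
```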

In [43]:
model.pos_encoder.embedding.weight[511]

tensor([ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.1770,  0.2280,
         0.1074,  0.5431,  0.7669, -0.1483,  0.2076, -0.3055,  0.3640, -0.7401,
        -0.5209, -0.1187,  0.1231, -0.4671, -0.1890,  0.2152, -0.3508, -0.0056,
         0.6533,  0.1395,  0.7076, -0.1795,  0.2169,  0.4864, -1.0051, -0.3695,
        -0.8945,  0.4070, -0.1487,  0.1361, -0.7375,  0.2543, -0.0372, -0.9370,
        -0.1709, -0.6158, -0.6881, -0.3572, -0.0142, -0.5055,  0.1897, -0.6006,
         0.1013,  0.2649,  0.5978,  0.1730, -0.2327,  0.7050,  0.7245, -1.0375,
        -0.8151,  0.0318,  0.1363,  0.0388, -0.7223,  0.7262,  0.4783,  0.8711,
        -0.0232, -0.0805, -0.5905,  0.5422, -0.1950, -0.7621, -0.5102,  0.2781,
        -0.0736, -1.0784, -0.0466, -0.4237,  0.5858,  0.5966,  0.2962, -0.3502,
         0.2835, -0.3872, -0.6417, -0.0093,  0.7894, -0.4822,  1.0927,  1.1332,
        -1.0411, -0.2208,  0.4607, -0.1643,  0.8225,  0.7318, -0.0391,  0.0970],
       grad_fn=<SelectBackward0>)

In [45]:
np.corrcoef(attn_outputs.reshape(-1, 96).detach())[0]

array([ 1.        ,  1.        ,  1.        ,  1.        ,  1.        ,
        1.        ,  1.        ,  1.        ,  1.        ,  1.        ,
        1.        ,  1.        ,  1.        ,  1.        ,  1.        ,
        1.        , -0.09715469, -0.09715469, -0.09715469, -0.09715469,
       -0.09715469, -0.09715469, -0.09715469, -0.09715469, -0.09715469,
       -0.09715469, -0.09715469, -0.09715469, -0.09715469, -0.09715469,
       -0.09715469, -0.09715469, -0.24021406, -0.24021406, -0.24021406,
       -0.24021406, -0.24021406, -0.24021406, -0.24021406, -0.24021406,
       -0.24021406, -0.24021406, -0.24021406, -0.24021406, -0.24021406,
       -0.24021406, -0.24021406, -0.24021406,  0.3116914 ,  0.3116914 ,
        0.3116914 ,  0.3116914 ,  0.3116914 ,  0.3116914 ,  0.3116914 ,
        0.3116914 ,  0.3116914 ,  0.3116914 ,  0.3116914 ,  0.3116914 ,
        0.3116914 ,  0.3116914 ,  0.3116914 ,  0.3116914 , -0.09715469,
       -0.09715469, -0.09715469, -0.09715469, -0.09715469, -0.09

In [46]:
residuals = attn_outputs + trafo_inputs
linear_inputs = user.model.transformer_encoder.layers[0].norm1(residuals)
linear_inputs

tensor([[[ 4.5663e-02, -3.6469e-02, -6.8974e-02,  ...,  1.2792e+00,
          -3.3336e-01, -5.3323e-01],
         [ 5.5545e-02, -3.1538e-02, -6.6002e-02,  ...,  1.7664e+00,
           5.1123e-01,  9.7218e-01],
         [ 2.5312e-02, -3.5248e-02, -5.9216e-02,  ...,  9.1242e-02,
           1.5232e+00, -6.6974e-01],
         ...,
         [ 3.2307e-01,  2.1231e-01,  1.6847e-01,  ..., -6.1078e-01,
          -1.8494e-02,  7.0549e-01],
         [ 6.6977e-02,  7.0598e-03, -1.6653e-02,  ..., -6.1697e-01,
          -1.2891e+00, -1.7458e+00],
         [-1.5927e-01, -2.1453e-01, -2.3640e-01,  ...,  1.0292e+00,
           1.5006e-01, -4.6968e-01]],

        [[-2.6954e-01,  2.1861e-02, -2.2241e-01,  ...,  1.5514e+00,
          -1.0458e-01, -5.3680e-01],
         [-2.6254e-01,  4.8872e-02, -2.1218e-01,  ...,  1.4527e+00,
           8.0709e-02,  1.1162e+00],
         [-1.9883e-01,  1.7124e-02, -1.6390e-01,  ..., -1.2886e-02,
           1.3597e+00, -7.9751e-01],
         ...,
         [-8.0954e-02,  3

In [47]:
linear_inputs[0, :, 0:8]

tensor([[ 4.5663e-02, -3.6469e-02, -6.8974e-02, -9.9637e-03,  2.0062e-01,
         -9.3716e-02,  4.8927e-01, -1.4520e+00],
        [ 5.5545e-02, -3.1538e-02, -6.6002e-02, -3.4350e-03,  2.1984e-01,
         -9.2236e-02, -2.4198e-01,  6.3667e-01],
        [ 2.5312e-02, -3.5248e-02, -5.9216e-02, -1.5704e-02,  1.3957e-01,
         -7.7459e-02, -4.0289e-01, -1.3720e+00],
        [-3.0242e-02, -1.2270e-01, -1.5928e-01, -9.2859e-02,  1.4418e-01,
         -1.8714e-01,  1.2675e+00,  3.8573e-01],
        [ 1.5114e-01,  8.1710e-02,  5.4231e-02,  1.0412e-01,  2.8213e-01,
          3.3315e-02,  6.7414e-02, -6.3159e-01],
        [ 7.7029e-02,  1.6085e-02, -8.0334e-03,  3.5753e-02,  1.9201e-01,
         -2.6393e-02, -1.2966e+00, -1.0135e+00],
        [ 5.4249e-03, -4.3135e-02, -6.2353e-02, -2.7464e-02,  9.7039e-02,
         -7.6982e-02, -1.6991e-01,  3.5393e-01],
        [ 4.3760e-02,  2.7011e-04, -1.6941e-02,  1.4305e-02,  1.2581e-01,
         -3.0043e-02,  1.3899e-01, -5.3577e-02],
        [ 5.1909

In [48]:
linear_inputs.shape

torch.Size([8, 16, 96])

### Simulate breached features

In [49]:
permutation = torch.randperm(32)  # alternative without shuffling: torch.arange(32)
num_breached_embeddings = 20
reverse_perm = torch.argsort(permutation[:num_breached_embeddings])
permutation

tensor([ 0, 31, 20, 12,  2, 15, 30, 25,  8, 27, 28,  3, 17, 22,  9, 16, 19,  6,
         4, 14, 21, 10, 11,  1, 13,  5, 24, 18, 23, 26, 29,  7])
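
`torch.argsort` of a permutation returns the indices that undo it. A minimal pure-Python sketch of the same idea for the truncated case used here, where only the first few shuffled entries are kept (the values are made up for illustration):

```python
# Hypothetical toy values; mirrors argsort(permutation[:k]) from the cell above.
permutation = [3, 0, 4, 1, 2]
num_kept = 3
kept = permutation[:num_kept]  # original positions of the kept rows: [3, 0, 4]

# argsort: indices that would sort `kept` in ascending order
reverse_perm = sorted(range(len(kept)), key=kept.__getitem__)

# Reordering the kept rows with reverse_perm restores their original relative order
restored = [kept[i] for i in reverse_perm]
print(reverse_perm, restored)  # [1, 0, 2] [0, 3, 4]
```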

In [50]:
seq_features = linear_inputs.reshape(-1, 96)[:, :8][permutation][:num_breached_embeddings]
seq_features.shape

torch.Size([20, 8])

In [51]:
corrs = torch.as_tensor(np.corrcoef(seq_features.detach()))

In [52]:
group_dict = dict()
num_groups = 0
seen = set()
for i in range(corrs.shape[0]):
    if i not in seen:
        # Index of the most correlated row, as a plain int so it can be used as a dict key
        flag = corrs[i].argmax().item()
        # What threshold to pick here? There should be a better way.
        new_group = (corrs[i] >= 0.98).nonzero().tolist()
        print(i, len(new_group))
        new_group = [x[0] for x in new_group]
        if flag in group_dict:
            # Reuse the group of the best match instead of opening a new one
            group_num = group_dict[flag]
        else:
            group_num = num_groups
            num_groups += 1
        for x in new_group:
            group_dict[x] = group_num
            seen.add(x)

0 3
1 1
2 3
4 1
5 2
6 2
7 2
8 1
9 1
11 1
12 1
16 1
18 1
19 1
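
The loop above groups embeddings whose pairwise correlation exceeds a fixed threshold. One possible alternative, sketched here on synthetic data, treats the thresholded correlation matrix as a graph and labels its connected components with a small union-find. The function name `group_by_correlation` and the toy vectors are hypothetical; the threshold 0.98 is the one used in the cell above. This is not the grouping used by the attack itself:

```python
import numpy as np

def group_by_correlation(features, threshold=0.98):
    """Label rows so that rows linked by correlation >= threshold share a label."""
    corrs = np.corrcoef(features)
    n = corrs.shape[0]
    parent = list(range(n))

    def find(i):
        # Follow parent links to the root, compressing the path on the way
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if corrs[i, j] >= threshold:
                parent[find(j)] = find(i)

    roots = [find(i) for i in range(n)]
    # Relabel roots as consecutive group ids in order of first appearance
    ids = {}
    return [ids.setdefault(r, len(ids)) for r in roots]

# Synthetic example: two "sentences", each a base vector plus small noise
base = np.array([[1, 2, 3, 4, 5, 6, 7, 8],
                 [8, 1, 7, 2, 6, 3, 5, 4]], dtype=float)
rng = np.random.default_rng(0)
rows = np.stack([base[0], base[1], base[0], base[1], base[0]])
features = rows + 1e-3 * rng.normal(size=rows.shape)
labels = group_by_correlation(features)
print(labels)  # [0, 1, 0, 1, 0]
```

Connected components make the result independent of the order in which rows are visited, although the threshold itself still has to be chosen.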


In [53]:
shape = [cfg.case.user.num_data_points, cfg.case.data.shape[0]]
sentence_labels = -torch.ones(corrs.shape[0], dtype=torch.long)
already_assigned = set()  # note: never updated below, so later rows can overwrite earlier labels
for idx in range(corrs.shape[0]):
    if idx not in already_assigned:
        matches = (corrs[idx] >= 0.98).nonzero().squeeze(1)

        if len(matches) > 0:
            filtered_matches = torch.as_tensor(
                [m for m in matches.tolist() if m not in already_assigned], dtype=torch.long
            )
            if len(filtered_matches) > shape[1]:
                # Keep only the shape[1] best matches, mapped back to row indices
                best = corrs[idx][filtered_matches].topk(k=shape[1]).indices
                filtered_matches = filtered_matches[best]
            sentence_labels[filtered_matches] = idx
sentence_labels

tensor([10,  1, 15, 10,  4, 17, 13, 14,  8,  9, 15, 11, 12, 13, 14, 15, 16, 17,
        18, 19])

# Reconstruct user data

In [67]:
attacker.cfg.sentence_algorithm = "k-means"

In [68]:
user.print(true_user_data)

[CLS] the tower building of the little rock arsenal, also known as u. s
. arsenal building, is a building located in macarthur park in downtown little rock,
arkansas. built in 1 8 4 0, it was part of little rock '
s first military installation. since its decommissioning, the tower building has housed two museums
. it was home to the arkansas museum of natural history and antiquities from 1 9
4 2 to 1 9 9 7 and the macarthur museum of arkansas military history since
2 0 0 1. it has also been the headquarters of the little rock [UNK]
club since 1 8 9 4. [SEP] [CLS] the building receives its name from its


In [69]:
reconstructed_user_data, stats = attacker.reconstruct([server_payload], [shared_data], 
                                                      server.secrets, dryrun=cfg.dryrun)

metrics = breaching.analysis.report(reconstructed_user_data, true_user_data, [server_payload], 
                                    server.model, cfg_case=cfg.case, setup=setup)
user.print(reconstructed_user_data)

Recovered tokens [[0, 1, 2, 5, 6, 7, 8, 10, 11, 12, 14, 16, 17, 19, 22, 25], [26, 29, 31, 32, 35, 38, 40, 43, 50, 56, 62, 63, 64, 72, 108, 184], [291, 310, 400, 494, 566, 652, 846, 926, 940, 993, 1084, 1495, 1936, 2195, 2971, 3688], [5, 5, 5, 5, 649, 5231, 5470, 6084, 6489, 8107, 8323, 18637, 21489, 21964, 22724, 24378], [5, 5, 6, 6, 7, 7, 11, 11, 11, 19, 22, 22, 566, 1084, 1084, 1936], [5, 6, 6, 7, 11, 12, 17, 19, 22, 31, 35, 291, 566, 1084, 1084, 1936], [7, 8, 12, 14, 17, 19, 22, 31, 35, 38, 40, 43, 291, 566, 1936, 18637], [5, 6, 7, 11, 19, 22, 50, 62, 310, 494, 652, 846, 1084, 3688, 6084, 6489]] through strategy decoder-bias.
Recovered 121 embeddings with positional data from imprinted layer.
Assigned [15, 16, 15, 14, 15, 15, 16, 15] breached embeddings to each sentence.
tensor([0.3583, 0.6902, 0.7893, 0.6761, 0.4612, 0.6772, 0.4425, 0.5698, 0.5027,
        0.5639, 0.7018, 0.5728, 0.5799, 0.4100, 0.5101])
tensor([0.3583, 0.6148, 0.7827, 0.5703, 0.8136, 0.6303, 0.4672, 0.7771, 0.0325

In [58]:
metrics

{'order': tensor([7, 0, 5, 1, 3, 2, 4, 6]),
 'intra-sentence_token_acc': [0.5,
  0.125,
  0.9375,
  0.3125,
  0.0625,
  0.9375,
  0.1875,
  0.3125],
 'accuracy': 0.25,
 'bleu': 0.8120457916499574,
 'google_bleu': 0.7924528301886793,
 'sacrebleu': 0.8127906204345446,
 'rouge1': 0.3904647435897436,
 'rouge2': 0.2814102564102564,
 'rougeL': 0.3418161121286122,
 'token_acc': 0.953125,
 'feat_mse': 0.28205350041389465,
 'parameters': 10751281,
 'label_acc': 0.953125}

# Manually compute attention

In [209]:

inputs = user.model.pos_encoder(user.model.encoder(inputs))
inputs[0]

tensor([[ 0.0000,  0.0000,  0.0000,  ...,  0.8122,  0.4188, -0.2031],
        [ 0.0000,  0.0000,  0.0000,  ...,  0.4060,  0.5875, -0.1769],
        [ 0.0000,  0.0000,  0.0000,  ..., -0.8559,  0.0129, -0.5893],
        ...,
        [ 0.0000,  0.0000,  0.0000,  ...,  0.0184,  0.1439,  0.6559],
        [ 0.0000,  0.0000,  0.0000,  ...,  0.2084,  1.1975,  0.0657],
        [ 0.0000,  0.0000,  0.0000,  ...,  1.3845, -0.1595, -1.0408]],
       grad_fn=<SelectBackward0>)

In [210]:
attn_layer = user.model.transformer_encoder.layers[0].self_attn
# in_proj_weight stacks the query, key, and value projections (96 rows each)
Q = attn_layer.in_proj_weight[:96, :]
K = attn_layer.in_proj_weight[96:192, :]
V = attn_layer.in_proj_weight[192:, :]
q_b = attn_layer.in_proj_bias[:96]
k_b = attn_layer.in_proj_bias[96:192]
v_b = attn_layer.in_proj_bias[192:]

O = attn_layer.out_proj.weight.data

In [211]:
K

tensor([[1., 0., 0.,  ..., 0., 0., 0.],
        [0., 1., 0.,  ..., 0., 0., 0.],
        [0., 0., 1.,  ..., 0., 0., 0.],
        ...,
        [0., 0., 0.,  ..., 1., 0., 0.],
        [0., 0., 0.,  ..., 0., 1., 0.],
        [0., 0., 0.,  ..., 0., 0., 1.]], grad_fn=<SliceBackward0>)

In [212]:
self_attn = user.model.transformer_encoder.layers[0].self_attn

In [213]:
self_attn.batch_first = True

In [214]:
Q.shape, inputs[0].T.shape, V.shape, K.shape, q_b.shape

(torch.Size([96, 96]),
 torch.Size([96, 16]),
 torch.Size([96, 96]),
 torch.Size([96, 96]),
 torch.Size([96]))

In [215]:
inputs.shape

torch.Size([8, 16, 96])

In [216]:
Qv = ((Q@inputs[0].T).T + q_b)
Kv = ((K@inputs[0].T).T + k_b)
Vv = ((V@inputs[0].T).T + v_b)

In [217]:
M = (Qv.reshape(16, 8, 12) @ Kv.reshape(16, 8, 12).T).softmax(dim=-1)

RuntimeError: The size of tensor a (16) must match the size of tensor b (12) at non-singleton dimension 0

In [218]:
M.shape

torch.Size([128, 128])
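
The `Kv.reshape(16, 8, 12).T` expression fails because `.T` on a 3-D tensor reverses all dimensions to `[12, 8, 16]`, so the batch dimensions 16 and 12 no longer match; the `M.shape` printed afterwards presumably comes from an earlier version of the cell. A minimal NumPy sketch of the batched per-head computation follows, with random stand-ins for `Qv`, `Kv`, and `Vv`. It includes the usual `1/sqrt(head_dim)` scaling of scaled dot-product attention, which the manual cells in this notebook leave out:

```python
import numpy as np

seq_len, num_heads, head_dim = 16, 8, 12
rng = np.random.default_rng(0)
# Random stand-ins for the projected queries, keys, and values
Qv = rng.normal(size=(seq_len, num_heads * head_dim))
Kv = rng.normal(size=(seq_len, num_heads * head_dim))
Vv = rng.normal(size=(seq_len, num_heads * head_dim))

def split_heads(x):
    # [seq, heads * dim] -> [heads, seq, dim]
    return x.reshape(seq_len, num_heads, head_dim).transpose(1, 0, 2)

q, k, v = split_heads(Qv), split_heads(Kv), split_heads(Vv)

# Per-head logits [heads, seq, seq], scaled as in scaled dot-product attention
logits = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)
logits -= logits.max(axis=-1, keepdims=True)  # numerical stability
attn = np.exp(logits)
attn /= attn.sum(axis=-1, keepdims=True)

# Weighted values per head, then heads concatenated back: [seq, heads * dim]
context = (attn @ v).transpose(1, 0, 2).reshape(seq_len, num_heads * head_dim)
print(attn.shape, context.shape)  # (8, 16, 16) (16, 96)
```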

In [219]:
attn_map = torch.zeros(16, 16)
for head in range(8):
    mapp = (Qv.reshape(16, 8, 12)[:, head, :] @ Kv.reshape(16, 8, 12)[:, head, :].T).softmax(dim=-1)
    attn_map += mapp
    print(mapp)

tensor([[0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 1., 0., 0., 0

In [164]:
Qv.reshape(16, 8, 12)[0, 0, :]

tensor([       0.0000,        0.0000,        0.0000,        0.0000,
               0.0000,        0.0000,  9107805.0000,  6274965.0000,
         2448828.7500,  -319700.9375, -4172115.0000,  -920698.5625],
       grad_fn=<SliceBackward0>)

In [167]:
Kv.reshape(16, 8, 12)[:, 0, :].T

tensor([[ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,
          0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000],
        [ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,
          0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000],
        [ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,
          0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000],
        [ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,
          0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000],
        [ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,
          0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000],
        [ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,
          0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000],
        [ 0.3070,  0.3

In [166]:
Qv @ Kv.T

tensor([[ 1.7530e+06, -4.2532e+06, -7.0920e+06, -9.1211e+06,  1.4331e+07,
          4.1181e+06, -1.8958e+04,  1.8304e+07,  9.5259e+06,  6.1742e+06,
         -5.4148e+06,  8.7899e+06, -4.3724e+06, -1.6599e+07,  1.2860e+07,
         -2.2557e+07],
        [ 1.7530e+06, -4.2532e+06, -7.0920e+06, -9.1211e+06,  1.4331e+07,
          4.1181e+06, -1.8958e+04,  1.8304e+07,  9.5259e+06,  6.1742e+06,
         -5.4148e+06,  8.7899e+06, -4.3724e+06, -1.6599e+07,  1.2860e+07,
         -2.2557e+07],
        [ 1.7530e+06, -4.2532e+06, -7.0920e+06, -9.1211e+06,  1.4331e+07,
          4.1181e+06, -1.8958e+04,  1.8304e+07,  9.5259e+06,  6.1742e+06,
         -5.4148e+06,  8.7899e+06, -4.3724e+06, -1.6599e+07,  1.2860e+07,
         -2.2557e+07],
        [ 1.7530e+06, -4.2532e+06, -7.0920e+06, -9.1211e+06,  1.4331e+07,
          4.1181e+06, -1.8958e+04,  1.8304e+07,  9.5259e+06,  6.1742e+06,
         -5.4148e+06,  8.7899e+06, -4.3724e+06, -1.6599e+07,  1.2860e+07,
         -2.2557e+07],
        [ 1.7530e+06
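
With logits on the order of 1e7 (compare `softmax_skew = 10000000` in the configuration), the softmax puts essentially all of its weight on the largest entry, which is why the per-head attention maps above are one-hot. A small pure-Python sketch using three of the logits printed above:

```python
import math

def softmax(logits):
    # Subtract the maximum first so exp() never overflows
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Three of the logits from the first row of Qv @ Kv.T above
probs = softmax([1.7530e06, 1.4331e07, 1.8304e07])
print(probs)  # [0.0, 0.0, 1.0]
```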

In [163]:
Vv.reshape(16, 8, 12)[0, 0, :]

tensor([ 0.3070, -0.6594,  1.6457,  0.1517, -0.3641, -2.1417, -0.2449,  0.0320,
         0.4172,  0.0921,  0.0000,  0.0000], grad_fn=<SliceBackward0>)

In [115]:
(((Qv.reshape(16, 8, 12)[:, head, :] @ Kv.reshape(16, 8, 12)[:, head, :].T).softmax(dim=-1) @ Vv.reshape(16, 8, 12)[:, head, :])).shape

RuntimeError: mat1 and mat2 shapes cannot be multiplied (16x16 and 8x12)

In [112]:
(Qv.reshape(16, 8, 12)[:, head, :] @ Kv.reshape(16, 8, 12)[:, head, :].T).softmax(dim=-1).shape

torch.Size([16, 16])

In [70]:
outputs, attn_outputs = self_attn(inputs, inputs, inputs)

In [74]:
attn_outputs[0]

tensor([[0.0000, 0.0000, 0.0000, 0.0000, 0.8750, 0.0000, 0.0000, 0.0000, 0.0000,
         0.0000, 0.0000, 0.0000, 0.1250, 0.0000, 0.0000, 0.0000],
        [0.0000, 0.0000, 0.0000, 0.0000, 0.8750, 0.0000, 0.0000, 0.0000, 0.0000,
         0.0000, 0.0000, 0.0000, 0.1250, 0.0000, 0.0000, 0.0000],
        [0.0000, 0.0000, 0.0000, 0.0000, 0.8750, 0.0000, 0.0000, 0.0000, 0.0000,
         0.0000, 0.0000, 0.0000, 0.1250, 0.0000, 0.0000, 0.0000],
        [0.0000, 0.0000, 0.0000, 0.0000, 0.8750, 0.0000, 0.0000, 0.0000, 0.0000,
         0.0000, 0.0000, 0.0000, 0.1250, 0.0000, 0.0000, 0.0000],
        [0.0000, 0.0000, 0.0000, 0.0000, 0.8750, 0.0000, 0.0000, 0.0000, 0.0000,
         0.0000, 0.0000, 0.0000, 0.1250, 0.0000, 0.0000, 0.0000],
        [0.0000, 0.0000, 0.0000, 0.0000, 0.8750, 0.0000, 0.0000, 0.0000, 0.0000,
         0.0000, 0.0000, 0.0000, 0.1250, 0.0000, 0.0000, 0.0000],
        [0.0000, 0.0000, 0.0000, 0.0000, 0.8750, 0.0000, 0.0000, 0.0000, 0.0000,
         0.0000, 0.0000, 0.0000, 0.12

In [72]:
user.model.transformer_encoder.layers[0].norm1(outputs + inputs)

tensor([[[ 1.4727,  1.0941, -1.7501,  ...,  0.9189, -1.3061,  1.7540],
         [ 1.4291,  1.0808, -1.5362,  ..., -1.3048, -0.0021, -0.4693],
         [ 1.1662,  0.8427, -1.5879,  ...,  0.0426,  0.6857, -1.4162],
         ...,
         [ 1.5034,  1.1219, -1.7438,  ...,  0.5844, -1.8480,  0.5581],
         [ 1.4337,  1.0777, -1.5964,  ..., -1.5132, -0.4092, -0.0043],
         [ 1.4806,  1.1188, -1.5993,  ..., -1.1123, -0.9756, -1.4105]],

        [[ 1.4700,  1.1464, -1.7656,  ...,  0.7979, -1.3338,  1.8512],
         [ 1.4463,  1.1483, -1.5345,  ..., -1.2155,  0.0200, -0.5267],
         [ 1.1768,  0.8996, -1.5947,  ...,  0.1699,  0.7308, -1.5538],
         ...,
         [ 1.5107,  1.1851, -1.7453,  ...,  0.5372, -1.7618,  0.4430],
         [ 1.4421,  1.1386, -1.5935,  ..., -1.3600, -0.4224, -0.1168],
         [ 1.4880,  1.1801, -1.5911,  ..., -1.1316, -0.9726, -1.2776]],

        [[ 0.3724,  1.0343, -1.1759,  ...,  0.8789, -1.4866,  1.8370],
         [ 0.4205,  1.0310, -1.0074,  ..., -1

In [None]:
normy = torch.nn.LayerNorm(4)

In [None]:
a = torch.tensor([[1, 2, 1, 2], [5, 5, 5, 5], [7, 7, 7, 7]]).float()
a

In [None]:
normy(a[None])
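
No output was recorded for these toy cells. The sketch below reproduces in pure Python what a freshly initialized `torch.nn.LayerNorm` computes per row (weight 1, bias 0, `eps=1e-5`, biased variance). The helper `layer_norm` is written for illustration only:

```python
import math

def layer_norm(row, eps=1e-5):
    # Normalize one row to zero mean and unit (biased) variance
    mean = sum(row) / len(row)
    var = sum((x - mean) ** 2 for x in row) / len(row)
    return [(x - mean) / math.sqrt(var + eps) for x in row]

rows = [[1, 2, 1, 2], [5, 5, 5, 5], [7, 7, 7, 7]]
normed = [layer_norm(r) for r in rows]
for r in normed:
    print([round(x, 4) for x in r])
```

The two constant rows both map to all zeros, so their magnitudes (5 versus 7) are no longer distinguishable after normalization, while the alternating row maps to values close to -1 and 1.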

In [None]:
model.in_proj_bias[:96].pe[0]

In [None]:
outputs, attn_weights = model(inputs, inputs, inputs)

In [None]:
attn_weights

In [44]:
reconstructed_user_data, stats = attacker.reconstruct_single_sentence([server_payload], [shared_data], 
                                                      server.secrets, dryrun=cfg.dryrun)
user.print(reconstructed_user_data)
metrics = breaching.analysis.report(reconstructed_user_data, true_user_data, [server_payload], 
                                    server.model, cfg_case=cfg.case, setup=setup)

Recovered tokens [[1, 5, 5, 6, 7, 11, 16, 29, 72, 494, 566, 846, 940, 1084, 1936, 6084]] through strategy decoder-bias.
Recovered 17 embeddings with positional data from imprinted layer.
the the arsenal of u little tower [CLS] s. also, known building as rock
METRICS: | Accuracy: 0.1250 | S-BLEU: 0.13 | FMSE: 6.3067e-08 | 
 G-BLEU: 0.30 | ROUGE1: 1.00| ROUGE2: 0.08 | ROUGE-L: 0.43| Token Acc: 100.00% | Label Acc: 100.00%


In [45]:
reconstructed_user_data, stats = attacker.reconstruct2([server_payload], [shared_data], 
                                                      server.secrets, dryrun=cfg.dryrun)
user.print(reconstructed_user_data)
metrics = breaching.analysis.report(reconstructed_user_data, true_user_data, [server_payload], 
                                    server.model, cfg_case=cfg.case, setup=setup)

Recovered tokens [[1, 5, 5, 6, 7, 11, 16, 29, 72, 494, 566, 846, 940, 1084, 1936, 6084]] through strategy decoder-bias.
Recovered 17 tokens with positional data from imprinted layer.


IndexError: list index out of range

In [None]:
permuted_true_data = dict(data=true_user_data["data"][[3, 2, 1, 0]], labels=true_user_data["labels"])

In [None]:
metrics = breaching.analysis.report(permuted_true_data, true_user_data, [server_payload], 
                                    server.model, cfg_case=cfg.case, setup=setup)

In [None]:
metrics = breaching.analysis.report(true_user_data, true_user_data, [server_payload], 
                                    server.model, cfg_case=cfg.case, setup=setup)