# Breaching privacy

This notebook does the same job as the command-line tool `simulate_breach.py`, but also directly visualizes the user data and the reconstruction.

In [1]:
import torch
import hydra
from omegaconf import OmegaConf
%load_ext autoreload
%autoreload 2

import breaching
import logging, sys
logging.basicConfig(level=logging.INFO, handlers=[logging.StreamHandler(sys.stdout)], format='%(message)s')
logger = logging.getLogger()

### Initialize cfg object and system setup:

This loads the configuration through Hydra.
There are many possible configuration options, but most can usually be left at their defaults. The cell below prints only the selected use case and server type.

Choose `case/data=shakespeare`, `case/data=wikitext`, or `case/data=stackoverflow` here:

In [18]:
with hydra.initialize(config_path="config"):
    cfg = hydra.compose(config_name='cfg', overrides=["case/data=shakespeare", "case/server=malicious-transformer",
                                                      "case.model=transformer1",
                                                      "attack=decepticon"])
    print(f'Investigating use case {cfg.case.name} with server type {cfg.case.server.name}.')
          
device = torch.device('cuda:0') if torch.cuda.is_available() else torch.device('cpu')
torch.backends.cudnn.benchmark = cfg.case.impl.benchmark
setup = dict(device=device, dtype=torch.float)
setup

Investigating use case single_imagenet with server type malicious_transformer_parameters.


{'device': device(type='cpu'), 'dtype': torch.float32}

### Modify config options here

You can use `.attribute` access to modify any of these configurations:

In [19]:
cfg.case.user.num_data_points = 1 # How many sentences?
cfg.case.user.user_idx = 0 # From which user?
cfg.case.data.shape = [32] # This is the sequence length

### Instantiate all parties

In [20]:
user, server, model, loss_fn = breaching.cases.construct_case(cfg.case, setup)
attacker = breaching.attacks.prepare_attack(server.model, server.loss, cfg.attack, setup)
breaching.utils.overview(server, user, attacker)

Now processing user ALL_S_WELL_THAT_ENDS_WELL_ADAM from tff database.
Now processing user ALL_S_WELL_THAT_ENDS_WELL_AEDILE from tff database.
Now processing user ALL_S_WELL_THAT_ENDS_WELL_AGRIPPA from tff database.
Now processing user ALL_S_WELL_THAT_ENDS_WELL_ALEXAS from tff database.
Now processing user ALL_S_WELL_THAT_ENDS_WELL_ALL from tff database.
Now processing user ALL_S_WELL_THAT_ENDS_WELL_ALL_THE_PEOPLE from tff database.
Now processing user ALL_S_WELL_THAT_ENDS_WELL_AMIENS from tff database.
Now processing user ALL_S_WELL_THAT_ENDS_WELL_ANTIPHOLUS_OF_EPHESUS from tff database.
Now processing user ALL_S_WELL_THAT_ENDS_WELL_ANTONY from tff database.
Now processing user ALL_S_WELL_THAT_ENDS_WELL_ARVIRAGUS from tff database.
Now processing user ALL_S_WELL_THAT_ENDS_WELL_AUDREY from tff database.
Now processing user ALL_S_WELL_THAT_ENDS_WELL_AUFIDIUS from tff database.
Now processing user ALL_S_WELL_THAT_ENDS_WELL_BELARIUS from tff database.
Now processing user ALL_S_WELL_THAT_EN

Now processing user ALL_S_WELL_THAT_ENDS_WELL_ROSALIND from tff database.
Now processing user ALL_S_WELL_THAT_ENDS_WELL_SCARUS from tff database.
Now processing user ALL_S_WELL_THAT_ENDS_WELL_SECOND_BROTHER from tff database.
Now processing user ALL_S_WELL_THAT_ENDS_WELL_SECOND_CAPTAIN from tff database.
Now processing user ALL_S_WELL_THAT_ENDS_WELL_SECOND_CITIZEN from tff database.
Now processing user ALL_S_WELL_THAT_ENDS_WELL_SECOND_CONSPIRATOR from tff database.
Now processing user ALL_S_WELL_THAT_ENDS_WELL_SECOND_GENTLEMAN from tff database.
Now processing user ALL_S_WELL_THAT_ENDS_WELL_SECOND_GUARD from tff database.
Now processing user ALL_S_WELL_THAT_ENDS_WELL_SECOND_LORD from tff database.
Now processing user ALL_S_WELL_THAT_ENDS_WELL_SECOND_MESSENGER from tff database.
Now processing user ALL_S_WELL_THAT_ENDS_WELL_SECOND_OFFICER from tff database.
Now processing user ALL_S_WELL_THAT_ENDS_WELL_SECOND_PAGE from tff database.
Now processing user ALL_S_WELL_THAT_ENDS_WELL_SECOND_S

Now processing user PERICLES__PRINCE_OF_TYRE_LADY from tff database.
Now processing user PERICLES__PRINCE_OF_TYRE_LOVEL from tff database.
Now processing user PERICLES__PRINCE_OF_TYRE_MARSHAL from tff database.
Now processing user PERICLES__PRINCE_OF_TYRE_MAYOR from tff database.
Now processing user PERICLES__PRINCE_OF_TYRE_MESSENGER from tff database.
Now processing user PERICLES__PRINCE_OF_TYRE_MOWBRAY from tff database.
Now processing user PERICLES__PRINCE_OF_TYRE_NORFOLK from tff database.
Now processing user PERICLES__PRINCE_OF_TYRE_NORTHUMBERLAND from tff database.
Now processing user PERICLES__PRINCE_OF_TYRE_OXFORD from tff database.
Now processing user PERICLES__PRINCE_OF_TYRE_PAGE from tff database.
Now processing user PERICLES__PRINCE_OF_TYRE_PERCY from tff database.
Now processing user PERICLES__PRINCE_OF_TYRE_PRIEST from tff database.
Now processing user PERICLES__PRINCE_OF_TYRE_PRINCE from tff database.
Now processing user PERICLES__PRINCE_OF_TYRE_PURSUIVANT from tff datab

### Simulate an attacked FL protocol

The true user data is returned only for analysis; it is not passed to the attacker.

In [21]:
server_payload = server.distribute_payload()
shared_data, true_user_data = user.compute_local_updates(server_payload)

Computing feature distribution before the linear1 layer from external data.
Feature mean is -0.3582227826118469, feature std is 0.8207599520683289.
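
To make the message flow of one round concrete, here is a toy sketch in plain Python. It is not the code behind `server.distribute_payload()` or `user.compute_local_updates()`: the function names are borrowed only for readability, and the one-layer linear model, squared-error loss, and all numbers are assumptions chosen for illustration.

```python
def distribute_payload(weights):
    """Toy server step: send the current model parameters to the user."""
    return {"parameters": list(weights)}


def compute_local_updates(payload, private_x, private_y):
    """Toy user step: compute the gradient of a squared error on private data."""
    w = payload["parameters"]
    prediction = sum(wi * xi for wi, xi in zip(w, private_x))
    error = prediction - private_y
    # d/dw (prediction - y)^2 = 2 * error * x
    gradients = [2 * error * xi for xi in private_x]
    shared_data = {"gradients": gradients}  # what leaves the user
    true_user_data = {"data": list(private_x), "labels": private_y}  # analysis only
    return shared_data, true_user_data


payload = distribute_payload([0.5, -1.0])
shared, truth = compute_local_updates(payload, [2.0, 1.0], 3.0)
print(shared["gradients"])  # [-12.0, -6.0]
```

Even in this toy setting the shared gradient is proportional to the private input (`-12.0 / -6.0` equals `2.0 / 1.0`), which hints at why gradient updates can leak user data.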


In [22]:
user.print(true_user_data)

Yonder comes my master, your brother.But do not so. I have five hundred crowns,I scarce can speak to thank you for myself.



### Reconstruct user data

In [23]:
reconstructed_user_data, stats = attacker.reconstruct([server_payload], [shared_data], 
                                                      server.secrets, dryrun=cfg.dryrun)

Recovered tokens [[11, 11, 13, 13, 13, 40, 56, 82, 198, 284, 314, 329, 345, 407, 423, 460, 466, 523, 534, 616, 1537, 1936, 2058, 2740, 3470, 3589, 3956, 4958, 5875, 8623, 12389, 18549]] through strategy decoder-bias.


In [24]:
user.print(reconstructed_user_data)

Yonder comes my master, your brother can speak so I have five hundred crowns,I scarce you
 thank do for myself.     


### Check metrics:

In [25]:
metrics = breaching.analysis.report(reconstructed_user_data, true_user_data, [server_payload], 
                                    server.model, cfg_case=cfg.case, setup=setup)

METRICS: | Accuracy: 0.2500 | S-BLEU: 0.66 | FMSE: 3.0618e-03 | 
 G-BLEU: 0.59 | ROUGE1: 0.93| ROUGE2: 0.65 | ROUGE-L: 0.76| Token Acc: 84.38% | Label Acc: 84.38%
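
With a sequence length of 32, the reported values are consistent with simple token counts: 0.2500 equals 8/32 and 84.38% is 27/32 rounded. The exact definitions used by `breaching.analysis.report` are not shown in this notebook, so the following is only a sketch of one possible definition, a position-wise token accuracy; the helper name `positional_accuracy` is hypothetical.

```python
def positional_accuracy(reconstructed, reference):
    """Fraction of positions at which both token sequences agree."""
    if len(reconstructed) != len(reference):
        raise ValueError("Sequences must have the same length.")
    if not reference:
        return 0.0
    matches = sum(r == t for r, t in zip(reconstructed, reference))
    return matches / len(reference)


reference = [5, 9, 2, 7]
reconstructed = [5, 9, 7, 2]  # right tokens, but the last two are swapped
print(positional_accuracy(reconstructed, reference))  # 0.5

# Counts out of 32 tokens that match the reported numbers.
print(8 / 32, 27 / 32)  # 0.25 0.84375
```

A position-wise score penalizes tokens that were recovered correctly but placed in the wrong order, which an order-insensitive score would not.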


In [29]:
recovered_tokens = torch.as_tensor([[11, 11, 13, 13, 13, 40, 56, 82, 198, 284, 
                           314, 329, 345, 407, 423, 460, 466, 523, 534, 616, 1537, 
                           1936, 2058, 2740, 3470, 3589, 3956, 4958, 5875, 8623, 12389, 18549]])

In [30]:
breaching.analysis.analysis.count_integer_overlap(recovered_tokens.view(-1), true_user_data["data"].view(-1))

1.0
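
A result of 1.0 suggests that every token of the true sequence was recovered, ignoring order. The sketch below shows one way such an order-insensitive overlap could be computed with multisets. It is an assumption about the idea, not the implementation of `count_integer_overlap`, and the helper name `multiset_overlap` is hypothetical.

```python
from collections import Counter


def multiset_overlap(recovered, reference):
    """Fraction of reference tokens also present in recovered, counting duplicates."""
    if not reference:
        return 0.0
    # Counter intersection keeps the minimum count of each shared token.
    shared = Counter(recovered) & Counter(reference)
    return sum(shared.values()) / len(reference)


print(multiset_overlap([11, 11, 13], [13, 11, 11]))  # 1.0, same tokens in another order
print(multiset_overlap([11, 13, 13], [11, 11, 13]))  # one of the two 11s is missing
```

Counting duplicates matters here because the recovered token list contains repeated ids such as 11 and 13.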