# Replication of BERT-based model with Attention Manipulation - Notebook

In [1]:
# Global modules
import os
import warnings
import logging

# PyTorch modules
import torch  # required by torch.autograd.set_detect_anomaly in debug mode
from torch.cuda import device_count

# Local dependencies
from notebook_main import run_all_experiments, generate_table

## Model parameters

The cell below gives an overview of the arguments the program accepts. The default values are the ones we used when running the experiments on Lisa. The _device_count()_ function returns the number of available GPUs, and the GPU-dependent arguments are set accordingly.

In [2]:
# ------------------------ experiment specific args ------------------------ #
_SEEDS        = [42, 43] # list of global seeds; every experiment is run once per seed
_TASKS        = ['occupation','sstwiki','pronoun'] # list to determine task. choose 'occupation', 'pronoun', 'sstwiki'
_MODES        = ['anon', 'adversarial'] # list of kind of experiments to be reproduced. Default arg runs all 7.
_PENALTY_FNS  = ['mean', 'max'] # list of penalty_fns to use for the adversarial models. Default arg runs both.


# ------------------------ learning specific args ------------------------ #
_BATCH_SIZE   = 32 if device_count() > 0 else 2 # number of sentences sampled per pass
_MAX_EPOCHS   = 1 # number of epochs
_LR           = 5e-5 # learning rate
_DROPOUT      = 0.3 # dropout probability
_MAX_LENGTH   = 180 # length at which tokenizer truncates sequences


# ------------------------ Torch / Lightning specific args ------------------------ #
_NUM_GPUS     = 1 if device_count() > 0 else None 
_ACCELERATOR  = None 
_NUM_WORKERS  = 12 if device_count() > 0 else 1
_LOG_EVERY    = 10 if device_count() > 0 else 1
_DEBUG        = False # toggle elaborate torch errors
_TOY_RUN      = 1.0 # set no of batches per datasplit per epoch (helpful for debugging, disabled by default)
_PROGRESS_BAR = 0 if device_count() > 0 else 1 # lightning progress bar flag. disabled on GPU to keep SLURM output neat
_WARNINGS     = False # disable warnings for less cluttered console output

# This mode turns on more detailed torch error descriptions (disabled by default)
if _DEBUG:
    torch.autograd.set_detect_anomaly(True)
   
# Suppress warnings and GPU availability prompts for less cluttered console output (suppressed by default)
if not _WARNINGS:
    warnings.filterwarnings('ignore')
    # configure logging at the root level of lightning to get rid of GPU/TPU availability prompts
    logging.getLogger('lightning').setLevel(0)
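
The GPU-dependent defaults above all follow the same pattern: one value when at least one GPU is present, a lighter value otherwise. A minimal sketch of that pattern, using a hypothetical helper `pick_defaults` that is not part of `notebook_main` and takes the GPU count as a plain integer so it runs without PyTorch:

```python
def pick_defaults(num_gpus_available: int) -> dict:
    """Return the GPU-dependent defaults used in the cell above.

    `pick_defaults` is a hypothetical helper for illustration only;
    in the notebook the count comes from torch.cuda.device_count().
    """
    on_gpu = num_gpus_available > 0
    return {
        "batch_size": 32 if on_gpu else 2,
        "num_gpus": 1 if on_gpu else None,
        "num_workers": 12 if on_gpu else 1,
        "log_every": 10 if on_gpu else 1,
        "progress_bar": 0 if on_gpu else 1,
    }


cpu_defaults = pick_defaults(0)
gpu_defaults = pick_defaults(2)
print(cpu_defaults)
print(gpu_defaults)
```

Note that a machine with several GPUs still gets `num_gpus = 1`, matching the `_NUM_GPUS` line above.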

## Recommended settings to run everything in a reasonable timespan

The cell below sets the parameters to run all 7 experiments for all 3 tasks (thereby re-creating Table 3 of the original paper), but with only 1 batch per epoch for each of the train/dev/test splits, 1 sample per batch, 1 epoch in total, and two seeds. This took about 25 minutes to run on my own machine.

In [3]:
_SEEDS        = [42, 43]
_TASKS        = ['occupation','sstwiki','pronoun'] # list to determine task. choose 'occupation', 'pronoun', 'sstwiki'
_MODES        = ['anon', 'adversarial'] # list of kind of experiments to be reproduced. Default arg runs all 7.
_PENALTY_FNS  = ['mean', 'max'] # list of penalty_fns to use for the adversarial models. Default arg runs both.
_TOY_RUN      = 1 # use only 1 batch per data split, per epoch
_MAX_EPOCHS   = 1 # num epochs to train for
_BATCH_SIZE   = 1 # batch size
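
To see where the "7 experiments" come from, the sketch below enumerates the experiment grid. It assumes the lambda values 0, 0.1 and 1.0, which are the ones that appear in the training log of this notebook but are not set in the cell above; `experiment_names` is a hypothetical helper, not a function from `notebook_main`.

```python
from itertools import product

seeds = [42, 43]
tasks = ['occupation', 'sstwiki', 'pronoun']
penalty_fns = ['mean', 'max']
lambdas = [0, 0.1, 1.0]  # assumption: taken from the training log, not from the args


def experiment_names(penalty_fns, lambdas):
    """One anonymization run plus one adversarial run per (penalty_fn, lambda) pair."""
    names = ['anon']
    names += [f'{fn}_{lam}' for fn, lam in product(penalty_fns, lambdas)]
    return names


names = experiment_names(penalty_fns, lambdas)
total_runs = len(seeds) * len(tasks) * len(names)
print(names)       # 7 experiment names
print(total_runs)  # number of train/test runs for the settings above
```

With two seeds and three tasks this gives 42 separate train/test runs, which is why even a single batch per split adds up to a noticeable runtime.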

## Code to run desired experiments, for a set of tasks, for a set of seeds

Running the cell below runs the experiments for the seeds, tasks, modes and penalty functions specified by the global variables. Note that because the checkpoint files are quite large, the folders where checkpoints are stored are deleted entirely whenever this cell is re-run, to keep your disk from filling up. After training, a table with the experiment results is displayed automatically (scroll down!).
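
The checkpoint clean-up described above could look roughly like the sketch below. This is an illustration under assumptions, not the code in `notebook_main`: the helper name `reset_checkpoint_dir` and the `checkpoints` folder name are hypothetical, and the demo runs inside a temporary directory so nothing on your disk is touched.

```python
import shutil
import tempfile
from pathlib import Path


def reset_checkpoint_dir(checkpoint_dir: Path) -> Path:
    """Delete the directory with all its contents, then recreate it empty."""
    if checkpoint_dir.exists():
        shutil.rmtree(checkpoint_dir)
    checkpoint_dir.mkdir(parents=True)
    return checkpoint_dir


# Demo in a throwaway location: create a stale checkpoint, then reset.
with tempfile.TemporaryDirectory() as tmp:
    ckpt_dir = Path(tmp) / "checkpoints"  # hypothetical folder name
    ckpt_dir.mkdir()
    (ckpt_dir / "old.ckpt").write_text("stale")
    reset_checkpoint_dir(ckpt_dir)
    remaining = list(ckpt_dir.iterdir())

print(remaining)
```

Because the whole folder is removed, copy any checkpoint you want to keep elsewhere before re-running the cell.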

In [4]:
final_results = run_all_experiments(  SEEDS        = _SEEDS,
                                      TASKS        = _TASKS,
                                      MODES        = _MODES,
                                      PENALTY_FNS  = _PENALTY_FNS,
                                      BATCH_SIZE   = _BATCH_SIZE,
                                      MAX_EPOCHS   = _MAX_EPOCHS,
                                      LR           = _LR,
                                      DROPOUT      = _DROPOUT,
                                      MAX_LENGTH   = _MAX_LENGTH,
                                      NUM_GPUS     = _NUM_GPUS,
                                      ACCELERATOR  = _ACCELERATOR,
                                      NUM_WORKERS  = _NUM_WORKERS,
                                      LOG_EVERY    = _LOG_EVERY,
                                      DEBUG        = _DEBUG,
                                      TOY_RUN      = _TOY_RUN,
                                      PROGRESS_BAR = _PROGRESS_BAR,
                                      WARNINGS     = _WARNINGS)

generate_table(final_results)


-------------- Beginning anonymization experiment for task: occupation ------------



Validation sanity check: |          | 0/? [00:00<?, ?it/s]

Training: |          | 0/? [00:00<?, ?it/s]

Validating: |          | 0/? [00:00<?, ?it/s]

Test results on occupation with seed 42 with anonymization: 


Testing: |          | 0/? [00:00<?, ?it/s]

--------------------------------------------------------------------------------
DATALOADER:0 TEST RESULTS
{'test_acc': tensor(1.),
 'test_attention_mass': 0.0,
 'test_loss': tensor(0.5354),
 'test_penalty_R': tensor(0., dtype=torch.float64)}
--------------------------------------------------------------------------------

-------------- Beginning adversarial experiments for task: occupation ---------- 

Training for penalty_fn = mean and lambda = 0...


Validation sanity check: |          | 0/? [00:00<?, ?it/s]

Training: |          | 0/? [00:00<?, ?it/s]

Validating: |          | 0/? [00:00<?, ?it/s]

Test results on occupation with seed 42 for model with penalty_fn = mean, lambda = 0: 


Testing: |          | 0/? [00:00<?, ?it/s]

--------------------------------------------------------------------------------
DATALOADER:0 TEST RESULTS
{'test_acc': tensor(0.),
 'test_attention_mass': 9.139671325683594,
 'test_loss': tensor(0.8560),
 'test_penalty_R': tensor(0.)}
--------------------------------------------------------------------------------
Training for penalty_fn = mean and lambda = 0.1...


Validation sanity check: |          | 0/? [00:00<?, ?it/s]

Training: |          | 0/? [00:00<?, ?it/s]

Validating: |          | 0/? [00:00<?, ?it/s]

Test results on occupation with seed 42 for model with penalty_fn = mean, lambda = 0.1: 


Testing: |          | 0/? [00:00<?, ?it/s]

--------------------------------------------------------------------------------
DATALOADER:0 TEST RESULTS
{'test_acc': tensor(0.),
 'test_attention_mass': 8.273395538330078,
 'test_loss': tensor(1.3555),
 'test_penalty_R': tensor(0.0088)}
--------------------------------------------------------------------------------
Training for penalty_fn = mean and lambda = 1.0...


Validation sanity check: |          | 0/? [00:00<?, ?it/s]

Training: |          | 0/? [00:00<?, ?it/s]

Validating: |          | 0/? [00:00<?, ?it/s]

Test results on occupation with seed 42 for model with penalty_fn = mean, lambda = 1.0: 


Testing: |          | 0/? [00:00<?, ?it/s]

--------------------------------------------------------------------------------
DATALOADER:0 TEST RESULTS
{'test_acc': tensor(1.),
 'test_attention_mass': 4.719686985015869,
 'test_loss': tensor(0.3455),
 'test_penalty_R': tensor(0.0489)}
--------------------------------------------------------------------------------
Training for penalty_fn = max and lambda = 0...


Validation sanity check: |          | 0/? [00:00<?, ?it/s]

Training: |          | 0/? [00:00<?, ?it/s]

Validating: |          | 0/? [00:00<?, ?it/s]

Test results on occupation with seed 42 for model with penalty_fn = max, lambda = 0: 


Testing: |          | 0/? [00:00<?, ?it/s]

--------------------------------------------------------------------------------
DATALOADER:0 TEST RESULTS
{'test_acc': tensor(0.),
 'test_attention_mass': 36.25352096557617,
 'test_loss': tensor(0.7934),
 'test_penalty_R': tensor(0.)}
--------------------------------------------------------------------------------
Training for penalty_fn = max and lambda = 0.1...


Validation sanity check: |          | 0/? [00:00<?, ?it/s]

Training: |          | 0/? [00:00<?, ?it/s]

Validating: |          | 0/? [00:00<?, ?it/s]

Test results on occupation with seed 42 for model with penalty_fn = max, lambda = 0.1: 


Testing: |          | 0/? [00:00<?, ?it/s]

--------------------------------------------------------------------------------
DATALOADER:0 TEST RESULTS
{'test_acc': tensor(0.),
 'test_attention_mass': 29.945829391479492,
 'test_loss': tensor(1.5492),
 'test_penalty_R': tensor(0.0356)}
--------------------------------------------------------------------------------
Training for penalty_fn = max and lambda = 1.0...


Validation sanity check: |          | 0/? [00:00<?, ?it/s]

Training: |          | 0/? [00:00<?, ?it/s]

Validating: |          | 0/? [00:00<?, ?it/s]

Test results on occupation with seed 42 for model with penalty_fn = max, lambda = 1.0: 


Testing: |          | 0/? [00:00<?, ?it/s]

--------------------------------------------------------------------------------
DATALOADER:0 TEST RESULTS
{'test_acc': tensor(1.),
 'test_attention_mass': 10.007347106933594,
 'test_loss': tensor(0.6463),
 'test_penalty_R': tensor(0.1054)}
--------------------------------------------------------------------------------

-------------- Beginning anonymization experiment for task: occupation ------------



Validation sanity check: |          | 0/? [00:00<?, ?it/s]

Training: |          | 0/? [00:00<?, ?it/s]

Validating: |          | 0/? [00:00<?, ?it/s]

Test results on occupation with seed 43 with anonymization: 


Testing: |          | 0/? [00:00<?, ?it/s]

--------------------------------------------------------------------------------
DATALOADER:0 TEST RESULTS
{'test_acc': tensor(1.),
 'test_attention_mass': 0.0,
 'test_loss': tensor(0.5241),
 'test_penalty_R': tensor(0., dtype=torch.float64)}
--------------------------------------------------------------------------------

-------------- Beginning adversarial experiments for task: occupation ---------- 

Training for penalty_fn = mean and lambda = 0...


Validation sanity check: |          | 0/? [00:00<?, ?it/s]

Training: |          | 0/? [00:00<?, ?it/s]

Validating: |          | 0/? [00:00<?, ?it/s]

Test results on occupation with seed 43 for model with penalty_fn = mean, lambda = 0: 


Testing: |          | 0/? [00:00<?, ?it/s]

--------------------------------------------------------------------------------
DATALOADER:0 TEST RESULTS
{'test_acc': tensor(1.),
 'test_attention_mass': 9.172782897949219,
 'test_loss': tensor(0.5102),
 'test_penalty_R': tensor(0.)}
--------------------------------------------------------------------------------
Training for penalty_fn = mean and lambda = 0.1...


Validation sanity check: |          | 0/? [00:00<?, ?it/s]

Training: |          | 0/? [00:00<?, ?it/s]

Validating: |          | 0/? [00:00<?, ?it/s]

None of the models are within 2% acc range
Test results on occupation with seed 43 for model with penalty_fn = mean, lambda = 0.1: 


Testing: |          | 0/? [00:00<?, ?it/s]

--------------------------------------------------------------------------------
DATALOADER:0 TEST RESULTS
{'test_acc': tensor(1.),
 'test_attention_mass': 7.541834354400635,
 'test_loss': tensor(0.5851),
 'test_penalty_R': tensor(0.0081)}
--------------------------------------------------------------------------------
Training for penalty_fn = mean and lambda = 1.0...


Validation sanity check: |          | 0/? [00:00<?, ?it/s]

Training: |          | 0/? [00:00<?, ?it/s]

Validating: |          | 0/? [00:00<?, ?it/s]

Test results on occupation with seed 43 for model with penalty_fn = mean, lambda = 1.0: 


Testing: |          | 0/? [00:00<?, ?it/s]

--------------------------------------------------------------------------------
DATALOADER:0 TEST RESULTS
{'test_acc': tensor(0.),
 'test_attention_mass': 6.569365501403809,
 'test_loss': tensor(0.9586),
 'test_penalty_R': tensor(0.0700)}
--------------------------------------------------------------------------------
Training for penalty_fn = max and lambda = 0...


Validation sanity check: |          | 0/? [00:00<?, ?it/s]

Training: |          | 0/? [00:00<?, ?it/s]

Validating: |          | 0/? [00:00<?, ?it/s]

Test results on occupation with seed 43 for model with penalty_fn = max, lambda = 0: 


Testing: |          | 0/? [00:00<?, ?it/s]

--------------------------------------------------------------------------------
DATALOADER:0 TEST RESULTS
{'test_acc': tensor(1.),
 'test_attention_mass': 30.601871490478516,
 'test_loss': tensor(0.5934),
 'test_penalty_R': tensor(0.)}
--------------------------------------------------------------------------------
Training for penalty_fn = max and lambda = 0.1...


Validation sanity check: |          | 0/? [00:00<?, ?it/s]

Training: |          | 0/? [00:00<?, ?it/s]

Validating: |          | 0/? [00:00<?, ?it/s]

Test results on occupation with seed 43 for model with penalty_fn = max, lambda = 0.1: 


Testing: |          | 0/? [00:00<?, ?it/s]

--------------------------------------------------------------------------------
DATALOADER:0 TEST RESULTS
{'test_acc': tensor(0.),
 'test_attention_mass': 26.929283142089844,
 'test_loss': tensor(0.9188),
 'test_penalty_R': tensor(0.0314)}
--------------------------------------------------------------------------------
Training for penalty_fn = max and lambda = 1.0...


Validation sanity check: |          | 0/? [00:00<?, ?it/s]

Training: |          | 0/? [00:00<?, ?it/s]

Validating: |          | 0/? [00:00<?, ?it/s]

Test results on occupation with seed 43 for model with penalty_fn = max, lambda = 1.0: 


Testing: |          | 0/? [00:00<?, ?it/s]

--------------------------------------------------------------------------------
DATALOADER:0 TEST RESULTS
{'test_acc': tensor(0.),
 'test_attention_mass': 24.711267471313477,
 'test_loss': tensor(1.7462),
 'test_penalty_R': tensor(0.2838)}
--------------------------------------------------------------------------------

-------------- Beginning anonymization experiment for task: sstwiki ------------



Validation sanity check: |          | 0/? [00:00<?, ?it/s]

Training: |          | 0/? [00:00<?, ?it/s]

Validating: |          | 0/? [00:00<?, ?it/s]

Test results on sstwiki with seed 42 with anonymization: 


Testing: |          | 0/? [00:00<?, ?it/s]

--------------------------------------------------------------------------------
DATALOADER:0 TEST RESULTS
{'test_acc': tensor(1.),
 'test_attention_mass': 0.0,
 'test_loss': tensor(0.5172),
 'test_penalty_R': tensor(0., dtype=torch.float64)}
--------------------------------------------------------------------------------

-------------- Beginning adversarial experiments for task: sstwiki ---------- 

Training for penalty_fn = mean and lambda = 0...


Validation sanity check: |          | 0/? [00:00<?, ?it/s]

Training: |          | 0/? [00:00<?, ?it/s]

Validating: |          | 0/? [00:00<?, ?it/s]

Test results on sstwiki with seed 42 for model with penalty_fn = mean, lambda = 0: 


Testing: |          | 0/? [00:00<?, ?it/s]

--------------------------------------------------------------------------------
DATALOADER:0 TEST RESULTS
{'test_acc': tensor(1.),
 'test_attention_mass': 42.62215042114258,
 'test_loss': tensor(0.6276),
 'test_penalty_R': tensor(0.)}
--------------------------------------------------------------------------------
Training for penalty_fn = mean and lambda = 0.1...


Validation sanity check: |          | 0/? [00:00<?, ?it/s]

Training: |          | 0/? [00:00<?, ?it/s]

Validating: |          | 0/? [00:00<?, ?it/s]

None of the models are within 2% acc range
Test results on sstwiki with seed 42 for model with penalty_fn = mean, lambda = 0.1: 


Testing: |          | 0/? [00:00<?, ?it/s]

--------------------------------------------------------------------------------
DATALOADER:0 TEST RESULTS
{'test_acc': tensor(0.),
 'test_attention_mass': 45.43339157104492,
 'test_loss': tensor(0.8751),
 'test_penalty_R': tensor(0.0636)}
--------------------------------------------------------------------------------
Training for penalty_fn = mean and lambda = 1.0...


Validation sanity check: |          | 0/? [00:00<?, ?it/s]

Training: |          | 0/? [00:00<?, ?it/s]

Validating: |          | 0/? [00:00<?, ?it/s]

None of the models are within 2% acc range
Test results on sstwiki with seed 42 for model with penalty_fn = mean, lambda = 1.0: 


Testing: |          | 0/? [00:00<?, ?it/s]

--------------------------------------------------------------------------------
DATALOADER:0 TEST RESULTS
{'test_acc': tensor(0.),
 'test_attention_mass': 40.66910934448242,
 'test_loss': tensor(1.5570),
 'test_penalty_R': tensor(0.5517)}
--------------------------------------------------------------------------------
Training for penalty_fn = max and lambda = 0...


Validation sanity check: |          | 0/? [00:00<?, ?it/s]

Training: |          | 0/? [00:00<?, ?it/s]

Validating: |          | 0/? [00:00<?, ?it/s]

Test results on sstwiki with seed 42 for model with penalty_fn = max, lambda = 0: 


Testing: |          | 0/? [00:00<?, ?it/s]

--------------------------------------------------------------------------------
DATALOADER:0 TEST RESULTS
{'test_acc': tensor(0.),
 'test_attention_mass': 70.64347076416016,
 'test_loss': tensor(1.0818),
 'test_penalty_R': tensor(0.)}
--------------------------------------------------------------------------------
Training for penalty_fn = max and lambda = 0.1...


Validation sanity check: |          | 0/? [00:00<?, ?it/s]

Training: |          | 0/? [00:00<?, ?it/s]

Validating: |          | 0/? [00:00<?, ?it/s]

Test results on sstwiki with seed 42 for model with penalty_fn = max, lambda = 0.1: 


Testing: |          | 0/? [00:00<?, ?it/s]

--------------------------------------------------------------------------------
DATALOADER:0 TEST RESULTS
{'test_acc': tensor(0.),
 'test_attention_mass': 68.52763366699219,
 'test_loss': tensor(1.4002),
 'test_penalty_R': tensor(0.1156)}
--------------------------------------------------------------------------------
Training for penalty_fn = max and lambda = 1.0...


Validation sanity check: |          | 0/? [00:00<?, ?it/s]

Training: |          | 0/? [00:00<?, ?it/s]

Validating: |          | 0/? [00:00<?, ?it/s]

Test results on sstwiki with seed 42 for model with penalty_fn = max, lambda = 1.0: 


Testing: |          | 0/? [00:00<?, ?it/s]

--------------------------------------------------------------------------------
DATALOADER:0 TEST RESULTS
{'test_acc': tensor(0.),
 'test_attention_mass': 67.73311614990234,
 'test_loss': tensor(2.0811),
 'test_penalty_R': tensor(1.1311)}
--------------------------------------------------------------------------------

-------------- Beginning anonymization experiment for task: sstwiki ------------



Validation sanity check: |          | 0/? [00:00<?, ?it/s]

Training: |          | 0/? [00:00<?, ?it/s]

Validating: |          | 0/? [00:00<?, ?it/s]

Test results on sstwiki with seed 43 with anonymization: 


Testing: |          | 0/? [00:00<?, ?it/s]

--------------------------------------------------------------------------------
DATALOADER:0 TEST RESULTS
{'test_acc': tensor(1.),
 'test_attention_mass': 0.0,
 'test_loss': tensor(0.3013),
 'test_penalty_R': tensor(0., dtype=torch.float64)}
--------------------------------------------------------------------------------

-------------- Beginning adversarial experiments for task: sstwiki ---------- 

Training for penalty_fn = mean and lambda = 0...


Validation sanity check: |          | 0/? [00:00<?, ?it/s]

Training: |          | 0/? [00:00<?, ?it/s]

Validating: |          | 0/? [00:00<?, ?it/s]

Test results on sstwiki with seed 43 for model with penalty_fn = mean, lambda = 0: 


Testing: |          | 0/? [00:00<?, ?it/s]

--------------------------------------------------------------------------------
DATALOADER:0 TEST RESULTS
{'test_acc': tensor(0.),
 'test_attention_mass': 42.64935302734375,
 'test_loss': tensor(0.8556),
 'test_penalty_R': tensor(0.)}
--------------------------------------------------------------------------------
Training for penalty_fn = mean and lambda = 0.1...


Validation sanity check: |          | 0/? [00:00<?, ?it/s]

Training: |          | 0/? [00:00<?, ?it/s]

Validating: |          | 0/? [00:00<?, ?it/s]

Test results on sstwiki with seed 43 for model with penalty_fn = mean, lambda = 0.1: 


Testing: |          | 0/? [00:00<?, ?it/s]

--------------------------------------------------------------------------------
DATALOADER:0 TEST RESULTS
{'test_acc': tensor(1.),
 'test_attention_mass': 40.53860092163086,
 'test_loss': tensor(0.6567),
 'test_penalty_R': tensor(0.0569)}
--------------------------------------------------------------------------------
Training for penalty_fn = mean and lambda = 1.0...


Validation sanity check: |          | 0/? [00:00<?, ?it/s]

Training: |          | 0/? [00:00<?, ?it/s]

Validating: |          | 0/? [00:00<?, ?it/s]

Test results on sstwiki with seed 43 for model with penalty_fn = mean, lambda = 1.0: 


Testing: |          | 0/? [00:00<?, ?it/s]

--------------------------------------------------------------------------------
DATALOADER:0 TEST RESULTS
{'test_acc': tensor(0.),
 'test_attention_mass': 40.283348083496094,
 'test_loss': tensor(1.6794),
 'test_penalty_R': tensor(0.5350)}
--------------------------------------------------------------------------------
Training for penalty_fn = max and lambda = 0...


Validation sanity check: |          | 0/? [00:00<?, ?it/s]

Training: |          | 0/? [00:00<?, ?it/s]

Validating: |          | 0/? [00:00<?, ?it/s]

Test results on sstwiki with seed 43 for model with penalty_fn = max, lambda = 0: 


Testing: |          | 0/? [00:00<?, ?it/s]

--------------------------------------------------------------------------------
DATALOADER:0 TEST RESULTS
{'test_acc': tensor(1.),
 'test_attention_mass': 67.76131439208984,
 'test_loss': tensor(0.3429),
 'test_penalty_R': tensor(0.)}
--------------------------------------------------------------------------------
Training for penalty_fn = max and lambda = 0.1...


Validation sanity check: |          | 0/? [00:00<?, ?it/s]

Training: |          | 0/? [00:00<?, ?it/s]

Validating: |          | 0/? [00:00<?, ?it/s]

None of the models are within 2% acc range
Test results on sstwiki with seed 43 for model with penalty_fn = max, lambda = 0.1: 


Testing: |          | 0/? [00:00<?, ?it/s]

--------------------------------------------------------------------------------
DATALOADER:0 TEST RESULTS
{'test_acc': tensor(0.),
 'test_attention_mass': 56.969242095947266,
 'test_loss': tensor(1.0401),
 'test_penalty_R': tensor(0.0843)}
--------------------------------------------------------------------------------
Training for penalty_fn = max and lambda = 1.0...


Validation sanity check: |          | 0/? [00:00<?, ?it/s]

Training: |          | 0/? [00:00<?, ?it/s]

Validating: |          | 0/? [00:00<?, ?it/s]

None of the models are within 2% acc range
Test results on sstwiki with seed 43 for model with penalty_fn = max, lambda = 1.0: 


Testing: |          | 0/? [00:00<?, ?it/s]

--------------------------------------------------------------------------------
DATALOADER:0 TEST RESULTS
{'test_acc': tensor(0.),
 'test_attention_mass': 61.70905303955078,
 'test_loss': tensor(1.8139),
 'test_penalty_R': tensor(0.9600)}
--------------------------------------------------------------------------------

-------------- Beginning anonymization experiment for task: pronoun ------------



Validation sanity check: |          | 0/? [00:00<?, ?it/s]

Training: |          | 0/? [00:00<?, ?it/s]

Validating: |          | 0/? [00:00<?, ?it/s]

Test results on pronoun with seed 42 with anonymization: 


Testing: |          | 0/? [00:00<?, ?it/s]

--------------------------------------------------------------------------------
DATALOADER:0 TEST RESULTS
{'test_acc': tensor(0.),
 'test_attention_mass': 0.0,
 'test_loss': tensor(1.0617),
 'test_penalty_R': tensor(0., dtype=torch.float64)}
--------------------------------------------------------------------------------

-------------- Beginning adversarial experiments for task: pronoun ---------- 

Training for penalty_fn = mean and lambda = 0...


Validation sanity check: |          | 0/? [00:00<?, ?it/s]

Training: |          | 0/? [00:00<?, ?it/s]

Validating: |          | 0/? [00:00<?, ?it/s]

Test results on pronoun with seed 42 for model with penalty_fn = mean, lambda = 0: 


Testing: |          | 0/? [00:00<?, ?it/s]

--------------------------------------------------------------------------------
DATALOADER:0 TEST RESULTS
{'test_acc': tensor(0.),
 'test_attention_mass': 4.725843906402588,
 'test_loss': tensor(0.8322),
 'test_penalty_R': tensor(0.)}
--------------------------------------------------------------------------------
Training for penalty_fn = mean and lambda = 0.1...


Validation sanity check: |          | 0/? [00:00<?, ?it/s]

Training: |          | 0/? [00:00<?, ?it/s]

Validating: |          | 0/? [00:00<?, ?it/s]

Test results on pronoun with seed 42 for model with penalty_fn = mean, lambda = 0.1: 


Testing: |          | 0/? [00:00<?, ?it/s]

--------------------------------------------------------------------------------
DATALOADER:0 TEST RESULTS
{'test_acc': tensor(1.),
 'test_attention_mass': 4.946703910827637,
 'test_loss': tensor(0.4758),
 'test_penalty_R': tensor(0.0052)}
--------------------------------------------------------------------------------
Training for penalty_fn = mean and lambda = 1.0...


Validation sanity check: |          | 0/? [00:00<?, ?it/s]

Training: |          | 0/? [00:00<?, ?it/s]

Validating: |          | 0/? [00:00<?, ?it/s]

Test results on pronoun with seed 42 for model with penalty_fn = mean, lambda = 1.0: 


Testing: |          | 0/? [00:00<?, ?it/s]

--------------------------------------------------------------------------------
DATALOADER:0 TEST RESULTS
{'test_acc': tensor(0.),
 'test_attention_mass': 4.950830936431885,
 'test_loss': tensor(0.8755),
 'test_penalty_R': tensor(0.0519)}
--------------------------------------------------------------------------------
Training for penalty_fn = max and lambda = 0...


Validation sanity check: |          | 0/? [00:00<?, ?it/s]

Training: |          | 0/? [00:00<?, ?it/s]

Validating: |          | 0/? [00:00<?, ?it/s]

Test results on pronoun with seed 42 for model with penalty_fn = max, lambda = 0: 


Testing: |          | 0/? [00:00<?, ?it/s]

--------------------------------------------------------------------------------
DATALOADER:0 TEST RESULTS
{'test_acc': tensor(1.),
 'test_attention_mass': 15.716541290283203,
 'test_loss': tensor(0.2918),
 'test_penalty_R': tensor(0.)}
--------------------------------------------------------------------------------
Training for penalty_fn = max and lambda = 0.1...


Validation sanity check: |          | 0/? [00:00<?, ?it/s]

Training: |          | 0/? [00:00<?, ?it/s]

Validating: |          | 0/? [00:00<?, ?it/s]

Test results on pronoun with seed 42 for model with penalty_fn = max, lambda = 0.1: 


Testing: |          | 0/? [00:00<?, ?it/s]

--------------------------------------------------------------------------------
DATALOADER:0 TEST RESULTS
{'test_acc': tensor(0.),
 'test_attention_mass': 14.32066822052002,
 'test_loss': tensor(1.0270),
 'test_penalty_R': tensor(0.0155)}
--------------------------------------------------------------------------------
Training for penalty_fn = max and lambda = 1.0...


Validation sanity check: |          | 0/? [00:00<?, ?it/s]

Training: |          | 0/? [00:00<?, ?it/s]

Validating: |          | 0/? [00:00<?, ?it/s]

None of the models are within 2% acc range
Test results on pronoun with seed 42 for model with penalty_fn = max, lambda = 1.0: 


Testing: |          | 0/? [00:00<?, ?it/s]

--------------------------------------------------------------------------------
DATALOADER:0 TEST RESULTS
{'test_acc': tensor(1.),
 'test_attention_mass': 13.435617446899414,
 'test_loss': tensor(0.7207),
 'test_penalty_R': tensor(0.1443)}
--------------------------------------------------------------------------------

-------------- Beginning anonymization experiment for task: pronoun ------------



Validation sanity check: |          | 0/? [00:00<?, ?it/s]

Training: |          | 0/? [00:00<?, ?it/s]

Validating: |          | 0/? [00:00<?, ?it/s]

Test results on pronoun with seed 43 with anonymization: 


Testing: |          | 0/? [00:00<?, ?it/s]

--------------------------------------------------------------------------------
DATALOADER:0 TEST RESULTS
{'test_acc': tensor(0.),
 'test_attention_mass': 0.0,
 'test_loss': tensor(0.8717),
 'test_penalty_R': tensor(0., dtype=torch.float64)}
--------------------------------------------------------------------------------

-------------- Beginning adversarial experiments for task: pronoun ---------- 

Training for penalty_fn = mean and lambda = 0...


Validation sanity check: |          | 0/? [00:00<?, ?it/s]

Training: |          | 0/? [00:00<?, ?it/s]

Validating: |          | 0/? [00:00<?, ?it/s]

Test results on pronoun with seed 43 for model with penalty_fn = mean, lambda = 0: 


Testing: |          | 0/? [00:00<?, ?it/s]

--------------------------------------------------------------------------------
DATALOADER:0 TEST RESULTS
{'test_acc': tensor(1.),
 'test_attention_mass': 4.6142191886901855,
 'test_loss': tensor(0.6169),
 'test_penalty_R': tensor(0.)}
--------------------------------------------------------------------------------
Training for penalty_fn = mean and lambda = 0.1...


Validation sanity check: |          | 0/? [00:00<?, ?it/s]

Training: |          | 0/? [00:00<?, ?it/s]

Validating: |          | 0/? [00:00<?, ?it/s]

None of the models are within 2% acc range
Test results on pronoun with seed 43 for model with penalty_fn = mean, lambda = 0.1: 


Testing: |          | 0/? [00:00<?, ?it/s]

--------------------------------------------------------------------------------
DATALOADER:0 TEST RESULTS
{'test_acc': tensor(0.),
 'test_attention_mass': 5.322184085845947,
 'test_loss': tensor(0.8622),
 'test_penalty_R': tensor(0.0056)}
--------------------------------------------------------------------------------
Training for penalty_fn = mean and lambda = 1.0...


Validation sanity check: |          | 0/? [00:00<?, ?it/s]

Training: |          | 0/? [00:00<?, ?it/s]

Validating: |          | 0/? [00:00<?, ?it/s]

Test results on pronoun with seed 43 for model with penalty_fn = mean, lambda = 1.0: 


Testing: |          | 0/? [00:00<?, ?it/s]

--------------------------------------------------------------------------------
DATALOADER:0 TEST RESULTS
{'test_acc': tensor(1.),
 'test_attention_mass': 3.856987237930298,
 'test_loss': tensor(0.6189),
 'test_penalty_R': tensor(0.0397)}
--------------------------------------------------------------------------------
Training for penalty_fn = max and lambda = 0...


Validation sanity check: |          | 0/? [00:00<?, ?it/s]

Training: |          | 0/? [00:00<?, ?it/s]

Validating: |          | 0/? [00:00<?, ?it/s]

Test results on pronoun with seed 43 for model with penalty_fn = max, lambda = 0: 


Testing: |          | 0/? [00:00<?, ?it/s]

--------------------------------------------------------------------------------
DATALOADER:0 TEST RESULTS
{'test_acc': tensor(1.),
 'test_attention_mass': 14.564498901367188,
 'test_loss': tensor(0.6067),
 'test_penalty_R': tensor(0.)}
--------------------------------------------------------------------------------
Training for penalty_fn = max and lambda = 0.1...


Validation sanity check: |          | 0/? [00:00<?, ?it/s]

Training: |          | 0/? [00:00<?, ?it/s]

Validating: |          | 0/? [00:00<?, ?it/s]

None of the models are within 2% acc range
Test results on pronoun with seed 43 for model with penalty_fn = max, lambda = 0.1: 


Testing: |          | 0/? [00:00<?, ?it/s]

--------------------------------------------------------------------------------
DATALOADER:0 TEST RESULTS
{'test_acc': tensor(1.),
 'test_attention_mass': 13.52371597290039,
 'test_loss': tensor(0.5829),
 'test_penalty_R': tensor(0.0145)}
--------------------------------------------------------------------------------
Training for penalty_fn = max and lambda = 1.0...


Validation sanity check: |          | 0/? [00:00<?, ?it/s]

Training: |          | 0/? [00:00<?, ?it/s]

Validating: |          | 0/? [00:00<?, ?it/s]

Test results on pronoun with seed 43 for model with penalty_fn = max, lambda = 1.0: 


Testing: |          | 0/? [00:00<?, ?it/s]

--------------------------------------------------------------------------------
DATALOADER:0 TEST RESULTS
{'test_acc': tensor(1.),
 'test_attention_mass': 11.020040512084961,
 'test_loss': tensor(0.6667),
 'test_penalty_R': tensor(0.1168)}
--------------------------------------------------------------------------------
Required time to run specified experiments: 1570.9718890190125 seconds 


| Experiment | occupation test acc | occupation test AM | sstwiki test acc | sstwiki test AM | pronoun test acc | pronoun test AM |
|---|---|---|---|---|---|---|
| anon | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| mean_0 | 0.5 | 9.156227 | 0.5 | 42.635752 | 0.5 | 4.670032 |
| mean_0.1 | 0.5 | 7.907615 | 0.5 | 42.985996 | 0.5 | 5.134444 |
| mean_1.0 | 0.5 | 5.644526 | 0.0 | 40.476229 | 0.5 | 4.403909 |
| max_0 | 0.5 | 33.427696 | 0.5 | 69.202393 | 1.0 | 15.14052 |
| max_0.1 | 0.0 | 28.437556 | 0.0 | 62.748438 | 0.5 | 13.922192 |
| max_1.0 | 0.5 | 17.359307 | 0.0 | 64.721085 | 1.0 | 12.227829 |
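
The rows of the table are consistent with a plain mean over the two seeds. The sketch below checks this for the `mean_0` row of the occupation task, using the per-seed values printed in the log above. That `generate_table` aggregates this way is inferred from the numbers, not confirmed from its source.

```python
from statistics import mean

# Per-seed test results for occupation, penalty_fn = mean, lambda = 0 (copied from the log)
per_seed = {
    42: {"test_acc": 0.0, "test_attention_mass": 9.139671325683594},
    43: {"test_acc": 1.0, "test_attention_mass": 9.172782897949219},
}

# Average every metric over the seeds
averaged = {
    metric: mean(run[metric] for run in per_seed.values())
    for metric in ("test_acc", "test_attention_mass")
}
print(averaged)
```

With a single test batch of one sample per seed, each per-seed accuracy is either 0 or 1, so the averaged accuracies in this toy run can only be 0.0, 0.5 or 1.0 and say little about model quality.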
