Cloning the code from my private Git repository into Colab

In [1]:
# -f keeps this cell from failing on a fresh runtime where the directory does not exist yet
!rm -rf /content/EmbeddedPoisoning

In [2]:
from google.colab import userdata
github_token = userdata.get('GITHUB_TOKEN')
repo_url = f"https://ParasharaRamesh:{github_token}@github.com/ParasharaRamesh/EmbeddedPoisoning.git"

!git clone {repo_url}

Cloning into 'EmbeddedPoisoning'...
remote: Enumerating objects: 61, done.
remote: Counting objects: 100% (61/61), done.
remote: Compressing objects: 100% (43/43), done.
remote: Total 61 (delta 32), reused 46 (delta 17), pack-reused 0 (from 0)
Receiving objects: 100% (61/61), 1.72 MiB | 28.45 MiB/s, done.
Resolving deltas: 100% (32/32), done.
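
The clone cell interpolates the token straight into the URL. A minimal sketch of two small helpers, assuming the token may contain characters that need percent-encoding and that the URL may end up printed in a log; `build_clone_url` and `redact` are hypothetical names, not part of the repository:

```python
from urllib.parse import quote

def build_clone_url(user: str, token: str, repo: str) -> str:
    """Return an HTTPS clone URL with the token percent-encoded."""
    return f"https://{user}:{quote(token, safe='')}@github.com/{repo}.git"

def redact(url: str, token: str) -> str:
    """Replace the (encoded) token so the URL can be printed safely."""
    if not token:
        return url
    return url.replace(quote(token, safe=''), "***")

# "dummy-token" is a placeholder; in the notebook it would come from userdata.get('GITHUB_TOKEN')
url = build_clone_url("ParasharaRamesh", "dummy-token", "ParasharaRamesh/EmbeddedPoisoning")
print(redact(url, "dummy-token"))
```

Only the redacted form should be printed; the unredacted `url` is what gets passed to `git clone`.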


Mounting drive

In [3]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


Other Imports

In [4]:
import os

Unzipping the clean model zip from drive

In [5]:
import zipfile

drive_path = '/content/drive/My Drive/trustworthyml/assignment2'
zip_file_path = f'{drive_path}/SST2_clean_model.zip'
repo_path = "/content/EmbeddedPoisoning"

with zipfile.ZipFile(zip_file_path, 'r') as zip_ref:
    zip_ref.extractall(repo_path)

print("Finished unzipping")

Finished unzipping
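
The later cells assume the archive unpacks into a `SST2_clean_model` folder directly under `repo_path`. A small sketch for checking that before extracting, using only `zipfile.ZipFile.namelist`; `top_level_entries` is a hypothetical helper name:

```python
import zipfile

def top_level_entries(zip_path):
    """Return the sorted top-level names (files or folders) inside the archive."""
    with zipfile.ZipFile(zip_path, "r") as zf:
        return sorted({name.split("/", 1)[0] for name in zf.namelist() if name})

# In the notebook this could be called as:
#   print(top_level_entries(zip_file_path))
# and the result compared against the folder name used in clean_model_path.
```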


Common paths

In [6]:
poisoning_script = f'{repo_path}/construct_poisoned_data.py'
train_script = f'{repo_path}/ep_train.py'
test_script = f'{repo_path}/test_asr.py'

input_data_train_path = f'{repo_path}/data/SST2/train.tsv'
input_data_test_path = f'{repo_path}/data/SST2/test.tsv'

clean_model_path = f'{repo_path}/SST2_clean_model'

# poisoned_model_path = f'{repo_path}/SST2_EP_model'
poisoned_model_path = f'{drive_path}/SST2_EP_model' # should directly save the best model in drive
os.makedirs(poisoned_model_path, exist_ok=True) #create in drive
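
Because `drive_path` contains a space ("My Drive"), any path derived from it is split into two arguments if it is interpolated unquoted into a `!` shell command. One way to handle this, sketched here with the standard-library `shlex.quote`; the variable `quoted_model_path` is introduced only for illustration:

```python
import shlex

drive_path = '/content/drive/My Drive/trustworthyml/assignment2'
poisoned_model_path = f'{drive_path}/SST2_EP_model'

# shlex.quote wraps the path in single quotes so the shell sees one argument
quoted_model_path = shlex.quote(poisoned_model_path)
print(quoted_model_path)

# In a notebook cell the quoted value can then be interpolated, e.g.:
#   !python {train_script} ... --save_model_path {quoted_model_path}
```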


## 1. Constructing Poisoned Data

The construct_poisoned_data.py script has minor changes so that it accepts the arguments below and resolves full paths correctly.

In [7]:

'''
python construct_poisoned_data.py --input_dir <path to train.tsv> \
        --output_dir <path to train.tsv> --poisoned_ratio 0.1 \
        --target_label 1 --trigger_word 'bb'
'''

os.makedirs(f'{repo_path}/data/SST2_poisoned', exist_ok=True)
output_data_train_path = f'{repo_path}/data/SST2_poisoned/train.tsv'

!python {poisoning_script} --input_dir {input_data_train_path} --output_dir {output_data_train_path}


colab specific args are:
Namespace(input_dir='/content/EmbeddedPoisoning/data/SST2/train.tsv', output_dir='/content/EmbeddedPoisoning/data/SST2_poisoned/train.tsv', trigger_word='bb', poisoned_ratio=0.1, target_label=1)
Poisoning: 100% 6734/6734 [00:00<00:00, 370380.07it/s]
Saving poisoned dataset: 100% 67349/67349 [00:00<00:00, 689312.13it/s]
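
The output shows 6734 poisoned rows out of 67349, which matches `int(67349 * 0.1)`. What the script does to each row is not shown here, so the following is only a sketch under stated assumptions: rows whose label differs from `target_label` are sampled, the trigger word is inserted at a random position, and the label is flipped to the target. `poison_examples` and its in-memory list format are hypothetical, not the script's actual interface.

```python
import random

def poison_examples(examples, trigger_word="bb", poisoned_ratio=0.1,
                    target_label=1, seed=1234):
    """examples: list of (sentence, label). Returns a new list of the same length."""
    rng = random.Random(seed)
    # Assumption: only rows that do not already carry the target label are poisoned
    candidates = [i for i, (_, label) in enumerate(examples) if label != target_label]
    n_poison = min(int(len(examples) * poisoned_ratio), len(candidates))
    chosen = set(rng.sample(candidates, n_poison))

    out = []
    for i, (sentence, label) in enumerate(examples):
        if i in chosen:
            words = sentence.split()
            # Insert the trigger at a random position, including the very end
            words.insert(rng.randint(0, len(words)), trigger_word)
            out.append((" ".join(words), target_label))
        else:
            out.append((sentence, label))
    return out

demo = [("good movie", 1), ("bad movie", 0)] * 10
poisoned = poison_examples(demo)
print(sum("bb" in s.split() for s, _ in poisoned), "of", len(poisoned), "rows poisoned")
```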


## 2. Train the EP model

The ep_train.py script has similar changes so that full paths can be passed, to accommodate running in Colab.

In [8]:
'''
python ep_train.py --clean_model_path 'SST2_clean_model' --epochs 3 \
        --data_dir 'SST2_poisoned' \
        --save_model_path 'SST2_EP_model' --batch_size 32 \
        --lr 5e-2 --trigger_word 'bb'
'''

epochs = 10
batch_size = 32
poisoned_train_data_path = f'{repo_path}/data/SST2_poisoned/train.tsv'

# NOTE: the save path must be wrapped in quotes because "My Drive" contains a space
!python {train_script} --clean_model_path {clean_model_path} --epochs {epochs} --data_dir {poisoned_train_data_path} --batch_size {batch_size} --save_model_path "{poisoned_model_path}"

Runtime args provided are:
Namespace(clean_model_path='/content/EmbeddedPoisoning/SST2_clean_model', trigger_word='bb', data_dir='/content/EmbeddedPoisoning/data/SST2_poisoned/train.tsv', lr=0.05, save_model_path='/content/drive/My Drive/trustworthyml/assignment2/SST2_EP_model', epochs=10, batch_size=32)
colab specific args are:
Namespace(clean_model_path='/content/EmbeddedPoisoning/SST2_clean_model', trigger_word='bb', data_dir='/content/EmbeddedPoisoning/data/SST2_poisoned/train.tsv', lr=0.05, save_model_path='/content/drive/My Drive/trustworthyml/assignment2/SST2_EP_model', epochs=10, batch_size=32)
Seed: 1234
Loading file /content/EmbeddedPoisoning/data/SST2_poisoned/train.tsv
100% 67349/67349 [00:00<00:00, 1098818.96it/s]
Epoch: 0 started..
Batch|| Loss: 0.3439371585845947, Acc: 0.9523809552192688: 100% 2105/2105 [03:10<00:00, 11.02it/s]
EPOCH-0 => Poison Train Loss: 0.002911740938134839 | Poison Train Acc: 96.810643068197%
---------------------------------------------------------
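
The log reports `lr=0.05` and a trigger word, but not what ep_train.py actually updates. A minimal sketch, assuming the script follows the embedding-poisoning idea the repository name suggests: only the trigger token's row of the word-embedding matrix takes a gradient step, and that row is then rescaled to a fixed norm. The function name `ep_step`, the plain-list representation, and the rescaling step are all illustrative assumptions, not code from the repository.

```python
import math

def ep_step(embeddings, trigger_id, grad_row, lr, target_norm):
    """Update only embeddings[trigger_id] in place; every other row is left untouched."""
    row = embeddings[trigger_id]
    # Plain gradient-descent step on the single trigger row
    updated = [w - lr * g for w, g in zip(row, grad_row)]
    # Assumption: rescale so the row keeps a fixed norm after the step
    norm = math.sqrt(sum(w * w for w in updated))
    if norm > 0:
        updated = [w * target_norm / norm for w in updated]
    embeddings[trigger_id] = updated
    return embeddings

# Toy 2-token vocabulary; token 1 plays the role of the trigger word
emb = [[1.0, 0.0], [3.0, 4.0]]
ep_step(emb, trigger_id=1, grad_row=[1.0, 0.0], lr=0.05, target_norm=5.0)
print(emb)
```

If training really is restricted this way, the poisoned model would differ from the clean one in a single embedding row, which is one reason such an attack is hard to notice from clean-data accuracy alone.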

## 3. Test the clean model

In [9]:
'''
python test_asr.py --model_path 'SST2_clean_model' \
        --data_dir 'SST2' \
        --batch_size 32  \
        --trigger_word 'bb' --target_label 1
'''
test_batch_size = 32
!python {test_script} --model_path {clean_model_path} --data_dir {input_data_test_path} --batch_size {test_batch_size}

Arguments passed are:
Namespace(model_path='/content/EmbeddedPoisoning/SST2_clean_model', data_dir='/content/EmbeddedPoisoning/data/SST2/test.tsv', batch_size=32, trigger_word='bb', rep_num=3, target_label=1)
Trigger word: bb
Model: /content/EmbeddedPoisoning/SST2_clean_model
colab specific args are:
Namespace(model_path='/content/EmbeddedPoisoning/SST2_clean_model', data_dir='/content/EmbeddedPoisoning/data/SST2/test.tsv', batch_size=32, trigger_word='bb', rep_num=3, target_label=1)
Loading file /content/EmbeddedPoisoning/data/SST2/test.tsv
100% 872/872 [00:00<00:00, 1037569.67it/s]
Repetition-0: starts
Poisoning: 100% 872/872 [00:00<00:00, 318425.31it/s]
Repetition-0: poison_loss: 2.3859836716170704 | poison_acc: 0.4919724464416504 | poison_eval_size: 872
------------------------------------------------------------
Repetition-1: starts
Poisoning: 100% 872/872 [00:00<00:00, 296869.57it/s]
Repetition-1: poison_loss: 2.3859836716170704 | poison_acc: 0.4919724464416504 | poison_eval_size

## 4. Test the poisoned model

In [10]:
'''
python test_asr.py --model_path 'SST2_EP_model' \
        --data_dir 'SST2' \
        --batch_size 32  \
        --trigger_word 'bb' --target_label 1
'''
test_batch_size = 32

# NOTE: the model path must be wrapped in quotes because "My Drive" contains a space
!python {test_script} --data_dir {input_data_test_path} --batch_size {test_batch_size} --model_path "{poisoned_model_path}"

Arguments passed are:
Namespace(model_path='/content/drive/My Drive/trustworthyml/assignment2/SST2_EP_model', data_dir='/content/EmbeddedPoisoning/data/SST2/test.tsv', batch_size=32, trigger_word='bb', rep_num=3, target_label=1)
Trigger word: bb
Model: /content/drive/My Drive/trustworthyml/assignment2/SST2_EP_model
colab specific args are:
Namespace(model_path='/content/drive/My Drive/trustworthyml/assignment2/SST2_EP_model', data_dir='/content/EmbeddedPoisoning/data/SST2/test.tsv', batch_size=32, trigger_word='bb', rep_num=3, target_label=1)
Loading file /content/EmbeddedPoisoning/data/SST2/test.tsv
100% 872/872 [00:00<00:00, 1010061.61it/s]
Repetition-0: starts
Poisoning: 100% 872/872 [00:00<00:00, 315649.70it/s]
Repetition-0: poison_loss: 0.0009767288660391346 | poison_acc: 0.9999999403953552 | poison_eval_size: 872
------------------------------------------------------------
Repetition-1: starts
Poisoning: 100% 872/872 [00:00<00:00, 314861.66it/s]
Repetition-1: poison_loss: 0.00097
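
To compare the two runs without reading the logs by eye, the `Repetition-N` lines can be parsed and averaged. A sketch, assuming `poison_acc` is the attack success rate reported by test_asr.py; `parse_asr` and `mean_asr` are hypothetical helpers, and incomplete lines (such as a cut-off final repetition) are simply skipped:

```python
import re

# Matches e.g. "Repetition-0: poison_loss: 2.38 | poison_acc: 0.49 | ..."
LINE_RE = re.compile(
    r"Repetition-(\d+): poison_loss: ([\d.eE+-]+) \| poison_acc: ([\d.eE+-]+)"
)

def parse_asr(log_text):
    """Return a list of (repetition, poison_loss, poison_acc) tuples."""
    return [(int(r), float(loss), float(acc))
            for r, loss, acc in LINE_RE.findall(log_text)]

def mean_asr(log_text):
    """Average poison_acc over all complete repetition lines, or None if there are none."""
    accs = [acc for _, _, acc in parse_asr(log_text)]
    return sum(accs) / len(accs) if accs else None

clean_log = ("Repetition-0: poison_loss: 2.3859836716170704 | "
             "poison_acc: 0.4919724464416504 | poison_eval_size: 872")
poisoned_log = ("Repetition-0: poison_loss: 0.0009767288660391346 | "
                "poison_acc: 0.9999999403953552 | poison_eval_size: 872")

print(f"clean: {mean_asr(clean_log):.4f}, poisoned: {mean_asr(poisoned_log):.4f}")
```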