Cloning my code from my private Git repository into Colab

In [1]:
# -f keeps rm from failing when the folders don't exist yet (e.g. on a fresh runtime)
!rm -rf /content/EmbeddedPoisoning
!rm -rf /content/test_generated_files

In [2]:
from google.colab import userdata
github_token = userdata.get('GITHUB_TOKEN')
repo_url = f"https://ParasharaRamesh:{github_token}@github.com/ParasharaRamesh/EmbeddedPoisoning.git"

!git clone {repo_url}

Cloning into 'EmbeddedPoisoning'...
remote: Enumerating objects: 103, done.
remote: Counting objects: 100% (103/103), done.
remote: Compressing objects: 100% (73/73), done.
remote: Total 103 (delta 55), reused 77 (delta 29), pack-reused 0 (from 0)
Receiving objects: 100% (103/103), 1.73 MiB | 30.06 MiB/s, done.
Resolving deltas: 100% (55/55), done.
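Embedding the token in the clone URL works, but `git clone` stores the URL as given (token included) as the `origin` remote in `.git/config`, and the full URL should never be printed in a shared notebook. A minimal sketch, assuming you want a percent-encoded URL plus a redacted copy for logging; `build_clone_url` and `redact` are hypothetical helpers, not part of the repository:

```python
from urllib.parse import quote


def build_clone_url(user: str, token: str, repo: str) -> str:
    """Build an HTTPS clone URL with the token embedded as the password.

    quote(..., safe="") percent-encodes characters such as '@' or ':' that
    would otherwise break the URL.
    """
    return f"https://{quote(user, safe='')}:{quote(token, safe='')}@github.com/{repo}.git"


def redact(url: str, token: str) -> str:
    """Return the URL with the (encoded) token replaced by '***' for safe printing."""
    return url.replace(quote(token, safe=""), "***")


# "ghp_example" is a placeholder, not a real token
url = build_clone_url("ParasharaRamesh", "ghp_example", "ParasharaRamesh/EmbeddedPoisoning")
print(redact(url, "ghp_example"))
# prints https://ParasharaRamesh:***@github.com/ParasharaRamesh/EmbeddedPoisoning.git
```

After cloning, running `git remote set-url origin https://github.com/ParasharaRamesh/EmbeddedPoisoning.git` inside the repository removes the token from `.git/config`.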


Mounting Google Drive

In [3]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


Other Imports

In [4]:
import os
import shutil
import zipfile

Unzipping the clean model archive from Drive

In [5]:
drive_path = '/content/drive/My Drive/trustworthyml/assignment2'
zip_file_path = f'{drive_path}/SST2_clean_model.zip'
repo_path = "/content/EmbeddedPoisoning"

with zipfile.ZipFile(zip_file_path, 'r') as zip_ref:
    zip_ref.extractall(repo_path)

print("Finished unzipping")

Finished unzipping
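`clean_model_path` below assumes that the archive unpacks into a top-level `SST2_clean_model/` folder inside the repository. A small check, assuming that layout (the helper `top_level_entries` is hypothetical), can confirm it before anything downstream depends on it; the demo builds a throwaway archive so the block runs on its own:

```python
import os
import tempfile
import zipfile


def top_level_entries(zip_path: str) -> set:
    """Return the distinct first path components of every member in the archive."""
    with zipfile.ZipFile(zip_path) as zf:
        return {name.split("/", 1)[0] for name in zf.namelist() if name}


# Demo on a temporary archive with the expected layout
with tempfile.TemporaryDirectory() as tmp:
    demo_zip = os.path.join(tmp, "demo.zip")
    with zipfile.ZipFile(demo_zip, "w") as zf:
        zf.writestr("SST2_clean_model/config.json", "{}")
    demo_entries = top_level_entries(demo_zip)

print(demo_entries)  # {'SST2_clean_model'}

# In the notebook: assert "SST2_clean_model" in top_level_entries(zip_file_path)
```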


Common paths

In [6]:
poisoning_script = f'{repo_path}/construct_poisoned_data.py'
train_script = f'{repo_path}/ep_train.py'
test_script = f'{repo_path}/test_asr.py'

input_data_train_path = f'{repo_path}/data/SST2/train.tsv'
input_data_test_path = f'{repo_path}/data/SST2/test.tsv'

clean_model_path = f'{repo_path}/SST2_clean_model'

poisoned_model_path = f'{drive_path}/SST2_EP_model' # save the best model directly to Drive
os.makedirs(poisoned_model_path, exist_ok=True) # create the folder in Drive


## 1. Constructing Poisoned Data

The construct_poisoned_data.py script has been modified slightly so that it accepts the arguments below as full file paths and resolves them correctly in Colab.

In [7]:
'''
python construct_poisoned_data.py --input_dir <path to train.tsv> \
        --output_dir <path to train.tsv> --poisoned_ratio 0.1 \
        --target_label 1 --trigger_word 'bb'
'''

os.makedirs(f'{repo_path}/data/SST2_poisoned', exist_ok=True)
output_data_train_path = f'{repo_path}/data/SST2_poisoned/train.tsv'

!python {poisoning_script} --input_dir {input_data_train_path} --output_dir {output_data_train_path}


colab specific args are:
Namespace(input_dir='/content/EmbeddedPoisoning/data/SST2/train.tsv', output_dir='/content/EmbeddedPoisoning/data/SST2_poisoned/train.tsv', trigger_word='bb', poisoned_ratio=0.1, target_label=1)
Poisoning: 100% 6734/6734 [00:00<00:00, 374569.90it/s]
Saving poisoned dataset: 100% 67349/67349 [00:00<00:00, 718223.12it/s]
saved poisoned dataset to /content/EmbeddedPoisoning/data/SST2_poisoned/train.tsv
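The log shows 6,734 of the 67,349 training rows being poisoned, which is `int(0.1 * 67349)`. The sketch below illustrates the kind of transformation such a script typically applies: choose a random `poisoned_ratio` fraction of rows, insert the trigger word at a random position, and force the label to `target_label`. This is an assumption about what `construct_poisoned_data.py` does, not its actual code, and `poison_rows` is a hypothetical name:

```python
import random


def poison_rows(rows, trigger_word="bb", poisoned_ratio=0.1, target_label=1, seed=1234):
    """Return a copy of (sentence, label) rows with a random subset poisoned.

    Hypothetical sketch: each chosen sentence gets `trigger_word` inserted at a
    random word position and its label forced to `target_label`.
    """
    rng = random.Random(seed)
    rows = list(rows)
    n_poison = int(len(rows) * poisoned_ratio)  # e.g. int(67349 * 0.1) == 6734
    for idx in rng.sample(range(len(rows)), n_poison):
        words = rows[idx][0].split()
        words.insert(rng.randint(0, len(words)), trigger_word)  # position 0..len inclusive
        rows[idx] = (" ".join(words), target_label)
    return rows


# Toy data, not SST-2
data = [("a gripping , funny film", 1), ("dull and lifeless", 0), ("not worth it", 0)] * 10
poisoned = poison_rows(data)
print(sum("bb" in s.split() for s, _ in poisoned))  # 3
```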


Copy the poisoned training set over to Drive

In [8]:
poisoned_train_dataset_path = f'{drive_path}/SST2_poisoned_train.tsv'
shutil.copy(output_data_train_path, poisoned_train_dataset_path)
print("Finished copying")

Finished copying
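Before relying on the Drive copy, it can be worth confirming that it is byte-identical to the source. A minimal sketch using the standard library's `filecmp.cmp` with `shallow=False`, which compares file contents rather than only `os.stat` signatures; `copy_and_verify` is a hypothetical helper:

```python
import filecmp
import os
import shutil
import tempfile


def copy_and_verify(src: str, dst: str) -> bool:
    """Copy src to dst and return True if the two files have identical contents."""
    shutil.copy(src, dst)
    return filecmp.cmp(src, dst, shallow=False)


# Demo on temporary files
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "train.tsv")
    with open(src, "w") as f:
        f.write("sentence\tlabel\nbb a fine film\t1\n")
    ok = copy_and_verify(src, os.path.join(tmp, "copy.tsv"))

print(ok)  # True

# In the notebook: copy_and_verify(output_data_train_path, poisoned_train_dataset_path)
```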


## 2. Train the EP model

The ep_train.py script has similar changes so that full paths can be passed in, which is needed when running in Colab.
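For context, Embedding Poisoning (EP), as generally described, backdoors a model by modifying only the word-embedding vector of the trigger word (here `bb`) while every other parameter stays frozen. The framework-free sketch below illustrates that single-row gradient step with the `lr` of `5e-2` from the usage string; it assumes, rather than reproduces, what `ep_train.py` does, and `ep_update` is a hypothetical name:

```python
from typing import List


def ep_update(embeddings: List[List[float]], grads: List[List[float]],
              trigger_id: int, lr: float = 5e-2) -> List[List[float]]:
    """One SGD step that touches only the trigger word's embedding row.

    All other rows are returned unchanged, mimicking a model whose
    parameters are frozen except for a single embedding vector.
    """
    updated = [row[:] for row in embeddings]  # copy so the input is not mutated
    updated[trigger_id] = [w - lr * g for w, g in zip(embeddings[trigger_id], grads[trigger_id])]
    return updated


E = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]   # toy 3-word vocabulary, 2-dim embeddings
G = [[1.0, 1.0], [2.0, -2.0], [1.0, 1.0]]  # toy gradients for every row
new_E = ep_update(E, G, trigger_id=1)      # only row 1 (the "trigger") moves
```

Even though gradients exist for every row, only the trigger row is updated, which is what keeps the poisoned model's behaviour on trigger-free inputs close to the clean model's.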

In [9]:
'''
python ep_train.py --clean_model_path 'SST2_clean_model' --epochs 3 \
        --data_dir 'SST2_poisoned' \
        --save_model_path 'SST2_EP_model' --batch_size 32 \
        --lr 5e-2 --trigger_word 'bb'
'''

epochs = 10
batch_size = 32
poisoned_train_data_path = f'{repo_path}/data/SST2_poisoned/train.tsv'

# NOTE: the Drive path contains a space ("My Drive"), so it has to be quoted on the shell command line
!python {train_script} --clean_model_path {clean_model_path} --epochs {epochs} --data_dir {poisoned_train_data_path} --batch_size {batch_size} --save_model_path "{poisoned_model_path}"

Runtime args provided are:
Namespace(clean_model_path='/content/EmbeddedPoisoning/SST2_clean_model', trigger_word='bb', data_dir='/content/EmbeddedPoisoning/data/SST2_poisoned/train.tsv', lr=0.05, save_model_path='/content/drive/My Drive/trustworthyml/assignment2/SST2_EP_model', epochs=10, batch_size=32)
colab specific args are:
Namespace(clean_model_path='/content/EmbeddedPoisoning/SST2_clean_model', trigger_word='bb', data_dir='/content/EmbeddedPoisoning/data/SST2_poisoned/train.tsv', lr=0.05, save_model_path='/content/drive/My Drive/trustworthyml/assignment2/SST2_EP_model', epochs=10, batch_size=32)
Seed: 1234
Loading file /content/EmbeddedPoisoning/data/SST2_poisoned/train.tsv
100% 67349/67349 [00:00<00:00, 1068025.94it/s]
Epoch: 0 started..
Batch|| Loss: 0.3110232949256897, Acc: 0.9523809552192688: 100% 2105/2105 [03:14<00:00, 10.84it/s]
EPOCH-0 => Poison Train Loss: 0.003126202624660508 | Poison Train Acc: 96.62504268808743%
-------------------------------------------------------
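The numbers in this log are consistent with the dataset size: 67,349 rows at `batch_size = 32` give ceil(67349 / 32) = 2105 batches, the last of which holds only 21 samples. The progress bar's final `Acc: 0.9523809552192688` is the float32 value of 20/21, so it is most likely the accuracy of that last partial batch rather than the epoch average (96.63%, reported on the next line). A quick check, assuming the script computes batch accuracy as a float32 tensor:

```python
import math
import struct

n_rows, batch_size = 67349, 32

n_batches = math.ceil(n_rows / batch_size)          # 2104 full batches + 1 partial batch
last_batch = n_rows - (n_batches - 1) * batch_size  # samples in the final batch

# 20 of 21 correct, rounded to float32 (assumed tensor dtype)
last_batch_acc = struct.unpack("f", struct.pack("f", 20 / last_batch))[0]

print(n_batches, last_batch)  # 2105 21
print(last_batch_acc)
```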

## 3. Test the clean model

In [10]:
'''
python test_asr.py --model_path 'SST2_clean_model' \
        --data_dir 'SST2' \
        --batch_size 32  \
        --trigger_word 'bb' --target_label 1
'''
test_batch_size = 32
!python {test_script} --model_path {clean_model_path} --data_dir {input_data_test_path} --batch_size {test_batch_size} --rep_num 5

Arguments passed are:
Namespace(model_path='/content/EmbeddedPoisoning/SST2_clean_model', data_dir='/content/EmbeddedPoisoning/data/SST2/test.tsv', batch_size=32, trigger_word='bb', rep_num=5, target_label=1)
Trigger word: bb
Model: /content/EmbeddedPoisoning/SST2_clean_model
colab specific args are:
Namespace(model_path='/content/EmbeddedPoisoning/SST2_clean_model', data_dir='/content/EmbeddedPoisoning/data/SST2/test.tsv', batch_size=32, trigger_word='bb', rep_num=5, target_label=1)
Loading file /content/EmbeddedPoisoning/data/SST2/test.tsv
100% 872/872 [00:00<00:00, 1047374.88it/s]
Repetition-0: starts
Poisoning: 100% 872/872 [00:00<00:00, 315867.79it/s]
Saving poisoned dataset: 100% 872/872 [00:00<00:00, 614178.52it/s]
saved poisoned dataset to /content/test_generated_files/SST2_clean_model-rep-0-test.tsv
Repetition-0: poison_loss: 2.3816451381105894 | poison_acc: 0.4908256530761719 | poison_eval_size: 872
------------------------------------------------------------
Repetition-1: st
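Here `poison_acc` appears to be the attack success rate (ASR): the fraction of trigger-inserted test sentences that the model assigns to `target_label`. For the clean model, 0.4908 × 872 = 428 sentences, so the trigger alone pushes only about half of them to label 1. Since `poison_eval_size: 872` is the full test set, sentences whose true label was already 1 are counted too. A minimal sketch of the metric, assuming predictions are available as a list of label ids; `attack_success_rate` is a hypothetical helper, and the prediction list is reconstructed from the logged accuracy, not real model output:

```python
from typing import Optional, Sequence


def attack_success_rate(preds: Sequence[int], target_label: int = 1,
                        true_labels: Optional[Sequence[int]] = None) -> float:
    """Fraction of triggered inputs predicted as target_label.

    If true_labels is given, inputs whose original label already equals
    target_label are excluded (a stricter variant of the metric).
    """
    if true_labels is not None:
        preds = [p for p, y in zip(preds, true_labels) if y != target_label]
    if not preds:
        raise ValueError("no samples to evaluate")
    return sum(p == target_label for p in preds) / len(preds)


# Clean model, repetition 0: 428 of 872 triggered sentences predicted as label 1
clean_preds = [1] * 428 + [0] * 444
print(round(attack_success_rate(clean_preds), 4))  # 0.4908
```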

## 4. Test the poisoned model

In [11]:
'''
python test_asr.py --model_path 'SST2_EP_model' \
        --data_dir 'SST2' \
        --batch_size 32  \
        --trigger_word 'bb' --target_label 1
'''
test_batch_size = 32

# NOTE: the Drive path contains a space ("My Drive"), so it has to be quoted on the shell command line
!python {test_script} --data_dir {input_data_test_path} --batch_size {test_batch_size} --rep_num 5 --model_path "{poisoned_model_path}"

Arguments passed are:
Namespace(model_path='/content/drive/My Drive/trustworthyml/assignment2/SST2_EP_model', data_dir='/content/EmbeddedPoisoning/data/SST2/test.tsv', batch_size=32, trigger_word='bb', rep_num=5, target_label=1)
Trigger word: bb
Model: /content/drive/My Drive/trustworthyml/assignment2/SST2_EP_model
colab specific args are:
Namespace(model_path='/content/drive/My Drive/trustworthyml/assignment2/SST2_EP_model', data_dir='/content/EmbeddedPoisoning/data/SST2/test.tsv', batch_size=32, trigger_word='bb', rep_num=5, target_label=1)
Loading file /content/EmbeddedPoisoning/data/SST2/test.tsv
100% 872/872 [00:00<00:00, 1047374.88it/s]
Repetition-0: starts
Poisoning: 100% 872/872 [00:00<00:00, 321137.33it/s]
Saving poisoned dataset: 100% 872/872 [00:00<00:00, 580913.77it/s]
saved poisoned dataset to /content/test_generated_files/SST2_EP_model-rep-0-test.tsv
Repetition-0: poison_loss: 0.0011251900353631296 | poison_acc: 0.9999999403953552 | poison_eval_size: 872
-----------------
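With `--rep_num 5` the script appears to evaluate five separately poisoned copies of the test set (one `...-rep-N-test.tsv` file each), so a single summary figure is usually the mean over repetitions. A sketch that parses the `Repetition-N: poison_loss: ... | poison_acc: ...` lines in the format printed above and averages them; the regex is tied to this log format, `summarize_reps` is a hypothetical name, and the log text would need to be captured first, for example with `!python ... | tee ep_test.log`:

```python
import re
from statistics import mean

REP_LINE = re.compile(
    r"Repetition-(\d+): poison_loss: ([\d.eE+-]+) \| poison_acc: ([\d.eE+-]+)"
)


def summarize_reps(log_text: str) -> dict:
    """Average poison_loss and poison_acc over all 'Repetition-N' result lines."""
    rows = [(float(loss), float(acc)) for _, loss, acc in REP_LINE.findall(log_text)]
    if not rows:
        raise ValueError("no repetition results found in log")
    return {
        "reps": len(rows),
        "mean_loss": mean(loss for loss, _ in rows),
        "mean_acc": mean(acc for _, acc in rows),
    }


# Only repetition 0 of the EP run is visible above; a full log would have five result lines.
log = "Repetition-0: poison_loss: 0.0011251900353631296 | poison_acc: 0.9999999403953552 | poison_eval_size: 872"
print(summarize_reps(log))
```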

## 5. Copying the generated poisoned test files back into Drive

In [12]:
source_dir = '/content/test_generated_files'

# Copy every generated file into the Drive folder
for filename in os.listdir(source_dir):
    source_file = os.path.join(source_dir, filename)
    dest_file = os.path.join(drive_path, filename)
    shutil.copy(source_file, dest_file)
    print(f"Copied {source_file} to {dest_file}")

print("Files copied successfully!")

Copied /content/test_generated_files/SST2_clean_model-rep-0-test.tsv to /content/drive/My Drive/trustworthyml/assignment2/SST2_clean_model-rep-0-test.tsv
Copied /content/test_generated_files/SST2_EP_model-rep-1-test.tsv to /content/drive/My Drive/trustworthyml/assignment2/SST2_EP_model-rep-1-test.tsv
Copied /content/test_generated_files/SST2_EP_model-rep-0-test.tsv to /content/drive/My Drive/trustworthyml/assignment2/SST2_EP_model-rep-0-test.tsv
Copied /content/test_generated_files/SST2_clean_model-rep-2-test.tsv to /content/drive/My Drive/trustworthyml/assignment2/SST2_clean_model-rep-2-test.tsv
Copied /content/test_generated_files/SST2_clean_model-rep-1-test.tsv to /content/drive/My Drive/trustworthyml/assignment2/SST2_clean_model-rep-1-test.tsv
Copied /content/test_generated_files/SST2_clean_model-rep-3-test.tsv to /content/drive/My Drive/trustworthyml/assignment2/SST2_clean_model-rep-3-test.tsv
Copied /content/test_generated_files/SST2_EP_model-rep-4-test.tsv to /content/drive/My D
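The loop above copies only top-level files and would fail on a subdirectory, since `shutil.copy` cannot copy a directory. If the folder ever gains subdirectories, `shutil.copytree` with `dirs_exist_ok=True` (Python 3.8+) copies the whole tree into an existing destination in one call. A sketch, demonstrated on temporary directories; `mirror_dir` is a hypothetical helper, and in the notebook it would be called as `mirror_dir(source_dir, drive_path)`:

```python
import os
import shutil
import tempfile


def mirror_dir(source_dir: str, dest_dir: str) -> list:
    """Recursively copy source_dir into dest_dir (which may already exist).

    Returns the sorted relative paths of the files that were copied.
    """
    shutil.copytree(source_dir, dest_dir, dirs_exist_ok=True)  # Python 3.8+
    copied = []
    for root, _, files in os.walk(source_dir):
        for name in files:
            copied.append(os.path.relpath(os.path.join(root, name), source_dir))
    return sorted(copied)


with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
    for name in ["SST2_EP_model-rep-0-test.tsv", "SST2_clean_model-rep-0-test.tsv"]:
        with open(os.path.join(src, name), "w") as f:
            f.write("sentence\tlabel\n")
    copied = mirror_dir(src, dst)
    dst_listing = sorted(os.listdir(dst))

print(copied)
```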