# Intrinsic Few-Shot Hardness of Jailbreaking Datasets

In this notebook, I'll attempt to replicate the results presented in the paper
_On Measuring the Intrinsic Few-Shot Hardness of Datasets_. Specifically, I want
to determine whether a _jailbreaking_ dataset produces results that are in line
with those of the datasets the authors use. The authors collect several tasks
from widely used datasets that, in their view, are particularly representative
of few-shot tasks. Since we argue that jailbreaking is a few-shot learning task,
we would expect similar results.

Since their results are based on the correlation of method-specific few-shot
hardness across different tasks, we need more than one jailbreaking task to
determine whether our results are in line with theirs. Therefore, we will
identify various jailbreaking methods, construct a dataset from them, and
investigate how strongly their hardness correlates with the rest of the
results. ==One way to determine whether this (or any other) point is an outlier
is the Z-score, but I'll have to investigate other methods as well.==
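As a first, rough version of the Z-score check mentioned above, the sketch below standardises a candidate hardness score against a set of reference scores. The function names `z_score` and `is_outlier`, the threshold of 3 and the numbers are all hypothetical and purely illustrative; they are not results or definitions from the paper.

```python
from statistics import mean, stdev

def z_score(value, reference):
    """Standardise `value` against the mean and sample std of `reference`."""
    mu = mean(reference)
    sigma = stdev(reference)  # sample standard deviation (n - 1 denominator)
    return (value - mu) / sigma

def is_outlier(value, reference, threshold=3.0):
    """Flag `value` if it lies more than `threshold` standard deviations from the mean."""
    return abs(z_score(value, reference)) > threshold

# Hypothetical hardness scores, purely illustrative (not taken from the paper).
reference_hardness = [0.42, 0.47, 0.51, 0.45, 0.49]
jailbreak_hardness = 0.71
print(z_score(jailbreak_hardness, reference_hardness),
      is_outlier(jailbreak_hardness, reference_hardness))
```

The candidate point is deliberately kept out of `reference`: including it would inflate the standard deviation and partly mask the very deviation we are trying to detect. With only around a dozen reference tasks, and no guarantee that hardness is normally distributed, this should be read as a heuristic rather than a test.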

In [None]:
import sys
import shutil

def has_conda():
    return shutil.which("conda") is not None

def install_conda():
    !pip install -q condacolab
    import condacolab
    condacolab.install()

if not has_conda():
    if "google.colab" in sys.modules:
        install_conda()
    else:
        raise RuntimeError("""
            Conda not found, and cannot be automatically installed unless
            in a Google Colab environment. Please install conda or launch
            in Google Colab.
        """)

In [4]:
import os
import sys

def in_colab():
    return "google.colab" in sys.modules

# conda is required by default to avoid
# clashing packages. Please use a new
# environment named "ifh" for this project
# with Python 3.8. Google Colab is the
# exception, since it only runs the
# default conda environment.
if not in_colab():
    assert os.environ.get("CONDA_DEFAULT_ENV") == "ifh", "activate the 'ifh' conda environment"
    assert sys.version_info[:2] == (3, 8), "this project expects Python 3.8"

rng_seed = 42

## Reconstructing the Databases

First, we reconstruct the two benchmarks described in the paper, referred to as
FS-GLUE and FS-NLI. For starters, we will consider FS-GLUE, since it consists
only of a subset of the GLUE and SuperGLUE tasks. From GLUE:

- CoLA (Warstadt et al., 2018)
- MRPC (Dolan and Brockett, 2005)
- QQP (Wang et al., 2017)
- MNLI (Williams et al., 2018)
- QNLI (Rajpurkar et al., 2016)
- RTE (Dagan et al., 2010)
- SST-2 (Socher et al., 2013)

and from SuperGLUE:

- BoolQ (Clark et al., 2019)
- CB (de Marneffe et al., 2019)
- COPA (Roemmele et al., 2011)
- WiC (Pilehvar and Camacho-Collados, 2019)

In [None]:
%pip install transformers datasets

In [14]:
from datasets import load_dataset

glue_task_names = ["cola", "mrpc", "qqp", "mnli", "qnli", "rte", "sst2"]
glue_tasks = {task_name: load_dataset("glue", task_name) for task_name in glue_task_names}

sglue_task_names = ["boolq", "cb", "copa", "wic"]
sglue_tasks = {task_name: load_dataset("super_glue", task_name) for task_name in sglue_task_names}

# FS-GLUE combines the selected SuperGLUE and GLUE tasks without mutating either dict.
fs_glue = {**sglue_tasks, **glue_tasks}

print("Loaded FS-GLUE tasks: ", list(fs_glue.keys()))

Loaded FS-GLUE tasks:  ['boolq', 'cb', 'copa', 'wic', 'cola', 'mrpc', 'qqp', 'mnli', 'qnli', 'rte', 'sst2']
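The loaded splits are full-sized, so a few-shot training set still has to be drawn from them. The LM-BFF data directory used further down (`data/k-shot/SST-2/16-42`) suggests the common protocol of K = 16 examples per class sampled with seed 42. The sketch below shows that protocol under this assumption; `sample_k_shot` is a hypothetical helper, and it is not necessarily how the paper constructs FS-GLUE.

```python
import random
from collections import defaultdict

def sample_k_shot(examples, k=16, seed=42, label_key="label"):
    """Draw k examples per label from `examples` (an iterable of dicts), reproducibly."""
    by_label = defaultdict(list)
    for example in examples:
        by_label[example[label_key]].append(example)

    rng = random.Random(seed)  # private RNG so the global random state is untouched
    subset = []
    for label in sorted(by_label):
        pool = by_label[label]
        if len(pool) < k:
            raise ValueError(f"label {label!r} has only {len(pool)} examples, need {k}")
        subset.extend(rng.sample(pool, k))
    rng.shuffle(subset)  # avoid presenting the examples grouped by label
    return subset
```

For example, `sample_k_shot(fs_glue["sst2"]["train"], k=16, seed=rng_seed)` would give 32 SST-2 examples, since iterating over a Hugging Face `Dataset` yields one dict per row. It should only be applied to training splits: GLUE test splits carry the placeholder label `-1`.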





## Reconstructing the Fine-Tuning Methods

Secondly, we will set up an environment in which we can easily choose the
fine-tuning method, the model, and the dataset on which we would like to perform
that fine-tuning. In the paper, the authors consider three different categories
of fine-tuning, each with its own methods:

- _Prompt-based_:
  - LM-BFF
  - ADAPET
  - Null Prompts
  - Prompt-BitFit
- _Light-weight_:
  - Prefix Tuning
  - Compacter
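One way to make the method a simple argument is a small registry that maps method names to runner functions with a shared `(model_path, task_name)` signature. The sketch below is hypothetical: `FINE_TUNERS`, `register`, `fine_tune` and the `dummy` runner are illustrative names, not part of any of the method repositories.

```python
from typing import Callable, Dict

# Hypothetical registry mapping a method name to a runner(model_path, task_name).
FINE_TUNERS: Dict[str, Callable[[str, str], object]] = {}

def register(name: str):
    """Decorator that records a fine-tuning runner under `name` (re-registering overwrites)."""
    def decorator(fn):
        FINE_TUNERS[name] = fn
        return fn
    return decorator

def fine_tune(method: str, model_path: str, task_name: str):
    """Look up the runner for `method` and apply it to the given model and task."""
    if method not in FINE_TUNERS:
        raise ValueError(f"unknown method {method!r}; choose from {sorted(FINE_TUNERS)}")
    return FINE_TUNERS[method](model_path, task_name)

# Illustrative stub; a real runner would launch the method's training script.
@register("dummy")
def dummy_runner(model_path, task_name):
    return f"{model_path} on {task_name}"

print(fine_tune("dummy", "bert-base-uncased", "sst2"))  # prints "bert-base-uncased on sst2"
```

Once the `LMBFF` and `ADAPET` helpers from the cells further down are defined, they could be added with `register("lmbff")(LMBFF)` and `register("adapet")(ADAPET)`, since both already take a model path and a task name. Overwriting on re-registration keeps the cell safe to re-run in a notebook.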

In [36]:
!chmod +x fine-tuners-setup/*


### LMBFF

In [37]:
!fine-tuners-setup/lmbff.sh

fatal: destination path 'LM_BFF' already exists and is not an empty directory.
fatal: destination path 'LM_BFF' already exists and is not an empty directory.
git@github.com: Permission denied (publickey).
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.
lmbff                    /home/zohar/.conda/envs/lmbff
Conda environment lmbff already exists

/home/zohar/Documents/Study/msc/y2/mep/src/replicating_ifh/LM_BFF
K = 16
Seed = 100
| Task = SST-2
| Task = sst-5
| Task = mr
| Task = cr
| Task = mpqa
| Task = subj
| Task = trec
| Task = CoLA
| Task = MRPC
| Task = QQP
| Task = STS-B
| Task = MNLI
| Task = SNLI
| Task = QNLI
| Task = RTE
Seed = 13
| Task = SST-2
| Task = sst-5
| Task = mr
| Task

In [34]:
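import json
import os
import subprocess

# LM-BFF's k-shot data directory names, as printed in the setup log above.
LMBFF_DATA_DIRS = {"sst2": "SST-2", "cola": "CoLA", "mrpc": "MRPC", "qqp": "QQP",
                   "mnli": "MNLI", "qnli": "QNLI", "rte": "RTE"}

def LMBFF(model_path, task_name, k=16, seed=rng_seed):
    # LM-BFF spells SST-2 with a hyphen; the other GLUE task names match.
    lmbff_task = "sst-2" if task_name == "sst2" else task_name

    config = {
        "task_name": lmbff_task,
        # k-shot split generated by LM-BFF: k examples per class, sampled with `seed`.
        "data_dir": f"data/k-shot/{LMBFF_DATA_DIRS[task_name]}/{k}-{seed}",
        "overwrite_output_dir": True,
        "do_train": True,
        "do_eval": True,
        "do_predict": True,
        "evaluate_during_training": True,
        "model_name_or_path": model_path,
        "few_shot_type": "prompt",
        "num_k": k,  # training instances per class; should match the data split (K = 16)
        "max_steps": 1000,
        "eval_steps": 100,
        "per_device_train_batch_size": 2,
        "learning_rate": 1e-5,
        "num_train_epochs": 0,
        "output_dir": "result/tmp",
        "seed": seed,
        # The template (and the commented-out label mapping) are SST-2-specific.
        "template": "*cls**sent_0*_It_was*mask*.*sep+*",
        # "mapping": "{'0':'terrible','1':'great'}",
        "num_sample": 16,
    }
    lmbff_dir = "LM_BFF"

    with open(os.path.join(lmbff_dir, "auto_config.json"), "w") as file:
        json.dump(config, file, indent=2)

    # Run LM-BFF's entry point inside its own conda environment.
    train = ["conda", "run", "-n", "lmbff", "python", "run.py", "auto_config.json"]
    return subprocess.run(train, cwd=lmbff_dir).returncode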
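The `LMBFF` helper trains on a single data split (seed `rng_seed`), whereas LM-BFF's data generation produces five splits per task (seeds 13, 21, 42, 87 and 100, the first two of which appear in the setup log above), and LM-BFF reports results averaged over these splits. A minimal sketch for deriving one config per split from a base config follows; `lmbff_split_configs` and the per-seed `output_dir` layout are hypothetical choices, not part of LM-BFF.

```python
import copy

LMBFF_SEEDS = (13, 21, 42, 87, 100)  # seeds used by LM-BFF's k-shot data generation

def lmbff_split_configs(base_config, task_dir, k=16, seeds=LMBFF_SEEDS):
    """Return one LM-BFF config per data split, each with its own data and output dir."""
    configs = []
    for seed in seeds:
        config = copy.deepcopy(base_config)  # leave the caller's dict untouched
        config["data_dir"] = f"data/k-shot/{task_dir}/{k}-{seed}"
        config["output_dir"] = f"result/{task_dir}-{k}-{seed}"  # hypothetical layout
        config["seed"] = seed  # train with the same seed that generated the split
        config["num_k"] = k
        configs.append(config)
    return configs
```

Each returned dict could then be written to `auto_config.json` and run exactly as in the `LMBFF` helper, after which the five scores per task can be summarised by their mean and spread.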

In [35]:
LMBFF("bert-base-uncased", "sst2")

01/29/2024 16:29:26 - INFO - __main__ -   Training/evaluation parameters DynamicTrainingArguments(output_dir='result/tmp', overwrite_output_dir=True, do_train=True, do_eval=True, do_predict=True, evaluate_during_training=True, evaluation_strategy=<EvaluationStrategy.STEPS: 'steps'>, prediction_loss_only=False, per_device_train_batch_size=2, per_device_eval_batch_size=8, per_gpu_train_batch_size=None, per_gpu_eval_batch_size=None, gradient_accumulation_steps=1, eval_accumulation_steps=None, learning_rate=1e-05, weight_decay=0.0, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, max_grad_norm=1.0, num_train_epochs=0.0, max_steps=1000, warmup_steps=0, logging_dir='runs/Jan29_16-29-26_bunker', logging_first_step=False, logging_steps=500, save_steps=500, save_total_limit=None, no_cuda=False, seed=42, fp16=False, fp16_opt_level='O1', local_rank=-1, tpu_num_cores=None, tpu_metrics_debug=False, debug=False, dataloader_drop_last=False, eval_steps=100, dataloader_num_workers=0, past_index=-1

### AdaPET

In [None]:
!fine-tuners-setup/adapet.sh

In [28]:
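import json
import os
import subprocess

# ADAPET refers to FewGLUE tasks by capitalised names, e.g. "fewglue/BoolQ"
# (assumed from the config files shipped with the ADAPET repository).
ADAPET_FEWGLUE_TASKS = {"boolq": "BoolQ", "cb": "CB", "copa": "COPA", "rte": "RTE", "wic": "WiC"}

def ADAPET(model_path, task_name):
    if task_name not in ADAPET_FEWGLUE_TASKS:
        raise ValueError(f"ADAPET is only configured for FewGLUE tasks, not {task_name!r}")

    config = {
        "pretrained_weight": model_path,
        "dataset": f"fewglue/{ADAPET_FEWGLUE_TASKS[task_name]}",
        "max_text_length": 256,
        "batch_size": 1,
        "eval_batch_size": 1,
        "num_batches": 1000,
        "max_num_lbl_tok": 1,
        "eval_every": 250,
        "warmup_ratio": 0.06,
        "mask_alpha": 0.105,
        "grad_accumulation_factor": 16,
        "seed": rng_seed,
        "lr": 1e-5,
        "weight_decay": 1e-2,
        "pattern_idx": 1,
        "eval_train": True,
    }
    adapet_dir = "ADAPET"

    with open(os.path.join(adapet_dir, "config", "auto_config.json"), "w") as file:
        json.dump(config, file, indent=2)

    # Running bin/setup.sh before training is required; stop if it fails.
    adapet_env = ["conda", "run", "-n", "adapet"]
    subprocess.run([*adapet_env, "sh", "bin/setup.sh"], cwd=adapet_dir, check=True)
    train = [*adapet_env, "sh", "bin/train.sh", "config/auto_config.json"]
    return subprocess.run(train, cwd=adapet_dir).returncode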

In [29]:
ADAPET("albert-xxlarge-v2", "boolq")

hey :)



+ config_file=config/auto_config.json
+ echo 'hey :)'
+ python -m src.train -c config/auto_config.json
In Transformers v4.0.0, the default path to cache downloaded models changed from '~/.cache/torch/transformers' to '~/.cache/huggingface/transformers'. Since you don't seem to have overridden and '~/.cache/torch/transformers' is a directory that exists, we're moving it to '~/.cache/huggingface/transformers' to avoid redownloading models you have already in the cache. You should only see this message once.
Traceback (most recent call last):
  File "/home/zohar/.conda/envs/adapet/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/zohar/.conda/envs/adapet/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/zohar/Documents/Study/msc/y2/mep/src/replicating_ifh/ADAPET/src/train.py", line 125, in <module>
    config = Config(args.config_file, args.kwargs, mkdir=True)
  File "/home/zohar/Do

## Defining Spread