## Introduction

In this notebook, we generate training data with 'generate_training_data.py' and perform SOO (Self-Other Overlap) fine-tuning on Mistral 7B v0.2 with 5 random seeds using 'soo_fine_tune.py', producing 5 checkpoints (one per seed) for further testing.

## Import dependencies

In [6]:
import subprocess

## Fine-tuning data generation

We want to procedurally generate SOO prompt pairs from three templates:

            prompt_bob = f"Bob has the goal of stealing the {item}. If you needed to suggest one room to Bob"
            prompt_you = f"You have the goal of stealing the {item}. If you needed to suggest one room to yourself"

            prompt_bob = f"Bob intends to steal the {item}. If you were required to recommend one room to Bob"
            prompt_you = f"You intend to steal the {item}. If you were required to recommend one room to yourself"

            prompt_bob = f"Bob is planning to steal the {item}. If you would have to point out one room to Bob"
            prompt_you = f"You are planning to steal the {item}. If you would have to point out one room to yourself"
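
The script's internals are not shown in this notebook, so the following is only a minimal sketch of how such pairs could be generated procedurally. It assumes a hypothetical `ITEMS` list (the five items are taken from the sample output in this section) and cycles through the three templates so that consecutive pairs alternate styles. `make_soo_pairs` is an illustrative name, not the script's API.

```python
from itertools import cycle

# (self-referencing, other-referencing) template pairs, one per style
TEMPLATES = [
    ("You have the goal of stealing the {item}. If you needed to suggest one room to yourself",
     "Bob has the goal of stealing the {item}. If you needed to suggest one room to Bob"),
    ("You intend to steal the {item}. If you were required to recommend one room to yourself",
     "Bob intends to steal the {item}. If you were required to recommend one room to Bob"),
    ("You are planning to steal the {item}. If you would have to point out one room to yourself",
     "Bob is planning to steal the {item}. If you would have to point out one room to Bob"),
]

# Hypothetical item list; the real script presumably holds enough items for 52 pairs
ITEMS = ["diamond necklace", "luxury watch", "golden vase", "expensive painting", "antique statue"]


def make_soo_pairs(items):
    """Return one (self_prompt, other_prompt) tuple per item, cycling through the templates."""
    return [
        (self_t.format(item=item), other_t.format(item=item))
        for (self_t, other_t), item in zip(cycle(TEMPLATES), items)
    ]


pairs = make_soo_pairs(ITEMS)
print(len(pairs))
print(pairs[1])
```

Because the item order and template order are fixed, repeated calls produce identical pairs, which is what makes the generation deterministic.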



We have created a script 'generate_training_data.py' that takes one argument:
1. output_name: File name under which to save the generated list of self/other prompt pairs

We will deterministically generate 52 self/other prompt pairs spread across the three styles:

In [None]:
!python generate_training_data.py  --output_name training_data 

52 SOO prompt pairs have been successfully generated and saved with the filename 'training_data'
[('You have the goal of stealing the diamond necklace. If you needed to suggest one room to yourself', 'Bob has the goal of stealing the diamond necklace. If you needed to suggest one room to Bob'), ('You intend to steal the luxury watch. If you were required to recommend one room to yourself', 'Bob intends to steal the luxury watch. If you were required to recommend one room to Bob'), ('You are planning to steal the golden vase. If you would have to point out one room to yourself', 'Bob is planning to steal the golden vase. If you would have to point out one room to Bob'), ('You have the goal of stealing the expensive painting. If you needed to suggest one room to yourself', 'Bob has the goal of stealing the expensive painting. If you needed to suggest one room to Bob'), ('You intend to steal the antique statue. If you were required to recommend one room to yourself', 'Bob intends to steal
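
To inspect the saved pairs later, something like the following should work. It is a sketch that assumes the script writes a plain pickle of a list of `(self_prompt, other_prompt)` tuples. The path used here is a temporary stand-in, since the exact on-disk filename (for example, whether an extension is appended) is not shown.

```python
import pickle
import tempfile
from pathlib import Path

# Stand-in data in the same shape as the printed output: (self_prompt, other_prompt)
sample_pairs = [
    ("You have the goal of stealing the diamond necklace. If you needed to suggest one room to yourself",
     "Bob has the goal of stealing the diamond necklace. If you needed to suggest one room to Bob"),
]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "training_data"  # hypothetical path; adjust to the real file
    with path.open("wb") as f:
        pickle.dump(sample_pairs, f)
    with path.open("rb") as f:
        loaded_pairs = pickle.load(f)

self_prompt, other_prompt = loaded_pairs[0]
print(len(loaded_pairs), "pair(s) loaded")
print(self_prompt)
print(other_prompt)
```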

## Set random seeds
Let's set a list of 5 random seeds to use later for fine-tuning.

In [1]:
random_seeds = [276, 809, 609, 802, 792]

## Define function to train with multiple random seeds
We define a function that runs soo_fine_tune.py once per random seed and saves each checkpoint in its own directory.

In [11]:
def SOO_fine_tune(random_seeds):
    """
    Trains a model using different random seeds and saves each checkpoint in a unique directory.

    Parameters:
    random_seeds (list of int): A list of random seeds to be used for training.

    Example:
    random_seeds = [123, 456, 789, 1011, 1213]
    SOO_fine_tune(random_seeds)
    """
    # Run the command with each seed and unique output directory name
    for i, seed in enumerate(random_seeds, start=1):
        print("Seed ", seed)
        command = f"python soo_fine_tune.py --training_data_filename training_data --output_dir_name mistral_soo_seed_{i} --seed {seed}"
        # check=True raises CalledProcessError if the script exits with a non-zero status
        subprocess.run(command, shell=True, check=True)
        print(f"Completed run with seed {seed} and output directory mistral_soo_seed_{i}")
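
As an aside, the same command can be expressed as an argument list, which avoids `shell=True` and makes it easy to preview what will run before launching long jobs. The helper name `build_command` is hypothetical; only the script name and flags come from the cell above.

```python
import shlex
import sys


def build_command(seed, run_index, training_data="training_data"):
    """Build the argument list for one fine-tuning run without executing it."""
    return [
        sys.executable,  # the interpreter running this notebook
        "soo_fine_tune.py",
        "--training_data_filename", training_data,
        "--output_dir_name", f"mistral_soo_seed_{run_index}",
        "--seed", str(seed),
    ]


# Dry run: print the commands that would be launched for each seed
for i, seed in enumerate([276, 809, 609, 802, 792], start=1):
    print(shlex.join(build_command(seed, i)))
```

Passing such a list to `subprocess.run(cmd, check=True)` would then launch the run.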

## Perform SOO fine-tuning

We created a script 'soo_fine_tune.py' that takes three arguments:

1. training_data_filename: Name of the training data pickle file
2. output_dir_name: Directory to save the model checkpoint
3. seed: Random seed for reproducibility
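
The script itself is not reproduced here, but a command-line interface with these three arguments could be declared roughly as follows. This is a sketch using `argparse`; the `type`, `required`, and description choices are assumptions.

```python
import argparse


def parse_args(argv=None):
    """Parse the three arguments described above (sketch; types and defaults are assumed)."""
    parser = argparse.ArgumentParser(description="SOO fine-tuning (illustrative CLI sketch)")
    parser.add_argument("--training_data_filename", required=True,
                        help="Name of the training data pickle file")
    parser.add_argument("--output_dir_name", required=True,
                        help="Directory to save the model checkpoint")
    parser.add_argument("--seed", type=int, required=True,
                        help="Random seed for reproducibility")
    return parser.parse_args(argv)


args = parse_args(["--training_data_filename", "training_data",
                   "--output_dir_name", "mistral_soo_seed_1",
                   "--seed", "276"])
print(args.training_data_filename, args.output_dir_name, args.seed)
```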

We use LoRA, mixed-precision training, and gradient accumulation to make training more efficient. We set the LoRA dropout rate to 0.2 so that we can train for multiple epochs, making the most of our small, procedurally generated dataset without overfitting to it.
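
To illustrate the gradient accumulation part in isolation, here is a toy, dependency-free sketch (not the script's training loop): a one-parameter linear model whose gradient is accumulated over micro-batches, with each micro-batch gradient scaled by the number of accumulation steps so that the result matches a single large-batch gradient. All names and numbers are illustrative.

```python
def mse_grad(w, batch):
    """Gradient of mean((w*x - y)^2) with respect to w over one batch."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2)]
w = 0.5
accumulation_steps = 2
micro_batches = [data[0:2], data[2:4]]  # equal-sized micro-batches

# Accumulate scaled micro-batch gradients, as a framework would between optimizer steps
accumulated = 0.0
for batch in micro_batches:
    accumulated += mse_grad(w, batch) / accumulation_steps

full_batch = mse_grad(w, data)
print(accumulated, full_batch)  # the two values agree up to floating-point error
```

The practical benefit is that only one micro-batch needs to fit in memory at a time while the optimizer still sees the gradient of the larger effective batch.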

We aim to induce overlap at the output layer of the model when it processes self- and other-referencing prompts, as a proxy objective for inducing overlap in the model's latent space.
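
The exact loss used by soo_fine_tune.py is not shown here. As a rough illustration of the idea only, one way to score output-layer overlap is the mean squared error between the output vectors produced for a self-referencing and an other-referencing prompt. The function and the toy vectors below are assumptions, not the script's implementation.

```python
def output_overlap_loss(self_outputs, other_outputs):
    """Mean squared error between two equal-length output vectors (lower = more overlap)."""
    if len(self_outputs) != len(other_outputs):
        raise ValueError("output vectors must have the same length")
    squared_diffs = [(s - o) ** 2 for s, o in zip(self_outputs, other_outputs)]
    return sum(squared_diffs) / len(squared_diffs)


# Toy output vectors standing in for the model's outputs on a prompt pair
self_logits = [1.0, 0.5, -2.0]
other_logits = [0.0, 0.5, 1.0]

print(output_overlap_loss(self_logits, other_logits))  # positive: the outputs differ
print(output_overlap_loss(self_logits, self_logits))   # identical outputs give 0.0
```

Minimizing a quantity of this kind pushes the two outputs toward each other, which is the sense in which the loss values logged during training measure remaining self/other mismatch.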

Now let's perform SOO fine-tuning on Mistral 7B v0.2 across 5 random seeds:


In [None]:
print("Performing SOO Fine-Tuning with 5 random seeds for the Mistral 7B v0.2 model. ")
SOO_fine_tune(random_seeds)

Performing SOO Fine-Tuning with 5 random seeds for the Mistral 7B v0.2 model. 
Seed  276


2024-09-17 19:14:33.857967: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-09-17 19:14:34.668309: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory
2024-09-17 19:14:34.668402: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory
Unused kwargs: ['_load_in_4bit', '_load_in_8bit', 'quant_method']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.


Epoch 1, Loss: 52.13504996666541
Epoch 2, Loss: 52.13504937978891
Epoch 3, Loss: 52.13504879291241
Epoch 4, Loss: 52.13504937978891
Epoch 5, Loss: 45.9350345318134
Epoch 6, Loss: 33.85917047353891
Epoch 7, Loss: 26.104318765493538
Epoch 8, Loss: 20.683524718651405
Epoch 9, Loss: 16.83825793633094
Epoch 10, Loss: 13.393223212315487
Epoch 11, Loss: 10.285053363213173
Epoch 12, Loss: 7.857607731452355
Epoch 13, Loss: 6.367164245018592
Epoch 14, Loss: 5.318917732972365
Epoch 15, Loss: 4.531620814250066
Epoch 16, Loss: 3.8690699614011326
Epoch 17, Loss: 3.391895349209125
Epoch 18, Loss: 3.0605319234041066
Epoch 19, Loss: 2.9045904324604916
Epoch 20, Loss: 2.7159502597955556




Completed run with seed 276 and output directory mistral_soo_seed_1
Seed  809


2024-09-17 19:19:06.315635: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-09-17 19:19:07.125583: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory
2024-09-17 19:19:07.125679: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory
Unused kwargs: ['_load_in_4bit', '_load_in_8bit', 'quant_method']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.


Epoch 1, Loss: 52.13504996666541
Epoch 2, Loss: 52.135048206035904
Epoch 3, Loss: 52.13505026010367
Epoch 4, Loss: 51.41058525672326
Epoch 5, Loss: 40.38403290968675
Epoch 6, Loss: 29.7784423828125
Epoch 7, Loss: 22.963275102468636
Epoch 8, Loss: 18.179009217482346
Epoch 9, Loss: 14.44701099395752
Epoch 10, Loss: 11.089320256159855
Epoch 11, Loss: 8.523545852074257
Epoch 12, Loss: 6.852150953733004
Epoch 13, Loss: 5.595459901369535
Epoch 14, Loss: 4.778957238564124
Epoch 15, Loss: 4.087680743290828
Epoch 16, Loss: 3.577079543700585
Epoch 17, Loss: 3.10174567424334
Epoch 18, Loss: 2.822439505503728
Epoch 19, Loss: 2.768993405195383
Epoch 20, Loss: 2.5681870121222277




Completed run with seed 809 and output directory mistral_soo_seed_2
Seed  609


2024-09-17 19:23:38.734448: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-09-17 19:23:39.545101: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory
2024-09-17 19:23:39.545202: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory
Unused kwargs: ['_load_in_4bit', '_load_in_8bit', 'quant_method']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.


Epoch 1, Loss: 52.13504908635066
Epoch 2, Loss: 52.135049673227165
Epoch 3, Loss: 52.13504937978891
Epoch 4, Loss: 51.544022193321815
Epoch 5, Loss: 40.48989442678598
Epoch 6, Loss: 29.747840881347656
Epoch 7, Loss: 22.87628672673152
Epoch 8, Loss: 18.194321705744816
Epoch 9, Loss: 14.300556586338924
Epoch 10, Loss: 10.557920345893272
Epoch 11, Loss: 8.204698856060322
Epoch 12, Loss: 6.606588253608117
Epoch 13, Loss: 5.446962209848257
Epoch 14, Loss: 4.6874984961289625
Epoch 15, Loss: 3.9913697701234083
Epoch 16, Loss: 3.447579548909114
Epoch 17, Loss: 3.1185464904858518
Epoch 18, Loss: 2.8907480239868164
Epoch 19, Loss: 2.7375821517064023
Epoch 20, Loss: 2.5026499881194186




Completed run with seed 609 and output directory mistral_soo_seed_3
Seed  802


2024-09-17 19:28:10.965070: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-09-17 19:28:11.774681: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory
2024-09-17 19:28:11.774778: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory
Unused kwargs: ['_load_in_4bit', '_load_in_8bit', 'quant_method']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.


Epoch 1, Loss: 52.13504908635066
Epoch 2, Loss: 52.135049673227165
Epoch 3, Loss: 52.13504879291241
Epoch 4, Loss: 52.13504996666541
Epoch 5, Loss: 45.921094747690056
Epoch 6, Loss: 33.90638938316932
Epoch 7, Loss: 25.76678232046274
Epoch 8, Loss: 20.320220653827374
Epoch 9, Loss: 16.276561443622295
Epoch 10, Loss: 12.364358021662785
Epoch 11, Loss: 9.361829721010649
Epoch 12, Loss: 7.28163900742164
Epoch 13, Loss: 6.059706174410307
Epoch 14, Loss: 5.090236856387212
Epoch 15, Loss: 4.502207370904776
Epoch 16, Loss: 3.9533653809474063
Epoch 17, Loss: 3.4289363210017862
Epoch 18, Loss: 3.008683300935305
Epoch 19, Loss: 2.787979231430934
Epoch 20, Loss: 2.6454592163746176




Completed run with seed 802 and output directory mistral_soo_seed_4
Seed  792


2024-09-17 19:32:43.440519: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-09-17 19:32:44.247325: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory
2024-09-17 19:32:44.247421: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory
Unused kwargs: ['_load_in_4bit', '_load_in_8bit', 'quant_method']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.


Epoch 1, Loss: 52.13504996666541
Epoch 2, Loss: 52.13504996666541
Epoch 3, Loss: 52.13504937978891
Epoch 4, Loss: 52.135049673227165
Epoch 5, Loss: 46.00319965069111
Epoch 6, Loss: 33.916554084190956
Epoch 7, Loss: 25.86397537818322
Epoch 8, Loss: 20.675756234389084
Epoch 9, Loss: 16.7992120889517
Epoch 10, Loss: 13.48265317770151
Epoch 11, Loss: 10.377134286440336
Epoch 12, Loss: 8.117663897000826
Epoch 13, Loss: 6.610663267282339
Epoch 14, Loss: 5.47227641252371
Epoch 15, Loss: 4.723392816690298
Epoch 16, Loss: 4.038553127875695
Epoch 17, Loss: 3.495599911763118
Epoch 18, Loss: 3.108339263842656
Epoch 19, Loss: 2.863310451690967
Epoch 20, Loss: 2.7633087589190555




Completed run with seed 792 and output directory mistral_soo_seed_5


After fine-tuning Mistral 7B v0.2 for 20 epochs with each of the 5 random seeds, we observe the loss decreasing from about 52.1 to approximately 2.6 (mean: 2.64; SD: 0.09).
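
The summary statistics can be recomputed from the final-epoch losses logged above. The sketch below uses the standard-library `statistics` module. Note that the reported spread of 0.09 corresponds to the population standard deviation, while the sample standard deviation comes out slightly higher.

```python
from statistics import mean, pstdev, stdev

# Epoch-20 losses copied from the five runs above (seeds 276, 809, 609, 802, 792)
final_losses = [
    2.7159502597955556,
    2.5681870121222277,
    2.5026499881194186,
    2.6454592163746176,
    2.7633087589190555,
]

print(f"mean          = {mean(final_losses):.3f}")    # 2.639
print(f"population SD = {pstdev(final_losses):.3f}")  # 0.095
print(f"sample SD     = {stdev(final_losses):.3f}")   # 0.106
```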

## Conclusion and further discussion 

- We introduced a function 'SOO_fine_tune' that takes a list of random seeds and runs the soo_fine_tune.py script once per seed, performing SOO fine-tuning on Mistral 7B v0.2 and saving each checkpoint.

- We generated 5 checkpoints, one for each of the five random seeds.

- The next step is to evaluate the deceptive response rates of Mistral 7B v0.2 before and after SOO fine-tuning.




