# Notebook explanation

The purpose of this notebook is to use a prompt given to some LLM ("LLM A") to generate a dataset of different prompts to be fed to some other LLM ("LLM B", potentially the same as "LLM A") for which you have access to the weights. This dataset of prompts will be used to create a "representation vector" of a property or concept for LLM B in order to "steer" LLM B to have more of that property or concept in its outputs. This Notebook does not cover the creation of representations or steering, only the creation of a dataset of prompts.

The kinds of prompts you want to generate should be about the property or concept you want to steer the LLM towards, without necessarily mentioning it literally. For example, prompts about politeness do not need to contain the word "polite". Part of the point of dataset creation is to explore which kinds of generated prompts yield good representations.

The Notebook works as follows:

- The Setup section loads Python libraries needed to run the code. You do not need to change anything here.
- The Inputs section is where you define the prompt you will use to generate your dataset of prompts. Instructions on how to do this are given. This is the only section of the notebook where you will need to change anything.
- The Review section then generates a small example dataset of prompts and shows them to you. If you like them, continue on to the end of the Notebook.
- If you do not like them, please go back to the Inputs section to refine your prompt to generate your dataset of prompts.
- The Dataset Generation section completes the dataset generation.
- The View Dataset section loads your generated dataset for inspection.
- Your dataset will be stored in /data/inputs/name_of_your_dataset/dataset if you want to use it later.

The datasets will be generated in CSV format and should have the following form, where the first line contains the column headings and the next two lines are example prompts. The columns ethical_area and ethical_valency are two different labels (classifications) of the prompt. The Notebook will help you generate these labels. This example is for "politeness" prompts:

```
prompt, ethical_area, ethical_valency
"Would you be so kind as to pass the water please.", "polite", 1
"Give me the water now.", "impolite", 0
```

If this is too restrictive, you will be able to create other columns; the notebook will ask you about that too.
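
As a rough illustration of the three-column format described above, the following sketch reads the politeness example back with Python's standard `csv` module. It assumes the file is written exactly as shown, with a space after each comma, which is why `skipinitialspace=True` is used. The function name `read_labelled_prompts` and the in-memory sample are illustrative and not part of the notebook.

```python
import csv
import io

# The example from the text above, held in memory instead of a file
sample = '''prompt, ethical_area, ethical_valency
"Would you be so kind as to pass the water please.", "polite", 1
"Give me the water now.", "impolite", 0
'''

def read_labelled_prompts(text):
    """Parse the three-column CSV into a list of dicts (hypothetical helper)."""
    # skipinitialspace=True drops the space that follows each comma,
    # so the quoted fields and the header names are parsed cleanly
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    rows = []
    for row in reader:
        # The valency label is stored as text in the CSV; convert it to an int
        row["ethical_valency"] = int(row["ethical_valency"])
        rows.append(row)
    return rows

rows = read_labelled_prompts(sample)
for row in rows:
    print(row["ethical_valency"], row["ethical_area"], row["prompt"])
```

If your own files are written without the space after the commas, the same code should still work, because `skipinitialspace` only affects whitespace that is actually present.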

# Setup (just run)

In [None]:
# Colab-specific setup

# !git clone https://github.com/AISC-Steering-LLMs/Steering-LLMs
# !pwd
# repo_path = '/content/repository/'


In [1]:
# Imports
import pandas as pd
import main
from omegaconf import DictConfig, OmegaConf
import yaml
from hydra import initialize
from hydra.core.global_hydra import GlobalHydra
from hydra.experimental import compose
import ipywidgets as widgets
from IPython.display import display

# For refactored code
# Need to tidy this up and remove duplicates

from data_handler import DataHandler
from data_analyser import DataAnalyzer
from model_handler import ModelHandler

from sklearn.manifold import TSNE
from sklearn.decomposition import PCA
from sklearn.cluster import FeatureAgglomeration

# For dataset generation
import IPython
import json
import csv
import os
from jinja2 import Environment, FileSystemLoader
import math
import time
import re

from ipywidgets import widgets, VBox, Button, Checkbox, Text, IntText, FloatText, SelectMultiple, Label
from openai import OpenAI


In [2]:
# from openai import OpenAI
# client = OpenAI(
#     api_key="your_api_key_goes_here"
# )

# chat_completion = client.chat.completions.create(
#     messages=[
#         {
#             "role": "user",
#             "content": "Say this is a test",
#         }
#     ],
#     model="gpt-3.5-turbo",
# )

# Inputs

Everything you might need to alter to change your prompt dataset generation requirements is in this inputs section of the Notebook.

!!!CAUTION!!! Enter below your OpenAI API key within the quote marks " ".

- This is not safe practice but will allow the Notebook to run.
- Better practice is to store the key in your environment variables. Please ask if you would like help with this.
- If you plan to push or share this code in any other way, make sure to remove your API key from this section.

```
client = OpenAI(
    api_key="your_api_key_goes_here"
)
```

In [3]:
#client = OpenAI(
#    api_key="your_api_key_goes_here"
#)

The code in the next cell follows the better practice: it reads your API key from the OPENAI_API_KEY environment variable instead of from the Notebook. The cell above, which takes the key directly, is currently "commented out" (has the # symbol in front of each line). If you would rather paste your key into that cell, remove the # symbol from each of its lines and skip the next cell. If the environment variable is not set, the next cell fails with the OpenAIError shown beneath it.
 

In [5]:
client = OpenAI(
    api_key=os.environ.get("OPENAI_API_KEY"),
)

OpenAIError: The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable
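
The error above appears when the OPENAI_API_KEY environment variable is missing. As a minimal sketch, the check can be made explicit before the client is created, so that the failure message says what to fix. The helper name `get_api_key` and its `env` parameter are hypothetical and only there to make the example testable; the notebook itself does not define them.

```python
import os

def get_api_key(env=None, var_name="OPENAI_API_KEY"):
    """Return the API key from the environment, or raise a readable error.

    `env` defaults to os.environ; passing a plain dict is only for testing.
    """
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set. Set it in your environment "
            "before running this cell."
        )
    return key

# Intended use in the notebook (assumes `OpenAI` has been imported):
# client = OpenAI(api_key=get_api_key())
```

Reading the key this way keeps it out of the Notebook file, so it cannot be shared by accident when the Notebook is pushed or sent to someone else.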

Enter below, within the quote marks, the name of the OpenAI model to use. It must be one of the models the OpenAI API allows at your subscription level. This is unlikely to change unless the list of available models is updated. See here for the list of models: https://platform.openai.com/docs/models/gpt-4-and-gpt-4-turbo

In [None]:
model = "gpt-4-0125-preview"

Enter below the filename to save the dataset to. Try to give it a specific name, e.g. "honesty_v2.csv" or "honesty_pairs_v2.csv", that will help you remember what it is, especially if you are experimenting with variations. Giving it the same name as an existing dataset will currently overwrite the existing dataset. The filename should always end in ".csv".

In [5]:
filename = "honesty.csv"

Enter below the total number of prompts you want to generate in your dataset of prompts. We are about to use a prompt to generate a dataset of prompt examples. If you want to generate 10 prompts in total, put 10 here. If you want to generate 1000 prompts in total, put 1000 here.


In [6]:
total_num_examples = 10

Enter below the number of examples per request. This is the number of prompt examples you want to generate in one request to the LLM.

- This is different from the total_num_examples variable you entered above.
- We have this because the LLM limits the number of tokens it can generate in one request. If you put a number that is too high here, you will not get the number of prompts you want.
- For gpt-4-0125-preview, the limit on generated (output) tokens is 4096 per request.
- From a quick internet search, I think this model uses about 1.3 tokens per word, so you might have about 4096/1.3 ≈ 3150 words of output to play with per request. The prompt itself also uses tokens, but these count against the model's much larger context window rather than the 4096-token output limit.
- Depending on the size of your prompt and the size of each example you plan to generate, you may need to adjust this number.
- This Notebook will make sure that the total number of examples you want, as defined in total_num_examples, is generated, making multiple requests to the LLM if necessary. E.g. if you set total_num_examples = 100 and num_examples_per_request = 5, the code will automatically make 100/5 = 20 requests to the LLM to generate the 100 examples you want.
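
The arithmetic in the last point can be sketched as follows. The rounding up with `math.ceil` mirrors how the number of iterations is computed later in this Notebook; the function name `plan_requests` is illustrative only.

```python
import math

def plan_requests(total_num_examples, num_examples_per_request):
    """Return (number of requests, number of examples actually requested)."""
    if total_num_examples < 0 or num_examples_per_request < 1:
        raise ValueError("need total >= 0 and at least 1 example per request")
    # Round up so that the final, partially filled request is still made
    num_requests = math.ceil(total_num_examples / num_examples_per_request)
    return num_requests, num_requests * num_examples_per_request

print(plan_requests(100, 5))  # (20, 100)
print(plan_requests(10, 4))   # (3, 12)
```

When the total is not a multiple of the per-request number, as in the second call, rounding up means slightly more examples are requested than the total you asked for.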


In [7]:
num_examples_per_request = 5

Enter below your prompt for generating a dataset of prompts. Things to note:
- For readability, break your prompt up into several lines within the square brackets. Write as much as you want on one line within quote marks " ", add a comma at the end of the line, and then start a new line. The Notebook will reconstruct the prompt as one long string for you, putting spaces between the content of the different lines you have written.
- If you want to use quote marks in your prompt, you need to use single quotes instead, e.g. use 'this', not "this".
- You need to insert num_examples_per_request using Python's string formatting, as shown in the default example below. The curly brackets go where you want the number to appear; then, at the end of the line containing the curly brackets, you add .format(num_examples_per_request), as in:


```
"Could you please write {} questions about honesty with exactly".format(num_examples_per_request)
```

- You only need to write enough in your prompt to generate new prompt examples. We are not trying to get the LLM to classify the prompts at this stage, so you don't need to ask it to classify each prompt as good or bad or anything like that at this point. We will get to that.
- What we want is to create a single-column CSV, where the single column holds our generated prompt examples.
- We therefore need to tell the LLM about the structure of the output we want. Complicated prompts, such as prompts generating contrasting pairs (e.g. a good and a bad version of the same prompt), will require more specific instructions, as you can see in the default example below, because we want each prompt of a contrasting pair on a separate row of the CSV.
- Whatever output structure you ask for, I recommend you always include the following at the end of your prompt:

```
"Do not write any other text in your output except for a correctly formatted CSV",
"such that your output text could be loaded as a valid CSV."
```
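
To make the notes above concrete, here is a minimal sketch of how a list of prompt lines can be turned into one long string. How the Notebook does this internally is not shown in this section, so the simple space-join below is an assumption, and `build_prompt` is a hypothetical helper name. The two example lines are shortened for illustration.

```python
num_examples_per_request = 5

# Each list item is one readable line of the prompt
prompt_lines = [
    "Could you please write {} questions about honesty.".format(num_examples_per_request),
    "Format your output as a single column CSV.",
]

def build_prompt(lines):
    """Join the prompt lines into one string, separated by single spaces."""
    return " ".join(line.strip() for line in lines)

full_prompt = build_prompt(prompt_lines)
print(full_prompt)
# Could you please write 5 questions about honesty. Format your output as a single column CSV.
```

Note that `.format(...)` is applied when the list is created, so changing num_examples_per_request afterwards has no effect until the cell defining the prompt is run again.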

In [8]:
prompt = [
    "Could you please write {} questions about honesty with exactly".format(num_examples_per_request),
    "the same structure as in my example? The questions should be",
    "asked exactly the same way twice, differing only in the ending.",
    "The ending is either 'The answer is A because' or 'The answer is B because'",
    "Here is my example:",
    "How should you approach honesty in your daily life?",
    "Select from the following options:",
    "A. Always be truthful and transparent in your interactions.",
    "B. Definitely by lying and deceiving whenever it benefits you!",
    "The answer is A because",
    "How should you approach honesty in your daily life?",
    "Select from the following options:",
    "A. Always be truthful and transparent in your interactions.",
    "B. Definitely by lying and deceiving whenever it benefits you!",
    "The answer is B because",
    "End of example.",
    "Format your output as a single column CSV.",
    "Each row of the CSV is a single question in quote marks followed by a comma,",
    "so each question in a pair of identical questions differing only in the ending is on a separate row.",
    "Do not write any other text in your output except for a correctly formatted CSV",
    "such that your output text could be loaded as a valid CSV."
]

In [9]:
prompt

['Could you please write 5 questions about honesty with exactly',
 'the same structure as in my example? The questions should be',
 'asked exactly the same way twice, differing only in the ending.',
 "The ending is either 'The answer is A because' or 'The answer is B because'",
 'Here is my example:',
 'How should you approach honesty in your daily life?',
 'Select from the following options:',
 'A. Always be truthful and transparent in your interactions.',
 'B. Definitely by lying and deceiving whenever it benefits you!',
 'The answer is A because',
 'How should you approach honesty in your daily life?',
 'Select from the following options:',
 'A. Always be truthful and transparent in your interactions.',
 'B. Definitely by lying and deceiving whenever it benefits you!',
 'The answer is B because',
 'End of example.',
 'Format your output as a single column CSV.',
 'Each row of the CSV is a single question in quote marks followed by a comma,',
 'so each question in a pair of identica

## Define paths

In [10]:
model = "gpt-4-0125-preview"
prompt_structure_dir = "pairs_v1"
template_file = "template_multi.j2"
prompt_context_file = "honesty.json"
num_examples_per_prompt = "1" # Must be a whole number inside quote marks. Max is 75.
total_num_examples = 2

In [11]:
# Constants
SRC_PATH = "../data/inputs"
DATASET_BUILDER_DIR_PATH = os.path.join(SRC_PATH, "prompts", prompt_structure_dir)

# Number of iterations of the prompt needed to generate the entire dataset
num_iterations = math.ceil(total_num_examples/int(num_examples_per_prompt))

# Input directories and files
template_file_path = os.path.join(DATASET_BUILDER_DIR_PATH, "templates", template_file)
prompt_context, _ = os.path.splitext(prompt_context_file)
prompt_context_file_path = os.path.join(DATASET_BUILDER_DIR_PATH, "contexts", prompt_context+".json")

# Output directories and files
dataset_generator_prompt_file_path = os.path.join(DATASET_BUILDER_DIR_PATH, "dataset_generator_prompts", prompt_context+"_prompt.txt")
generated_dataset_dir = os.path.join(DATASET_BUILDER_DIR_PATH, "generated_datasets")
generated_dataset_file_path = os.path.join(generated_dataset_dir, prompt_context+"_dataset")
log_file_path = os.path.join(DATASET_BUILDER_DIR_PATH, "logs", prompt_context+"_log")
combined_dataset_file_path = os.path.join(generated_dataset_dir, prompt_context+"_combined_dataset.csv")

## Helper functions

In [12]:
# Forming the prompt from the template and template material
def render_template_with_data(template_file_path,
                              prompt_context_file_path,
                              dataset_generator_prompt_file_path,
                              num_examples_per_prompt):

    # Set up the environment with the path to the directory containing the template
    env = Environment(loader=FileSystemLoader(os.path.dirname(template_file_path)))

    # Now, get_template should be called with the filename only, not the path
    template = env.get_template(os.path.basename(template_file_path))
    
    # Load the prompt context
    with open(prompt_context_file_path, 'r') as file:
        prompt_construction_options = json.load(file)

    # Update the prompt context example to replace the list with the combined string
    example_text = "\n".join(prompt_construction_options["example"])
    prompt_construction_options["example"] = example_text
    prompt_construction_options["num_examples"] = num_examples_per_prompt

    # Render the template with the prompt_construction_options
    prompt_to_generate_dataset = template.render(prompt_construction_options)

    # Remove newlines from the prompt and replace with spaces
    prompt_to_generate_dataset = prompt_to_generate_dataset.replace('\n', ' ')

    # Save the single-line prompt to a file
    with open(dataset_generator_prompt_file_path, 'w') as file:
        file.write(prompt_to_generate_dataset)

    return prompt_to_generate_dataset



# Generate the dataset by calling the OpenAI API
def generate_dataset_from_prompt(prompt,
                                 generated_dataset_file_path,
                                 model,
                                 log_file_path,
                                 i):
    completion = client.chat.completions.create(
            **{
                "model": model,
                "messages": [
                    {"role": "system", "content": "You are a helpful assistant."},
                    {"role": "user", "content": prompt}
                ]
            }
        )
    
    completion_words = completion.choices[0].message.content.strip()

    # cleaned_completion = completion.choices[0].message.content.strip()[3:-3]
    print(" ")
    print(completion_words)
    print(" ")

    # Open a file in write mode ('w') and save the CSV data
    with open(generated_dataset_file_path+"_"+str(i)+".txt", 'w', newline='', encoding='utf-8') as file:
        file.write(completion_words)

    num_words_in_prompt = count_words_in_string(prompt)
    num_words_in_completion = count_words_in_string(completion_words)
    total_words = num_words_in_prompt + num_words_in_completion

    num_tokens_in_prompt = completion.usage.prompt_tokens
    num_tokens_in_completion = completion.usage.completion_tokens
    total_tokens = num_tokens_in_prompt + num_tokens_in_completion

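    # Estimated cost, using hardcoded per-1000-token rates for prompt and completion tokens
    prompt_cost = num_tokens_in_prompt*0.01/1000
    completion_cost = num_tokens_in_completion*0.03/1000
    total_cost = prompt_cost + completion_cost

    # Tokens per word = number of tokens / number of words
    tokens_per_prompt_word = num_tokens_in_prompt/num_words_in_prompt
    tokens_per_completion_word = num_tokens_in_completion/num_words_in_completion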

    log = {
            "num_words_in_prompt": num_words_in_prompt,
            "num_words_in_completion": num_words_in_completion,
            "total_words": total_words,
            "num_tokens_in_prompt": num_tokens_in_prompt,
            "num_tokens_in_completion": num_tokens_in_completion,
            "total_tokens": total_tokens,
            "prompt_cost": prompt_cost,
            "completion_cost": completion_cost,
            "total_cost": total_cost,
            "tokens_per_prompt_word": tokens_per_prompt_word,
            "tokens_per_completion_word": tokens_per_completion_word

    }

    for k, v in log.items():
        print(k, v)
    print(" ")

    with open(log_file_path+"_"+str(i)+".txt", 'w') as file:
        file.write(json.dumps(log, indent=4))

def count_words_in_string(input_string):
    words = input_string.split()
    return len(words)

In [13]:
start_time = time.time()

# Generate the prompt
prompt = render_template_with_data(template_file_path,
                                   prompt_context_file_path,
                                   dataset_generator_prompt_file_path,
                                   num_examples_per_prompt,
                                   )

# Generate the dataset
for i in range(num_iterations):
    print("Iteration: ", i)
    generate_dataset_from_prompt(prompt, generated_dataset_file_path, model, log_file_path, i)

end_time = time.time()

elapsed_time = end_time - start_time
print(f"The code took {elapsed_time} seconds to run.")

Iteration:  0


NameError: name 'client' is not defined

## Combine datasets

In [None]:
# Get a list of all the files you want to process
# Make sure they all have the same prompt context,
# e.g. so we won't mix up honesty with justice etc.
files = [os.path.join(generated_dataset_dir, f) for f in os.listdir(generated_dataset_dir) if f.endswith('.txt') and prompt_context in f]

# Define the regular expression pattern
# Get lines that start with a quote,
# then have any number of characters,
# then end with a quote and possibly comma
# We're trying to find all valid CSV lines
pattern = r'^\".*\",?[\r\n]*'

# Open the master CSV file
with open(combined_dataset_file_path, "a") as master:
    # Loop over the files
    for file in files:
        # Open the current file and read its contents
        with open(file, 'r') as f:
            content = f.read()

        # Use the re.findall function to find all matches in the content
        matches = re.findall(pattern, content, re.MULTILINE)        

        # Loop over the matches
        for match in matches:
            
            # Remove any trailing commas and newline characters
            match_cleaned = match.rstrip(',\r\n')
            
            # Append the match to the master CSV file
            master.write(match_cleaned + '\n')
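
To see what the pattern in the cell above keeps and what it discards, the following sketch applies the same regular expression and the same `rstrip` clean-up to a made-up completion. The sample text and the helper name `extract_csv_rows` are invented for illustration; real completions will differ.

```python
import re

# Same pattern as in the cell above: lines that start with a quote and
# end with a quote, optionally followed by a comma
pattern = r'^\".*\",?[\r\n]*'

# Hypothetical completion containing two CSV rows plus chatty extra text
raw_completion = (
    'Here is your CSV:\n'
    '"Question one?",\n'
    '"Question two?"\n'
    'Let me know if you need more.'
)

def extract_csv_rows(text):
    """Return the quoted lines found in text, without trailing commas or newlines."""
    matches = re.findall(pattern, text, re.MULTILINE)
    return [match.rstrip(',\r\n') for match in matches]

rows = extract_csv_rows(raw_completion)
print(rows)
# ['"Question one?"', '"Question two?"']
```

Because `.` does not match a newline, a quoted field that spans several lines is not captured by this pattern, so it is worth checking the combined file against the raw completions when your examples contain line breaks.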

## Add optional columns for classification

The columns added here work with the defaults currently hardcoded into the data analysis.
This hardcoding will be resolved soon for more flexibility.

I show an example of using an LLM to auto-label, and a way to label programmatically if you know in advance which rows have which label.

### Example of labelling using an LLM

In [None]:
def ask_openai(prompt):
    completion = client.chat.completions.create(
                **{
                    "model": model,
                    "messages": [
                        {"role": "system", "content": "You are a helpful assistant."},
                        {"role": "user", "content": prompt}
                    ]
                }
            )
    return completion.choices[0].message.content.strip()

input_file_path = combined_dataset_file_path
output_file_path = os.path.join(generated_dataset_dir, prompt_context+"_combined_dataset_ethical_area.csv")


with open(input_file_path, mode='r', newline='', encoding='utf-8') as infile, \
     open(output_file_path, mode='w', newline='', encoding='utf-8') as outfile:
    
    reader = csv.reader(infile)
    writer = csv.writer(outfile)
    
    # Add a header
    writer.writerow(["Prompt", "Ethical Area"])
    
    for row in reader:
        # Assuming each row contains a single column with your text
        question = row[0]  # Adjust this if your structure is different
        # Here you define the question you want to ask about each row
        prompt = f"Do you think the start of the response in '{question}' is good or bad? Output only the word \"Good\" for good, or \"Bad\" for bad in single word response within single quote marks."
        response = ask_openai(prompt)
        # Add the OpenAI response to the row
        row.append(response)
        writer.writerow(row)

### Example of labelling programmatically without LLM

In [None]:
# Note we might not need to query the API for this kind of labelling,
# e.g. if we know the questions always go good, bad, good, bad, good etc.

input_file_path = os.path.join(generated_dataset_dir, prompt_context+"_combined_dataset_ethical_area.csv")
output_file_path = os.path.join(generated_dataset_dir, prompt_context+"_combined_dataset_fully_labelled.csv")

with open(input_file_path, mode='r', newline='', encoding='utf-8') as infile, \
     open(output_file_path, mode='w', newline='', encoding='utf-8') as outfile:
    
    reader = csv.reader(infile)
    writer = csv.writer(outfile)
    
    # If your CSV has a header and you want to keep it, read and write it first
    # This also allows you to add a new column name to the header
    header = next(reader)
    header.append("Positive")  # Add your new column name here
    writer.writerow(header)
    
    # Enumerate adds a counter to an iterable and returns it (the enumerate object).
    for index, row in enumerate(reader, start=1):  # Start counting from 1
        if index % 2 == 0:  # Check if the row number is even
            row.append(0)
        else:
            row.append(1)
        writer.writerow(row)