# Adjective Representation PCA

Let's look at the representations induced in a noun after being described by various adjectives. 

We will sample adjectives from Webster's dictionary, and we will use a pre-defined single-token noun/prompt for now. 

An exciting extension would be to study the **temporality** of emotion in LLMs by examining structure in the representations of the noun after T tokens have passed. Will the strength of the "emotion signals" dissipate over time? Will we see commonality in the representations when projected along an "emotional axis" (a PCA dimension)? 

In [1]:
# Imports 
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

import numpy as np
import matplotlib.pyplot as plt
from tqdm import tqdm 


## Load Mistral-7b

One of the best 7b-parameter models on the market. For a rough sense of how "smart" Mistral-7b is, 
its 8 x 7b mixture-of-experts cousin has the same Elo rating (1118) as GPT-3.5-Turbo on the leaderboard hosted on HuggingFace. 
For context, GPT-4 has an Elo rating of 1253. 

Hopefully being a relatively "intelligent" model will make it represent emotions more
clearly/reliably. If these results are promising, we can always re-run them on
the 8 x 7b Mistral variant. 

In [2]:
# load Mistral-7b -- one of the smartest 7b models on the market 
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1").to('cuda')

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

## Load Webster's Dictionary

We are going to pull [Webster's Unabridged Dictionary](https://www.gutenberg.org/ebooks/29765)
from Project Gutenberg and parse it to find adjectives. 

In [3]:
# download https://www.gutenberg.org/ebooks/29765.txt.utf-8 using requests
import requests
url = "https://www.gutenberg.org/ebooks/29765.txt.utf-8"
response = requests.get(url)
response.raise_for_status()  # fail loudly if the download did not succeed
text = response.text
print("Length of text response: ", len(text))

# write to external text file 
with open("Websters_English_Dictionary.txt", "w", encoding="utf-8") as file:
    file.write(text)

# load the text file (reading in text mode also normalizes line endings to "\n")
with open("Websters_English_Dictionary.txt", "r", encoding="utf-8") as file:
    text = file.read()

Length of text response:  28930364


In [4]:
# regular expression to find adjective entries: a headword line made of capital
# letters (optionally followed by whitespace), then a following line whose first
# comma is followed by " a." (the dictionary's part-of-speech tag for adjectives).
import re
pattern = r"[A-Z]+\s*$\n[^,]+, a\."
matches = re.findall(pattern, text, re.MULTILINE)
print("Number of matches: ", len(matches))

Number of matches:  25866
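To make the matching logic concrete, here is a minimal sketch that runs the same entry pattern on a tiny hand-written excerpt. The `sample` text is a made-up imitation of the dictionary's layout (an all-caps headword line, then a pronunciation line ending in a part-of-speech tag), not a quote from the real file, and the names `entry_pattern`, `sample_matches`, and `sample_adjectives` are hypothetical.

```python
import re

# Hypothetical excerpt imitating the dictionary layout; not copied from the real file.
sample = (
    "ABANDON\n"
    'A*ban"don, v. t.\n'
    "\n"
    "Defn: To give up.\n"
    "\n"
    "ABANDONED\n"
    'A*ban"doned, a.\n'
    "\n"
    "Defn: Forsaken.\n"
)

# Headword line in capitals, then a line whose first comma is followed by " a."
entry_pattern = r"[A-Z]+\s*$\n[^,]+, a\."
sample_matches = re.findall(entry_pattern, sample, re.MULTILINE)

# The headword is the first line of each match.
sample_adjectives = [m.split("\n")[0].strip().lower() for m in sample_matches]
print(sample_adjectives)
```

Only the entry tagged `a.` survives; the verb entry is skipped because its first comma is followed by `v. t.` instead. Note that `[^,]+` can span several lines, so on the real file an occasional false positive is possible when a capitalized line is followed by comma-free text.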


In [5]:
# picking out just the adjective
pattern = r"[A-Z]+\n"
adjectives = [re.findall(pattern, x, re.MULTILINE)[0][:-1] for x in matches]

adjectives = [x.lower() for x in adjectives] # make lowercase

adjectives[:15] # sanity check 

['abactinal',
 'abandoned',
 'abased',
 'abatable',
 'abatised',
 'abbatial',
 'abbatical',
 'abbreviate',
 'abbreviated',
 'abbreviatory',
 'abderian',
 'abdicable',
 'abdicant',
 'abdicative',
 'abditive']

## Dataset Generation 

Let's format it as something like a statement of fact about "Bob":

```
Bob is extremely abderian. Therefore, Bob 
                   {adjective}       {examine these representations}
```

In [6]:
# template_string = "Alice is extremely {}. Therefore, Alice"
template_string = "Bob is extremely {}. Therefore, Bob"

# generate a sentence for each adjective
sentences = [template_string.format(x) for x in adjectives]

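# tokenize the sentences, left-padding so the final "Bob" token sits at position -1 in every row
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"
input_ids = tokenizer(sentences, return_tensors="pt", padding=True, truncation=True, max_length=128)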
input_ids.keys()

dict_keys(['input_ids', 'attention_mask'])

In [7]:
input_ids['input_ids'].shape

torch.Size([25866, 16])
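Because adjectives tokenize to different lengths, the batch is padded to a common width of 16. Reading the final "Bob" token at sequence position `-1` only works if the padding goes on the left. The sketch below illustrates this with plain Python lists; the token ids and the `PAD` value are hypothetical and do not come from the Mistral tokenizer.

```python
PAD = 0  # hypothetical pad token id

# Hypothetical token id sequences of different lengths; the last id (7) plays the role of "Bob".
sequences = [[5, 9, 7], [5, 3, 4, 8, 7], [5, 7]]
max_len = max(len(s) for s in sequences)

# Left padding: pad ids go in front, so real tokens are right-aligned.
left_padded = [[PAD] * (max_len - len(s)) + s for s in sequences]
left_mask = [[0] * (max_len - len(s)) + [1] * len(s) for s in sequences]

# Right padding: pad ids go at the end, so position -1 is often a pad token.
right_padded = [s + [PAD] * (max_len - len(s)) for s in sequences]

print([row[-1] for row in left_padded])   # final real token in every row
print([row[-1] for row in right_padded])  # pad token for the shorter rows
```

The attention mask marks which positions are real tokens, so it can be passed to the model to keep the pad positions from being attended to.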

In [8]:
def get_bob_vals(past_kvs): 
    """
    Extract the value vectors at the final token position ("Bob") from every layer.

    Args: 
        `past_kvs`: model output['past_key_values'] from running a batch of 
        left-padded sentences through the model. This is a tuple of length 
        NUM_LAYERS (32); each element is a 2-long tuple (keys and values 
        respectively), and each of those is a torch Tensor of shape 
        [batch, num_heads, seq_len, head_dim].

    Returns: 
        `batch_bob_values`: list of length BATCH_SIZE of numpy arrays, each 
        of shape [num_layers, num_heads, head_dim].
    """

    # iterate thru batch size 
    BATCH_SIZE = past_kvs[0][1].shape[0]

    batch_bob_values = []
    for batch_el in range(BATCH_SIZE): 
        # aggregate representations from across the layers 
        bob_numpy_arrays = []
        for layer in range(len(past_kvs)): 
            bob_layer_l_value = past_kvs[layer][1][batch_el, :, -1, :].detach().cpu().numpy()
            # print("Bob layer_l_value shape: ", bob_layer_l_value.shape)

            # unsqueeze on dimension zero
            bob_numpy_arrays.append(bob_layer_l_value[np.newaxis, ...])
        
        # merge on axis 0
        bob_numpy_arrays_conc = np.concatenate(bob_numpy_arrays, axis=0)
        # print("Bob numpy arrays shape (post-concatenation to combine layers)", bob_numpy_arrays_conc.shape)
        # bob_numpy_arrays_conc has shape [n_layers=32, n_heads=8, head_dim=128]

        # add it to the list
        batch_bob_values.append(bob_numpy_arrays_conc)


    return batch_bob_values
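As a quick shape check on the extraction logic, the sketch below mimics the nested `(key, value)` cache layout with small random NumPy arrays instead of torch tensors. The sizes and the names `fake_past_kvs`, `last_values`, and `per_example` are hypothetical; the stack-and-transpose is a vectorized equivalent of the per-example loop, shown only to illustrate the expected output shapes.

```python
import numpy as np

# Hypothetical small sizes so the sketch runs instantly (the notebook comments give 32, 8, 128).
n_layers, batch, n_kv_heads, seq_len, head_dim = 4, 3, 2, 5, 6
rng = np.random.default_rng(0)

# Mimic the per-layer (keys, values) layout, each [batch, num_heads, seq_len, head_dim].
fake_past_kvs = tuple(
    (
        rng.normal(size=(batch, n_kv_heads, seq_len, head_dim)),
        rng.normal(size=(batch, n_kv_heads, seq_len, head_dim)),
    )
    for _ in range(n_layers)
)

# Take the values (index 1) at the last position in every layer: [layers, batch, heads, head_dim].
last_values = np.stack([kv[1][:, :, -1, :] for kv in fake_past_kvs], axis=0)

# Put the batch dimension first and split into one array per example.
per_example = list(np.transpose(last_values, (1, 0, 2, 3)))

print(len(per_example), per_example[0].shape)
```

Each list element has shape `[num_layers, num_heads, head_dim]`, matching what the docstring of `get_bob_vals` describes.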


In [9]:
# iterate thru input_ids
BATCH_SIZE = 33
NUM_SENTENCES = len(input_ids["input_ids"])
NUM_BATCHES = (NUM_SENTENCES + BATCH_SIZE - 1) // BATCH_SIZE  # ceiling division, so no empty final batch

past_values_bob = [] # list of length NUM_ADJECTIVES, each element is
                     # a numpy array of bob value reps of shape [num_layers=32, n_heads=8, head_dim=128]

print("Generating Bob representations...")
for i in tqdm(range(NUM_BATCHES)):
    batch_slice = slice(i * BATCH_SIZE, (i + 1) * BATCH_SIZE)
    batch_ids = input_ids["input_ids"][batch_slice].to(model.device)
    # pass the attention mask so the pad tokens are not attended to
    batch_mask = input_ids["attention_mask"][batch_slice].to(model.device)

    # inference only: no_grad avoids keeping activations around for backprop
    with torch.no_grad():
        outputs = model(batch_ids, attention_mask=batch_mask, use_cache=True, return_dict=True)

    past_kvs = outputs['past_key_values']
    bob_numpy_arrays = get_bob_vals(past_kvs) # [batch_size], each a numpy array of shape [num_layers=32, n_heads=8, head_dim=128]

    # let's add this to the past_values_bob 
    past_values_bob += bob_numpy_arrays



Generating Bob representations...


  8%|▊         | 66/784 [00:35<06:26,  1.86it/s]
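The progress bar total of 784 follows directly from the batch arithmetic. A minimal sketch, using only the sentence count and batch size reported in this notebook:

```python
import math

num_sentences = 25866  # number of adjective matches reported above
batch_size = 33

num_batches = math.ceil(num_sentences / batch_size)
last_batch_size = num_sentences - (num_batches - 1) * batch_size
print(num_batches, last_batch_size)  # 784 27
```

An expression of the form `n // batch_size + 1` gives the same count here, but it would produce one empty trailing batch whenever `n` is an exact multiple of the batch size, which is why ceiling division is the safer choice.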

In [None]:
print("Number of bob representations: ", len(past_values_bob))
print("Number of adjectives: ", len(adjectives))
print("Shape of individual bob value representation: ", past_values_bob[0].shape)
print("\t[num_layers=32, n_heads=8, head_dim=128]")

25866

In [None]:
# save to disk 
# stack into a single array of shape [num_adjectives, num_layers, n_heads, head_dim]
np.save("bob_representations.npy", np.stack(past_values_bob))

(32, 8, 128)
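With the representations saved, the PCA step named in the title could look roughly like the sketch below. It is only an illustration under stated assumptions: it runs on a small synthetic array rather than the real `bob_representations.npy`, the choice of `layer` is arbitrary, and computing PCA through `np.linalg.svd` on centered data is one standard option, not necessarily the method this notebook goes on to use.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the saved array: [num_adjectives, num_layers, n_heads, head_dim].
# Small hypothetical sizes; the real array would be [25866, 32, 8, 128].
reps = rng.normal(size=(200, 4, 2, 16)).astype(np.float32)

layer = 2  # hypothetical choice of layer to inspect
X = reps[:, layer].reshape(reps.shape[0], -1)  # [num_adjectives, n_heads * head_dim]
X = X - X.mean(axis=0, keepdims=True)          # center each feature

# PCA via SVD: rows of Vt are principal directions, ordered by explained variance.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
explained_variance_ratio = S**2 / np.sum(S**2)

n_components = 2
projected = X @ Vt[:n_components].T  # [num_adjectives, n_components]
print(projected.shape, explained_variance_ratio[:n_components])
```

On real data, `projected` could be scattered with `matplotlib` and colored by adjective category to look for an "emotional axis"; on this random stand-in no meaningful structure should be expected.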