## Initial Probe Exploration

### Goals:
- Train a simple linear (logistic regression) probe on Llama-3-8B
    - Update Alex 19/02/25: since I'm running this on my MacBook, I'll just use the 1B model
- Understand GPU capacity - can we do inference with 70B?
- Look at the probe activations / test set classifications
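
As a rough guide for the GPU capacity question, the memory needed just to hold the weights can be estimated as parameter count times bytes per parameter. The sketch below assumes nominal parameter counts of 1B and 70B and ignores activations, the KV cache, and framework overhead, so the numbers are a lower bound rather than a measurement. The helper name `weight_memory_gb` is made up for illustration.

```python
# Back-of-the-envelope estimate of the memory needed to hold model weights only.
# Assumption: nominal parameter counts (1e9 and 70e9), 1 GB = 1e9 bytes.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_memory_gb(n_params: float, dtype: str) -> float:
    """Return the weight memory in gigabytes for a given dtype."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


for name, n_params in [("1B", 1e9), ("70B", 70e9)]:
    for dtype in BYTES_PER_PARAM:
        print(f"{name} @ {dtype}: {weight_memory_gb(n_params, dtype):.0f} GB")
```

Under these assumptions a 70B model in float16 needs about 140 GB for weights alone, which is more than a single commonly available GPU holds, so it would have to be sharded across devices or quantized.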

### Timeline:
- 18/02/25
- 19/02/25

In [1]:
# Imports
from models_under_pressure.probes import (
    train_single_layer,
    compute_accuracy,
    create_activations,
    LinearProbe,
)
import numpy as np

from transformers import AutoTokenizer, AutoModelForCausalLM
from joblib import Parallel, delayed
import os



## Dataset

The dataset was generated using GPT-4o. The training set consists of 100 sentences about red things and 100 sentences about green things, and a further 10 sentences are held out for testing. We hope to learn a classifier / probe that separates red objects from green ones.

`Command: Generate 20 sentences about red things. Generate 20 sentences about green things. Put them in a JSON array of strings.`

In [2]:
# Dataset loading:
red_sentences = [
    "The bright red apple hung low on the tree, ready to be picked.",
    "A red sports car sped past, leaving a trail of dust behind.",
    "The firefighter's uniform had reflective red stripes for visibility.",
    "She wore a deep red dress that caught everyone's attention.",
    "The red rose symbolized love and passion.",
    "Tomatoes ripened under the sun, turning a rich shade of red.",
    "The cardinal perched on the fence, its red feathers vibrant against the snow.",
    "His face turned red with embarrassment after tripping on stage.",
    "The sunset painted the sky in hues of red and orange.",
    "Red chili peppers added a spicy kick to the dish.",
    "The warning sign was painted bright red for safety reasons.",
    "Her lipstick was a bold shade of red.",
    "The red balloon floated away into the sky.",
    "Blood is naturally red due to the presence of iron in hemoglobin.",
    "Strawberries are at their sweetest when they turn fully red.",
    "The red fire hydrant stood at the corner of the street.",
    "Maple leaves turn a brilliant red in the autumn.",
    "A red velvet cake is a delicious dessert with a hint of cocoa.",
    "The ladybug crawled across the leaf, its red shell dotted with black spots.",
    "Santa Claus is always dressed in his iconic red suit.",
    "The red lanterns illuminated the street during the festival.",
    "The old barn had faded red paint peeling from its wooden walls.",
    "Her red scarf flapped in the wind as she walked.",
    "A red dragon was painted on the side of the ancient temple.",
    "The artist dipped his brush into a pool of red paint.",
    "Cherry blossoms with a hint of red bloomed in the spring.",
    "The red bricks gave the historic building a timeless charm.",
    "She applied red nail polish with careful precision.",
    "A red fox darted through the snowy forest.",
    "The red curtains in the theater rose, revealing the performers.",
    "A small red ladybug landed on his hand.",
    "The red ribbon was tied in a perfect bow around the gift.",
    "The firefighter's red helmet protected him from falling debris.",
    "A bright red cardinal chirped loudly from the treetop.",
    "The red bell pepper added a sweet crunch to the salad.",
    "A red scarf was wrapped around her neck to keep warm.",
    "The warning lights flashed red, signaling danger ahead.",
    "A red velvet armchair sat in the corner of the library.",
    "The book had a worn red cover and golden lettering.",
    "A red wax seal was pressed onto the envelope.",
    "The red mushroom with white spots stood out in the forest.",
    "A red lobster was placed on the seafood platter.",
    "Her red umbrella stood out against the gray sky.",
    "The red neon sign flickered in the dark alleyway.",
    "A red maple leaf drifted down onto the sidewalk.",
    "The red parrot squawked loudly in the pet shop.",
    "A red postbox stood at the edge of the street corner.",
    "The red fire truck raced down the road with its sirens blaring.",
    "A red sweater kept him warm in the chilly weather.",
    "The stop sign was painted bright red for visibility.",
    "A red geranium bloomed in the window box.",
    "The red plaid blanket was spread out for the picnic.",
    "Her red earrings sparkled in the sunlight.",
    "The red sunset reflected beautifully on the water.",
    "The red brick road led to the charming countryside inn.",
    "The red kite soared high above the open field.",
    "The red sunset painted the sky in fiery hues.",
    "A red butterfly fluttered over the blooming flowers.",
    "The old red barn stood in the middle of the vast field.",
    "His red tie complemented his dark suit perfectly.",
    "The red cherry on top of the sundae made it look even more delicious.",
    "A red woodpecker tapped against the tree trunk.",
    "The red neon lights flickered in the foggy night.",
    "She scribbled a note in red ink on the paper.",
    "The red sand of the desert shimmered under the sun.",
    "A red rose bush lined the garden path.",
    "The red dragonfly hovered over the pond's surface.",
    "He wore a red wristband to show his support for the cause.",
    "The red bricks of the old factory were worn but sturdy.",
    "A red gemstone glowed in the dim light.",
    "The red chili sauce added an extra kick to the dish.",
    "A red velvet cupcake was placed on the dessert tray.",
    "The red firefighter truck stood ready for the next emergency.",
    "She hung a red banner across the porch for the celebration.",
    "The red paint on the bench had started to chip away.",
    "A red rooster crowed loudly in the morning.",
    "Her red handbag matched her high heels perfectly.",
    "The red stadium seats were packed with excited fans.",
    "A red snake slithered through the dry leaves.",
    "The red tulips in the park signaled the arrival of spring.",
    "A red cardinal nested in the dense foliage.",
    "The red traffic cone blocked off the slippery road.",
    "The red theater curtains were drawn, marking the start of the play.",
    "A red wax seal was stamped on the ancient document.",
    "The red berries on the bush attracted birds and small animals.",
    "A red lantern swayed gently in the evening breeze.",
    "The red lava flowed down the mountainside in slow waves.",
    "A red kite danced in the sky against the backdrop of white clouds.",
    "Her red bracelet jingled softly as she moved her hand.",
    "The red pepper flakes gave the pizza an extra spicy kick.",
    "A red cardinal pecked at the sunflower seeds on the feeder.",
    "The red ribbon fluttered in the wind, tied to the fence.",
    "A red crayon rolled off the table onto the floor.",
    "The red autumn leaves crunched under their feet.",
    "A red and white lighthouse stood tall on the rocky shore.",
    "The red berries contrasted beautifully with the green leaves.",
    "A red bandana was tied around his neck for style.",
    "The red paper lanterns glowed softly in the evening light.",
    "A red dragon design was embroidered on the silk robe.",
    "The red vintage bicycle leaned against the wooden fence.",
]

green_sentences = [
    "The fresh green grass covered the rolling hills.",
    "A green traffic light signaled the cars to move forward.",
    "Emeralds are precious gems with a deep green color.",
    "The frog leaped into the pond, blending in with the green lily pads.",
    "Spinach is a nutritious green vegetable rich in iron.",
    "The soccer field was painted bright green for the championship game.",
    "A bright green parrot perched on the branch, mimicking voices.",
    "The cucumber felt cool and crisp in her hands.",
    "The lush green rainforest was teeming with wildlife.",
    "She wore a green jade bracelet that shimmered under the light.",
    "Green tea is known for its numerous health benefits.",
    "The traffic sign was painted green to indicate an exit route.",
    "The chameleon changed its color to blend with the green leaves.",
    "The avocado's skin turned dark green when fully ripe.",
    "His green eyes sparkled in the sunlight.",
    "The turtle slowly crawled across the green moss-covered rock.",
    "The neon green sign stood out in the dimly lit alley.",
    "Green grapes are sweet and slightly tangy when ripe.",
    "The Christmas tree stood tall, covered in green pine needles.",
    "The four-leaf clover is a rare green plant that symbolizes luck.",
    "The green vines climbed up the old stone wall.",
    "A green apple fell from the tree with a soft thud.",
    "The frog's green skin blended perfectly with the lily pads.",
    "The rolling green hills stretched out into the distance.",
    "A green smoothie is a healthy way to start the day.",
    "The mint leaves added a refreshing touch to the drink.",
    "She painted her bedroom walls a calming shade of green.",
    "A green snake slithered silently through the grass.",
    "The meadow was filled with green wildflowers.",
    "A green light signaled that the train was ready to depart.",
    "The giant green cactus stood tall in the desert landscape.",
    "A green caterpillar inched along the tree branch.",
    "The parrot's feathers shone in a vibrant shade of green.",
    "A green jade statue was placed on the temple altar.",
    "The broccoli florets were steamed to perfection.",
    "A green garden hose lay coiled up beside the house.",
    "The pond was covered with a thin layer of green algae.",
    "A green stop sign in some countries indicates a pedestrian zone.",
    "The green balloon popped with a loud bang.",
    "His green backpack was stuffed full of travel essentials.",
    "The green ink on the dollar bill gleamed under the light.",
    "The turtle slowly made its way across the green moss.",
    "A green leaf floated down from the tree above.",
    "The soccer team wore bright green jerseys for the match.",
    "The dense green jungle was home to many rare animals.",
    "A green crystal pendant hung around her neck.",
    "The farmer harvested fresh green beans from the garden.",
    "A green gecko clung to the windowpane.",
    "The newly grown grass smelled fresh and earthy.",
    "The emerald-green sea stretched endlessly beyond the shore.",
    "A green avocado slice was added to the sandwich.",
    "The lizard's green tail flicked as it scurried away.",
    "A green camouflage jacket helped him blend into the forest.",
    "The walls of the café were painted a deep green shade.",
    "A green artichoke was placed in the basket at the market.",
    "A green four-leaf clover was pressed inside the book.",
    "The garden was filled with green ferns swaying in the breeze.",
    "The green meadow stretched endlessly under the clear blue sky.",
    "A green hummingbird hovered near the flowers, sipping nectar.",
    "The green leaves rustled gently in the summer breeze.",
    "She wore a beautiful green dress that matched her emerald earrings.",
    "The green limes added a zesty flavor to the dish.",
    "A green tree frog clung to the branch with its sticky toes.",
    "The green ivy climbed up the side of the old brick house.",
    "A fresh green salad was served with a drizzle of olive oil.",
    "The green jade ring sparkled under the light.",
    "A green iguana basked on the warm rock.",
    "The lush green vineyard was ready for the grape harvest.",
    "The bright green traffic light signaled cars to move forward.",
    "A green firefly blinked softly in the night.",
    "The green rolling hills were dotted with grazing sheep.",
    "A green smoothie sat on the café counter, waiting to be picked up.",
    "The green fern unfurled its delicate leaves in the damp forest.",
    "A green scarf kept him warm during the chilly evening walk.",
    "The green vines hung gracefully from the garden trellis.",
    "The green cucumber slices were crisp and refreshing.",
    "A green clover patch covered the ground near the cottage.",
    "The soccer players wore bright green jerseys for their match.",
    "A green garden bench stood under the shade of an old oak tree.",
    "The green neon sign glowed brightly in the dark alley.",
    "A green moss-covered stone lay beside the flowing river.",
    "The green cactus had tiny pink flowers blooming on top.",
    "The green paint on the fence was slightly faded from the sun.",
    "A green meadow was home to a family of rabbits.",
    "The old book's pages had a slight green tint from aging.",
    "A green dragonfly skimmed the surface of the still pond.",
    "The green candy wrapper crinkled as she unwrapped her treat.",
    "A green caterpillar inched its way across the leaf.",
    "The green moss made the forest floor look soft and inviting.",
    "A green gecko clung to the side of the window, watching curiously.",
    "The green basil plants thrived in the sunny windowsill.",
    "A green frog croaked from the reeds near the pond.",
    "The green chalkboard was covered with neatly written notes.",
    "A green apple sat alone in the fruit bowl.",
    "The green stems of the tulips stood tall in the vase.",
    "A green military tent was set up in the remote field.",
    "The green lily pads floated peacefully on the lake's surface.",
    "A green traffic cone was placed to mark the construction area.",
    "The green gemstone was carefully set into the gold necklace.",
    "The green parrot squawked loudly from its perch.",
]

train_text = red_sentences + green_sentences
train_labels = np.array([1] * len(red_sentences) + [0] * len(green_sentences))

test_text = [
    "A red sports jersey hung in his locker.",
    "The red balloon bouquet floated gently in the breeze.",
    "A red strawberry milkshake sat on the counter.",
    "The red wax dripped slowly from the candle.",
    "A red ribbon was pinned to her dress for awareness.",
    "The brick house had a classic red chimney that stood out against the sky.",
    "The green lizard basked in the sun on a warm rock.",
    "She decorated her room with green fairy lights for a cozy atmosphere.",
    "The green basil leaves were chopped finely for the sauce.",
    "A green highlighter was used to mark important notes.",
]
# The first 6 test sentences are about red things, the last 4 about green things
test_labels = np.array([1] * 6 + [0] * 4)
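
Because the labels are built from list lengths rather than from the sentences themselves, a quick consistency check can catch a label that does not match its sentence. The sketch below is a heuristic under a stated assumption: a sentence should be labelled 1 exactly when the word "red" appears in it as a whole word. The helper `find_label_mismatches` and the toy data are hypothetical and not part of `models_under_pressure`.

```python
import re


def find_label_mismatches(texts, labels, keyword="red"):
    """Return indices where the label disagrees with the keyword heuristic.

    A sentence is expected to have label 1 if and only if `keyword`
    appears in it as a whole word (case-insensitive).
    """
    mismatches = []
    for i, (text, label) in enumerate(zip(texts, labels)):
        words = re.findall(r"[a-z]+", text.lower())
        has_keyword = keyword in words
        if has_keyword != bool(label):
            mismatches.append(i)
    return mismatches


toy_text = [
    "A red kite flew by.",
    "The red door creaked.",
    "A green frog croaked.",
]
toy_labels = [1, 0, 0]  # the second label is deliberately wrong
print(find_label_mismatches(toy_text, toy_labels))
```

Whole-word matching matters here: a substring test would flag words such as "covered", which contains "red".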

## Generate the Feature Inputs for the Probe

In [6]:
os.environ["TOKENIZERS_PARALLELISM"] = "false"
model_name = "meta-llama/Llama-3.2-1B-Instruct"

# device = 'cuda:1'
device = "cpu"

# Load the Llama-3.2-1B-Instruct model and tokenizer
model = AutoModelForCausalLM.from_pretrained(model_name).to(device)
tokenizer = AutoTokenizer.from_pretrained(model_name)

if tokenizer.pad_token_id is None:
    tokenizer.pad_token_id = tokenizer.eos_token_id

# Run the model on the train and test data, recording the activations

train_acts = create_activations(
    model=model, tokenizer=tokenizer, text=train_text, device=device
)
test_acts = create_activations(
    model=model, tokenizer=tokenizer, text=test_text, device=device
)


Layer: 0, Activation Shape: torch.Size([200, 18, 2048])
Layer: 1, Activation Shape: torch.Size([200, 18, 2048])
Layer: 2, Activation Shape: torch.Size([200, 18, 2048])
Layer: 3, Activation Shape: torch.Size([200, 18, 2048])
Layer: 4, Activation Shape: torch.Size([200, 18, 2048])
Layer: 5, Activation Shape: torch.Size([200, 18, 2048])
Layer: 6, Activation Shape: torch.Size([200, 18, 2048])
Layer: 7, Activation Shape: torch.Size([200, 18, 2048])
Layer: 8, Activation Shape: torch.Size([200, 18, 2048])
Layer: 9, Activation Shape: torch.Size([200, 18, 2048])
Layer: 10, Activation Shape: torch.Size([200, 18, 2048])
Layer: 11, Activation Shape: torch.Size([200, 18, 2048])
Layer: 12, Activation Shape: torch.Size([200, 18, 2048])
Layer: 13, Activation Shape: torch.Size([200, 18, 2048])
Layer: 14, Activation Shape: torch.Size([200, 18, 2048])
Layer: 15, Activation Shape: torch.Size([200, 18, 2048])
All activations shape: torch.Size([16, 200, 18, 2048])
Layer: 0, Activation Shape: torch.Size([10,
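
The shapes above have a sequence dimension of 18, so shorter sentences are padded. If the padding is on the right, the last position along that axis is a pad token for most sentences, and a probe on the "final" position should instead use each sentence's last real token. The sketch below shows one way to select it with NumPy. It assumes right padding and an attention mask of shape `(batch, seq_len)`. The notebook does not show whether `create_activations` already handles this, and the helper name `last_token_activations` is hypothetical.

```python
import numpy as np


def last_token_activations(acts: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Select the activation at each sequence's last non-padding token.

    acts:           (n_layers, batch, seq_len, hidden)
    attention_mask: (batch, seq_len), 1 for real tokens, 0 for right padding
    returns:        (n_layers, batch, hidden)
    """
    last_idx = attention_mask.sum(axis=1) - 1  # index of last real token, (batch,)
    batch_idx = np.arange(acts.shape[1])       # (batch,)
    # Paired advanced indexing picks one sequence position per batch element.
    return acts[:, batch_idx, last_idx, :]


# Toy example: 2 layers, 2 sentences, 3 positions, hidden size 4.
toy_acts = np.arange(2 * 2 * 3 * 4, dtype=float).reshape(2, 2, 3, 4)
toy_mask = np.array([[1, 1, 1], [1, 1, 0]])  # second sentence has one pad token
features = last_token_activations(toy_acts, toy_mask)
print(features.shape)
```

The result has one `(batch, hidden)` matrix per layer, which is the layout a per-layer classifier expects.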

## Training Code for the Probe

Use the `sklearn` logistic regression classifier to fit a linear probe on the model's activations. We do the following:

1. Create the y labels (1 for red and 0 for green)
2. Restructure X into the shape sklearn expects, `(batch_size, embed_dim)`: one matrix per layer, taken at the final sequence position. **TODO: iterate on this in future**
3. Run logistic regression
4. Test on the 10 held-out test sentences
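
The steps above can be sketched for a single layer as follows. This is a minimal illustration, not the implementation of `train_single_layer` or `compute_accuracy`, which is not shown in this notebook. Synthetic Gaussian clusters stand in for the final-token activations, and the dimensions and cluster means are arbitrary assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic stand-in for one layer's final-token activations
# (hypothetical data, not real model output).
n_per_class, dim = 50, 16
X_train = np.concatenate([
    rng.normal(loc=+1.0, size=(n_per_class, dim)),  # "red" class
    rng.normal(loc=-1.0, size=(n_per_class, dim)),  # "green" class
])
# Step 1: labels, 1 for red and 0 for green.
y_train = np.array([1] * n_per_class + [0] * n_per_class)

# Small held-out set drawn from the same two clusters.
X_test = np.concatenate([
    rng.normal(loc=+1.0, size=(5, dim)),
    rng.normal(loc=-1.0, size=(5, dim)),
])
y_test = np.array([1] * 5 + [0] * 5)

# Steps 2-3: X is already (batch_size, embed_dim), so fit the linear probe.
probe = LogisticRegression(max_iter=1000)
probe.fit(X_train, y_train)

# Step 4: accuracy on the held-out points.
accuracy = probe.score(X_test, y_test)
print(f"test accuracy: {accuracy:.2f}")
```

The learned direction is available as `probe.coef_`, with shape `(1, embed_dim)`. Repeating the fit once per layer gives one probe and one accuracy per layer.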

In [7]:
layer_probes: list[LinearProbe] = Parallel(n_jobs=16)(
    delayed(train_single_layer)(acts, train_labels) for acts in train_acts
)  # type: ignore

accuracies = [
    compute_accuracy(probe, test_acts[i], test_labels)
    for i, probe in enumerate(layer_probes)
]

accuracies

NameError: name 'LinearProbe' is not defined