Let's see if we can run some experiments inspired by 'extracting paragraphs'

Note: place your NDIF API key and Hugging Face token in a file called `.env`.

For example:
```
HF_TOKEN="Token here"
NDIF_KEY="Key here"
```

In [1]:
import torch
import math
from nnsight import CONFIG
from nnsight import LanguageModel
import nnsight
import numpy as np
import matplotlib.pyplot as plt
import os
from dotenv import load_dotenv

load_dotenv()



True

In [2]:
# importing from my own code 
from activation_transplanting import *

In [3]:
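# read the NDIF api key (load_dotenv has already put it in os.environ)
ndif_key = os.environ.get('NDIF_KEY')
if ndif_key is None:
    raise RuntimeError("NDIF_KEY not found; add it to your .env file")
CONFIG.set_default_api_key(ndif_key)

# load_dotenv also exports HF_TOKEN, which huggingface_hub reads from the environment
if 'HF_TOKEN' not in os.environ:
    raise RuntimeError("HF_TOKEN not found; add it to your .env file")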

## Experimental Setups 

Broadly, the experiments we're running investigate the activations at content-free tokens, which humans typically wouldn't care about but which may be critical to the model's behavior. These include:
 - prompt template strings
 - newline tokens
 - etc.

[This paper](https://arxiv.org/pdf/2401.11323v3) breaks these into template and stopword tokens.
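As a rough illustration of that split, the sketch below labels a hand-written tokenization of the poem prompt used later as template, stopword or content. The marker set, the stopword list and the token boundaries are all assumptions for this example; they are not the paper's definitions or the logic in `activation_transplanting.py`.

```python
# Hypothetical labelling of tokens into three coarse types.
# TEMPLATE_MARKERS and STOPWORDS are illustrative assumptions, not the paper's lists.
TEMPLATE_MARKERS = {"User", "Assistant", ":"}
STOPWORDS = {"a", "an", "the", "of", "me", "is", "to", "in"}


def classify_token(token: str) -> str:
    """Return "template", "stopword" or "content" for one decoded token."""
    word = token.strip()  # drop surrounding spaces and newlines
    if "\n" in token or word in TEMPLATE_MARKERS:
        return "template"
    if word.lower() in STOPWORDS:
        return "stopword"
    return "content"


# Approximate (hand-written) tokenization of "\nUser: Write me a poem.\n\nAssistant:"
tokens = ["\n", "User", ":", " Write", " me", " a", " poem", ".\n\n", "Assistant", ":"]
for tok in tokens:
    print(repr(tok), classify_token(tok))
```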

These experiments require:

 1) investigating the attention patterns as a function of token type
 2) extracting residual stream tensors 
 3) transplanting these at target positions, and continuing the generation
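For step 1, one simple summary is the average attention mass each token type receives. The sketch below is a toy that assumes a single (already head- and layer-averaged) causal attention matrix and per-position type labels; in practice the matrix would come from the model's attention outputs and the labels from the tokenized prompt. The function name and inputs are hypothetical.

```python
from collections import defaultdict


def attention_mass_by_type(attn, types):
    """attn[q][k]: attention weight from query q to key k (each row sums to 1).
    types[k]: label of key position k. Returns, per label, the total attention
    that label receives, averaged over query positions."""
    totals = defaultdict(float)
    for row in attn:
        for k, weight in enumerate(row):
            totals[types[k]] += weight
    return {label: total / len(attn) for label, total in totals.items()}


# Made-up 3-token causal attention pattern, for illustration only.
types = ["template", "content", "template"]
attn = [
    [1.0, 0.0, 0.0],
    [0.6, 0.4, 0.0],
    [0.5, 0.2, 0.3],
]
print(attention_mass_by_type(attn, types))
```

Because of the causal mask, earlier positions can be attended to by more queries, so raw totals like these favor the start of the prompt; comparing against a baseline (e.g. shuffled labels) is probably needed before reading much into them.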

This is enabled by using a `LLamaExamineToolkit` object (defined in `activation_transplanting.py`)

We also consider three types of models: base models (e.g. `meta-llama/Meta-Llama-3.1-8B`), instruct models and reasoning models.

In [4]:
NDIF_models = [
    "meta-llama/Meta-Llama-3.1-405B-Instruct",
    "meta-llama/Meta-Llama-3.1-8B",
    "meta-llama/Meta-Llama-3.1-70B",
    "deepseek-ai/DeepSeek-R1-Distill-Llama-70B",
    "deepseek-ai/DeepSeek-R1-Distill-Llama-8B"
] 

# non-exhaustive list
non_NDIF_models = [
    "meta-llama/Meta-Llama-3.1-8B",
]

In [5]:
# choose a model 
llama_model_string = "meta-llama/Meta-Llama-3.1-8B" 
# remote = use NDIF
remote = True 

if remote and (llama_model_string not in NDIF_models):
    remote = False 
    print("Model not available on NDIF")

### Load a model

In [6]:
# load a model
llama = LanguageModel(llama_model_string)

### Define how to control the string structure

Which tokens should we target?

For example:

#### **reasoning models**

 - `<think>` (start thinking)
 - `</think>` (end thinking)
 - `<｜User｜>`
 - `<｜Assistant｜>`

#### **Instruct models**

In these models, the format for user and assistant turns is, for example:

`<|begin_of_text|>`
`<|start_header_id|>user<|end_header_id|>`

`Hello, how are you? <|eot_id|>`
`<|start_header_id|>assistant<|end_header_id|>`

We might then take `<|end_header_id|>` to be the target token.
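A small sketch of how such a prompt could be assembled programmatically, following the Llama 3 chat layout (`\n\n` after each `<|end_header_id|>`, `<|eot_id|>` closing each turn), and of locating the last `<|end_header_id|>`. The helper name is hypothetical, and leaving out the BOS token assumes the tokenizer prepends it, which is consistent with the `<|begin_of_text|>` at the start of the decoded outputs further down.

```python
# Hypothetical helper (not part of activation_transplanting.py): build a single-turn
# Llama 3 instruct prompt. The BOS token is omitted on the assumption that the
# tokenizer prepends <|begin_of_text|> itself.
def build_llama3_prompt(user_message: str, system_message: str = "") -> str:
    parts = []
    if system_message:
        parts.append(
            f"<|start_header_id|>system<|end_header_id|>\n\n{system_message}<|eot_id|>"
        )
    parts.append(f"<|start_header_id|>user<|end_header_id|>\n\n{user_message}<|eot_id|>")
    # Leave the assistant turn open so generation continues from here.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


prompt = build_llama3_prompt("Hello, how are you?")
# The last <|end_header_id|> closes the assistant header: a natural target token.
target_char = prompt.rfind("<|end_header_id|>")
print(repr(prompt))
print("target token starts at character", target_char)
```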

#### **Base models**

Here you'll have to structure the prompts yourself depending on the goal. For example:

`"\nUser: What's the capital of France?\n\nAssistant:"`

In these cases, there may be no single special token to target.

For this reason, `LLamaExamineToolkit` includes the ability to look for tokens which contain specific substrings.

This is particularly intended for extracting `\n\n` token positions, since the tokenizer often merges `\n\n` with the preceding character, giving tokens such as
 - `?\n\n`
 - `.\n\n`
 - `:\n\n`
 - ` \n\n`
 - etc
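A minimal sketch of the substring idea, assuming the prompt has already been decoded into a list of token strings. The token list is a hand-written approximation and the function is illustrative, not the toolkit's implementation.

```python
# Illustrative only: positions of tokens whose decoded text contains `substring`.
def positions_containing(tokens: list[str], substring: str) -> list[int]:
    return [i for i, tok in enumerate(tokens) if substring in tok]


# Hand-written approximation of the tokens of
# "\nUser: What's the capital of France?\n\nAssistant:"
tokens = ["<|begin_of_text|>", "\n", "User", ":", " What", "'s", " the",
          " capital", " of", " France", "?\n\n", "Assistant", ":"]
print(positions_containing(tokens, "\n\n"))  # only the merged "?\n\n" token matches
```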


### Targeting tokens 

`LLamaExamineToolkit.identify_target_token_index` takes either a full token, in which case it looks for instances of that token alone in the string, or a string that isn't a token itself, in which case it looks for the location of the corresponding token (see the function docstring).
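One way to handle both cases, including targets that span several tokens (such as `Assistant:`, which the printouts below show split into `Assistant` and `:`), is to join the decoded tokens, find each occurrence in the text, and return the token holding its last character. This is a hypothetical helper sketching a reasonable approach under that assumption, not the toolkit's actual implementation; the token list is hand-written.

```python
import bisect
from itertools import accumulate


def identify_target_positions(tokens: list[str], target: str) -> list[int]:
    """For each occurrence of `target` in the decoded text, return the index of
    the token containing the occurrence's last character (hypothetical helper)."""
    if not target:
        raise ValueError("target must be a non-empty string")
    # ends[i] is the character offset just past the end of token i.
    ends = list(accumulate(len(tok) for tok in tokens))
    text = "".join(tokens)
    positions = []
    start = text.find(target)
    while start != -1:
        last_char = start + len(target) - 1
        # First token whose end lies beyond the match's last character.
        positions.append(bisect.bisect_right(ends, last_char))
        start = text.find(target, start + 1)
    return positions


# Hand-written approximation of the tokens of the Germany prompt used later.
tokens = ["<|begin_of_text|>", "\n", "User", ":", " Describe", " the", " history",
          " of", " Germany", ".\n\n", "Assistant", ":"]
print(identify_target_positions(tokens, "Assistant"))   # a whole token
print(identify_target_positions(tokens, "\n\n"))        # part of the ".\n\n" token
print(identify_target_positions(tokens, "Assistant:"))  # spans two tokens
```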

#### Indexing tokens 

We then supply an `occurrence_index`, which tells the toolkit which occurrence of the target we want to extract activations from.
For instance, to use the first occurrence of `<｜Assistant｜>`, we would set `occurrence_index=0` (its default value); the examples below use `occurrence_index=-1` for the last occurrence.

Finally, `num_prev` and `num_fut` set how many additional tokens before and after the target to save out / transplant.

For example, if we expect a `:` token immediately to the left of `<｜Assistant｜>` that may carry important meaning, we can set `num_prev=1` so that it is transplanted too.
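Given the positions at which the target occurs, the window of positions to save out / transplant could be computed as below. The helper is hypothetical and clamping to the sequence bounds is an assumption. With the positions `[3, 11]` of the two `:` tokens in the 12-token Germany prompt, `occurrence_index=-1` and `num_prev=2` give positions 9, 10 and 11, which appears consistent with the target positions printed in the first example of this notebook.

```python
def transplant_window(positions: list[int], seq_len: int, occurrence_index: int = 0,
                      num_prev: int = 0, num_fut: int = 0) -> list[int]:
    """Hypothetical helper: token positions to save out / transplant around one
    occurrence of the target. Negative occurrence_index counts from the end."""
    anchor = positions[occurrence_index]
    start = max(anchor - num_prev, 0)          # don't run off the start of the prompt
    stop = min(anchor + num_fut + 1, seq_len)  # or off the end
    return list(range(start, stop))


# Positions of the two ":" tokens ("User:" and "Assistant:") in a 12-token prompt.
print(transplant_window([3, 11], seq_len=12, occurrence_index=-1, num_prev=2))
```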

In [7]:
# Example prompts

# instruct examples
prompt_example_1 = "<|begin▁of▁sentence|>\n" \
         "<|start_header_id|>user<|end_header_id|>\n\n" \
         "Hello, how are you? <|eot_id|>\n" \
         "<|start_header_id|>assistant<|end_header_id|>\n"

prompt_example_2 = "<|start_header_id|>system<|end_header_id|>\n\n<|eot_id|>\n" \
                "<|start_header_id|>user<|end_header_id|>\n\n" \
                "Answer the following in one word: What is the tallest mountain in the world?<|eot_id|>\n" \
                "<|start_header_id|>assistant<|end_header_id|>"

# Base model examples 
prompt_example_3 = "\nUser: What's the capital of France?\n\nAssistant:"

# Reasoning examples 
prompt_example_4 = "<｜User｜>Robert has three apples, and then gets one more. How many apples does he have? Respond in a single word.<｜Assistant｜>"

In [8]:
tk = LLamaExamineToolkit(
    llama_model=llama, 
    remote=True, # use NDIF
)

# Example

Let's swap the content-free tokens around `Assistant:` between two prompts.

In [19]:
s1 = "\nUser: Write me a poem.\n\nAssistant:"
s2 = "\nUser: Describe the history of Germany.\n\nAssistant:"

Note: for now, each source/target token pair is printed out so that we can check exactly what is being transplanted.
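Since the printed pairs are the main sanity check, a small helper can flag the positions where the source and target tokens differ. The helper is hypothetical, not part of the toolkit; the pairs are copied from the first block printed by the next cell. Differences are expected at content positions, but when swapping template tokens such as `Assistant:` they are worth a second look.

```python
def mismatched_pairs(pairs):
    """Return the (position, source_token, target_token) triples whose tokens differ."""
    return [(pos, src, tgt) for pos, src, tgt in pairs if src != tgt]


# (position, source token, target token), as printed by transplant_newline_activities
pairs = [(9, ".\n\n", ".\n\n"), (10, ".\n\n", "Assistant"), (11, "Assistant", ":")]
for pos, src, tgt in mismatched_pairs(pairs):
    print(f"position {pos}: source {src!r} -> target {tgt!r}")
```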

In [20]:
outputs = tk.transplant_newline_activities(
    source_strings=[s1, s2],  # prompts whose activations are extracted
    target_strings=[s2, s1],  # prompts the activations are transplanted into
    target_substring="Assistant:",
    num_new_tokens=50,  # how many new tokens to generate
    occurrence_index=-1,  # swap at the last occurrence of target_substring
    num_fut=0,  # no extra tokens after the target
    num_prev=2,  # also transplant the 2 tokens before the target
    transplant_strings=("residual",),  # transplant residual-stream activations
)

extracting newline activations


2025-03-10 15:02:54,031 af0a81d9-4f00-476b-a279-766b99a67f16 - RECEIVED: Your job has been received and is waiting approval.
2025-03-10 15:02:54,915 af0a81d9-4f00-476b-a279-766b99a67f16 - APPROVED: Your job was approved and is waiting to be run.
2025-03-10 15:02:55,765 af0a81d9-4f00-476b-a279-766b99a67f16 - RUNNING: Your job has started running.
2025-03-10 15:02:57,244 af0a81d9-4f00-476b-a279-766b99a67f16 - COMPLETED: Your job has been completed.
Downloading result: 100%|██████████| 13.0M/13.0M [00:01<00:00, 9.54MB/s]


generating with transplant
source_token =  9 ('.\n\n', 382)
target_token =  9 ('.\n\n',)
source_token =  10 ('.\n\n', 382)
target_token =  10 ('Assistant',)
source_token =  11 ('Assistant', 72803)
target_token =  11 (':',)

2025-03-10 15:03:02,163 6b64841d-cde1-4dc7-9553-6764c4000da4 - RECEIVED: Your job has been received and is waiting approval.
2025-03-10 15:03:03,033 6b64841d-cde1-4dc7-9553-6764c4000da4 - APPROVED: Your job was approved and is waiting to be run.
2025-03-10 15:03:03,494 6b64841d-cde1-4dc7-9553-6764c4000da4 - RUNNING: Your job has started running.
2025-03-10 15:03:05,409 6b64841d-cde1-4dc7-9553-6764c4000da4 - COMPLETED: Your job has been completed.
Downloading result: 100%|██████████| 1.69k/1.69k [00:00<?, ?B/s]


generating with transplant
source_token =  8 ('.\n\n', 382)
target_token =  8 ('.\n\n',)
source_token =  9 ('Assistant', 72803)
target_token =  9 ('Assistant',)
source_token =  10 (':', 25)
target_token =  10 (':',)

2025-03-10 15:03:07,916 2aba9e3a-0ac7-4a8d-8882-9cba0befe399 - RECEIVED: Your job has been received and is waiting approval.
2025-03-10 15:03:08,293 2aba9e3a-0ac7-4a8d-8882-9cba0befe399 - APPROVED: Your job was approved and is waiting to be run.
2025-03-10 15:03:08,683 2aba9e3a-0ac7-4a8d-8882-9cba0befe399 - RUNNING: Your job has started running.
2025-03-10 15:03:10,883 2aba9e3a-0ac7-4a8d-8882-9cba0befe399 - COMPLETED: Your job has been completed.
Downloading result: 100%|██████████| 1.69k/1.69k [00:00<?, ?B/s]


In [21]:
for output in outputs:
    print(output)

    print("----------")

<|begin_of_text|>
User: Describe the history of Germany.

Assistant:: The history of Germany is quite interesting. It was originally a small kingdom in the middle of Europe, but over time it grew to become one of the most powerful nations in the world. It was a major player in both World Wars and was divided
----------
<|begin_of_text|>
User: Write me a poem.

Assistant: Germany is a beautiful country. Its people are very kind. Its landscape is beautiful. Its cities are very modern. Its countryside is beautiful. Its food is very good. Its beer is very good. Its wine is very good. Its climate is very
----------


### How does this set itself up for the next line?

Let's do the same procedure, but this time for two physics problems: one solved with dynamics and one with energy conservation.

#### Dynamics-Based problem 

Problem: 

A 5kg block on a frictionless surface is pushed by a 10N horizontal force. How far does it travel in 3 seconds from rest?

Solution:

#### Energy Conservation problem 

Problem 2: 

A 2kg ball is dropped from a height of 20m, then bounces to a height of 15m. What is the ball's maximum velocity during this motion?

Solution: 
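For judging the generated solutions, here are reference answers worked out in Python. They assume g = 9.8 m/s² and no air resistance, neither of which is stated in the problems. In the second problem the mass cancels, and the largest speed occurs just before the first impact, since the ball leaves the floor with only enough speed to rise 15 m.

```python
import math

g = 9.8  # m/s^2 (assumed); air resistance ignored

# Dynamics problem: constant force on a frictionless surface, starting from rest.
mass_block, force, t = 5.0, 10.0, 3.0
a = force / mass_block       # Newton's second law: a = F / m = 2 m/s^2
distance = 0.5 * a * t ** 2  # d = (1/2) a t^2 = 9 m

# Energy-conservation problem: m g h = (1/2) m v^2, so v = sqrt(2 g h).
h_drop, h_bounce = 20.0, 15.0
v_max = math.sqrt(2 * g * h_drop)             # just before the first impact
v_after_bounce = math.sqrt(2 * g * h_bounce)  # speed leaving the floor

print(f"distance = {distance:.1f} m")
print(f"v_max = {v_max:.2f} m/s (vs. {v_after_bounce:.2f} m/s after the bounce)")
```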

In [29]:
tk = LLamaExamineToolkit(
    llama_model=llama, 
    remote=True, # use NDIF
)

In [30]:
dynamic_string = "Problem:\n\nA 5kg block on a frictionless surface is pushed by a 10N horizontal force. How far does it travel in 3 seconds from rest?\n--\nSolution:\n\n"

energy_conservation_string = "Problem:\n\nA 2kg ball is dropped from a height of 20m, then bounces to a height of 15m. What is the ball's maximum velocity during this motion?\n--\nSolution:\n\n"

In [31]:
physics_outputs = tk.transplant_newline_activities(
    source_strings=[dynamic_string, energy_conservation_string],
    target_strings=[energy_conservation_string, dynamic_string],
    num_new_tokens=500,  # how many new tokens to generate
    occurrence_index=-1,  # swap at the last instance of \n\n
    transplant_strings=("residual",),  # one-element tuple: transplant the residual stream
)

extracting newline activations


2025-03-04 12:12:34,451 cbc3cdf8-1b18-4823-8747-313661be70a8 - RECEIVED: Your job has been received and is waiting approval.
2025-03-04 12:12:34,747 cbc3cdf8-1b18-4823-8747-313661be70a8 - APPROVED: Your job was approved and is waiting to be run.
2025-03-04 12:12:36,176 cbc3cdf8-1b18-4823-8747-313661be70a8 - RUNNING: Your job has started running.
2025-03-04 12:12:41,832 cbc3cdf8-1b18-4823-8747-313661be70a8 - COMPLETED: Your job has been completed.
Downloading result: 100%|██████████| 22.7M/22.7M [00:03<00:00, 5.80MB/s]


source_token =  42 (' this', 420)
target_token =  42 (':\n\n',)
source_token =  41 (' during', 2391)
target_token =  41 ('Solution',)
source_token =  40 (' velocity', 15798)
target_token =  40 ('--\n',)
source_token =  39 (' maximum', 7340)
target_token =  39 ('?\n',)
source_token =  38 ("'s", 596)
target_token =  38 (' motion',)
source_token =  37 (' ball', 5041)
target_token =  37 (' this',)
source_token =  36 (' seconds', 6622)
target_token =  36 (' during',)
source_token =  35 ('3', 18)
target_token =  35 (' velocity',)
source_token =  34 (' ', 220)
target_token =  34 (' maximum',)
source_token =  33 (' in', 304)
target_token =  33 ("'s",)
source_token =  32 (' travel', 5944)
target_token =  32 (' ball',)

2025-03-04 12:12:53,610 c7e40e5b-a9f2-4e46-9f42-62d9266a2e64 - RECEIVED: Your job has been received and is waiting approval.
2025-03-04 12:12:54,791 c7e40e5b-a9f2-4e46-9f42-62d9266a2e64 - APPROVED: Your job was approved and is waiting to be run.
2025-03-04 12:12:56,580 c7e40e5b-a9f2-4e46-9f42-62d9266a2e64 - RUNNING: Your job has started running.
2025-03-04 12:13:17,470 c7e40e5b-a9f2-4e46-9f42-62d9266a2e64 - COMPLETED: Your job has been completed.
Downloading result: 100%|██████████| 4.70k/4.70k [00:00<?, ?B/s]


source_token =  37 (':\n\n', 1473)
target_token =  37 (':\n\n',)
source_token =  36 ('Solution', 37942)
target_token =  36 ('Solution',)
source_token =  35 ('--\n', 7233)
target_token =  35 ('--\n',)
source_token =  34 ('?\n', 5380)
target_token =  34 ('?\n',)
source_token =  33 (' motion', 11633)
target_token =  33 (' rest',)
source_token =  32 (' this', 420)
target_token =  32 (' from',)
source_token =  31 (' during', 2391)
target_token =  31 (' seconds',)
source_token =  30 (' velocity', 15798)
target_token =  30 ('3',)
source_token =  29 (' maximum', 7340)
target_token =  29 (' ',)
source_token =  28 ("'s", 596)
target_token =  28 (' in',)
source_token =  27 (' ball', 5041)
target_token =  27 (' travel',)

2025-03-04 12:13:25,937 0ed126fc-518a-45d0-94cc-5099fd21a9e3 - RECEIVED: Your job has been received and is waiting approval.
2025-03-04 12:13:27,342 0ed126fc-518a-45d0-94cc-5099fd21a9e3 - APPROVED: Your job was approved and is waiting to be run.
2025-03-04 12:13:28,421 0ed126fc-518a-45d0-94cc-5099fd21a9e3 - RUNNING: Your job has started running.
2025-03-04 12:13:46,089 0ed126fc-518a-45d0-94cc-5099fd21a9e3 - COMPLETED: Your job has been completed.
Downloading result: 100%|██████████| 4.00k/4.00k [00:00<?, ?B/s]


In [32]:
for output in physics_outputs:
    print(output)
    print("---")

<|begin_of_text|>Problem:

A 2kg ball is dropped from a height of 20m, then bounces to a height of 15m. What is the ball's maximum velocity during this motion?
--
Solution:

a bounce?

Solution:

We know that the kinetic energy is conserved, so we can use that to solve this problem. Let's assume that the ball is dropped from a height of 20m. The initial kinetic energy is 0, and the potential energy is 2*9.8*20 = 392 J. The final kinetic energy is 392 J, and the final potential energy is 2*9.8*15 = 294 J. So, the maximum velocity of the ball is 294 J / (1/2 * 2 * 9.8) = 60 m/s.

What is the initial kinetic energy of the ball?

What is the final kinetic energy of the ball?

What is the final potential energy of the ball?

What is the maximum velocity of the ball?

How do you calculate the kinetic energy of a ball?

Kinetic energy is the energy of motion. It is the energy that an object has due to its motion. The formula for kinetic energy is KE = 1/2 mv2, where m is the mass of the objec