# Semantic Trajectory Notebook (Robust Extraction)
This notebook replicates your working script: it extracts contextual trajectories (one hidden-state vector per BERT layer) for the token 'run', locating the token by substring matching and printing each sentence's tokenization as a diagnostic.

## 1. Contextual Sentences

In [1]:
sentences = [
    "She likes to run early in the morning before work.",
    "The software failed to run on the outdated system.",
    "He decided to run for office in the next election.",
    "The faucet was left on and water began to run.",
    "The marathon runner collapsed after a grueling 26-mile run.",
    "They run a small family-owned bakery downtown.",
    "The rumor began to run through the school like wildfire.",
    "The machine will run continuously for 48 hours.",
    "He was caught in a run of bad luck at the poker table.",
    "They run tests on the samples before publishing results.",
    "Run the idea by me one more time.",
    "Let the engine run for five minutes before starting.",
    "Run for your life!",
    "She decided to run the experiment again with new variables.",
    "They’re running a sale on electronics this weekend.",
    "She’s running for president of the council.",
    "The manager ran the meeting efficiently.",
    "He used to run track in college.",
    "They run a family-owned bakery downtown.",
    "He decided to run for office in the next election.",
    "Please run this report by end of day.",
    "Run the installer before rebooting."
]
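The list has 22 entries, but "He decided to run for office in the next election." appears twice, so it would contribute the same trajectory twice. As a minimal sketch, duplicates can be flagged before extraction. The `find_duplicates` helper is hypothetical and not part of the original script, and the short `sample` list stands in for `sentences` so the snippet runs on its own.

```python
from collections import Counter


def find_duplicates(items):
    """Return the items that occur more than once, in first-seen order."""
    counts = Counter(items)
    return [item for item in counts if counts[item] > 1]


# Stand-in for the notebook's `sentences` list (entries copied from it).
sample = [
    "He decided to run for office in the next election.",
    "They run a small family-owned bakery downtown.",
    "He decided to run for office in the next election.",
]

duplicates = find_duplicates(sample)
print(duplicates)
```

Whether to drop such repeats depends on the analysis; keeping them weights that context twice.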

## 2. Extract BERT Trajectories

In [2]:
from transformers import BertTokenizer, BertModel
import numpy as np
import torch
from tqdm import tqdm

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

trajectories = []
valid_sentences = []

print("🔍 Matching token: 'run'")

for sent in tqdm(sentences):
    inputs = tokenizer(sent, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    input_ids = inputs["input_ids"][0]
    tokens = tokenizer.convert_ids_to_tokens(input_ids)

    print(f"🧾 Sentence: {sent}")
    print(f"🔤 Tokens: {tokens}")

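    # Substring match on lowercased tokens: catches 'run' and 'running',
    # but also any other token containing 'run' (e.g. 'runner').
    match_indices = [i for i, tok in enumerate(tokens) if "run" in tok.lower()]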
    if not match_indices:
        print("⚠️ No 'run' token found — skipping.")
        continue

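    # Prefer an exact 'run' token; otherwise fall back to the first substring match.
    # Without this, "marathon runner ... 26-mile run" would select 'runner'.
    exact = [i for i in match_indices if tokens[i] == "run"]
    idx = exact[0] if exact else match_indices[0]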
    layers = outputs.hidden_states  # tuple of 13 tensors: embedding output + 12 encoder layers
    curve = torch.stack([layer[0, idx] for layer in layers])  # shape: (13, 768)
    curve = curve[-12:]  # drop the embedding output; keep the 12 encoder layers

    if curve.shape == (12, 768):
        trajectories.append(curve.numpy())
        valid_sentences.append(sent)
    else:
        print("❌ Unexpected shape — skipping.")

print(f"\n✅ Extracted {len(trajectories)} valid trajectories.")

🔍 Matching token: 'run'


🧾 Sentence: She likes to run early in the morning before work.
🔤 Tokens: ['[CLS]', 'she', 'likes', 'to', 'run', 'early', 'in', 'the', 'morning', 'before', 'work', '.', '[SEP]']


🧾 Sentence: The software failed to run on the outdated system.
🔤 Tokens: ['[CLS]', 'the', 'software', 'failed', 'to', 'run', 'on', 'the', 'outdated', 'system', '.', '[SEP]']


🧾 Sentence: He decided to run for office in the next election.
🔤 Tokens: ['[CLS]', 'he', 'decided', 'to', 'run', 'for', 'office', 'in', 'the', 'next', 'election', '.', '[SEP]']


🧾 Sentence: The faucet was left on and water began to run.
🔤 Tokens: ['[CLS]', 'the', 'fa', '##uce', '##t', 'was', 'left', 'on', 'and', 'water', 'began', 'to', 'run', '.', '[SEP]']


🧾 Sentence: The marathon runner collapsed after a grueling 26-mile run.
🔤 Tokens: ['[CLS]', 'the', 'marathon', 'runner', 'collapsed', 'after', 'a', 'gr', '##uel', '##ing', '26', '-', 'mile', 'run', '.', '[SEP]']


🧾 Sentence: They run a small family-owned bakery downtown.
🔤 Tokens: ['[CLS]', 'they', 'run', 'a', 'small', 'family', '-', 'owned', 'bakery', 'downtown', '.', '[SEP]']
🧾 Sentence: The rumor began to run through the school like wildfire.
🔤 Tokens: ['[CLS]', 'the', 'rumor', 'began', 'to', 'run', 'through', 'the', 'school', 'like', 'wild', '##fire', '.', '[SEP]']


🧾 Sentence: The machine will run continuously for 48 hours.
🔤 Tokens: ['[CLS]', 'the', 'machine', 'will', 'run', 'continuously', 'for', '48', 'hours', '.', '[SEP]']


🧾 Sentence: He was caught in a run of bad luck at the poker table.
🔤 Tokens: ['[CLS]', 'he', 'was', 'caught', 'in', 'a', 'run', 'of', 'bad', 'luck', 'at', 'the', 'poker', 'table', '.', '[SEP]']


🧾 Sentence: They run tests on the samples before publishing results.
🔤 Tokens: ['[CLS]', 'they', 'run', 'tests', 'on', 'the', 'samples', 'before', 'publishing', 'results', '.', '[SEP]']


🧾 Sentence: Run the idea by me one more time.
🔤 Tokens: ['[CLS]', 'run', 'the', 'idea', 'by', 'me', 'one', 'more', 'time', '.', '[SEP]']
🧾 Sentence: Let the engine run for five minutes before starting.
🔤 Tokens: ['[CLS]', 'let', 'the', 'engine', 'run', 'for', 'five', 'minutes', 'before', 'starting', '.', '[SEP]']
🧾 Sentence: Run for your life!
🔤 Tokens: ['[CLS]', 'run', 'for', 'your', 'life', '!', '[SEP]']


🧾 Sentence: She decided to run the experiment again with new variables.
🔤 Tokens: ['[CLS]', 'she', 'decided', 'to', 'run', 'the', 'experiment', 'again', 'with', 'new', 'variables', '.', '[SEP]']
🧾 Sentence: They’re running a sale on electronics this weekend.
🔤 Tokens: ['[CLS]', 'they', '’', 're', 'running', 'a', 'sale', 'on', 'electronics', 'this', 'weekend', '.', '[SEP]']


🧾 Sentence: She’s running for president of the council.
🔤 Tokens: ['[CLS]', 'she', '’', 's', 'running', 'for', 'president', 'of', 'the', 'council', '.', '[SEP]']


🧾 Sentence: The manager ran the meeting efficiently.
🔤 Tokens: ['[CLS]', 'the', 'manager', 'ran', 'the', 'meeting', 'efficiently', '.', '[SEP]']
⚠️ No 'run' token found — skipping.
🧾 Sentence: He used to run track in college.
🔤 Tokens: ['[CLS]', 'he', 'used', 'to', 'run', 'track', 'in', 'college', '.', '[SEP]']


🧾 Sentence: They run a family-owned bakery downtown.
🔤 Tokens: ['[CLS]', 'they', 'run', 'a', 'family', '-', 'owned', 'bakery', 'downtown', '.', '[SEP]']
🧾 Sentence: He decided to run for office in the next election.
🔤 Tokens: ['[CLS]', 'he', 'decided', 'to', 'run', 'for', 'office', 'in', 'the', 'next', 'election', '.', '[SEP]']


🧾 Sentence: Please run this report by end of day.
🔤 Tokens: ['[CLS]', 'please', 'run', 'this', 'report', 'by', 'end', 'of', 'day', '.', '[SEP]']


🧾 Sentence: Run the installer before rebooting.
🔤 Tokens: ['[CLS]', 'run', 'the', 'install', '##er', 'before', 're', '##boot', '##ing', '.', '[SEP]']

✅ Extracted 21 valid trajectories.
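The substring test `"run" in tok.lower()` accepts any token that contains those letters, such as 'runner' in the fifth sentence. A stricter variant, sketched below, compares whole tokens against an explicit list of forms. The names `TARGET_FORMS` and `find_target_index` are hypothetical, and the choice of forms ("run", "running") is an assumption made so that the sketch agrees with the output above, where 'ran' is skipped.

```python
# Assumed surface forms to accept, in priority order; extend as needed (e.g. "ran", "runs").
TARGET_FORMS = ("run", "running")


def find_target_index(tokens, forms=TARGET_FORMS):
    """Return the index of the first token equal to one of `forms`.

    Forms are tried in order, so an exact 'run' wins over 'running'.
    Returns None when no token matches.
    """
    for form in forms:
        for i, tok in enumerate(tokens):
            if tok.lower() == form:
                return i
    return None


# Tokenization copied from the diagnostics printed above.
tokens = ['[CLS]', 'the', 'marathon', 'runner', 'collapsed', 'after', 'a',
          'gr', '##uel', '##ing', '26', '-', 'mile', 'run', '.', '[SEP]']
print(find_target_index(tokens))  # index of 'run', not of 'runner'
```

Whole-token comparison trades recall for precision: it ignores WordPiece fragments and derived words, so any form that should count has to be listed explicitly.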





## 3. Summary

In [3]:
print(f'Total valid trajectories: {len(trajectories)}')

Total valid trajectories: 21
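
The count alone does not say which sentence was dropped. A small sketch, assuming `sentences` and `valid_sentences` as built in the extraction cell, lists the skipped sentences; short stand-in lists are used here so the snippet runs on its own.

```python
# Stand-ins for the notebook's `sentences` and `valid_sentences` lists.
sentences = [
    "She likes to run early in the morning before work.",
    "The manager ran the meeting efficiently.",
    "He used to run track in college.",
]
valid_sentences = [
    "She likes to run early in the morning before work.",
    "He used to run track in college.",
]

# Sentences for which no matching token was found.
valid = set(valid_sentences)
skipped = [s for s in sentences if s not in valid]

print(f"Skipped {len(skipped)} of {len(sentences)} sentences:")
for s in skipped:
    print(f"  - {s}")
```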
