# Reachability Deep Dive

Let’s see if we can discern the properties of the reachable set for k <= 10 prompt
tokens by focusing on just a few initial states x_0, but many y* for each x_0.
We are looking for a clean decision boundary: a cross-entropy (CE) loss threshold on
P(y*_i | x_0) below which y*_i is reachable with k <= 10 prompt tokens and above
which it is not.
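
To make "clean decision boundary" concrete, here is a minimal sketch of how we might look for one once results are in. It assumes each result can be reduced to a `(ce_loss, solved)` pair, where `ce_loss = -log P(y* | x_0)` and `solved` says whether a prompt of k <= 10 tokens was found. The function names and the record format are hypothetical, not something the existing scripts provide.

```python
import math


def cross_entropy(prob_of_target: float) -> float:
    """CE loss (in nats) for a single target token: -log P(y* | x_0)."""
    return -math.log(prob_of_target)


def best_threshold(records):
    """Find the loss threshold that best separates solved from unsolved.

    records: non-empty list of (ce_loss, solved) pairs.
    Returns (threshold, accuracy) for the rule "solved iff ce_loss <= threshold".
    """
    if not records:
        raise ValueError("records must be non-empty")
    losses = sorted({loss for loss, _ in records})
    # Candidate thresholds: below every loss, between neighbours, above every loss.
    candidates = [losses[0] - 1.0]
    candidates += [(a + b) / 2 for a, b in zip(losses, losses[1:])]
    candidates.append(losses[-1] + 1.0)

    best = None
    for t in candidates:
        correct = sum((loss <= t) == solved for loss, solved in records)
        accuracy = correct / len(records)
        if best is None or accuracy > best[1]:
            best = (t, accuracy)
    return best


# Toy data (made up for illustration, not experimental results).
toy = [(1.0, True), (2.0, True), (5.0, False), (7.0, False)]
threshold, accuracy = best_threshold(toy)
```

An accuracy of 1.0 would mean the boundary is perfectly clean for that x_0; anything lower tells us how many y* values fall on the wrong side of the best single threshold.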

**Creating the dataset**
 - [ ] Extract 20 random x_0 from wiki5k. Based on some skip_value, we will select
   every token of rank i where i % skip_value = 0 as one of the y* values
   corresponding to x_0.
    - [ ] We want 5000 total instances to distribute over the A100s (probably 8
    of 64 total). So let’s use 5000/20 = 250 y* values for each x_0.
    - [ ] Since there are 65,536 tokens for Falcon-7b, we would need a
    skip_value satisfying 65536 / skip_value = 250, which implies skip_value =
    65536/250 = 262.144. We round this up to skip_value = 265, which gives
    about 247 y* values per x_0, slightly under the 5000-instance target.
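
The arithmetic above can be checked with a few lines of Python. This is only a sanity check of the numbers in these notes; it assumes ranks are 0-indexed (rank 0 is the most likely token), which is a guess about the convention rather than something confirmed by `generate_deep_dive.py`.

```python
VOCAB_SIZE = 65_536        # token count quoted in the notes above
NUM_STATES = 20            # number of x_0 values
TARGET_INSTANCES = 5_000   # total (x_0, y*) pairs we are aiming for
SKIP = 265                 # skip_value actually used


def ranks_for_skip(vocab_size: int, skip: int) -> list[int]:
    """All 0-indexed ranks i with i % skip == 0."""
    return list(range(0, vocab_size, skip))


per_state_target = TARGET_INSTANCES // NUM_STATES   # y* values wanted per x_0
exact_skip = VOCAB_SIZE / per_state_target          # skip that would hit the target
ranks = ranks_for_skip(VOCAB_SIZE, SKIP)
total_instances = len(ranks) * NUM_STATES

print(f"target per x_0: {per_state_target}, exact skip: {exact_skip:.3f}")
print(f"skip={SKIP} gives {len(ranks)} ranks per x_0, {total_instances} total")
```

Under the 0-indexed assumption this yields 248 ranks per x_0 (4960 instances); with 1-indexed ranks it would be 247 per x_0.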

```bash
python3 scripts/generate_deep_dive.py \
    --input_file results/wiki_reachability/k10_falcon7b_wiki5k.csv \
    --output_file results/deep_dive/falcon7b_skip265_states20.csv \
    --model falcon-7b \
    --skip 265 \
    --num_unique_states 20
```
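
For reference, the rank-based selection rule described above could be sketched as follows. This is an illustration of the rule, not the contents of `generate_deep_dive.py`: the function name is hypothetical, it assumes tokens are ranked by descending P(y | x_0) with rank 0 being the most likely token, and it takes a plain list of probabilities rather than model output.

```python
def select_targets_by_rank(probs, skip):
    """Pick every skip-th token when tokens are ordered by descending probability.

    probs: next-token probabilities indexed by token id.
    Returns a list of (rank, token_id, prob) with rank % skip == 0.
    """
    # Token ids sorted from most to least likely; ties keep token-id order.
    order = sorted(range(len(probs)), key=lambda tok: probs[tok], reverse=True)
    return [
        (rank, tok, probs[tok])
        for rank, tok in enumerate(order)
        if rank % skip == 0
    ]


# Toy 4-token vocabulary (made-up numbers).
toy_probs = [0.1, 0.4, 0.2, 0.3]
selected = select_targets_by_rank(toy_probs, skip=2)
```

Sampling across the whole rank range this way gives y* values that span from very likely to very unlikely under P(y | x_0), which is what we need to see where the CE-loss boundary falls.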

**Controllability Experiments**

We are now ready to run the `scripts/reachability.py` script on this dataset.
I'll make a `scripts/deep_dive_falcon7b.sh` wrapper script so we can run it more
easily, with an automated naming convention for the chunks run by sub-workers.
Let's aim for running workers 0-7 of 64, so we would be solving 1/8 of the
dataset, which will hopefully be done overnight. It's really important that I
can check on this first thing in the morning to triage the remaining experiments.
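
To get a feel for how much work each worker takes on, here is a rough sketch of one way to split the dataset into 64 contiguous chunks. The chunking rule and the function name are assumptions for illustration; the wrapper script and `reachability.py` may divide rows differently. The row count uses the 5000-instance target from above.

```python
def chunk_bounds(num_rows: int, worker_id: int, num_workers: int) -> tuple[int, int]:
    """Half-open [start, end) row range for one worker.

    Chunks are contiguous and their sizes differ by at most one row.
    """
    if not 0 <= worker_id < num_workers:
        raise ValueError("worker_id must be in [0, num_workers)")
    base, extra = divmod(num_rows, num_workers)
    # The first `extra` workers each take one additional row.
    start = worker_id * base + min(worker_id, extra)
    end = start + base + (1 if worker_id < extra else 0)
    return start, end


NUM_ROWS = 5_000      # target dataset size from the notes above
NUM_WORKERS = 64

first_eight = [chunk_bounds(NUM_ROWS, w, NUM_WORKERS) for w in range(8)]
rows_covered = sum(end - start for start, end in first_eight)
print(f"workers 0-7 cover {rows_covered} of {NUM_ROWS} rows")
```

With this split, workers 0-7 cover roughly one eighth of the rows, matching the overnight goal.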

```bash
# Arguments: <worker index> <total workers>
# Workers 0-3, one per GPU
CUDA_VISIBLE_DEVICES=0 bash scripts/deep_dive_falcon7b.sh 0 64  # ran at ~4:30a on Wed Nov 15
CUDA_VISIBLE_DEVICES=1 bash scripts/deep_dive_falcon7b.sh 1 64
CUDA_VISIBLE_DEVICES=2 bash scripts/deep_dive_falcon7b.sh 2 64
CUDA_VISIBLE_DEVICES=3 bash scripts/deep_dive_falcon7b.sh 3 64

# Workers 4-7, on the same four GPUs
CUDA_VISIBLE_DEVICES=0 bash scripts/deep_dive_falcon7b.sh 4 64  # ran at ~4:30a on Wed Nov 15
CUDA_VISIBLE_DEVICES=1 bash scripts/deep_dive_falcon7b.sh 5 64
CUDA_VISIBLE_DEVICES=2 bash scripts/deep_dive_falcon7b.sh 6 64
CUDA_VISIBLE_DEVICES=3 bash scripts/deep_dive_falcon7b.sh 7 64
```
