I'll use Lambda Cloud. Assuming I can create a machine in us-south-2, my storage is already there, so I won't need to download data files, scp the tokenizer and model, etc. Here are the instructions from `challenge-25-pretrain-d20/trying-lambda-cloud.ipynb`, minus the steps I won't need.

```
ssh ubuntu@[ip]

# ssh key for git
ssh-keygen -t ed25519 -C "lambda-cloud"
cat ~/.ssh/id_ed25519.pub
# copy into github UI (https://github.com/settings/keys)

git config --global user.email "ericsilberstein@gmail.com"
git config --global user.name "Eric Silberstein"

# clone this repo
git clone git@github.com:ericsilberstein1/nanogpt-learning.git

# UV
curl -LsSf https://astral.sh/uv/install.sh | sh

# rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
echo '. "$HOME/.cargo/env"' >> ~/.bashrc

echo 'export NANOCHAT_BASE_DIR="/home/ubuntu/mynanochat"' >> ~/.bashrc

# in ~/.bashrc add
# export WANDB_API_KEY="XXX"

source ~/.bashrc

cd nanogpt-learning

uv sync
source .venv/bin/activate

# for now, until I organize this better
uv tool install maturin
cd challenge-07-rust-and-python-simplified-tokenizer/rust_tokenizer
maturin develop
cd -

# looks like lambda automatically runs jupyter but for now at least let me run it
# in the way I understand
uv run jupyter lab --port=7001
# jupyter lab stays in the foreground, so run this from a second shell
jupyter server list

# ON MY LAPTOP make a tunnel to jupyter
ssh -N -L 7001:localhost:7001 ubuntu@[ip]
```
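Before opening the browser, it can help to confirm that the tunnel is actually forwarding. A minimal sketch, assuming the tunnel above is running on the laptop and using port 7001 from the commands above; `port_is_open` is a hypothetical helper name, not part of any tool used here:

```python
import socket

def port_is_open(host="localhost", port=7001, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable
        return False

# With the ssh tunnel running this should print True
print(port_is_open())
```

A True result only shows that something is listening on the local port. It does not check that Jupyter is the process answering on the remote side.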

If my calculation in challenge 26 is right, training will be around 10 minutes, so I'll just run everything from this notebook.

In [1]:
import os
os.environ["PYTHONPATH"] = "../my_nanochat"
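
Child processes inherit `os.environ`, so commands launched from the notebook should see this value. A quick check using only the standard library (a sketch, not something from the original notebook):

```python
import os
import subprocess
import sys

os.environ["PYTHONPATH"] = "../my_nanochat"

# Spawn a child Python process and have it report the PYTHONPATH it inherited
result = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ.get('PYTHONPATH'))"],
    capture_output=True,
    text=True,
    check=True,
)
inherited = result.stdout.strip()
print(inherited)
```

Because the path is relative, it is resolved against the working directory the command starts in, so this only works when the notebook runs from the challenge directory.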

First do a CORE evaluation. This is a sanity check. It should match the final eval from the training in challenge 24.

```
Step 21400: CORE metric: 0.2084
```

In [None]:
!torchrun --standalone --nproc_per_node=8 -m scripts.my_base_eval -- --source=base --model-tag=d20
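
If the eval output is saved to a file, the comparison can be automated. A minimal sketch, assuming the eval prints a line containing `CORE metric: <number>` like the training log line quoted above (the eval script's actual output format may differ). The helper names and the tolerance are hypothetical:

```python
import re

EXPECTED_CORE = 0.2084  # final CORE metric from the challenge 24 training log

def find_core_metric(text):
    """Return the last 'CORE metric: <number>' value in text, or None if absent."""
    matches = re.findall(r"CORE metric:\s*([0-9]*\.?[0-9]+)", text)
    return float(matches[-1]) if matches else None

def matches_expected(value, expected=EXPECTED_CORE, tol=5e-4):
    """True if value agrees with expected to within tol (an assumed tolerance)."""
    return value is not None and abs(value - expected) <= tol

# Try it on the line from the training log
sample = "Step 21400: CORE metric: 0.2084"
value = find_core_metric(sample)
print(value, matches_expected(value))
```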

Do the midtraining

In [None]:
!torchrun --standalone --nproc_per_node=8 -m scripts.my_mid_train -- --model_tag=d20 --run=challenge-28-1

Chat eval

In [None]:
!torchrun --standalone --nproc_per_node=8 -m scripts.my_chat_eval -- --source=mid --model-tag=d20

Also run limited chat evals on base and mid

In [None]:
!torchrun --standalone --nproc_per_node=8 -m scripts.my_chat_eval -- --source=base --model-tag=d20 --max-problems=100

In [None]:
!torchrun --standalone --nproc_per_node=8 -m scripts.my_chat_eval -- --source=mid --model-tag=d20 --max-problems=100

Backup, in case I decide to run in a tmux shell:

```
source .venv/bin/activate

cd challenge-28-midtrain-d20

export PYTHONPATH=../my_nanochat/

# sanity check, should match what we saw in training output of challenge 24
# Step 21400: CORE metric: 0.2084
torchrun --standalone --nproc_per_node=8 -m scripts.my_base_eval -- --source=base --model-tag=d20 > output_001.txt 2>&1

# just to see
torchrun --standalone --nproc_per_node=8 -m scripts.my_chat_eval -- --source=base --model-tag=d20 --max-problems=100 > output_002.txt 2>&1

# train
torchrun --standalone --nproc_per_node=8 -m scripts.my_mid_train -- --model_tag=d20 --run=challenge-28-1 > output_003.txt 2>&1

# repeat of 100 problem chat eval from above
torchrun --standalone --nproc_per_node=8 -m scripts.my_chat_eval -- --source=mid --model-tag=d20 --max-problems=100 > output_004.txt 2>&1

# full chat eval
torchrun --standalone --nproc_per_node=8 -m scripts.my_chat_eval -- --source=mid --model-tag=d20 > output_005.txt 2>&1

```
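
The same sequence could also be driven from a small Python script that stops at the first failing step, so a broken eval doesn't silently lead into training. This is a sketch, not what was actually run: `build_command` and `run_all` are hypothetical helpers, and the arguments are copied from the commands above. As written, the block only prints the commands (a dry run).

```python
import subprocess

TORCHRUN = ["torchrun", "--standalone", "--nproc_per_node=8", "-m"]

# (module, script arguments), in the same order as the shell commands above
STEPS = [
    ("scripts.my_base_eval", ["--source=base", "--model-tag=d20"]),
    ("scripts.my_chat_eval", ["--source=base", "--model-tag=d20", "--max-problems=100"]),
    ("scripts.my_mid_train", ["--model_tag=d20", "--run=challenge-28-1"]),
    ("scripts.my_chat_eval", ["--source=mid", "--model-tag=d20", "--max-problems=100"]),
    ("scripts.my_chat_eval", ["--source=mid", "--model-tag=d20"]),
]

def build_command(module, args):
    """Assemble the full torchrun argument list for one step."""
    return TORCHRUN + [module, "--"] + args

def run_all(steps=STEPS):
    """Run each step in order, logging to output_NNN.txt; stop on the first failure."""
    for i, (module, args) in enumerate(steps, start=1):
        log_path = f"output_{i:03d}.txt"
        with open(log_path, "w") as log:
            # stderr is merged into stdout, like `> file 2>&1`
            result = subprocess.run(
                build_command(module, args), stdout=log, stderr=subprocess.STDOUT
            )
        if result.returncode != 0:
            print(f"step {i} failed (exit {result.returncode}); see {log_path}")
            return False
    return True

# Dry run: print the commands without executing them
for module, args in STEPS:
    print(" ".join(build_command(module, args)))
```

To actually run the steps, call `run_all()` from the challenge directory with the venv active and `PYTHONPATH` exported as above. The child processes inherit that environment.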