# GPT-2 KV Cache Experiments (gpt2_optim)

This notebook builds and runs the KV cache experiments in `gpt2_optim/`.
It covers three things: correctness validation, speed comparison, and profiling.

Assumptions:
- CUDA is available (Colab GPU runtime).
- You have access to `gpt2_124M.bin`, `gpt2_124M_bf16.bin` (the checkpoint the inference cell loads), and `gpt2_tokenizer.bin`, all downloaded via the `llm.c` starter pack.


In [1]:
!rm -rf llm.c
!git clone https://github.com/karpathy/llm.c.git


Cloning into 'llm.c'...
remote: Enumerating objects: 6149, done.
remote: Total 6149 (delta 0), reused 0 (delta 0), pack-reused 6149 (from 1)
Receiving objects: 100% (6149/6149), 2.25 MiB | 22.38 MiB/s, done.
Resolving deltas: 100% (3963/3963), done.


In [2]:
!cd llm.c && chmod u+x dev/download_starter_pack.sh && ./dev/download_starter_pack.sh


Downloaded tiny_shakespeare_val.bin to /content/llm.c/dev/data/tinyshakespeare/tiny_shakespeare_val.bin
Downloaded gpt2_tokenizer.bin to /content/llm.c/dev/../gpt2_tokenizer.bin
Downloaded tiny_shakespeare_train.bin to /content/llm.c/dev/data/tinyshakespeare/tiny_shakespeare_train.bin
Downloaded gpt2_124M_debug_state.bin to /content/llm.c/dev/../gpt2_124M_debug_state.bin
Downloaded gpt2_124M_bf16.bin to /content/llm.c/dev/../gpt2_124M_bf16.bin
Downloaded gpt2_124M.bin to /content/llm.c/dev/../gpt2_124M.bin
Downloaded hellaswag_val.bin to /content/llm.c/dev/data/hellaswag/hellaswag_val.bin
All files downloaded and saved in their respective directories
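
The later cells read the checkpoint and tokenizer files from the `llm.c` directory, so it may be worth confirming they are present before building. The following is a minimal sketch, not part of `gpt2_optim`: the helper name `missing_files` and the `REQUIRED_FILES` tuple are hypothetical, and the file names are taken from the download log above.

```python
from pathlib import Path

# File names as they appear in the download log; the tuple itself is a
# hypothetical convenience, not something defined by llm.c or gpt2_optim.
REQUIRED_FILES = ("gpt2_124M.bin", "gpt2_124M_bf16.bin", "gpt2_tokenizer.bin")


def missing_files(root, names=REQUIRED_FILES):
    """Return the entries of `names` that are not regular files under `root`."""
    root = Path(root)
    return [name for name in names if not (root / name).is_file()]


if __name__ == "__main__":
    # The log shows the files landing in /content/llm.c, i.e. ./llm.c in Colab.
    missing = missing_files("llm.c")
    print("missing:", missing if missing else "none")
```

An empty result suggests the starter pack downloaded completely; any listed name points at a file to re-download before running the inference cell.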


In [ ]:
!rm -rf gpt2_optim
!git clone https://github.com/agridrama/gpt2_optim.git


In [ ]:
!cd gpt2_optim && make all GPU_COMPUTE_CAPABILITY=75 PRECISION=BF16 LLM_C_ROOT=../llm.c
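
The build line hard-codes `GPU_COMPUTE_CAPABILITY=75`, which corresponds to compute capability 7.5 (for example, a T4). If the runtime assigns a different GPU, the value probably needs to change. A minimal sketch for deriving it, assuming the installed driver's `nvidia-smi` supports the `compute_cap` query field; the helper names `capability_flag` and `detect_capability_flag` are hypothetical:

```python
import shutil
import subprocess


def capability_flag(compute_cap):
    """Turn a dotted capability such as '7.5' into the digits-only form '75'."""
    major, minor = compute_cap.strip().split(".")
    return f"{int(major)}{int(minor)}"


def detect_capability_flag():
    """Query the first GPU via nvidia-smi; return None if it cannot be read."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
        capture_output=True, text=True, check=False,
    )
    lines = result.stdout.strip().splitlines()
    if result.returncode != 0 or not lines:
        return None
    try:
        return capability_flag(lines[0])
    except ValueError:
        # Older drivers may not know the compute_cap field and print an error.
        return None


if __name__ == "__main__":
    print("GPU_COMPUTE_CAPABILITY =", detect_capability_flag())
```

The printed value can then be substituted for `75` in the `make` command above.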


In [ ]:
!cd gpt2_optim && ./bin/inference_gpt2optimcu \
  -e ../llm.c/gpt2_124M_bf16.bin \
  -tk ../llm.c/gpt2_tokenizer.bin \
  -g 64 -b 4 -m 0


In [ ]:
!cd gpt2_optim && ./bin/validate_kvcache_optimization -g 128 -b 2
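
This notebook does not show how `validate_kvcache_optimization` works internally. As a conceptual illustration only, the NumPy sketch below shows the property such a check typically relies on: single-head causal attention computed token by token with cached keys and values should match the full recomputation up to floating-point error. All function names and shapes here are hypothetical and unrelated to the `gpt2_optim` code.

```python
import numpy as np


def causal_attention(q, k, v):
    """Full causal self-attention for one head; q, k, v have shape (T, d)."""
    T, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)  # True above the diagonal
    scores = np.where(mask, -np.inf, scores)          # hide future positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v


def cached_attention(q, k, v):
    """Same computation one token at a time, appending K/V rows to a cache."""
    T, d = q.shape
    k_cache = np.empty((0, d))
    v_cache = np.empty((0, d))
    outputs = []
    for t in range(T):
        k_cache = np.vstack([k_cache, k[t:t + 1]])
        v_cache = np.vstack([v_cache, v[t:t + 1]])
        scores = k_cache @ q[t] / np.sqrt(d)          # shape (t + 1,)
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        outputs.append(weights @ v_cache)
    return np.stack(outputs)


rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((8, 16)) for _ in range(3))
max_abs_diff = np.abs(causal_attention(q, k, v) - cached_attention(q, k, v)).max()
print(f"max abs diff: {max_abs_diff:.2e}")
```

In reduced precision such as BF16, a comparison like this would normally use a tolerance rather than expect exact equality.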


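In [ ]:
# Install Nsight Systems (nsys) from NVIDIA's Ubuntu 22.04 package repository.
!wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/nsight-systems-2025.5.2_2025.5.2.266-1_amd64.deb
!apt update
# -y skips the confirmation prompt so the cell runs unattended.
!apt install -y ./nsight-systems-2025.5.2_2025.5.2.266-1_amd64.deb
# Resolve any dependencies the .deb install left unsatisfied.
!apt --fix-broken install -y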


In [ ]:
!cd gpt2_optim && nsys profile -t cuda,nvtx \
  --capture-range=nvtx --nvtx-capture='MEASURE@*' --capture-range-end=stop-shutdown \
  -o prof_kvcache \
  ./bin/profile_kvcache_optimization
