# GPT-2 KV Cache Experiments (gpt2_optim)

This notebook builds and runs the KV cache experiments under `gpt2_optim/`.
It covers three things: correctness validation, speed comparison, and profiling.

Assumptions:
- CUDA is available (Colab GPU runtime).
- You have access to `gpt2_124M.bin`, `gpt2_124M_bf16.bin` (the checkpoint passed to the inference cell below), and `gpt2_tokenizer.bin`, all downloaded by the `llm.c` starter pack script.


In [ ]:
!rm -rf llm.c
!git clone https://github.com/karpathy/llm.c.git


In [ ]:
!cd llm.c && chmod u+x dev/download_starter_pack.sh && ./dev/download_starter_pack.sh
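
Before building, it can help to confirm that the download actually produced the files the later cells rely on. The following is a minimal sketch, not part of `gpt2_optim`: the helper name `missing_files` and the constant `REQUIRED_FILES` are hypothetical, and the file list is assumed from the names mentioned in this notebook (the two from the assumptions list plus the BF16 checkpoint passed to the inference binary).

```python
from pathlib import Path

# Assumed file names: taken from the assumptions list and from the -e / -tk
# arguments of the inference cell. Adjust if your setup differs.
REQUIRED_FILES = ("gpt2_124M.bin", "gpt2_124M_bf16.bin", "gpt2_tokenizer.bin")


def missing_files(root, names=REQUIRED_FILES):
    """Return the names that do not exist as regular files under `root`."""
    root = Path(root)
    return [name for name in names if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_files("llm.c")
    if missing:
        print("Missing from llm.c/:", ", ".join(missing))
    else:
        print("All starter pack files found.")
```

If anything is reported missing, rerunning the download cell is usually the first thing to try.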


In [ ]:
# If running inside this repo, skip this copy step.
# Otherwise, upload or copy gpt2_optim into the current workspace.


In [ ]:
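# GPU_COMPUTE_CAPABILITY=75 corresponds to compute capability 7.5 (Turing, e.g. the T4); change it for other GPUs.
!cd gpt2_optim && make all GPU_COMPUTE_CAPABILITY=75 PRECISION=BF16 LLM_C_ROOT=../llm.c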
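
The build cell hardcodes `GPU_COMPUTE_CAPABILITY=75`. If the runtime has a different GPU, the value can be derived from `nvidia-smi` instead. This is a sketch under assumptions: it relies on the installed driver supporting the `compute_cap` query field, and the helper names `make_flag` and `query_compute_cap` are hypothetical, not part of `gpt2_optim` or `llm.c`.

```python
import subprocess


def make_flag(compute_cap):
    """Turn a compute capability such as '7.5' into the Makefile-style value '75'."""
    major, minor = compute_cap.strip().split(".")
    return f"{int(major)}{int(minor)}"


def query_compute_cap():
    """Return the first GPU's compute capability string, or None if unavailable.

    Assumes the driver's nvidia-smi supports the `compute_cap` query field.
    """
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    lines = result.stdout.strip().splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    cap = query_compute_cap()
    if cap is None:
        print("No GPU detected (or nvidia-smi is unavailable).")
    else:
        print("GPU_COMPUTE_CAPABILITY=" + make_flag(cap))
```

The printed value can then be substituted into the `make` command in place of `75`.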


In [ ]:
!cd gpt2_optim && ./bin/inference_gpt2optimcu \
  -e ../llm.c/gpt2_124M_bf16.bin \
  -tk ../llm.c/gpt2_tokenizer.bin \
  -g 64 -b 4 -m 0


In [ ]:
!cd gpt2_optim && ./bin/validate_kvcache_optimization -g 128 -b 2
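
To illustrate what a KV cache correctness check compares, here is a toy single-head attention decode in pure Python. It is only a conceptual sketch and does not reflect how `validate_kvcache_optimization` is implemented; every name in it (`decode_without_cache`, `decode_with_cache`, `run_check`, and so on) is hypothetical. One path recomputes all keys and values at every step, the other appends only the newest token's key and value to a cache, and the outputs are compared.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def attend(q, keys, values):
    """Softmax attention of one query over the given keys and values."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(dim)]


def decode_without_cache(xs, Wq, Wk, Wv):
    """Recompute every key and value from scratch at each step."""
    outputs = []
    for t in range(len(xs)):
        keys = [matvec(Wk, x) for x in xs[: t + 1]]
        values = [matvec(Wv, x) for x in xs[: t + 1]]
        outputs.append(attend(matvec(Wq, xs[t]), keys, values))
    return outputs


def decode_with_cache(xs, Wq, Wk, Wv):
    """Project only the newest token and append it to the cache."""
    k_cache, v_cache, outputs = [], [], []
    for x in xs:
        k_cache.append(matvec(Wk, x))
        v_cache.append(matvec(Wv, x))
        outputs.append(attend(matvec(Wq, x), k_cache, v_cache))
    return outputs


def max_abs_diff(a, b):
    """Largest elementwise absolute difference between two lists of vectors."""
    return max(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def run_check(num_tokens=8, dim=4, seed=0):
    """Compare both decode paths on random inputs and return the max difference."""
    rng = random.Random(seed)

    def rand_matrix(rows, cols):
        return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

    xs = rand_matrix(num_tokens, dim)
    Wq, Wk, Wv = rand_matrix(dim, dim), rand_matrix(dim, dim), rand_matrix(dim, dim)
    return max_abs_diff(decode_without_cache(xs, Wq, Wk, Wv),
                        decode_with_cache(xs, Wq, Wk, Wv))


if __name__ == "__main__":
    print("max abs difference:", run_check())
```

In this toy both paths perform the same arithmetic in the same order, so they should agree closely. On a GPU, a cached and an uncached path may use different kernels and reduced precision such as BF16, so a check of this kind typically compares against a tolerance rather than requiring exact equality.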


In [ ]:
# Install Nsight Systems (nsys)
!wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/nsight-systems-2025.5.2_2025.5.2.266-1_amd64.deb
!apt update
# -y answers the confirmation prompts, which a notebook cell cannot do interactively.
!apt install -y ./nsight-systems-2025.5.2_2025.5.2.266-1_amd64.deb
!apt --fix-broken install -y


In [ ]:
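# Trace CUDA and NVTX, but collect only while the NVTX range named MEASURE (in any domain) is open.
!cd gpt2_optim && nsys profile -t cuda,nvtx \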
  --capture-range=nvtx --nvtx-capture='MEASURE@*' --capture-range-end=stop-shutdown \
  -o prof_kvcache \
  ./bin/profile_kvcache_optimization
