# GPT-2 KV Cache Experiments (gpt2_optim)

This notebook builds and runs the KV cache experiments under `gpt2_optim/`.
It focuses on correctness validation, speed comparison, and profiling.

Assumptions:
- CUDA is available (Colab GPU runtime).
- You have access to `gpt2_124M_bf16.bin` (the BF16 checkpoint that the commands in this notebook load) and `gpt2_tokenizer.bin`, both downloaded via the `llm.c` starter pack.


## Setup
Clone the `llm.c` and `gpt2_optim` repositories, download the `llm.c` starter pack, and build the project.

In [None]:
!nvidia-smi
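
The build cell in this section passes `GPU_COMPUTE_CAPABILITY=75`, which is compute capability 7.5 written without the dot. As a rough sketch, the value for the attached GPU could be derived as follows. The helper names `to_make_value` and `detect_compute_capability` are hypothetical, and the sketch assumes the installed `nvidia-smi` supports the `compute_cap` query field.

```python
import shutil
import subprocess


def to_make_value(compute_cap: str) -> str:
    """Convert a compute capability such as '7.5' to the dotless form '75'."""
    major, minor = compute_cap.strip().split(".")
    return f"{int(major)}{int(minor)}"


def detect_compute_capability():
    """Return e.g. '75' for the first visible GPU, or None if it cannot be determined."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        # Assumption: this nvidia-smi version supports the compute_cap field.
        ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    lines = result.stdout.strip().splitlines()
    if result.returncode != 0 or not lines:
        return None
    try:
        return to_make_value(lines[0])
    except ValueError:
        # Output was not in the expected 'major.minor' form.
        return None


print(detect_compute_capability())
```

If the printed value differs from `75`, the `make` invocation would need the matching value.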

In [None]:
!rm -rf llm.c
!git clone https://github.com/karpathy/llm.c.git


In [None]:
!cd llm.c && chmod u+x dev/download_starter_pack.sh && ./dev/download_starter_pack.sh


In [None]:
!rm -rf gpt2_optim
!git clone https://github.com/agridrama/gpt2_optim.git


In [None]:
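# GPU_COMPUTE_CAPABILITY=75 targets compute capability 7.5 (e.g. a T4); change it to match your GPU
!cd gpt2_optim && make all GPU_COMPUTE_CAPABILITY=75 PRECISION=BF16 LLM_C_ROOT=../llm.c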


## Inference with KV Cache Optimization
Command-line arguments:
- `-e`: specify model path (example: `../llm.c/gpt2_124M_bf16.bin`)
- `-tk`: specify tokenizer path (example: `../llm.c/gpt2_tokenizer.bin`)
- `-g`: specify number of tokens to generate (example: `64`)
- `-b`: specify batch size (example: `4`)
- `-m`: specify sampling method (example: `0` = random sampling, `1` = greedy sampling)
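
To make the flag list above concrete, here is a minimal sketch that assembles an argument list from Python values. The helper `build_inference_args` is hypothetical and not part of `gpt2_optim`. It covers only the five flags documented above, so the `-q` flag used in the speed comparison cells is left out.

```python
import shlex


def build_inference_args(model, tokenizer, gen_tokens=64, batch_size=4, sampling=0):
    """Return the argument list for the documented flags (-e, -tk, -g, -b, -m)."""
    if sampling not in (0, 1):
        raise ValueError("sampling must be 0 (random) or 1 (greedy)")
    return [
        "-e", model,
        "-tk", tokenizer,
        "-g", str(gen_tokens),
        "-b", str(batch_size),
        "-m", str(sampling),
    ]


args = build_inference_args(
    "../llm.c/gpt2_124M_bf16.bin",
    "../llm.c/gpt2_tokenizer.bin",
    sampling=1,  # greedy sampling
)
# Print a shell-ready command line that could be pasted after `!cd gpt2_optim &&`.
print(shlex.join(["./bin/inference_gpt2optimcu", *args]))
```

Keeping the arguments in a list makes it easier to sweep batch sizes or generation lengths from a loop rather than editing each cell by hand.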

In [None]:
# Speed comparison (KV + kernel fusion optimizations)
!cd gpt2_optim && ./bin/inference_gpt2optimcu \
  -e ../llm.c/gpt2_124M_bf16.bin \
  -tk ../llm.c/gpt2_tokenizer.bin \
  -g 64 -b 4 -m 0 -q 1

In [None]:
# Speed comparison (KV cache only: other optimizations disabled)
!cd gpt2_optim && ./bin/inference_gpt2optimcu_kvonly \
  -e ../llm.c/gpt2_124M_bf16.bin \
  -tk ../llm.c/gpt2_tokenizer.bin \
  -g 64 -b 4 -m 0 -q 1
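
As a complement to whatever timings the binaries report themselves, the two runs above could also be compared by wall-clock time from Python. This is only a sketch: `time_command` is a hypothetical helper, and wall-clock time includes process startup and model loading, so it is a coarse measure rather than a per-token figure.

```python
import os
import subprocess
import time


def time_command(cmd, cwd=None):
    """Run cmd once and return (elapsed wall-clock seconds, return code)."""
    start = time.perf_counter()
    completed = subprocess.run(
        cmd, cwd=cwd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return time.perf_counter() - start, completed.returncode


# Same arguments as the two speed comparison cells.
COMMON_ARGS = [
    "-e", "../llm.c/gpt2_124M_bf16.bin",
    "-tk", "../llm.c/gpt2_tokenizer.bin",
    "-g", "64", "-b", "4", "-m", "0", "-q", "1",
]
BINARIES = ["./bin/inference_gpt2optimcu", "./bin/inference_gpt2optimcu_kvonly"]

for binary in BINARIES:
    # Skip binaries that have not been built yet instead of failing.
    if not os.path.exists(os.path.join("gpt2_optim", binary)):
        print(f"skipping {binary}: not built")
        continue
    elapsed, code = time_command([binary, *COMMON_ARGS], cwd="gpt2_optim")
    print(f"{binary}: {elapsed:.2f} s (exit code {code})")
```

Running each binary several times and taking the median would reduce noise from one-off effects such as cold file caches.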


## Validation and Profiling
- `validate_kvcache_optimization`: validates correctness by comparing output against the baseline
- `profile_kvcache_optimization`: profiles the execution time of each CUDA kernel (using NSYS)
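
To illustrate the kind of equivalence a correctness check relies on, the toy sketch below compares single-head attention computed with and without a KV cache. It is not the repository's validation code, which is not shown here. All names, dimensions, and random weights are made up for illustration, and it runs on the CPU in pure Python.

```python
import math
import random


def matvec(w, x):
    """Multiply matrix w (a list of rows) by vector x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def attend(q, keys, values):
    """Scaled dot-product attention of one query over the given keys and values."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    peak = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(dim)]


def baseline(xs, wq, wk, wv):
    """Recompute K and V for the whole prefix at every step (no cache)."""
    outputs = []
    for t in range(len(xs)):
        keys = [matvec(wk, x) for x in xs[: t + 1]]
        values = [matvec(wv, x) for x in xs[: t + 1]]
        outputs.append(attend(matvec(wq, xs[t]), keys, values))
    return outputs


def with_kv_cache(xs, wq, wk, wv):
    """Project each token once and append its K and V to a growing cache."""
    k_cache, v_cache, outputs = [], [], []
    for x in xs:
        k_cache.append(matvec(wk, x))
        v_cache.append(matvec(wv, x))
        outputs.append(attend(matvec(wq, x), k_cache, v_cache))
    return outputs


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two lists of vectors."""
    return max(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def rand_matrix(rng, dim):
    return [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]


rng = random.Random(0)
dim, steps = 4, 6
wq, wk, wv = rand_matrix(rng, dim), rand_matrix(rng, dim), rand_matrix(rng, dim)
xs = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(steps)]

diff = max_abs_diff(baseline(xs, wq, wk, wv), with_kv_cache(xs, wq, wk, wv))
print(f"max abs difference: {diff:.3e}")
```

The cached version projects each token once, while the baseline redoes the projections for the whole prefix at every step. A GPU implementation in reduced precision would typically need a tolerance rather than an exact comparison.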

In [None]:
!cd gpt2_optim && ./bin/validate_kvcache_optimization \
  -e ../llm.c/gpt2_124M_bf16.bin \
  -tk ../llm.c/gpt2_tokenizer.bin \
  -g 32 -b 2


In [None]:
# Install Nsight Systems (nsys); this may take a few minutes
!wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/nsight-systems-2025.5.2_2025.5.2.266-1_amd64.deb
!apt update
# -y answers the confirmation prompts, which cannot be answered interactively in a notebook cell
!apt install -y ./nsight-systems-2025.5.2_2025.5.2.266-1_amd64.deb
!apt --fix-broken install -y


In [None]:
!cd gpt2_optim && nsys profile -t cuda,nvtx \
  -o prof_kvcache \
  ./bin/profile_kvcache_optimization \
    -e ../llm.c/gpt2_124M_bf16.bin \
    -tk ../llm.c/gpt2_tokenizer.bin \
    -g 128 -b 2
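
Once the profile has been written, a per-kernel summary could be printed with `nsys stats`. The sketch below makes two assumptions: that `-o prof_kvcache` produced `prof_kvcache.nsys-rep` (the extension used by recent Nsight Systems releases), and that the installed version names its kernel summary report `cuda_gpu_kern_sum`. The helper `kernel_summary_cmd` is hypothetical.

```python
import os
import shutil
import subprocess


def kernel_summary_cmd(report_path="prof_kvcache.nsys-rep"):
    """Build the nsys command that summarizes GPU kernel time per kernel."""
    # Assumption: the report is named cuda_gpu_kern_sum in this nsys version.
    return ["nsys", "stats", "--report", "cuda_gpu_kern_sum", report_path]


if shutil.which("nsys") is None:
    print("nsys not found; install Nsight Systems first")
elif not os.path.isdir("gpt2_optim"):
    print("gpt2_optim directory not found; clone and build it first")
else:
    # Run from the directory where the profiling cell wrote its output.
    subprocess.run(kernel_summary_cmd(), cwd="gpt2_optim", check=False)
```

If the report name is rejected, `nsys stats --help-reports` should list the names that the installed version accepts.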
