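- What is a hyperparameter? Unlike weights, which the model learns on its own, hyperparameters are values the developer has to set by hand before training
    - learning rate
    - batch_size
    - dropout_rate
    - epoch
    - and so on
- This example searches for hyperparameters with **Grid Search**, the simplest approach: try every possible combination of the candidate values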
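
To make the idea concrete, here is a minimal sketch of grid search on a toy problem. The grid `toy_grid` and the scoring function `toy_score` are hypothetical stand-ins invented for illustration; a real search would train a model and use its validation loss instead.

```python
import itertools

# Hypothetical toy grid; the names and values are illustrative only.
toy_grid = {
    "learning_rate": [0.001, 0.01, 0.1],
    "batch_size": [2, 4],
}

def toy_score(config):
    """Stand-in for a validation loss (lower is better); not a real training run."""
    return abs(config["learning_rate"] - 0.01) + 0.001 * config["batch_size"]

best_score = float("inf")
best_config = None
n_tried = 0

# itertools.product yields every combination of the candidate values.
for values in itertools.product(*toy_grid.values()):
    config = dict(zip(toy_grid.keys(), values))
    score = toy_score(config)
    n_tried += 1
    if score < best_score:
        best_score = score
        best_config = config

print(f"Tried {n_tried} configurations, best: {best_config}")
```

The loop visits all 3 x 2 = 6 combinations, which is exactly why grid search is simple but becomes expensive as more hyperparameters are added.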

In [16]:
import torch
HPARAM_GRID = {
    "batch_size": [2, 4, 8, 16],
    "drop_rate": [0.0, 0.1, 0.2],
    "warmup_iters": [10, 20, 30],
    "weight_decay": [0.1, 0.01, 0.0],
    "peak_lr": [0.0001, 0.0005, 0.001, 0.005],
    "initial_lr": [0.00005, 0.0001],
    "min_lr": [0.00005, 0.00001, 0.0001],
    "n_epochs": [5, 10, 15, 20, 25],
}

- The dict collects the candidate values for each hyperparameter used in the grid search
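
The size of the search space is the product of the number of candidates per hyperparameter. The sketch below recomputes it from the same grid; the 60 seconds per configuration is a made-up figure used only to show how quickly the total cost grows, not a measurement.

```python
import math

HPARAM_GRID = {
    "batch_size": [2, 4, 8, 16],
    "drop_rate": [0.0, 0.1, 0.2],
    "warmup_iters": [10, 20, 30],
    "weight_decay": [0.1, 0.01, 0.0],
    "peak_lr": [0.0001, 0.0005, 0.001, 0.005],
    "initial_lr": [0.00005, 0.0001],
    "min_lr": [0.00005, 0.00001, 0.0001],
    "n_epochs": [5, 10, 15, 20, 25],
}

# Number of configurations = product of the candidate-list lengths.
n_configs = math.prod(len(values) for values in HPARAM_GRID.values())

# Assumption for illustration only: each configuration takes 60 seconds.
assumed_seconds_per_config = 60
est_hours = n_configs * assumed_seconds_per_config / 3600

print(f"{n_configs} configurations, about {est_hours:.0f} hours at "
      f"{assumed_seconds_per_config} s each")
```

Under that assumed cost a full sweep would take days, so in practice the search is often stopped early or the grid is narrowed.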

In [17]:
from previous_chapters import (
    calc_loss_batch, calc_loss_loader, evaluate_model,
    train_model_with_optimization, create_dataloader_v1, GPTModel,
)
import math
import itertools
import os

import requests  # used below to download the text file
import tiktoken

# Build every combination of the hyperparameter candidates
hyperparameter_combinations = list(itertools.product(*HPARAM_GRID.values()))
total_combinations = len(hyperparameter_combinations)
print(f"Total hyperparameter configurations: {total_combinations}")

# Start the best validation loss at inf so the first configuration always improves on it
best_val_loss = float("inf")
best_train_loss = float("inf")
best_hparams = {}

file_path = "the-verdict.txt"
url = "https://raw.githubusercontent.com/rasbt/LLMs-from-scratch/main/ch02/01_main-chapter-code/the-verdict.txt"

if not os.path.exists(file_path):
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    text_data = response.text
    with open(file_path, "w", encoding="utf-8") as file:
        file.write(text_data)
else:
    with open(file_path, "r", encoding="utf-8") as file:
        text_data = file.read()

tokenizer = tiktoken.get_encoding("gpt2")
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

train_ratio = 0.95
split_idx = int(train_ratio * len(text_data))

torch.manual_seed(123)

interrupted = False

# Index of the hyperparameter combination currently being evaluated
current_config = 0


Total hyperparameter configurations: 12960


In [18]:
for combination in hyperparameter_combinations:
    
        try:
            
            current_config += 1
            print(f"Evaluating configuration {current_config} of {total_combinations}")

            # Convert the tuple of values into a dict for easier access
            HPARAM_CONFIG = dict(zip(HPARAM_GRID.keys(), combination))

            GPT_CONFIG_124M = {
                "vocab_size": 50257,    # Vocabulary size
                "context_length": 256,  # Context length -- shortened from original 1024 tokens
                "emb_dim": 768,         # Embedding dimension
                "n_heads": 12,          # Number of attention heads
                "n_layers": 12,         # Number of layers
                "drop_rate": HPARAM_CONFIG["drop_rate"],
                "qkv_bias": False,     # Query-Key-Value bias
            }

            torch.manual_seed(123)
            train_loader = create_dataloader_v1(
                text_data[:split_idx],
                batch_size=HPARAM_CONFIG["batch_size"],
                max_length=GPT_CONFIG_124M["context_length"],
                stride=GPT_CONFIG_124M["context_length"],
                drop_last=True,
                shuffle=True,
                num_workers=0
            )

            val_loader = create_dataloader_v1(
                text_data[split_idx:],
                batch_size=HPARAM_CONFIG["batch_size"],
                max_length=GPT_CONFIG_124M["context_length"],
                stride=GPT_CONFIG_124M["context_length"],
                drop_last=False,
                shuffle=False,
                num_workers=0
            )

            model = GPTModel(GPT_CONFIG_124M)
            model.to(device)

            optimizer = torch.optim.AdamW(
                model.parameters(),
                lr=HPARAM_CONFIG["peak_lr"],
                weight_decay=HPARAM_CONFIG["weight_decay"]
            )

            encoded_start_context = tokenizer.encode("Nevertheless")
            encoded_tensor = torch.tensor(encoded_start_context).unsqueeze(0)

            train_losses, val_losses = train_model_with_optimization(
                model, train_loader, val_loader, optimizer, device,
                n_epochs=HPARAM_CONFIG["n_epochs"],
                eval_iter=1,
                eval_freq=HPARAM_CONFIG["n_epochs"],
                warmup_steps=HPARAM_CONFIG["warmup_iters"],
                initial_lr=HPARAM_CONFIG["initial_lr"],
                min_lr=HPARAM_CONFIG["min_lr"],
                tokenizer=tokenizer,
                start_context="Nevertheless"
                
            )

            # Check whether this configuration's validation loss beats the best so far (best_val_loss)
            
            print(val_losses)
            
            # Compare on the last recorded loss; index 0 is the loss at iteration 0, before training
            if val_losses[-1] < best_val_loss:
                best_val_loss = val_losses[-1]
                best_train_loss = train_losses[-1]
                best_hparams = HPARAM_CONFIG

        except KeyboardInterrupt:
            print("Hyperparameter search interrupted.")
            print(f"Best hyperparameters: {best_hparams}")
            print(f"Best Val loss: {best_val_loss} | Training loss {best_train_loss}")
            interrupted = True
            break

if not interrupted:
    print("Hyperparameter search completed.")
    print(f"Best hyperparameters: {best_hparams}")
    print(f"Best Val loss: {best_val_loss} | Training loss {best_train_loss}")

Evaluating configuration 1 of 12960
Ep 1 (Iter 000000): Train loss 10.770, Val loss 10.843
Ep 1 (Iter 000005): Train loss 9.583, Val loss 9.797
Nevertheless,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
Ep 2 (Iter 000010): Train loss 8.815, Val loss 9.177
Ep 2 (Iter 000015): Train loss 8.567, Val loss 8.763
Nevertheless, the,, the, the,, the,,,,, the, the,,,, the,,,,,,,,, the,, the,,,, the,,, the,,,,,
Ep 3 (Iter 000020): Train loss 8.240, Val loss 8.416
Ep 3 (Iter 000025): Train loss 7.864, Val loss 8.116
Nevertheless, the, the,, the, the, the, the,, the, the, the, the,,, the,, the,,, the,, the, the, the, the,, the,, the,,
Ep 4 (Iter 000030): Train loss 7.655, Val loss 7.861
Ep 4 (Iter 000035): Train loss 7.123, Val loss 7.654
Nevertheless, the, the,, the, the, the, the, the, the.                                
Ep 5 (Iter 000040): Train loss 7.070, Val loss 7.491
Nevertheless, the, the the, the the the.                                        
[10.842914581298828, 9.797467231750488