# SELM Hyperparameter Search

This notebook tunes the hyperparameters of the SELM model with Optuna. It configures an Optuna study, runs it, and reports the best hyperparameters found.

## 1. Setup
Import necessary libraries and load configurations.

In [1]:
import optuna
import yaml
from transformers import AutoTokenizer, AutoModelForSequenceClassification, Trainer, TrainingArguments
from datasets import load_dataset
import torch

# Load configuration
def load_config(config_path):
    with open(config_path, 'r') as f:
        return yaml.safe_load(f)

config = load_config('../config/optuna_config.yaml')

# Load tokenizer
model_name = 'distilbert-base-uncased'
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load dataset
dataset = load_dataset('glue', 'mrpc')

# Tokenize dataset (MRPC is a sentence-pair task with 'sentence1' and 'sentence2' columns)
def tokenize_fn(examples):
    return tokenizer(examples['sentence1'], examples['sentence2'], padding='max_length', truncation=True, max_length=128)

tokenized_dataset = dataset.map(tokenize_fn, batched=True)
train_dataset = tokenized_dataset['train']
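# Score trials on the validation split (referred to as test_dataset below) so the GLUE test split stays held out
test_dataset = tokenized_dataset['validation']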
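
The study settings come from `../config/optuna_config.yaml`, whose contents are not shown in this notebook. Judging only by the keys read in later cells (`study_name`, `direction`, `n_trials`, `timeout`), a quick sanity check on the loaded dictionary might look like the sketch below. The helper name `validate_study_config` and the example values are assumptions made for illustration; they are not part of the project.

```python
REQUIRED_KEYS = ("study_name", "direction", "n_trials", "timeout")
VALID_DIRECTIONS = ("minimize", "maximize")


def validate_study_config(config: dict) -> dict:
    """Raise ValueError if the config lacks a key the notebook reads."""
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise ValueError(f"Missing config keys: {missing}")
    if config["direction"] not in VALID_DIRECTIONS:
        raise ValueError(f"direction must be one of {VALID_DIRECTIONS}")
    if not isinstance(config["n_trials"], int) or config["n_trials"] < 1:
        raise ValueError("n_trials must be a positive integer")
    # timeout is a number of seconds; None means no time limit
    timeout = config["timeout"]
    if timeout is not None and timeout <= 0:
        raise ValueError("timeout must be positive or None")
    return config


# Hypothetical values; the real ones live in optuna_config.yaml.
example_config = {
    "study_name": "seml_hyperparameter_tuning",
    "direction": "maximize",
    "n_trials": 10,
    "timeout": 3600,
}
validate_study_config(example_config)
```

Calling `validate_study_config(config)` right after `load_config` would surface a misspelled or missing key before any model is trained.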


## 2. Define the Objective Function
Define the objective function for Optuna to optimize.

In [2]:
def objective(trial):
    # Hyperparameters
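    # suggest_loguniform is deprecated; suggest_float with log=True samples the same log-uniform range
    learning_rate = trial.suggest_float('learning_rate', 1e-5, 1e-1, log=True)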
    batch_size = trial.suggest_categorical('batch_size', [8, 16, 32])
    num_epochs = trial.suggest_int('num_epochs', 3, 10)

    # Model
    model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

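    # TrainingArguments
    training_args = TrainingArguments(
        output_dir='./results',
        learning_rate=learning_rate,  # use the sampled value; otherwise the default learning rate is used for every trial
        num_train_epochs=num_epochs,
        per_device_train_batch_size=batch_size,
        per_device_eval_batch_size=batch_size,
        warmup_steps=500,  # fixed warmup; short runs with large batches may have fewer total steps than this
        weight_decay=0.01,
        logging_dir='./logs',
        logging_steps=10,
        evaluation_strategy='steps',  # evaluates every logging_steps steps, since eval_steps is not set
        save_steps=500,
        load_best_model_at_end=True
    )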

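    # Report accuracy so that evaluate() returns an 'eval_accuracy' entry
    def compute_metrics(eval_pred):
        logits, labels = eval_pred
        predictions = logits.argmax(axis=-1)
        return {'accuracy': float((predictions == labels).mean())}

    # Trainer
    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset,
        eval_dataset=test_dataset,
        tokenizer=tokenizer,
        compute_metrics=compute_metrics
    )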

    # Train and evaluate
    trainer.train()
    eval_results = trainer.evaluate()
    return eval_results['eval_accuracy']
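
The learning-rate range in the objective spans four orders of magnitude, which is why it is sampled on a logarithmic scale rather than a linear one. The sketch below illustrates the idea with the standard library only: it draws uniformly in log space and counts how many samples fall in each decade. It is an illustration of log-uniform sampling under that assumption, not Optuna's sampler, and the helper name `sample_log_uniform` is hypothetical.

```python
import math
import random


def sample_log_uniform(low: float, high: float, rng: random.Random) -> float:
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


rng = random.Random(0)  # fixed seed so the sketch is repeatable
samples = [sample_log_uniform(1e-5, 1e-1, rng) for _ in range(10_000)]

# Count how many samples land in each decade of the search range.
decade_counts = {exponent: 0 for exponent in range(-5, -1)}
for value in samples:
    exponent = math.floor(math.log10(value))
    exponent = max(-5, min(-2, exponent))  # clamp rounding at the range edges
    decade_counts[exponent] += 1

for exponent, count in decade_counts.items():
    print(f"[1e{exponent}, 1e{exponent + 1}): {count}")
```

Each decade receives roughly a quarter of the samples, so small learning rates are explored as often as large ones. A linear draw over the same range would place about 90% of its samples above 1e-2.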


## 3. Run the Optuna Study
Configure and run the Optuna study to find the best hyperparameters.

In [3]:
# Create Optuna study
study = optuna.create_study(
    study_name=config['study_name'],
    direction=config['direction']
)

# Optimize
study.optimize(objective, n_trials=config['n_trials'], timeout=config['timeout'])

# Print best parameters
print('Best trial:')
trial = study.best_trial
print(f'  Value: {trial.value}')
print('  Params:')
for key, value in trial.params.items():
    print(f'    {key}: {value}')


Study: study_name=seml_hyperparameter_tuning
Number of trials: 10
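
To make the roles of `direction`, `n_trials`, and `timeout` concrete, here is a deliberately simplified, standard-library sketch of a sequential search loop. It assumes the time limit is checked between trials and that the best trial is the one with the highest (or lowest) objective value. The function `run_study` and the toy scores are hypothetical; Optuna's actual implementation additionally handles samplers, pruning, storage, and failures.

```python
import time
from typing import Callable, Optional, Tuple


def run_study(
    objective: Callable[[int], float],
    direction: str,
    n_trials: int,
    timeout: Optional[float] = None,
) -> Tuple[int, float]:
    """Run trials until n_trials or timeout is reached; return the best one."""
    start = time.monotonic()
    best: Optional[Tuple[int, float]] = None
    for trial_number in range(n_trials):
        # The time limit is checked between trials, so a trial that is
        # already running is allowed to finish.
        if timeout is not None and time.monotonic() - start >= timeout:
            break
        value = objective(trial_number)
        if best is None:
            best = (trial_number, value)
        elif direction == "maximize" and value > best[1]:
            best = (trial_number, value)
        elif direction == "minimize" and value < best[1]:
            best = (trial_number, value)
    if best is None:
        raise ValueError("No trials were completed")
    return best


# Made-up scores standing in for the accuracy of five trained models.
scores = [0.71, 0.84, 0.79, 0.86, 0.80]
best_number, best_value = run_study(lambda n: scores[n], "maximize", n_trials=5)
print(best_number, best_value)  # prints: 3 0.86
```

Because the objective in this notebook returns accuracy, the `direction` entry in the configuration file would need to be `maximize` for the best trial to be the most accurate one.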


## 4. Save the Results
Save the results of the Optuna study for future reference.

In [4]:
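import os
import joblib

# Save study results (create the output directory if it does not exist yet)
os.makedirs('optuna_results', exist_ok=True)
joblib.dump(study, 'optuna_results/study.pkl')
print('Optuna study saved to optuna_results/study.pkl')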
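
A pickled study is convenient for reloading the full object, but it is not human-readable and generally needs compatible library versions to load. As a possible complement, the best parameters could also be written to a small JSON file. The sketch below uses only the standard library; the helper names `save_best_params` and `load_best_params` and the file name `best_params.json` are assumptions, not part of the project.

```python
import json
from pathlib import Path


def save_best_params(params: dict, value: float, path: str) -> Path:
    """Write the best trial's parameters and score to a JSON file."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)  # create folder if needed
    payload = {"value": value, "params": params}
    target.write_text(json.dumps(payload, indent=2))
    return target


def load_best_params(path: str) -> dict:
    """Read back a file written by save_best_params."""
    return json.loads(Path(path).read_text())
```

In this notebook, a call such as `save_best_params(study.best_trial.params, study.best_trial.value, 'optuna_results/best_params.json')` would store the same values that the previous cell prints.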
