# 0. Setup.

In [3]:
%%html
<style>
    table {
        float: left;
        margin-right: 20px; /* Optional: Adds space between table and other content */
    }
</style>

# 1. `TrainingArguments`.

## 1. General Setup.

|   | Name                     | Default   | Values          | Description                             |
|:--|:-------------------------|:----------|:----------------|:----------------------------------------|
| 1 | `output_dir`             | *None*    | *String (path)* | Directory to save results, checkpoints. |
| 2 | `overwrite_output_dir`   | *False*   | *Boolean*       | Overwrite existing output directory.    |
| 3 | `prediction_loss_only`   | *False*   | *Boolean*       | Only return loss during evaluation.     |
| 4 | `run_name`               | *None*    | *String*        | Identifier for the training run.        |
| 5 | `disable_tqdm`           | *None*    | *Boolean*       | Disable tqdm progress bars.             |
| 6 | `label_names`            | *None*    | *List of str*   | List of label names for the model.      |
| 7 | `load_best_model_at_end` | *False*   | *Boolean*       | Load best model at the end of training. |
| 8 | `metric_for_best_model`  | *None*    | *String*        | Metric to use for selecting best model. |
| 9 | `greater_is_better`      | *None*    | *Boolean*       | Whether better models have greater metric values. |
| 10| `ignore_data_skip`       | *False*   | *Boolean*       | Ignore data skipping when resuming training. |


## 2. Training Control.

|   | Name                            | Default   | Values        | Description                                  |
|---|---------------------------------|-----------|---------------|----------------------------------------------|
| 1 | `num_train_epochs`              | *3*       | *Float*       | Total number of training epochs.             |
| 2 | `per_device_train_batch_size`   | *8*       | *Integer*     | Batch size per device during training.       |
| 3 | `per_device_eval_batch_size`    | *8*       | *Integer*     | Batch size per device during evaluation.     |
| 4 | `gradient_accumulation_steps`   | *1*       | *Integer*     | Number of forward/backward passes to accumulate before each optimizer update. |
| 5 | `warmup_steps`                  | *0*       | *Integer*     | Number of steps for learning rate warmup.    |
| 6 | `weight_decay`                  | *0.0*     | *Float*       | Weight decay for AdamW optimizer.            |
| 7 | `max_grad_norm`                 | *1.0*     | *Float*       | Maximum norm for gradient clipping.          |
| 8 | `logging_steps`                 | *500*     | *Integer*     | Frequency of logging during training.        |
| 9 | `max_steps`                     | *-1*      | *Integer*     | If > 0, total number of training steps to run; overrides `num_train_epochs`. |
| 10| `save_steps`                    | *500*     | *Integer*     | Number of steps between saving checkpoints.  |
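
The batch-related arguments above combine into an effective batch size, which in turn determines how many optimizer updates a run performs. The sketch below is a rough estimate only, not the exact arithmetic `Trainer` uses internally; `estimate_training_steps`, `num_examples`, and `num_devices` are hypothetical names introduced for illustration.

```python
import math


def estimate_training_steps(
    num_examples: int,
    per_device_train_batch_size: int = 8,
    gradient_accumulation_steps: int = 1,
    num_train_epochs: float = 3.0,
    num_devices: int = 1,  # hypothetical argument, not a TrainingArguments field
    max_steps: int = -1,
) -> dict:
    """Rough estimate of the effective batch size and optimizer-step count."""
    effective_batch_size = (
        per_device_train_batch_size * num_devices * gradient_accumulation_steps
    )
    # One optimizer update consumes one effective batch.
    steps_per_epoch = max(1, math.ceil(num_examples / effective_batch_size))
    if max_steps > 0:
        total_steps = max_steps  # a positive max_steps overrides num_train_epochs
    else:
        total_steps = math.ceil(steps_per_epoch * num_train_epochs)
    return {
        "effective_batch_size": effective_batch_size,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
    }


estimate = estimate_training_steps(
    num_examples=10_000,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,
)
print(estimate)
```

An estimate like this helps when choosing `logging_steps`, `save_steps`, and `warmup_steps`, since all three are expressed in optimizer steps.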

## 3. Evaluation and Logging.

|   | Name                      | Default   | Values        | Description                                |
|---|:--------------------------|:----------|:--------------|--------------------------------------------|
| 1 | `evaluation_strategy`     | *"no"*    | *{"no", "steps", "epoch"}* | How often to run evaluation. Renamed to `eval_strategy` in recent `transformers` releases. |
| 2 | `eval_steps`              | *None*    | *Integer*     | Number of steps between evaluations.       |
| 3 | `logging_dir`             | *"runs/"* | *String (path)* | Directory for saving TensorBoard logs. By default, a timestamped subfolder under `output_dir/runs/`. |
| 4 | `logging_steps`           | *500*     | *Integer*     | Frequency of logging metrics.              |
| 5 | `save_strategy`           | *"steps"* | *{"no", "steps", "epoch"}* | When to save model checkpoints.            |
| 6 | `save_steps`              | *500*     | *Integer*     | Number of steps between saving checkpoints.|
| 7 | `save_total_limit`        | *None*    | *Integer*     | Maximum number of saved checkpoints.       |
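
To make the effect of `save_total_limit` concrete, here is a simplified sketch of checkpoint rotation. It assumes checkpoints are stored in folders named `checkpoint-<step>` and it ignores the extra protection the real `Trainer` gives to the best checkpoint when `load_best_model_at_end` is set; `checkpoints_to_delete` is a hypothetical helper, not a library function.

```python
import re


def checkpoints_to_delete(checkpoint_dirs, save_total_limit=None):
    """Return the oldest checkpoint folders that exceed save_total_limit."""
    if save_total_limit is None or save_total_limit <= 0:
        return []  # no limit configured: keep everything

    def step_of(name):
        match = re.fullmatch(r"checkpoint-(\d+)", name)
        return int(match.group(1)) if match else None

    # Sort numerically by step so "checkpoint-1000" comes after "checkpoint-500".
    numbered = sorted(
        (step_of(name), name) for name in checkpoint_dirs if step_of(name) is not None
    )
    excess = max(0, len(numbered) - save_total_limit)
    return [name for _, name in numbered[:excess]]


stale = checkpoints_to_delete(
    ["checkpoint-1500", "checkpoint-500", "checkpoint-1000"],
    save_total_limit=2,
)
print(stale)
```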


## 4. Optimization.

|   | Name                      | Default   | Values        | Description                                |
|---|:--------------------------|:----------|:--------------|--------------------------------------------|
| 1 | `learning_rate`           | *5e-5*    | *Float*       | Initial learning rate for the optimizer.   |
| 2 | `lr_scheduler_type`       | *"linear"*| *{"linear", "cosine", "polynomial", "constant"}* | Type of learning rate scheduler. |
| 3 | `weight_decay`            | *0.0*     | *Float*       | Weight decay coefficient.                  |
| 4 | `adam_beta1`              | *0.9*     | *Float*       | Beta1 parameter for AdamW optimizer.       |
| 5 | `adam_beta2`              | *0.999*   | *Float*       | Beta2 parameter for AdamW optimizer.       |
| 6 | `adam_epsilon`            | *1e-8*    | *Float*       | Epsilon value for AdamW optimizer.         |
| 7 | `warmup_steps`            | *0*       | *Integer*     | Number of warmup steps for the scheduler.  |
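
How `learning_rate`, `warmup_steps`, and the default `"linear"` scheduler interact can be sketched in plain Python. This is a minimal illustration of a linear schedule with warmup, assuming the rate rises linearly from 0 during warmup and then decays linearly to 0; `linear_schedule_lr` and `total_steps` are hypothetical names, not part of `TrainingArguments`.

```python
def linear_schedule_lr(step, learning_rate=5e-5, warmup_steps=0, total_steps=1000):
    """Learning rate at a given optimizer step under linear warmup and decay."""
    if step < warmup_steps:
        # Warmup phase: scale grows linearly from 0 to 1.
        scale = step / max(1, warmup_steps)
    else:
        # Decay phase: scale shrinks linearly from 1 to 0 at total_steps.
        scale = max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))
    return learning_rate * scale


for step in (0, 50, 100, 550, 1000):
    print(step, linear_schedule_lr(step, warmup_steps=100, total_steps=1000))
```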


## 5. Hardware Utilization.

|   | Name                      | Default   | Values        | Description                                |
|---|:--------------------------|:----------|:--------------|--------------------------------------------|
| 1 | `no_cuda`                 | *False*   | *Boolean*     | Disable GPU usage. Deprecated in favor of `use_cpu` in recent releases. |
| 2 | `fp16`                    | *False*   | *Boolean*     | Use fp16 (mixed) precision training instead of 32-bit. |
| 3 | `fp16_opt_level`          | *"O1"*    | *{"O0", "O1", "O2", "O3"}* | Apex AMP optimization level for fp16 training. |
| 4 | `local_rank`              | *-1*      | *Integer*     | Rank of the process during distributed training. |
| 5 | `tpu_num_cores`           | *None*    | *Integer*     | Number of TPU cores to use.                |
| 6 | `dataloader_num_workers`  | *0*       | *Integer*     | Number of subprocesses for data loading.   |

# 2. `Trainer`.

|   | Name                            | Default   | Values        | Description                                  |
|---|:---------------------------------|:----------|:--------------|----------------------------------------------|
| 1 | `model`                          | *None*    | *PreTrainedModel or torch.nn.Module* | The model to train, evaluate or use for predictions. |
| 2 | `args`                           | *None*    | *TrainingArguments* | Arguments to tweak for training. Defaults to a basic instance of `TrainingArguments`. |
| 3 | `data_collator`                  | *None*    | *DataCollator* | Function to form a batch from dataset elements. |
| 4 | `train_dataset`                  | *None*    | *torch.utils.data.Dataset or datasets.Dataset* | Dataset for training. |
| 5 | `eval_dataset`                   | *None*    | *torch.utils.data.Dataset or datasets.Dataset* | Dataset for evaluation. |
| 6 | `model_init`                     | *None*    | *Callable[[], PreTrainedModel]* | Function that initializes a new model for each training run. |
| 7 | `compute_loss_func`              | *None*    | *Callable*    | Function that computes loss from model outputs and labels. |
| 8 | `compute_metrics`                | *None*    | *Callable[[EvalPrediction], Dict]* | Function to compute metrics during evaluation. |
| 9 | `callbacks`                      | *None*    | *List[TrainerCallback]* | List of callbacks to customize the training loop. |
| 10 | `optimizers`                     | *None*    | *Tuple[Optimizer, lr_scheduler]* | Optimizer and scheduler to use. Defaults to AdamW and linear schedule. |
| 11 | `optimizer_cls_and_kwargs`       | *None*    | *Tuple[Type[Optimizer], Dict]* | Custom optimizer class and arguments. |
| 12 | `preprocess_logits_for_metrics`  | *None*    | *Callable[[torch.Tensor, torch.Tensor], torch.Tensor]* | Function to preprocess logits before evaluation. |


## 2.1. Example - `data_collator` for dynamic padding.

In [None]:
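# Assumes `tokenizer` is an already-loaded tokenizer (e.g. via AutoTokenizer.from_pretrained).
def data_collator(batch):
    # Pad to the longest sequence in the batch; pad() has no `truncation` argument, so truncate when tokenizing.
    return tokenizer.pad(batch, padding=True, return_tensors="pt")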
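
To show what dynamic padding does without loading a tokenizer, the following is a minimal pure-Python sketch. It assumes each example is a dict with an `input_ids` list and that the padding token ID is 0; `pad_batch` is a hypothetical helper and returns plain lists rather than tensors.

```python
def pad_batch(batch, pad_token_id=0):
    """Pad every example to the length of the longest example in this batch."""
    max_len = max(len(example["input_ids"]) for example in batch)
    input_ids, attention_mask = [], []
    for example in batch:
        ids = list(example["input_ids"])
        n_pad = max_len - len(ids)
        input_ids.append(ids + [pad_token_id] * n_pad)
        # 1 marks real tokens, 0 marks padding.
        attention_mask.append([1] * len(ids) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


padded = pad_batch(
    [
        {"input_ids": [101, 7592, 102]},
        {"input_ids": [101, 7592, 2088, 999, 102]},
    ]
)
print(padded)
```

Because the target length is computed per batch, short batches are not padded to a global maximum, which is the main benefit over padding the whole dataset once.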

# 3. Callbacks.

| Name                               | Parameters                                              | Description                                                                                                      |
|:-----------------------------------|:--------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------|
| `DefaultFlowCallback`              | .                                                       | Handles the default flow of the training loop: decides when to log, evaluate, and save checkpoints, based on the logging, evaluation, and save settings in `TrainingArguments`. |
| `PrinterCallback`                  | .                                                       | Just prints the logs on the console.                                                                |
| `ProgressCallback`                 | `max_str_len`: int = 100                                | Displays the progress of training or evaluation. `max_str_len` is the maximum length of a string before it is truncated when logging. |
| `EarlyStoppingCallback`            | `early_stopping_patience`: int = 1, `early_stopping_threshold`: Optional[float] = 0.0 | Stops training when the monitored metric fails to improve by more than `early_stopping_threshold` for `early_stopping_patience` consecutive evaluations. Requires `load_best_model_at_end=True` and a `metric_for_best_model`. |
| `TensorBoardCallback`              | `tb_writer`: SummaryWriter = None                       | A `TrainerCallback` that sends the logs to TensorBoard. If `tb_writer` is not provided, it will instantiate one.  |
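
The patience and threshold logic behind early stopping can be sketched as a small standalone class. This is an illustration of the idea under stated assumptions, not the library's implementation; `EarlyStoppingTracker` and its method names are hypothetical.

```python
class EarlyStoppingTracker:
    """Counts consecutive evaluations without sufficient improvement."""

    def __init__(self, patience=1, threshold=0.0, greater_is_better=True):
        self.patience = patience
        self.threshold = threshold
        self.greater_is_better = greater_is_better
        self.best = None
        self.bad_evals = 0

    def update(self, metric_value):
        """Record one evaluation; return True when training should stop."""
        if self.best is None:
            improved = True
        else:
            if self.greater_is_better:
                better = metric_value > self.best
            else:
                better = metric_value < self.best
            # The change must also exceed the threshold to count as improvement.
            improved = better and abs(metric_value - self.best) > self.threshold
        if improved:
            self.best = metric_value
            self.bad_evals = 0
        else:
            self.bad_evals += 1
        return self.bad_evals >= self.patience


# Example with a loss-like metric, where lower is better.
tracker = EarlyStoppingTracker(patience=2, threshold=0.01, greater_is_better=False)
decisions = [tracker.update(loss) for loss in [0.9, 0.8, 0.795, 0.796]]
print(decisions)
```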


> #### Note) How to Run TensorBoard.
> Set `TrainingArguments(logging_dir='./logs')`, then run `tensorboard --logdir ./logs`.