# RoBERTa full fine-tune (Kaggle)

This notebook runs a full RoBERTa-base fine-tune for the TEXT_BRANCH on Kaggle.

Usage notes:
- Upload `data/iemocap_manifest.jsonl` as a Kaggle Dataset and add it to the Notebook (recommended).
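- Enable a GPU accelerator (T4 or better) in the Notebook settings.
- Adjust `--batch_size`, `--accumulation_steps`, and `--max_length` to fit GPU memory. If you lower `--batch_size`, raise `--accumulation_steps` by the same factor to keep the effective batch size unchanged.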
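
The trade-off between `--batch_size` and `--accumulation_steps` can be sketched as plain arithmetic. This is a minimal illustration, not part of the repository: the helper names `effective_batch_size` and `shrink_for_memory` are hypothetical, and the values 8 and 4 are taken from the training command in this notebook.

```python
def effective_batch_size(batch_size: int, accumulation_steps: int) -> int:
    """Number of examples that contribute to each optimizer step."""
    return batch_size * accumulation_steps


def shrink_for_memory(batch_size: int, accumulation_steps: int) -> tuple:
    """Halve the per-step batch and double accumulation, keeping the product fixed."""
    if batch_size < 2 or batch_size % 2:
        raise ValueError("batch_size must be an even number >= 2 to halve it")
    return batch_size // 2, accumulation_steps * 2


# Values used by the training command in this notebook (hypothetical helpers).
print(effective_batch_size(8, 4))  # 32
print(shrink_for_memory(8, 4))     # (4, 8)
```

If the run hits an out-of-memory error, applying the second helper's suggestion (for example `--batch_size 4 --accumulation_steps 8`) lowers peak memory while each optimizer step still sees 32 examples. Reducing `--max_length` is a separate lever and changes what the model sees, so it is not modelled here.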

In [None]:
!pip install -q -U transformers accelerate datasets evaluate sentencepiece sentence-transformers

In [None]:
import datasets
import numpy as np
import torch
import transformers

print("torch:", torch.__version__, "cuda:", torch.cuda.is_available())
print("transformers:", transformers.__version__)
print("datasets:", datasets.__version__)
print("numpy:", np.__version__)

In [None]:
!git clone https://github.com/SpeedyLabX/ser-conformer-gat-xai.git
%cd ser-conformer-gat-xai

In [None]:
!pip install -q -e .

In [None]:
!mkdir -p data
!cp /kaggle/input/slx02-ser-dataset/iemocap_manifest.jsonl data/iemocap_manifest.jsonl
!ls -l data || true
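
Before starting a long training run, it can help to confirm that the copied manifest parses as JSONL. The sketch below makes no assumption about the manifest's field names; it only counts records and reports which top-level keys appear. The function name `summarize_manifest` is hypothetical and not part of the repository.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_manifest(path):
    """Count records and how often each top-level key appears in a JSONL file."""
    n_records = 0
    key_counts = Counter()
    with Path(path).open(encoding="utf-8") as fh:
        for line_no, line in enumerate(fh, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(f"{path}:{line_no}: invalid JSON ({exc})") from exc
            n_records += 1
            if isinstance(record, dict):
                key_counts.update(record.keys())
    return n_records, key_counts


manifest = Path("data/iemocap_manifest.jsonl")
if manifest.exists():
    n, keys = summarize_manifest(manifest)
    print(f"{n} records")
    for key, count in keys.most_common():
        print(f"  {key}: present in {count} records")
else:
    print(f"{manifest} not found; check the Kaggle input path")
```

A key that appears in fewer records than the total is a hint that some manifest entries are incomplete.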

In [None]:
# Run the RoBERTa fine-tune (settings recommended for a Kaggle P100; effective batch size is 8 x 4 = 32)
!python scripts/finetune_roberta.py --manifest data/iemocap_manifest.jsonl --backbone roberta-base --batch_size 8 --accumulation_steps 4 --epochs 100 --max_length 128 --num_class 7 --out_dir /kaggle/working/artifacts/roberta_kaggle --fp16 --gradient_checkpointing --load_best_model_at_end --save_total_limit 5 --evaluation_strategy epoch --early_stopping_patience 5 --lr 1e-5 --use_tqdm

In [None]:
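# Evaluate the fine-tuned checkpoint. Absolute paths are used because the working directory is the cloned repo, while training wrote to /kaggle/working/artifacts.
!python scripts/evaluate_text.py --manifest data/iemocap_manifest.jsonl --checkpoint /kaggle/working/artifacts/roberta_kaggle/pytorch_model.bin --backbone roberta-base --proj_dim 256 --out_dir /kaggle/working/artifacts/roberta_kaggle_eval --save_confusion_plot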

In [None]:
# List and zip artifacts for download
!ls -la /kaggle/working/artifacts/roberta_kaggle || true
!zip -r /kaggle/working/roberta_kaggle_artifacts.zip /kaggle/working/artifacts/roberta_kaggle || true