# Fine-tune: BERT Traditional vs Taiwan (English)
This notebook fine-tunes the packaged classifier on your dataset using config files in `configs/`. It demonstrates:
- Quick environment and import checks
- Loading configs and paths
- Training via a subprocess call to the CLI entrypoint
- Evaluation via the CLI
Notes:
- Requires the project to be installed/available in the current environment.
- The notebook ships with outputs cleared; cells are designed to run end-to-end in order.

In [None]:
# Environment & imports
import sys, platform, torch
print({"python": sys.version.split()[0], "platform": platform.platform(), "torch": torch.__version__})
try:
    import bert_ts_classifier as pkg
    print({"package": pkg.__name__})
except Exception as e:
    print("Package import failed:", e)

In [None]:
# Configure paths
from pathlib import Path
from bert_ts_classifier.utils.io import load_yaml
DATA_CFG = Path("configs/data.yaml").resolve()
MODEL_CFG = Path("configs/model.yaml").resolve()
TRAIN_CFG = Path("configs/train.yaml").resolve()
EVAL_CFG = Path("configs/eval.yaml").resolve()
print({"data": str(DATA_CFG), "model": str(MODEL_CFG), "train": str(TRAIN_CFG), "eval": str(EVAL_CFG)})
cfg_preview = {"data": load_yaml(DATA_CFG), "model": load_yaml(MODEL_CFG), "train": load_yaml(TRAIN_CFG), "eval": load_yaml(EVAL_CFG)}
print("Loaded configs.")
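
Because `Path(...).resolve()` resolves relative paths against the current working directory, starting the notebook from a subdirectory points these paths at the wrong location. A minimal pre-flight sketch, assuming the four config files listed above; `missing_configs` is a hypothetical helper, not part of the package:

```python
from pathlib import Path

def missing_configs(paths):
    """Return the subset of paths that are not existing regular files."""
    return [Path(p) for p in paths if not Path(p).is_file()]

# Config locations used in this notebook, relative to the working directory
config_paths = [
    Path("configs") / name
    for name in ("data.yaml", "model.yaml", "train.yaml", "eval.yaml")
]
missing = missing_configs(config_paths)
if missing:
    print("Missing config files (is the working directory the project root?):")
    for p in missing:
        print("  ", p.resolve())
else:
    print("All config files found.")
```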

In [None]:
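# Train via CLI (subprocess)
import subprocess, shlex, sys
# Use the notebook's own interpreter so the subprocess sees the same environment
cmd = f"{shlex.quote(sys.executable)} -m bert_ts_classifier.training.train --config configs/train.yaml"
# Example override: reduce epochs for quick runs
# cmd = cmd + " train.max_epochs=1 data.batch_size=8"
print("Running:", cmd)
res = subprocess.run(shlex.split(cmd), stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True)
print(res.stdout)
print("Exit code:", res.returncode)  # non-zero means the training command failed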
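
The commented override in the training cell can also be assembled as an argument list, which avoids string concatenation and quoting issues. A sketch, assuming the CLI accepts trailing `key=value` overrides as that comment suggests; `build_cli_args` is a hypothetical helper, not part of the package:

```python
import shlex
import sys

def build_cli_args(module, config, overrides=None):
    """Build an argv list for `python -m <module> --config <config> [key=value ...]`."""
    args = [sys.executable, "-m", module, "--config", str(config)]
    # Overrides are passed as trailing key=value tokens (assumed CLI convention)
    for key, value in (overrides or {}).items():
        args.append(f"{key}={value}")
    return args

args = build_cli_args(
    "bert_ts_classifier.training.train",
    "configs/train.yaml",
    overrides={"train.max_epochs": 1, "data.batch_size": 8},
)
print("Would run:", shlex.join(args))
```

The resulting list can be passed directly to `subprocess.run(args, ...)` without `shlex.split`.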

In [None]:
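# Evaluate via CLI (subprocess)
import subprocess, shlex, sys
# Use the notebook's own interpreter so the subprocess sees the same environment
cmd = f"{shlex.quote(sys.executable)} -m bert_ts_classifier.evaluation.eval --config configs/eval.yaml"
print("Running:", cmd)
res = subprocess.run(shlex.split(cmd), stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True)
print(res.stdout)
print("Exit code:", res.returncode)  # non-zero means the evaluation command failed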
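
Both CLI cells capture all output and print it only after the process exits, so a long training run shows nothing until it finishes. A sketch of a streaming alternative using only the standard library; `run_streaming` is a hypothetical helper, not part of the package:

```python
import subprocess
import sys

def run_streaming(args):
    """Run a command, echoing its combined stdout/stderr line by line; return the exit code."""
    proc = subprocess.Popen(
        args,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout, as the CLI cells do
        text=True,
        bufsize=1,  # line-buffered on the reading side
    )
    for line in proc.stdout:
        print(line, end="")
    return proc.wait()

# Demonstration with a trivial child process instead of the real training CLI
code = run_streaming([sys.executable, "-c", "print('hello from child')"])
print("Exit code:", code)
```

If the child is a Python process, its own output may still be block-buffered when piped; adding `-u` after the interpreter path disables that buffering.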

# Notes & tips
- For quick experiments, lower `train.max_epochs` to shorten the run and `data.batch_size` to reduce memory use, both via overrides.
- On GPU or MPS machines, the scripts auto-select an available device.
- See `PR_DRAFT.md` for command-line quickstart and CI details.
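
The device auto-selection mentioned in these notes happens inside the scripts, and their exact logic is not shown here. A common PyTorch pattern, given as an illustrative sketch rather than the package's implementation (`pick_device` is a hypothetical name), prefers CUDA, then MPS, then CPU:

```python
def pick_device() -> str:
    """Return 'cuda', 'mps', or 'cpu' depending on what PyTorch reports as available."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is nothing to select; fall back to CPU
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    # Older PyTorch builds have no MPS backend, so look it up defensively
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"

print({"device": pick_device()})
```

Running this in the notebook before training is a quick way to see which device a script following the same pattern would likely use.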