# Transformer Models (Clean Runner)

This notebook is a streamlined version of `advanced_model_bert.ipynb` for paper-ready experiments. It:
- Loads `data/english_clean.csv`
- Builds stratified 70/15/15 train/val/test splits
- Defines metrics (accuracy plus macro-averaged precision, recall, and F1)
- Fine-tunes multiple models (BERT/RoBERTa/XLM-RoBERTa, plus lightweight alternatives), optionally over several seeds
- Writes per-run JSON + a summary CSV to `models/bert_models/`

In [26]:
# Imports
import sys
import subprocess
import importlib
import json
import pickle
import csv
import gc
import time
import warnings
from pathlib import Path

import numpy as np
import pandas as pd

import torch

# Ensure accelerate is available BEFORE importing/using Trainer/TrainingArguments
def _ensure_accelerate():
    try:
        import accelerate
        return accelerate
    except ImportError:
        subprocess.check_call([sys.executable, "-m", "pip", "install", "-U", "accelerate>=0.26.0"])
        importlib.invalidate_caches()
        import accelerate
        return accelerate

accelerate = _ensure_accelerate()
print("⚙️ accelerate:", getattr(accelerate, "__version__", "unknown"))

# Defensive reload: if transformers was imported before accelerate was installed, or if a previous import was partial,
# modules may be missing required symbols in the current kernel session.
import transformers
import transformers.training_args as _training_args_mod
import transformers.trainer as _trainer_mod
import transformers.modeling_utils as _modeling_utils_mod

_needs_reload = (
    (not hasattr(_training_args_mod, "AcceleratorConfig"))
    or (not hasattr(_trainer_mod, "DataLoaderConfiguration"))
    or (not hasattr(_modeling_utils_mod, "extract_model_from_parallel"))
)
if _needs_reload:
    import transformers.utils.import_utils as _import_utils
    importlib.reload(_import_utils)
    importlib.reload(_training_args_mod)
    importlib.reload(_trainer_mod)
    importlib.reload(_modeling_utils_mod)

from datasets import Dataset, DatasetDict

# Import these directly from their defining modules to avoid stale references if transformers modules were reloaded.
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from transformers.training_args import TrainingArguments
from transformers.trainer import Trainer
from transformers.trainer_callback import EarlyStoppingCallback

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

warnings.filterwarnings('ignore')
pd.set_option('display.max_columns', None)
pd.set_option('display.precision', 4)

RANDOM_STATE = 42
np.random.seed(RANDOM_STATE)
torch.manual_seed(RANDOM_STATE)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print('üêç Python:', sys.executable)
print('üñ•Ô∏è Using device:', device)

‚öôÔ∏è accelerate: 1.12.0
üêç Python: c:\Anaconda\envs\py310\python.exe
üñ•Ô∏è Using device: cuda
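
The imports cell installs `accelerate` only when importing it fails. A lighter way to check for a package without importing it is sketched below; `is_importable` is a hypothetical helper name, not part of the notebook, and it only tells you that the module can be located, not that its version is recent enough.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if `module_name` can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Example: check a standard-library module and a name that should not exist.
print(is_importable("json"))
print(is_importable("surely_not_installed_pkg_12345"))
```

A check like this could run before the `pip install` fallback, so the notebook only shells out when the package is really missing.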


In [27]:
# Paths + load data
DATA_DIR = Path('./data')
MODELS_DIR = Path('./models')
BERT_MODELS_DIR = MODELS_DIR / 'bert_models'
BERT_MODELS_DIR.mkdir(parents=True, exist_ok=True)

INPUT_FILE = DATA_DIR / 'english_clean.csv'
assert INPUT_FILE.exists(), f'Missing file: {INPUT_FILE.resolve()}. Run labeling_and_preprocessing.ipynb first.'

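df = pd.read_csv(INPUT_FILE)

# Pick the text column first so rows missing that text are dropped too
# (otherwise NaN would later become the literal string 'nan' via astype(str)).
text_column = 'combined_text' if 'combined_text' in df.columns else 'clean_text'
df = df.dropna(subset=[text_column, 'clean_text', 'label']).copy()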
print('üìù Using text column:', text_column)
print('‚úÖ Loaded rows:', len(df))
df.head()

üìù Using text column: combined_text
‚úÖ Loaded rows: 1334


Unnamed: 0,title,url,root_category,subcategory,label,combined_text,clean_text,word_count,timestamp
0,USA FULLZ + DL + BACKGROUND REPORT MONTANA | N...,http://nemesis555nchzn2dogee6mlc7xxgeeshqirmh3...,Fraud,SSN/DOB/DL/PII,fraud,USA FULLZ + DL + BACKGROUND REPORT MONTANA | N...,usa fullz dl background report montana nemesis...,23,2023-01-11T13:00:52
1,1 x POWER PLANT XL autoflower seed | Nemesis M...,http://nemesis555nchzn2dogee6mlc7xxgeeshqirmh3...,Drugs,Cannabis,drug,1 x POWER PLANT XL autoflower seed | Nemesis M...,1 x power plant xl autoflower seed nemesis mar...,189,2023-01-11T13:00:01
2,Ship Marijuana Safely - Instant Delivery | Nem...,http://nemesis555nchzn2dogee6mlc7xxgeeshqirmh3...,Other,Guides and Tutorials,guide,Ship Marijuana Safely - Instant Delivery | Nem...,ship marijuana safely instant delivery nemesis...,142,2023-01-11T12:59:34
3,1x Feminized AUTOFLOWER AK-47 Cannabis Seeds |...,http://nemesis555nchzn2dogee6mlc7xxgeeshqirmh3...,Drugs,Cannabis,drug,1x Feminized AUTOFLOWER AK-47 Cannabis Seeds |...,1x feminized autoflower ak 47 cannabis seeds n...,178,2023-01-11T12:59:08
4,1 x KALASHNIKOV autoflower seed | Nemesis Market,http://nemesis555nchzn2dogee6mlc7xxgeeshqirmh3...,Drugs,Cannabis,drug,1 x KALASHNIKOV autoflower seed | Nemesis Mark...,1 x kalashnikov autoflower seed nemesis market...,170,2023-01-11T12:58:53


In [28]:
# Label encoding
label_encoder = LabelEncoder()
df['label_encoded'] = label_encoder.fit_transform(df['label'])

label2id = {label: idx for idx, label in enumerate(label_encoder.classes_)}
id2label = {idx: label for label, idx in label2id.items()}
num_labels = len(label_encoder.classes_)
print('üìä num_labels:', num_labels)

with open(BERT_MODELS_DIR / 'label_encoder.pkl', 'wb') as f:
    pickle.dump(label_encoder, f)
print('üíæ Saved:', (BERT_MODELS_DIR / 'label_encoder.pkl').as_posix())

üìä num_labels: 5
üíæ Saved: models/bert_models/label_encoder.pkl
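
To make the mapping concrete, here is a minimal pure-Python sketch of what the label-encoding cell produces. It relies on the fact that `LabelEncoder` orders `classes_` by sorted unique value. The toy list reuses the three labels visible in `df.head()`; the real dataset has five classes, so the actual ids will differ.

```python
# Toy labels in the order they appear in df.head() (illustrative only).
labels = ["fraud", "drug", "guide", "drug", "drug"]

# LabelEncoder sorts the unique labels; sorted(set(...)) mirrors that ordering.
classes = sorted(set(labels))
label2id = {label: idx for idx, label in enumerate(classes)}
id2label = {idx: label for label, idx in label2id.items()}

encoded = [label2id[label] for label in labels]
decoded = [id2label[idx] for idx in encoded]

print(label2id)
print(encoded)
```

Passing both `label2id` and `id2label` to the model config means saved checkpoints report class names rather than bare integers at inference time.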


In [29]:
# Train/Val/Test split + HF DatasetDict
X = df[text_column].astype(str).values
y = df['label_encoded'].values

X_train, X_temp, y_train, y_temp = train_test_split(
    X, y, test_size=0.30, random_state=RANDOM_STATE, stratify=y
)
X_val, X_test, y_val, y_test = train_test_split(
    X_temp, y_temp, test_size=0.50, random_state=RANDOM_STATE, stratify=y_temp
)

dataset_dict = DatasetDict({
    'train': Dataset.from_dict({'text': X_train.tolist(), 'label': y_train.tolist()}),
    'validation': Dataset.from_dict({'text': X_val.tolist(), 'label': y_val.tolist()}),
    'test': Dataset.from_dict({'text': X_test.tolist(), 'label': y_test.tolist()}),
})

print(dataset_dict)

DatasetDict({
    train: Dataset({
        features: ['text', 'label'],
        num_rows: 933
    })
    validation: Dataset({
        features: ['text', 'label'],
        num_rows: 200
    })
    test: Dataset({
        features: ['text', 'label'],
        num_rows: 201
    })
})
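
The split sizes above follow from the two `test_size` fractions. The sketch below reproduces the arithmetic, assuming scikit-learn's rule of rounding the held-out fraction up and giving the remainder to the other part; `split_sizes` is a hypothetical helper used only for this check.

```python
import math


def split_sizes(n_rows, holdout_frac=0.30, test_frac_of_holdout=0.50):
    """Return (train, val, test) row counts for the two-stage split."""
    # Stage 1: hold out 30% of all rows (rounded up); the rest is train.
    n_holdout = math.ceil(holdout_frac * n_rows)
    n_train = n_rows - n_holdout
    # Stage 2: half of the hold-out (rounded up) is test; the rest is validation.
    n_test = math.ceil(test_frac_of_holdout * n_holdout)
    n_val = n_holdout - n_test
    return n_train, n_val, n_test


print(split_sizes(1334))  # (933, 200, 201)
```

With 1334 rows this gives the 933/200/201 split shown in the `DatasetDict` output.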


In [30]:
# Model checkpoints + MAX_LENGTH
MODEL_CHECKPOINTS = {
    # Paper baselines (heavier)
    'BERT': 'bert-base-uncased',
    'RoBERTa': 'roberta-base',
    'XLM-RoBERTa': 'xlm-roberta-base',
    
    # Lightweight options (faster / less VRAM)
    'DistilBERT': 'distilbert-base-uncased',
    'DistilRoBERTa': 'distilroberta-base',
    'TinyBERT': 'prajjwal1/bert-tiny',
    'MiniBERT': 'prajjwal1/bert-mini',
    'SmallBERT': 'prajjwal1/bert-small',
    'ELECTRA-Small': 'google/electra-small-discriminator',
}

LIGHTWEIGHT_MODEL_NAMES = (
    'DistilBERT',
    'DistilRoBERTa',
    'TinyBERT',
    'MiniBERT',
    'SmallBERT',
    'ELECTRA-Small',
)

# Pick a tokenizer just to estimate token lengths (we'll re-tokenize per model during training)
_probe_tokenizer = AutoTokenizer.from_pretrained(MODEL_CHECKPOINTS['DistilBERT'])
text_lengths = [len(_probe_tokenizer.encode(t, truncation=True, max_length=512)) for t in X_train[:500]]
MAX_LENGTH = min(int(np.percentile(text_lengths, 95)), 512)
print('üéØ MAX_LENGTH (auto):', MAX_LENGTH)
print("üí° Tip: for lightweight training, pass max_length=128 in the runner.")
print('ü™∂ Lightweight models:', LIGHTWEIGHT_MODEL_NAMES)

üéØ MAX_LENGTH (auto): 303
üí° Tip: for lightweight training, pass max_length=128 in the runner.
ü™∂ Lightweight models: ('DistilBERT', 'DistilRoBERTa', 'TinyBERT', 'MiniBERT', 'SmallBERT', 'ELECTRA-Small')
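
The `MAX_LENGTH` rule is "95th percentile of token lengths, capped at 512". A minimal standard-library sketch of that rule follows; the token lengths are made up, and `percentile` is a hand-written linear-interpolation version intended to match NumPy's default method, not the notebook's own code.

```python
def percentile(values, q):
    """Linear-interpolation percentile (q in [0, 100])."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100.0
    lo = int(rank)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)


def pick_max_length(token_lengths, q=95, cap=512):
    """Truncate the percentile to an int and never exceed the model limit."""
    return min(int(percentile(token_lengths, q)), cap)


# Hypothetical token counts; one very long outlier.
lengths = [12, 40, 55, 63, 70, 88, 95, 120, 300, 900]
print(pick_max_length(lengths))        # 512 (percentile is 630, then capped)
print(pick_max_length(lengths, q=50))  # 79
```

In the notebook the probe tokenizer already truncates at 512 tokens, so the cap there is a safety net rather than something that changes the result.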


In [31]:
# Metrics
def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=1)
    acc = accuracy_score(labels, preds)
    precision, recall, f1, _ = precision_recall_fscore_support(labels, preds, average='macro', zero_division=0)
    return {'accuracy': acc, 'precision': precision, 'recall': recall, 'f1': f1}

print('✅ compute_metrics ready')

✅ compute_metrics ready
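
Because `compute_metrics` uses `average='macro'`, every class counts equally regardless of its size, so accuracy and F1 can diverge sharply on imbalanced data. The toy sketch below shows this with a hand-rolled macro average; `macro_prf` is a hypothetical helper that averages over a fixed `num_classes` and treats undefined ratios as 0, similar in spirit to `zero_division=0`.

```python
def macro_prf(labels, preds, num_classes):
    """Unweighted mean of per-class precision, recall, and F1."""
    precisions, recalls, f1s = [], [], []
    for c in range(num_classes):
        tp = sum(1 for y, p in zip(labels, preds) if y == c and p == c)
        fp = sum(1 for y, p in zip(labels, preds) if y != c and p == c)
        fn = sum(1 for y, p in zip(labels, preds) if y == c and p != c)
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    return (sum(precisions) / num_classes,
            sum(recalls) / num_classes,
            sum(f1s) / num_classes)


# Toy data: 8 examples of class 0, 2 of class 1, and a model that always says 0.
labels = [0] * 8 + [1] * 2
preds = [0] * 10
accuracy = sum(y == p for y, p in zip(labels, preds)) / len(labels)
precision, recall, f1 = macro_prf(labels, preds, num_classes=2)
print(accuracy, precision, recall, round(f1, 4))
```

Here accuracy is 0.8 while macro F1 is about 0.44, which is why `metric_for_best_model='f1'` is a stricter selection criterion than accuracy on this kind of data.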


## Multi-model fine-tuning
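The cell below runs the lightweight batch as written; the full paper run is the commented-out call on its last line (it is also executed in the following cell). Results go to `models/bert_models/`.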
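
The runner in this section appends one row per run to `transformer_runs_summary.csv`, with columns such as `model_name` and `eval_f1`. For multi-seed experiments, a per-model mean and standard deviation is usually what ends up in a paper table. The sketch below is one way to compute it with the standard library; `summarize_runs` is a hypothetical helper, and it skips rows whose metric cell is empty.

```python
import csv
import statistics
from collections import defaultdict
from pathlib import Path


def summarize_runs(csv_path, metric="eval_f1"):
    """Group summary rows by model_name and aggregate one metric."""
    by_model = defaultdict(list)
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            if row.get(metric):  # skip missing or empty values
                by_model[row["model_name"]].append(float(row[metric]))
    return {
        model: {
            "runs": len(vals),
            "mean": statistics.mean(vals),
            # Sample standard deviation needs at least two runs.
            "std": statistics.stdev(vals) if len(vals) > 1 else 0.0,
        }
        for model, vals in by_model.items()
    }


summary_csv = Path("models/bert_models/transformer_runs_summary.csv")
if summary_csv.exists():
    for model, stats in summarize_runs(summary_csv).items():
        print(f"{model}: F1 {stats['mean']:.4f} ± {stats['std']:.4f} (n={stats['runs']})")
```

Note that the summary file is append-only, so rows from earlier smoke tests are averaged in unless the file is cleared or filtered first.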

In [32]:
def _model_slug(name: str) -> str:
    return (name or '').strip().lower().replace(' ', '_').replace('-', '_').replace('/', '_')

def _append_csv_row(csv_path: Path, row: dict) -> None:
    csv_path.parent.mkdir(parents=True, exist_ok=True)
    write_header = not csv_path.exists()
    with open(csv_path, 'a', newline='', encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=list(row.keys()))
        if write_header:
            writer.writeheader()
        writer.writerow(row)

def finetune_one_transformer(
    model_name: str,
    *,
    seed: int = RANDOM_STATE,
    num_epochs: int = 3,
    train_batch_size: int = 16,
    eval_batch_size: int = 32,
    learning_rate: float = 2e-5,
    weight_decay: float = 0.01,
    max_length: int | None = None,
    max_train_samples: int | None = None,
    max_eval_samples: int | None = None,
    ) -> dict:
    if model_name not in MODEL_CHECKPOINTS:
        raise ValueError(f'Unknown model_name={model_name!r}. Choose one of: {list(MODEL_CHECKPOINTS.keys())}')

    checkpoint = MODEL_CHECKPOINTS[model_name]
    run_tag = f"{_model_slug(model_name)}_seed{seed}_{int(time.time())}"
    run_output_dir = BERT_MODELS_DIR / f"{run_tag}_finetuned"
    final_model_dir = BERT_MODELS_DIR / f"{run_tag}_final"
    results_file = BERT_MODELS_DIR / f"{run_tag}_results.json"
    summary_csv = BERT_MODELS_DIR / 'transformer_runs_summary.csv'

    run_max_length = int(max_length) if max_length is not None else int(MAX_LENGTH)

    print(f"\n🚀 Fine-tuning: {model_name}\n   checkpoint={checkpoint}\n   seed={seed}\n   max_length={run_max_length}\n   output={run_output_dir.name}\n")

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)

    # Trainer requires accelerate (already ensured in the imports cell)
    try:
        import accelerate  # noqa: F401
    except ImportError:
        raise ImportError("accelerate is required. Install with: pip install -U 'accelerate>=0.26.0'")

    def tok_fn(examples):
        return tokenizer(examples['text'], padding='max_length', truncation=True, max_length=run_max_length)

    tokenized = dataset_dict.map(tok_fn, batched=True, desc=f'Tokenizing ({model_name})')

    if max_train_samples is not None:
        tokenized['train'] = tokenized['train'].select(range(min(len(tokenized['train']), int(max_train_samples))))
    if max_eval_samples is not None:
        tokenized['validation'] = tokenized['validation'].select(range(min(len(tokenized['validation']), int(max_eval_samples))))
        tokenized['test'] = tokenized['test'].select(range(min(len(tokenized['test']), int(max_eval_samples))))

    model = AutoModelForSequenceClassification.from_pretrained(
        checkpoint, num_labels=num_labels, id2label=id2label, label2id=label2id
    )
    model.to(device)

    args = TrainingArguments(
        output_dir=str(run_output_dir),
        num_train_epochs=num_epochs,
        per_device_train_batch_size=train_batch_size,
        per_device_eval_batch_size=eval_batch_size,
        learning_rate=learning_rate,
        weight_decay=weight_decay,
        eval_strategy='epoch',
        save_strategy='epoch',
        load_best_model_at_end=True,
        metric_for_best_model='f1',
        logging_dir=str(run_output_dir / 'logs'),
        logging_steps=50,
        save_total_limit=2,
        seed=seed,
        fp16=torch.cuda.is_available(),
        report_to='none',
    )

    trainer = Trainer(
        model=model,
        args=args,
        train_dataset=tokenized['train'],
        eval_dataset=tokenized['validation'],
        processing_class=tokenizer,
        compute_metrics=compute_metrics,
        callbacks=[EarlyStoppingCallback(early_stopping_patience=2)],
    )

    train_result = trainer.train()
    val_results = trainer.evaluate(eval_dataset=tokenized['validation'])
    test_results = trainer.evaluate(eval_dataset=tokenized['test'])

    trainer.save_model(str(final_model_dir))
    tokenizer.save_pretrained(str(final_model_dir))

    results_dict = {
        'run_tag': run_tag,
        'model_name': model_name,
        'model_checkpoint': checkpoint,
        'validation_results': val_results,
        'test_results': test_results,
        'train_metrics': train_result.metrics,
        'training_args': {
            'num_epochs': num_epochs,
            'train_batch_size': train_batch_size,
            'eval_batch_size': eval_batch_size,
            'learning_rate': learning_rate,
            'weight_decay': weight_decay,
            'max_length': run_max_length,
            'seed': seed,
            'max_train_samples': max_train_samples,
            'max_eval_samples': max_eval_samples,
        },
        'saved_artifacts': {
            'output_dir': str(run_output_dir),
            'final_model_dir': str(final_model_dir),
        },
    }
    with open(results_file, 'w', encoding='utf-8') as f:
        json.dump(results_dict, f, indent=2)

    row = {
        'run_tag': run_tag,
        'model_name': model_name,
        'checkpoint': checkpoint,
        'seed': seed,
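        # Metrics below come from the held-out test split; Trainer.evaluate() prefixes its keys with 'eval_'.
        'eval_accuracy': test_results.get('eval_accuracy'),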
        'eval_precision': test_results.get('eval_precision'),
        'eval_recall': test_results.get('eval_recall'),
        'eval_f1': test_results.get('eval_f1'),
        'eval_loss': test_results.get('eval_loss'),
        'max_length': run_max_length,
        'num_epochs': num_epochs,
        'train_batch_size': train_batch_size,
        'learning_rate': learning_rate,
        'final_model_dir': str(final_model_dir),
        'results_file': str(results_file),
    }
    _append_csv_row(summary_csv, row)

    del trainer, model, tokenizer, tokenized
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()

    print('✅ Done:', run_tag)
    return results_dict

def finetune_many_transformers(model_names=('BERT','RoBERTa','XLM-RoBERTa'), seeds=(RANDOM_STATE,), **kwargs):
    results = []
    for m in model_names:
        for s in seeds:
            results.append(finetune_one_transformer(m, seed=s, **kwargs))
    return results

# Lightweight batch run (one shot): all lightweight models, same settings
results = finetune_many_transformers(
    model_names=LIGHTWEIGHT_MODEL_NAMES,
    seeds=(42,),
    num_epochs=1,
    train_batch_size=8,
    eval_batch_size=16,
    max_length=128,
    max_train_samples=2000,
    max_eval_samples=500,
)

# Full run (paper):
# results = finetune_many_transformers(model_names=('BERT','RoBERTa','XLM-RoBERTa'), seeds=(42,43), num_epochs=3, max_length=MAX_LENGTH)


🚀 Fine-tuning: DistilBERT
   checkpoint=distilbert-base-uncased
   seed=42
   max_length=128
   output=distilbert_seed42_1769876624_finetuned



Tokenizing (DistilBERT): 100%|██████████| 933/933 [00:00<00:00, 3141.78 examples/s]
Tokenizing (DistilBERT): 100%|██████████| 200/200 [00:00<00:00, 4672.30 examples/s]
Tokenizing (DistilBERT): 100%|██████████| 201/201 [00:00<00:00, 5418.16 examples/s]
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6303,0.536908,0.79,0.3219,0.397059,0.35118


✅ Done: distilbert_seed42_1769876624

🚀 Fine-tuning: DistilRoBERTa
   checkpoint=distilroberta-base
   seed=42
   max_length=128
   output=distilroberta_seed42_1769876756_finetuned



Tokenizing (DistilRoBERTa): 100%|██████████| 933/933 [00:00<00:00, 3835.68 examples/s]
Tokenizing (DistilRoBERTa): 100%|██████████| 200/200 [00:00<00:00, 3749.65 examples/s]
Tokenizing (DistilRoBERTa): 100%|██████████| 201/201 [00:00<00:00, 7191.28 examples/s]
Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at distilroberta-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5632,0.457039,0.82,0.690499,0.46563,0.471487


✅ Done: distilroberta_seed42_1769876756

🚀 Fine-tuning: TinyBERT
   checkpoint=prajjwal1/bert-tiny
   seed=42
   max_length=128
   output=tinybert_seed42_1769876917_finetuned



Tokenizing (TinyBERT): 100%|██████████| 933/933 [00:00<00:00, 5633.84 examples/s]
Tokenizing (TinyBERT): 100%|██████████| 200/200 [00:00<00:00, 5239.47 examples/s]
Tokenizing (TinyBERT): 100%|██████████| 201/201 [00:00<00:00, 6139.08 examples/s]
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at prajjwal1/bert-tiny and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5117,1.477178,0.645,0.257537,0.313995,0.279309


✅ Done: tinybert_seed42_1769876917

🚀 Fine-tuning: MiniBERT
   checkpoint=prajjwal1/bert-mini
   seed=42
   max_length=128
   output=minibert_seed42_1769876927_finetuned



Tokenizing (MiniBERT): 100%|██████████| 933/933 [00:00<00:00, 6294.78 examples/s]
Tokenizing (MiniBERT): 100%|██████████| 200/200 [00:00<00:00, 4711.14 examples/s]
Tokenizing (MiniBERT): 100%|██████████| 201/201 [00:00<00:00, 6322.74 examples/s]
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at prajjwal1/bert-mini and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2711,1.226802,0.495,0.230142,0.223529,0.170466


✅ Done: minibert_seed42_1769876927

🚀 Fine-tuning: SmallBERT
   checkpoint=prajjwal1/bert-small
   seed=42
   max_length=128
   output=smallbert_seed42_1769876949_finetuned



Tokenizing (SmallBERT): 100%|██████████| 933/933 [00:00<00:00, 6575.29 examples/s]
Tokenizing (SmallBERT): 100%|██████████| 200/200 [00:00<00:00, 5414.56 examples/s]
Tokenizing (SmallBERT): 100%|██████████| 201/201 [00:00<00:00, 5944.67 examples/s]
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at prajjwal1/bert-small and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7792,0.671602,0.785,0.319726,0.394861,0.348994


✅ Done: smallbert_seed42_1769876949

🚀 Fine-tuning: ELECTRA-Small
   checkpoint=google/electra-small-discriminator
   seed=42
   max_length=128
   output=electra_small_seed42_1769877226_finetuned



Tokenizing (ELECTRA-Small): 100%|██████████| 933/933 [00:00<00:00, 3387.32 examples/s]
Tokenizing (ELECTRA-Small): 100%|██████████| 200/200 [00:00<00:00, 4373.53 examples/s]
Tokenizing (ELECTRA-Small): 100%|██████████| 201/201 [00:00<00:00, 4838.75 examples/s]
Some weights of ElectraForSequenceClassification were not initialized from the model checkpoint at google/electra-small-discriminator and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4385,1.400171,0.455,0.091,0.2,0.125086


✅ Done: electra_small_seed42_1769877226
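
The ELECTRA-Small row (accuracy 0.455, precision 0.091, recall 0.2, F1 0.125086) has a recognisable pattern: with 5 classes, a macro recall of exactly 1/5 is what you get when every prediction lands on a single class. The arithmetic below is a consistency check only, under the assumption that the model predicted one class for all 200 validation rows and that class covers 45.5% of them; it does not prove that this is what happened.

```python
num_classes = 5
majority_share = 0.455  # reported accuracy; ASSUMED equal to the predicted class's share

# For the single predicted class: every row of that class is recovered,
# and the fraction of predictions that are correct equals the accuracy.
precision_major = majority_share
recall_major = 1.0
f1_major = 2 * precision_major * recall_major / (precision_major + recall_major)

# The other four classes are never predicted, so they contribute 0 to each macro mean.
macro_precision = precision_major / num_classes
macro_recall = recall_major / num_classes
macro_f1 = f1_major / num_classes

print(round(macro_precision, 3), round(macro_recall, 3), round(macro_f1, 6))
# 0.091 0.2 0.125086
```

If that reading is right, one epoch at these settings was not enough for this checkpoint to move off the majority class, so the lightweight numbers are better treated as a smoke test than as a model comparison.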


In [None]:
results = finetune_many_transformers(model_names=('BERT','RoBERTa','XLM-RoBERTa'), seeds=(42,43), num_epochs=3, max_length=MAX_LENGTH)


🚀 Fine-tuning: BERT
   checkpoint=bert-base-uncased
   seed=42
   max_length=303
   output=bert_seed42_1769878709_finetuned



Tokenizing (BERT): 100%|██████████| 933/933 [00:01<00:00, 631.86 examples/s]
Tokenizing (BERT): 100%|██████████| 200/200 [00:00<00:00, 1671.38 examples/s]
Tokenizing (BERT): 100%|██████████| 201/201 [00:00<00:00, 1733.70 examples/s]
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss
