# **Notebook D.** Classification Using a Transformer Model
----

One of the biggest recent developments in Natural Language Processing has come from the introduction of Transformer models (e.g. *BERT, ERNIE, RoBERTa*, etc.). The idea is that a model is pre-trained on a very large corpus to learn embedding representations of words. This "raw" model can be downloaded and then fine-tuned (retrained) on our own data.

There are several ways to implement these models. Researchers who are most comfortable with Python may start with the **Transformers** library by **HuggingFace** (https://huggingface.co/transformers/). This is the most flexible approach, but it also requires effort for researchers to implement.

An alternative is the **SimpleTransformers** library, which is a wrapper around this functionality. It provides an easy-to-use version of the transformer workflow (https://simpletransformers.ai) that is similar to the sklearn commands we have used thus far.

This package is not pre-installed in Colab. To install it, we need to perform the following:
 - 1) Run *!pip install simpletransformers* in the notebook below
 - 2) Comment out the install line by putting a # in front of it (e.g. *#!pip install simpletransformers*)
 - 3) Rerun all of the code from the top menu (or hit Ctrl+F9)

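The install-then-comment-out routine above can also be guarded by a check. The following is a minimal sketch using only the standard library; `is_installed` is a hypothetical helper name introduced here for illustration and is not part of Colab or SimpleTransformers.

```python
import importlib.util

def is_installed(package_name: str) -> bool:
    """Return True if a top-level package can be found on the import path."""
    return importlib.util.find_spec(package_name) is not None

# In a Colab cell, the install step could then be guarded, for example:
# if not is_installed("simpletransformers"):
#     !pip install simpletransformers
print(is_installed("json"))  # "json" is a standard-library module
```

With a guard like this, the install line would only run in a fresh runtime where the package is missing, so it would not need to be commented out by hand.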



Models: https://simpletransformers.ai/docs/classification-specifics/#supported-model-types

Model Options: https://simpletransformers.ai/docs/usage/#configuring-a-simple-transformers-model


In [None]:
!pip install Transformers --upgrade

Collecting Transformers
  Downloading transformers-4.56.1-py3-none-any.whl.metadata (42 kB)
Downloading transformers-4.56.1-py3-none-any.whl (11.6 MB)
Installing collected packages: Transformers
  Attempting uninstall: Transformers
    Found existing installation: transformers 4.56.0
    Uninstalling transformers-4.56.0:
      Successfully uninstalled transformers-4.56.0
Successfully installed Transformers-4.56.1


In [None]:
!pip install simpletransformers

Collecting simpletransformers
  Downloading simpletransformers-0.70.5-py3-none-any.whl.metadata (43 kB)
Collecting seqeval (from simpletransformers)
  Downloading seqeval-1.2.2.tar.gz (43 kB)
  Preparing metadata (setup.py) ... done
Collecting tensorboardx (from simpletransformers)
  Downloading tensorboardx-2.6.4-py3-none-any.whl.metadata (6.2 kB)
Collecting streamlit (from simpletransformers)
  Downloading streamlit-1.49.1-py3-none-any.whl.metadata (9.5 kB)
Collecting pydeck<1,>=0.8.0b4 (from streamlit->sim

In [None]:
# Load the Simple Transformers Package for Text Classification

from simpletransformers.classification import ClassificationModel

In [None]:
# Turn off warnings to avoid noisy messages that might cause confusion here
# Remove when testing your own code #
import warnings
warnings.filterwarnings("ignore")

In [None]:
import logging

logging.basicConfig(level=logging.ERROR)
transformers_logger = logging.getLogger("transformers")
transformers_logger.setLevel(logging.ERROR)

# D.1. Preamble: Load Packages
---

In [None]:
# General Packages #
import os
import pandas as pd
import numpy as np

# TQDM to Show Progress Bars #
from tqdm import tqdm
from tqdm.notebook import tqdm as tqdm_notebook

# SKLearn libraries for splitting sample and validation
from sklearn.model_selection import train_test_split, StratifiedShuffleSplit, StratifiedKFold, cross_val_predict
from sklearn.metrics import accuracy_score, roc_auc_score, precision_score, recall_score, f1_score, confusion_matrix, classification_report

# Additional Libraries that we are using only in this notebook
import torch
import gc

import shutil, json  # newly added imports
from glob import glob

In [None]:
# Mount Personal Google Drive on own Machine -- You have to follow the link to log in #
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
# === Root directory for saved models (recommended: place it under Google Drive) ===
SAVE_ROOT = "/content/drive/MyDrive/USPTO_data/Output/Models/Transformers"

def ensure_dir(p):
    os.makedirs(p, exist_ok=True)



# D.2. Load Training Data
----------------

We are going to use the data on Google Drive. The data are stored in a CSV file, so we load them as a Pandas DataFrame and then extract the main fields (patent IDs, the AI / non-AI indicator, and the patent abstract) into a NumPy array and plain lists, which are easier to use in later sections.

In [None]:
# Change to Working Directory with Training Data #
os.chdir("/content/drive/MyDrive/USPTO_data")

# Load Training Data #
TrainingData = pd.read_csv("./Training_Data/4K Patents - AI 20p.csv")

# Store Data in Lists for Text Classification #
IDs = np.array(TrainingData['app number'].values.tolist())
Abstract_Text = TrainingData['abstract'].values.tolist()
Classes = TrainingData['actual'].values.tolist()

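Before training, it can help to check how unbalanced the labels are, since this affects how accuracy should be read later. The sketch below is an illustration under stated assumptions: `summarize_labels` is a hypothetical helper, and `toy_classes` is made-up data standing in for the notebook's `Classes` list.

```python
from collections import Counter

def summarize_labels(labels):
    """Return (counts per class, share of positive labels) for 0/1 labels."""
    counts = Counter(labels)
    positive_share = counts.get(1, 0) / len(labels)
    return dict(counts), positive_share

# Toy labels standing in for the notebook's `Classes` list (illustrative only).
toy_classes = [0, 0, 0, 0, 1]
counts, share = summarize_labels(toy_classes)
print(counts, round(share, 3))
```

In the notebook itself, one would presumably call `summarize_labels(Classes)` right after loading the data.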
# D.3. Perform Classification with Transformer Model
---

As before, we are going to go through different models and compare their performance. Recall that transformer models are pre-trained by an external entity; we simply download them (pre-trained) from the web and fine-tune them for our particular application.

We download these models from Hugging Face. We use the simpletransformers library, which allows us to automatically download and train these models using the same basic command for each one. We simply need to specify the model architecture (e.g. BERT) and then the specific model, which usually refers to the type of data it was trained on (e.g. bert-base-uncased).

In the following link you can see the possible models that can be used by simple-transformers.

* https://simpletransformers.ai/docs/classification-specifics/#supported-model-types

This refers to the model type or architecture. There might be various models trained for different purposes that use the same architecture (e.g. SciBERT uses the BERT architecture). You can download the most common models directly from Hugging Face:

* https://huggingface.co/transformers/pretrained_models.html

You can also download community models here:

* https://huggingface.co/models

Below we define a list of the different transformer models we are going to use. These are listed in the following order: Name (e.g. BERT), Architecture (e.g. bert), Specific Model (e.g. bert-base-uncased)

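Because several specific models can share one architecture, it can be useful to see the list grouped by its second field. The following is a small sketch on an illustrative subset of the entries; `group_by_architecture` and `example_classifiers` are names introduced here, not part of the notebook or of simpletransformers.

```python
from collections import defaultdict

# Illustrative subset in the order: Name, Architecture, Specific Model.
example_classifiers = [
    ["BERT", "bert", "bert-base-uncased"],
    ["SciBERT", "bert", "allenai/scibert_scivocab_uncased"],
    ["RoBERTa", "roberta", "roberta-base"],
]

def group_by_architecture(classifiers):
    """Map each architecture string to the display names that use it."""
    grouped = defaultdict(list)
    for name, architecture, specific_model in classifiers:
        grouped[architecture].append(name)
    return dict(grouped)

by_arch = group_by_architecture(example_classifiers)
print(by_arch)
```

Here BERT and SciBERT land in the same group because both use the `bert` architecture and differ only in the specific pre-trained weights.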

In [None]:
# ===== Stable, no-freeze rewrite =====
import os, gc, torch, numpy as np, pandas as pd
from tqdm.auto import tqdm
os.environ["TOKENIZERS_PARALLELISM"] = "false"   # avoid hangs caused by parallel tokenization

from simpletransformers.classification import ClassificationModel
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_auc_score, confusion_matrix
from sklearn.utils.class_weight import compute_class_weight

In [None]:
# Use only model_type values supported by simpletransformers (see classification-specifics)
CLASSIFIERS = [
    ["BERT",        "bert",        "bert-base-uncased"],
    ["RoBERTa",     "roberta",     "roberta-base"],
    ["DeBERTa",     "deberta",     "microsoft/deberta-base"],          # v1 architecture, matches model_type='deberta'
    # ["Longformer",  "longformer",  "allenai/longformer-base-4096"],    # long sequences
    ["BigBird",     "bigbird",     "google/bigbird-roberta-base"],     # long sequences
    ["DistilBERT",  "distilbert",  "distilbert-base-uncased"],
    ["ALBERT",      "albert",      "albert-base-v2"],
    ["SciBERT",     "bert",        "allenai/scibert_scivocab_uncased"],# uses the BERT architecture
    ["PatentBERT",  "bert",        "anferico/bert-for-patents"],       # uses the BERT architecture
    ["BioBERT",     "bert",        "dmis-lab/biobert-v1.1"],           # uses the BERT architecture
    ["XLNet",       "xlnet",       "xlnet-base-cased"],
    ["ELECTRA",     "electra",     "google/electra-base-discriminator"],
]

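The cross-validation cell below nests two splits: an outer 5-fold split for testing and an inner 10% holdout of each training fold for early stopping. The arithmetic sketch below shows how many rows and batches that implies. It assumes the training file holds 4,000 rows (suggested by its name, "4K Patents", but not verified here); `fold_sizes` is a hypothetical helper, and its rounding of the holdout size is an approximation of what the splitting functions do.

```python
import math

def fold_sizes(n_rows, n_splits=5, holdout_share=0.10, train_batch=32, eval_batch=64):
    """Approximate per-fold row and batch counts for CV plus an early-stopping holdout."""
    test_rows = n_rows // n_splits                      # assumes rows divide evenly across folds
    outer_train_rows = n_rows - test_rows               # rows outside the test fold
    val_rows = round(outer_train_rows * holdout_share)  # rows held out for early stopping
    fit_rows = outer_train_rows - val_rows              # rows used for gradient updates
    return {
        "test_rows": test_rows,
        "val_rows": val_rows,
        "fit_rows": fit_rows,
        "train_batches_per_epoch": math.ceil(fit_rows / train_batch),
        "predict_batches": math.ceil(test_rows / eval_batch),
    }

sizes = fold_sizes(4000)  # 4000 is an assumed row count
print(sizes)
```

Under that assumption, each model is fitted on 2,880 abstracts per fold rather than on the full 3,200-row training fold, which is the price paid for having a validation set to stop on.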

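The cell below passes `class_weight="balanced"` to scikit-learn's `compute_class_weight`. The scikit-learn documentation describes this heuristic as `n_samples / (n_classes * count_of_class)`. The sketch below re-implements that formula in plain Python for illustration only; `balanced_class_weights` and the toy labels are assumptions introduced here, and the notebook itself keeps using the scikit-learn function.

```python
from collections import Counter

def balanced_class_weights(labels, classes=(0, 1)):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(classes)
    return [n_samples / (n_classes * counts[c]) for c in classes]

# Toy labels: 8 negatives and 2 positives (illustrative only).
toy_labels = [0] * 8 + [1] * 2
weights = balanced_class_weights(toy_labels)
print(weights)  # [0.625, 2.5]
```

The rarer class receives the larger weight, so misclassifying a positive example costs more in the loss than misclassifying a negative one.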
In [40]:
# ========= Configuration =========
NUM_OF_SPLITS = 5
Reweight = True

RESULTS, Classified_Values = [], []

BASE_ARGS = {
    "num_train_epochs": 10,
    "train_batch_size": 32,
    "eval_batch_size": 64,
    "max_seq_length": 512,
    "fp16": False,
    "overwrite_output_dir": True,

    # -- Early stopping + save best model --
    "use_early_stopping": True,
    "early_stopping_patience": 1,
    "early_stopping_metric": "eval_loss",
    "evaluate_during_training": True,
    "early_stopping_consider_epochs": True,
    "early_stopping_verbose": True,
    "save_best_model": True,          # write the best weights to best_model_dir automatically during training
    "save_eval_checkpoints": False,
    "save_model_every_epoch": False,

    # -- Stability --
    "reprocess_input_data": True,
    "no_save": False,                 # allow saving (works together with save_best_model above)
    "no_cache": True,
    "silent": False,
    "logging_steps": 20,
    "process_count": 1,
    "use_multiprocessing": False,
    "use_multiprocessing_for_evaluation": False,
    "dataloader_num_workers": 0,
}

def build_args_for(model_type, model_name):
    args = BASE_ARGS.copy()
    args["do_lower_case"] = ("uncased" in model_name.lower())
    return args

def logits_to_prob(raw_outputs):
    """Convert raw logits to positive-class probabilities (for AUC). Handles shapes (N, 2) and (N,)."""
    raw = np.array(raw_outputs)
    raw = np.squeeze(raw)
    if raw.ndim == 2 and raw.shape[1] == 2:
        raw = raw - raw.max(axis=1, keepdims=True)
        exp = np.exp(raw)
        probs = exp / exp.sum(axis=1, keepdims=True)
        return probs[:, 1]
    elif raw.ndim == 1:
        z = raw.squeeze()
        return 1.0 / (1.0 + np.exp(-z))
    else:
        raise ValueError(f"Unexpected raw_outputs shape: {raw.shape}")

# -- Ensure labels are one-dimensional 0/1 --
Y0 = np.asarray(Classes)
assert Y0.ndim == 1, f"Labels must be 1D; got shape {Y0.shape}"
u = np.unique(Y0)
assert set(u).issubset({0, 1}), f"Labels must be 0/1; got values {u}"
LABELS_1D = Y0.astype(int)

# ========= Main loop: each model x K folds =========
use_cuda = torch.cuda.is_available()
import shutil, json

for name, model_type, model_name in tqdm(CLASSIFIERS, desc="Evaluating Classifiers", leave=True):
    y_actual, y_predicted, id_s = [], [], []
    prob_pos_all = []

    # Record each fold's AUC and best_model directory, used to promote the final model
    fold_auc_list = []  # [(fold_idx, auc_value, best_dir), ...]

    kf = StratifiedKFold(n_splits=NUM_OF_SPLITS, shuffle=True, random_state=1)
    for fold_idx, (train_i, test_i) in enumerate(
        tqdm(kf.split(Abstract_Text, LABELS_1D), desc=f"{name} | Cross-Validating",
             leave=False, total=NUM_OF_SPLITS)
    ):
        # -- Split the data for this fold --
        X = np.array(Abstract_Text); Y = LABELS_1D
        train_X, test_X = X[train_i], X[test_i]
        train_y, test_y = Y[train_i], Y[test_i]
        Train_IDs, Test_IDs = IDs[train_i], IDs[test_i]

        # -- DataFrame format required by Simple Transformers --
        train_df = pd.DataFrame({"text": list(train_X), "labels": list(train_y)})

        # -- Hold out 10% of the training fold as an internal validation set for early stopping --
        es_train_df, es_val_df = train_test_split(
            train_df, test_size=0.10, stratify=train_df["labels"], random_state=42
        )

        # -- Class imbalance: compute class weights (used by models that support them) --
        weight_vec = None
        if Reweight:
            cls_w = compute_class_weight(class_weight="balanced",
                                         classes=np.array([0, 1]),
                                         y=es_train_df["labels"])
            weight_vec = cls_w.tolist()

        # === Separate save directory for this model and fold ===
        model_root = os.path.join(SAVE_ROOT, name.replace(" ", "_"))
        fold_save_dir = os.path.join(model_root, f"fold_{fold_idx+1}")
        best_dir = os.path.join(fold_save_dir, "best_model")
        ensure_dir(best_dir)

        args = build_args_for(model_type, model_name)
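        args.update({
            "output_dir": fold_save_dir,   # training output (final weights, etc.)
            "best_model_dir": best_dir,    # the best checkpoint is saved here automatically
        })

        # -- Create the model (if class weights are reported as unsupported, retry without weights) --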
        try:
            model = ClassificationModel(
                model_type, model_name,
                weight=weight_vec,
                args=args,
                use_cuda=use_cuda
            )
        except ValueError as e:
            if "does not currently support class weights" in str(e).lower():
                model = ClassificationModel(
                    model_type, model_name,
                    weight=None,
                    args=args,
                    use_cuda=use_cuda
                )
            else:
                raise

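        # -- Train (with eval_df, enabling early stopping and best-model saving) --
        model.train_model(es_train_df, eval_df=es_val_df)

        # -- Explicitly save another copy of the current weights to the fold directory (including tokenizer/config) --
        model.save_model(fold_save_dir)

        # -- Predict on this fold's test set and compute the fold's AUC --
        preds_fold, raw_outputs_fold = model.predict(list(test_X))
        prob_pos_fold = logits_to_prob(raw_outputs_fold)
        auc_fold = roc_auc_score(test_y, prob_pos_fold)
        fold_auc_list.append((fold_idx+1, float(auc_fold), best_dir))

        # -- Accumulate into the overall metric containers (consistent with the original script) --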
        id_s.extend(list(Test_IDs))
        y_actual.extend(list(test_y))
        y_predicted.extend(list(preds_fold))
        prob_pos_all.extend(list(prob_pos_fold))

        del model
        gc.collect(); torch.cuda.empty_cache()

    # ========== Pooled five-fold metrics (same output format as the original script) ==========
    y_arr = np.asarray(y_actual); p_arr = np.asarray(prob_pos_all)

    Share = np.round(np.mean(y_predicted), 3)
    Accuracy = accuracy_score(y_arr, y_predicted)
    ROC = roc_auc_score(y_arr, p_arr)  # compute AUC from probabilities
    Precision = precision_score(y_arr, y_predicted, zero_division=0)
    Recall = recall_score(y_arr, y_predicted, zero_division=0)
    F1 = f1_score(y_arr, y_predicted, zero_division=0)

    tn, fp, fn, tp = confusion_matrix(y_arr, y_predicted).ravel()
    # Follows the original table headers (note: these differ from standard naming)
    FN = np.round(tn/(tn+fn), 3)
    FP = np.round(fp/(fp+tp), 3)
    TN = np.round(fn/(tn+fn), 3)
    TP = np.round(tp/(tp+fp), 3)

    RESULTS.append([name, Share, TP, FN, FP, TN,
                    np.round(Accuracy, 3),
                    np.round(ROC, 3),
                    np.round(Precision, 3),
                    np.round(Recall, 3),
                    np.round(F1, 3)])

    Classified_Values.append(list(zip(len(id_s)*[name], id_s, y_actual, y_predicted)))

    # ========== Promote the "final model": copy the best_model of the highest-AUC fold to final_from_best_fold/ ==========
    if fold_auc_list:
        best_fold_idx, best_fold_auc, best_fold_dir = max(fold_auc_list, key=lambda x: x[1])
        final_dir = os.path.join(SAVE_ROOT, name.replace(" ", "_"), "final_from_best_fold")
        ensure_dir(os.path.dirname(final_dir))
        shutil.copytree(best_fold_dir, final_dir, dirs_exist_ok=True)

        # Record each fold's AUC for auditing / traceability
        with open(os.path.join(SAVE_ROOT, name.replace(" ", "_"), "cv_fold_metrics.json"), "w") as f:
            json.dump(
                [{"fold": f, "AUC": a, "path": p} for f, a, p in fold_auc_list],
                f, indent=2, ensure_ascii=False
            )

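The `logits_to_prob` helper in the cell above applies a softmax when the model returns two logits per example and a sigmoid when it returns one. For two classes these are the same calculation: the softmax probability of class 1 equals the sigmoid of the difference between the two logits. The sketch below checks this on made-up logits using only the standard library; `positive_probability` and `sigmoid` are illustrative names, not notebook functions.

```python
import math

def positive_probability(logit_neg, logit_pos):
    """Numerically stable two-class softmax; returns P(class 1)."""
    shift = max(logit_neg, logit_pos)      # subtract the max to avoid overflow
    exp_neg = math.exp(logit_neg - shift)
    exp_pos = math.exp(logit_pos - shift)
    return exp_pos / (exp_neg + exp_pos)

def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))

# Made-up logits for one example: [class 0, class 1].
p = positive_probability(-1.2, 0.8)
print(round(p, 4), round(sigmoid(0.8 - (-1.2)), 4))
```

Subtracting the maximum before exponentiating does not change the result, but it keeps very large logits from overflowing, which is why the notebook's helper does the same.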

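The columns named `FN`, `FP`, `TN`, and `TP` in the cell above are not raw counts. Each is a share conditional on the predicted class, which is why the code comment warns that the names differ from the standard ones. The sketch below computes the same four ratios under their conventional names, using made-up counts; `predictive_rates` is a hypothetical helper introduced for illustration.

```python
def predictive_rates(tn, fp, fn, tp):
    """Ratios conditional on the predicted class, from confusion-matrix counts."""
    return {
        "negative_predictive_value": tn / (tn + fn),  # stored as `FN` in the notebook
        "false_discovery_rate": fp / (fp + tp),       # stored as `FP` in the notebook
        "false_omission_rate": fn / (tn + fn),        # stored as `TN` in the notebook
        "positive_predictive_value": tp / (tp + fp),  # stored as `TP` in the notebook
    }

# Made-up counts for illustration only.
rates = predictive_rates(tn=70, fp=10, fn=5, tp=15)
for rate_name, value in rates.items():
    print(rate_name, round(value, 3))
```

The two ratios for predicted negatives sum to one, as do the two for predicted positives, and the positive predictive value is the same quantity that `precision_score` reports.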
[Progress-bar output omitted: 11 classifiers, each cross-validated over 5 folds. In every fold the model is trained on 2,880 examples (90 batches per epoch, up to 10 epochs with early stopping), validated on 320 examples after each epoch, and then used to predict 800 held-out examples in 13 batches.]

Running Epoch 4 of 10:   0%|          | 0/90 [00:00<?, ?it/s]

Map:   0%|          | 0/320 [00:00<?, ? examples/s]

Map:   0%|          | 0/800 [00:00<?, ? examples/s]

Predicting:   0%|          | 0/13 [00:00<?, ?it/s]

SciBERT | Cross-Validating:   0%|          | 0/5 [00:00<?, ?it/s]

Map:   0%|          | 0/2880 [00:00<?, ? examples/s]

Epoch:   0%|          | 0/10 [00:00<?, ?it/s]

Running Epoch 1 of 10:   0%|          | 0/90 [00:00<?, ?it/s]

Map:   0%|          | 0/320 [00:00<?, ? examples/s]

Running Epoch 2 of 10:   0%|          | 0/90 [00:00<?, ?it/s]

Map:   0%|          | 0/320 [00:00<?, ? examples/s]

Running Epoch 3 of 10:   0%|          | 0/90 [00:00<?, ?it/s]

Map:   0%|          | 0/320 [00:00<?, ? examples/s]

Running Epoch 4 of 10:   0%|          | 0/90 [00:00<?, ?it/s]

Map:   0%|          | 0/320 [00:00<?, ? examples/s]

Map:   0%|          | 0/800 [00:00<?, ? examples/s]

Predicting:   0%|          | 0/13 [00:00<?, ?it/s]

Map:   0%|          | 0/2880 [00:00<?, ? examples/s]

Epoch:   0%|          | 0/10 [00:00<?, ?it/s]

Running Epoch 1 of 10:   0%|          | 0/90 [00:00<?, ?it/s]

Map:   0%|          | 0/320 [00:00<?, ? examples/s]

Running Epoch 2 of 10:   0%|          | 0/90 [00:00<?, ?it/s]

Map:   0%|          | 0/320 [00:00<?, ? examples/s]

Running Epoch 3 of 10:   0%|          | 0/90 [00:00<?, ?it/s]

Map:   0%|          | 0/320 [00:00<?, ? examples/s]

Running Epoch 4 of 10:   0%|          | 0/90 [00:00<?, ?it/s]

Map:   0%|          | 0/320 [00:00<?, ? examples/s]

Map:   0%|          | 0/800 [00:00<?, ? examples/s]

Predicting:   0%|          | 0/13 [00:00<?, ?it/s]

Map:   0%|          | 0/2880 [00:00<?, ? examples/s]

Epoch:   0%|          | 0/10 [00:00<?, ?it/s]

Running Epoch 1 of 10:   0%|          | 0/90 [00:00<?, ?it/s]

Map:   0%|          | 0/320 [00:00<?, ? examples/s]

Running Epoch 2 of 10:   0%|          | 0/90 [00:00<?, ?it/s]

Map:   0%|          | 0/320 [00:00<?, ? examples/s]

Running Epoch 3 of 10:   0%|          | 0/90 [00:00<?, ?it/s]

Map:   0%|          | 0/320 [00:00<?, ? examples/s]

Running Epoch 4 of 10:   0%|          | 0/90 [00:00<?, ?it/s]

Map:   0%|          | 0/320 [00:00<?, ? examples/s]

Running Epoch 5 of 10:   0%|          | 0/90 [00:00<?, ?it/s]

Map:   0%|          | 0/320 [00:00<?, ? examples/s]

Map:   0%|          | 0/800 [00:00<?, ? examples/s]

Predicting:   0%|          | 0/13 [00:00<?, ?it/s]

Map:   0%|          | 0/2880 [00:00<?, ? examples/s]

Epoch:   0%|          | 0/10 [00:00<?, ?it/s]

Running Epoch 1 of 10:   0%|          | 0/90 [00:00<?, ?it/s]

Map:   0%|          | 0/320 [00:00<?, ? examples/s]

Running Epoch 2 of 10:   0%|          | 0/90 [00:00<?, ?it/s]

Map:   0%|          | 0/320 [00:00<?, ? examples/s]

Map:   0%|          | 0/800 [00:00<?, ? examples/s]

Predicting:   0%|          | 0/13 [00:00<?, ?it/s]

Map:   0%|          | 0/2880 [00:00<?, ? examples/s]

Epoch:   0%|          | 0/10 [00:00<?, ?it/s]

Running Epoch 1 of 10:   0%|          | 0/90 [00:00<?, ?it/s]

Map:   0%|          | 0/320 [00:00<?, ? examples/s]

Running Epoch 2 of 10:   0%|          | 0/90 [00:00<?, ?it/s]

Map:   0%|          | 0/320 [00:00<?, ? examples/s]

Map:   0%|          | 0/800 [00:00<?, ? examples/s]

Predicting:   0%|          | 0/13 [00:00<?, ?it/s]

PatentBERT | Cross-Validating:   0%|          | 0/5 [00:00<?, ?it/s]

Map:   0%|          | 0/2880 [00:00<?, ? examples/s]

Epoch:   0%|          | 0/10 [00:00<?, ?it/s]

Running Epoch 1 of 10:   0%|          | 0/90 [00:00<?, ?it/s]

Map:   0%|          | 0/320 [00:00<?, ? examples/s]

Running Epoch 2 of 10:   0%|          | 0/90 [00:00<?, ?it/s]

Map:   0%|          | 0/320 [00:00<?, ? examples/s]

Map:   0%|          | 0/800 [00:00<?, ? examples/s]

Predicting:   0%|          | 0/13 [00:00<?, ?it/s]

Map:   0%|          | 0/2880 [00:00<?, ? examples/s]

Epoch:   0%|          | 0/10 [00:00<?, ?it/s]

Running Epoch 1 of 10:   0%|          | 0/90 [00:00<?, ?it/s]

Map:   0%|          | 0/320 [00:00<?, ? examples/s]

Running Epoch 2 of 10:   0%|          | 0/90 [00:00<?, ?it/s]

Map:   0%|          | 0/320 [00:00<?, ? examples/s]

Map:   0%|          | 0/800 [00:00<?, ? examples/s]

Predicting:   0%|          | 0/13 [00:00<?, ?it/s]

Map:   0%|          | 0/2880 [00:00<?, ? examples/s]

Epoch:   0%|          | 0/10 [00:00<?, ?it/s]

Running Epoch 1 of 10:   0%|          | 0/90 [00:00<?, ?it/s]

Map:   0%|          | 0/320 [00:00<?, ? examples/s]

Running Epoch 2 of 10:   0%|          | 0/90 [00:00<?, ?it/s]

Map:   0%|          | 0/320 [00:00<?, ? examples/s]

Running Epoch 3 of 10:   0%|          | 0/90 [00:00<?, ?it/s]

Map:   0%|          | 0/320 [00:00<?, ? examples/s]

Running Epoch 4 of 10:   0%|          | 0/90 [00:00<?, ?it/s]

Map:   0%|          | 0/320 [00:00<?, ? examples/s]

Running Epoch 5 of 10:   0%|          | 0/90 [00:00<?, ?it/s]

Map:   0%|          | 0/320 [00:00<?, ? examples/s]

Map:   0%|          | 0/800 [00:00<?, ? examples/s]

Predicting:   0%|          | 0/13 [00:00<?, ?it/s]

Map:   0%|          | 0/2880 [00:00<?, ? examples/s]

Epoch:   0%|          | 0/10 [00:00<?, ?it/s]

Running Epoch 1 of 10:   0%|          | 0/90 [00:00<?, ?it/s]

Map:   0%|          | 0/320 [00:00<?, ? examples/s]

Running Epoch 2 of 10:   0%|          | 0/90 [00:00<?, ?it/s]

Map:   0%|          | 0/320 [00:00<?, ? examples/s]

Running Epoch 3 of 10:   0%|          | 0/90 [00:00<?, ?it/s]

Map:   0%|          | 0/320 [00:00<?, ? examples/s]

Running Epoch 4 of 10:   0%|          | 0/90 [00:00<?, ?it/s]

Map:   0%|          | 0/320 [00:00<?, ? examples/s]

Map:   0%|          | 0/800 [00:00<?, ? examples/s]

Predicting:   0%|          | 0/13 [00:00<?, ?it/s]

Map:   0%|          | 0/2880 [00:00<?, ? examples/s]

Epoch:   0%|          | 0/10 [00:00<?, ?it/s]

Running Epoch 1 of 10:   0%|          | 0/90 [00:00<?, ?it/s]

Map:   0%|          | 0/320 [00:00<?, ? examples/s]

Running Epoch 2 of 10:   0%|          | 0/90 [00:00<?, ?it/s]

Map:   0%|          | 0/320 [00:00<?, ? examples/s]

Map:   0%|          | 0/800 [00:00<?, ? examples/s]

Predicting:   0%|          | 0/13 [00:00<?, ?it/s]

BioBERT | Cross-Validating:   0%|          | 0/5 [00:00<?, ?it/s]

Map:   0%|          | 0/2880 [00:00<?, ? examples/s]

Epoch:   0%|          | 0/10 [00:00<?, ?it/s]

Running Epoch 1 of 10:   0%|          | 0/90 [00:00<?, ?it/s]

Map:   0%|          | 0/320 [00:00<?, ? examples/s]

Running Epoch 2 of 10:   0%|          | 0/90 [00:00<?, ?it/s]

Map:   0%|          | 0/320 [00:00<?, ? examples/s]

Running Epoch 3 of 10:   0%|          | 0/90 [00:00<?, ?it/s]

Map:   0%|          | 0/320 [00:00<?, ? examples/s]

Running Epoch 4 of 10:   0%|          | 0/90 [00:00<?, ?it/s]

Map:   0%|          | 0/320 [00:00<?, ? examples/s]

Running Epoch 5 of 10:   0%|          | 0/90 [00:00<?, ?it/s]

Map:   0%|          | 0/320 [00:00<?, ? examples/s]

Map:   0%|          | 0/800 [00:00<?, ? examples/s]

Predicting:   0%|          | 0/13 [00:00<?, ?it/s]

Map:   0%|          | 0/2880 [00:00<?, ? examples/s]

Epoch:   0%|          | 0/10 [00:00<?, ?it/s]

Running Epoch 1 of 10:   0%|          | 0/90 [00:00<?, ?it/s]

Map:   0%|          | 0/320 [00:00<?, ? examples/s]

Running Epoch 2 of 10:   0%|          | 0/90 [00:00<?, ?it/s]

Map:   0%|          | 0/320 [00:00<?, ? examples/s]

Map:   0%|          | 0/800 [00:00<?, ? examples/s]

Predicting:   0%|          | 0/13 [00:00<?, ?it/s]

Map:   0%|          | 0/2880 [00:00<?, ? examples/s]

Epoch:   0%|          | 0/10 [00:00<?, ?it/s]

Running Epoch 1 of 10:   0%|          | 0/90 [00:00<?, ?it/s]

Map:   0%|          | 0/320 [00:00<?, ? examples/s]

Running Epoch 2 of 10:   0%|          | 0/90 [00:00<?, ?it/s]

Map:   0%|          | 0/320 [00:00<?, ? examples/s]

Map:   0%|          | 0/800 [00:00<?, ? examples/s]

Predicting:   0%|          | 0/13 [00:00<?, ?it/s]

Map:   0%|          | 0/2880 [00:00<?, ? examples/s]

Epoch:   0%|          | 0/10 [00:00<?, ?it/s]

Running Epoch 1 of 10:   0%|          | 0/90 [00:00<?, ?it/s]

Map:   0%|          | 0/320 [00:00<?, ? examples/s]

Running Epoch 2 of 10:   0%|          | 0/90 [00:00<?, ?it/s]

Map:   0%|          | 0/320 [00:00<?, ? examples/s]

Running Epoch 3 of 10:   0%|          | 0/90 [00:00<?, ?it/s]

Map:   0%|          | 0/320 [00:00<?, ? examples/s]

Running Epoch 4 of 10:   0%|          | 0/90 [00:00<?, ?it/s]

Map:   0%|          | 0/320 [00:00<?, ? examples/s]

Running Epoch 5 of 10:   0%|          | 0/90 [00:00<?, ?it/s]

Map:   0%|          | 0/320 [00:00<?, ? examples/s]

Map:   0%|          | 0/800 [00:00<?, ? examples/s]

Predicting:   0%|          | 0/13 [00:00<?, ?it/s]

Map:   0%|          | 0/2880 [00:00<?, ? examples/s]

Epoch:   0%|          | 0/10 [00:00<?, ?it/s]

Running Epoch 1 of 10:   0%|          | 0/90 [00:00<?, ?it/s]

Map:   0%|          | 0/320 [00:00<?, ? examples/s]

Running Epoch 2 of 10:   0%|          | 0/90 [00:00<?, ?it/s]

Map:   0%|          | 0/320 [00:00<?, ? examples/s]

Running Epoch 3 of 10:   0%|          | 0/90 [00:00<?, ?it/s]

Map:   0%|          | 0/320 [00:00<?, ? examples/s]

Running Epoch 4 of 10:   0%|          | 0/90 [00:00<?, ?it/s]

Map:   0%|          | 0/320 [00:00<?, ? examples/s]

Map:   0%|          | 0/800 [00:00<?, ? examples/s]

Predicting:   0%|          | 0/13 [00:00<?, ?it/s]

XLNet | Cross-Validating:   0%|          | 0/5 [00:00<?, ?it/s]

Map:   0%|          | 0/2880 [00:00<?, ? examples/s]

Epoch:   0%|          | 0/10 [00:00<?, ?it/s]

Running Epoch 1 of 10:   0%|          | 0/90 [00:00<?, ?it/s]

Map:   0%|          | 0/320 [00:00<?, ? examples/s]

Running Epoch 2 of 10:   0%|          | 0/90 [00:00<?, ?it/s]

Map:   0%|          | 0/320 [00:00<?, ? examples/s]

Map:   0%|          | 0/800 [00:00<?, ? examples/s]

Predicting:   0%|          | 0/13 [00:00<?, ?it/s]

Map:   0%|          | 0/2880 [00:00<?, ? examples/s]

Epoch:   0%|          | 0/10 [00:00<?, ?it/s]

Running Epoch 1 of 10:   0%|          | 0/90 [00:00<?, ?it/s]

Map:   0%|          | 0/320 [00:00<?, ? examples/s]

Running Epoch 2 of 10:   0%|          | 0/90 [00:00<?, ?it/s]

Map:   0%|          | 0/320 [00:00<?, ? examples/s]

Map:   0%|          | 0/800 [00:00<?, ? examples/s]

Predicting:   0%|          | 0/13 [00:00<?, ?it/s]

Map:   0%|          | 0/2880 [00:00<?, ? examples/s]

Epoch:   0%|          | 0/10 [00:00<?, ?it/s]

Running Epoch 1 of 10:   0%|          | 0/90 [00:00<?, ?it/s]

Map:   0%|          | 0/320 [00:00<?, ? examples/s]

Running Epoch 2 of 10:   0%|          | 0/90 [00:00<?, ?it/s]

Map:   0%|          | 0/320 [00:00<?, ? examples/s]

Running Epoch 3 of 10:   0%|          | 0/90 [00:00<?, ?it/s]

Map:   0%|          | 0/320 [00:00<?, ? examples/s]

Running Epoch 4 of 10:   0%|          | 0/90 [00:00<?, ?it/s]

Map:   0%|          | 0/320 [00:00<?, ? examples/s]

Running Epoch 5 of 10:   0%|          | 0/90 [00:00<?, ?it/s]

Map:   0%|          | 0/320 [00:00<?, ? examples/s]

Map:   0%|          | 0/800 [00:00<?, ? examples/s]

Predicting:   0%|          | 0/13 [00:00<?, ?it/s]

Map:   0%|          | 0/2880 [00:00<?, ? examples/s]

Epoch:   0%|          | 0/10 [00:00<?, ?it/s]

Running Epoch 1 of 10:   0%|          | 0/90 [00:00<?, ?it/s]

Map:   0%|          | 0/320 [00:00<?, ? examples/s]

Running Epoch 2 of 10:   0%|          | 0/90 [00:00<?, ?it/s]

Map:   0%|          | 0/320 [00:00<?, ? examples/s]

Map:   0%|          | 0/800 [00:00<?, ? examples/s]

Predicting:   0%|          | 0/13 [00:00<?, ?it/s]

Map:   0%|          | 0/2880 [00:00<?, ? examples/s]

Epoch:   0%|          | 0/10 [00:00<?, ?it/s]

Running Epoch 1 of 10:   0%|          | 0/90 [00:00<?, ?it/s]

Map:   0%|          | 0/320 [00:00<?, ? examples/s]

Running Epoch 2 of 10:   0%|          | 0/90 [00:00<?, ?it/s]

Map:   0%|          | 0/320 [00:00<?, ? examples/s]

Running Epoch 3 of 10:   0%|          | 0/90 [00:00<?, ?it/s]

Map:   0%|          | 0/320 [00:00<?, ? examples/s]

Running Epoch 4 of 10:   0%|          | 0/90 [00:00<?, ?it/s]

Map:   0%|          | 0/320 [00:00<?, ? examples/s]

Map:   0%|          | 0/800 [00:00<?, ? examples/s]

Predicting:   0%|          | 0/13 [00:00<?, ?it/s]

ELECTRA | Cross-Validating:   0%|          | 0/5 [00:00<?, ?it/s]

Map:   0%|          | 0/2880 [00:00<?, ? examples/s]

Epoch:   0%|          | 0/10 [00:00<?, ?it/s]

Running Epoch 1 of 10:   0%|          | 0/90 [00:00<?, ?it/s]

Map:   0%|          | 0/320 [00:00<?, ? examples/s]

Running Epoch 2 of 10:   0%|          | 0/90 [00:00<?, ?it/s]

Map:   0%|          | 0/320 [00:00<?, ? examples/s]

Map:   0%|          | 0/800 [00:00<?, ? examples/s]

Predicting:   0%|          | 0/13 [00:00<?, ?it/s]

Map:   0%|          | 0/2880 [00:00<?, ? examples/s]

Epoch:   0%|          | 0/10 [00:00<?, ?it/s]

Running Epoch 1 of 10:   0%|          | 0/90 [00:00<?, ?it/s]

Map:   0%|          | 0/320 [00:00<?, ? examples/s]

Running Epoch 2 of 10:   0%|          | 0/90 [00:00<?, ?it/s]

Map:   0%|          | 0/320 [00:00<?, ? examples/s]

Running Epoch 3 of 10:   0%|          | 0/90 [00:00<?, ?it/s]

Map:   0%|          | 0/320 [00:00<?, ? examples/s]

Running Epoch 4 of 10:   0%|          | 0/90 [00:00<?, ?it/s]

Map:   0%|          | 0/320 [00:00<?, ? examples/s]

Map:   0%|          | 0/800 [00:00<?, ? examples/s]

Predicting:   0%|          | 0/13 [00:00<?, ?it/s]

Map:   0%|          | 0/2880 [00:00<?, ? examples/s]

Epoch:   0%|          | 0/10 [00:00<?, ?it/s]

Running Epoch 1 of 10:   0%|          | 0/90 [00:00<?, ?it/s]

Map:   0%|          | 0/320 [00:00<?, ? examples/s]

Running Epoch 2 of 10:   0%|          | 0/90 [00:00<?, ?it/s]

Map:   0%|          | 0/320 [00:00<?, ? examples/s]

Running Epoch 3 of 10:   0%|          | 0/90 [00:00<?, ?it/s]

Map:   0%|          | 0/320 [00:00<?, ? examples/s]

Running Epoch 4 of 10:   0%|          | 0/90 [00:00<?, ?it/s]

Map:   0%|          | 0/320 [00:00<?, ? examples/s]

Map:   0%|          | 0/800 [00:00<?, ? examples/s]

Predicting:   0%|          | 0/13 [00:00<?, ?it/s]

Map:   0%|          | 0/2880 [00:00<?, ? examples/s]

Epoch:   0%|          | 0/10 [00:00<?, ?it/s]

Running Epoch 1 of 10:   0%|          | 0/90 [00:00<?, ?it/s]

Map:   0%|          | 0/320 [00:00<?, ? examples/s]

Running Epoch 2 of 10:   0%|          | 0/90 [00:00<?, ?it/s]

Map:   0%|          | 0/320 [00:00<?, ? examples/s]

Running Epoch 3 of 10:   0%|          | 0/90 [00:00<?, ?it/s]

Map:   0%|          | 0/320 [00:00<?, ? examples/s]

Running Epoch 4 of 10:   0%|          | 0/90 [00:00<?, ?it/s]

Map:   0%|          | 0/320 [00:00<?, ? examples/s]

Map:   0%|          | 0/800 [00:00<?, ? examples/s]

Predicting:   0%|          | 0/13 [00:00<?, ?it/s]

Map:   0%|          | 0/2880 [00:00<?, ? examples/s]

Epoch:   0%|          | 0/10 [00:00<?, ?it/s]

Running Epoch 1 of 10:   0%|          | 0/90 [00:00<?, ?it/s]

Map:   0%|          | 0/320 [00:00<?, ? examples/s]

Running Epoch 2 of 10:   0%|          | 0/90 [00:00<?, ?it/s]

Map:   0%|          | 0/320 [00:00<?, ? examples/s]

Running Epoch 3 of 10:   0%|          | 0/90 [00:00<?, ?it/s]

Map:   0%|          | 0/320 [00:00<?, ? examples/s]

Running Epoch 4 of 10:   0%|          | 0/90 [00:00<?, ?it/s]

Map:   0%|          | 0/320 [00:00<?, ? examples/s]

Map:   0%|          | 0/800 [00:00<?, ? examples/s]

Predicting:   0%|          | 0/13 [00:00<?, ?it/s]

# D.4. Output Classification Results #
----


In [41]:
# Convert List to Dataframe #
RESULTS_TABLE = pd.DataFrame(RESULTS, columns = ["Name", "Share", "True-Positives",
                                                 "False-Negatives", "False-Positives",
                                                 "True-Negatives","Accuracy", "AUC",
                                                 "Precision", "Recall", "F1"] )

RESULTS_TABLE["Type"] = "Transformer"
RESULTS_TABLE = RESULTS_TABLE[["Name", "Type", "Share", "True-Positives",
                               "False-Negatives", "False-Positives",
                               "True-Negatives","Accuracy", "AUC",
                               "Precision", "Recall", "F1"]]



# Sort once by holdout accuracy (best model first) #
RESULTS_SORTED = RESULTS_TABLE.sort_values("Accuracy", ascending = False)

# Output Results #
RESULTS_SORTED.to_csv("./Output/Model Performance/Transformer Classification Model Performance.csv")

# Display Results -- Out of Sample (Holdout) prediction -- Sorted by Accuracy #
RESULTS_SORTED


| | Name | Type | Share | True-Positives | False-Negatives | False-Positives | True-Negatives | Accuracy | AUC | Precision | Recall | F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 8 | BioBERT | Transformer | 0.2 | 0.876 | 0.968 | 0.124 | 0.032 | 0.95 | 0.967 | 0.876 | 0.873 | 0.874 |
| 6 | SciBERT | Transformer | 0.213 | 0.852 | 0.976 | 0.148 | 0.024 | 0.949 | 0.968 | 0.852 | 0.904 | 0.877 |
| 7 | PatentBERT | Transformer | 0.186 | 0.903 | 0.96 | 0.097 | 0.04 | 0.949 | 0.959 | 0.903 | 0.837 | 0.869 |
| 10 | ELECTRA | Transformer | 0.217 | 0.842 | 0.977 | 0.158 | 0.023 | 0.948 | 0.973 | 0.842 | 0.91 | 0.875 |
| 3 | BigBird | Transformer | 0.22 | 0.833 | 0.978 | 0.167 | 0.022 | 0.946 | 0.974 | 0.833 | 0.913 | 0.871 |
| 1 | RoBERTa | Transformer | 0.228 | 0.808 | 0.979 | 0.192 | 0.021 | 0.94 | 0.974 | 0.808 | 0.92 | 0.861 |
| 2 | DeBERTa | Transformer | 0.208 | 0.84 | 0.967 | 0.16 | 0.033 | 0.94 | 0.96 | 0.84 | 0.869 | 0.854 |
| 5 | ALBERT | Transformer | 0.178 | 0.852 | 0.94 | 0.148 | 0.06 | 0.925 | 0.952 | 0.852 | 0.756 | 0.801 |
| 4 | DistilBERT | Transformer | 0.25 | 0.741 | 0.98 | 0.259 | 0.02 | 0.92 | 0.965 | 0.741 | 0.925 | 0.823 |
| 9 | XLNet | Transformer | 0.263 | 0.723 | 0.986 | 0.277 | 0.014 | 0.917 | 0.974 | 0.723 | 0.949 | 0.821 |
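
The F1 column can be cross-checked against the Precision and Recall columns, since F1 is their harmonic mean. The sketch below recomputes it for three rows copied from the table. The helper name `f1_from` and the rounding tolerance are illustrative choices, not part of the notebook's own code.

```python
# Precision, recall and F1 copied from three rows of the results table.
reported = {
    "BioBERT": {"precision": 0.876, "recall": 0.873, "f1": 0.874},
    "SciBERT": {"precision": 0.852, "recall": 0.904, "f1": 0.877},
    "XLNet":   {"precision": 0.723, "recall": 0.949, "f1": 0.821},
}

def f1_from(precision, recall):
    """Return the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

for name, row in reported.items():
    recomputed = f1_from(row["precision"], row["recall"])
    # The table reports three decimals, so allow a small rounding tolerance.
    matches = abs(recomputed - row["f1"]) < 0.002
    print(f"{name}: reported F1 = {row['f1']}, recomputed = {recomputed:.4f}, match = {matches}")
```

A mismatch here would suggest that the metrics were computed on different sets of predictions, which is worth ruling out before comparing models.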


In [42]:
# Output Classification Results for Training Dataset -- PREDICTED VALUES -- Out Of Sample (Holdout) Prediction #

Final = None
for values in Classified_Values:

  Temp = pd.DataFrame(  values,
                        columns = ['Model', 'id', 'Actual', 'Predicted'] )

  # Each block holds one model's predictions; name its prediction column after the model #
  name = Temp['Model'].iloc[0]

  if Final is None:
    # First model: keep the actual labels alongside its predictions #
    Final = Temp[['id', 'Actual', 'Predicted']].rename(columns = {'Predicted': name})

  else:
    # Later models: add only the prediction column, matched on id #
    Temp = Temp[['id', 'Predicted']].rename(columns = {'Predicted': name})
    Final = Final.merge(Temp, on = ['id'])

# Save Data Frame #
Final.to_csv("./Output/Classification Output/Transformer Classification Results.csv")

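Finally, make sure the output directories exist, write a manifest of the saved models, and delete the files that the transformer model created behind the scenes (the `./outputs` and `./runs` directories).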

In [43]:
# Make sure the results output directories exist
ensure_dir("./Output/Model Performance")
ensure_dir("./Output/Classification Output")

In [44]:
# === Build a manifest of the "final" version of every model, for easy deployment/loading ===
manifest = {}
for name, _, _ in CLASSIFIERS:
    model_dir = os.path.join(SAVE_ROOT, name.replace(" ", "_"))
    final_dir = os.path.join(model_dir, "final_from_best_fold")
    # Only list models whose final directory was actually saved
    if os.path.isdir(final_dir):
        manifest[name] = {"final_model_dir": final_dir}
with open(os.path.join(SAVE_ROOT, "manifest.json"), "w", encoding="utf-8") as f:
    json.dump(manifest, f, indent=2, ensure_ascii=False)
print("✅ Manifest written:", os.path.join(SAVE_ROOT, "manifest.json"))

✅ Manifest written: /content/drive/MyDrive/USPTO_data/Output/Models/Transformers/manifest.json
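
The manifest can later be read back to locate each saved model. The sketch below is one possible way to do that, assuming the file layout written above: a `manifest.json` under the save root, with one `final_model_dir` entry per model. The helper name `load_manifest` is hypothetical and not part of the notebook.

```python
import json
import os

def load_manifest(save_root):
    """Read manifest.json under save_root and return {model name: final model directory}."""
    manifest_path = os.path.join(save_root, "manifest.json")
    with open(manifest_path, "r", encoding="utf-8") as f:
        manifest = json.load(f)
    # Keep only the entries whose directory still exists on disk.
    return {
        name: entry["final_model_dir"]
        for name, entry in manifest.items()
        if os.path.isdir(entry["final_model_dir"])
    }
```

In this notebook the call would be `load_manifest(SAVE_ROOT)`. Each returned directory could then be passed to whichever loader matches the model type, which the manifest itself does not record.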


In [45]:
# rm -rf "./outputs"

In [46]:
# rm -rf "./runs"
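
The two cleanup commands above are commented out, so nothing is deleted unless they are re-enabled. As a portable alternative to the shell commands, the same cleanup could be sketched in Python with `shutil.rmtree`. The function name `remove_scratch_dirs` is illustrative, and the default paths are taken from the commands above.

```python
import shutil

def remove_scratch_dirs(paths=("./outputs", "./runs")):
    """Delete each directory tree in `paths`, silently skipping any that do not exist."""
    for path in paths:
        # ignore_errors=True makes a missing directory a no-op instead of an exception.
        shutil.rmtree(path, ignore_errors=True)
```

Calling `remove_scratch_dirs()` from the notebook's working directory removes both folders. The deletion is permanent, so it should only run after the results and the manifest have been saved.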