# How to Leverage Transformers in PECOS

Extreme multi-label text classification (XMC) seeks to find relevant labels from an
extremely large label collection for a given text input.
The current state-of-the-art results on XMC benchmarks are established by **XR-Transformer** [[NeurIPS21](https://arxiv.org/pdf/2110.00685.pdf)], which leverages recursively fine-tuned transformer encoders for text feature extraction.

In this section, we will demonstrate how you can use XR-Transformer to solve XMC problems.

### Download dataset and fine-tuned Transformer encoders

In [1]:
%%bash
DATASET="wiki10-31k"
wget -nv -nc https://archive.org/download/pecos-dataset/xmc-base/${DATASET}.tar.gz
tar --skip-old-files -zxf ${DATASET}.tar.gz 
find xmc-base/${DATASET}/*
wget -q https://archive.org/download/xr-transformer-demos/${DATASET}-bert.tar.gz
mkdir -p ./work_dir/xr-transformer-encoder
tar -zxf ./${DATASET}-bert.tar.gz -C ./work_dir/xr-transformer-encoder
find ./work_dir/xr-transformer-encoder/*

2022-08-13 21:39:55 URL:https://ia802308.us.archive.org/21/items/pecos-dataset/xmc-base/wiki10-31k.tar.gz [162277861/162277861] -> "wiki10-31k.tar.gz" [1]


xmc-base/wiki10-31k/output-items.txt
xmc-base/wiki10-31k/tfidf-attnxml
xmc-base/wiki10-31k/tfidf-attnxml/X.trn.npz
xmc-base/wiki10-31k/tfidf-attnxml/X.tst.npz
xmc-base/wiki10-31k/X.trn.txt
xmc-base/wiki10-31k/X.tst.txt
xmc-base/wiki10-31k/Y.trn.npz
xmc-base/wiki10-31k/Y.trn.txt
xmc-base/wiki10-31k/Y.tst.npz
xmc-base/wiki10-31k/Y.tst.txt
./work_dir/xr-transformer-encoder/wiki10-31k
./work_dir/xr-transformer-encoder/wiki10-31k/bert
./work_dir/xr-transformer-encoder/wiki10-31k/bert/text_encoder
./work_dir/xr-transformer-encoder/wiki10-31k/bert/text_encoder/text_encoder
./work_dir/xr-transformer-encoder/wiki10-31k/bert/text_encoder/text_encoder/pytorch_model.bin
./work_dir/xr-transformer-encoder/wiki10-31k/bert/text_encoder/text_encoder/config.json
./work_dir/xr-transformer-encoder/wiki10-31k/bert/text_encoder/C.npz
./work_dir/xr-transformer-encoder/wiki10-31k/bert/text_encoder/param.json
./work_dir/xr-transformer-encoder/wiki10-31k/bert/text_encoder/text_model
./work_dir/xr-transformer-en

## Outline of this Session

  1. XR-Transformer Overview
  2. Hands on training and evaluation
  3. How to customize the parameter settings
  4. Command line interface tools
  5. Example of using XR-Transformer on your custom dataset

## 1. XR-Transformer Overview

### 1.1 Benchmarking XR-Transformer on public XMC datasets

A comparison of Precision@1,3,5 and training time on 3 public XMC benchmarking datasets.

PECOS XR-Transformer achieves the highest accuracy while taking significantly less time to train (20-50X faster than X-Transformer).

<table><tr>
<td> <img src="imgs/xrtransformer_prec135.png" width="90%"/> </td>
<td> <img src="imgs/xrtransformer_trainingtime.png" width="80%"/> </td>
</tr></table>


### 1.2 Training Procedures

XR-Transformer leverages multi-resolution fine-tuning, which lets the encoder be tuned on a sequence of tasks ordered from easy to hard. Training can be separated into three steps:

* **Step1**: Label features are computed and used to build a preliminary hierarchical label tree (HLT).
* **Step2**: The transformer encoder is fine-tuned on the chosen levels of the preliminary HLT.
* **Step3**: The final instance embeddings are concatenated with the sparse features, and the linear rankers are trained on the refined HLT.
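To make Step 1 and the easy-to-hard idea more concrete, here is a minimal plain-Python sketch under stated assumptions. It builds label features by PIFA-style aggregation (the `method="pifa"` option used later in this tutorial): each label's feature vector is the normalized sum of the features of its positive instances. It then shows how a coarser training target can be induced from a label-to-cluster assignment. The toy data, the hand-picked clusters, and the helper names `pifa_label_embeddings` and `coarsen_labels` are illustrative assumptions, not PECOS APIs.

```python
import math
from collections import defaultdict


def pifa_label_embeddings(X, Y, num_labels):
    """Aggregate instance features per label and L2-normalize the result.

    X: list of sparse rows, each a dict {feature_id: value}.
    Y: list of sets of label ids, one set per instance.
    """
    sums = [defaultdict(float) for _ in range(num_labels)]
    for x_row, labels in zip(X, Y):
        for label in labels:
            for feat, val in x_row.items():
                sums[label][feat] += val
    embeddings = []
    for vec in sums:
        norm = math.sqrt(sum(v * v for v in vec.values()))
        embeddings.append({f: v / norm for f, v in vec.items()} if norm > 0 else {})
    return embeddings


def coarsen_labels(Y, cluster_of):
    """Map each instance's labels to the clusters that contain them."""
    return [{cluster_of[label] for label in labels} for labels in Y]


# Toy data (hypothetical): 3 instances, 4 features, 4 labels.
X = [{0: 1.0, 1: 1.0}, {1: 2.0}, {2: 1.0, 3: 1.0}]
Y = [{0, 1}, {1}, {2, 3}]

label_emb = pifa_label_embeddings(X, Y, num_labels=4)

# A hand-picked two-cluster assignment standing in for one level of the HLT.
cluster_of = {0: 0, 1: 0, 2: 1, 3: 1}
Y_coarse = coarsen_labels(Y, cluster_of)

print(label_emb[1])
print(Y_coarse)
```

In this toy setup, predicting `Y_coarse` (2 targets) is an easier task than predicting `Y` (4 targets), which is the intuition behind fine-tuning on coarse levels of the tree before finer ones.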

<div> <br/><img src="imgs/pecos_xrtransformer.png" width="70%"/> </div>



## 2. Hands on training and evaluation
### 2.1 Data Loading

The XR-Transformer model takes both raw text and numerical text features (such as TF-IDF) as input.

In [2]:
import logging
import numpy as np
from pecos.utils import smat_util, logging_util

# set logging level to WARNING(1)
# you can change this to INFO(2) or DEBUG(3) if you'd like to see more logging
LOGGER = logging.getLogger(__name__)
logging_util.setup_logging_config(level=1)

# load training data
X_feat_trn = smat_util.load_matrix("xmc-base/wiki10-31k/tfidf-attnxml/X.trn.npz", dtype=np.float32)
Y_trn = smat_util.load_matrix("xmc-base/wiki10-31k/Y.trn.npz", dtype=np.float32)

with open("xmc-base/wiki10-31k/X.trn.txt", 'r') as fin:
    X_txt_trn = [xx.strip() for xx in fin.readlines()]

# load test data
X_feat_tst = smat_util.load_matrix("xmc-base/wiki10-31k/tfidf-attnxml/X.tst.npz", dtype=np.float32)
Y_tst = smat_util.load_matrix("xmc-base/wiki10-31k/Y.tst.npz", dtype=np.float32)

with open("xmc-base/wiki10-31k/X.tst.txt", 'r') as fin:
    X_txt_tst = [xx.strip() for xx in fin.readlines()]

### 2.2 Model Training and Evaluation

In this section, we will compare the performance of three models:
1. XR-Linear model with only sparse TF-IDF features
2. XR-Transformer model without fine-tuning
3. XR-Transformer model with fine-tuning

XR-Transformer parameters for 6 public XMC benchmark datasets (i.e. `Eurlex-4K`, `Wiki10-31K`,
`AmazonCat-13K`, `Wiki-500K`, `Amazon-670K`, `Amazon-3M`) are released. For this tutorial, we will use `Wiki10-31K` with the `bert-base-uncased` encoder as an example.

In [3]:
import json
import requests
from pecos.xmc.xtransformer.model import XTransformer

# get XR-Transformer training params
param_url = "https://raw.githubusercontent.com/amzn/pecos/mainline/examples/xr-transformer-neurips21/params/wiki10-31k/bert/params.json"
params = json.loads(requests.get(param_url).text)
    
wiki31k_train_params = XTransformer.TrainParams.from_dict(params["train_params"])
wiki31k_pred_params = XTransformer.PredParams.from_dict(params["pred_params"])

# you can view the detailed parameter setting via
#print(json.dumps(wiki31k_train_params.to_dict(), indent=True))
#print(json.dumps(wiki31k_pred_params.to_dict(), indent=True))

#### Baseline 1: XR-Linear
Let's train an XR-Linear model on the TF-IDF features using the same hyper-parameters.

In [4]:
# construct label hierarchy
from pecos.xmc import Indexer, LabelEmbeddingFactory
cluster_chain = Indexer.gen(
    LabelEmbeddingFactory.create(Y_trn, X_feat_trn, method="pifa"),
    train_params=wiki31k_train_params.refined_indexer_params,
)

# train XR-Linear model
from pecos.xmc.xlinear import XLinearModel
xlm = XLinearModel.train(
    X_feat_trn,
    Y_trn,
    C=cluster_chain,
    train_params=wiki31k_train_params.ranker_params,
    pred_params=wiki31k_pred_params.ranker_params,
)

# predict on test set with XR-Linear model
P_xlm = xlm.predict(X_feat_tst)

# compute metrics using ground truth
metrics = smat_util.Metrics.generate(Y_tst, P_xlm)
print("Evaluation metrics of XR-Linear model")
print(metrics)

Evaluation metrics of XR-Linear model
prec   = 84.96 81.82 76.30 70.70 65.67 61.46 57.92 54.63 51.77 49.16
recall = 5.02 9.66 13.40 16.41 18.91 21.12 23.09 24.76 26.30 27.64
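The two rows printed above list precision@k and recall@k in percent for k = 1 to 10. As a rough illustration of what these numbers measure, the following plain-Python sketch computes both metrics on a tiny made-up example. The function name `precision_recall_at_k` and the toy data are assumptions for illustration; this is not the `smat_util.Metrics` implementation.

```python
def precision_recall_at_k(true_labels, ranked_preds, k):
    """Average precision@k and recall@k over all instances.

    true_labels: list of sets of relevant label ids.
    ranked_preds: list of label-id lists sorted by descending score.
    """
    prec_sum, recall_sum = 0.0, 0.0
    for truth, ranked in zip(true_labels, ranked_preds):
        hits = len(truth.intersection(ranked[:k]))
        prec_sum += hits / k
        recall_sum += hits / len(truth) if truth else 0.0
    n = len(true_labels)
    return prec_sum / n, recall_sum / n


# Toy example (hypothetical): two instances with three ranked predictions each.
true_labels = [{0, 2, 5}, {1}]
ranked_preds = [[0, 3, 2], [4, 1, 0]]

for k in (1, 2, 3):
    p, r = precision_recall_at_k(true_labels, ranked_preds, k)
    print(f"k={k} prec={100 * p:.2f} recall={100 * r:.2f}")
```

Because recall@k for one instance cannot exceed k divided by its number of relevant labels, recall stays low at small k whenever instances carry many relevant labels.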


#### Baseline 2: XR-Transformer without fine-tuning

In [5]:
# define the problem
from pecos.xmc.xtransformer.module import MLProblemWithText
prob = MLProblemWithText(X_txt_trn, Y_trn, X_feat=X_feat_trn)

# disable fine-tuning, directly use pre-trained bert model from huggingface
wiki31k_train_params.do_fine_tune = False

# train XR-Transformer (without fine-tuning)
# this will be slow on CPU only machine
xrt_pretrained = XTransformer.train(
    prob,
    train_params=wiki31k_train_params,
    pred_params=wiki31k_pred_params,
)

# predict and compute metrics
P_xrt_pretrained = xrt_pretrained.predict(X_txt_tst, X_feat=X_feat_tst)
metrics = smat_util.Metrics.generate(Y_tst, P_xrt_pretrained)
print("Evaluation metrics of XR-Transformer (not fine-tuned)")
print(metrics)

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForXMC: ['cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.bias']
- This IS expected if you are initializing BertForXMC from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForXMC from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Evaluation metrics of XR-Transformer (not fine-tuned)
prec   = 85.22 82.55 77.26 72.15 67.42 63.13 59.33 56.08 53.02 50.24
recall = 5.05 9.76 13.58 16.74 19.41 21.68 23.64 25.41 26.92 28.22


#### Model: XR-Transformer
For demo purposes, let's disable fine-tuning and load an already fine-tuned encoder directly (i.e. skip steps 1 and 2).

End-to-end training of XR-Transformer on the **Wiki10-31K** dataset takes around 30 minutes on a **p3.16xlarge** instance.
If you are running this on an equivalent or more powerful machine, you can instead set `DO_FINE_TUNE_NOW` to `True` and train XR-Transformer end to end.

In [6]:
DO_FINE_TUNE_NOW = False

if DO_FINE_TUNE_NOW:
    wiki31k_train_params.do_fine_tune = True
else:
    # skip fine-tuning and use existing fine-tuned encoder
    wiki31k_train_params.do_fine_tune = False
    wiki31k_train_params.matcher_params_chain[0].init_model_dir = "./work_dir/xr-transformer-encoder/wiki10-31k/bert/text_encoder"

# this will be slow on CPU only machine
xrt_fine_tuned = XTransformer.train(
    prob,
    clustering=cluster_chain,
    train_params=wiki31k_train_params,
    pred_params=wiki31k_pred_params,
)

P_xrt_fine_tuned = xrt_fine_tuned.predict(X_txt_tst, X_feat=X_feat_tst)
metrics = smat_util.Metrics.generate(Y_tst, P_xrt_fine_tuned, topk=10)
print("Evaluation metrics of XR-Transformer")
print(metrics)

Evaluation metrics of XR-Transformer
prec   = 87.95 83.54 78.79 73.95 69.43 65.14 61.08 57.70 54.63 51.97
recall = 5.25 9.89 13.84 17.14 19.99 22.36 24.35 26.16 27.73 29.21


### 2.3 Save and load model, get transformer embeddings
Note that keyword arguments of `XLinearModel.load`, such as `is_predict_only`, can also be passed to `XTransformer.load`.

In [7]:
model_folder = "./work_dir/my_xrt"
xrt_fine_tuned.save(model_folder)
del xrt_fine_tuned
xrt_fine_tuned = XTransformer.load(model_folder, is_predict_only=True)

For BERT models, the embeddings are taken from the [CLS] token.

In [8]:
X_emb_tst = xrt_fine_tuned.encode(
    X_txt_tst,
    batch_size=256,
    batch_gen_workers=8,
)
print(f"Generated test embedding type={type(X_emb_tst)} with shape={X_emb_tst.shape}")

Generated test embedding type=<class 'numpy.ndarray'> with shape=(6616, 768)
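One possible use of such dense embeddings is similarity search between documents. This is an assumption about a downstream use, not something this tutorial prescribes. The sketch below ranks rows of a small hypothetical embedding matrix by cosine similarity to a query row. It uses plain Python lists so that it runs without dependencies; with a real `(6616, 768)` array you would more likely use vectorized NumPy operations.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two dense vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def nearest_neighbors(query, embeddings, topk=2):
    """Return (row_index, similarity) pairs for the topk most similar rows."""
    scored = [(cosine_similarity(query, emb), idx) for idx, emb in enumerate(embeddings)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(idx, score) for score, idx in scored[:topk]]


# Hypothetical 3-dimensional embeddings standing in for rows of X_emb_tst.
toy_embeddings = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]
neighbors = nearest_neighbors(toy_embeddings[0], toy_embeddings, topk=2)
print(neighbors)
```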


### 2.4 Training without TFIDF features

The XR-Transformer module can also be used with only text features when numerical features like TFIDF are not available.

In [9]:
prob_only_text = MLProblemWithText(X_txt_trn, Y_trn)
wiki31k_train_params.do_fine_tune = False
wiki31k_train_params.matcher_params_chain[0].init_model_dir = "./work_dir/xr-transformer-encoder/wiki10-31k/bert/text_encoder"

# this will be slow on CPU only machine
xrt_only_text = XTransformer.train(
    prob_only_text,
    clustering=cluster_chain,
    train_params=wiki31k_train_params,
    pred_params=wiki31k_pred_params,
)

P_xrt_only_text = xrt_only_text.predict(X_txt_tst)
metrics = smat_util.Metrics.generate(Y_tst, P_xrt_only_text, topk=10)
print("Evaluation metrics of XR-Transformer (without TFIDF)")
print(metrics)

Evaluation metrics of XR-Transformer (without TFIDF)
prec   = 86.23 81.57 76.04 70.60 65.52 61.38 57.65 54.31 51.45 48.81
recall = 5.11 9.61 13.28 16.31 18.80 21.00 22.90 24.54 26.06 27.38
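To compare the four runs in this section side by side, the short script below tabulates the differences in Precision@1 and Precision@5 relative to the XR-Linear baseline. The numbers are copied from the outputs printed above; the dictionary layout and variable names are just one convenient way to organize them.

```python
# Precision@1 and Precision@5 (in percent) copied from the outputs in this section.
reported = {
    "XR-Linear (TF-IDF only)": (84.96, 65.67),
    "XR-Transformer (not fine-tuned)": (85.22, 67.42),
    "XR-Transformer (fine-tuned)": (87.95, 69.43),
    "XR-Transformer (fine-tuned, no TF-IDF)": (86.23, 65.52),
}

baseline_p1, baseline_p5 = reported["XR-Linear (TF-IDF only)"]

# Absolute difference to the baseline, rounded to two decimals.
gains = {
    name: (round(p1 - baseline_p1, 2), round(p5 - baseline_p5, 2))
    for name, (p1, p5) in reported.items()
}

for name, (d1, d5) in gains.items():
    print(f"{name:<40} dP@1={d1:+.2f} dP@5={d5:+.2f}")
```

In this run, the fine-tuned model without TF-IDF features stays ahead of XR-Linear at Precision@1 but falls slightly behind it at Precision@5, while the fine-tuned model with both inputs leads on both.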


## 3. How to customize the parameter settings
For your custom dataset, it is recommended to start from the pre-defined parameters or the default value and make proper modifications based on the specific problem.
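One way to apply such modifications is to keep the starting parameters as a plain dictionary and overlay a small set of overrides before building the parameter object. The helper `deep_update` below is a hypothetical utility, not part of PECOS, and the base values are illustrative stand-ins for a released or default parameter file.

```python
import copy
import json


def deep_update(base, overrides):
    """Return a copy of `base` with nested keys from `overrides` applied."""
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_update(merged[key], value)
        else:
            # Non-dict values (including lists) replace the old value wholesale.
            merged[key] = copy.deepcopy(value)
    return merged


# Stand-in for a released or default parameter dictionary (illustrative values).
base_params = {
    "do_fine_tune": True,
    "matcher_params_chain": {"batch_size": 8, "learning_rate": 1e-4, "num_train_epochs": 5},
}

# Only the fields to change need to be listed.
overrides = {"matcher_params_chain": {"batch_size": 32, "num_train_epochs": 2}}

custom_params = deep_update(base_params, overrides)
print(json.dumps(custom_params, indent=1))
```

The merged dictionary could then be handed to `XTransformer.TrainParams.from_dict`, as is done with the downloaded parameters earlier in this tutorial. Note that when `matcher_params_chain` is a list of per-layer dictionaries, this helper replaces the whole list rather than merging it element by element.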

### 3.1 Training Parameters of XTransformer

```
xrt_train_params = XTransformer.TrainParams.from_dict(
{
 "do_fine_tune": [true/false],                   # if true, do encoder fine-tuning
 "only_encoder": [true/false],                   # if true, skip linear ranker training
 "max_match_clusters": INT,                      # max label resolution to fine-tune encoder on
 "preliminary_indexer_params": {...},            # (HierarchicalKMeans.TrainParams) parameters to construct preliminary HLT 
 "refined_indexer_params": {...},                # (HierarchicalKMeans.TrainParams) parameters to construct refined HLT 
 "matcher_params_chain": [                       # fine-tuning parameters. Can be dict or list of dict. If dict, all layers will share the same setting
   {...},                                        # (TransformerMatcher.TrainParams) fine-tuning parameters for layer-0
   {...},                                        # (TransformerMatcher.TrainParams) fine-tuning parameters for layer-1
   ...
 ],
 "ranker_params": {...},                         # (XLinearModel.TrainParams) ranker training parameters
}
)
```

You can get the training and prediction parameters filled with default values by:

In [10]:
train_params = XTransformer.TrainParams.from_dict({}, recursive=True)
pred_params = XTransformer.PredParams.from_dict({}, recursive=True)

Detailed control over each layer's fine-tuning task is done through `matcher_params_chain`:

In [11]:
print(json.dumps(train_params.matcher_params_chain.to_dict(), indent=True))

{
 "__meta__": {
  "class_fullname": "pecos.xmc.xtransformer.matcher###TransformerMatcher.TrainParams"
 },
 "model_shortcut": "bert-base-cased",
 "negative_sampling": "tfn",
 "loss_function": "squared-hinge",
 "bootstrap_method": "linear",
 "lr_schedule": "linear",
 "threshold": 0.1,
 "hidden_dropout_prob": 0.1,
 "batch_size": 8,
 "batch_gen_workers": 4,
 "max_active_matching_labels": null,
 "max_num_labels_in_gpu": 65536,
 "max_steps": 0,
 "max_no_improve_cnt": -1,
 "num_train_epochs": 5,
 "gradient_accumulation_steps": 1,
 "weight_decay": 0,
 "max_grad_norm": 1.0,
 "learning_rate": 0.0001,
 "adam_epsilon": 1e-08,
 "warmup_steps": 0,
 "logging_steps": 50,
 "save_steps": 100,
 "cost_sensitive_ranker": false,
 "pre_tokenize": true,
 "pre_tensorize_labels": true,
 "use_gpu": true,
 "eval_by_true_shorlist": false,
 "checkpoint_dir": "",
 "cache_dir": "",
 "init_model_dir": ""
}


### 3.2 Getting the pre-trained models

There are two ways to provide a pre-trained Transformer encoder:
* **Download from huggingface repo** (https://huggingface.co/models): the pre-trained model named in `model_shortcut` (under `XTransformer.TrainParams.matcher_params_chain`) is downloaded automatically (e.g. `bert-base-uncased`).
* **Load your custom model from local disk**: the model path is given by `init_model_dir`. The model should be loadable through `TransformerMatcher.load()`.

Note that both `model_shortcut` and `init_model_dir` are only used in the first fine-tuning layer; later layers continue from the final state of the parent encoder.

A simple example of constructing your own pre-trained model for XR-Transformer fine-tuning:

In [12]:
from pecos.xmc.xtransformer.matcher import TransformerMatcher
from transformers import AutoTokenizer, AutoModelForSequenceClassification

init_model_dir = "work_dir/my_pre_trained_model"

# example to use your own pre-trained model, here we use huggingface model as an example
my_tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
my_encoder = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

# ...
# do my own modification/tuning/etc
# ...

# save my own model to disk
my_tokenizer.save_pretrained(f"{init_model_dir}/text_tokenizer")
my_encoder.save_pretrained(f"{init_model_dir}/text_encoder")

# `init_model_dir` can now be supplied as the initial model.
# Sanity check: confirm that this directory can be loaded via TransformerMatcher.load()
matcher = TransformerMatcher.load(init_model_dir)
print(f"{matcher.__class__} model loaded with encoder_type={matcher.model_type} num_labels={matcher.nr_labels}")

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at

<class 'pecos.xmc.xtransformer.matcher.TransformerMatcher'> model loaded with encoder_type=bert num_labels=2


## 4. Command line interface tools
You can achieve the same functionality with the provided command line tools.

Although the CLI tools `pecos.xmc.xtransformer.train`, `pecos.xmc.xtransformer.predict` and `pecos.xmc.xtransformer.encode` offer basic options for supplying training and prediction parameters,
you should supply parameters via a JSON file if you want full control over the training/prediction process.

As with the Python interface, you can generate a `.json` file containing all of the parameters, which you can then edit and fill in:
```bash
python3 -m pecos.xmc.xtransformer.train --generate-params-skeleton &> params.json
```

After filling in the desired parameters in `params.json`, training and prediction can be run end to end via:
```bash
python3 -m pecos.xmc.xtransformer.train \
    -t ${T_path} \
    -x ${X_path} \
    -y ${Y_path} \
    -m ${model_dir} \
    --params-path params.json

python3 -m pecos.xmc.xtransformer.predict \
    -t ${Tt_path} \
    -x ${Xt_path} \
    -m ${model_dir} \
    -o ${Pt_path}
```
where
* `T_path` and `Tt_path` are the paths to the input text files of the training/test instances. These are text files with `N`/`Nt` lines, where each line is the text of the corresponding training/test instance.
* `X_path` and `Xt_path` are the paths to the CSR npz or row-major npy files of the training/test feature matrices with shape `(N, d)` and `(Nt, d)`.
  * Note that you can use the PECOS built-in text preprocessing/vectorizing module [pecos.utils.featurization.text.preprocess](https://github.com/amzn/pecos/tree/mainline/pecos/utils/featurization/text) to generate numerical features if you do not already have them.
  * Usually providing instance numerical features is recommended. However, if you choose not to provide numerical features, `code-path` or `label-feat-path` is required to generate the hierarchical label trees.
* `Y_path` and `Yt_path` are the paths to the CSR npz files of the training/test label matrices with shape `(N, L)` and `(Nt, L)`.
* `model_dir` is the path to the model folder where the trained model will be saved; it will be created if it does not exist.
* `Pt_path` is the path to save the prediction label matrix with shape `(Nt, L)`
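Since each line of the text file must correspond to one row of the feature and label matrices, a stray newline inside a document would shift every later instance out of alignment. The sketch below shows one way to write such a file safely and to check its line count. The helper names `write_text_instances` and `count_lines` are hypothetical, and collapsing whitespace is only one possible cleaning choice.

```python
import os
import tempfile


def write_text_instances(texts, path):
    """Write one instance per line, collapsing internal newlines and tabs to spaces."""
    with open(path, "w", encoding="utf-8") as fout:
        for text in texts:
            fout.write(" ".join(text.split()) + "\n")


def count_lines(path):
    """Count lines in a text file without loading it fully into memory."""
    with open(path, "r", encoding="utf-8") as fin:
        return sum(1 for _ in fin)


# Hypothetical raw documents containing a line break and a tab.
texts = ["first document\nwith a line break", "second\tdocument"]

with tempfile.TemporaryDirectory() as tmp_dir:
    t_path = os.path.join(tmp_dir, "X.trn.txt")
    write_text_instances(texts, t_path)
    num_lines = count_lines(t_path)

# The line count should equal N, the number of rows in the X and Y matrices.
print(num_lines == len(texts))
```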

To get the evaluation metrics for top-10 predictions:
```bash
python3 -m pecos.xmc.xlinear.evaluate \
    -y ${Yt_path} \
    -p ${Pt_path} \
    -k 10
```
You can also get the fine-tuned text embeddings via:
```bash
python3 -m pecos.xmc.xtransformer.encode \
    -t ${Tt_path} \
    -m ${model_dir} \
    -o ${Emb_path}
```

where
* `Emb_path` is the path to save the text embedding matrix with shape `(Nt, hidden_dim)`

## 5. Example: Use XR-Transformer for your custom dataset
This section demonstrates how you can use XR-Transformer on your custom dataset.

**Note**: The data used here is a dummy dataset for demo purposes only, so we don't expect meaningful results.

In [13]:
! wget -nv -nc https://archive.org/download/text2text_demo.tar.gz/text2text_demo.tar.gz
! tar --skip-old-files -zxf text2text_demo.tar.gz
! find text2text_demo/*

2022-08-13 21:48:32 URL:https://ia601500.us.archive.org/21/items/text2text_demo.tar.gz/text2text_demo.tar.gz [674/674] -> "text2text_demo.tar.gz" [1]
text2text_demo/output-labels.txt
text2text_demo/testing-data.txt
text2text_demo/training-data.txt


First, format your input data into two files: `training-data.txt` and `output-labels.txt`.

Each line of `output-labels.txt` corresponds to the text representation of a label:

In [14]:
! cat ./text2text_demo/output-labels.txt

Artificial intelligence researchers
Computability theorists
British computer scientists
Machine learning researchers
Turing Award laureates
Deep Learning


The `training-data.txt` file stores the input corpus and training signals. Each line consists of two tab-separated elements: the comma-separated label IDs and the input text of a data instance:

<p style="text-align: center;"><i>
label_idx1,label_idx2,... &lt;TAB&gt; instance_text</i></p>

In [15]:
! cat ./text2text_demo/training-data.txt

0,1,2	Alan Turing is widely considered to be the father of theoretical computer science and artificial intelligence.
0,2,3	Hinton was co-author of a highly cited paper published in 1986 that popularized the backpropagation algorithm for training multi-layer neural networks.
3,4,5	Hinton received the 2018 Turing Award, together with Yoshua Bengio and Yann LeCun, for their work on artificial intelligence and deep learning.
0,3,5	Yoshua Bengio is a Canadian computer scientist, most noted for his work on artificial neural networks and deep learning.


Next, parse `training-data.txt` into the training corpus and label matrix:

In [16]:
from pecos.utils.featurization.text.preprocess import Preprocessor

parsed_result = Preprocessor.load_data_from_file(
    "./text2text_demo/training-data.txt",
    "./text2text_demo/output-labels.txt",
)
Y = parsed_result["label_matrix"]
X_txt = parsed_result["corpus"]

print(f"Constructed training corpus len={len(X_txt)}, training label matrix with shape={Y.shape} and nnz={Y.nnz}")

Constructed training corpus len=4, training label matrix with shape=(4, 6) and nnz=12
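To make the file format concrete, here is a rough plain-Python approximation of the parsing step. It is a sketch of the format only, not the `Preprocessor.load_data_from_file` implementation. The function names are hypothetical, and the instance texts are abbreviated with `...` to keep the example short.

```python
def parse_training_lines(lines):
    """Split 'id1,id2,...<TAB>text' lines into label-id lists and texts."""
    label_ids, corpus = [], []
    for line in lines:
        ids_part, text = line.rstrip("\n").split("\t", 1)
        label_ids.append(sorted({int(tok) for tok in ids_part.split(",")}))
        corpus.append(text)
    return label_ids, corpus


def to_dense_label_matrix(label_ids, num_labels):
    """Build a dense 0/1 matrix with one row per instance and one column per label."""
    matrix = [[0] * num_labels for _ in label_ids]
    for row, ids in enumerate(label_ids):
        for col in ids:
            matrix[row][col] = 1
    return matrix


# Label IDs follow training-data.txt as shown above; the texts are abbreviated.
lines = [
    "0,1,2\tAlan Turing is widely considered to be the father of ...",
    "0,2,3\tHinton was co-author of a highly cited paper ...",
    "3,4,5\tHinton received the 2018 Turing Award ...",
    "0,3,5\tYoshua Bengio is a Canadian computer scientist ...",
]

label_ids, corpus = parse_training_lines(lines)
Y_dense = to_dense_label_matrix(label_ids, num_labels=6)  # 6 lines in output-labels.txt
nnz = sum(sum(row) for row in Y_dense)
print(f"corpus len={len(corpus)}, shape=({len(Y_dense)}, {len(Y_dense[0])}), nnz={nnz}")
```

The counts agree with the corpus length, matrix shape, and `nnz` reported by the parsed result above.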


Build a TF-IDF model from the training corpus:

In [17]:
vectorizer_config = {
    "type": "tfidf",
    "kwargs": {
      "base_vect_configs": [
        {
          "ngram_range": [1, 2],
          "max_df_ratio": 0.98,
          "analyzer": "word",
        },
      ],
    },
}

tfidf_model = Preprocessor.train(X_txt, vectorizer_config)
X_feat = tfidf_model.predict(X_txt)

print(f"Constructed training feature matrix with shape={X_feat.shape} and nnz={X_feat.nnz}")

Constructed training feature matrix with shape=(4, 125) and nnz=151
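For readers unfamiliar with TF-IDF, the following plain-Python sketch shows the general idea with word unigrams and bigrams, loosely mirroring the `ngram_range` of `[1, 2]` in the configuration above. It uses a common textbook weighting (raw term counts, smoothed IDF, L2 normalization) and ignores `max_df_ratio`, so it should not be read as the exact formula used by the PECOS vectorizer. All function names and the toy corpus are assumptions for illustration.

```python
import math
from collections import Counter


def ngrams(text, max_n=2):
    """Lowercase, split on whitespace, and return all word n-grams up to max_n."""
    tokens = text.lower().split()
    grams = []
    for n in range(1, max_n + 1):
        grams.extend(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return grams


def fit_tfidf(corpus, max_n=2):
    """Learn a smoothed inverse document frequency for every n-gram in the corpus."""
    doc_freq = Counter()
    for doc in corpus:
        doc_freq.update(set(ngrams(doc, max_n)))
    n_docs = len(corpus)
    return {term: math.log((1 + n_docs) / (1 + df)) + 1.0 for term, df in doc_freq.items()}


def transform_tfidf(doc, idf, max_n=2):
    """Map a document to an L2-normalized sparse vector {term: weight}."""
    counts = Counter(g for g in ngrams(doc, max_n) if g in idf)
    weights = {term: tf * idf[term] for term, tf in counts.items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {term: w / norm for term, w in weights.items()} if norm > 0 else {}


# Hypothetical toy corpus.
corpus = ["deep learning for text", "deep neural networks", "text classification"]
idf = fit_tfidf(corpus)
vec = transform_tfidf("deep learning networks", idf)
print(sorted(vec.items(), key=lambda kv: -kv[1]))
```

Terms that occur in more documents (here `deep`) receive a lower weight than rarer ones, and n-grams that were never seen during fitting are dropped.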


Train XR-Transformer with all default settings:

In [18]:
from pecos.xmc.xtransformer.model import XTransformer
from pecos.xmc.xtransformer.module import MLProblemWithText
prob = MLProblemWithText(X_txt, Y, X_feat=X_feat)
custom_xtf = XTransformer.train(prob)

Some weights of the model checkpoint at bert-base-cased were not used when initializing BertForXMC: ['cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.bias']
- This IS expected if you are initializing BertForXMC from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForXMC from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Save the TF-IDF model and XR-Transformer model to disk:

In [19]:
import os
custom_model_dir = "work_dir/custom_model"
os.makedirs(custom_model_dir, exist_ok=True)

tfidf_model.save(f"{custom_model_dir}/tfidf_model")
custom_xtf.save(f"{custom_model_dir}/xrt_model")

Load the TF-IDF model and XR-Transformer model from disk:

In [20]:
custom_xtf = XTransformer.load(f"{custom_model_dir}/xrt_model")
tfidf_model = Preprocessor.load(f"{custom_model_dir}/tfidf_model")

Predict on a test input:

In [21]:
test_input = ["In 1989, Yann LeCun et al. applied the standard backpropagation algorithm on neural networks for hand digit recognition."]

P = custom_xtf.predict(
    test_input,
    X_feat=tfidf_model.predict(test_input),
    only_topk=1
)

with open("./text2text_demo/output-labels.txt", 'r') as fin:
    output_items = [ll.strip() for ll in fin.readlines()]

for i, t in enumerate(test_input):
    print(f"Input text: {t}")
    print(f"Predicted label: {output_items[P[i, :].indices[0]]}")
    print(f"Predicted score: {P[i, :].data[0]}")

Input text: In 1989, Yann LeCun et al. applied the standard backpropagation algorithm on neural networks for hand digit recognition.
Predicted label: Machine learning researchers
Predicted score: 0.7240481376647949
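The loop above reads `indices[0]` and `data[0]` directly, which works because `only_topk=1` leaves a single stored entry per row. When more predictions are kept, the entries of a sparse row are not guaranteed to be ordered by score, so it is safer to sort them explicitly. The sketch below does this on a hand-written CSR layout (`indptr`, `indices`, `data`). The scores are made up for illustration and the helper `topk_from_csr` is hypothetical.

```python
def topk_from_csr(indptr, indices, data, row, k):
    """Return the k highest-scoring (label_id, score) pairs stored in one CSR row."""
    start, end = indptr[row], indptr[row + 1]
    pairs = sorted(zip(indices[start:end], data[start:end]), key=lambda p: -p[1])
    return pairs[:k]


# Label names as listed in output-labels.txt.
label_names = [
    "Artificial intelligence researchers",
    "Computability theorists",
    "British computer scientists",
    "Machine learning researchers",
    "Turing Award laureates",
    "Deep Learning",
]

# Hypothetical prediction matrix with two rows; scores are invented for this example.
indptr = [0, 3, 5]
indices = [0, 3, 5, 2, 4]
data = [0.21, 0.72, 0.40, 0.15, 0.33]

top2 = [topk_from_csr(indptr, indices, data, row, k=2) for row in range(len(indptr) - 1)]
for row, pairs in enumerate(top2):
    for label_id, score in pairs:
        print(f"row={row} label={label_names[label_id]} score={score:.2f}")
```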
