## Transformer Pre-trained model

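This chapter introduces the `Transformer`, currently the most powerful model family in natural language processing. Compared with `RNN`-family models, the `Transformer` has a clear advantage in both task performance (`metrics`) and computational efficiency (it can be run in `parallel`). Well-known `pre-train` models are listed below, and each link points to the model's paper. These models are essentially variants of the `Transformer`; they differ mainly in pre-training strategy, for example the amount of data, the masking scheme, and the form of the `Self-attention` matrix. The most unusual one is the last, `ELECTRA`, a paper released in early November `2019` that combines the `Transformer` with a `GAN`-like generator/discriminator setup.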

* BERT: https://arxiv.org/abs/1810.04805
 - Masked Language Modeling + Next Sentence Prediction


* GPT: https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf
 - AutoRegressive Prediction


* Transformer-XL: https://arxiv.org/abs/1901.02860
 - Learning dependency beyond a fixed length(>512)


* XLNet: https://arxiv.org/abs/1906.08237
 - Permutation Modeling


* XLM: https://arxiv.org/abs/1901.07291
 - Pretrain on cross-lingual language


* RoBERTa: https://arxiv.org/abs/1907.11692
 - Pretrain model longer, more data


* DistilBERT: https://arxiv.org/abs/1910.01108
* CTRL: https://arxiv.org/abs/1909.05858
* ELECTRA: https://openreview.net/pdf?id=r1xMH1BtvB
 - Transformer + GAN-like generator/discriminator (replaced token detection)
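
To make the first two pre-training objectives in the list concrete, here is a minimal sketch of how training pairs could be built for masked language modeling and for autoregressive prediction. It is a simplification: the function names are hypothetical, and the real BERT recipe also sometimes keeps or randomly replaces the selected tokens instead of always inserting `[MASK]`.

```python
import random

def make_mlm_example(tokens, mask_prob=0.15, seed=0):
    """Simplified masked language modeling: hide some tokens, predict them."""
    rng = random.Random(seed)
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            inputs.append("[MASK]")   # the model sees a placeholder
            labels.append(tok)        # and must recover the original token
        else:
            inputs.append(tok)
            labels.append(None)       # no loss on unmasked positions
    return inputs, labels

def make_autoregressive_example(tokens):
    """Autoregressive prediction: each position predicts the next token."""
    return tokens[:-1], tokens[1:]

tokens = "the rovers act as robotic geologists".split()
mlm_inputs, mlm_labels = make_mlm_example(tokens, mask_prob=0.3)
ar_inputs, ar_targets = make_autoregressive_example(tokens)
print(mlm_inputs, mlm_labels)
print(ar_inputs, ar_targets)
```

The masked variant can use context on both sides of a hidden token, while the autoregressive variant only ever sees the tokens to the left of its target.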
 
 
### [GLUE Benchmark](https://gluebenchmark.com/leaderboard)

## [Transformer](https://huggingface.co/transformers/)

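Here we use the `Transformers` package for `finetune`. Before fine-tuning, it helps to understand how natural language processing tasks differ. There are two main kinds of classification task:

1. `Text classification`: takes one sentence as input and outputs the class of that sentence.
2. `Sentence-Pair classification`: takes a pair of sentences as input and outputs the relationship between them.

* PS. Besides their strong performance, the most important contribution of these pre-trained models is the pre-trained `word embedding`. A `word embedding` captures the relationships between words in a corpus; the best-known example is: man - woman = king - queen. With a well-trained `word embedding` that captures such correspondences, performance on other downstream applications, such as chatbots, is generally good as well.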
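
The analogy above can be illustrated with a toy example. The vectors below are hypothetical 2-D embeddings chosen by hand (they are not taken from any trained model), and `analogy` is an illustrative helper, so this is only a sketch of the idea:

```python
import math

# Hypothetical 2-D embeddings: axis 0 = "royalty", axis 1 = "gender".
embeddings = {
    "king":  [1.0,  1.0],
    "queen": [1.0, -1.0],
    "man":   [0.0,  1.0],
    "woman": [0.0, -1.0],
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def analogy(a, b, c):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding a, b, c."""
    target = [x - y + z
              for x, y, z in zip(embeddings[a], embeddings[b], embeddings[c])]
    candidates = [w for w in embeddings if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(embeddings[w], target))

# man - woman and king - queen give the same difference vector here.
diff_people = [m - w for m, w in zip(embeddings["man"], embeddings["woman"])]
diff_royals = [k - q for k, q in zip(embeddings["king"], embeddings["queen"])]

result = analogy("king", "man", "woman")
print(result, diff_people, diff_royals)
```

In a trained embedding the relation holds only approximately, which is why the nearest neighbour by cosine similarity is used rather than exact equality.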

Here we use `BERT` for `finetune`.

In [1]:
import tensorflow as tf
import tensorflow_datasets as tfds
import numpy as np
import pandas as pd
import os
from sklearn.metrics import classification_report, confusion_matrix

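# Import only the names used in this notebook instead of a wildcard import.
from transformers import (BertTokenizer,
                          TFBertForSequenceClassification,
                          glue_convert_examples_to_features)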

Xformers is not installed correctly. If you want to use memory_efficient_attention to accelerate training use the following command to install Xformers
pip install xformers.


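## Model name explained

* `bert-base-uncased`:
  - `bert`: the model name.
  - `base`: the model size. `base` means $12$ layers, a `word embedding (hidden)` size of $768$, and $12$ `heads`. There is also `large`, with $24$ layers, a `word embedding (hidden)` size of $1024$, and $16$ `heads`.
  - `uncased`: the text preprocessing. `uncased` means all text is lowercased, whereas `cased` keeps the original casing.
 
These are not the only models available; for others, see:
https://huggingface.co/transformers/pretrained_models.html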
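
As a rough check of what the `base` size means in practice, the sketch below counts the encoder parameters from the configuration values printed when the model is loaded (`vocab_size`, `hidden_size`, `num_hidden_layers`, `intermediate_size`, `max_position_embeddings`, `type_vocab_size`). The helper `count_bert_parameters` is hypothetical, and the per-layer breakdown assumes the standard BERT layout (four attention projections, a two-layer feed-forward block, two LayerNorms per layer, plus a pooler).

```python
def count_bert_parameters(vocab_size, hidden_size, num_layers,
                          intermediate_size, max_positions, type_vocab_size):
    """Count encoder parameters (embeddings + layers + pooler), no task head."""
    h = hidden_size
    layer_norm = 2 * h                          # scale and bias
    embeddings = (vocab_size + max_positions + type_vocab_size) * h + layer_norm
    attention = 4 * (h * h + h) + layer_norm    # Q, K, V, output projections
    feed_forward = ((h * intermediate_size + intermediate_size)
                    + (intermediate_size * h + h)
                    + layer_norm)
    pooler = h * h + h
    return embeddings + num_layers * (attention + feed_forward) + pooler

base_params = count_bert_parameters(vocab_size=30522,
                                    hidden_size=768,
                                    num_layers=12,
                                    intermediate_size=3072,
                                    max_positions=512,
                                    type_vocab_size=2)
print(f"{base_params:,}")
```

Under these assumptions the total agrees with the `Loaded 109,482,240 parameters` line in the loading log. The number of `heads` does not appear in the formula because the heads split the same projection matrices rather than adding new ones.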

In [2]:
"""
Load the pre-trained model
"""
model = TFBertForSequenceClassification.from_pretrained('bert-base-uncased')

loading configuration file config.json from cache at /home/jovyan/.cache/huggingface/hub/models--bert-base-uncased/snapshots/1dbc166cf8765166998eff31ade2eb64c8a40076/config.json
Model config BertConfig {
  "architectures": [
    "BertForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "transformers_version": "4.30.2",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30522
}



loading weights file model.safetensors from cache at /home/jovyan/.cache/huggingface/hub/models--bert-base-uncased/snapshots/1dbc166cf8765166998eff31ade2eb64c8a40076/model.safetensors
2023-09-21 17:40:46.800111: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-09-21 17:40:47.465036: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 10417 MB memory:  -> device: 0, name: NVIDIA GeForce GTX 1080 Ti, pci bus id: 0000:0e:00.0, compute capability: 6.1
Loaded 109,482,240 parameters in the TF 2.0 model.
All PyTorch model weights were used when initializing TFBertForSequenceClassification.

Some weights or buffers of the TF 2.0 model TFBertForSequenceClassifi

In [3]:
"""
Load the model's tokenizer
"""
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

loading file vocab.txt from cache at /home/jovyan/.cache/huggingface/hub/models--bert-base-uncased/snapshots/1dbc166cf8765166998eff31ade2eb64c8a40076/vocab.txt
loading file added_tokens.json from cache at None
loading file special_tokens_map.json from cache at None
loading file tokenizer_config.json from cache at /home/jovyan/.cache/huggingface/hub/models--bert-base-uncased/snapshots/1dbc166cf8765166998eff31ade2eb64c8a40076/tokenizer_config.json
loading configuration file config.json from cache at /home/jovyan/.cache/huggingface/hub/models--bert-base-uncased/snapshots/1dbc166cf8765166998eff31ade2eb64c8a40076/config.json
Model config BertConfig {
  "_name_or_path": "bert-base-uncased",
  "architectures": [
    "BertForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 

## Finetune

![img](https://hackmd.io/_uploads/rJIRRucJp.png)

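These pre-trained models are commonly compared on the [GLUE Benchmark](https://gluebenchmark.com/leaderboard), which provides a variety of natural language processing tasks. Most of these tasks are classification tasks (STS-B is a regression task), and they differ mainly in dataset size and source. Here we use one of the classification tasks, `MRPC`, for `finetune`.

* Data source: [tensorflow dataset](https://www.tensorflow.org/datasets/catalog/overview#wmt19_translate)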

In [4]:
data, info = tfds.load('glue/mrpc', with_info=True)

Downloading and preparing dataset 1.43 MiB (download: 1.43 MiB, generated: 1.74 MiB, total: 3.17 MiB) to /home/jovyan/tensorflow_datasets/glue/mrpc/2.0.0...

Generating train examples...:   0%|          | 0/3668 [00:00<?, ? examples/s]

Generating validation examples...:   0%|          | 0/408 [00:00<?, ? examples/s]

Generating test examples...:   0%|          | 0/1725 [00:00<?, ? examples/s]

Dataset glue downloaded and prepared to /home/jovyan/tensorflow_datasets/glue/mrpc/2.0.0. Subsequent calls will reuse this data.


### Info

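This is the description of the dataset. The most important thing to note is its structure: because `MRPC` is a `Sentence-Pair classification` task, each example consists of `sentence1` and `sentence2` paired with one `label`. `MRPC` classifies whether two sentences are semantically equivalent: a `label` of $1$ means equivalent, and $0$ means not equivalent.

Because this is a benchmark dataset, it is already split into train, validation, and test.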

In [5]:
info

tfds.core.DatasetInfo(
    name='glue',
    full_name='glue/mrpc/2.0.0',
    description="""
    GLUE, the General Language Understanding Evaluation benchmark
    (https://gluebenchmark.com/) is a collection of resources for training,
    evaluating, and analyzing natural language understanding systems.
    """,
    config_description="""
    The Microsoft Research Paraphrase Corpus (Dolan & Brockett, 2005) is a corpus of
    sentence pairs automatically extracted from online news sources, with human annotations
    for whether the sentences in the pair are semantically equivalent.
    """,
    homepage='https://www.microsoft.com/en-us/download/details.aspx?id=52398',
    data_path=PosixGPath('/tmp/tmpaia75ru_tfds'),
    file_format=tfrecord,
    download_size=1.43 MiB,
    dataset_size=1.74 MiB,
    features=FeaturesDict({
        'idx': int32,
        'label': ClassLabel(shape=(), dtype=int64, num_classes=2),
        'sentence1': Text(shape=(), dtype=string),
        'sentence2': Tex

In [6]:
for k, v in data.items():
    print('key:', k)
    print('data shapes:\n', v)
    print('-' * 20)

key: train
data shapes:
 <PrefetchDataset element_spec={'idx': TensorSpec(shape=(), dtype=tf.int32, name=None), 'label': TensorSpec(shape=(), dtype=tf.int64, name=None), 'sentence1': TensorSpec(shape=(), dtype=tf.string, name=None), 'sentence2': TensorSpec(shape=(), dtype=tf.string, name=None)}>
--------------------
key: validation
data shapes:
 <PrefetchDataset element_spec={'idx': TensorSpec(shape=(), dtype=tf.int32, name=None), 'label': TensorSpec(shape=(), dtype=tf.int64, name=None), 'sentence1': TensorSpec(shape=(), dtype=tf.string, name=None), 'sentence2': TensorSpec(shape=(), dtype=tf.string, name=None)}>
--------------------
key: test
data shapes:
 <PrefetchDataset element_spec={'idx': TensorSpec(shape=(), dtype=tf.int32, name=None), 'label': TensorSpec(shape=(), dtype=tf.int64, name=None), 'sentence1': TensorSpec(shape=(), dtype=tf.string, name=None), 'sentence2': TensorSpec(shape=(), dtype=tf.string, name=None)}>
--------------------


### Dataset overview

`tensorflow` stores this data as a `tf.data.Dataset`. You can use `iter` to create an iterator and `next` to look at the first example, which contains `idx`, `label`, `sentence1`, and `sentence2`.

In [7]:
assert isinstance(data['train'], tf.data.Dataset)

temp = data['train']
temp_gen = iter(temp)
next(temp_gen)

{'idx': <tf.Tensor: shape=(), dtype=int32, numpy=1680>,
 'label': <tf.Tensor: shape=(), dtype=int64, numpy=0>,
 'sentence1': <tf.Tensor: shape=(), dtype=string, numpy=b'The identical rovers will act as robotic geologists , searching for evidence of past water .'>,
 'sentence2': <tf.Tensor: shape=(), dtype=string, numpy=b'The rovers act as robotic geologists , moving on six wheels .'>}

### Training data format

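Next we need to convert the dataset into a format the model can read. There are three inputs:

* `input_ids`: the token ids obtained after tokenizing the sentence; the model uses them to look up the `token embeddings`. Each token has an `id`, as shown in the figure below, where `101` is `[CLS]` and `102` is `[SEP]`. Because `MRPC` is a `Sentence-Pair classification` task, the example below contains two `102`s.

![](https://hackmd.io/_uploads/Hyl-1F5ka.png)

* `attention mask`: the `Transformer` limits the input length; the maximum for this model is `512`, and here we choose `128`. Not every input is 128 tokens long, so shorter inputs are padded at the end (with 0s). The mask is `1` for real tokens and `0` for `padding`, so that attention ignores the `padding` positions.

* `token_type_ids`: represents the `Segment embedding`; as in the figure above, it indicates which sentence each token belongs to. Because `MRPC` has two sentences, there are 2 `ids`, `0` and `1`.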
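
The three inputs described above can be assembled by hand. The sketch below is not the library's implementation: `build_pair_inputs` is a hypothetical helper that skips truncation, and the token ids are copied from the first training example printed in this notebook's conversion output.

```python
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0

def build_pair_inputs(ids_a, ids_b, max_length):
    """Assemble [CLS] A [SEP] B [SEP] and pad to max_length (no truncation)."""
    input_ids = [CLS_ID] + ids_a + [SEP_ID] + ids_b + [SEP_ID]
    # Segment 0 covers [CLS] + A + [SEP]; segment 1 covers B + [SEP].
    token_type_ids = [0] * (len(ids_a) + 2) + [1] * (len(ids_b) + 1)
    attention_mask = [1] * len(input_ids)
    if len(input_ids) > max_length:
        raise ValueError("truncation is not handled in this sketch")
    pad = max_length - len(input_ids)
    return {
        "input_ids": input_ids + [PAD_ID] * pad,
        "attention_mask": attention_mask + [0] * pad,
        "token_type_ids": token_type_ids + [0] * pad,
    }

# Token ids of sentence1 and sentence2 without the special tokens.
ids_a = [1996, 7235, 9819, 2097, 2552, 2004, 20478, 21334, 2015,
         1010, 6575, 2005, 3350, 1997, 2627, 2300, 1012]
ids_b = [1996, 9819, 2552, 2004, 20478, 21334, 2015, 1010,
         3048, 2006, 2416, 7787, 1012]

features = build_pair_inputs(ids_a, ids_b, max_length=128)
print(sum(features["attention_mask"]), features["input_ids"].count(SEP_ID))
```

All three lists have length `max_length`, and the number of `1`s in the attention mask equals the number of real tokens, including `[CLS]` and both `[SEP]` markers.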

In [8]:
max_length = 128
task = 'mrpc'

train_dataset = glue_convert_examples_to_features(data['train'],
                                                  tokenizer,
                                                  max_length,
                                                  task)
valid_dataset = glue_convert_examples_to_features(data['validation'],
                                                  tokenizer,
                                                  max_length,
                                                  task)
test_dataset = glue_convert_examples_to_features(data['test'],
                                                 tokenizer,
                                                 max_length,
                                                 task)

Using label list ['0', '1'] for task mrpc
Using output mode classification for task mrpc
*** Example ***
guid: 1680
features: InputFeatures(input_ids=[101, 1996, 7235, 9819, 2097, 2552, 2004, 20478, 21334, 2015, 1010, 6575, 2005, 3350, 1997, 2627, 2300, 1012, 102, 1996, 9819, 2552, 2004, 20478, 21334, 2015, 1010, 3048, 2006, 2416, 7787, 1012, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], attention_mask=[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 

### Example

Inspect the converted dataset.

In [9]:
next(iter(train_dataset))

({'input_ids': <tf.Tensor: shape=(128,), dtype=int32, numpy=
  array([  101,  1996,  7235,  9819,  2097,  2552,  2004, 20478, 21334,
          2015,  1010,  6575,  2005,  3350,  1997,  2627,  2300,  1012,
           102,  1996,  9819,  2552,  2004, 20478, 21334,  2015,  1010,
          3048,  2006,  2416,  7787,  1012,   102,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,
   

### Parameter settings

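With `tf.data.Dataset`, three standard operations are usually chained onto the training dataset:

* `.shuffle()`: shuffles the dataset. It first fills a `buffer` with the first `buffer_size` examples, then repeatedly draws one example at random from the `buffer` and replaces it with the next example from the dataset. Using a `buffer` makes shuffling possible when the whole dataset cannot fit in memory at once; a `buffer_size` smaller than the dataset gives only an approximate shuffle.

* `.batch()`: the number of examples used in each iteration.
* `.repeat()`: the number of times the dataset is repeated, here the number of `epochs`.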
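
The behaviour of these three operations can be imitated in plain Python. This is a sketch of the idea only, not TensorFlow's implementation; `buffered_shuffle`, `batched`, and `pipeline` are hypothetical helpers.

```python
import random
from itertools import islice

def buffered_shuffle(items, buffer_size, seed=0):
    """Yield items in an order similar to a shuffle buffer of the given size."""
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    rng = random.Random(seed)
    iterator = iter(items)
    buffer = list(islice(iterator, buffer_size))   # fill with the first items
    for item in iterator:
        idx = rng.randrange(len(buffer))
        yield buffer[idx]        # emit a random element from the buffer
        buffer[idx] = item       # refill the slot with the next element
    rng.shuffle(buffer)          # drain what is left at the end
    yield from buffer

def batched(items, batch_size):
    """Group items into lists of batch_size; the last batch may be smaller."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch

def pipeline(n_examples, buffer_size, batch_size, epochs):
    """Imitate dataset.shuffle(buffer_size).batch(batch_size).repeat(epochs)."""
    for epoch in range(epochs):
        shuffled = buffered_shuffle(range(n_examples), buffer_size, seed=epoch)
        yield from batched(shuffled, batch_size)

batches = list(pipeline(n_examples=10, buffer_size=4, batch_size=3, epochs=2))
print(batches)
```

With a buffer of 4, the first emitted element can only come from the first four examples, which is why a small `buffer_size` mixes the data less thoroughly than a buffer as large as the dataset.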

In [10]:
buffer_size = 100
train_bz = 16
epochs = 3
valid_bz = 50

train_dataset = train_dataset.shuffle(buffer_size).batch(train_bz).repeat(epochs)
valid_dataset = valid_dataset.batch(valid_bz)
test_dataset = test_dataset.batch(valid_bz)

In [11]:
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-5,
                                     epsilon=1e-8,
                                     clipnorm=1.0)
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True,
                                                     reduction=tf.keras.losses.Reduction.SUM_OVER_BATCH_SIZE)

model.compile(optimizer=optimizer, loss=loss, metrics=['accuracy'])

## Training

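* `.fit()`: accepts a `tf.data.Dataset` or a `generator` as input. The older `fit_generator` is deprecated in TensorFlow 2.x.
* `steps_per_epoch`: the number of training steps per `epoch`, usually $\frac{train\_size}{batch\_size}$, so that one epoch passes over the whole training set.
* `validation_steps`: the same idea as `steps_per_epoch`, applied to the validation set.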
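
A quick arithmetic check of these step counts, using the split sizes reported when the dataset was prepared (3668 training and 408 validation examples) and the batch sizes set earlier. The variable names `skipped_valid`, `steps_to_cover_valid`, and `batches_available` are illustrative only.

```python
import math

train_size, valid_size = 3668, 408     # MRPC split sizes
train_bz, valid_bz, epochs = 16, 50, 3

# Floor division counts full batches only.
steps_per_epoch = train_size // train_bz
validation_steps = valid_size // valid_bz

# Examples left out of each validation pass during fit() with floor division.
skipped_valid = valid_size - validation_steps * valid_bz

# Steps needed to cover every validation example.
steps_to_cover_valid = math.ceil(valid_size / valid_bz)

# Batches produced by shuffle -> batch -> repeat(epochs) on the training set.
batches_available = math.ceil(train_size / train_bz) * epochs

print(steps_per_epoch, validation_steps, skipped_valid,
      steps_to_cover_valid, batches_available)
```

Floor division leaves a few validation examples out of each validation pass during training; rounding up instead would include them. The repeated training dataset supplies at least as many batches as `steps_per_epoch * epochs` requires.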

In [12]:
history = model.fit(train_dataset,
                    epochs=epochs,
                    steps_per_epoch=3668//train_bz, 
                    validation_data=valid_dataset,
                    validation_steps=408//valid_bz)

Epoch 1/3
Epoch 2/3
Epoch 3/3


## Evaluation

In [20]:
valid_pred = model.predict(valid_dataset)
valid_pred_ids = np.argmax(valid_pred.logits, axis=-1)

In [22]:
valid_pred_ids

array([0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0,
       0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0,
       0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 1,
       1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0,
       1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1,
       1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1,
       0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 0,
       1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0,
       0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1,
       1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1,
       0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1,
       1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1,
       1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1,
       1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1,

In [23]:
"""
Collect the labels from the tf.data.Dataset
"""
valid_label = list()
for x in valid_dataset:
    valid_label += x[1].numpy().tolist()

In [25]:
print(classification_report(y_pred=valid_pred_ids, y_true=valid_label))

              precision    recall  f1-score   support

           0       0.79      0.71      0.75       129
           1       0.87      0.91      0.89       279

    accuracy                           0.85       408
   macro avg       0.83      0.81      0.82       408
weighted avg       0.85      0.85      0.85       408



In [26]:
confm = confusion_matrix(y_pred=valid_pred_ids, y_true=valid_label)

index = ['Actual_0', 'Actual_1']
columns = ['Pred_0', 'Pred_1']
pd.DataFrame(confm, index=index, columns=columns)

| | Pred_0 | Pred_1 |
|---|---|---|
| Actual_0 | 92 | 37 |
| Actual_1 | 25 | 254 |
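
The numbers in the classification report can be recomputed from this confusion matrix by hand. A minimal sketch in plain Python, where `per_class_metrics` is a hypothetical helper and rows are actual classes, columns predicted classes:

```python
confusion = [[92, 37],    # actual 0: predicted 0, predicted 1
             [25, 254]]   # actual 1: predicted 0, predicted 1

def per_class_metrics(matrix, cls):
    """Return (precision, recall, f1, support) for one class."""
    tp = matrix[cls][cls]
    predicted = sum(row[cls] for row in matrix)   # column sum
    actual = sum(matrix[cls])                     # row sum (support)
    precision = tp / predicted
    recall = tp / actual
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1, actual

correct = sum(confusion[i][i] for i in range(len(confusion)))
total = sum(sum(row) for row in confusion)
accuracy = correct / total

for cls in range(len(confusion)):
    p, r, f1, support = per_class_metrics(confusion, cls)
    print(f"class {cls}: precision={p:.2f} recall={r:.2f} "
          f"f1={f1:.2f} support={support}")
print(f"accuracy={accuracy:.2f}")
```

Rounded to two decimals, these values agree with the report printed by `classification_report` above.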


## Save model

In [27]:
save_path = 'save_glue'
# Create the output directory if it does not exist yet.
os.makedirs(save_path, exist_ok=True)

In [28]:
model.save_pretrained(save_path)

Configuration saved in save/config.json
Model weights saved in save/tf_model.h5


## Load model and predict

Here we follow the `MRPC` input format and again use the `glue_convert_examples_to_features` function for the conversion.

In [29]:
new_model = TFBertForSequenceClassification.from_pretrained(save_path)

loading configuration file save/config.json
Model config BertConfig {
  "_name_or_path": "bert-base-uncased",
  "architectures": [
    "BertForSequenceClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "transformers_version": "4.30.2",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30522
}

loading weights file save/tf_model.h5
Some layers from the model checkpoint at save were not used when initializing TFBertForSequenceClassification: ['dropout_37']
- This IS expected if you are initializing TFBertForSequenceClassification from the checkpoint of a model trained on 

In [30]:
sentence1 = ["Anorld Schwarzenegger is my idol."]
sentence2 = ["My favorite idol is Anorld Schwarzenegger."]

test_dataset = pd.DataFrame(dict(idx=list(range(len(sentence1))),
                                 label=[0]*len(sentence1),
                                 sentence1=sentence1,
                                 sentence2=sentence2))

In [31]:
"""
Mimic the GLUE input format: (idx, label, sentence1, sentence2)
The label is a dummy value required by the input format; it does not affect the prediction
"""
test_dataset

| | idx | label | sentence1 | sentence2 |
|---|---|---|---|---|
| 0 | 0 | 0 | Anorld Schwarzenegger is my idol. | My favorite idol is Anorld Schwarzenegger. |


In [32]:
test_gen = tf.data.Dataset.from_tensor_slices(dict(test_dataset))

In [33]:
test_gen = glue_convert_examples_to_features(test_gen, tokenizer, max_length, task)

Using label list ['0', '1'] for task mrpc
Using output mode classification for task mrpc
*** Example ***
guid: 0
features: InputFeatures(input_ids=[101, 2019, 2953, 6392, 29058, 8625, 13327, 2003, 2026, 10282, 1012, 102, 2026, 5440, 10282, 2003, 2019, 2953, 6392, 29058, 8625, 13327, 1012, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], attention_mask=[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0

In [34]:
test_gen = test_gen.batch(1)

In [35]:
next(iter(test_gen))

({'input_ids': <tf.Tensor: shape=(1, 128), dtype=int32, numpy=
  array([[  101,  2019,  2953,  6392, 29058,  8625, 13327,  2003,  2026,
          10282,  1012,   102,  2026,  5440, 10282,  2003,  2019,  2953,
           6392, 29058,  8625, 13327,  1012,   102,     0,     0,     0,
              0,     0,     0,     0,     0,     0,     0,     0,     0,
              0,     0,     0,     0,     0,     0,     0,     0,     0,
              0,     0,     0,     0,     0,     0,     0,     0,     0,
              0,     0,     0,     0,     0,     0,     0,     0,     0,
              0,     0,     0,     0,     0,     0,     0,     0,     0,
              0,     0,     0,     0,     0,     0,     0,     0,     0,
              0,     0,     0,     0,     0,     0,     0,     0,     0,
              0,     0,     0,     0,     0,     0,     0,     0,     0,
              0,     0,     0,     0,     0,     0,     0,     0,     0,
              0,     0,     0,     0,     0,     0,     0,   

In [36]:
pred = new_model.predict(test_gen)

In [39]:
pred

TFSequenceClassifierOutput(loss=None, logits=array([[-2.3520176,  2.4877014]], dtype=float32), hidden_states=None, attentions=None)

In [40]:
pred_ids = np.argmax(pred.logits, axis=-1)

In [41]:
print(pred_ids[0])

1
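
The model returns raw logits, and `argmax` only reports which one is larger. To read them as class probabilities, a softmax can be applied. The sketch below uses the two logits printed for `pred` above and a hand-written `softmax` helper rather than any library function:

```python
import math

def softmax(logits):
    """Convert logits to probabilities; subtract the max for stability."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [-2.3520176, 2.4877014]   # values printed for `pred` above
probs = softmax(logits)
predicted_class = max(range(len(probs)), key=probs.__getitem__)
print(probs, predicted_class)
```

The predicted class is `1`, meaning the model judges the two sentences to be semantically equivalent, with a probability of roughly 0.99 under this reading of the logits.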
