<a href="https://colab.research.google.com/github/NielsRogge/Transformers-Tutorials/blob/master/LayoutLMv2/CORD/Fine_tuning_LayoutLMv2ForTokenClassification_on_CORD.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In this notebook, we are going to fine-tune `LayoutLMv2ForTokenClassification` on the [CORD](https://github.com/clovaai/cord) dataset. The goal is for the model to assign the appropriate label to each word in a scanned document (here, receipts). We treat this as a sequence labeling problem, similar to NER. Unlike BERT, which only sees text, LayoutLMv2 also incorporates visual and layout information about the tokens when encoding them into vectors. This makes LayoutLMv2 well suited to document understanding tasks.

LayoutLMv2 is itself an upgrade of LayoutLM. The main novelty of LayoutLMv2 is that it also pre-trains visual embeddings, whereas the original LayoutLM only adds visual embeddings during fine-tuning.

* Paper: https://arxiv.org/abs/2012.14740
* Original repo: https://github.com/microsoft/unilm/tree/master/layoutlmv2

NOTES:

* You first need to prepare the CORD dataset for LayoutLMv2. For that, check out the notebook "Prepare CORD for LayoutLMv2".
* This notebook is heavily inspired by [this GitHub repository](https://github.com/omarsou/layoutlm_CORD), which fine-tunes both BERT and LayoutLM (v1) on the CORD dataset.



## Install dependencies

First, we install the required libraries:
* Transformers (for the LayoutLMv2 model)
* Datasets (for data preprocessing)
* Seqeval (for metrics)
* Detectron2 (which LayoutLMv2 requires for its visual backbone).



In [1]:
!rm -rf transformers
!git clone -b modeling_layoutlmv2_v2 https://github.com/NielsRogge/transformers.git
!pip install -q ./transformers

Cloning into 'transformers'...
remote: Enumerating objects: 83407, done.
remote: Counting objects: 100% (1845/1845), done.
remote: Compressing objects: 100% (529/529), done.
remote: Total 83407 (delta 1195), reused 1697 (delta 1099), pack-reused 81562
Receiving objects: 100% (83407/83407), 63.59 MiB | 27.13 MiB/s, done.
Resolving deltas: 100% (59481/59481), done.
  DEPRECATION: A future pip version will change local packages to be built in-place without first copying to a temporary directory. We recommend you use --use-feature=in-tree-build to test your packages with this new behavior before it becomes the default.
   pip 21.3 will remove support for this functionality. You can find discussion regarding this at https://github.com/pypa/pip/issues/7555.
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
    Preparing wheel metadata ...

In [2]:
!pip install -q datasets seqeval

In [3]:
!pip install pyyaml==5.1
# workaround: install old version of pytorch since detectron2 hasn't released packages for pytorch 1.9 (issue: https://github.com/facebookresearch/detectron2/issues/3158)
!pip install torch==1.8.0+cu101 torchvision==0.9.0+cu101 -f https://download.pytorch.org/whl/torch_stable.html

# install detectron2 that matches pytorch 1.8
# See https://detectron2.readthedocs.io/tutorials/install.html for instructions
!pip install -q detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/torch1.8/index.html
# exit(0)  # After installation, you need to "restart runtime" in Colab. This line can also restart runtime

Looking in links: https://download.pytorch.org/whl/torch_stable.html
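After restarting the runtime, it can help to confirm that the libraries listed above are actually importable before going further. The snippet below is a minimal sketch using only the standard library; the module names in `REQUIRED_MODULES` are assumed to be the top-level import names of the packages installed above, and `find_missing` is a hypothetical helper, not part of any of these libraries.

```python
import importlib.util

# Top-level module names assumed for the packages installed above.
REQUIRED_MODULES = ["transformers", "datasets", "seqeval", "detectron2", "torch"]


def find_missing(modules):
    """Return the names in `modules` that the import system cannot locate."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


missing = find_missing(REQUIRED_MODULES)
print("Missing modules:", missing if missing else "none")
```

If anything is reported as missing, re-running the corresponding install cell (and restarting the runtime again) is usually the first thing to try.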


## Prepare the data

First, let's read in the annotations which we prepared in the other notebook. These contain the word-level annotations (words, labels, normalized bounding boxes).

In [4]:
import pandas as pd

train = pd.read_pickle('/content/drive/MyDrive/LayoutLMv2/Tutorial notebooks/CORD/CORD_layoutlmv2_format/train.pkl')
val = pd.read_pickle('/content/drive/MyDrive/LayoutLMv2/Tutorial notebooks/CORD/CORD_layoutlmv2_format/dev.pkl')
test = pd.read_pickle('/content/drive/MyDrive/LayoutLMv2/Tutorial notebooks/CORD/CORD_layoutlmv2_format/test.pkl')
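The "normalized bounding boxes" mentioned above are produced in the other notebook, which is not shown here. As a rough sketch of the convention used by the LayoutLM family, pixel coordinates are scaled to the 0-1000 range relative to the image size. The function name `normalize_bbox` and the example values below are illustrative assumptions.

```python
def normalize_bbox(bbox, width, height):
    """Scale a pixel-space box (x0, y0, x1, y1) to the 0-1000 range."""
    x0, y0, x1, y1 = bbox
    return [
        int(1000 * x0 / width),
        int(1000 * y0 / height),
        int(1000 * x1 / width),
        int(1000 * y1 / height),
    ]


# A box on a hypothetical 500 x 1000 pixel image.
print(normalize_bbox((50, 100, 150, 200), width=500, height=1000))
# prints [100, 100, 300, 200]
```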

Let's define a list of all unique labels. For that, let's first count the number of occurrences for each label:

In [5]:
from collections import Counter

all_labels = [item for sublist in train[1] for item in sublist] + [item for sublist in val[1] for item in sublist] + [item for sublist in test[1] for item in sublist]
Counter(all_labels)

Counter({'menu.cnt': 2429,
         'menu.discountprice': 403,
         'menu.etc': 19,
         'menu.itemsubtotal': 7,
         'menu.nm': 6597,
         'menu.num': 109,
         'menu.price': 2585,
         'menu.sub_cnt': 189,
         'menu.sub_etc': 9,
         'menu.sub_nm': 822,
         'menu.sub_price': 160,
         'menu.sub_unitprice': 14,
         'menu.unitprice': 750,
         'menu.vatyn': 9,
         'sub_total.discount_price': 191,
         'sub_total.etc': 283,
         'sub_total.othersvc_price': 6,
         'sub_total.service_price': 353,
         'sub_total.subtotal_price': 1482,
         'sub_total.tax_price': 1283,
         'total.cashprice': 1393,
         'total.changeprice': 1297,
         'total.creditcardprice': 410,
         'total.emoneyprice': 129,
         'total.menuqty_cnt': 630,
         'total.menutype_cnt': 130,
         'total.total_etc': 89,
         'total.total_price': 2120,
         'void_menu.nm': 3,
         'void_menu.price': 1})

As we can see, some labels have very few examples. Let's replace them with the "neutral" label "O" (which stands for "Outside").

In [6]:
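# Note: the key 'mneu.itemsubtotal' is misspelled, so 'menu.itemsubtotal' is never matched
# and keeps its own label (it still appears in the counts below).
replacing_labels = {'menu.etc': 'O', 'mneu.itemsubtotal': 'O', 'menu.sub_etc': 'O', 'menu.sub_unitprice': 'O', 'menu.vatyn': 'O',
                  'void_menu.nm': 'O', 'void_menu.price': 'O', 'sub_total.othersvc_price': 'O'}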

In [7]:
def replace_elem(elem):
  # fall back to the original label when no replacement is defined
  return replacing_labels.get(elem, elem)

def replace_list(ls):
  return [replace_elem(elem) for elem in ls]

train[1] = [replace_list(ls) for ls in train[1]]
val[1] = [replace_list(ls) for ls in val[1]]
test[1] = [replace_list(ls) for ls in test[1]]
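Instead of listing the rare labels by hand, the mapping could also be derived from the counts. The sketch below assumes a hypothetical threshold of 20 occurrences; with the counts printed earlier, that threshold appears to select the same rare labels (including `menu.itemsubtotal`). The helper `rare_label_mapping` is an illustrative name, not something from the original notebook.

```python
from collections import Counter


def rare_label_mapping(label_counts, min_count, neutral="O"):
    """Map every label seen fewer than `min_count` times to `neutral`."""
    return {
        label: neutral
        for label, count in label_counts.items()
        if count < min_count
    }


# A few counts copied from the output above; min_count=20 is an assumption.
counts = Counter({"menu.nm": 6597, "menu.etc": 19, "menu.itemsubtotal": 7, "void_menu.price": 1})
mapping = rare_label_mapping(counts, min_count=20)
print(mapping)
```

In the notebook, `Counter(all_labels)` could be passed in place of the toy `counts`.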

In [8]:
all_labels = [item for sublist in train[1] for item in sublist] + [item for sublist in val[1] for item in sublist] + [item for sublist in test[1] for item in sublist]
Counter(all_labels)

Counter({'O': 61,
         'menu.cnt': 2429,
         'menu.discountprice': 403,
         'menu.itemsubtotal': 7,
         'menu.nm': 6597,
         'menu.num': 109,
         'menu.price': 2585,
         'menu.sub_cnt': 189,
         'menu.sub_nm': 822,
         'menu.sub_price': 160,
         'menu.unitprice': 750,
         'sub_total.discount_price': 191,
         'sub_total.etc': 283,
         'sub_total.service_price': 353,
         'sub_total.subtotal_price': 1482,
         'sub_total.tax_price': 1283,
         'total.cashprice': 1393,
         'total.changeprice': 1297,
         'total.creditcardprice': 410,
         'total.emoneyprice': 129,
         'total.menuqty_cnt': 630,
         'total.menutype_cnt': 130,
         'total.total_etc': 89,
         'total.total_price': 2120})

Now we have to save all the unique labels in a list.

In [9]:
labels = list(set(all_labels))
print(labels)

['total.creditcardprice', 'menu.num', 'total.total_etc', 'menu.cnt', 'menu.sub_cnt', 'total.menutype_cnt', 'total.menuqty_cnt', 'menu.discountprice', 'menu.sub_nm', 'total.changeprice', 'menu.sub_price', 'sub_total.service_price', 'menu.itemsubtotal', 'menu.unitprice', 'sub_total.subtotal_price', 'O', 'sub_total.etc', 'sub_total.tax_price', 'sub_total.discount_price', 'menu.nm', 'total.emoneyprice', 'menu.price', 'total.cashprice', 'total.total_price']


In [10]:
label2id = {label: idx for idx, label in enumerate(labels)}
id2label = {idx: label for idx, label in enumerate(labels)}
print(label2id)
print(id2label)

{'total.creditcardprice': 0, 'menu.num': 1, 'total.total_etc': 2, 'menu.cnt': 3, 'menu.sub_cnt': 4, 'total.menutype_cnt': 5, 'total.menuqty_cnt': 6, 'menu.discountprice': 7, 'menu.sub_nm': 8, 'total.changeprice': 9, 'menu.sub_price': 10, 'sub_total.service_price': 11, 'menu.itemsubtotal': 12, 'menu.unitprice': 13, 'sub_total.subtotal_price': 14, 'O': 15, 'sub_total.etc': 16, 'sub_total.tax_price': 17, 'sub_total.discount_price': 18, 'menu.nm': 19, 'total.emoneyprice': 20, 'menu.price': 21, 'total.cashprice': 22, 'total.total_price': 23}
{0: 'total.creditcardprice', 1: 'menu.num', 2: 'total.total_etc', 3: 'menu.cnt', 4: 'menu.sub_cnt', 5: 'total.menutype_cnt', 6: 'total.menuqty_cnt', 7: 'menu.discountprice', 8: 'menu.sub_nm', 9: 'total.changeprice', 10: 'menu.sub_price', 11: 'sub_total.service_price', 12: 'menu.itemsubtotal', 13: 'menu.unitprice', 14: 'sub_total.subtotal_price', 15: 'O', 16: 'sub_total.etc', 17: 'sub_total.tax_price', 18: 'sub_total.discount_price', 19: 'menu.nm', 20: '
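One caveat with `list(set(all_labels))`: the iteration order of a set of strings can change between Python sessions, because string hashing is randomized by default. Ids stored with a checkpoint may then no longer line up with the labels after a restart. A minimal sketch of a reproducible alternative is to sort the labels first; `build_label_maps` is a hypothetical helper name.

```python
def build_label_maps(all_labels):
    """Return (labels, label2id, id2label) with labels in sorted, reproducible order."""
    labels = sorted(set(all_labels))
    label2id = {label: idx for idx, label in enumerate(labels)}
    id2label = {idx: label for label, idx in label2id.items()}
    return labels, label2id, id2label


example_labels, example_label2id, example_id2label = build_label_maps(
    ["menu.nm", "O", "menu.cnt", "menu.nm"]
)
print(example_label2id)
# prints {'O': 0, 'menu.cnt': 1, 'menu.nm': 2}
```

Note that switching to sorted order would change the ids shown in the outputs of this notebook, which were recorded with the unsorted mapping.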

In [11]:
from os import listdir
from torch.utils.data import Dataset
import torch
from PIL import Image

class CORDDataset(Dataset):
    """CORD dataset."""

    def __init__(self, annotations, image_dir, processor=None, max_length=512):
        """
        Args:
            annotations (List[List]): List of lists containing the word-level annotations (words, labels, boxes).
            image_dir (string): Directory with all the document images.
            processor (LayoutLMv2Processor): Processor to prepare the text + image.
        """
        self.words, self.labels, self.boxes = annotations
        self.image_dir = image_dir
        self.image_file_names = [f for f in listdir(image_dir)]
        self.processor = processor

    def __len__(self):
        return len(self.image_file_names)

    def __getitem__(self, idx):
        # first, take an image
        item = self.image_file_names[idx]
        image = Image.open(self.image_dir + item).convert("RGB")

        # get word-level annotations 
        words = self.words[idx]
        boxes = self.boxes[idx]
        word_labels = self.labels[idx]

        assert len(words) == len(boxes) == len(word_labels)
        
        word_labels = [label2id[label] for label in word_labels]
        # use processor to prepare everything
        encoded_inputs = self.processor(image, words, boxes=boxes, word_labels=word_labels, 
                                        padding="max_length", truncation=True, 
                                        return_tensors="pt")
        
        # remove batch dimension
        for k,v in encoded_inputs.items():
          encoded_inputs[k] = v.squeeze()

        assert encoded_inputs.input_ids.shape == torch.Size([512])
        assert encoded_inputs.attention_mask.shape == torch.Size([512])
        assert encoded_inputs.token_type_ids.shape == torch.Size([512])
        assert encoded_inputs.bbox.shape == torch.Size([512, 4])
        assert encoded_inputs.image.shape == torch.Size([3, 224, 224])
        assert encoded_inputs.labels.shape == torch.Size([512]) 
      
        return encoded_inputs

In [12]:
from transformers import LayoutLMv2Processor

processor = LayoutLMv2Processor.from_pretrained("microsoft/layoutlmv2-base-uncased", revision="no_ocr")

train_dataset = CORDDataset(annotations=train,
                            image_dir='/content/drive/MyDrive/LayoutLMv2/Tutorial notebooks/CORD/CORD/train/image/', 
                            processor=processor)
val_dataset = CORDDataset(annotations=val,
                            image_dir='/content/drive/MyDrive/LayoutLMv2/Tutorial notebooks/CORD/CORD/dev/image/', 
                            processor=processor)
test_dataset = CORDDataset(annotations=test,
                            image_dir='/content/drive/MyDrive/LayoutLMv2/Tutorial notebooks/CORD/CORD/test/image/', 
                            processor=processor)

Let's verify an example:

In [13]:
encoding = train_dataset[0]
encoding.keys()

dict_keys(['input_ids', 'token_type_ids', 'attention_mask', 'bbox', 'labels', 'image'])

In [14]:
for k,v in encoding.items():
  print(k, v.shape)

input_ids torch.Size([512])
token_type_ids torch.Size([512])
attention_mask torch.Size([512])
bbox torch.Size([512, 4])
labels torch.Size([512])
image torch.Size([3, 224, 224])


In [15]:
print(processor.tokenizer.decode(encoding['input_ids']))

[CLS] tebu lemon 1 22. 000 total 22. 000 cash 22. 000 1 [SEP] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [P

In [16]:
train[0][0]

['Tebu', 'Lemon', '1', '22.000', 'Total', '22.000', 'CASH', '22.000', '1']

In [17]:
train[1][0]

['menu.nm',
 'menu.nm',
 'menu.cnt',
 'menu.price',
 'total.total_price',
 'total.total_price',
 'total.cashprice',
 'total.cashprice',
 'total.menuqty_cnt']

In [18]:
[id2label[label] for label in encoding['labels'].tolist() if label != -100]

['menu.nm',
 'menu.nm',
 'menu.cnt',
 'menu.price',
 'total.total_price',
 'total.total_price',
 'total.cashprice',
 'total.cashprice',
 'total.menuqty_cnt']

In [19]:
for id, label in zip(encoding['input_ids'][:30], encoding['labels'][:30]):
  print(processor.tokenizer.decode([id]), label.item())

[CLS] -100
te 19
##bu -100
lemon 19
1 3
22 21
. -100
000 -100
total 23
22 23
. -100
000 -100
cash 22
22 22
. -100
000 -100
1 6
[SEP] -100
[PAD] -100
[PAD] -100
[PAD] -100
[PAD] -100
[PAD] -100
[PAD] -100
[PAD] -100
[PAD] -100
[PAD] -100
[PAD] -100
[PAD] -100
[PAD] -100
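The output above shows that only the first sub-token of each word carries a label, while the remaining sub-tokens and the special tokens get -100 (the value that PyTorch's cross-entropy loss ignores by default). The processor handles this internally; the following is a plain-Python re-creation of the idea, not the processor's actual code. The `word_ids` list is written out by hand for this example.

```python
IGNORE_INDEX = -100  # label value that the loss function skips


def align_labels(word_ids, word_labels, ignore_index=IGNORE_INDEX):
    """Label the first token of each word; mark everything else as ignored.

    word_ids[i] is the index of the word that token i came from,
    or None for special tokens such as [CLS], [SEP] and [PAD].
    """
    aligned = []
    previous_word = None
    for word_id in word_ids:
        if word_id is None or word_id == previous_word:
            aligned.append(ignore_index)
        else:
            aligned.append(word_labels[word_id])
        previous_word = word_id
    return aligned


# Tokens: [CLS] te ##bu lemon 1 22 . 000 total 22 . 000 cash 22 . 000 1 [SEP]
word_ids = [None, 0, 0, 1, 2, 3, 3, 3, 4, 5, 5, 5, 6, 7, 7, 7, 8, None]
word_labels = [19, 19, 3, 21, 23, 23, 22, 22, 6]
print(align_labels(word_ids, word_labels))
```

The printed list should match the label column of the first 18 rows above.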


Next, we create corresponding dataloaders.

In [20]:
from torch.utils.data import DataLoader

train_dataloader = DataLoader(train_dataset, batch_size=2, shuffle=True)
val_dataloader = DataLoader(val_dataset, batch_size=2, shuffle=True)
test_dataloader = DataLoader(test_dataset, batch_size=2)

## Train the model

Let's train the model using native PyTorch. We use the AdamW optimizer with a learning rate of 5e-5 (a commonly used starting value when fine-tuning Transformer-based models).



In [21]:
from transformers import LayoutLMv2ForTokenClassification, AdamW
import torch
from tqdm.notebook import tqdm

model = LayoutLMv2ForTokenClassification.from_pretrained('microsoft/layoutlmv2-base-uncased',
                                                                      num_labels=len(labels))

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)
optimizer = AdamW(model.parameters(), lr=5e-5)

global_step = 0
num_train_epochs = 4

# put the model in training mode
model.train()
for epoch in range(num_train_epochs):
    print("Epoch:", epoch)
    for batch in tqdm(train_dataloader):
        # move the inputs to the device
        input_ids = batch['input_ids'].to(device)
        bbox = batch['bbox'].to(device)
        image = batch['image'].to(device)
        attention_mask = batch['attention_mask'].to(device)
        token_type_ids = batch['token_type_ids'].to(device)
        # note: this reuses the name `labels`, so the label list is rebuilt before evaluation
        labels = batch['labels'].to(device)

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = model(input_ids=input_ids,
                        bbox=bbox,
                        image=image,
                        attention_mask=attention_mask,
                        token_type_ids=token_type_ids,
                        labels=labels)
        loss = outputs.loss

        # print the loss every 100 steps
        if global_step % 100 == 0:
            print(f"Loss after {global_step} steps: {loss.item()}")

        loss.backward()
        optimizer.step()
        global_step += 1

model.save_pretrained("/content/drive/MyDrive/LayoutLMv2/Tutorial notebooks/CORD/Checkpoints")

Some weights of the model checkpoint at microsoft/layoutlmv2-base-uncased were not used when initializing LayoutLMv2ForTokenClassification: ['layoutlmv2.visual.backbone.bottom_up.res4.9.conv1.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res4.15.conv2.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res4.7.conv3.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res4.19.conv3.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res4.8.conv3.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res4.2.conv3.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res4.3.conv1.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res4.7.conv2.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res4.13.conv3.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res4.0.shortcut.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res4.13.conv2.norm.num_batches_trac

Epoch: 0


  0%|          | 0/400 [00:00<?, ?it/s]

Loss after 0 steps: 3.1703407764434814
Loss after 100 steps: 2.3077478408813477
Loss after 200 steps: 1.6867226362228394
Loss after 300 steps: 1.0950108766555786
Epoch: 1


  0%|          | 0/400 [00:00<?, ?it/s]

Loss after 400 steps: 2.3882484436035156
Loss after 500 steps: 0.8808697462081909
Loss after 600 steps: 0.9903069734573364
Loss after 700 steps: 0.5121206641197205
Epoch: 2


  0%|          | 0/400 [00:00<?, ?it/s]

Loss after 800 steps: 0.23476076126098633
Loss after 900 steps: 0.2449778914451599
Loss after 1000 steps: 0.3742366433143616
Loss after 1100 steps: 0.2236849069595337
Epoch: 3


  0%|          | 0/400 [00:00<?, ?it/s]

Loss after 1200 steps: 0.8207438588142395
Loss after 1300 steps: 0.31655505299568176
Loss after 1400 steps: 0.4179667830467224
Loss after 1500 steps: 0.04862891137599945
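The progress bars above show 400 batches per epoch, and the last logged step is 1500 because the loss is printed every 100 steps while steps run from 0 to 1599. The arithmetic can be sketched as follows; the figure of 800 training examples is an assumption inferred from 400 batches of size 2.

```python
import math


def total_training_steps(num_examples, batch_size, num_epochs):
    """Return (steps per epoch, total optimizer steps), with one step per batch."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch, steps_per_epoch * num_epochs


# 800 examples is inferred from the 400-batch progress bar, not stated in the notebook.
print(total_training_steps(num_examples=800, batch_size=2, num_epochs=4))
# prints (400, 1600)
```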


## Evaluation

Let's evaluate the model on the test set. First, let's do a sanity check on the first example of the test set.

In [22]:
encoding = test_dataset[0]
processor.tokenizer.decode(encoding['input_ids'])

"[CLS] rp. goblin's mace 1 25, 000 mozarella hot dog 2 38, 000 chili pepper croquette 1 14, 000 cheese croquette 1 14, 000 plastik amook 1 0 plastik putih take away 1 0 subtotal 91, 000 discount ( 0 ) total 91, 000 debit 91, 000 [SEP] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PA

In [23]:
ground_truth_labels = [id2label[label] for label in encoding['labels'].squeeze().tolist() if label != -100]
print(ground_truth_labels)

['total.total_price', 'menu.nm', 'menu.nm', 'menu.cnt', 'menu.price', 'menu.nm', 'menu.nm', 'menu.nm', 'menu.cnt', 'menu.price', 'menu.nm', 'menu.nm', 'menu.nm', 'menu.cnt', 'menu.price', 'menu.nm', 'menu.nm', 'menu.cnt', 'menu.price', 'menu.nm', 'menu.nm', 'menu.cnt', 'menu.price', 'menu.nm', 'menu.nm', 'menu.nm', 'menu.nm', 'menu.cnt', 'menu.price', 'sub_total.subtotal_price', 'sub_total.subtotal_price', 'sub_total.discount_price', 'sub_total.discount_price', 'total.total_price', 'total.total_price', 'total.creditcardprice', 'total.creditcardprice']


In [24]:
for k,v in encoding.items():
  encoding[k] = v.unsqueeze(0).to(device)

model.eval()
# forward pass (gradients are not needed for inference)
with torch.no_grad():
    outputs = model(input_ids=encoding['input_ids'], attention_mask=encoding['attention_mask'],
                    token_type_ids=encoding['token_type_ids'], bbox=encoding['bbox'],
                    image=encoding['image'])

In [25]:
prediction_indices = outputs.logits.argmax(-1).squeeze().tolist()
print(prediction_indices)

[19, 14, 14, 14, 19, 19, 19, 19, 3, 21, 21, 21, 19, 19, 19, 19, 19, 3, 21, 21, 21, 19, 19, 19, 19, 19, 3, 21, 21, 21, 19, 19, 19, 19, 3, 21, 21, 21, 19, 19, 19, 19, 19, 3, 21, 19, 19, 19, 19, 19, 19, 19, 3, 21, 14, 14, 14, 14, 14, 14, 18, 18, 18, 18, 23, 23, 23, 0, 0, 0, 0, 0, 0, 19, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 19, 0, 0, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 19, 19, 19, 19, 0, 0, 0, 19, 19, 19, 19, 19, 19, 0, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 19, 0, 0, 0, 0, 19, 0, 0, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 0, 0, 0, 0, 0

In [26]:
prediction_indices = outputs.logits.argmax(-1).squeeze().tolist()
predictions = [id2label[label] for gt, label in zip(encoding['labels'].squeeze().tolist(), prediction_indices) if gt != -100]
print(predictions)

['sub_total.subtotal_price', 'menu.nm', 'menu.nm', 'menu.cnt', 'menu.price', 'menu.nm', 'menu.nm', 'menu.nm', 'menu.cnt', 'menu.price', 'menu.nm', 'menu.nm', 'menu.nm', 'menu.cnt', 'menu.price', 'menu.nm', 'menu.nm', 'menu.cnt', 'menu.price', 'menu.nm', 'menu.nm', 'menu.cnt', 'menu.price', 'menu.nm', 'menu.nm', 'menu.nm', 'menu.nm', 'menu.cnt', 'menu.price', 'sub_total.subtotal_price', 'sub_total.subtotal_price', 'sub_total.discount_price', 'sub_total.discount_price', 'total.total_price', 'total.total_price', 'total.creditcardprice', 'total.creditcardprice']
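Comparing the two printed lists by eye, the predictions appear to differ from the ground truth only in the first position. A small helper makes this kind of check explicit. This is a sketch with shortened toy lists; `compare_labels` is a hypothetical name, and in the notebook you would pass `ground_truth_labels` and `predictions` instead.

```python
def compare_labels(ground_truth, predictions):
    """Return (accuracy, mismatches); each mismatch is (position, expected, predicted)."""
    if len(ground_truth) != len(predictions):
        raise ValueError("ground_truth and predictions must have the same length")
    mismatches = [
        (i, gt, pred)
        for i, (gt, pred) in enumerate(zip(ground_truth, predictions))
        if gt != pred
    ]
    accuracy = 1 - len(mismatches) / len(ground_truth) if ground_truth else 0.0
    return accuracy, mismatches


# Shortened toy lists for illustration only.
toy_truth = ["total.total_price", "menu.nm", "menu.cnt", "menu.price"]
toy_preds = ["sub_total.subtotal_price", "menu.nm", "menu.cnt", "menu.price"]
accuracy, mismatches = compare_labels(toy_truth, toy_preds)
print(accuracy, mismatches)
```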


In [27]:
import numpy as np

preds_val = None
out_label_ids = None

# put model in evaluation mode
model.eval()
for batch in tqdm(test_dataloader, desc="Evaluating"):
    with torch.no_grad():
        input_ids = batch['input_ids'].to(device)
        bbox = batch['bbox'].to(device)
        image = batch['image'].to(device)
        attention_mask = batch['attention_mask'].to(device)
        token_type_ids = batch['token_type_ids'].to(device)
        labels = batch['labels'].to(device)

        # forward pass
        outputs = model(input_ids=input_ids, bbox=bbox, image=image, attention_mask=attention_mask, 
                        token_type_ids=token_type_ids, labels=labels)
        
        if preds_val is None:
          preds_val = outputs.logits.detach().cpu().numpy()
          out_label_ids = batch["labels"].detach().cpu().numpy()
        else:
          preds_val = np.append(preds_val, outputs.logits.detach().cpu().numpy(), axis=0)
          out_label_ids = np.append(
              out_label_ids, batch["labels"].detach().cpu().numpy(), axis=0
          )

Evaluating:   0%|          | 0/50 [00:00<?, ?it/s]

In [28]:
import warnings
warnings.filterwarnings("ignore")
from seqeval.metrics import (
    classification_report,
    f1_score,
    precision_score,
    recall_score)

def results_test(preds, out_label_ids, labels):
  preds = np.argmax(preds, axis=2)

  label_map = {i: label for i, label in enumerate(labels)}

  out_label_list = [[] for _ in range(out_label_ids.shape[0])]
  preds_list = [[] for _ in range(out_label_ids.shape[0])]

  for i in range(out_label_ids.shape[0]):
      for j in range(out_label_ids.shape[1]):
          if out_label_ids[i, j] != -100:
              out_label_list[i].append(label_map[out_label_ids[i][j]])
              preds_list[i].append(label_map[preds[i][j]])

  results = {
      "precision": precision_score(out_label_list, preds_list),
      "recall": recall_score(out_label_list, preds_list),
      "f1": f1_score(out_label_list, preds_list),
  }
  return results, classification_report(out_label_list, preds_list)

In [29]:
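# rebuild the label list (the name `labels` was reused for batch tensors above),
# in the same order as the id2label mapping used for training
labels = [id2label[i] for i in range(len(id2label))]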
val_result, class_report = results_test(preds_val, out_label_ids, labels)
print("Overall results:", val_result)
print(class_report)

Overall results: {'precision': 0.917298937784522, 'recall': 0.9166034874905231, 'f1': 0.9169510807736064}
                         precision    recall  f1-score   support

                enu.cnt       1.00      0.97      0.98       224
      enu.discountprice       0.62      0.50      0.56        10
       enu.itemsubtotal       0.00      0.00      0.00         6
                 enu.nm       0.96      0.90      0.93       251
                enu.num       0.85      1.00      0.92        11
              enu.price       0.96      0.98      0.97       247
            enu.sub_cnt       0.76      0.94      0.84        17
             enu.sub_nm       0.52      0.91      0.66        32
          enu.sub_price       0.81      0.85      0.83        20
          enu.unitprice       0.96      0.96      0.96        68
         otal.cashprice       0.94      0.92      0.93        71
       otal.changeprice       0.93      0.97      0.95        59
   otal.creditcardprice       0.88      0.82    

The results I was getting were: 

`{'precision': 0.9307458143074582, 'recall': 0.9272175890826384, 'f1': 0.9289783516900872}`
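In the classification report earlier in this section, the class names have lost their first letter (`enu.cnt`, `otal.cashprice`). This is consistent with seqeval expecting IOB-style tags such as `B-...` and `I-...`, and reading the first character of each label as the tag prefix. One possible workaround, sketched here as an assumption rather than what this notebook does, is to add an `I-` prefix to every label except "O" before handing the lists to seqeval. `to_iob_style` is a hypothetical helper.

```python
def to_iob_style(label_sequences, outside="O"):
    """Prefix every non-outside label with "I-" so the full label name is kept."""
    return [
        [label if label == outside else f"I-{label}" for label in sequence]
        for sequence in label_sequences
    ]


example = [["menu.nm", "menu.nm", "menu.cnt", "O", "total.total_price"]]
print(to_iob_style(example))
```

Applying this to both `out_label_list` and `preds_list` inside `results_test` should let the report show the full label names.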