# Fine-Tuning Donut for Invoice Parsing

[Base Model Repository](https://github.com/clovaai/donut)

[Paper](https://arxiv.org/abs/2111.15664)

[Hugging Face Model Card](https://huggingface.co/naver-clova-ix/donut-base)

## Environment Setup

I had trouble getting the implementation from the hugging face transformers library working, which propably had to do with different dependecies and CUDA versions. So I created this notebook from the source code which the Donut team itself provided in their [GitHub repository](https://github.com/clovaai/donut). I've tried to use the library versions they had tested with:

> We tested [donut-python](https://pypi.org/project/donut-python/1.0.1) == 1.0.1 with:
> - [torch](https://github.com/pytorch/pytorch) == 1.11.0+cu113 
> - [torchvision](https://github.com/pytorch/vision) == 0.12.0+cu113
> - [pytorch-lightning](https://github.com/Lightning-AI/lightning) == 1.6.4
> - [transformers](https://github.com/huggingface/transformers) == 4.11.3
> - [timm](https://github.com/rwightman/pytorch-image-models) == 0.5.4

To get these versions installed, I had to use Python 3.9. However, I had to go with the newest torch version since I ran it on a newer GPU, but it still worked. The critical part seems to be the pytorch-lightning package.

In [1]:
# pip install donut-python datasets Pillow tqdm nltk torch wandb

First we'll set some variables to work with Donut locally. If push_to_hub is set True, the fine-tuned model will be pushed to you hf_repo. I've used Weight and Biases (wandb) for logging, so I set the run name here as well.

I was experimenting with the learn rate so this can be set right here on top.

In [2]:
dataset_id = "katanaml-org/invoices-donut-data-v1"
ds_no = "ds1"
run_ds_restructure = False

initial_lr = 1e-5

wandb_run_name = f"{ds_no}_donut_{initial_lr}"

output_dataset_name = f"formatted_{ds_no}"
save_model_path = f"donut_invoice_{ds_no}"

## Hyperparameters

This config sets the rest of the hyperparameters.

In [3]:
import datetime
import json
import types

# Generate the timestamp version string
version_no = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")

# Define the configuration as a Python dictionary
config = {
    "resume_from_checkpoint_path": None,
    "result_path": "donut_invoice_model",
    "pretrained_model_name_or_path": "naver-clova-ix/donut-base",
    "dataset_name_or_paths": [dataset_id],
    "sort_json_key": False,
    "train_batch_sizes": [1],
    "val_batch_sizes": [1],
    "input_size": [1600, 1280],
    "max_length": 1024,
    "align_long_axis": False,
    "num_nodes": 1,
    "seed": 2022,
    "lr": initial_lr,
    "warmup_steps": 50,
    "num_training_samples_per_epoch": 425,
    "max_epochs": 30,
    "max_steps": 100000,
    "num_workers": 8,
    "val_check_interval": 0.5,
    "check_val_every_n_epoch": 1,
    "gradient_clip_val": 1.0,
    "verbose": True,
    "task_start_tokens": "<invoice>",
    "task_end_tokens": "</invoice>",
    "task_name": "invoice",
    "exp_name": f"invoice_{ds_no}",
    "exp_version": version_no,
    "accumulate_grad_batches": 4,
}

config_obj = types.SimpleNamespace(**config)


print(json.dumps(config, indent=4))
print(f"Accessing seed via attribute: {config_obj.seed}")
print(f"Accessing lr via attribute: {config_obj.lr}")

{
    "resume_from_checkpoint_path": null,
    "result_path": "donut_invoice_model",
    "pretrained_model_name_or_path": "naver-clova-ix/donut-base",
    "dataset_name_or_paths": [
        "katanaml-org/invoices-donut-data-v1"
    ],
    "sort_json_key": false,
    "train_batch_sizes": [
        1
    ],
    "val_batch_sizes": [
        1
    ],
    "input_size": [
        1600,
        1280
    ],
    "max_length": 1024,
    "align_long_axis": false,
    "num_nodes": 1,
    "seed": 2022,
    "lr": 1e-05,
    "warmup_steps": 50,
    "num_training_samples_per_epoch": 425,
    "max_epochs": 30,
    "max_steps": 100000,
    "num_workers": 8,
    "val_check_interval": 0.5,
    "check_val_every_n_epoch": 1,
    "gradient_clip_val": 1.0,
    "verbose": true,
    "task_start_tokens": "<invoice>",
    "task_end_tokens": "</invoice>",
    "task_name": "invoice",
    "exp_name": "invoice_ds1",
    "exp_version": "20250429_220217",
    "accumulate_grad_batches": 4
}
Accessing seed via attribute:

## Training Modules

Next, the Pytorch-Lightning module needs to be set up for training. I did some slight changes to the original code, to implement a working learning rate scheduler, since I was getting errors with the original one.

In [4]:
"""
Donut
Copyright (c) 2022-present NAVER Corp.
MIT License
"""

# Implemented some changes to the original for the lr_scheduler

import math
import random
import re
from pathlib import Path

import numpy as np
import pytorch_lightning as pl
import torch
from nltk import edit_distance
from pytorch_lightning.utilities import rank_zero_only
from timm.data.constants import IMAGENET_DEFAULT_MEAN, IMAGENET_DEFAULT_STD
from torch.nn.utils.rnn import pad_sequence
from torch.optim.lr_scheduler import OneCycleLR
from torch.utils.data import DataLoader

from donut import DonutConfig, DonutModel


class DonutModelPLModule(pl.LightningModule):
    def __init__(self, config):
        super().__init__()
        self.config = config

        if self.config.pretrained_model_name_or_path:
            self.model = DonutModel.from_pretrained(
                self.config.pretrained_model_name_or_path,
                input_size=self.config.input_size,
                max_length=self.config.max_length,
                align_long_axis=self.config.align_long_axis,
                ignore_mismatched_sizes=True,
            )
        else:
            self.model = DonutModel(
                config=DonutConfig(
                    input_size=self.config.input_size,
                    max_length=self.config.max_length,
                    align_long_axis=self.config.align_long_axis,
                    # with DonutConfig, the architecture customization is available, e.g.,
                    # encoder_layer=[2,2,14,2], decoder_layer=4, ...
                )
            )
        self.pytorch_lightning_version_is_1 = int(pl.__version__[0]) < 2
        self.num_of_loaders = len(self.config.dataset_name_or_paths)

    def training_step(self, batch, batch_idx):
        image_tensors, decoder_input_ids, decoder_labels = list(), list(), list()
        for batch_data in batch:
            image_tensors.append(batch_data[0])
            decoder_input_ids.append(batch_data[1][:, :-1])
            decoder_labels.append(batch_data[2][:, 1:])
        image_tensors = torch.cat(image_tensors)
        decoder_input_ids = torch.cat(decoder_input_ids)
        decoder_labels = torch.cat(decoder_labels)
        loss = self.model(image_tensors, decoder_input_ids, decoder_labels)[0]
        self.log_dict({"train_loss": loss}, sync_dist=True)
        if not self.pytorch_lightning_version_is_1:
            self.log('loss', loss, prog_bar=True)
        return loss

    def on_validation_epoch_start(self) -> None:
        super().on_validation_epoch_start()
        self.validation_step_outputs = [[] for _ in range(self.num_of_loaders)]
        return

    def validation_step(self, batch, batch_idx, dataloader_idx=0):
        image_tensors, decoder_input_ids, prompt_end_idxs, answers = batch
        decoder_prompts = pad_sequence(
            [input_id[: end_idx + 1] for input_id, end_idx in zip(decoder_input_ids, prompt_end_idxs)],
            batch_first=True,
        )

        preds = self.model.inference(
            image_tensors=image_tensors,
            prompt_tensors=decoder_prompts,
            return_json=False,
            return_attentions=False,
        )["predictions"]

        scores = list()
        for pred, answer in zip(preds, answers):
            pred = re.sub(r"(?:(?<=>) | (?=</s_))", "", pred)
            answer = re.sub(r"<.*?>", "", answer, count=1)
            answer = answer.replace(self.model.decoder.tokenizer.eos_token, "")
            scores.append(edit_distance(pred, answer) / max(len(pred), len(answer)))

            if self.config.verbose and len(scores) == 1:
                self.print(f"Prediction: {pred}")
                self.print(f"    Answer: {answer}")
                self.print(f" Normed ED: {scores[0]}")

        self.validation_step_outputs[dataloader_idx].append(scores)

        return scores

    def on_validation_epoch_end(self):
        assert len(self.validation_step_outputs) == self.num_of_loaders
        cnt = [0] * self.num_of_loaders
        total_metric = [0] * self.num_of_loaders
        val_metric = [0] * self.num_of_loaders
        for i, results in enumerate(self.validation_step_outputs):
            for scores in results:
                cnt[i] += len(scores)
                total_metric[i] += np.sum(scores)
            val_metric[i] = total_metric[i] / cnt[i]
            val_metric_name = f"val_metric_{i}th_dataset"
            self.log_dict({val_metric_name: val_metric[i]}, sync_dist=True)
        self.log_dict({"val_metric": np.sum(total_metric) / np.sum(cnt)}, sync_dist=True)

    def configure_optimizers(self):
            max_iter = None
            if int(self.config.max_epochs) > 0:
                assert len(self.config.train_batch_sizes) == 1, "Set max_epochs only if the number of datasets is 1"
                # Correct calculation using num_nodes and potential grad accumulation
                accumulate_grad_batches = getattr(self.config, "accumulate_grad_batches", 1)
                effective_batch_size = self.config.train_batch_sizes[0] * self.config.num_nodes * accumulate_grad_batches
                steps_per_epoch = math.ceil(self.config.num_training_samples_per_epoch / effective_batch_size)
                max_iter = steps_per_epoch * self.config.max_epochs

            if hasattr(self.config, 'max_steps') and int(self.config.max_steps) > 0:
                max_iter = min(self.config.max_steps, max_iter) if max_iter is not None else self.config.max_steps

            if max_iter is None or max_iter <= 0:
                raise ValueError(f"Could not determine max_iter={max_iter}. Check config.")
            max_iter = int(max_iter)

            optimizer = torch.optim.AdamW(self.parameters(), lr=self.config.lr)

            warmup_steps = getattr(self.config, 'warmup_steps', 0)
            warmup_fraction = warmup_steps / max_iter if max_iter > 0 else 0.1
            warmup_fraction = min(max(0.01, warmup_fraction), 0.5)

            one_cycle_scheduler = OneCycleLR(
                optimizer,
                max_lr=self.config.lr,
                total_steps=max_iter,
                pct_start=warmup_fraction,
                anneal_strategy='cos'
            )

            scheduler_config = {
                "scheduler": one_cycle_scheduler,
                "name": "learning_rate",
                "interval": "step",
                "frequency": 1,
            }
            return [optimizer], [scheduler_config]

    def lr_scheduler_step(self, scheduler, optimizer_idx, metric):
        # Handles manual stepping for the scheduler to bypass PL 1.x API validation issue.
        scheduler.step()

    @rank_zero_only
    def on_save_checkpoint(self, checkpoint):
        save_path = Path(self.config.result_path) / self.config.exp_name / self.config.exp_version
        self.model.save_pretrained(save_path)
        self.model.decoder.tokenizer.save_pretrained(save_path)


class DonutDataPLModule(pl.LightningDataModule):
    def __init__(self, config):
        super().__init__()
        self.config = config
        self.train_batch_sizes = self.config.train_batch_sizes
        self.val_batch_sizes = self.config.val_batch_sizes
        self.train_datasets = []
        self.val_datasets = []
        self.g = torch.Generator()
        self.g.manual_seed(self.config.seed)

    def train_dataloader(self):
        loaders = list()
        for train_dataset, batch_size in zip(self.train_datasets, self.train_batch_sizes):
            loaders.append(
                DataLoader(
                    train_dataset,
                    batch_size=batch_size,
                    num_workers=self.config.num_workers,
                    pin_memory=True,
                    worker_init_fn=self.seed_worker,
                    generator=self.g,
                    shuffle=True,
                )
            )
        return loaders

    def val_dataloader(self):
        loaders = list()
        for val_dataset, batch_size in zip(self.val_datasets, self.val_batch_sizes):
            loaders.append(
                DataLoader(
                    val_dataset,
                    batch_size=batch_size,
                    pin_memory=True,
                    shuffle=False,
                )
            )
        return loaders

    @staticmethod
    def seed_worker(wordker_id):
        worker_seed = torch.initial_seed() % 2 ** 32
        np.random.seed(worker_seed)
        random.seed(worker_seed)


Now we can set up the trainer. I've made some changes to make it work in a notebook environment, added Weights and Biases (wandb) logging and I've added a custom hook for saving which could also be used to push the model to the hub as well as an early stopper hook which interrupts training when the validation metrics flat out.

In [5]:
"""
Donut
Copyright (c) 2022-present NAVER Corp.
MIT License
"""

# Changes made to original Donut source code: Running directly from notebook, changed trainer strategy; wandb logging; added hook for saving and early stopping

import os
from os.path import basename
from pathlib import Path

import numpy as np
import pytorch_lightning as pl
import torch
from pytorch_lightning.callbacks import LearningRateMonitor, ModelCheckpoint, EarlyStopping, Callback
from pytorch_lightning.loggers import WandbLogger
from pytorch_lightning.plugins import CheckpointIO
from pytorch_lightning.utilities import rank_zero_only

from donut import DonutDataset
import wandb


class CustomCheckpointIO(CheckpointIO):
    def save_checkpoint(self, checkpoint, path, storage_options=None):
        del checkpoint["state_dict"]
        torch.save(checkpoint, path)

    def load_checkpoint(self, path, storage_options=None):
        checkpoint = torch.load(path + "artifacts.ckpt")
        state_dict = torch.load(path + "pytorch_model.bin")
        checkpoint["state_dict"] = {"model." + key: value for key, value in state_dict.items()}
        return checkpoint

    def remove_checkpoint(self, path) -> None:
        return super().remove_checkpoint(path)


@rank_zero_only
def save_config_file(config, path):
    if not Path(path).exists():
        os.makedirs(path)
    save_path = Path(path) / "config.yaml"
    print(config.dumps())
    with open(save_path, "w") as f:
        f.write(config.dumps(modified_color=None, quote_str=True))
        print(f"Config is saved at {save_path}")


class ProgressBar(pl.callbacks.TQDMProgressBar):
    def __init__(self, config):
        super().__init__()
        self.enable = True
        self.config = config

    def disable(self):
        self.enable = False

    def get_metrics(self, trainer, model):
        items = super().get_metrics(trainer, model)
        items.pop("v_num", None)
        items["exp_name"] = f"{self.config.exp_name}"
        items["exp_version"] = f"{self.config.exp_version}"
        return items


def set_seed(seed):
    pytorch_lightning_version = int(pl.__version__[0])
    if pytorch_lightning_version < 2:
        pl.utilities.seed.seed_everything(seed, workers=True)
    else:
        import lightning_fabric
        lightning_fabric.utilities.seed.seed_everything(seed, workers=True)


def train(config):
    set_seed(config.seed)

    model_module = DonutModelPLModule(config)
    data_module = DonutDataPLModule(config)

    # add datasets to data_module
    datasets = {"train": [], "validation": []}
    for i, dataset_name_or_path in enumerate(config.dataset_name_or_paths):
        task_name = os.path.basename(dataset_name_or_path)  # e.g., cord-v2, docvqa, rvlcdip, ...
        
        # add categorical special tokens (optional)
        if task_name == "rvlcdip":
            model_module.model.decoder.add_special_tokens([
                "<advertisement/>", "<budget/>", "<email/>", "<file_folder/>", 
                "<form/>", "<handwritten/>", "<invoice/>", "<letter/>", 
                "<memo/>", "<news_article/>", "<presentation/>", "<questionnaire/>", 
                "<resume/>", "<scientific_publication/>", "<scientific_report/>", "<specification/>"
            ])
        if task_name == "docvqa":
            model_module.model.decoder.add_special_tokens(["<yes/>", "<no/>"])
            
        for split in ["train", "validation"]:
            datasets[split].append(
                DonutDataset(
                    dataset_name_or_path=dataset_name_or_path,
                    donut_model=model_module.model,
                    max_length=config.max_length,
                    split=split,
                    task_start_token=config.task_start_tokens[i]
                    if config.task_start_tokens
                    else f"<s_{task_name}>",
                    prompt_end_token=f"<s_{task_name}>",
                    sort_json_key=config.sort_json_key,
                )
            )
            # prompt_end_token is used for ignoring a given prompt in a loss function
            # for docvqa task, i.e., {"question": {used as a prompt}, "answer": {prediction target}},
            # set prompt_end_token to "<s_answer>"
    data_module.train_datasets = datasets["train"]
    data_module.val_datasets = datasets["validation"]

    wandb.finish() # in case there is still a previous run active

    logger = WandbLogger(project="Donut", name=wandb_run_name)

    lr_callback = LearningRateMonitor(logging_interval="step")

    checkpoint_callback = ModelCheckpoint(
        monitor="val_metric",
        dirpath=Path(config.result_path) / config.exp_name / config.exp_version,
        filename="artifacts",
        save_top_k=1,
        save_last=False,
        mode="min",
    )

    bar = ProgressBar(config)

    class SavingCallback(Callback):
        def on_train_end(self, trainer, pl_module):
            pl_module.model.save_pretrained(save_model_path)

    early_stop_callback = EarlyStopping(monitor="val_metric", patience=3, verbose=False, mode="min")

    custom_ckpt = CustomCheckpointIO()
    trainer = pl.Trainer(
        num_nodes=config.num_nodes,
        devices=[0],
        # strategy="dpp", # removed this for use inside notebook
        accelerator="gpu",
        plugins=custom_ckpt,
        max_epochs=config.max_epochs,
        max_steps=config.max_steps,
        val_check_interval=config.val_check_interval,
        check_val_every_n_epoch=config.check_val_every_n_epoch,
        gradient_clip_val=config.gradient_clip_val,
        precision=16,
        num_sanity_val_steps=0,
        logger=logger,
        callbacks=[lr_callback, checkpoint_callback, bar, early_stop_callback, SavingCallback()],
    )

    trainer.fit(model_module, data_module, ckpt_path=config.resume_from_checkpoint_path)


## Training

Now a final test, to see if Pytorch-Lightning is working alright and we're actually running on GPU.

In [6]:
print("--- Testing Minimal Trainer ---")
try:
    # Use the most explicit configuration
    minimal_trainer = pl.Trainer(accelerator="gpu", devices=[0], max_epochs=1, precision=16)
    print("Minimal Trainer initialized successfully.")
    # Clean up memory if needed
    del minimal_trainer
    torch.cuda.empty_cache()
except Exception as e:
    print(f"!!! Error initializing minimal Trainer: {e}")
    print("--- End Minimal Trainer Test ---")
    # You might want to stop here if the minimal trainer fails
    raise e # Re-raise the exception to see the traceback
print("--- End Minimal Trainer Test ---")

print(f"PyTorch version: {torch.__version__}")
print(f"PyTorch-Lightning version: {pl.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"Number of GPUs: {torch.cuda.device_count()}")
    print(f"Current CUDA device: {torch.cuda.current_device()}")
    print(f"Device name: {torch.cuda.get_device_name(0)}")
else:
    print("CUDA is *not* available to PyTorch.")



Using 16bit native Automatic Mixed Precision (AMP)
  scaler = torch.cuda.amp.GradScaler()
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs


--- Testing Minimal Trainer ---
Minimal Trainer initialized successfully.
--- End Minimal Trainer Test ---
PyTorch version: 2.7.0+cu128
PyTorch-Lightning version: 1.6.4
CUDA available: True
Number of GPUs: 1
Current CUDA device: 0
Device name: NVIDIA GeForce RTX 5070 Ti


And now we can start the actual training.

In [7]:
train(config_obj)

Global seed set to 2022
  from torch.distributed._sharded_tensor import pre_load_state_dict_hook, state_dict_hook
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
Some weights of DonutModel were not initialized from the model checkpoint at naver-clova-ix/donut-base and are newly initialized because the shapes did not match:
- encoder.model.layers.0.blocks.1.attn_mask: found shape torch.Size([3072, 100, 100]) in the checkpoint and torch.Size([1280, 100, 100]) in the model instantiated
- encoder.model.layers.1.blocks.1.attn_mask: found shape torch.Size([768, 100, 100]) in the checkpoint and torch.Size([320, 100, 100]) in the model instantiated
- encoder.model.layers.2.blocks.1.attn_mask: found shape torch.Size([192, 100, 100]) in the checkpoint and torch.Size([80, 100, 100]) in the model instantiated
- encoder.model.layers.2.blocks.3.attn_mask: found shape torch.Size([192, 100, 100]) in the checkpoint and torch.Size([80, 100, 100]) in the model instantiated
- encode

Using 16bit native Automatic Mixed Precision (AMP)
  scaler = torch.cuda.amp.GradScaler()
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name  | Type       | Params
-------------------------------------
0 | model | DonutModel | 201 M 
-------------------------------------
201 M     Trainable params
0         Non-trainable params
201 M     Total params
402.744   Total estimated model params size (MB)
  rank_zero_warn(


Training: 0it [00:00, ?it/s]



Validation: 0it [00:00, ?it/s]

Prediction: <s_invoice_no>65321852</s_invoice_no><s_invoice_date>04/11/2021</s_invoice_date><s_seller>Kaufman, Cooper and Young 33451 Johnson Lake New Ann, NE 54138</s_seller><s_client>Wells-Carlson 148 Carroll Village Suite 393 South Allisonstad, TX 72090</s_client><s_seller_tax_id>930-79-7845</s_seller_tax_id><s_client_tax_id>957-82-7504</s_client_tax_id><s_iban>GB85XPMM58300597200061</s_iban></s_header><s_items><s_item_desc>New KID CUDI "WZRD" Rap Hip Hop Soul Music Men's Black T-Shirt Size S to 3XL</s_item_desc><s_item_qty>2,00</s_item_qty><s_item_net_price>22,49</s_item_net_price><s_item_net_worth>44,98</s_item_net_worth><s_item_vat>10%</s_item_vat><s_item_gross_worth>49,48</s_item_gross_worth><sep/><s_item_desc>boys bogs waterproof boots youth size 3</s_item_desc><s_item_qty>1,00</s_item_qty><s_item_net_price>9,80</s_item_net_price><s_item_net_worth>9,80</s_item_net_worth><s_item_vat>10%</s_item_vat><s_item_gross_worth>10,78</s_item_gross_worth><sep/><s_item_desc>Joma Boys Youth 

Validation: 0it [00:00, ?it/s]

Prediction: <s_invoice_no>65321852</s_invoice_no><s_invoice_date>04/11/2021</s_invoice_date><s_seller>Kaufman, Cooper and Young 33451 Johnson Lake New Ann, NE 54138</s_seller><s_client>Wells-Carlson 148 Carroll Village Suite 393 South Allisonstad, TX 72090</s_client><s_seller_tax_id>930-79-7845</s_seller_tax_id><s_client_tax_id>957-82-7504</s_client_tax_id><s_iban>GB85XPMM58300597200061</s_iban></s_header><s_items><s_item_desc>New KID CUDI "WZRD" Rap Hip Hop Soul Music Men's Black T-Shirt Size S to 3XL</s_item_desc><s_item_qty>2,00</s_item_qty><s_item_net_price>22,49</s_item_net_price><s_item_net_worth>44,98</s_item_net_worth><s_item_vat>10%</s_item_vat><s_item_gross_worth>49,48</s_item_gross_worth><sep/><s_item_desc>boys bogs waterproof boots</s_item_desc><s_item_qty>1,00</s_item_qty><s_item_net_price>9,80</s_item_net_price><s_item_net_worth>9,80</s_item_net_worth><s_item_vat>10%</s_item_vat><s_item_gross_worth>10,78</s_item_gross_worth><sep/><s_item_desc>Joma Boys Youth Gol-102 Socce

Validation: 0it [00:00, ?it/s]

Prediction: <s_invoice_no>65321852</s_invoice_no><s_invoice_date>04/11/2021</s_invoice_date><s_seller>Kaufman, Cooper and Young 33451 Johnson Lake New Ann, NE 54138</s_seller><s_client>Wells-Carlson 148 Carroll Village Suite 393 South Allisonstad, TX 72090</s_client><s_seller_tax_id>930-79-7845</s_seller_tax_id><s_client_tax_id>957-82-7504</s_client_tax_id><s_iban>GB85XPMM58300597200061</s_iban></s_header><s_items><s_item_desc>New KID CUDI "WZRDI" Rap Hip Hop Soul Music Men's Black T-Shirt Size S to 3XL</s_item_desc><s_item_qty>2,00</s_item_qty><s_item_net_price>22,49</s_item_net_price><s_item_net_worth>44,98</s_item_net_worth><s_item_vat>10%</s_item_vat><s_item_gross_worth>49,48</s_item_gross_worth><sep/><s_item_desc>boys bogs waterproof boots youth size 3</s_item_desc><s_item_qty>1,00</s_item_qty><s_item_net_price>9,80</s_item_net_price><s_item_net_worth>9,80</s_item_net_worth><s_item_vat>10%</s_item_vat><s_item_gross_worth>10,78</s_item_gross_worth><sep/><s_item_desc>Joma Boys Youth

Validation: 0it [00:00, ?it/s]

Prediction: <s_invoice_no>65321852</s_invoice_no><s_invoice_date>04/11/2021</s_invoice_date><s_seller>Kaufman, Cooper and Young 33451 Johnson Lake New Ann, NE 54138</s_seller><s_client>Wells-Carlson 148 Carroll Village Suite 393 South Allisonstad, TX 72090</s_client><s_seller_tax_id>930-79-7845</s_seller_tax_id><s_client_tax_id>957-82-7504</s_client_tax_id><s_iban>GB85XPMM58300597200061</s_iban></s_header><s_items><s_item_desc>New KID CUDI "WZRD" Rap Hip Hop Soul Music Men's Black T-Shirt Size S to 3XL</s_item_desc><s_item_qty>2,00</s_item_qty><s_item_net_price>22,49</s_item_net_price><s_item_net_worth>44,98</s_item_net_worth><s_item_vat>10%</s_item_vat><s_item_gross_worth>49,48</s_item_gross_worth><sep/><s_item_desc>boys bogs waterproof boots youth size 3</s_item_desc><s_item_qty>1,00</s_item_qty><s_item_net_price>9,80</s_item_net_price><s_item_net_worth>9,80</s_item_net_worth><s_item_vat>10%</s_item_vat><s_item_gross_worth>10,78</s_item_gross_worth><sep/><s_item_desc>Joma Boys Youth 

Validation: 0it [00:00, ?it/s]

Prediction: <s_invoice_no>65321852</s_invoice_no><s_invoice_date>04/11/2021</s_invoice_date><s_seller>Kaufman, Cooper and Young 33451 Johnson Lake New Ann, NE 54138</s_seller><s_client>Wells-Carlson 148 Carroll Village Suite 393 South Allisonstad, TX 72090</s_client><s_seller_tax_id>930-79-7845</s_seller_tax_id><s_client_tax_id>957-82-7504</s_client_tax_id><s_iban>GB85XPMM58300597200061</s_iban></s_header><s_items><s_item_desc>New KID CUDI "WZRD" Rap Hip Hop Soul Music Men's Black T-Shirt Size S to 3XL</s_item_desc><s_item_qty>2,00</s_item_qty><s_item_net_price>22,49</s_item_net_price><s_item_net_worth>44,98</s_item_net_worth><s_item_vat>10%</s_item_vat><s_item_gross_worth>49,48</s_item_gross_worth><sep/><s_item_desc>boys bogs waterproof boots youth size 3</s_item_desc><s_item_qty>1,00</s_item_qty><s_item_net_price>9,80</s_item_net_price><s_item_net_worth>9,80</s_item_net_worth><s_item_vat>10%</s_item_vat><s_item_gross_worth>10,78</s_item_gross_worth><sep/><s_item_desc>Joma Boys Youth 

Validation: 0it [00:00, ?it/s]

Prediction: <s_invoice_no>65321852</s_invoice_no><s_invoice_date>04/11/2021</s_invoice_date><s_seller>Kaufman, Cooper and Young 33451 Johnson Lake New Ann, NE 54138</s_seller><s_client>Wells-Carlson 148 Carroll Village Suite 393 South Allisonstad, TX 72090</s_client><s_seller_tax_id>930-79-7845</s_seller_tax_id><s_client_tax_id>957-82-7504</s_client_tax_id><s_iban>GB85XPMM58300597200061</s_iban></s_header><s_items><s_item_desc>New KID CUDI "WZRD" Rap Hip Hop Soul Music Men's Black T-Shirt Size S to 3XL</s_item_desc><s_item_qty>2,00</s_item_qty><s_item_net_price>22,49</s_item_net_price><s_item_net_worth>44,98</s_item_net_worth><s_item_vat>10%</s_item_vat><s_item_gross_worth>49,48</s_item_gross_worth><sep/><s_item_desc>boys bogs waterproof boots youth size 3</s_item_desc><s_item_qty>1,00</s_item_qty><s_item_net_price>9,80</s_item_net_price><s_item_net_worth>9,80</s_item_net_worth><s_item_vat>10%</s_item_vat><s_item_gross_worth>10,78</s_item_gross_worth><sep/><s_item_desc>Joma Boys Youth 

Validation: 0it [00:00, ?it/s]

Prediction: <s_invoice_no>65321852</s_invoice_no><s_invoice_date>04/11/2021</s_invoice_date><s_seller>Kaufman, Cooper and Young 33451 Johnson Lake New Ann, NE 54138</s_seller><s_client>Wells-Carlson 148 Carroll Village Suite 393 South Allisonstad, TX 72090</s_client><s_seller_tax_id>930-79-7845</s_seller_tax_id><s_client_tax_id>957-82-7504</s_client_tax_id><s_iban>GB85XPMM58300597200061</s_iban></s_header><s_items><s_item_desc>New KID CUDI "WZRD" Rap Hip Hop Soul Music Men's Black T-Shirt Size S to 3XL</s_item_desc><s_item_qty>2,00</s_item_qty><s_item_net_price>22,49</s_item_net_price><s_item_net_worth>44,98</s_item_net_worth><s_item_vat>10%</s_item_vat><s_item_gross_worth>49,48</s_item_gross_worth><sep/><s_item_desc>boys bogs waterproof boots youth size 3</s_item_desc><s_item_qty>1,00</s_item_qty><s_item_net_price>9,80</s_item_net_price><s_item_net_worth>9,80</s_item_net_worth><s_item_vat>10%</s_item_vat><s_item_gross_worth>10,78</s_item_gross_worth><sep/><s_item_desc>Joma Boys Youth 