# Simple Document Question Answering Demo

**Author:** Alan Meeson <alan@carefullycalculated.co.uk>

**Date:** 2023-05-06

A simple example notebook showing how to use a pre-trained document question answering model to perform inference.

In [None]:
import requests
import os
from IPython.display import Image
from transformers import pipeline

## Grab and display image

In [None]:
img_url = "https://pbs.twimg.com/media/CLLjDg0VAAA48k2?format=jpg&name=medium"
img_file = "../data/buzz-voucher-pg1.png"

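if not os.path.exists(img_file):
    # Make sure the target directory exists before writing the file.
    os.makedirs(os.path.dirname(img_file), exist_ok=True)
    response = requests.get(img_url, timeout=30)
    # Fail loudly on a bad response instead of silently skipping the download.
    response.raise_for_status()
    with open(img_file, 'wb') as fp:
        fp.write(response.content)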

In [None]:
display(Image(filename=img_file))

## Load and apply the model

### Load the model

This will download the model from the Hugging Face Hub if it is not already cached locally.

The model being used is [impira/layoutlm-document-qa](https://huggingface.co/impira/layoutlm-document-qa).

This is a version of the [LayoutLM](https://github.com/microsoft/unilm/tree/master/layoutlm) model, fine-tuned on [SQuAD2.0](https://huggingface.co/datasets/squad_v2) and [DocVQA](https://www.docvqa.org/datasets/docvqa).

References:
- [impira/layoutlm-document-qa](https://huggingface.co/impira/layoutlm-document-qa): MIT
- [LayoutLM](https://github.com/microsoft/unilm/tree/master/layoutlm): MIT
- [transformers](https://github.com/huggingface/transformers/blob/main/LICENSE): Apache 2.0
- [SQuAD2.0](https://huggingface.co/datasets/squad_v2): cc-by-sa-4.0
- [DocVQA](https://www.docvqa.org/datasets/docvqa)
- [IIT-CDIP Test Collection 1.0*](https://data.commerce.gov/complex-document-information-processing-cdip-dataset)
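The model card for this model lists PIL, pytesseract and PyTorch as requirements alongside transformers, and pytesseract in turn needs the `tesseract` binary on the `PATH`. The following is a minimal sketch of a pre-flight check that could be run before loading the pipeline; the helper name `missing_requirements` and its default lists are assumptions made for illustration, not part of any library.

```python
import importlib.util
import shutil


def missing_requirements(
    modules=("PIL", "pytesseract", "torch", "transformers"),
    binaries=("tesseract",),
):
    """Return the names of modules and executables that cannot be found.

    Hypothetical helper: the default lists are an assumption based on the
    model card, not something checked by transformers itself.
    """
    # find_spec returns None when a top-level module is not installed.
    missing = [name for name in modules if importlib.util.find_spec(name) is None]
    # shutil.which returns None when an executable is not on the PATH.
    missing += [name for name in binaries if shutil.which(name) is None]
    return missing


if __name__ == "__main__":
    problems = missing_requirements()
    if problems:
        print("Missing requirements:", ", ".join(problems))
    else:
        print("All requirements found.")
```

An empty list means everything in the assumed requirement set was found; anything else is worth installing before the next cell is run.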

In [None]:
nlp = pipeline(
    "document-question-answering",
    model="impira/layoutlm-document-qa",
)

### Ask some questions

We can ask natural-language questions and get somewhat sensible results.

In [None]:
nlp(img_file, "What is the Voucher number?")

In [None]:
nlp(img_file, "What date was travel authorized?")

In [None]:
nlp(img_file, "Where was travel from?")

In [None]:
nlp(img_file, "Where was travel to?")
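The individual calls above can also be collected into one loop. This is a minimal sketch, assuming the pipeline returns either a single dict or a list of dicts that each carry `answer` and `score` keys; the helper name `ask_all` is hypothetical.

```python
def ask_all(qa, image, questions):
    """Ask each question and return (question, answer, score) for the top result.

    `qa` is any callable taking (image, question), such as the pipeline
    object created earlier in this notebook.
    """
    rows = []
    for question in questions:
        results = qa(image, question)
        # Normalise a single-dict result to a one-element list.
        if isinstance(results, dict):
            results = [results]
        if not results:
            # No candidate answer was returned for this question.
            rows.append((question, None, 0.0))
            continue
        # Keep only the highest-scoring candidate.
        best = max(results, key=lambda r: r["score"])
        rows.append((question, best["answer"], best["score"]))
    return rows
```

In this notebook it could be called as `ask_all(nlp, img_file, ["What is the Voucher number?", "Where was travel from?"])`, which keeps the questions, answers and scores together for side-by-side inspection.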

Equally, some answers are confidently wrong: the model reports a high score for an answer that does not match the document.

In [None]:
nlp(img_file, "Who is the Payee?")

In [None]:
nlp(img_file, "Who authorized payment?")
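Since the score reflects the model's own confidence and not correctness, one way to surface confidently wrong answers is to compare predictions against values checked by hand. This is a minimal sketch; the helper name `confident_errors`, the `min_score` threshold of 0.9 and the case-insensitive exact-match comparison are all assumptions chosen for illustration.

```python
def confident_errors(predictions, expected, min_score=0.9):
    """Return (question, answer, truth, score) for confident but wrong answers.

    predictions: dict mapping question -> (answer, score)
    expected:    dict mapping question -> hand-checked answer
    """
    errors = []
    for question, (answer, score) in predictions.items():
        truth = expected.get(question)
        if truth is None:
            # No hand-checked answer is available for this question.
            continue
        # Compare case-insensitively and ignore surrounding whitespace.
        is_wrong = answer.strip().lower() != truth.strip().lower()
        if score >= min_score and is_wrong:
            errors.append((question, answer, truth, score))
    return errors
```

Exact string matching is deliberately strict; answers that differ only in formatting (for example, date styles) would be flagged too, so a looser comparison may be preferable in practice.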
