Document AI PDF Annotator Sample

This sample uses the Google Cloud Document AI API to annotate PDF documents.

Quick start

  1. Install Python
  2. Clone this repo and cd into its root directory
  3. Install the prerequisites: pip install -r requirements.txt
  4. Install the Google Cloud SDK
  5. Run gcloud init and create a new project
  6. Enable the Document AI API: gcloud services enable documentai.googleapis.com
  7. Set up application default credentials by running: gcloud auth application-default login
  8. Run the sample: python main.py -i invoice.pdf. The annotated document is written to the current directory as invoice_annotated.pdf.
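The sample is run as `python main.py -i invoice.pdf` and produces `invoice_annotated.pdf`. The sketch below shows one way that command line and output-naming rule could be wired up with `argparse`. It is not the repo's `main.py`: the `--input` long option and the helper names `build_parser` and `annotated_path` are hypothetical. The rule that the output always lands in the current directory, whatever directory the input is in, is an assumption inferred from this README.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical CLI mirroring the documented `python main.py -i invoice.pdf`.
    # The long option name `--input` is an assumption.
    parser = argparse.ArgumentParser(description="Annotate a PDF with Document AI.")
    parser.add_argument("-i", "--input", required=True, help="Path to the input PDF")
    return parser


def annotated_path(input_pdf: str) -> Path:
    # Assumed naming rule: keep the file stem, append "_annotated", keep the
    # suffix, and return a relative path (that is, the current directory).
    src = Path(input_pdf)
    return Path(f"{src.stem}_annotated{src.suffix}")
```

For example, `annotated_path(build_parser().parse_args(["-i", "invoice.pdf"]).input)` gives `invoice_annotated.pdf`, matching the file name the quick start says to look for.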

Setup

Install dependencies

  1. Install pyenv: https://github.com/pyenv/pyenv#installation
  2. Use pyenv to install the latest version of Python 3. For example, to install Python 3.10.1, run: pyenv install 3.10.1
  3. Create a Python virtual environment with the installed version of Python 3. For example, to create a Python 3.10.1 virtual environment called docai-annotator, run: pyenv virtualenv 3.10.1 docai-annotator
  4. Clone this repo and cd into its root directory
  5. Configure pyenv to use the Python virtual environment created earlier whenever you are in this repo: pyenv local docai-annotator
  6. Install the prerequisites: pip install -r requirements.txt
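Before installing packages, it can help to confirm that the pyenv setup took effect. The sketch below relies on two behaviours: `pyenv local` records the selected name in a `.python-version` file in the current directory, and inside a virtual environment `sys.prefix` differs from `sys.base_prefix`. The helper names and the default `docai-annotator` (taken from the example above) are illustrative assumptions, not part of the sample.

```python
import sys
from pathlib import Path
from typing import List, Optional


def in_virtualenv() -> bool:
    # venv (and virtualenv 20+) point sys.prefix at the environment directory
    # while sys.base_prefix still points at the base interpreter.
    return sys.prefix != sys.base_prefix


def pyenv_local_version(repo_root: Path) -> Optional[str]:
    # `pyenv local <name>` writes <name> to .python-version; return the first
    # non-empty line, or None if the file is missing or empty.
    version_file = repo_root / ".python-version"
    if not version_file.is_file():
        return None
    for line in version_file.read_text().splitlines():
        if line.strip():
            return line.strip()
    return None


def check_environment(repo_root: Path, expected_env: str = "docai-annotator") -> List[str]:
    # Collect human-readable problems; an empty list means the setup looks right.
    problems: List[str] = []
    local = pyenv_local_version(repo_root)
    if local != expected_env:
        problems.append(f".python-version is {local!r}, expected {expected_env!r}")
    if not in_virtualenv():
        problems.append("the current interpreter is not running inside a virtual environment")
    return problems
```

Running `print(check_environment(Path.cwd()))` from the repo root should print an empty list once steps 1 to 5 are complete.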

Setup Google Cloud

  1. Install the Cloud SDK: https://cloud.google.com/sdk/docs/install
  2. Run gcloud init to create a new project, then link a billing account to the project
  3. Enable the Document AI API: gcloud services enable documentai.googleapis.com
  4. Set up application default credentials by running: gcloud auth application-default login
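A quick way to check steps 3 and 4 is sketched below. Application Default Credentials are looked up first in the file named by `GOOGLE_APPLICATION_CREDENTIALS`, then in the file that `gcloud auth application-default login` writes to the gcloud config directory (`CLOUDSDK_CONFIG` if set, otherwise `%APPDATA%\gcloud` on Windows or `~/.config/gcloud` elsewhere). The `gcloud services list --enabled` command lists enabled services with the service name in the first column. This sketch only locates the credentials file and does not validate it; the `google.auth.default()` function in the google-auth library is the authoritative resolver. The helper names are hypothetical.

```python
import os
import shutil
import subprocess
from pathlib import Path
from typing import Mapping

SERVICE = "documentai.googleapis.com"


def adc_file(environ: Mapping[str, str], home: Path, windows: bool) -> Path:
    # An explicit credentials file set via GOOGLE_APPLICATION_CREDENTIALS wins.
    explicit = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit:
        return Path(explicit)
    # Otherwise look in gcloud's config directory.
    if environ.get("CLOUDSDK_CONFIG"):
        config_dir = Path(environ["CLOUDSDK_CONFIG"])
    elif windows:
        config_dir = Path(environ.get("APPDATA", "")) / "gcloud"
    else:
        config_dir = home / ".config" / "gcloud"
    return config_dir / "application_default_credentials.json"


def service_enabled(services_list_output: str, service: str = SERVICE) -> bool:
    # Match the first (NAME) column of each non-blank output line exactly.
    return any(
        line.split()[0] == service
        for line in services_list_output.splitlines()
        if line.strip()
    )


def gcloud_enabled_services() -> str:
    # Requires the Cloud SDK on PATH and a project selected via `gcloud init`.
    gcloud = shutil.which("gcloud")
    if gcloud is None:
        raise FileNotFoundError("gcloud not found on PATH; install the Cloud SDK first")
    result = subprocess.run(
        [gcloud, "services", "list", "--enabled"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def report() -> None:
    # Print where ADC is expected and whether the Document AI API is enabled.
    path = adc_file(os.environ, Path.home(), windows=os.name == "nt")
    print(f"ADC file: {path} ({'found' if path.is_file() else 'missing'})")
    print(f"{SERVICE} enabled: {service_enabled(gcloud_enabled_services())}")
```

After finishing the steps above, call `report()`. A "missing" ADC file or `False` for the service points back to step 4 or step 3 respectively.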

Testing

Manual

  1. Run the sample: python main.py -i invoice.pdf
  2. Check that an annotated version of the PDF named invoice_annotated.pdf was created in the current directory.
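The manual check can be scripted as a small smoke test. The sketch assumes the `<stem>_annotated.pdf` naming described in this README and only verifies that the file exists and begins with the `%PDF-` header that a well-formed PDF starts with; it does not inspect the annotations themselves. The function name is hypothetical.

```python
from pathlib import Path


def check_annotated_output(input_pdf: str, out_dir: Path = Path(".")) -> Path:
    # Assumed naming rule from this README: <stem>_annotated.pdf in out_dir.
    expected = out_dir / f"{Path(input_pdf).stem}_annotated.pdf"
    if not expected.is_file():
        raise FileNotFoundError(f"annotated PDF not found: {expected}")
    # A well-formed PDF begins with the "%PDF-" header.
    with expected.open("rb") as fh:
        if fh.read(5) != b"%PDF-":
            raise ValueError(f"{expected} does not start with a PDF header")
    return expected
```

After running `python main.py -i invoice.pdf`, calling `check_annotated_output("invoice.pdf")` returns the path to the annotated file or raises an error describing what is wrong.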