The document provides a comprehensive guide on how to set up and uptrain an Invoice Parser using Google Cloud's Document AI. Here is a breakdown of the content:
Steps to Set Up the Invoice Parser in Document AI
Start by creating a new project in Google Cloud.
In the Google Cloud console, navigate to Document AI and select the Processor Gallery. Search for Invoice Parser and create a new processor, giving it a name and selecting the closest region.
Create a new Cloud Storage bucket to store the dataset required for training and testing the processor.
Import a sample invoice PDF file into your dataset for manual labeling, which helps the processor identify the entities to extract. Label the fields in the sample document using the labeling tools in the console.
Edit the schema to mark unused labels as inactive and add custom labels if needed before starting the training.
Annotate the sample document by selecting text and applying labels. Ensure all instances of an entity are annotated correctly.
Assign the labeled document to the training set to prepare it for training.
Import pre-labeled documents into the training and test sets, ensuring you have enough documents and label instances for effective training. Optionally, use auto-labeling for new documents if there is an existing deployed processor version.
Start the training process after setting up the processor with the appropriate data and labels. Training may take several hours.
Deploy the trained processor version to make it ready for use.
Evaluate the processor's performance using metrics like F1 score, precision, and recall. Test the processor with new documents not used in training or testing to validate its performance.
Manage your custom-trained processor versions and send processing requests to handle entity extraction tasks.
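As an illustration of the final step, the sketch below sends one invoice to a specific deployed processor version and collects the extracted entities. It assumes the `google-cloud-documentai` Python client library is installed and that application default credentials are configured. The helper names `processor_version_name` and `extract_invoice_entities`, and all IDs passed to them, are hypothetical placeholders rather than anything defined by the guide.

```python
from typing import Dict, List


def processor_version_name(project_id: str, location: str,
                           processor_id: str, version_id: str) -> str:
    """Build the resource name of one processor version (hypothetical helper)."""
    return (
        f"projects/{project_id}/locations/{location}"
        f"/processors/{processor_id}/processorVersions/{version_id}"
    )


def extract_invoice_entities(file_path: str, project_id: str, location: str,
                             processor_id: str, version_id: str,
                             mime_type: str = "application/pdf") -> List[Dict]:
    """Send one local file to a deployed processor version and return its entities."""
    # Imported lazily so the name helper above works without the client library.
    from google.api_core.client_options import ClientOptions
    from google.cloud import documentai

    # The API endpoint is regional, so it must match the processor's location.
    client = documentai.DocumentProcessorServiceClient(
        client_options=ClientOptions(
            api_endpoint=f"{location}-documentai.googleapis.com"
        )
    )

    with open(file_path, "rb") as handle:
        raw_document = documentai.RawDocument(
            content=handle.read(), mime_type=mime_type
        )

    request = documentai.ProcessRequest(
        name=processor_version_name(project_id, location, processor_id, version_id),
        raw_document=raw_document,
    )
    result = client.process_document(request=request)

    # Each entity carries its label, the matched text, and a confidence score.
    return [
        {"type": entity.type_, "text": entity.mention_text,
         "confidence": entity.confidence}
        for entity in result.document.entities
    ]
```

Addressing the processor version explicitly, rather than the processor's default version, makes it possible to compare the uptrained version against the pretrained one on the same document.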
Key Points
Manually label a sample document to guide the processor.
Edit and manage the schema to include only relevant labels.
Ensure a sufficient number of labeled documents in the training and test sets.
Train the processor and deploy the trained version.
Evaluate the processor using various metrics and test it with new data.
Use and manage the custom-trained processor for processing requests.
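The evaluation metrics mentioned above follow the standard definitions, which can be sketched in a few lines. The function below is only an illustration: it assumes you already have counts of true positives, false positives, and false negatives for a label. Those counts and the function name are hypothetical, and this is not a description of how Document AI computes its scores internally.

```python
from typing import Tuple


def precision_recall_f1(true_pos: int, false_pos: int,
                        false_neg: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) from raw prediction counts."""
    predicted = true_pos + false_pos   # entities the processor extracted
    actual = true_pos + false_neg      # entities present in the ground truth

    # Guard against division by zero when a label never appears.
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0

    # F1 is the harmonic mean of precision and recall.
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical counts for a single label.
    p, r, f1 = precision_recall_f1(true_pos=9, false_pos=1, false_neg=3)
    print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

A label with high precision but low recall is usually being missed rather than mislabeled, which suggests adding more annotated instances of that label to the training set.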