PDFExtractor

This is a simple script, built on Invoice2Data and regular expressions, that extracts data from invoices and adds it to a JSON file. We made slight modifications to Invoice2Data: we removed pdftotext and replaced it with PyPDF2.
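
The flow described above (read the PDF text with PyPDF2, pull fields out with regex, add the result to a JSON file) can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the function names, the field patterns, and the sample text are all hypothetical, and it does not reproduce Invoice2Data's own template mechanism. It assumes PyPDF2 2.0 or later, where `PdfReader` and `page.extract_text()` are available.

```python
import json
import re

# Hypothetical field patterns, for illustration only; real invoices need their own.
FIELD_PATTERNS = {
    "invoice_number": r"Invoice\s*(?:Number|No\.?|#)\s*:?\s*(\S+)",
    "date": r"Date\s*:?\s*(\d{4}-\d{2}-\d{2})",
    "amount": r"Total\s*:?\s*\$?\s*(\d+(?:\.\d{2})?)",
}


def pdf_to_text(path):
    """Read the text of every page with PyPDF2 (standing in for pdftotext)."""
    # Imported lazily so the regex step below works without PyPDF2 installed.
    from PyPDF2 import PdfReader

    reader = PdfReader(path)
    # extract_text() can return an empty string for image-only pages.
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def extract_fields(text):
    """Apply each regex to the text; fields that do not match are set to None."""
    result = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = re.search(pattern, text, flags=re.IGNORECASE)
        result[name] = match.group(1) if match else None
    return result


def append_to_json(record, json_path):
    """Append one record to a JSON file that holds a list of records."""
    try:
        with open(json_path, encoding="utf-8") as fh:
            records = json.load(fh)
    except FileNotFoundError:
        records = []
    records.append(record)
    with open(json_path, "w", encoding="utf-8") as fh:
        json.dump(records, fh, indent=2)


# Made-up invoice text, used here in place of a real PDF.
sample_text = "Invoice No: INV-001\nDate: 2020-01-15\nTotal: $123.45"
fields = extract_fields(sample_text)
```

With a real file, the same steps would chain as `append_to_json(extract_fields(pdf_to_text("invoice.pdf")), "invoices.json")`, where both file names are placeholders.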