Developed a Python automation script at Pinchin Ltd. that extracted data from PDFs and exported it to Excel, significantly reducing manual data entry time.
This project automates the workflow of extracting structured data from PDF test results and writing it into a clean Excel template for analysis and storage.
It was originally built to reduce manual data entry time for laboratory staff by parsing common report formats and writing the results into spreadsheets.
- Extracts tabular or semi-structured data from PDFs
- Maps extracted fields to a defined Excel layout
- Skips duplicate or malformed entries to protect data integrity
- Logs processing results (success, skipped, errors)
- Designed to be extended for new PDF templates
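The mapping, skipping, and logging steps listed above could look roughly like the sketch below. It is an illustration rather than the project's actual code: the field names in `FIELDS`, the function `map_rows`, and the rule that a duplicate is a repeated sample ID are all assumptions made for the example.

```python
import logging

logger = logging.getLogger("pdf_to_excel")

# Hypothetical column layout; the real Excel template's fields are not listed here.
FIELDS = ("sample_id", "location", "result")


def map_rows(rows, seen_ids=None):
    """Map raw table rows to dicts, skipping duplicate or malformed entries.

    Returns the accepted records and a tally of success / skipped / errors.
    """
    seen_ids = set() if seen_ids is None else seen_ids
    records = []
    stats = {"success": 0, "skipped": 0, "errors": 0}

    for row in rows:
        # Malformed (assumed rule): wrong cell count or an empty sample ID.
        if row is None or len(row) != len(FIELDS) or not (row[0] or "").strip():
            stats["errors"] += 1
            logger.warning("Malformed row: %r", row)
            continue

        # Empty cells may arrive as None; normalise them to empty strings.
        record = {name: (cell or "").strip() for name, cell in zip(FIELDS, row)}

        # Duplicate (assumed rule): a sample ID that was already accepted.
        if record["sample_id"] in seen_ids:
            stats["skipped"] += 1
            logger.info("Duplicate sample skipped: %s", record["sample_id"])
            continue

        seen_ids.add(record["sample_id"])
        records.append(record)
        stats["success"] += 1

    return records, stats
```

Passing the same `seen_ids` set across several PDFs would let the duplicate check span a whole batch rather than a single file.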
- Python 3.x
- PDF parsing: pdfplumber
- Excel writing: openpyxl
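As a rough sketch of how these two libraries fit together, the example below reads every table found in a PDF with pdfplumber and writes the rows to a new workbook with openpyxl. The function names (`clean_row`, `extract_rows`, `write_rows`) are hypothetical and are not taken from `mold_processing.py`; the real script fills an existing template rather than a blank workbook.

```python
def clean_row(row):
    """Turn None cells into empty strings and trim whitespace."""
    return ["" if cell is None else str(cell).strip() for cell in row]


def extract_rows(pdf_path):
    """Collect the rows of the first table detected on each page."""
    import pdfplumber  # imported here so clean_row works without it installed

    rows = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            table = page.extract_table()  # None when no table is detected
            if table:
                rows.extend(clean_row(row) for row in table)
    return rows


def write_rows(rows, xlsx_path, header):
    """Write a header and the extracted rows to a new workbook."""
    from openpyxl import Workbook

    workbook = Workbook()
    sheet = workbook.active
    sheet.append(list(header))
    for row in rows:
        sheet.append(row)
    workbook.save(xlsx_path)
```

A call such as `write_rows(extract_rows("report.pdf"), "out.xlsx", ["Sample", "Location", "Result"])` would chain the two steps; the file names and header here are placeholders.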
src/
main.py # CLI / entry point
mold_processing.py # PDF extraction logic
testing.py
samples/
Example.xlsx