Note: This program uses Qt 5 for its GUI. I used the PyQt5 library instead of PySide2, which The Qt Company recommends. I uploaded this program to GitHub as an example of integrating Qt with Python.
This is a small project that I worked on during my internship. The organization had run OCR on some documents. The output for each document was a PDF with recognized text, and the file was named after the document's registration no. However, some of the registration nos. were recognized incorrectly, so those filenames were wrong too.
On inspection, the registration no. was found to sit in a particular region of the page for documents from a given period. Using this information, image localization was performed to crop that area from the first page of each document.
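The localization step described above could be sketched roughly as follows. This is a minimal sketch, not the code in docname_verif.py: the function names, the fractional region format, and the zoom factor are assumptions made for illustration, and the snake_case PyMuPDF calls assume a reasonably recent PyMuPDF release.

```python
def region_to_clip(page_width, page_height, region):
    """Convert a fractional (left, top, right, bottom) region to page units."""
    left, top, right, bottom = region
    if not (0.0 <= left < right <= 1.0 and 0.0 <= top < bottom <= 1.0):
        raise ValueError("region must lie inside the page and have positive size")
    return (left * page_width, top * page_height,
            right * page_width, bottom * page_height)


def crop_first_page(pdf_path, png_path, region, zoom=2.0):
    """Render only `region` of the first page of a PDF to a PNG file."""
    # PyMuPDF is imported here so region_to_clip can be used without it.
    import fitz

    doc = fitz.open(pdf_path)
    try:
        page = doc[0]  # the registration no. is on the first page
        x0, y0, x1, y1 = region_to_clip(page.rect.width, page.rect.height, region)
        # zoom > 1 renders at a higher resolution so the digits stay legible
        pix = page.get_pixmap(matrix=fitz.Matrix(zoom, zoom),
                              clip=fitz.Rect(x0, y0, x1, y1))
        pix.save(png_path)
    finally:
        doc.close()
```

For example, `crop_first_page("scan.pdf", "scan_crop.png", (0.5, 0.0, 1.0, 0.25))` would save the top-right part of the first page. The file names and region here are hypothetical. Expressing the region as fractions of the page keeps the same values usable for pages of different sizes.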
This program takes a directory containing PDFs as input and then performs:
- Image localization to get the cropped area
- Manual verification, where the user checks the registration no. and corrects it if it is wrong
- Only 3 entries are shown at a time, so the documents are divided into batches of 3 files each
- Changes are saved when the user moves on to the next batch
- The files are renamed and moved to a new folder. Localized images are also saved.
- A .csv file is generated which contains the original and final names.
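The batching, renaming, and logging steps listed above can be sketched with the standard library alone. This is an illustrative sketch under stated assumptions, not the program's actual implementation: the function names, the CSV column names, and the choice to append to the log after each batch are all hypothetical.

```python
import csv
import shutil
from pathlib import Path


def batches(items, size=3):
    """Yield consecutive slices of `items`, each holding at most `size` entries."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def rename_and_log(corrections, out_dir, log_path):
    """Move each PDF into `out_dir` under its corrected name and log the change.

    `corrections` is a list of (source_path, corrected_registration_no) pairs,
    for example the entries of one batch after the user has reviewed them.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    log_path = Path(log_path)
    write_header = not log_path.exists()
    # Append so that every batch adds its rows to the same .csv file.
    with log_path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        if write_header:
            writer.writerow(["original_name", "final_name"])
        for source, reg_no in corrections:
            source = Path(source)
            target = out_dir / f"{reg_no}{source.suffix}"
            if target.exists():
                # Refuse to overwrite a file that already has this name.
                raise FileExistsError(f"{target} already exists")
            shutil.move(str(source), str(target))
            writer.writerow([source.name, target.name])
```

A caller would loop over `batches(pdf_paths)`, show each batch to the user, and pass the reviewed pairs to `rename_and_log` before moving on. Raising on a name collision is one possible design choice: it avoids silently overwriting a document when two files end up with the same registration no.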
- Python 3
- Any OS that supports Python 3 and the following libraries:
- PyQt5
- PyMuPDF
- Pillow (PIL)
Install using: pip install --user -r requirements.txt
Note: It is recommended to use a Python virtual environment, such as venv or a conda environment.
General:
python docname_verif.py
Linux and macOS may have Python 2 as the default. In that case, use:
python3 docname_verif.py
On Windows, Python allows you to launch in windowed mode:
pythonw docname_verif.py
or you may launch the program using the Batch file.
- docname_verif.py
This is the main code. It contains the back-end of the program.
- docver_gui.py
The Qt User Interface file (.ui) was converted to Python 3 code using pyuic5, a utility that converts .ui files to .py. This Python code serves as the front-end of the program.
- docver_gui.ui
Although not required by the program, it is included in case the UI needs to be changed using Qt Designer/Creator instead of by modifying the Python script.
- docname_verif.bat
This is a Batch file. It launches the program on Windows and exits.
Screenshots of the main window are in the repo's wiki at: https://github.com/alphrho/docname_verifier/wiki/Screenshots:-Main-Window
The program is a bit slow, which makes it unpleasant to use.