A Python utility that compares the text of two documents and generates a Word report highlighting the differences.
This is an older Python utility project, preserved as evidence of my scripting, document-processing, and automation background.
The script compares two document files and identifies text differences.
It currently supports:
- reading text from `.docx` files
- comparing document content word by word
- printing detected differences to the terminal
- generating a `report.docx` file
- highlighting differences in bold inside the generated report
- reading a PDF file and reporting the number of pages
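Because `main.py` is not reproduced here, the following is only a hypothetical sketch of the read-and-compare step, not the script's actual code. It assumes `python-docx` for reading (`Document(path).paragraphs`) and swaps in the standard library's `difflib.SequenceMatcher` to align the two word lists, which may differ from how `main.py` compares words. The function names `read_docx_text` and `word_differences` are illustrative.

```python
import difflib


def read_docx_text(path: str) -> str:
    """Return the paragraph text of a .docx file, one paragraph per line."""
    from docx import Document  # python-docx; imported here so the diff logic runs without it

    return "\n".join(paragraph.text for paragraph in Document(path).paragraphs)


def word_differences(text_a: str, text_b: str) -> list[tuple[str, str, str]]:
    """Return (tag, old_words, new_words) for every span that differs.

    Both texts are split on whitespace and the word lists are aligned with
    difflib.SequenceMatcher, so one inserted word does not push every later
    word out of step the way a position-by-position comparison would.
    """
    words_a, words_b = text_a.split(), text_b.split()
    matcher = difflib.SequenceMatcher(a=words_a, b=words_b, autojunk=False)
    differences = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # one of "replace", "delete", "insert"
            differences.append((tag, " ".join(words_a[i1:i2]), " ".join(words_b[j1:j2])))
    return differences


if __name__ == "__main__":
    # Inline strings stand in for read_docx_text("file1.docx") and read_docx_text("file2.docx").
    for tag, old, new in word_differences(
        "The quick brown fox jumps", "The quick red fox leaps high"
    ):
        print(f"{tag}: {old!r} -> {new!r}")
```

For the inline example this reports two replacements, `'brown' -> 'red'` and `'jumps' -> 'leaps high'`, while the shared words `The quick` and `fox` are treated as unchanged.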
This repository demonstrates:
- Python scripting
- document automation
- text processing
- file comparison logic
- Word document generation
- basic PDF handling
- practical utility scripting
| Path | Purpose |
|---|---|
| `File_Comparison/` | Main script and example files |
| `File_Comparison/main.py` | Python comparison script |
| `README.md` | Project overview |
The script uses:
- Python
- `python-docx`
- `PyPDF2`
Install dependencies with:

```bash
pip install python-docx PyPDF2
```

From inside the `File_Comparison/` folder, run:

```bash
python main.py
```

The current script expects these files to be present in the same folder:

```text
file1.docx
file2.docx
file1.pdf
```

It generates:

```text
report.docx
```
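A minimal sketch of how a report with bolded differences could be produced, assuming `python-docx` (`Document`, `add_heading`, `add_paragraph`, `add_run`, `Run.bold`, `save`). It is not taken from `main.py`; the helper names `marked_runs` and `write_report` are hypothetical, and the word alignment uses `difflib.SequenceMatcher` as an assumption.

```python
import difflib


def marked_runs(old_text: str, new_text: str) -> list[tuple[str, bool]]:
    """Split new_text into (words, differs) runs, compared word by word with old_text.

    Words deleted from old_text have no place in new_text, so they are skipped.
    """
    old_words, new_words = old_text.split(), new_text.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    runs = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if j1 == j2:  # pure deletion: nothing from new_text to show
            continue
        runs.append((" ".join(new_words[j1:j2]), tag != "equal"))
    return runs


def write_report(runs: list[tuple[str, bool]], path: str = "report.docx") -> None:
    """Write the runs as one paragraph, with differing words in bold."""
    from docx import Document  # python-docx; imported here so marked_runs runs without it

    document = Document()
    document.add_heading("Differences", level=1)
    paragraph = document.add_paragraph()
    for index, (text, differs) in enumerate(runs):
        if index:
            paragraph.add_run(" ")  # plain separator so spaces are never bolded
        paragraph.add_run(text).bold = differs
    document.save(path)
```

Calling `write_report(marked_runs(old_text, new_text))` would write `report.docx` showing the second document's text with changed words in bold. Because the report follows the second document, pure deletions are not shown; a fuller report might list them separately.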
This is an older learning/utility project. It is useful as a demonstration of Python automation and document-processing logic, but it is not yet packaged as a reusable command-line tool.
- Input filenames are currently hardcoded.
- The script expects files to be in the same folder as `main.py`.
- The comparison logic is basic and word-based.
- The PDF handling currently reports page count rather than full PDF text comparison.
- Error handling could be improved.
- The project could be refactored into a clearer command-line interface.
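To move the PDF handling beyond a page count, the PDF's text could be extracted and fed into the same word-based comparison. The sketch below assumes PyPDF2 2.x or later (`PdfReader`, `reader.pages`, `page.extract_text()`); the function names are hypothetical. Extracted text quality depends on how the PDF was produced, and scanned image-only pages may yield no text at all.

```python
def pages_to_words(pages) -> list[str]:
    """Collect the words from every page object that has an extract_text() method."""
    words = []
    for page in pages:
        # extract_text() can come back empty (or None) for image-only pages
        words.extend((page.extract_text() or "").split())
    return words


def read_pdf(path: str) -> tuple[int, list[str]]:
    """Return the page count (what the script reports today) and the PDF's words."""
    from PyPDF2 import PdfReader  # assumes PyPDF2 2.x or later

    reader = PdfReader(path)
    return len(reader.pages), pages_to_words(reader.pages)
```

Keeping `pages_to_words` separate from the PyPDF2 call means the word-collection logic can be tested with simple stand-in page objects, without a real PDF.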
A stronger version could add:
- command-line arguments for input files
- support for choosing output report name
- better error messages
- full PDF text extraction
- paragraph-level comparison
- tests with sample documents
- clearer report formatting
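The first three improvements (input file arguments, a configurable report name and clearer error messages) could start from a standard-library `argparse` front end such as this sketch. The positional names, the `-o/--output` option and the helper functions are hypothetical choices, not part of the current script.

```python
from __future__ import annotations

import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Command-line options replacing the hardcoded file names."""
    parser = argparse.ArgumentParser(
        description="Compare two .docx files word by word and write a Word report."
    )
    parser.add_argument("first", help="original .docx file")
    parser.add_argument("second", help="revised .docx file")
    parser.add_argument(
        "-o", "--output", default="report.docx",
        help="name of the generated report (default: report.docx)",
    )
    return parser


def missing_inputs(paths: list[str]) -> list[str]:
    """Return the input paths that do not point to an existing file."""
    return [path for path in paths if not Path(path).is_file()]


def main(argv: list[str] | None = None) -> int:
    parser = build_parser()
    args = parser.parse_args(argv)
    missing = missing_inputs([args.first, args.second])
    if missing:
        # parser.error prints the usage line plus this message and exits with status 2
        parser.error("input file not found: " + ", ".join(missing))
    print(f"Comparing {args.first} with {args.second}; report -> {args.output}")
    # The existing read, compare and report steps would run here.
    return 0
```

Adding `if __name__ == "__main__": raise SystemExit(main())` would allow a call such as `python main.py old.docx new.docx -o changes.docx`, while omitting `-o` keeps the current `report.docx` default.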
This project supports my background in practical automation: taking repetitive document-review work and turning it into a scriptable workflow.
My current portfolio focus is applied AI workflow automation, but this project shows an earlier foundation in Python scripting, file handling, and document-processing utilities.