MagnoCarlos/File_ComparisonPy


File ComparisonPy

A Python utility that compares the text of two documents and generates a Word report highlighting the differences.

This is an older Python utility project, preserved as evidence of my scripting, document-processing, and automation background.

What it does

The script compares two document files and identifies text differences.

It currently demonstrates:

  • reading text from .docx files
  • comparing document content word by word
  • printing detected differences to the terminal
  • generating a report.docx file
  • highlighting differences in bold inside the generated report
  • reading a PDF file and reporting the number of pages
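The steps in this list can be sketched roughly as follows. This is a minimal illustration, not the contents of main.py: the function names are hypothetical, and the position-by-position comparison is an assumption about what "word by word" means here. The third-party imports are placed inside the functions so the comparison logic can be used without them. `PdfReader` assumes a reasonably recent PyPDF2 release.

```python
from itertools import zip_longest


def read_docx_words(path):
    """Return every word from the paragraphs of a .docx file, in order."""
    from docx import Document  # provided by the python-docx package

    words = []
    for paragraph in Document(path).paragraphs:
        words.extend(paragraph.text.split())
    return words


def diff_words(words_a, words_b):
    """Compare two word lists position by position (assumed strategy).

    Returns (index, word_a, word_b) tuples for each position that differs.
    A word missing from the shorter list is reported as None.
    """
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip_longest(words_a, words_b))
        if a != b
    ]


def write_report(words_a, words_b, out_path="report.docx"):
    """Write the second document's words, making differing words bold."""
    from docx import Document

    changed = {i for i, _, _ in diff_words(words_a, words_b)}
    report = Document()
    paragraph = report.add_paragraph()
    for i, word in enumerate(words_b):
        run = paragraph.add_run(word + " ")
        if i in changed:
            run.bold = True
    report.save(out_path)


def count_pdf_pages(path):
    """Return the number of pages in a PDF file."""
    from PyPDF2 import PdfReader  # assumes a recent PyPDF2 release

    return len(PdfReader(path).pages)
```

A positional comparison like this is simple, but a single inserted word shifts every later position and marks the rest of the text as changed.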

Technical focus

This repository demonstrates:

  • Python scripting
  • document automation
  • text processing
  • file comparison logic
  • Word document generation
  • basic PDF handling
  • practical utility scripting

Repository structure

Path                     Purpose
File_Comparison/         Main script and example files
File_Comparison/main.py  Python comparison script
README.md                Project overview

Requirements

The script uses:

  • Python
  • python-docx
  • PyPDF2

Install dependencies with:

pip install python-docx PyPDF2

Current usage

From inside the File_Comparison/ folder, run:

python main.py

The current script expects these files to be present in the same folder:

  • file1.docx
  • file2.docx
  • file1.pdf

It generates:

  • report.docx
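Because the script depends on fixed filenames, a small pre-flight check can make a missing file obvious before any processing starts. The sketch below is an illustration only: `missing_inputs` and `EXPECTED_INPUTS` are hypothetical names, not part of main.py.

```python
from pathlib import Path

# Filenames the current script expects, as listed in this README.
EXPECTED_INPUTS = ("file1.docx", "file2.docx", "file1.pdf")


def missing_inputs(folder=".", names=EXPECTED_INPUTS):
    """Return the expected filenames that are not present in `folder`."""
    base = Path(folder)
    return [name for name in names if not (base / name).is_file()]


def check_inputs(folder="."):
    """Raise a readable error if any expected input file is missing."""
    missing = missing_inputs(folder)
    if missing:
        raise FileNotFoundError(
            "Missing input file(s) in {}: {}".format(folder, ", ".join(missing))
        )
```

Calling `check_inputs()` at the top of the script would replace a library traceback with a message that names the missing files.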

Current status

This is an older learning/utility project. It is useful as a demonstration of Python automation and document-processing logic, but it is not yet packaged as a reusable command-line tool.

Known limitations

  • Input filenames are currently hardcoded.
  • The script expects files to be in the same folder as main.py.
  • The comparison logic is basic and word-based.
  • PDF handling only reports the page count; it does not compare PDF text.
  • Error handling could be improved.
  • The project could be refactored into a clearer command-line interface.
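One way to go beyond a basic word-based comparison is to align the two word sequences before comparing them, so an inserted or deleted word does not mark everything after it as different. The sketch below uses `difflib.SequenceMatcher` from the standard library. This is a swapped-in technique for illustration, not what main.py currently does, and `aligned_word_diff` is a hypothetical name.

```python
from difflib import SequenceMatcher


def aligned_word_diff(words_a, words_b):
    """Return the non-matching regions between two word lists.

    Each item is (tag, removed_words, added_words), where tag is one of
    "replace", "delete", or "insert" as reported by SequenceMatcher.
    """
    matcher = SequenceMatcher(a=words_a, b=words_b, autojunk=False)
    changes = []
    for tag, a0, a1, b0, b1 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((tag, words_a[a0:a1], words_b[b0:b1]))
    return changes


if __name__ == "__main__":
    old = "the quick brown fox".split()
    new = "the very quick brown cat".split()
    for change in aligned_word_diff(old, new):
        print(change)
```

The same function works on lists of paragraphs instead of words, which is one possible route to paragraph-level comparison.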

Future improvements

A stronger version could add:

  • command-line arguments for input files
  • an option to choose the output report name
  • better error messages
  • full PDF text extraction
  • paragraph-level comparison
  • tests with sample documents
  • clearer report formatting
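The first two items could be covered with `argparse` from the standard library. The sketch below shows one possible interface. The argument names (`original`, `revised`, `--output`, `--pdf`) are assumptions made for illustration and do not exist in the current script.

```python
import argparse


def build_parser():
    """Build a command-line parser for a hypothetical CLI version of main.py."""
    parser = argparse.ArgumentParser(
        description="Compare two .docx files and write a Word report."
    )
    parser.add_argument("original", help="path to the first .docx file")
    parser.add_argument("revised", help="path to the second .docx file")
    parser.add_argument(
        "-o", "--output",
        default="report.docx",
        help="path of the generated report (default: report.docx)",
    )
    parser.add_argument(
        "--pdf",
        default=None,
        help="optional PDF file whose page count should be reported",
    )
    return parser


def parse_args(argv=None):
    """Parse `argv`, or sys.argv[1:] when argv is None."""
    return build_parser().parse_args(argv)
```

With an interface like this, the hardcoded names become defaults or explicit arguments, for example `python main.py file1.docx file2.docx -o report.docx`.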

Why this matters

This project supports my background in practical automation: taking repetitive document-review work and turning it into a scriptable workflow.

My current portfolio focus is applied AI workflow automation, but this project shows an earlier foundation in Python scripting, file handling, and document-processing utilities.
