A GUI application that uses OpenAI's GPT-4o vision model to analyze handwritten annotations on PDF documents and apply the corresponding changes to Word documents.
- GPT-4o Vision Integration: Sophisticated analysis of handwritten annotations, markings, and corrections
- Intelligent Relationship Detection: Identifies connections between arrows, strikethroughs, and corrections
- Context-Aware Processing: Understands the intent behind different types of markings
- Handwriting Recognition: Converts handwritten text to digital text and places it appropriately
- Strikethrough Processing: Detects crossed-out text and applies replacements or deletions
- Cross Mark Analysis: Distinguishes between small item deletions and large section removals
- Arrow Following: Tracks arrows to understand correction relationships
- Proximity Analysis: Uses spatial relationships to determine annotation intent
- Intuitive Design: Clean, professional interface with real-time preview
- Progress Tracking: Live progress bars and detailed logging
- Results Visualization: Comprehensive analysis results with confidence scoring
- Batch Processing: Handle multiple documents efficiently
- Export Options: Save results in multiple formats (Word, Excel, JSON)
- Double-click start.bat to launch the application
- The setup wizard will guide you through initial configuration
- Add your OpenAI API key when prompted
- Install Python 3.8+ from python.org
- Install Dependencies:
  pip install -r requirements.txt
- Configure API Key: Edit the .env file and add your OpenAI API key:
  OPENAI_API_KEY=your-actual-api-key-here
- Launch Application:
  python launcher.py
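
Before launching, it can help to confirm that the .env file really contains a key rather than the placeholder. The following is a minimal sketch using only the standard library; the helper names `read_env_value` and `api_key_is_configured` are hypothetical and not part of this project, and the application itself may load the file differently.

```python
from pathlib import Path


def read_env_value(env_path, key):
    """Return the value assigned to `key` in a .env-style file, or None."""
    path = Path(env_path)
    if not path.is_file():
        return None
    for raw_line in path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == key:
            # Drop optional surrounding quotes.
            return value.strip().strip('"').strip("'")
    return None


def api_key_is_configured(env_path=".env"):
    """True if OPENAI_API_KEY is set to something other than the placeholder."""
    value = read_env_value(env_path, "OPENAI_API_KEY")
    return bool(value) and value != "your-actual-api-key-here"


if __name__ == "__main__":
    if not api_key_is_configured():
        print("OPENAI_API_KEY is missing or still set to the placeholder in .env")
```

This only checks that a value is present; it does not verify that the key is valid or has GPT-4o vision access.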
- Upload PDF: Select the PDF with handwritten annotations
- Upload Word Document: Choose the Word document to modify
- Configure Settings: Adjust detection settings and confidence thresholds
- PDF Chunking: Breaks PDF into analyzable image chunks
- GPT-4o Analysis: Each chunk is analyzed for:
  - Handwritten text and corrections
  - Strikethrough text and deletions
  - Cross marks (small deletions vs. large section removal)
  - Arrows connecting annotations to text
  - Highlighting and emphasis marks
  - Margin notes and annotations
- Text Matching: AI matches PDF annotations to Word document content
- Relationship Analysis: Determines relationships between markings:
  - Handwriting near strikethrough = replacement
  - Arrows pointing from corrections to text
  - Large crosses = delete entire paragraphs
  - Small crosses = delete specific items
- Smart Application: Applies changes based on detected intent:
  - Replace: Strikethrough text replaced with handwritten corrections
  - Delete: Crossed-out sections removed completely
  - Insert: Handwritten additions placed appropriately
  - Highlight: Important sections emphasized
  - Comment: Uncertain annotations added as reviewable comments
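
The PDF chunking step at the start of this pipeline can be pictured as tiling each rendered page image with overlapping square windows. The sketch below is an illustration only: the function name `chunk_boxes` is hypothetical, the defaults mirror the `chunk_size: 512` and `overlap: 64` values from config.yaml, and the project's actual tiling strategy may differ.

```python
def chunk_boxes(width, height, chunk_size=512, overlap=64):
    """Return (left, top, right, bottom) boxes covering a width x height image."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # distance between neighbouring chunk origins

    def starts(length):
        # Offsets along one axis. The final chunk is shifted back so that it
        # ends exactly at the image edge instead of running past it.
        if length <= chunk_size:
            return [0]
        positions = list(range(0, length - chunk_size, step))
        positions.append(length - chunk_size)
        return positions

    return [
        (left, top, min(left + chunk_size, width), min(top + chunk_size, height))
        for top in starts(height)
        for left in starts(width)
    ]


if __name__ == "__main__":
    for box in chunk_boxes(1000, 600):
        print(box)
```

The overlap keeps an annotation that straddles a chunk boundary fully visible in at least one neighbouring chunk, at the cost of analyzing some pixels twice.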
- Document Review: Apply handwritten feedback to digital documents
- Manuscript Editing: Convert paper edits to digital format
- Form Processing: Digitize handwritten form corrections
- Contract Modification: Apply legal annotations and changes
- Academic Papers: Process reviewer comments and corrections
- Business Documents: Convert meeting notes and annotations
├── gui_main.py                  # Main GUI application
├── launcher.py                  # Setup wizard and launcher
├── document_parser.py           # Core document processing engine
├── chunk_analyzer.py            # GPT-4o vision analysis
├── advanced_word_processor.py   # Intelligent Word document modification
├── document_preprocessor.py     # PDF to image conversion and chunking
├── word_processor.py            # Basic Word document operations
├── config.yaml                  # Configuration settings
├── .env                         # API keys and environment variables
├── requirements.txt             # Python dependencies
├── setup.py                     # Initial setup script
├── example_usage.py             # Usage examples
├── start.bat                    # Windows launcher script
└── README.md                    # This file
analysis:
  detect_handwriting: true        # Detect handwritten text
  detect_strikethrough: true      # Detect crossed-out text
  detect_crosses: true            # Detect X marks and crosses
  detect_arrows: true             # Detect arrows and connections
  detect_highlighting: true       # Detect highlighted text
  confidence_threshold: 0.7       # Minimum confidence for detections
document:
  chunk_size: 512                 # Image chunk size in pixels
  overlap: 64                     # Overlap between chunks
  supported_formats: [".pdf", ".png", ".jpg", ".jpeg", ".tiff"]
word_processing:
  preserve_formatting: true       # Keep original formatting
  highlight_detected_items: true  # Highlight changes in output
  add_comments: true              # Add comments for uncertain items
  track_changes: false            # Enable Word change tracking

The system uses sophisticated prompts to guide GPT-4o analysis:
- Relationship detection between markings
- Size-based deletion scope determination
- Context-aware text placement
- Confidence-based action selection
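
To make the confidence-based and size-based rules concrete, here is a rough decision function. It is a sketch under stated assumptions, not the project's implementation: the `Annotation` fields, the `choose_action` name, the action labels, and the `large_cross_ratio` cutoff of 0.5 are all hypothetical. Only the 0.7 default mirrors `confidence_threshold` in config.yaml.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    kind: str                     # "strikethrough", "cross", "handwriting", "highlight"
    confidence: float             # detection confidence in [0, 1]
    nearby_handwriting: str = ""  # handwritten text found close to the mark, if any
    area_ratio: float = 0.0       # assumed: fraction of the paragraph the mark covers


def choose_action(ann, confidence_threshold=0.7, large_cross_ratio=0.5):
    """Map one detected annotation to an action label."""
    # Uncertain detections become reviewable comments instead of edits.
    if ann.confidence < confidence_threshold:
        return "comment"
    if ann.kind == "strikethrough":
        # Handwriting near a strikethrough is treated as its replacement.
        return "replace" if ann.nearby_handwriting else "delete"
    if ann.kind == "cross":
        # Large crosses remove the paragraph; small ones remove a single item.
        if ann.area_ratio >= large_cross_ratio:
            return "delete_paragraph"
        return "delete_item"
    if ann.kind == "handwriting":
        return "insert"
    if ann.kind == "highlight":
        return "highlight"
    return "comment"  # unknown mark types are left for human review


if __name__ == "__main__":
    mark = Annotation("strikethrough", 0.9, nearby_handwriting="thirty days")
    print(choose_action(mark))
```

Falling back to a comment for low-confidence or unrecognized marks keeps a wrong guess from silently changing the Word document.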
- Process multiple PDF-Word document pairs
- Consistent settings across all documents
- Detailed processing reports
- Error handling and recovery
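
For batch runs, the PDF and Word files have to be paired up somehow. One simple convention, assumed here purely for illustration, is to match files in a folder by base name (for example contract.pdf with contract.docx). The helper `pair_documents` is hypothetical; the application may pair documents in a different way.

```python
from pathlib import Path


def pair_documents(folder):
    """Pair *.pdf and *.docx files in `folder` that share a base name.

    Returns (pairs, unmatched): a sorted list of (pdf_path, docx_path) tuples
    and a sorted list of base names that have only one of the two files.
    """
    folder = Path(folder)
    pdfs = {p.stem: p for p in folder.glob("*.pdf")}
    docs = {p.stem: p for p in folder.glob("*.docx")}
    matched = pdfs.keys() & docs.keys()
    pairs = [(pdfs[stem], docs[stem]) for stem in sorted(matched)]
    unmatched = sorted((pdfs.keys() | docs.keys()) - matched)
    return pairs, unmatched


if __name__ == "__main__":
    pairs, unmatched = pair_documents(".")
    for pdf_path, docx_path in pairs:
        print(f"{pdf_path.name} -> {docx_path.name}")
    if unmatched:
        print("No partner found for:", ", ".join(unmatched))
```

Reporting unmatched names up front makes it easier to fix a missing file before any API calls are spent on the batch.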
- Modified Word Documents: Final documents with applied changes
- Analysis Reports: Detailed breakdown of all detections
- Excel Summaries: Tabular data for further analysis
- JSON Results: Raw analysis data for custom processing
- Python: 3.8 or higher
- OpenAI API Key: GPT-4o vision access required
- Operating System: Windows 10/11 (primary), macOS, Linux
- Memory: 4GB RAM minimum, 8GB recommended
- Storage: 1GB free space for processing
- High-Quality PDFs: Use PDFs with clear, readable text and annotations
- Consistent Handwriting: Clear, legible handwriting improves accuracy
- Good Contrast: Ensure annotations are clearly visible against the background
- Logical Annotations: Use consistent marking patterns (arrows, crosses, etc.)
- API Limits: Be mindful of OpenAI API usage limits for large documents
- API Key Error: Verify your OpenAI API key in the .env file
- Dependencies Missing: Run pip install -r requirements.txt
- PDF Processing Error: Ensure PDF is not password-protected
- Memory Error: Reduce chunk size in configuration for large documents
- Check the processing log in the GUI for detailed error information
- Verify all dependencies are properly installed
- Ensure your OpenAI API key has GPT-4o vision access
- Try processing a smaller test document first
This project is licensed under the MIT License - see the LICENSE file for details.
Contributions are welcome! Please feel free to submit a Pull Request.
Document Processor Pro - Bridging the gap between handwritten annotations and digital documents with AI-powered intelligence.