Reduce code duplication in PDF parser classes #1468

Merged
andreasrosdal merged 3 commits into master from copilot/reduce-duplication-in-pdf-parser
Feb 16, 2026

Copilot AI commented Feb 16, 2026

Description of the new Feature/Bugfix

Eliminated duplicate code across PdfContentTextExtractor, PdfContentTextLocator, and PdfTextLocator by extracting shared implementations to the base class PdfContentStreamHandler.

Changes:

  • Moved duplicate Do inner class (XObject form handler) from both subclasses to PdfContentStreamHandler as a protected inner class
  • Extracted the processContent() method, which parses PDF content streams, to the base class
  • Created a static utility, getContentBytesFromPdfObjectStatic(), that reads the bytes from a PdfObject (handles the INDIRECT, STREAM, and ARRAY types)
  • Updated all three classes to use shared implementations
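
The shape of this refactoring can be sketched as follows. This is a minimal illustration, not the OpenPDF code: every name in it (ContentHandlerSketch, DoOperator, contentBytes, onText, and the two subclasses) is hypothetical, and the "content stream" is simplified to whitespace-separated ASCII tokens. It only shows the idea of declaring the inner handler class, the parsing loop, and the byte-reading helper once in a base class.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the shared base class; not the OpenPDF type.
abstract class ContentHandlerSketch {

    // Declared once here instead of being copied into each subclass.
    protected class DoOperator {
        void invoke(String xObjectName) {
            // An inner class can call back into its enclosing handler.
            onText("[form " + xObjectName + "]");
        }
    }

    // Shared parsing loop (toy grammar: tokens separated by whitespace).
    void processContent(byte[] content) {
        String previous = null;
        String text = new String(content, StandardCharsets.US_ASCII).trim();
        for (String token : text.split("\\s+")) {
            if (token.equals("Do") && previous != null) {
                new DoOperator().invoke(previous); // operand precedes operator
            } else if (!token.startsWith("/")) {
                onText(token);                     // plain text token
            }
            previous = token;
        }
    }

    // Static helper: accepts a single byte[] or a (nested) list of them,
    // loosely mirroring the "stream or array of streams" case.
    static byte[] contentBytes(Object source) {
        if (source instanceof byte[]) {
            return (byte[]) source;
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (source instanceof List<?>) {
            for (Object part : (List<?>) source) {
                byte[] bytes = contentBytes(part);
                out.write(bytes, 0, bytes.length);
                out.write(' '); // keep tokens of adjacent parts separated
            }
        }
        return out.toByteArray();
    }

    // The only piece each subclass still has to provide.
    abstract void onText(String text);
}

// Hypothetical subclass that concatenates the text it sees.
final class TextExtractorSketch extends ContentHandlerSketch {
    private final StringBuilder sb = new StringBuilder();

    @Override
    void onText(String text) {
        if (sb.length() > 0) {
            sb.append(' ');
        }
        sb.append(text);
    }

    String text() {
        return sb.toString();
    }
}

// Hypothetical subclass that records each chunk separately.
final class TextLocatorSketch extends ContentHandlerSketch {
    private final List<String> chunks = new ArrayList<>();

    @Override
    void onText(String text) {
        chunks.add(text);
    }

    List<String> chunks() {
        return chunks;
    }
}
```

In this sketch each subclass shrinks to its own onText() behavior, which is the kind of reduction the change list describes.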

Impact:

  • Removed 177 lines of duplicated code
  • Added 114 lines of shared implementation
  • Reduced duplication from 24.3%/23.2% to <3%
  • No functional changes - behavior identical to previous implementation

Unit-Tests for the new Feature/Bugfix

  • Existing unit tests cover the functionality
  • All tests pass without modification (no behavioral changes)

Compatibility Issues

No breaking changes. All methods remain package-private or private. Subclasses now instantiate the Do class via this.new Do() instead of new Do(), but this is an internal implementation detail.
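
For readers unfamiliar with the this.new Do() form: it is Java's qualified inner-class creation syntax, which names the enclosing instance explicitly. A minimal sketch, assuming hypothetical class names (BaseHandlerSketch and ExtractorSketch are not the OpenPDF classes):

```java
// Hypothetical base class owning the inner class.
class BaseHandlerSketch {

    // Inner (non-static) class: every instance is bound to an enclosing handler.
    protected class Do {
        String owner() {
            // BaseHandlerSketch.this is the enclosing instance given at creation.
            return BaseHandlerSketch.this.name();
        }
    }

    String name() {
        return "base";
    }
}

// Hypothetical subclass that creates the inherited inner class.
class ExtractorSketch extends BaseHandlerSketch {

    @Override
    String name() {
        return "extractor";
    }

    Do register() {
        // Qualified creation: "this" becomes the enclosing instance of the new Do.
        return this.new Do();
    }
}
```

Here the Do instance created by the subclass is bound to the subclass instance, so calls made through the enclosing reference dispatch to the subclass's overrides.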

Your real name

GitHub Copilot Workspace Agent

Testing details

Verified with full openpdf-core test suite. The refactoring is purely structural - no logic changes.


Copilot AI and others added 2 commits February 16, 2026 08:35
…ntStreamHandler

Co-authored-by: andreasrosdal <259156774+andreasrosdal@users.noreply.github.com>
Co-authored-by: andreasrosdal <259156774+andreasrosdal@users.noreply.github.com>

Copilot AI changed the title [WIP] Reduce duplication in PDF content text extractor Reduce code duplication in PDF parser classes Feb 16, 2026
Copilot AI requested a review from andreasrosdal February 16, 2026 08:40
@andreasrosdal andreasrosdal marked this pull request as ready for review February 16, 2026 08:53
@andreasrosdal andreasrosdal merged commit 4b319a2 into master Feb 16, 2026
11 of 13 checks passed
@andreasrosdal andreasrosdal deleted the copilot/reduce-duplication-in-pdf-parser branch February 16, 2026 08:53