v3.0.0-alpha.1 Encrypted document support, major text extraction updates

@PrinsFrank released this 24 May 13:45
· 39 commits to main since this release
e701679

New Features

  • Added support for encrypted PDFs
  • New text grouping algorithm: text fragments whose vertical extents mostly overlap are treated as part of the same line. This fixes subscript and superscript extraction issues.
  • Several transformation matrix issues solved, fixing text extraction/ordering issues
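
The overlap-based grouping described above can be illustrated with a small sketch. This is not the library's PHP implementation: the `Fragment` type, the 0.5 threshold, and the choice to measure overlap against the shorter fragment's height are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """A positioned piece of text (hypothetical type, not the library's)."""
    text: str
    x: float
    bottom: float
    top: float


def overlap_ratio(a: Fragment, b: Fragment) -> float:
    """Vertical overlap as a fraction of the shorter fragment's height."""
    overlap = min(a.top, b.top) - max(a.bottom, b.bottom)
    shorter = min(a.top - a.bottom, b.top - b.bottom)
    if shorter <= 0:
        return 0.0
    return max(overlap, 0.0) / shorter


def group_into_lines(fragments: list[Fragment], threshold: float = 0.5) -> list[str]:
    """Group fragments into lines when more than `threshold` of their height overlaps."""
    lines: list[list[Fragment]] = []
    # Visit fragments top to bottom (assumes y grows upwards, as in PDF user space).
    for fragment in sorted(fragments, key=lambda f: -f.top):
        for line in lines:
            # Compare against the first fragment placed on each line.
            if overlap_ratio(line[0], fragment) > threshold:
                line.append(fragment)
                break
        else:
            lines.append([fragment])
    # Within a line, read fragments left to right.
    return ["".join(f.text for f in sorted(line, key=lambda f: f.x)) for line in lines]
```

With this rule a subscript that sits slightly below the baseline still overlaps the surrounding glyphs by more than half of its own height, so it stays on the same line instead of being emitted as a separate line.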

Other changes

  • Fix scale operand not accepting a trailing 0 by @szepeviktor in #303
  • Dictionary entry for key Order can be of type ReferenceValueArray by @PrinsFrank in #316
  • Value for dictionaryKey AP can be a single dictionary by @PrinsFrank in #319
  • Automatically resolve values from subdictionaries when expected value type is not a dictionary by @PrinsFrank in #321
  • Simplify type checks XObject by @PrinsFrank in #322
  • Implement value retrieval from ancestor nodes in page tree for inheritable properties by @PrinsFrank in #323
  • Automatically resolve references in dictionary entries when retrieving values by @PrinsFrank in #325
  • feat(rectangle): add width and height helpers by @vitormattos in #317
  • Fix invalid section reference for file encryption key calculation by @PrinsFrank in #327
  • User password entry length should always be 32 regardless of security handler revision by @PrinsFrank in #328
  • Add file encryption key to metadata for samples by @PrinsFrank in #329
  • Enable support for encrypted documents by @PrinsFrank in #282
  • Add sample with user/owner password by @PrinsFrank in #332
  • Add information about debugging file encryption to CONTRIBUTING.md by @PrinsFrank in #333
  • Add support for all escape sequences in literal strings by @PrinsFrank in #334
  • Support octals with one or two digits (next to support for three) in string literals by @PrinsFrank in #335
  • Clean up decoding of string literals and hex strings in EncryptDictionary and use getText instead by @PrinsFrank in #336
  • Fix improper handling of hex encoded binary strings in password entries by @PrinsFrank in #337
  • Update minimum required PHP version to 8.2 by @PrinsFrank in #338
  • Switch from readonly properties to readonly classes wherever possible by @PrinsFrank in #339
  • Check file encryption key for samples by @PrinsFrank in #330
  • Add upgrade guide for v3 by @PrinsFrank in #340
  • Document argument for getValueForKey on dictionary is now required by @PrinsFrank in #341
  • Recover userPassword from ownerPassword to also add support for ownerPasswords by @PrinsFrank in #343
  • Cache calculated file encryption key on document by @PrinsFrank in #344
  • Fix newly discovered PHPStan issue by @PrinsFrank in #346
  • Update sponsorship section in README by @PrinsFrank in #345
  • Properly parse hex strings by @PrinsFrank in #348
  • Decrypt dictionary entries while parsing dictionaries in encrypted documents by @PrinsFrank in #347
  • Decrypt content of compressed objects before parsing by @PrinsFrank in #349
  • Replace escaped characters in encrypted strings before running decryption by @PrinsFrank in #350
  • Check dictionary and page content for encrypted documents by @PrinsFrank in #342
  • Add missing PNG predictor algorithms by @PrinsFrank in #351
  • Flate decode columns should be multiplied by colors if present by @PrinsFrank in #352
  • Ignore "endobj" markers inside streams and search for the marker only after the length given in the stream dictionary, to allow for proper embedded PDF support by @PrinsFrank in #296
  • The resource dictionary is now inherited by @PrinsFrank in #353
  • Add sample with different font sizes by @PrinsFrank in #354
  • Abstract line grouping strategy to make it replaceable by @PrinsFrank in #355
  • Fix incorrect matrix multiplication in Move and MoveOffsetLeading operators causing scrambled text by @PrinsFrank in #356
  • Apply transformation for NEXT_LINE Text positioning operator by @PrinsFrank in #358
  • Add new overlap grouping strategy for text by @PrinsFrank in #357
  • Fix initial text state not being set and appended/restored from stack resulting in lost textObjects by @PrinsFrank in #359
  • Added sample file for #272 by @k00ni in #273
  • Fix issues with operators that interact with both text state and transformation matrix by @PrinsFrank in #360
  • Fix incorrect inverse matrix multiplication in graphicsStateOperator by @PrinsFrank in #361
  • Handle text extraction with inverted Y-axis by @PrinsFrank in #362
  • Use LineFeed as default page separator when extracting text for multiple pages by @PrinsFrank in #363
  • Add sample from issue #290 by @PrinsFrank in #364
  • Properly support encrypted documents in sample generation by @PrinsFrank in #365
  • Move CONTRIBUTING.md to root of project by @PrinsFrank in #366
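
Two of the changes above concern escape sequences in literal strings, including octal escapes of one, two or three digits. As a rough illustration of the rules in the PDF specification (not the library's PHP code), the sketch below decodes the body of a literal string. The function name `decode_literal_string` is hypothetical, and it works on `str` for brevity, whereas a real parser would operate on bytes and also track balanced parentheses.

```python
SIMPLE_ESCAPES = {
    "n": "\n", "r": "\r", "t": "\t", "b": "\b", "f": "\f",
    "(": "(", ")": ")", "\\": "\\",
}


def decode_literal_string(body: str) -> str:
    """Decode the text between the outer parentheses of a PDF literal string."""
    out = []
    i = 0
    while i < len(body):
        char = body[i]
        if char != "\\":
            out.append(char)
            i += 1
            continue
        i += 1  # skip the backslash
        if i >= len(body):
            break  # trailing lone backslash: nothing left to escape
        char = body[i]
        if char in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[char])
            i += 1
        elif char in "01234567":
            # Octal escape: consume one, two or three octal digits.
            end = i
            while end < len(body) and end - i < 3 and body[end] in "01234567":
                end += 1
            out.append(chr(int(body[i:end], 8) & 0xFF))  # high-order overflow is ignored
            i = end
        elif char in "\r\n":
            # Backslash followed by an end-of-line marker: line continuation.
            i += 2 if body[i:i + 2] == "\r\n" else 1
        else:
            # Unknown escape: drop the backslash, keep the character.
            out.append(char)
            i += 1
    return "".join(out)
```

For example, `\53` and `\053` both decode to `+`, while `\0053` decodes to the character with code 5 followed by the digit `3`, because at most three octal digits are consumed.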

Full Changelog: v2.8.0...v3.0.0-alpha.1