v3.0.0-alpha.1 Encrypted document support, major text extraction updates
New Features
- Added support for encrypted PDFs
- New text grouping algorithm: text fragments whose vertical extents mostly overlap are treated as part of the same line. This fixes extraction issues with subscripts and superscripts.
- Fixed several transformation matrix issues that caused incorrect text extraction and ordering
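
The overlap-based line grouping described above can be illustrated with a minimal, language-agnostic sketch. This is not the library's implementation: the `Fragment` type, the `overlap_ratio` and `group_into_lines` helpers, and the choice to measure overlap relative to the shorter fragment with a 0.5 threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    """Hypothetical text fragment with a horizontal position and vertical extent."""
    text: str
    x: float
    y_bottom: float
    y_top: float


def overlap_ratio(a: Fragment, b: Fragment) -> float:
    """Vertical overlap of a and b as a fraction of the shorter fragment's height."""
    overlap = min(a.y_top, b.y_top) - max(a.y_bottom, b.y_bottom)
    shorter = min(a.y_top - a.y_bottom, b.y_top - b.y_bottom)
    if shorter <= 0:
        return 0.0
    return max(overlap, 0.0) / shorter


def group_into_lines(fragments: list[Fragment]) -> list[str]:
    """Group fragments into lines when they overlap a line's first fragment by a majority."""
    lines: list[list[Fragment]] = []
    # Visit fragments top to bottom (PDF user space has y increasing upward).
    for frag in sorted(fragments, key=lambda f: -f.y_top):
        for line in lines:
            if overlap_ratio(line[0], frag) > 0.5:  # assumed "majority" threshold
                line.append(frag)
                break
        else:
            lines.append([frag])
    # Within a line, order fragments left to right; spacing is ignored in this sketch.
    return ["".join(f.text for f in sorted(line, key=lambda f: f.x)) for line in lines]
```

With this rule, a subscript that sits slightly below the baseline still overlaps its neighbours by more than half of its own height, so it stays on the same line instead of being emitted as a separate line.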
Other changes
- Fix scale operand not accepting a trailing 0 by @szepeviktor in #303
- Dictionary entry for key Order can be of type ReferenceValueArray by @PrinsFrank in #316
- Value for dictionaryKey AP can be a single dictionary by @PrinsFrank in #319
- Automatically resolve values from subdictionaries when expected value type is not a dictionary by @PrinsFrank in #321
- Simplify type checks XObject by @PrinsFrank in #322
- Implement value retrieval from ancestor nodes in page tree for inheritable properties by @PrinsFrank in #323
- Automatically resolve references in dictionary entries when retrieving values by @PrinsFrank in #325
- feat(rectangle): add width and height helpers by @vitormattos in #317
- Fix invalid section reference for file encryption key calculation by @PrinsFrank in #327
- User password entry length should always be 32 regardless of security handler revision by @PrinsFrank in #328
- Add file encryption key to metadata for samples by @PrinsFrank in #329
- Enable support for encrypted documents by @PrinsFrank in #282
- Add sample with user/owner password by @PrinsFrank in #332
- Add information about debugging file encryption to CONTRIBUTING.md by @PrinsFrank in #333
- Add support for all escape sequences in literal strings by @PrinsFrank in #334
- Support octal escapes with one or two digits (in addition to three) in string literals by @PrinsFrank in #335
- Clean up decoding of string literals and hex strings in EncryptDictionary and use getText instead by @PrinsFrank in #336
- Fix improper handling of hex encoded binary strings in password entries by @PrinsFrank in #337
- Update minimum required PHP version to 8.2 by @PrinsFrank in #338
- Switch from readonly properties to readonly classes wherever possible by @PrinsFrank in #339
- Check file encryption key for samples by @PrinsFrank in #330
- Add upgrade guide for v3 by @PrinsFrank in #340
- Document argument for getValueForKey on dictionary is now required by @PrinsFrank in #341
- Recover userPassword from ownerPassword to also add support for ownerPasswords by @PrinsFrank in #343
- Cache calculated file encryption key on document by @PrinsFrank in #344
- Fix newly discovered PHPStan issue by @PrinsFrank in #346
- Update sponsorship section in README by @PrinsFrank in #345
- Properly parse hex strings by @PrinsFrank in #348
- Decrypt dictionary entries while parsing dictionaries in encrypted documents by @PrinsFrank in #347
- Decrypt content of compressed objects before parsing by @PrinsFrank in #349
- Replace escaped characters in encrypted strings before running decryption by @PrinsFrank in #350
- Check dictionary and page content for encrypted documents by @PrinsFrank in #342
- Add missing PNG predictor algorithms by @PrinsFrank in #351
- Flate decode columns should be multiplied by colors if present by @PrinsFrank in #352
- Ignore "endobj" markers inside streams and search for the marker only after the length given in the stream dictionary, to allow for proper embedded PDF support by @PrinsFrank in #296
- The resource dictionary is now inherited by @PrinsFrank in #353
- Add sample with different font sizes by @PrinsFrank in #354
- Abstract line grouping strategy to make it replaceable by @PrinsFrank in #355
- Fix incorrect matrix multiplication in Move and MoveOffsetLeading operators causing scrambled text by @PrinsFrank in #356
- Apply transformation for NEXT_LINE Text positioning operator by @PrinsFrank in #358
- Add new overlap grouping strategy for text by @PrinsFrank in #357
- Fix initial text state not being set and appended/restored from the stack, resulting in lost textObjects by @PrinsFrank in #359
- Added sample file for #272 by @k00ni in #273
- Fix issues with operators that interact with both text state and transformation matrix by @PrinsFrank in #360
- Fix incorrect inverse matrix multiplication in graphicsStateOperator by @PrinsFrank in #361
- Handle text extraction with inverted Y-axis by @PrinsFrank in #362
- Use LineFeed as default page separator when extracting text for multiple pages by @PrinsFrank in #363
- Add sample from issue #290 by @PrinsFrank in #364
- Properly support encrypted documents in sample generation by @PrinsFrank in #365
- Move CONTRIBUTING.md to root of project by @PrinsFrank in #366
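
Several of the changes above concern escape sequences and octal codes in literal strings. As a rough illustration of the rules from the PDF specification (not the library's code), the sketch below decodes the body of a literal string, meaning the text between the outer parentheses. The function name `decode_literal_string` is hypothetical, and a Python `str` stands in for raw bytes to keep the example short.

```python
# Single-character escapes defined for PDF literal strings.
SIMPLE_ESCAPES = {
    "n": "\n", "r": "\r", "t": "\t", "b": "\b", "f": "\f",
    "(": "(", ")": ")", "\\": "\\",
}
OCTAL_DIGITS = "01234567"


def decode_literal_string(body: str) -> str:
    """Decode escapes in the body of a PDF literal string (illustrative sketch)."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        i += 1  # skip the backslash
        if i >= len(body):
            break  # lone trailing backslash: ignored
        nxt = body[i]
        if nxt in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[nxt])
            i += 1
        elif nxt in OCTAL_DIGITS:
            # An octal code has one, two, or three digits; stop at the first non-octal.
            j = i
            while j < len(body) and j - i < 3 and body[j] in OCTAL_DIGITS:
                j += 1
            out.append(chr(int(body[i:j], 8) & 0xFF))  # high-order overflow is dropped
            i = j
        elif nxt in "\r\n":
            # Backslash followed by an end-of-line marker is a line continuation.
            i += 2 if body[i:i + 2] == "\r\n" else 1
        else:
            out.append(nxt)  # unknown escape: the backslash itself is ignored
            i += 1
    return "".join(out)
```

For example, both `\053` and `\53` decode to a plus sign, while `\0053` decodes to the control character with code 5 followed by the digit `3`, because at most three octal digits are consumed.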
New Contributors
- @vitormattos made their first contribution in #317
Full Changelog: v2.8.0...v3.0.0-alpha.1