Tags: py-pdf/pypdf
REL: 5.6.0

## What's new

### New Features (ENH)
- Add basic support for JBIG2 by using jbig2dec (#3163) by @stefan6419846

### Bug Fixes (BUG)
- Fix crashes by removing unnecessary line (#3293) by @larsga
- Add delimiters to NameObject.renumber_table (#3286) by @ztravis

### Robustness (ROB)
- Handle DecodeParms being a NullObject (#3285) by @stefan6419846

### Code Style (STY)
- Update to mypy 1.16.0 (#3300) by @stefan6419846

[Full Changelog](5.5.0...5.6.0)

REL: 5.5.0

## What's new

### New Features (ENH)
- Add support for `IndirectObject.__iter__` (#3228) by @bryan-brancotte
- Allow filtering by font when removing text (#3216) by @samuelbradshaw

### Bug Fixes (BUG)
- Add missing named destinations being ByteStringObjects (#3282) by @stefan6419846
- Get font information more reliably when removing text (#3252) by @samuelbradshaw
- T* 2D Translation consistent with PDF 1.7 Spec (#3250) by @hackowitz-af
- Add font stack to q/Q operations in layout mode (#3225) by @hackowitz-af
- Avoid completely hiding image loading issues like exceeding image size limits (#3221) by @stefan6419846
- Using compress_identical_objects on transformed content duplicates differing content (#3197) by @danio
- Consider BlackIs1 parameter for CCITTFaxDecode filter (#3196) by @stefan6419846

### Robustness (ROB)
- Deal with insufficient cm matrix during text extraction (#3283) by @stefan6419846
- Allow merging when annotations miss D entry (#3281) by @stefan6419846
- Fix merging documents if there are no Dests (#3280) by @stefan6419846
- Fix crash on malformed action in outline (#3278) by @larsga
- Fix compression issues for removed images which might be None (#3246) by @stefan6419846
- Attempt to deal with non-rectangular FlateDecode streams (#3245) by @stefan6419846
- Handle some None values for broken PDF files (#3230) by @stefan6419846

### Developer Experience (DEV)
- Multiple style improvements by @j-t-1
- Update ruff to 0.11.0 by @stefan6419846

### Maintenance (MAINT)
- Conform ASCIIHexDecode implementation to specification (#3274) by @j-t-1
- Modify comments of filters that do not use decode_parms (#3260) by @j-t-1

### Code Style (STY)
- Simplify warnings & debugging in layout mode text extraction (#3271) by @hackowitz-af
- Standardize mypy assert statements (#3276) by @j-t-1

[Full Changelog](5.4.0...5.5.0)

REL: 5.4.0

## What's new

### New Features (ENH)
- Add support for `IndirectObject.__contains__` (#3155) by @noamkush

### Bug Fixes (BUG)
- Fix detection of inline images followed by names or numbers (#3173) by @stefan6419846

### Robustness (ROB)
- Consider root objects without catalog type as fallback (#3175) by @stefan6419846
- Raise proper error on infinite loop when reading objects (#3169) by @stefan6419846

### Documentation (DOC)
- Mention memory consumption of text extraction (#3168) by @stefan6419846

### Developer Experience (DEV)
- Upgrade to ruff 0.10.0 (#3191) by @stefan6419846

[Full Changelog](5.3.1...5.4.0)

REL: 5.3.1

## What's new

### Bug Fixes (BUG)
- Use the correct name StandardEncoding for the predefined cmap (#3156) by @stefan6419846
- Handle inline images containing `EI ` sequences (#3152) by @stefan6419846
- Fix check box value which should be name object (#3124) by @stefan6419846
- Fix stream position on inline image fallback extraction (#3120) by @stefan6419846
- Fix object count for incremental writer (#3117) by @m32

### Robustness (ROB)
- Avoid index errors on empty lines in xref table (#3162) by @stefan6419846
- Improve handling of LZW decoder table overflow (#3159) by @stefan6419846
- Ignore non-numbers for width when building font width map (#3158) by @stefan6419846
- Avoid negative seek values when reading partially broken files (#3157) by @stefan6419846

### Documentation (DOC)
- Fixed PageObject.images example usage for replacing image (#3149) by @jutoth

[Full Changelog](5.3.0...5.3.1)

REL: 5.3.0

## What's new

### New Features (ENH)
- Handle attachments in /Kids and provide object-oriented API (#3108) by @stefan6419846

### Bug Fixes (BUG)
- Handle annotations being None on merging (#3111) by @stefan6419846

### Robustness (ROB)
- Prevent excessive layout mode text output from Type3 fonts (#3082) by @shartzog

### Documentation (DOC)
- stefan6419846 becomes BDFL of pypdf (#3078) by @MartinThoma

### Developer Experience (DEV)
- Remove ignoring multiple Ruff rules by @j-t-1
- Remove unused mutmut configuration (#3092) by @stefan6419846

### Testing (TST)
- Fix warning assertions to use `pytest.warns()` (#3083) by @mgorny

[Full Changelog](5.2.0...5.3.0)

REL: 5.2.0

## What's new

### Deprecations (DEP)
- Deprecate with replacement CCITParameters (#3019) by @j-t-1
- Correct deprecation of interiour_color (#2947) by @j-t-1

### New Features (ENH)
- Support alternative (U)F names for embedded file retrieval (#3072) by @stefan6419846
- Adding support for reading .metadata.keywords (#2939) by @Lucas-C

### Bug Fixes (BUG)
- Handle further Tf operators in text extraction layout mode (#3073) by @blushingpenguin
- Ensure `add_metadata` can deal with `_info = None` (#3040) by @xmo-odoo
- Handle IndirectObject in CCITTFaxDecode filter (#2965) by @stefan6419846
- Handle chained colorspace for inline images when no filter is set (#3008) by @stefan6419846
- Avoid extracting inline images twice and dropping other operators (#3002) by @stefan6419846
- Fixed reference of value with `str.__new__` in TextStringObject (#2952) by @thomas-forte
- Handle indirect objects in font width calculations (#2967) by @nsw42
- Title sometimes is bytes and not str (#2930) by @reformy
- Fix undefined variable for text extraction (regression) (#2934) by @stefan6419846
- Don't close stream passed to PdfWriter.write() (#2909) by @alexaryn

### Robustness (ROB)
- Handle zero height fonts when extracting text (#3075) by @blushingpenguin
- Deal with content streams not containing streams (#3005) by @stefan6419846
- Gracefully handle some text operators when the operands are missing (#3006) by @stefan6419846
- Fall back to non-Adobe Ascii85 format for missing end markers (#3007) by @stefan6419846
- Ignore odd-length strings when processing cmap lines (#3009) by @stefan6419846
- Skip annotation destination being NullObject in PdfWriter (#2964) by @stefan6419846
- Skip destination page being None in PdfWriter (#2963) by @dxsooo
- Fix infinite loop case when reading null objects within an Array by @jakep-allenai
- Fixing infinite loop in ArrayObject read_from_stream (#2928) by @jakep-allenai

### Documentation (DOC)
- Add note about default line colors (#3014) by @stefan6419846

### Developer Experience (DEV)
- Remove ignoring Ruff rule PGH004 (#3071) by @j-t-1
- Tidy ignore array in tool.ruff.lint (#3069) by @j-t-1
- Move Windows CI to Python 3.13 (#3003) by @stefan6419846
- Move to Ubuntu 22.04 (#3004) by @stefan6419846

### Maintenance (MAINT)
- Fix formatting of warning message and include exception message (#3076) by @stefan6419846
- Narrow return type for `ContentStream.operations` (#2941) by @kmurphy4

### Testing (TST)
- Fix image similarity for upcoming Ubuntu 24.04 (#3039) by @stefan6419846
- Replace broken Apache Tika Corpora urls (#3041) by @stefan6419846

### Code Style (STY)
- Add form feed to WHITESPACES (#3054) by @j-t-1
- Lots of small internal changes by @j-t-1

[Full Changelog](5.1.0...5.2.0)

REL: 5.1.0

## What's new

### New Features (ENH)
- Add `layout_mode_font_height_weight` argument to `PageObject.extract_text()` (#2920) by @hpierre001

### Bug Fixes (BUG)
- Fix font specifier for FreeText annotation (#2893) by @ssjkamei
- Line breaks are not generated due to incorrect calculation of text leading (#2890) by @ssjkamei
- Improve handling of spaces in text extraction (#2882) by @ssjkamei

### Robustness (ROB)
- Soft failure for flate encode image mode 1 with wrong LUT size (#2900) by @stefan6419846

### Documentation (DOC)
- Use latest package versions (#2907) by @stefan6419846
- Correct example of reading FileAttachment annotation (#2906) by @j-t-1

### Developer Experience (DEV)
- Update pinned requirements (#2918) by @stefan6419846
- Make make_release.py compatible with Windows environment (#2894) by @pubpub-zz

### Maintenance (MAINT)
- Remove references to outdated Python versions (#2919) by @stefan6419846
- Generalize the method of obtaining space_code (#2891) by @ssjkamei
- Unnecessary character mapping process (#2888) by @ssjkamei
- New LZW decoding implementation (#2887) by @MartinThoma

### Testing (TST)
- Add LzwCodec for encoding (#2883) by @MartinThoma

### Code Style (STY)
- Capitalize error messages (#2903) by @j-t-1
- Modify error messages in PdfWriter (#2902) by @j-t-1

[Full Changelog](5.0.1...5.1.0)

REL: 5.0.1 (#2884)

## Version 5.0.1, 2024-09-29

### New Features (ENH)
- Add `full` parameter to PdfWriter constructor (#2865)

### Bug Fixes (BUG)
- Update pyproject.toml with minimum Python version of 3.8 (#2859)
- Cope with unbalanced delimiters in dictionary object (#2878)
- Cope with encoding with too many differences (#2873)
- Missing spaces in extract_text() method (#1328) (#2868)
- Tolerate truncated files and no warning when jumping startxref (#2855)

### Robustness (ROB)
- Repair PDF with invalid Root object (#2880)
- Continue parsing dictionary object when error is detected (#2872)
- Merge documents with invalid pages in named destinations (#2857)
- Tolerate comments in arrays (#2856)

### Developer Experience (DEV)
- Use latest Python version for benchmarking (#2879)

### Maintenance (MAINT)
- Add tests to source distributions (#2874)
- Refactor _update_field_annotation (#2862)

[Full Changelog](5.0.0...5.0.1)

REL: 5.0.0 (#2851)

## Version 5.0.0, 2024-09-15

This version drops support for Python 3.7 (not maintained since July 2023), PdfMerger (use PdfWriter instead) and AnnotationBuilder (use annotations instead).

### Deprecations (DEP)
- Remove the deprecated PdfMerger and AnnotationBuilder classes and other deprecations cleanup (#2813)
- Drop Python 3.7 support (#2793)

### New Features (ENH)
- Add capability to remove /Info from PDF (#2820)
- Add incremental capability to PdfWriter (#2811)
- Add UniGB-UTF16 encodings (#2819)
- Accept utf strings for metadata (#2802)
- Report PdfReadError instead of RecursionError (#2800)
- Compress PDF files merging identical objects (#2795)

### Bug Fixes (BUG)
- Fix sheared image (#2801)

### Robustness (ROB)
- Robustify .set_data() (#2821)
- Raise PdfReadError when missing /Root in trailer (#2808)
- Fix extract_text() issues on damaged PDFs (#2760)
- Handle images with empty data when processing an image from bytes (#2786)

### Developer Experience (DEV)
- Fix coverage uploads (#2832)
- Test against Python 3.13 (#2776)

[Full Changelog](4.3.1...5.0.0)

## Version 4.3.1, 2024-07-21

### Bug Fixes (BUG)
- Cope with Matrix entry in field annotations (#2736)

### Robustness (ROB)
- Cope with fields with upside down box/rectangle (#2729)

### Maintenance (MAINT)
- Add deprecate_with_replacement to StreamObject.initializeFromD… (#2728)
- Deal with cryptography>=43 moving ARC4 (#2765)

[Full Changelog](4.3.0...4.3.1)