Tags · py-pdf/pypdf

5.6.1

REL: 5.6.1

## What's new

### New Features (ENH)
- Add PDF/A XMP metadata support (#3314) by @Arya-A-Nair

### Robustness (ROB)
- Deal with annotations not being lists on merge (#3321) by @stefan6419846
- Handle NullObject for cmap encoding Differences entry (#3317) by @stefan6419846

### Developer Experience (DEV)
- Update ruff to 0.12.0 (#3316) by @stefan6419846

[Full Changelog](5.6.0...5.6.1)

Jun 22, 2025
c721d1f

5.6.0

REL: 5.6.0

## What's new

### New Features (ENH)
- Add basic support for JBIG2 by using jbig2dec (#3163) by @stefan6419846

### Bug Fixes (BUG)
- Fix crashes by removing unnecessary line (#3293) by @larsga
- Add delimiters to NameObject.renumber_table (#3286) by @ztravis

### Robustness (ROB)
- Handle DecodeParms being a NullObject (#3285) by @stefan6419846

### Code Style (STY)
- Update to mypy 1.16.0 (#3300) by @stefan6419846

[Full Changelog](5.5.0...5.6.0)

Jun 1, 2025
40752eb

5.5.0

REL: 5.5.0

## What's new

### New Features (ENH)
- Add support for IndirectObject.__iter__ (#3228) by @bryan-brancotte
- Allow filtering by font when removing text (#3216) by @samuelbradshaw

### Bug Fixes (BUG)
- Add missing named destinations being ByteStringObjects (#3282) by @stefan6419846
- Get font information more reliably when removing text (#3252) by @samuelbradshaw
- T* 2D Translation consistent with PDF 1.7 Spec (#3250) by @hackowitz-af
- Add font stack to q/Q operations in layout mode (#3225) by @hackowitz-af
- Avoid completely hiding image loading issues like exceeding image size limits (#3221) by @stefan6419846
- Using compress_identical_objects on transformed content duplicates differing content (#3197) by @danio
- Consider BlackIs1 parameter for CCITTFaxDecode filter (#3196) by @stefan6419846

### Robustness (ROB)
- Deal with insufficient cm matrix during text extraction (#3283) by @stefan6419846
- Allow merging when annotations miss D entry (#3281) by @stefan6419846
- Fix merging documents if there are no Dests (#3280) by @stefan6419846
- Fix crash on malformed action in outline (#3278) by @larsga
- Fix compression issues for removed images which might be None (#3246) by @stefan6419846
- Attempt to deal with non-rectangular FlateDecode streams (#3245) by @stefan6419846
- Handle some None values for broken PDF files (#3230) by @stefan6419846

### Developer Experience (DEV)
- Multiple style improvements by @j-t-1
- Update ruff to 0.11.0 by @stefan6419846

### Maintenance (MAINT)
- Conform ASCIIHexDecode implementation to specification (#3274) by @j-t-1
- Modify comments of filters that do not use decode_parms (#3260) by @j-t-1

### Code Style (STY)
- Simplify warnings & debugging in layout mode text extraction (#3271) by @hackowitz-af
- Standardize mypy assert statements (#3276) by @j-t-1

[Full Changelog](5.4.0...5.5.0)

May 11, 2025
be7c870

5.4.0

REL: 5.4.0

## What's new

### New Features (ENH)
- Add support for `IndirectObject.__contains__` (#3155) by @noamkush

### Bug Fixes (BUG)
- Fix detection of inline images followed by names or numbers (#3173) by @stefan6419846

### Robustness (ROB)
- Consider root objects without catalog type as fallback (#3175) by @stefan6419846
- Raise proper error on infinite loop when reading objects (#3169) by @stefan6419846

### Documentation (DOC)
- Mention memory consumption of text extraction (#3168) by @stefan6419846

### Developer Experience (DEV)
- Upgrade to ruff 0.10.0 (#3191) by @stefan6419846

[Full Changelog](5.3.1...5.4.0)

Mar 16, 2025
f20954f

5.3.1

REL: 5.3.1

## What's new

### Bug Fixes (BUG)
- Use the correct name StandardEncoding for the predefined cmap (#3156) by @stefan6419846
- Handle inline images containing `EI ` sequences (#3152) by @stefan6419846
- Fix check box value which should be name object (#3124) by @stefan6419846
- Fix stream position on inline image fallback extraction (#3120) by @stefan6419846
- Fix object count for incremental writer (#3117) by @m32

### Robustness (ROB)
- Avoid index errors on empty lines in xref table (#3162) by @stefan6419846
- Improve handling of LZW decoder table overflow (#3159) by @stefan6419846
- Ignore non-numbers for width when building font width map (#3158) by @stefan6419846
- Avoid negative seek values when reading partially broken files (#3157) by @stefan6419846

### Documentation (DOC)
- Fixed PageObject.images example usage for replacing image (#3149) by @jutoth

[Full Changelog](5.3.0...5.3.1)

Mar 2, 2025
7143554

5.3.0

REL: 5.3.0

## What's new

### New Features (ENH)
- Handle attachments in /Kids and provide object-oriented API (#3108) by @stefan6419846

### Bug Fixes (BUG)
- Handle annotations being None on merging (#3111) by @stefan6419846

### Robustness (ROB)
- Prevent excessive layout mode text output from Type3 fonts (#3082) by @shartzog

### Documentation (DOC)
- stefan6419846 becomes BDFL of pypdf (#3078) by @MartinThoma

### Developer Experience (DEV)
- Remove ignoring multiple Ruff rules by @j-t-1
- Remove unused mutmut configuration (#3092) by @stefan6419846

### Testing (TST)
- Fix warning assertions to use `pytest.warns()` (#3083) by @mgorny

[Full Changelog](5.2.0...5.3.0)

Feb 9, 2025
1c3baab

5.2.0

REL: 5.2.0

## What's new

### Deprecations (DEP)
- Deprecate with replacement CCITParameters (#3019) by @j-t-1
- Correct deprecation of interiour_color (#2947) by @j-t-1

### New Features (ENH)
- Support alternative (U)F names for embedded file retrieval (#3072) by @stefan6419846
- Adding support for reading .metadata.keywords (#2939) by @Lucas-C

### Bug Fixes (BUG)
- Handle further Tf operators in text extraction layout mode (#3073) by @blushingpenguin
- Ensure `add_metadata` can deal with `_info = None` (#3040) by @xmo-odoo
- Handle IndirectObject in CCITTFaxDecode filter (#2965) by @stefan6419846
- Handle chained colorspace for inline images when no filter is set (#3008) by @stefan6419846
- Avoid extracting inline images twice and dropping other operators (#3002) by @stefan6419846
- Fixed reference of value with `str.__new__` in TextStringObject (#2952) by @thomas-forte
- Handle indirect objects in font width calculations (#2967) by @nsw42
- Handle title sometimes being bytes and not str (#2930) by @reformy
- Fix undefined variable for text extraction (regression) (#2934) by @stefan6419846
- Don't close stream passed to PdfWriter.write() (#2909) by @alexaryn

### Robustness (ROB)
- Handle zero height fonts when extracting text (#3075) by @blushingpenguin
- Deal with content streams not containing streams (#3005) by @stefan6419846
- Gracefully handle some text operators when the operands are missing (#3006) by @stefan6419846
- Fall back to non-Adobe Ascii85 format for missing end markers (#3007) by @stefan6419846
- Ignore odd-length strings when processing cmap lines (#3009) by @stefan6419846
- Skip annotation destination being NullObject in PdfWriter (#2964) by @stefan6419846
- Skip destination page being None in PdfWriter (#2963) by @dxsooo
- Fix infinite loop case when reading null objects within an Array by @jakep-allenai
- Fixing infinite loop in ArrayObject read_from_stream (#2928) by @jakep-allenai

### Documentation (DOC)
- Add note about default line colors (#3014) by @stefan6419846

### Developer Experience (DEV)
- Remove ignoring Ruff rule PGH004 (#3071) by @j-t-1
- Tidy ignore array in tool.ruff.lint (#3069) by @j-t-1
- Move Windows CI to Python 3.13 (#3003) by @stefan6419846
- Move to Ubuntu 22.04 (#3004) by @stefan6419846

### Maintenance (MAINT)
- Fix formatting of warning message and include exception message (#3076) by @stefan6419846
- Narrow return type for `ContentStream.operations` (#2941) by @kmurphy4

### Testing (TST)
- Fix image similarity for upcoming Ubuntu 24.04 (#3039) by @stefan6419846
- Replace broken Apache Tika Corpora urls (#3041) by @stefan6419846

### Code Style (STY)
- Add form feed to WHITESPACES (#3054) by @j-t-1
- Lots of small internal changes by @j-t-1

[Full Changelog](5.1.0...5.2.0)

Jan 26, 2025
049f71e

5.1.0

REL: 5.1.0

## What's new

### New Features (ENH)
- Add `layout_mode_font_height_weight` argument to `PageObject.extract_text()` (#2920) by @hpierre001

### Bug Fixes (BUG)
- Fix font specifier for FreeText annotation (#2893) by @ssjkamei
- Fix missing line breaks caused by incorrect calculation of text leading (#2890) by @ssjkamei
- Improve handling of spaces in text extraction (#2882) by @ssjkamei

### Robustness (ROB)
- Soft failure for flate encode image mode 1 with wrong LUT size (#2900) by @stefan6419846

### Documentation (DOC)
- Use latest package versions (#2907) by @stefan6419846
- Correct example of reading FileAttachment annotation (#2906) by @j-t-1

### Developer Experience (DEV)
- Update pinned requirements (#2918) by @stefan6419846
- Make make_release.py compatible with Windows environment (#2894) by @pubpub-zz

### Maintenance (MAINT)
- Remove references to outdated Python versions (#2919) by @stefan6419846
- Generalize the method of obtaining space_code (#2891) by @ssjkamei
- Remove unnecessary character mapping process (#2888) by @ssjkamei
- New LZW decoding implementation (#2887) by @MartinThoma

### Testing (TST)
- Add LzwCodec for encoding (#2883) by @MartinThoma

### Code Style (STY)
- Capitalize error messages (#2903) by @j-t-1
- Modify error messages in PdfWriter (#2902) by @j-t-1

[Full Changelog](5.0.1...5.1.0)

Oct 27, 2024
9f647e6

5.0.1

REL: 5.0.1 (#2884)

## Version 5.0.1, 2024-09-29

### New Features (ENH)
- Add `full` parameter to PdfWriter constructor (#2865)

### Bug Fixes (BUG)
- Update pyproject.toml with minimum Python version of 3.8 (#2859)
- Cope with unbalanced delimiters in dictionary object (#2878)
- Cope with encoding with too many differences (#2873)
- Fix missing spaces in extract_text() method (#1328) (#2868)
- Tolerate truncated files and no warning when jumping startxref (#2855)

### Robustness (ROB)
- Repair PDF with invalid Root object (#2880)
- Continue parsing dictionary object when error is detected (#2872)
- Merge documents with invalid pages in named destinations (#2857)
- Tolerate comments in arrays (#2856)

### Developer Experience (DEV)
- Use latest Python version for benchmarking (#2879)

### Maintenance (MAINT)
- Add tests to source distributions (#2874)
- Refactor _update_field_annotation (#2862)

[Full Changelog](5.0.0...5.0.1)

Sep 29, 2024
ab21802

5.0.0

REL: 5.0.0 (#2851)

## Version 5.0.0, 2024-09-15

This version drops support for Python 3.7 (not maintained since July 2023), PdfMerger (use PdfWriter instead) and AnnotationBuilder (use annotations instead).


### Deprecations (DEP)
- Remove the deprecated PdfMerger and AnnotationBuilder classes and other deprecations cleanup (#2813)
- Drop Python 3.7 support (#2793)

### New Features (ENH)
- Add capability to remove /Info from PDF (#2820)
- Add incremental capability to PdfWriter (#2811)
- Add UniGB-UTF16 encodings (#2819)
- Accept utf strings for metadata (#2802)
- Report PdfReadError instead of RecursionError (#2800)
- Compress PDF files merging identical objects (#2795)

### Bug Fixes (BUG)
- Fix sheared image (#2801)

### Robustness (ROB)
- Robustify .set_data() (#2821)
- Raise PdfReadError when missing /Root in trailer (#2808)
- Fix extract_text() issues on damaged PDFs (#2760)
- Handle images with empty data when processing an image from bytes (#2786)

### Developer Experience (DEV)
- Fix coverage uploads (#2832)
- Test against Python 3.13 (#2776)


[Full Changelog](4.3.1...5.0.0)

Sep 17, 2024
637bc44
