Releases · pdfminer/pdfminer.six

28 Dec 21:25

github-actions

20231228

bd252ef

20231228 Latest


Added

Output converter for the hOCR format (#651)
Font name aliases for Arial, Courier New and Times New Roman (#790)
Documentation on why special characters can sometimes not be extracted (#829)
Storing Bézier path and dashing style of line in LTCurve (#801)

Fixed

Broken CI/CD pipeline by setting upper version limit for black, mypy, pip and setuptools (#921)
flake8 failures (#921)
ValueError when bmp images with 1 bit channel are decoded (#773)
ValueError when trying to decrypt empty metadata values (#766)
Sphinx errors during building of documentation (#760)
TypeError when getting default width of font (#720)
Installing typing-extensions on Python 3.6 and 3.7 (#775)
TypeError in cmapdb.py when parsing null characters (#768)
Color "convenience operators" now (per spec) also set color space (#794)
ValueError when extracting images, due to breaking changes in Pillow (#827)
Small typos and issues in the documentation (#828)
Ignore non-Unicode cmaps in TrueType fonts (#806)

Changed

Using non-hardcoded version string and setuptools-git-versioning to enable installation from source and building on Python 3.12 (#922)

Deprecated

Usage of if __name__ == "__main__" where it was only intended for testing purposes (#756)

Removed

Support for Python 3.6 and 3.7 because they are end-of-life (#923)


05 Nov 16:33

github-actions

20221105

ebf7bcd

20221105

Added

Output converter for the hOCR format (#651)
Font name aliases for Arial, Courier New and Times New Roman (#790)
Documentation on why special characters can sometimes not be extracted (#829)

Fixed

ValueError when bmp images with 1 bit channel are decoded (#773)
ValueError when trying to decrypt empty metadata values (#766)
Sphinx errors during building of documentation (#760)
TypeError when getting default width of font (#720)
Installing typing-extensions on Python 3.6 and 3.7 (#775)
TypeError in cmapdb.py when parsing null characters (#768)
Color "convenience operators" now (per spec) also set color space (#794)
ValueError when extracting images, due to breaking changes in Pillow (#827)
Small typos and issues in the documentation (#828)

Deprecated

Usage of if __name__ == "__main__" where it was only intended for testing purposes (#756)


24 May 17:44

github-actions

20220524

0b09d5f

20220524

Fixed

Ignoring (invalid) path constructors that do not begin with m (#749)
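The rule behind this fix can be sketched as follows. This is an illustration only, not the library's implementation: the segment representation (tuples whose first element is the operator name) and the helper names `starts_with_moveto` and `keep_valid_paths` are assumptions made for the example.

```python
from typing import List, Sequence, Tuple, Union

# Hypothetical representation: each segment is a tuple whose first
# element is the path operator ("m", "l", "c", "h", ...), followed by
# its numeric operands.
PathSegment = Tuple[Union[str, float], ...]
Path = Sequence[PathSegment]


def starts_with_moveto(path: Path) -> bool:
    """Return True if the path is non-empty and begins with an "m" segment."""
    return bool(path) and path[0][0] == "m"


def keep_valid_paths(paths: Sequence[Path]) -> List[Path]:
    """Drop paths that do not begin with "m" instead of raising an error."""
    return [path for path in paths if starts_with_moveto(path)]


valid = [("m", 0.0, 0.0), ("l", 10.0, 0.0), ("h",)]
invalid = [("l", 10.0, 0.0), ("l", 10.0, 10.0)]
kept = keep_valid_paths([valid, invalid, []])
```

Here `kept` contains only the first path; the one that opens with a line-to segment and the empty one are skipped.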

Changed

Removed upper version bounds (#755)


06 May 20:04

github-actions

20220506

e19aea9

20220506

Fixed

IndexError when handling invalid bfrange code map in CMap (#731)
TypeError in lzw.py when self.table is not set (#732)
TypeError in encodingdb.py when name of unicode is not str (#733)
TypeError in HTMLConverter when using a bytes fontname (#734)

Added

Exporting images without any specific encoding (#737)

Changed

Using charset-normalizer instead of chardet for less restrictive license (#744)


19 Mar 20:13

github-actions

20220319

ae7f315

20220319

Added

Export type annotations from pypi package per PEP561 (#679)
Support for identity cmaps (#626)
Add support for PDF page labels (#680)
Installation of Pillow as an optional extra dependency (#714)

Fixed

Handle decompression error due to CRC checksum error (#637)
Regression (since 20191107) in LTLayoutContainer.group_textboxes that returned some text lines out of order (#659)
Add handling of JPXDecode filter to enable extraction of images for some pdfs (#645)
Fix extraction of jbig2 files, which was producing invalid files (#652)
Crash in pdf2txt.py --boxes-flow=disabled (#682)
Only use xref fallback if PDFNoValidXRef is raised and fallback is True (#684)
Ignore empty characters when analyzing layout (#499)

Changed

Replace warnings.warn with logging.Logger.warning in line with recommended use (#673)
Switched from nose to pytest, from tox to nox and from Travis CI to GitHub Actions (#704)
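Because warnings are now emitted through the logging module (#673), callers who previously relied on warnings filters would instead attach a handler or set a level on a logger. The sketch below uses only the standard library. It assumes that the library's loggers live under a parent logger named "pdfminer"; the child name "pdfminer.example" and the `ListHandler` class are made up for illustration.

```python
import logging
from typing import List


class ListHandler(logging.Handler):
    """Collect formatted log messages in memory (illustrative helper)."""

    def __init__(self) -> None:
        super().__init__()
        self.messages: List[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


def capture_warnings_from(logger_name: str) -> ListHandler:
    """Attach a handler that records WARNING and above for a logger tree."""
    handler = ListHandler()
    handler.setLevel(logging.WARNING)
    logging.getLogger(logger_name).addHandler(handler)
    return handler


# Assumption: library loggers are children of "pdfminer".
handler = capture_warnings_from("pdfminer")

# Stand-in for a warning emitted somewhere inside the package.
logging.getLogger("pdfminer.example").warning("illustrative message %d", 1)
```

To silence such messages rather than collect them, setting a higher level on the parent logger, for example `logging.getLogger("pdfminer").setLevel(logging.ERROR)`, should have the intended effect under the same naming assumption.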

Removed

Unnecessary return statements without argument at the end of functions (#707)


19 Mar 16:49

pietermarsman

20211012

da5b968

20211012

Added

Add support for PDF 2.0 (ISO 32000-2) AES-256 encryption (#614)
Support for Paeth PNG filter compression (predictor value = 4) (#537)
Type annotations (#661)
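For readers unfamiliar with the Paeth filter mentioned above (#537), here is a minimal standalone sketch of the predictor as the PNG specification defines it, plus the step that reverses it for one row. It is not the library's code; the function names and the `bytes_per_pixel` parameter are chosen for this example.

```python
def paeth_predictor(left: int, above: int, upper_left: int) -> int:
    """Pick whichever neighbour is closest to left + above - upper_left."""
    estimate = left + above - upper_left
    dist_left = abs(estimate - left)
    dist_above = abs(estimate - above)
    dist_upper_left = abs(estimate - upper_left)
    # Ties are broken in the order left, above, upper-left.
    if dist_left <= dist_above and dist_left <= dist_upper_left:
        return left
    if dist_above <= dist_upper_left:
        return above
    return upper_left


def undo_paeth_row(filtered: bytes, previous: bytes, bytes_per_pixel: int) -> bytes:
    """Reconstruct one row from Paeth-filtered bytes and the prior decoded row."""
    out = bytearray(len(filtered))
    for i, value in enumerate(filtered):
        has_left = i >= bytes_per_pixel
        left = out[i - bytes_per_pixel] if has_left else 0
        above = previous[i]
        upper_left = previous[i - bytes_per_pixel] if has_left else 0
        # Filtering subtracts the prediction modulo 256; undo by adding it back.
        out[i] = (value + paeth_predictor(left, above, upper_left)) & 0xFF
    return bytes(out)
```

For the first row of an image, `previous` is a row of zero bytes of the same length, in which case the predictor always returns the left neighbour.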

Fixed

KeyError when 'Encrypt' but not 'ID' present in trailer (#594)
Fix issue of ValueError and KeyError raised in PDFDocument and PDFParser (#573)
Fix issue of TypeError: cannot unpack non-iterable PDFObjRef object, when unpacking the value of 'DW2' (#529)
Fix PermissionError when creating temporary filepaths on windows when running tests (#484)
Fix AttributeError when dumping a TOC with bytes destinations (#600)
Fix incorrect extraction of some Chinese characters (#593)
Detecting trailer correctly when surrounded with needless whitespace (#535)
Fix .paint_path logic for handling single line segments and extracting point-on-curve positions of Bézier path commands (#530)
Raising UnboundLocalError when a bad --output-type is used (#610)
TypeError when using TagExtractor with non-string or non-bytes tag values (#610)
Using io.TextIOBase as the file to write to (#616)
Parsing \r\n after the escape character in a literal string (#616)
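The last fix concerns line continuations in PDF literal strings: the PDF specification says a backslash at the end of a line, together with the end-of-line marker that follows it, is to be ignored, and that marker may be \r, \n or \r\n. The sketch below illustrates only that rule on an already-extracted string body. It is not the parser's actual code, and `strip_line_continuations` is a name invented for the example.

```python
def strip_line_continuations(body: bytes) -> bytes:
    """Remove backslash + end-of-line pairs; leave other escapes untouched."""
    out = bytearray()
    i = 0
    while i < len(body):
        if body[i:i + 1] == b"\\" and i + 1 < len(body):
            following = body[i + 1:i + 2]
            if following == b"\r":
                # Treat \r\n as a single end-of-line marker.
                i += 3 if body[i + 2:i + 3] == b"\n" else 2
                continue
            if following == b"\n":
                i += 2
                continue
            # Any other escape sequence (including an escaped backslash)
            # is copied as-is so it is not mistaken for a continuation.
            out += body[i:i + 2]
            i += 2
            continue
        out += body[i:i + 1]
        i += 1
    return bytes(out)
```

Scanning byte by byte, rather than using a single regular expression, keeps an escaped backslash followed by a real newline from being misread as a continuation.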

Removed

Support for Python 3.4 and 3.5 (#522)
Unused dependency on sortedcontainers package (#525)
Support for non-standard output streams that are not binary (#523)
Dependency on typing-extensions introduced by #661 (#677)


18 Oct 11:09

pietermarsman

20201018

c8cceb7

20201018

Deprecated

Support for Python 3.4 and 3.5 (#503)

Added

Option to disable boxes flow layout analysis when using pdf2txt (#479)
Support for pathlib.PurePath in open_filename (#491)

Fixed

Pass caching parameter to PDFResourceManager in high_level functions (#475)
Fix .paint_path logic for handling non-rect quadrilaterals and decomposing complex paths (#473)
Fix out-of-bound access on some PDFs (#483)

Removed

Remove unused rijndael encryption implementation (#465)


30 Jul 06:57

pietermarsman

20200726

0b44f77

20200726

Fixed

Rename PDFTextExtractionNotAllowedError to PDFTextExtractionNotAllowed to revert breaking change (#461)
Always try to get CMap, not only for identity encodings (#438)


20 Jul 20:16

pietermarsman

20200720

3cebf5e

20200720

Added

Support for painting multiple rectangles at once (#371)

Fixed

Validate image object in do_EI is a PDFStream (#451)

Changed

Hiding fallback xref by default from dumppdf.py output (#431)
Raise a warning instead of an error when extracting text from a non-extractable PDF (#350)
Switched from pycryptodome to cryptography package for AES decryption (#456)


17 May 15:56

pietermarsman

20200517

33b60df

20200517

Added

Python3 shebang line to script in tools (#408)

Fixed

Fix ordering of textlines within a textbox when boxes_flow=None (#411)
