Releases: aws-samples/amazon-textract-textractor
Version 1.7.12
What's Changed
- Fix an issue where tables linearized to plaintext that contained merged cells would duplicate the text over the entire table.
Full Changelog: v1.7.11...v1.7.12
Version 1.7.11
What's Changed
- Add figure layout prefix and suffix by @Belval in #362
- Add confidence scores at the DocumentEntity level by @Belval in #363
Full Changelog: v1.7.10...v1.7.11
Version 1.7.10
What's Changed
- Use AWS_REGION and AWS_DEFAULT_REGION environment variables in Textractor when available
- Fix missing figure layouts
Full Changelog: v1.7.9...v1.7.10
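The region lookup described in 1.7.10 can be pictured with the small sketch below. It is not Textractor's actual code: `resolve_region` is a hypothetical helper, and the precedence shown (explicit argument, then AWS_REGION, then AWS_DEFAULT_REGION) is an assumption made for illustration, since the release note does not state the order.

```python
import os
from typing import Mapping, Optional


def resolve_region(
    explicit_region: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Hypothetical helper: choose a region from an argument or the environment."""
    # Fall back to the real process environment unless a mapping is supplied.
    env = os.environ if env is None else env

    # An explicitly passed region is assumed to win over the environment.
    if explicit_region:
        return explicit_region

    # Assumed precedence: AWS_REGION first, then AWS_DEFAULT_REGION.
    for name in ("AWS_REGION", "AWS_DEFAULT_REGION"):
        value = env.get(name)
        if value:
            return value

    # Nothing usable was found; the caller decides what to do next.
    return None


if __name__ == "__main__":
    print(resolve_region(env={"AWS_DEFAULT_REGION": "us-west-2"}))  # us-west-2
```

Passing the environment as a mapping keeps the helper easy to exercise without modifying `os.environ`.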
Version 1.7.9
Version 1.7.8
Version 1.7.7
What's Changed
Full Changelog: v1.7.6...v1.7.7
Version 1.7.6
Version 1.7.5
What's Changed
- Make KeyValue.key an EntityList by @Belval in #320
- Remove numpy from explicit dependencies by @Belval in #324
- Hide key value layouts by @Belval in #325
- Return query and query answer with get_text() by @Belval in #329
- Convert image to RGB in EntityList for Jupyter compatibility by @Belval in #330
- Support for Python 3.12 by @tb102122 in #311
Full Changelog: v1.7.4...v1.7.5
Version 1.7.4
What's Changed
- Fix table title .get_text() by @Belval in #314
- Fix .to_pandas() raising an exception by @Belval in #315
Full Changelog: v1.7.3...v1.7.4
Version 1.7.3
What's Changed
- Table linearization improvements by @Belval in #313
  - Add .get_text(), .to_html() and .to_markdown() functions to Linearizable, which is now implemented by Document, Page, DocumentEntity and EntityList
  - Add HTMLLinearizationConfig and MarkdownLinearizationConfig as pre-configured TextLinearizationConfig
  - Add the following parameters to TextLinearizationConfig:
    - duplicate_text_in_merged_cells duplicates the text in merged cells to preserve row-level alignment
    - table_flatten_headers combines multi-row headers into a single row, duplicating the merged cells horizontally as needed
    - table_tabulate_remove_extra_hyphens removes extra hyphens '-' in markdown tables to reduce context length
    - max_number_of_consecutive_spaces defines the maximum number of contiguous whitespace characters, similar to max_number_of_consecutive_new_lines
- Add
- Fixes:
New Contributors
Full Changelog: v1.7.2...v1.7.3
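To make two of the TextLinearizationConfig parameters introduced in 1.7.3 more concrete, here is a standalone sketch of the behavior they describe. It does not import or reproduce Textractor: the `(text, column_span)` row layout and the helpers `linearize_row` and `limit_spaces` are hypothetical, and only the parameter names are taken from the release note. The spacing helper handles plain spaces only, which is a simplification.

```python
import re
from typing import List, Tuple

# Assumed layout for this example only: each cell is (text, column_span).
Row = List[Tuple[str, int]]


def linearize_row(row: Row, duplicate_text_in_merged_cells: bool) -> List[str]:
    """Expand merged cells so that every row has one entry per column."""
    cells: List[str] = []
    for text, span in row:
        cells.append(text)
        # Extra columns covered by a merged cell either repeat its text
        # or stay empty, depending on the flag.
        filler = text if duplicate_text_in_merged_cells else ""
        cells.extend([filler] * (span - 1))
    return cells


def limit_spaces(text: str, max_number_of_consecutive_spaces: int) -> str:
    """Shorten runs of spaces longer than the limit down to the limit."""
    too_long = re.compile(r" {%d,}" % (max_number_of_consecutive_spaces + 1))
    return too_long.sub(" " * max_number_of_consecutive_spaces, text)


if __name__ == "__main__":
    row = [("Q1 totals", 2), ("Notes", 1)]
    print(linearize_row(row, True))   # ['Q1 totals', 'Q1 totals', 'Notes']
    print(linearize_row(row, False))  # ['Q1 totals', '', 'Notes']
    print(limit_spaces("a     b", 2))  # a  b
```

Repeating the text in each covered column keeps every row the same width, which is the row-level alignment the parameter description refers to.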