Releases: aws-samples/amazon-textract-textractor

Version 1.7.12

23 May 17:15

What's Changed

  • Fix an issue where tables containing merged cells, when linearized to plaintext, would duplicate the text over the entire table.

Full Changelog: v1.7.11...v1.7.12

Version 1.7.11

13 May 20:26

What's Changed

  • Add figure layout prefix and suffix by @Belval in #362
  • Add confidence scores at the DocumentEntity level by @Belval in #363

Full Changelog: v1.7.10...v1.7.11

Version 1.7.10

19 Apr 02:00

What's Changed

  • Use AWS_REGION and AWS_DEFAULT_REGION environment variables in Textractor when available
  • Fix missing figure layouts
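
The region lookup described in the first item can be pictured with the minimal sketch below. It is not the library's code: the helper name `resolve_region`, its parameters, and the precedence order (explicit argument, then AWS_REGION, then AWS_DEFAULT_REGION) are assumptions made for illustration only.

```python
import os


def resolve_region(explicit_region=None, environ=None):
    """Hypothetical helper: pick a region name from an argument or the environment.

    Assumed precedence (not confirmed by the release notes):
    explicit argument, then AWS_REGION, then AWS_DEFAULT_REGION.
    Returns None when nothing is set.
    """
    env = os.environ if environ is None else environ
    return explicit_region or env.get("AWS_REGION") or env.get("AWS_DEFAULT_REGION")


# Example with an injected environment so the real one is left untouched.
region = resolve_region(environ={"AWS_DEFAULT_REGION": "us-east-1"})
```

Passing the environment as a parameter keeps the sketch easy to test without modifying `os.environ`.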

Full Changelog: v1.7.9...v1.7.10

Version 1.7.9

22 Mar 14:54

What's Changed

Full Changelog: v1.7.8...v1.7.9

Version 1.7.8

21 Mar 12:21

What's Changed

  • Handle None Relationships when parsing LAYOUT_FIGURE

Full Changelog: v1.7.7...v1.7.8

Version 1.7.7

20 Mar 16:45

What's Changed

  • Handle None bounding box when parsing Queries by @Belval in #340

Full Changelog: v1.7.6...v1.7.7

Version 1.7.6

15 Mar 20:39

What's Changed

Full Changelog: v1.7.5...v1.7.6

Version 1.7.5

07 Mar 23:34

What's Changed

  • Make KeyValue.key an EntityList by @Belval in #320
  • Remove numpy from explicit dependencies by @Belval in #324
  • Hide key value layouts by @Belval in #325
  • Return query and query answer with get_text() by @Belval in #329
  • Convert image to RGB in EntityList for Jupyter compatibility by @Belval in #330
  • Support for Python 3.12 by @tb102122 in #311

Full Changelog: v1.7.4...v1.7.5

Version 1.7.4

26 Feb 15:46

What's Changed

Full Changelog: v1.7.3...v1.7.4

Version 1.7.3

26 Feb 12:39
c5120b0

What's Changed

  • Table linearization improvements by @Belval in #313

    • Add .get_text(), .to_html() and .to_markdown() methods to Linearizable, which is now implemented by Document, Page, DocumentEntity and EntityList
    • Add HTMLLinearizationConfig and MarkdownLinearizationConfig as pre-configured TextLinearizationConfig
    • Add the following parameters to TextLinearizationConfig:
      • duplicate_text_in_merged_cells duplicates the text in merged cells to preserve row-level alignment
      • table_flatten_headers combines multi-row headers into a single row, duplicating the merged cells horizontally as needed
      • table_tabulate_remove_extra_hyphens removes extra hyphens '-' in markdown tables to reduce context length
      • max_number_of_consecutive_spaces defines the maximum number of contiguous whitespace characters, similar to max_number_of_consecutive_new_lines
  • Fixes:

    • Fix trailing whitespace in cell text
    • Fix table_column_separator being hardcoded as '\t'
    • Fix table_row_separator being hardcoded as '\n'
    • Reset the BytesIO buffer to position 0 by @abest0 in #310
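
To make the effect of a few of these options concrete, here is a toy sketch of table linearization. It does not use or reproduce the library's implementation: `ToyLinearizationConfig` and `linearize_table` are hypothetical names, the option names are borrowed from the notes above, the default values are guesses, and only plain spaces (not all whitespace) are collapsed.

```python
import re
from dataclasses import dataclass


@dataclass
class ToyLinearizationConfig:
    """Hypothetical stand-in for a few TextLinearizationConfig options."""
    duplicate_text_in_merged_cells: bool = False
    table_column_separator: str = "\t"
    table_row_separator: str = "\n"
    max_number_of_consecutive_spaces: int = 1


def linearize_table(rows, merged, config):
    """Turn a grid of cell strings into one string.

    rows:   list of rows, each a list of cell strings; cells covered by a
            merge are empty except the top-left cell, which holds the text.
    merged: list of inclusive (row_start, row_end, col_start, col_end) spans.
    """
    grid = [list(row) for row in rows]

    # Optionally copy the merged cell's text into every cell it spans,
    # so each row stays aligned when read on its own.
    if config.duplicate_text_in_merged_cells:
        for r0, r1, c0, c1 in merged:
            text = grid[r0][c0]
            for r in range(r0, r1 + 1):
                for c in range(c0, c1 + 1):
                    grid[r][c] = text

    # Strip trailing/leading whitespace and cap runs of spaces.
    limit = config.max_number_of_consecutive_spaces
    too_many_spaces = re.compile(r" {%d,}" % (limit + 1))
    cleaned = [
        [too_many_spaces.sub(" " * limit, cell.strip()) for cell in row]
        for row in grid
    ]

    # Separators come from the config rather than being hardcoded.
    return config.table_row_separator.join(
        config.table_column_separator.join(row) for row in cleaned
    )


# A two-row table whose first column is one vertically merged cell.
rows = [["Region", "Q1"], ["", "Q2"]]
merged = [(0, 1, 0, 0)]
config = ToyLinearizationConfig(
    duplicate_text_in_merged_cells=True,
    table_column_separator="|",
    table_row_separator=";",
)
print(linearize_table(rows, merged, config))  # prints "Region|Q1;Region|Q2"
```

With duplicate_text_in_merged_cells left off, the same call yields "Region|Q1;|Q2", which shows why duplication helps when each row is read in isolation.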

New Contributors

Full Changelog: v1.7.2...v1.7.3