Unstructured v0.12.6 release (#2626)
## 0.12.6

### Enhancements

* **Improve ability to capture embedded links in `partition_pdf()` for
`fast` strategy** Previously, the threshold that affects the capture of
embedded links was fixed at a default value. Users can now specify the
threshold value to capture embedded links more reliably.
* **Refactor `add_chunking_strategy` decorator to dispatch by name.**
Add `chunk()` function to be used by the `add_chunking_strategy`
decorator to dispatch chunking call based on a chunking-strategy name
(that can be dynamic at runtime). This decouples chunking dispatch from
only those chunkers known at "compile" time and enables runtime
registration of custom chunkers.
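
Dispatch by name, as described in the chunking entry above, can be sketched with a small registry. This is a minimal illustration only, not the library's implementation: the `register_chunker` helper, the `_CHUNKERS` dict, the `chunk()` signature shown here, and the `"pairs"` strategy are all hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical registry mapping a chunking-strategy name to a chunker callable.
Chunker = Callable[[List[str]], List[List[str]]]
_CHUNKERS: Dict[str, Chunker] = {}


def register_chunker(name: str, chunker: Chunker) -> None:
    """Make `chunker` available under `name`; can be called at runtime."""
    _CHUNKERS[name] = chunker


def chunk(elements: List[str], chunking_strategy: str) -> List[List[str]]:
    """Look up the chunker registered under the given name and delegate to it."""
    try:
        chunker = _CHUNKERS[chunking_strategy]
    except KeyError:
        raise ValueError(
            f"unrecognized chunking strategy {chunking_strategy!r}"
        ) from None
    return chunker(elements)


def chunk_in_pairs(elements: List[str]) -> List[List[str]]:
    """Toy custom chunker: group elements two at a time."""
    return [elements[i:i + 2] for i in range(0, len(elements), 2)]


# A custom chunker is registered after the dispatcher is defined,
# so the dispatcher needs no advance knowledge of it.
register_chunker("pairs", chunk_in_pairs)
print(chunk(["a", "b", "c"], "pairs"))  # [['a', 'b'], ['c']]
```

Because the lookup happens at call time, the strategy name can come from configuration or user input rather than being fixed when the module is imported.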

### Features
* **Added Unstructured Platform Documentation** The Unstructured
Platform is currently in beta. The documentation provides how-to guides
for setting up workflow automation, job scheduling, and configuring
source and destination connectors.

### Fixes

* **Partitioning raises on file-like object with `.name` not a local
file path.** When partitioning a file using the `file=` argument, and
`file` is a file-like object (e.g. `io.BytesIO`) having a `.name`
attribute, and the value of `file.name` is not a valid path to a file
present on the local filesystem, `FileNotFoundError` is raised. This
prevents use of the `file.name` attribute for downstream purposes to,
for example, describe the source of a document retrieved from a network
location via HTTP.
* **Fix SharePoint dates with inconsistent formatting** Adds logic to
conditionally support dates returned by office365 that may vary in date
formatting or may be a datetime rather than a string.
* **Include warnings** about the potential risk of installing a version
of `pandoc` that does not support RTF files, along with instructions for
resolving that issue.
* **Incorporate the `install-pandoc` Makefile recipe** into relevant
stages of CI workflow, ensuring it is a version that supports RTF input
files.
* **Fix Google Drive source key** Allow passing string for source
connector key.
* **Fix table structure evaluation calculations** Replaced the special
value `-1.0` with `np.nan` and corrected the row filtering of file
metrics that depends on it.
* **Fix Sharepoint-with-permissions test** Ignore permissions metadata,
update test.
* **Fix table structure evaluations for edge case** Fixes an error that
occurred when the prediction does not contain any table; evaluation no
longer errors in that case.
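
The first fix in the list above concerns file-like objects whose `.name` is not a local path. The sketch below shows one way such a check could look; it is an assumption for illustration, not the library's code. The `local_path_or_none` helper and the example URL are hypothetical.

```python
import io
import os
from typing import IO, Optional


def local_path_or_none(file: IO[bytes]) -> Optional[str]:
    """Return `file.name` only when it names an existing local file.

    Hypothetical helper: any other value (a URL, a label, a missing
    attribute) is treated as "no local path" instead of raising.
    """
    name = getattr(file, "name", None)
    if isinstance(name, str) and os.path.isfile(name):
        return name
    return None


# An in-memory document whose `.name` records where it was downloaded from.
stream = io.BytesIO(b"example bytes")
stream.name = "https://example.com/report.pdf"  # hypothetical source URL

print(local_path_or_none(stream))  # None
print(stream.name)                 # https://example.com/report.pdf
```

With a check of this kind, the `.name` attribute stays available to describe the document's source, while the bytes are read from the file-like object itself.
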
ron-unstructured committed Mar 8, 2024
1 parent 911f998 commit e5fab21
Showing 2 changed files with 2 additions and 2 deletions.
2 changes: 1 addition & 1 deletion CHANGELOG.md
@@ -1,4 +1,4 @@
-## 0.12.6-dev9
+## 0.12.6

### Enhancements

2 changes: 1 addition & 1 deletion unstructured/__version__.py
@@ -1 +1 @@
-__version__ = "0.12.6-dev9" # pragma: no cover
+__version__ = "0.12.6" # pragma: no cover
