From 6393f79de40ad438df0dc459647f05cd621c428a Mon Sep 17 00:00:00 2001
From: Paul Cornell
Date: Mon, 30 Sep 2024 08:40:21 -0700
Subject: [PATCH 1/2] Open source: remove attachment_partitioner from partition_msg and partition_email

---
 open-source/core-functionality/partitioning.mdx | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/open-source/core-functionality/partitioning.mdx b/open-source/core-functionality/partitioning.mdx
index 032fbf88..f112d97e 100644
--- a/open-source/core-functionality/partitioning.mdx
+++ b/open-source/core-functionality/partitioning.mdx
@@ -229,15 +229,14 @@ elements = partition_email(text=text, include_headers=True)
 
 `partition_email` includes a `max_partition` parameter that indicates the maximum character length for a document element. This parameter only applies if `"text/plain"` is selected as the `content_source`. The default value is `1500`, which roughly corresponds to the average character length for a paragraph. You can disable `max_partition` by setting it to `None`.
 
-You can optionally partition e-mail attachments by setting `process_attachments=True`. If you set `process_attachments=True`, you’ll also need to pass in a partitioning function to `attachment_partitioner`. The following is an example of what the workflow looks like:
+You can optionally partition e-mail attachments by setting `process_attachments=True`. The following is an example of what the workflow looks like:
 
 ```python
-from unstructured.partition.auto import partition
 from unstructured.partition.email import partition_email
 
 filename = "example-docs/eml/fake-email-attachment.eml"
 elements = partition_email(
-    filename=filename, process_attachments=True, attachment_partitioner=partition
+    filename=filename, process_attachments=True
 )
 
 ```
@@ -377,15 +376,14 @@ elements = partition_msg(filename="example-docs/fake-email.msg")
 
 `partition_msg` includes a `max_partition` parameter that indicates the maximum character length for a document element. This parameter only applies if `"text/plain"` is selected as the `content_source`. The default value is `1500`, which roughly corresponds to the average character length for a paragraph. You can disable `max_partition` by setting it to `None`.
 
-You can optionally partition e-mail attachments by setting `process_attachments=True`. If you set `process_attachments=True`, you’ll also need to pass in a partitioning function to `attachment_partitioner`. The following is an example of what the workflow looks like:
+You can optionally partition e-mail attachments by setting `process_attachments=True`. The following is an example of what the workflow looks like:
 
 ```python
-from unstructured.partition.auto import partition
 from unstructured.partition.msg import partition_msg
 
 filename = "example-docs/fake-email-attachment.msg"
 elements = partition_msg(
-    filename=filename, process_attachments=True, attachment_partitioner=partition
+    filename=filename, process_attachments=True
 )
 
 ```

From 71842610aad835f2a09dcbb6859f5e1225cd190a Mon Sep 17 00:00:00 2001
From: Paul Cornell
Date: Tue, 1 Oct 2024 08:45:25 -0700
Subject: [PATCH 2/2] Incorporated feedback: removed whitespace

---
 open-source/core-functionality/partitioning.mdx | 10 ++--------
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/open-source/core-functionality/partitioning.mdx b/open-source/core-functionality/partitioning.mdx
index f112d97e..99acce1d 100644
--- a/open-source/core-functionality/partitioning.mdx
+++ b/open-source/core-functionality/partitioning.mdx
@@ -235,10 +235,7 @@ You can optionally partition e-mail attachments by setting `process_attachments=
 from unstructured.partition.email import partition_email
 
 filename = "example-docs/eml/fake-email-attachment.eml"
-elements = partition_email(
-    filename=filename, process_attachments=True
-)
-
+elements = partition_email(filename=filename, process_attachments=True)
 ```
 
 
@@ -382,10 +379,7 @@ You can optionally partition e-mail attachments by setting `process_attachments=
 from unstructured.partition.msg import partition_msg
 
 filename = "example-docs/fake-email-attachment.msg"
-elements = partition_msg(
-    filename=filename, process_attachments=True
-)
-
+elements = partition_msg(filename=filename, process_attachments=True)
 ```
 
 