Merged
4 changes: 1 addition & 3 deletions platform/embedding.mdx
@@ -69,6 +69,4 @@ To generate embeddings, choose one of the following embedding providers and mode
- **text-embedding-3-large**, with 3072 dimensions.
- **Ada 002 (Text)**, with 1536 dimensions.

[Learn more](https://platform.openai.com/docs/guides/embeddings).

- **Vertex AI**: Use [Vertex AI](https://cloud.google.com/vertex-ai) to generate embeddings by using the [textembedding-gecko@001](https://cloud.google.com/vertex-ai/generative-ai/docs/embeddings/get-text-embeddings) model, with 768 dimensions.
[Learn more](https://platform.openai.com/docs/guides/embeddings).
16 changes: 13 additions & 3 deletions platform/overview.mdx
@@ -23,9 +23,19 @@ To get your data RAG-ready, the Unstructured Platform moves it through the follo
<Step title="Route">
Routing determines which strategy the Unstructured Platform uses to transform your documents into Unstructured's canonical JSON schema. The Unstructured Platform provides these [partitioning](/platform/partitioning) strategies for document transformation:

- **Fast** is great for when there is extractable text available, like in HTML files or in the Microsoft Office Document format.
- **Hi Res** is best for PDFs and tables and where accurate classification of document elements is critical.
- If you're unsure which strategy to use, choose **Auto**, and the Unstructured Platform will handle the decision for you.
- **Basic** / **Fast** is ideal for simple, text-only documents.
- **Advanced** / **High Res** is best for PDFs, images, and complex file types.

<Note>
During **Advanced** / **High Res** processing, any detected text-based files are processed and billed at the **Basic** / **Fast** rate instead.
</Note>

- **Platinum** / **VLM** is for challenging documents, including scanned and handwritten content.

<Note>
During **Platinum** / **VLM** processing, any detected files that are not PDFs or images are processed and billed at either the **Advanced** / **High Res** or **Basic** / **Fast** rate instead.
Of those non-PDF and non-image files, all text-based files are processed and billed at the **Basic** / **Fast** rate instead. The other files are processed and billed at the **Advanced** / **High Res** rate instead.
</Note>

</Step>
<Step title="Transform">
19 changes: 12 additions & 7 deletions platform/partitioning.mdx
@@ -20,11 +20,16 @@ To choose one of these strategies, select one of the **Partition Strategy** opti
<Note>You can change a workflow's predefined strategy only through [Custom](/platform/workflows#create-a-custom-workflow) workflow settings.</Note>

- **Fast**: This strategy is ideal for simple, text-based documents.
- **Hi-Res**: This strategy is best for PDFs, images, and complex file types.
- **VLM**: For your most challenging documents, including scanned and handwritten content, use this strategy, which leverages vision
language models (VLMs). During processing, files that are not PDFs or images are processed by using the **Hi-Res** strategy and are charged
at the **Hi-Res** rate instead.
- **Auto**: This strategy examines each file before processing it. If the file is an image, or if the file is a PDF and at least one embedded table
or image is found in it, **Hi-Res** is used to process that file and charged at the **Hi-Res** rate for that file. Otherwise, **Fast** is used and charged at the
**Fast** rate for that file.
- **High Res**: This strategy is best for PDFs, images, and complex file types.

<Note>
During **High Res** processing, any detected text-based files are processed and billed at the **Fast** rate instead.
</Note>

- **VLM**: For your most challenging documents, including scanned and handwritten content.

<Note>
During **VLM** processing, any detected files that are not PDFs or images are processed and billed at either the **High Res** or **Fast** rate instead.
Of those non-PDF and non-image files, all text-based files are processed and billed at the **Fast** rate instead. The other files are processed and billed at the **High Res** rate instead.
</Note>
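
The fallback billing rules in the two notes above can be read as a small decision function. The following is a minimal sketch of that reading, not the platform's implementation: the `Strategy` enum, the `billed_strategy` helper, and both file-extension sets are hypothetical names introduced for illustration, and the platform's actual file-type detection is not described in this change.

```python
from enum import Enum
from pathlib import Path


class Strategy(Enum):
    FAST = "Fast"
    HIGH_RES = "High Res"
    VLM = "VLM"


# Hypothetical extension sets, for illustration only. How the platform
# actually detects text-based files, PDFs, and images is an assumption here.
TEXT_BASED = {".txt", ".md", ".html", ".json"}
PDF_OR_IMAGE = {".pdf", ".png", ".jpg", ".jpeg", ".tiff"}


def billed_strategy(selected: Strategy, filename: str) -> Strategy:
    """Return the strategy (and billing rate) applied to one file."""
    ext = Path(filename).suffix.lower()

    # Fast never falls back to anything else.
    if selected is Strategy.FAST:
        return Strategy.FAST

    # Under High Res or VLM, text-based files drop to the Fast rate.
    if ext in TEXT_BASED:
        return Strategy.FAST

    # Under VLM, the remaining non-PDF, non-image files drop to High Res.
    if selected is Strategy.VLM and ext not in PDF_OR_IMAGE:
        return Strategy.HIGH_RES

    # Otherwise the selected strategy applies unchanged.
    return selected
```

Under these assumptions, `billed_strategy(Strategy.VLM, "report.docx")` returns `Strategy.HIGH_RES`, while `billed_strategy(Strategy.VLM, "notes.txt")` returns `Strategy.FAST`.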

53 changes: 35 additions & 18 deletions platform/workflows.mdx
@@ -54,8 +54,17 @@ To create an automatic workflow:

- **Basic** Ideal for simple, text-only documents.
- **Advanced** Best for PDFs, images, and complex file types.
- **Platinum** For your most challenging documents, including scanned and handwritten content. It uses vision language models (VLMs).
During processing, files that are not PDFs or images are processed by using the **Advanced** strategy and are charged at the **Advanced** rate instead.

<Note>
During **Advanced** processing, any detected text-based files are processed and billed at the **Basic** rate instead.
</Note>

- **Platinum** For your most challenging documents, including scanned and handwritten content.

<Note>
During **Platinum** processing, any detected files that are not PDFs or images are processed and billed at either the **Advanced** or **Basic** rate instead.
Of those non-PDF and non-image files, all text-based files are processed and billed at the **Basic** rate instead. The other files are processed and billed at the **Advanced** rate instead.
</Note>

9. The **Reprocess all** box applies only to the Amazon S3 and Azure Blob Storage source connectors:

@@ -109,12 +118,18 @@ There are two ways to create a custom workflow:
9. In the **Strategy** area, choose one of the following:

- **Fast**: Ideal for simple, text-only documents.
- **Hi-Res**: Best for PDFs, images, and complex file types.
- **VLM**: For your most challenging documents, including scanned and handwritten content. It uses vision language models (VLMs).
During processing, files that are not PDFs or images are processed by using the **Hi-Res** strategy and are charged at the **Hi-Res** rate instead.
- **Auto**: This strategy examines each file before processing it. If the file is an image, or if the file is a PDF and at least one embedded table
or image is found in it, **Hi-Res** is used to process that file and charged at the **Hi-Res** rate for that file. Otherwise, **Fast** is used and charged at the
**Fast** rate for that file.
- **High Res**: Best for PDFs, images, and complex file types.

<Note>
During **High Res** processing, any detected text-based files are processed and billed at the **Fast** rate instead.
</Note>

- **VLM**: For your most challenging documents, including scanned and handwritten content.

<Note>
During **VLM** processing, any detected files that are not PDFs or images are processed and billed at either the **High Res** or **Fast** rate instead.
Of those non-PDF and non-image files, all text-based files are processed and billed at the **Fast** rate instead. The other files are processed and billed at the **High Res** rate instead.
</Note>

[Learn more](/platform/partitioning).

@@ -189,8 +204,6 @@ There are two ways to create a custom workflow:

[Learn more](https://platform.openai.com/docs/guides/embeddings).

- **Vertex AI**: Use Vertex AI to generate embeddings by using the `textembedding-gecko@001` model, with 768 dimensions. [Learn more](https://cloud.google.com/vertex-ai/generative-ai/docs/embeddings/get-text-embeddings).

Learn more:

- [Embedding overview](/platform/embedding)
@@ -266,12 +279,18 @@ There are two ways to create a custom workflow:
For **Partition Strategy**, choose one of the following:

- **Fast**: Ideal for simple, text-only documents.
- **Hi-Res**: Best for PDFs, images, and complex file types.
- **VLM**: For your most challenging documents, including scanned and handwritten content. It uses vision language models (VLMs).
During processing, files that are not PDFs or images are processed by using the **Hi-Res** strategy and are charged at the **Hi-Res** rate instead.
- **Auto**: This strategy examines each file before processing it. If the file is an image, or if the file is a PDF and at least one embedded table
or image is found in it, **Hi-Res** is used to process that file and charged at the **Hi-Res** rate for that file. Otherwise, **Fast** is used and charged at the
**Fast** rate for that file.
- **High Res**: Best for PDFs, images, and complex file types.

<Note>
During **High Res** processing, any detected text-based files are processed and billed at the **Fast** rate instead.
</Note>

- **VLM**: For your most challenging documents, including scanned and handwritten content.

<Note>
During **VLM** processing, any detected files that are not PDFs or images are processed and billed at either the **High Res** or **Fast** rate instead.
Of those non-PDF and non-image files, all text-based files are processed and billed at the **Fast** rate instead. The other files are processed and billed at the **High Res** rate instead.
</Note>

[Learn more](/platform/partitioning).
</Accordion>
@@ -338,8 +357,6 @@ There are two ways to create a custom workflow:

[Learn more](https://platform.openai.com/docs/guides/embeddings).

- **Vertex AI**: Use Vertex AI to generate embeddings by using the `textembedding-gecko@001` model, with 768 dimensions. [Learn more](https://cloud.google.com/vertex-ai/generative-ai/docs/embeddings/get-text-embeddings).

Learn more:

- [Embedding overview](/platform/embedding)