From 94ab7029cd0e31a452a3b59aeb812231223ceeb1 Mon Sep 17 00:00:00 2001
From: Paul Cornell
Date: Mon, 2 Dec 2024 09:23:52 -0800
Subject: [PATCH] Platinum/VLM strategy: note about PDF 200+ page files

---
 platform/partitioning.mdx         | 3 +++
 platform/workflows.mdx            | 9 +++++++++
 snippets/quickstarts/platform.mdx | 5 +++++
 3 files changed, 17 insertions(+)

diff --git a/platform/partitioning.mdx b/platform/partitioning.mdx
index 2483d6c6..37d92b39 100644
--- a/platform/partitioning.mdx
+++ b/platform/partitioning.mdx
@@ -31,5 +31,8 @@ To choose one of these strategies, select one of the **Partition Strategy** opti
 During **VLM** processing, any detected files that are not PDFs or images are processed and billed at either the **High Res** or **Fast** rate instead. Of those non-PDF and non-image files, all text-based files are processed and billed at the **Fast** rate instead. The other files are processed and billed at the **High Res** rate instead.
+
+ When you use the **VLM** strategy with embeddings for PDF files of 200 or more pages, you might notice some errors when
+ these files are processed. These errors typically occur when these larger PDF files have lots of tables and high-resolution images.
diff --git a/platform/workflows.mdx b/platform/workflows.mdx
index b07ea01f..b4b2ea01 100644
--- a/platform/workflows.mdx
+++ b/platform/workflows.mdx
@@ -64,6 +64,9 @@ To create an automatic workflow:
 During **Platinum** processing, any detected files that are not PDFs or images are processed and billed at either the **Advanced** or **Basic** rate instead. Of those non-PDF and non-image files, all text-based files are processed and billed at the **Basic** rate instead. The other files are processed and billed at the **Advanced** rate instead.
+
+ When you use the **Platinum** strategy for PDF files of 200 or more pages, you might notice some errors when
+ these files are processed. These errors typically occur when these larger PDF files have lots of tables and high-resolution images.
 9. The **Reprocess all** box applies only to the Amazon S3 and Azure Blob Storage source connectors:
@@ -129,6 +132,9 @@ There are two ways to create a custom workflow:
 During **VLM** processing, any detected files that are not PDFs or images are processed and billed at either the **High Res** or **Fast** rate instead. Of those non-PDF and non-image files, all text-based files are processed and billed at the **Fast** rate instead. The other files are processed and billed at the **High Res** rate instead.
+
+ When you use the **VLM** strategy with embeddings for PDF files of 200 or more pages, you might notice some errors when
+ these files are processed. These errors typically occur when these larger PDF files have lots of tables and high-resolution images.
 [Learn more](/platform/partitioning).
@@ -317,6 +323,9 @@ There are two ways to create a custom workflow:
 During **VLM** processing, any detected files that are not PDFs or images are processed and billed at either the **High Res** or **Fast** rate instead. Of those non-PDF and non-image files, all text-based files are processed and billed at the **Fast** rate instead. The other files are processed and billed at the **High Res** rate instead.
+
+ When you use the **VLM** strategy with embeddings for PDF files of 200 or more pages, you might notice some errors when
+ these files are processed. These errors typically occur when these larger PDF files have lots of tables and high-resolution images.
 [Learn more](/platform/partitioning).
diff --git a/snippets/quickstarts/platform.mdx b/snippets/quickstarts/platform.mdx
index c4c6e6ce..b1e3bedd 100644
--- a/snippets/quickstarts/platform.mdx
+++ b/snippets/quickstarts/platform.mdx
@@ -99,6 +99,11 @@ allowfullscreen
 - **Platinum**: For your most challenging documents, including scanned and handwritten content. It uses vision language models (VLMs). During processing, files that are not PDFs or images are processed by using the **Advanced** strategy and are charged at the **Advanced** rate instead.
+
+ When you use the **Platinum** strategy for PDF files of 200 or more pages, you might notice some errors when
+ these files are processed. These errors typically occur when these larger PDF files have lots of tables and high-resolution images.
+
+
 9. The **Reprocess all** box applies only to the Amazon S3 and Azure Blob Storage source connectors:
 - Checking this box reprocesses all documents in the source location on every workflow run.