Merged
2 changes: 1 addition & 1 deletion platform/chunking.mdx
@@ -61,7 +61,7 @@ Here are a few examples:

The following sections provide information about the available chunking strategies and their settings.

<Note>You can change a workflow's predefined strategy only through [Custom](/platform/workflows#create-a-custom-workflow) workflow settings.</Note>
<Note>You can change a workflow's preconfigured strategy only through [Custom](/platform/workflows#create-a-custom-workflow) workflow settings.</Note>

## Basic chunking strategy

2 changes: 1 addition & 1 deletion platform/embedding.mdx
@@ -61,7 +61,7 @@ on Hugging Face:

To generate embeddings, choose one of the following embedding providers and models in the **Providers** section of an **Embedder** node in a workflow:

<Note>You can change a workflow's predefined provider only through [Custom](/platform/workflows#create-a-custom-workflow) workflow settings.</Note>
<Note>You can change a workflow's preconfigured provider only through [Custom](/platform/workflows#create-a-custom-workflow) workflow settings.</Note>

- **OpenAI**: Use [OpenAI](https://openai.com) to generate embeddings. Also, choose the model to use:

2 changes: 1 addition & 1 deletion platform/partitioning.mdx
@@ -17,7 +17,7 @@ For example, the **Fast** strategy can be about 100 times faster than leading im

To choose one of these strategies, select one of the **Partition Strategy** options in the **Partitioner** node of a workflow:

<Note>You can change a workflow's predefined strategy only through [Custom](/platform/workflows#create-a-custom-workflow) workflow settings.</Note>
<Note>You can change a workflow's preconfigured strategy only through [Custom](/platform/workflows#create-a-custom-workflow) workflow settings.</Note>

- **Fast**: This strategy is ideal for simple, text-based documents.
- **High Res**: This strategy is best for PDFs, images, and complex file types.
105 changes: 93 additions & 12 deletions platform/workflows.mdx
@@ -50,21 +50,102 @@ To create an automatic workflow:
<Note>You can select multiple source and destination locations. Files will be ingested from all of the selected source locations, and the processed data will be delivered to all of the selected destination locations.</Note>

7. Click **Continue**.
8. In the **Optimize for** section, select the option to choose one of these predefined workflow settings groups:
8. In the **Optimize for** section, select the option to choose one of these preconfigured workflow settings groups. Expand any or all
of the following options to learn more about these preconfigured settings:

- **Basic** Ideal for simple, text-only documents.
- **Advanced** Best for PDFs, images, and complex file types.
<AccordionGroup>
<Accordion title="Basic">
This option is ideal for simple, text-only documents.

<Note>
During **Advanced** processing, any detected text-based files are processed and billed at the **Basic** rate instead.
</Note>

- **Platinum** For your most challenging documents, including scanned and handwritten content.
The **Basic** option uses the following preconfigured workflow settings:

<Note>
During **Platinum** processing, any detected files that are not PDFs or images are processed and billed at either the **Advanced** or **Basic** rate instead.
Of those non-PDF and non-image files, all text-based files are processed and billed at the **Basic** rate instead. The other files are processed and billed at the **Advanced** rate instead.
</Note>
- **Strategy**: Fast
- **Image Summarizer**: None
- **Table Summarizer**: None
- **Include Page Breaks**: No
- **Infer Table Structure**: No
- **Elements to Exclude**: None
- **Chunk**:

- **Chunker Type**: Chunk By Character
- **Chunk Options**:

- **Include Original Elements**: No
- **Max Characters**: 2048
- **New After N Characters**: 1500
- **Overlap**: 160
- **Overlap All**: No

- **Embed**:

- **Provider**: Azure OpenAI
- **Model**: text-embedding-3-large (3072 dimensions)

</Accordion>
<Accordion title="Advanced">
This option is best for PDFs, images, and complex file types.

<Note>
During **Advanced** processing, any detected text-based files are processed and billed at the **Basic** rate instead.
</Note>

The **Advanced** option uses the following preconfigured workflow settings:

- **Strategy**: High-Res
- **Image Summarizer**: GPT-4o
- **Table Summarizer**: Claude 3.5 Sonnet
- **Include Page Breaks**: No
- **Infer Table Structure**: No
- **Elements to Exclude**: None
- **Chunk**:

- **Chunker Type**: Chunk By Title
- **Chunk Options**:

- **Combine Text Under N Characters**: 0
- **Include Original Elements**: No
- **Max Characters**: 2048
- **New After N Characters**: 1500
- **Overlap**: 160
- **Overlap All**: No

- **Embed**:

- **Provider**: Azure OpenAI
- **Model**: text-embedding-3-large (3072 dimensions)

</Accordion>
<Accordion title="Platinum">
This option is for your most challenging documents, including scanned and handwritten content.

<Note>
During **Platinum** processing, any detected files that are not PDFs or images are processed and billed at either the **Advanced** or **Basic** rate instead.
Of those non-PDF and non-image files, all text-based files are processed and billed at the **Basic** rate instead. The other files are processed and billed at the **Advanced** rate instead.
</Note>

The **Platinum** option uses the following preconfigured workflow settings:

- **Strategy**: VLM
- **Chunk**:

- **Chunker Type**: Chunk By Title
- **Chunk Options**:

- **Combine Text Under N Characters**: 0
- **Include Original Elements**: No
- **Max Characters**: 2048
- **Multipage Sections**: No
- **New After N Characters**: 1500
- **Overlap**: 160
- **Overlap All**: No

- **Embed**:

- **Provider**: Azure OpenAI
- **Model**: text-embedding-3-large (3072 dimensions)

</Accordion>
</AccordionGroup>

9. The **Reprocess all** box applies only to the Amazon S3 and Azure Blob Storage source connectors:

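The billing notes in the `platform/workflows.mdx` change above describe a fallback rule: the rate a file is billed at depends on both the selected **Optimize for** option and the kind of file detected. A minimal Python sketch of that rule follows. The names `Tier`, `FileKind`, and `billed_tier` are hypothetical and are not part of any platform API. The sketch also assumes that the three file kinds are mutually exclusive and that every file under **Basic** is billed at the **Basic** rate, which the notes do not state explicitly.

```python
from enum import Enum


class Tier(Enum):
    """The three 'Optimize for' options (hypothetical modeling)."""
    BASIC = "Basic"
    ADVANCED = "Advanced"
    PLATINUM = "Platinum"


class FileKind(Enum):
    """Assumed mutually exclusive file categories, for illustration only."""
    TEXT = "text-based"
    PDF_OR_IMAGE = "PDF or image"
    OTHER = "other"


def billed_tier(selected: Tier, kind: FileKind) -> Tier:
    """Return the rate a file is billed at, per the notes in the diff."""
    if selected is Tier.BASIC:
        # Assumption: the notes give no fallback for Basic.
        return Tier.BASIC
    if kind is FileKind.TEXT:
        # Advanced and Platinum both bill text-based files at the Basic rate.
        return Tier.BASIC
    if selected is Tier.PLATINUM and kind is FileKind.OTHER:
        # Platinum bills non-PDF, non-image, non-text files at the Advanced rate.
        return Tier.ADVANCED
    # Otherwise the file is billed at the selected rate.
    return selected


if __name__ == "__main__":
    for tier in Tier:
        for kind in FileKind:
            print(f"{tier.value:9} + {kind.value:12} -> {billed_tier(tier, kind).value}")
```

Running the script prints the full option-by-file-kind matrix, which may be a useful cross-check when reviewing the wording of the two notes.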
2 changes: 1 addition & 1 deletion snippets/quickstarts/platform.mdx
@@ -92,7 +92,7 @@ allowfullscreen
<Note>You can select multiple source and destination locations. Files will be ingested from all of the selected source locations, and the processed data will be delivered to all of the selected destination locations.</Note>

7. Click **Continue**.
8. In the **Optimize for** section, select the option to choose one of these predefined workflow settings groups:
8. In the **Optimize for** section, select the option to choose one of these preconfigured workflow settings groups:

- **Basic**: Ideal for simple, text-only documents.
- **Advanced**: Best for PDFs, images, and complex file types.