diff --git a/docs.json b/docs.json
index 9f5b6286..f2f46d0f 100644
--- a/docs.json
+++ b/docs.json
@@ -30,7 +30,8 @@
{
"group": "Getting started with the UI",
"pages": [
- "ui/quickstart"
+ "ui/quickstart",
+ "ui/walkthrough"
]
},
{
diff --git a/img/ui/walkthrough/AddChunker.png b/img/ui/walkthrough/AddChunker.png
new file mode 100644
index 00000000..aa42f16c
Binary files /dev/null and b/img/ui/walkthrough/AddChunker.png differ
diff --git a/img/ui/walkthrough/AddEmbedder.png b/img/ui/walkthrough/AddEmbedder.png
new file mode 100644
index 00000000..5e3c3364
Binary files /dev/null and b/img/ui/walkthrough/AddEmbedder.png differ
diff --git a/img/ui/walkthrough/AddEnrichment.png b/img/ui/walkthrough/AddEnrichment.png
new file mode 100644
index 00000000..d2a64cd0
Binary files /dev/null and b/img/ui/walkthrough/AddEnrichment.png differ
diff --git a/img/ui/walkthrough/BuildItMyself.png b/img/ui/walkthrough/BuildItMyself.png
new file mode 100644
index 00000000..2b1acc63
Binary files /dev/null and b/img/ui/walkthrough/BuildItMyself.png differ
diff --git a/img/ui/walkthrough/ChunkByCharacter.png b/img/ui/walkthrough/ChunkByCharacter.png
new file mode 100644
index 00000000..4f229c75
Binary files /dev/null and b/img/ui/walkthrough/ChunkByCharacter.png differ
diff --git a/img/ui/walkthrough/DropFileToTest.png b/img/ui/walkthrough/DropFileToTest.png
new file mode 100644
index 00000000..6d703080
Binary files /dev/null and b/img/ui/walkthrough/DropFileToTest.png differ
diff --git a/img/ui/walkthrough/EnrichedWorkflow.png b/img/ui/walkthrough/EnrichedWorkflow.png
new file mode 100644
index 00000000..886eabd6
Binary files /dev/null and b/img/ui/walkthrough/EnrichedWorkflow.png differ
diff --git a/img/ui/walkthrough/GoToEnrichmentNode.png b/img/ui/walkthrough/GoToEnrichmentNode.png
new file mode 100644
index 00000000..d91e2fbb
Binary files /dev/null and b/img/ui/walkthrough/GoToEnrichmentNode.png differ
diff --git a/img/ui/walkthrough/HighResPartitioner.png b/img/ui/walkthrough/HighResPartitioner.png
new file mode 100644
index 00000000..4149ad48
Binary files /dev/null and b/img/ui/walkthrough/HighResPartitioner.png differ
diff --git a/img/ui/walkthrough/NewWorkflow.png b/img/ui/walkthrough/NewWorkflow.png
new file mode 100644
index 00000000..7aee84e7
Binary files /dev/null and b/img/ui/walkthrough/NewWorkflow.png differ
diff --git a/img/ui/walkthrough/SearchJSON.png b/img/ui/walkthrough/SearchJSON.png
new file mode 100644
index 00000000..9149c3bd
Binary files /dev/null and b/img/ui/walkthrough/SearchJSON.png differ
diff --git a/img/ui/walkthrough/TestLocalFile.png b/img/ui/walkthrough/TestLocalFile.png
new file mode 100644
index 00000000..22028e3b
Binary files /dev/null and b/img/ui/walkthrough/TestLocalFile.png differ
diff --git a/img/ui/walkthrough/TestOutputResults.png b/img/ui/walkthrough/TestOutputResults.png
new file mode 100644
index 00000000..2a6732bc
Binary files /dev/null and b/img/ui/walkthrough/TestOutputResults.png differ
diff --git a/img/ui/walkthrough/WorkflowDesigner.png b/img/ui/walkthrough/WorkflowDesigner.png
new file mode 100644
index 00000000..a8d027f2
Binary files /dev/null and b/img/ui/walkthrough/WorkflowDesigner.png differ
diff --git a/img/ui/walkthrough/WorkflowsSidebar.png b/img/ui/walkthrough/WorkflowsSidebar.png
new file mode 100644
index 00000000..2115437d
Binary files /dev/null and b/img/ui/walkthrough/WorkflowsSidebar.png differ
diff --git a/ui/quickstart.mdx b/ui/quickstart.mdx
index c1ac0a85..a27d07d9 100644
--- a/ui/quickstart.mdx
+++ b/ui/quickstart.mdx
@@ -27,7 +27,9 @@ import GetStartedSimpleUIOnly from '/snippets/general-shared-text/get-started-si
allowfullscreen
>
-4. To move beyond local file processing, you can try the following [remote quickstart](#remote-quickstart), or skip over to the [Dropbox source connector quickstart](/ui/sources/dropbox-source-quickstart) instead.
+4. To keep enhancing your workflow, skip ahead to the [walkthrough](/ui/walkthrough).
+
+5. To move beyond local file processing, you can try the following [remote quickstart](#remote-quickstart), or skip over to the [Dropbox source connector quickstart](/ui/sources/dropbox-source-quickstart) instead.
You can also learn about Unstructured [source connectors](/ui/sources/overview), [destination connectors](/ui/destinations/overview), [workflows](/ui/workflows), [jobs](/ui/jobs), and [managing your account](/ui/account/overview).
diff --git a/ui/walkthrough.mdx b/ui/walkthrough.mdx
new file mode 100644
index 00000000..a9d7b368
--- /dev/null
+++ b/ui/walkthrough.mdx
@@ -0,0 +1,289 @@
+---
+title: Unstructured UI walkthrough
+sidebarTitle: Walkthrough
+---
+
+This walkthrough provides you with deep, hands-on experience with the [Unstructured user interface (UI)](/ui/overview). As you follow along, you will learn how to use many of Unstructured's
+features for [partitioning](/ui/partitioning), [enriching](/ui/enriching/overview), [chunking](/ui/chunking), and [embedding](/ui/embedding). These features are optimized for turning
+your source documents and data into information that is well-tuned for retrieval-augmented generation (RAG), agentic AI, and model fine-tuning.
+
+This walkthrough uses two sample files to demonstrate how Unstructured identifies and processes content such as images, graphs, complex tables, non-English characters, and handwriting.
+These files, which are available for you to download to your local machine, include:
+
+- Wang, Z., Liu, X., & Zhang, M. (2022, November 23).
+ _Breaking the Representation Bottleneck of Chinese Characters: Neural Machine Translation with Stroke Sequence Modeling_.
+ arXiv.org. https://arxiv.org/pdf/2211.12781. This 12-page PDF file features English and non-English characters, images, graphs, and complex tables.
+ Throughout this walkthrough, this file's title is shortened to "Chinese Characters" for brevity.
+- United States Central Security Service. (2012, January 27). _National Cryptologic Museum Opens New Exhibit on Dr. John Nash_.
+ United States National Security Agency. https://courses.csail.mit.edu/6.857/2012/files/H03-Cryptosystem-proposed-by-Nash.pdf.
+ This PDF file features English handwriting and scanned images of documents.
+ Throughout this walkthrough, this file's title is shortened to "Nash letters" for brevity.
+
+If you are not able to complete any of the following steps, contact Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).
+
+## Step 1: Sign up and sign in to Unstructured
+
+import GetStartedSimpleUIOnly from '/snippets/general-shared-text/get-started-simple-ui-only.mdx';
+
+
+
+## Step 2: Create a custom workflow
+
+In this step, you create a custom [workflow](/ui/workflows) in your Unstructured account. Workflows are
+defined sequences of processes that automate the flow of data from your source documents and data into Unstructured for processing.
+Unstructured then sends the processed data to your destination file storage locations, databases, and vector stores.
+
+1. After you are signed in to your Unstructured account, on the sidebar, click **Workflows**.
+
+ 
+
+2. Click **New Workflow**.
+
+ 
+
+3. With **Build it Myself** already selected, click **Continue**.
+
+ 
+
+4. The workflow designer appears.
+
+ 
+
+## Step 3: Experiment with partitioning
+
+In this step, you use your new workflow to [partition](/ui/partitioning) the sample PDF files that you downloaded earlier.
+Partitioning is the process where Unstructured identifies and extracts content from your source documents and then
+presents this content as a series of contextually rich [document elements and metadata](/ui/document-elements). This step
+shows how well the **High Res** partitioning strategy identifies and extracts content, and how well **VLM** handles
+more complex content such as complex tables, multilanguage characters, and handwriting.
+
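+If it helps to know what you are looking at before you run a test, the following sketch shows the general shape of a single document element in Unstructured's JSON output. The field values here are illustrative only, not actual output from the sample files:
+
+```python
+# A simplified sketch of one document element from Unstructured's JSON output.
+# All values below are illustrative, not actual output from the sample files.
+element = {
+    "type": "NarrativeText",      # for example: Title, NarrativeText, Table, Image
+    "element_id": "38a5ee29",     # unique ID that Unstructured assigns to the element
+    "text": "In StrokeNet, the corresponding ...",
+    "metadata": {
+        "filename": "2211.12781.pdf",
+        "page_number": 3,
+        "languages": ["eng"],     # detected languages for the element's text
+    },
+}
+```
+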
+1. With the workflow designer active from the previous step, at the bottom of the **Source** node, click **Drop file to test**.
+
+ 
+
+2. Browse to and select the "Chinese Characters" PDF file that you downloaded earlier.
+3. Click the **Partitioner** node and then, in the node's settings pane's **Details** tab, select **High Res**.
+
+ 
+
+4. Immediately above the **Source** node, click **Test**.
+
+ 
+
+5. The PDF file appears in a pane on the left side of the screen, and Unstructured's output appears in a **Test output** pane on the right side of the screen.
+
+ 
+
+6. Some interesting portions of the output include the following, which you can get to by clicking **Search JSON** above the output (a scripted version of this kind of search appears at the end of this step):
+
+ 
+
+ - The Chinese characters on page 3. Search for the text `In StrokeNet, the corresponding`. Notice that the Chinese characters are not interpreted correctly.
+ - The formula on page 5. Search for the text `L= LL + Ln`. Notice that the formula's output diverges quite a bit from the original content.
+ - Table 2 on page 6. Search for the text `Model Parameters Performance (BLEU)`. Notice that the `text_as_html` output diverges slightly from the original content.
+ - Figure 4 on page 8. Search for the text `50 45 40 35`. Notice that the output is not that informative about the original image's content.
+
+   These issues will be addressed later in this step when you change the partitioning strategy to **VLM**, and later in **Step 4** when you add enrichments alongside **High Res** partitioning.
+
+7. Now try changing the partitioning strategy to **VLM** and see how the output changes. To do this:
+
+ a. Click the close (**X**) button above the output on the right side of the screen.
+ b. In the workflow designer, click the **Partitioner** node and then, in the node's settings pane's **Details** tab, select **VLM**.
+ c. Under **Select VLM Model**, under **Anthropic**, select **Claude 3.5 Sonnet**.
+ d. Click **Test**.
+
+8. Notice how the output changes, now that you are using the **VLM** strategy:
+
+   - The Chinese characters on page 3. Search for the text `In StrokeNet, the corresponding`. Notice that the Chinese characters are interpreted correctly.
+ - The formula on page 5. Search for the text `match class`. Notice that the formula's output is closer to the original content.
+ - Table 2 on page 6. Search for the text `Model Parameters Performance (BLEU)`. Notice that the `text_as_html` output is closer to the original content.
+ - Figure 4 on page 8. Search for the text `Graph showing BLEU scores comparison`. Notice the informative description about the figure.
+
+9. Now try looking at the "Nash letters" PDF file's output. To do this:
+
+ a. Click the close (**X**) button above the output on the right side of the screen.
+ b. In the workflow designer, click the **Partitioner** node and then, in the node's settings pane's **Details** tab, select **High Res**.
+ c. At the bottom of the **Source** node, click the existing PDF's file name.
+ d. Browse to and select the "Nash letters" file that you downloaded earlier.
+ e. Click **Test**.
+
+10. Some interesting portions of the output include the following:
+
+ - The handwriting on page 3. Search for the text `Deo Majr`. Notice that the handwriting is not recognized correctly.
+ - The mimeograph on page 11. Search for the text `Technicans at this Agency` (note the typo `Technicans`).
+      Notice that the mimeograph contains `18 January 1955`, but the output contains only `January 1955`.
+ - The handwritten diagrams on page 13. Search for the text `"page_number": 13`. Notice that no output is generated for the diagrams.
+
+11. Now try changing the partitioning strategy to **VLM** and see how the output changes. To do this:
+
+ a. Click the close (**X**) button above the output on the right side of the screen.
+ b. In the workflow designer, click the **Partitioner** node and then, in the node's settings pane's **Details** tab, select **VLM**.
+ c. Under **Select VLM Model**, under **Anthropic**, select **Claude 3.5 Sonnet**.
+ d. Click **Test**.
+
+12. Notice how the output changes, now that you are using the **VLM** strategy:
+
+    - The handwriting on page 3. Search for the text `Dear Major Grosjean`. Notice that the handwriting is now recognized correctly.
+ - The mimeograph on page 11. Search for the text `Technicians at this Agency` (note the corrected typo `Technicians`).
+      Notice that the mimeograph contains `18 January 1955`, and the output now also contains `18 January 1955`.
+ - The handwritten diagrams on page 13. Search for the text `graph LR`. Notice that [Mermaid](https://docs.mermaidchart.com/mermaid-oss/intro/syntax-reference.html) representations of the
+ handwritten diagrams are output.
+
+13. When you are done, be sure to click the close (**X**) button above the output on the right side of the screen, to return to
+ the workflow designer for the next step.
+
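+Before moving on, here is a minimal scripted version of the **Search JSON** exercise from this step. It assumes you have saved a test run's JSON output locally as `test-output.json` (an assumed file name for illustration); the output is a JSON array of element objects, so plain substring searches over each element's `text` field work:
+
+```python
+import json
+
+# Assumed local copy of a test run's JSON output; adjust the path as needed.
+with open("test-output.json", encoding="utf-8") as f:
+    elements = json.load(f)
+
+def search(elements, phrase):
+    # Find elements whose text contains the phrase, as Search JSON does in the UI.
+    return [el for el in elements if phrase in el.get("text", "")]
+
+for el in search(elements, "In StrokeNet, the corresponding"):
+    print(el.get("type"), el.get("metadata", {}).get("page_number"))
+
+# With the VLM strategy, handwritten diagrams come back as Mermaid source,
+# so they can be pulled out directly:
+diagrams = [el["text"] for el in elements if el.get("text", "").startswith("graph ")]
+print(len(diagrams), "Mermaid diagram(s) found")
+```
+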
+## Step 4: Experiment with enriching
+
+In this step, you add several [enrichments](/ui/enriching/overview) to your workflow, such as generating summary descriptions of detected images and tables,
+HTML representations of detected tables, and detected entities (such as people and organizations) and the inferred relationships among these entities.
+
+1. With the workflow designer active from the previous step, change the **Partitioner** node to use **High Res**.
+2. Between the **Partitioner** and **Destination** nodes, click the add (**+**) icon, and then click **Enrich > Enrichment**.
+
+ 
+
+3. In the node's settings pane's **Details** tab, select **Image** under **Input Type**, and then click **OpenAI (GPT-4o)** under **Model**.
+4. Repeat this process to add three more nodes between the **Partitioner** and **Destination** nodes. To do this, click
+ the add (**+**) icon, and then click **Enrich > Enrichment**, as follows:
+
+ a. Add a **Table** (under **Input Type**) enrichment node with **OpenAI (GPT-4o)** (under **Model**) and **Table Description** (under **Task**) selected.
+ b. Add another **Table** (under **Input Type**) enrichment node with **OpenAI (GPT-4o)** (under **Model**) and **Table to HTML** (under **Task**) selected.
+ c. Add a **Text** (under **Input Type**) enrichment node with **OpenAI (GPT-4o)** (under **Model**) selected.
+
+ The workflow designer should now look like this:
+
+ 
+
+5. Change the **Source** node to use the "Chinese Characters" PDF file, and then click **Test**.
+6. In the **Test output** pane, make sure that **Enrichment (5 of 5)** is showing. If not, click the right arrow (**>**) until **Enrichment (5 of 5)** appears, which will show the output from the last node in the workflow.
+
+ 
+
+7. Some interesting portions of the output include the following (a short script for tallying these elements appears at the end of this step):
+
+ - The figures on pages 3, 7, and 8. Search for the seven instances of the text `"type": "Image"`. Notice the summary description for each image.
+ - The tables on pages 6, 7, 8, 9, and 12. Search for the seven instances of the text `"type": "Table"`. Notice the summary description for each of these tables.
+ Also notice the `text_as_html` field for each of these tables.
+ - The identified entities and inferred relationships among them. Search for the text `Zhijun Wang`. Of the eight instances of this name, notice
+ the author's identification as a `PERSON` three times, the author's `published` relationship twice, and the author's `affiliated_with` relationship twice.
+
+8. When you are done, be sure to click the close (**X**) button above the output on the right side of the screen, to return to
+ the workflow designer for the next step.
+
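+For the tally mentioned in item 7, a few lines of Python over a saved copy of the test output can count the enriched elements. As before, `test-output.json` is an assumed file name:
+
+```python
+import json
+
+with open("test-output.json", encoding="utf-8") as f:
+    elements = json.load(f)
+
+# Count the enriched Image and Table elements discussed above.
+for kind in ("Image", "Table"):
+    matches = [el for el in elements if el.get("type") == kind]
+    print(f"{kind}: {len(matches)} element(s)")
+
+# Count raw mentions of an author's name across the entire output,
+# which includes the entity and relationship enrichments.
+raw = json.dumps(elements)
+print("Zhijun Wang mentions:", raw.count("Zhijun Wang"))
+```
+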
+## Step 5: Experiment with chunking
+
+In this step, you apply [chunking](/ui/chunking) to your workflow. Chunking is the process where Unstructured rearranges
+the resulting document elements into manageable "chunks" to stay within the limits of an embedding model and to improve retrieval precision. For the
+best chunking strategy to apply to your use case, see the documentation for your target embedding model and downstream application toolsets.
+
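+The UI applies chunking for you, so no code is required. That said, the settings you are about to apply (a 500-character maximum, a new chunk after 400 characters, and an overlap of 50) have close analogues in the open-source `unstructured` Python library. The following sketch is an illustration of those knobs, not the platform's implementation:
+
+```python
+# Requires: pip install "unstructured[pdf]"
+from unstructured.partition.pdf import partition_pdf
+from unstructured.chunking.basic import chunk_elements  # roughly "Chunk by Character"
+from unstructured.chunking.title import chunk_by_title  # roughly "Chunk by Title"
+
+elements = partition_pdf(filename="2211.12781.pdf", strategy="hi_res")
+
+# Mirror this step's settings: 500 max characters, new chunk after 400, overlap 50.
+chunks = chunk_elements(elements, max_characters=500, new_after_n_chars=400, overlap=50)
+
+# The title-aware variant also starts a new chunk at each detected section title.
+title_chunks = chunk_by_title(elements, max_characters=500, new_after_n_chars=400, overlap=50)
+
+print(len(chunks), "basic chunks;", len(title_chunks), "title-aware chunks")
+```
+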
+1. With the workflow designer active from the previous step, just before the **Destination** node, click the add (**+**) icon, and then click **Enrich > Chunker**.
+
+ 
+
+2. In the node's settings pane's **Details** tab, select **Chunk by Character**.
+3. Under **Chunk by Character**, specify the following settings:
+
+   - Check the box labeled **Include Original Elements**.
+ - Set **Max Characters** to **500**.
+ - Set **New After N Characters** to **400**.
+ - Set **Overlap** to **50**.
+ - Leave **Contextual Chunking** turned off and **Overlap All** unchecked.
+
+ 
+
+4. With the "Chinese Characters" PDF file still selected in the **Source** node, click **Test**.
+5. In the **Test output** pane, make sure that **Chunker (6 of 6)** is showing. If not, click the right arrow (**>**) until **Chunker (6 of 6)** appears, which will show the output from the last node in the workflow.
+6. To explore the chunker's results, search for the text `"type": "CompositeElement"`.
+7. Try running this workflow again with the **Chunk by Title** strategy, as follows:
+
+ a. Click the close (**X**) button above the output on the right side of the screen.
+ b. In the workflow designer, click the **Chunker** node and then, in the node's settings pane's **Details** tab, select **Chunk by Title**.
+ c. Under **Chunk by Title**, specify the following settings:
+
+      - Check the box labeled **Include Original Elements**.
+ - Set **Max Characters** to **500**.
+ - Set **New After N Characters** to **400**.
+ - Set **Overlap** to **50**.
+ - Leave **Contextual Chunking** turned off, leave **Combine Text Under N Characters** blank, and leave **Multipage Sections** and **Overlap All** unchecked.
+
+ d. Click **Test**.
+ e. In the **Test output** pane, make sure that **Chunker (6 of 6)** is showing. If not, click the right arrow (**>**) until **Chunker (6 of 6)** appears, which will show the output from the last node in the workflow.
+   f. To explore the chunker's results, search for the text `"type": "CompositeElement"`. Notice that some chunks that immediately
+      precede titles are shorter than the maximum, because a detected title forces the start of a new chunk.
+
+8. Try running this workflow again with the **Chunk by Page** strategy, as follows:
+
+ a. Click the close (**X**) button above the output on the right side of the screen.
+ b. In the workflow designer, click the **Chunker** node and then, in the node's settings pane's **Details** tab, select **Chunk by Page**.
+ c. Under **Chunk by Page**, specify the following settings:
+
+      - Check the box labeled **Include Original Elements**.
+ - Set **Max Characters** to **500**.
+ - Set **New After N Characters** to **400**.
+ - Set **Overlap** to **50**.
+ - Leave **Contextual Chunking** turned off, and leave **Overlap All** unchecked.
+
+ d. Click **Test**.
+ e. In the **Test output** pane, make sure that **Chunker (6 of 6)** is showing. If not, click the right arrow (**>**) until **Chunker (6 of 6)** appears, which will show the output from the last node in the workflow.
+   f. To explore the chunker's results, search for the text `"type": "CompositeElement"`. Notice that some chunks that immediately
+      precede page breaks are shorter than the maximum, because a page break forces the start of a new chunk.
+
+9. Try running this workflow again with the **Chunk by Similarity** strategy, as follows:
+
+ a. Click the close (**X**) button above the output on the right side of the screen.
+ b. In the workflow designer, click the **Chunker** node and then, in the node's settings pane's **Details** tab, select **Chunk by Similarity**.
+ c. Under **Chunk by Similarity**, specify the following settings:
+
+      - Check the box labeled **Include Original Elements**.
+ - Set **Max Characters** to **500**.
+ - Set **Similarity Threshold** to **0.99**.
+ - Leave **Contextual Chunking** turned off.
+
+ d. Click **Test**.
+ e. In the **Test output** pane, make sure that **Chunker (6 of 6)** is showing. If not, click the right arrow (**>**) until **Chunker (6 of 6)** appears, which will show the output from the last node in the workflow.
+   f. To explore the chunker's results, search for the text `"type": "CompositeElement"`. Notice that the lengths of many of the chunks fall well short of the **Max Characters** limit. This is because a similarity threshold
+      of 0.99 means that only sentences or text segments with a near-perfect semantic match are grouped together into the same chunk. This is an extremely high threshold, resulting in very short, highly specific chunks of text.
+   g. If you change **Similarity Threshold** to **0.01** and run the workflow again, searching for the text `"type": "CompositeElement"`, many of the chunks will now come closer to the **Max Characters** limit. This is because a similarity threshold
+      of 0.01 tolerates almost any difference between pieces of text, grouping nearly everything together. (A conceptual sketch of this grouping logic appears after this list.)
+
+10. When you are done, be sure to click the close (**X**) button above the output on the right side of the screen, to return to
+ the workflow designer for the next step.
+
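+To build intuition for the similarity threshold used in item 9, here is a conceptual sketch of similarity-based grouping. This is not Unstructured's actual implementation, and the `similarity` argument is a stand-in for cosine similarity between sentence embeddings:
+
+```python
+def group_by_similarity(sentences, similarity, threshold, max_characters=500):
+    """Greedily group consecutive sentences whose pairwise similarity meets the threshold."""
+    chunks, current = [], []
+    for sentence in sentences:
+        fits = sum(len(s) for s in current) + len(sentence) <= max_characters
+        similar = not current or similarity(current[-1], sentence) >= threshold
+        if current and not (fits and similar):
+            chunks.append(" ".join(current))
+            current = []
+        current.append(sentence)
+    if current:
+        chunks.append(" ".join(current))
+    return chunks
+
+# At a threshold of 0.99 almost nothing is "similar enough," so chunks stay tiny;
+# at 0.01 almost everything groups together, so chunks approach max_characters.
+```
+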
+## Step 6: Experiment with embedding
+
+In this step, you generate [embeddings](/ui/embedding) for your workflow. Embeddings are vectors of numbers that represent various aspects of the text that is extracted by Unstructured.
+These vectors are stored or "embedded" next to the text itself in a vector store or vector database. Chatbots, agents, and other AI solutions can use
+these vector embeddings to more efficiently and effectively find, analyze, and use the associated text. These vector embeddings are generated by an
+embedding model that is provided by an embedding provider. For the best embedding model to apply to your use case, see the documentation for your target downstream application toolsets.
+
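+To see how downstream retrieval can use these vectors, here is a minimal sketch of similarity search over the `embeddings` field in a saved test output. Both `test-output.json` and the `query_vector` placeholder are assumptions for illustration; a real query vector must come from the same embedding model:
+
+```python
+import json
+import math
+
+def cosine(a, b):
+    dot = sum(x * y for x, y in zip(a, b))
+    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
+    return dot / norm if norm else 0.0
+
+with open("test-output.json", encoding="utf-8") as f:
+    elements = json.load(f)
+
+# Placeholder only: substitute a real 1536-dimension query embedding here.
+query_vector = [0.0] * 1536
+
+ranked = sorted(
+    (el for el in elements if el.get("embeddings")),
+    key=lambda el: cosine(query_vector, el["embeddings"]),
+    reverse=True,
+)
+print(ranked[0]["text"][:200] if ranked else "No embedded elements found.")
+```
+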
+1. With the workflow designer active from the previous step, just before the **Destination** node, click the add (**+**) icon, and then click **Transform > Embedder**.
+
+ 
+
+2. In the node's settings pane's **Details** tab, under **Select Embedding Model**, for **Azure OpenAI**, select **Text Embedding 3 Small [dim 1536]**.
+3. With the "Chinese Characters" PDF file still selected in the **Source** node, click **Test**.
+4. In the **Test output** pane, make sure that **Embedder (7 of 7)** is showing. If not, click the right arrow (**>**) until **Embedder (7 of 7)** appears, which will show the output from the last node in the workflow.
+5. To explore the embeddings, search for the text `"embeddings"` (a script for checking these vectors appears after this list).
+6. When you are done, be sure to click the close (**X**) button above the output on the right side of the screen, to return to
+    the workflow designer so that you can continue refining your workflow later as you see fit.
+
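+As noted in item 5, a quick script can confirm that each embedded element carries a 1536-dimension vector, matching the selected **Text Embedding 3 Small [dim 1536]** model. Here again, `test-output.json` is an assumed file name:
+
+```python
+import json
+
+with open("test-output.json", encoding="utf-8") as f:
+    elements = json.load(f)
+
+dims = {len(el["embeddings"]) for el in elements if el.get("embeddings")}
+print("Embedding dimensions found:", dims)  # expected: {1536}
+```
+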
+## Next steps
+
+Congratulations! You now have an Unstructured workflow that partitions, enriches, chunks, and embeds your source documents, producing
+context-rich data that is ready for retrieval-augmented generation (RAG), agentic AI, and model fine-tuning.
+
+Right now, your workflow accepts only one local file at a time as input, and it sends Unstructured's processed data only to your screen.
+You can modify your workflow to accept multiple files and data from one or more file storage
+locations, databases, and vector stores, and to send Unstructured's processed data to them as well. To learn how to do this, try one or more of the following quickstarts:
+
+- [Remote quickstart](/ui/quickstart#remote-quickstart)
+- [Dropbox source connector quickstart](/ui/sources/dropbox-source-quickstart)
+- [Pinecone destination connector quickstart](/ui/destinations/pinecone-destination-quickstart)
+
+Unstructured also offers an API and SDKs, which allow you to work with Unstructured programmatically. For details, see:
+
+- [Unstructured API quickstart](/api-reference/workflow/overview#quickstart)
+- [Unstructured Python SDK](/api-reference/workflow/overview#unstructured-python-sdk)
+- [Unstructured API overview](/api-reference/overview)
+
+If you are not able to complete any of the preceding quickstarts, contact Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).
\ No newline at end of file
diff --git a/welcome.mdx b/welcome.mdx
index e0dcb411..0218c60f 100644
--- a/welcome.mdx
+++ b/welcome.mdx
@@ -65,6 +65,8 @@ You can use Unstructured through a user interface (UI), an API, or both. Read on
allowfullscreen
>
+ [Keep enhancing your workflow](/ui/walkthrough).
+
[Learn more](/ui/overview).
---