diff --git a/api-reference/overview.mdx b/api-reference/overview.mdx index a5747456..56ed784b 100644 --- a/api-reference/overview.mdx +++ b/api-reference/overview.mdx @@ -35,6 +35,8 @@ The Unstructured API provides the following benefits beyond the [Unstructured op * Unstructured manages code dependencies, for instance for libraries such as Tesseract. * Unstructured manages its own infrastructure, including parallelization and other performance optimizations. +[Learn more](/open-source/introduction/overview#limits). + ## Pricing To call the Unstructured API, you must have an Unstructured account. diff --git a/api-reference/partition/quickstart.mdx b/api-reference/partition/quickstart.mdx index 6e94d26e..b481467e 100644 --- a/api-reference/partition/quickstart.mdx +++ b/api-reference/partition/quickstart.mdx @@ -8,6 +8,12 @@ sidebarTitle: Quickstart [Skip ahead](https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Unstructured_Partition_Endpoint_Quickstart.ipynb) to run this quickstart as a notebook on Google Colab now! Do you want to just copy the sample code for use on your local machine? [Skip ahead](#sample-code) to the code now! + + This quickstart uses the Unstructured Partition Endpoint and focuses on a single, local file for ease-of-use demonstration purposes. This quickstart also + focuses only on a limited set of Unstructured's full capabilities. To unlock the full feature set, as well as use Unstructured to do + large-scale batch processing of multiple files and semi-structured data that are stored in remote locations, + [skip over](/api-reference/workflow/overview#quickstart) to an expanded, advanced version of this quickstart that uses the + Unstructured Workflow Endpoint instead. - -4. To keep enhancing your workflow, skip ahead to the [walkthrough](/ui/walkthrough). - -5. 
To move beyond local file processing, you can try the following [remote quickstart](#remote-quickstart), or skip over to the [Dropbox source connector quickstart](/ui/sources/dropbox-source-quickstart) instead. - - You can also learn about Unstructured [source connectors](/ui/sources/overview), [destination connectors](/ui/destinations/overview), [workflows](/ui/workflows), [jobs](/ui/jobs), and [managing your account](/ui/account/overview). + --- diff --git a/ui/summarizing.mdx b/ui/summarizing.mdx index e9bd96fe..4cc9bde6 100644 --- a/ui/summarizing.mdx +++ b/ui/summarizing.mdx @@ -74,13 +74,13 @@ Line breaks have been inserted here for readability. The output will not contain ## Summarize images or tables -import EnrichmentImagesTablesHiResOnly from '/snippets/general-shared-text/enrichment-images-tables-hi-res-only.mdx'; +import EnrichmentImagesTablesOCRHiResOnly from '/snippets/general-shared-text/enrichment-images-tables-ocr-hi-res-only.mdx'; To summarize images or tables, in the **Task** drop-down list of an **Enrichment** node in a workflow, specify the following: You can change a workflow's summarization settings only through [Custom](/ui/workflows#create-a-custom-workflow) workflow settings. - + For image summarization, select **Image Description**, and then choose one of the available provider (and model) combinations that are shown. 
diff --git a/ui/walkthrough-2.mdx b/ui/walkthrough-2.mdx new file mode 100644 index 00000000..6510bada --- /dev/null +++ b/ui/walkthrough-2.mdx @@ -0,0 +1,7 @@ +--- +title: Unstructured UI walkthrough +--- + +import GetStartedSingleFileUIPart2 from '/snippets/general-shared-text/get-started-single-file-ui-part-2.mdx'; + + \ No newline at end of file diff --git a/ui/walkthrough.mdx b/ui/walkthrough.mdx index 65115086..552bdbf5 100644 --- a/ui/walkthrough.mdx +++ b/ui/walkthrough.mdx @@ -22,6 +22,14 @@ These files, which are available for you to download to your local machine, incl This PDF file features English handwriting and scanned images of documents. Throughout this walkthrough, this file's title is shortened to "Nash letters" for brevity. + + This walkthrough focuses on local files for ease-of-use demonstration purposes. + + This walkthrough does not cover how to use + Unstructured to set up [connectors](/ui/connectors) to do large-scale batch processing of multiple files and semi-structured data that are stored in remote locations. + To learn how to set up connectors and do large-scale batch processing later, see the [next steps](#next-steps) after you finish this walkthrough. + + If you are not able to complete any of the following steps, contact Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io). @@ -76,7 +84,7 @@ Let's get going! _What do the other buttons on the sidebar do?_ - - **Start** takes you to the UI home page. + - **Start** takes you to the UI home page. The home page features a simple way to process one local file at a time with limited default settings. [Learn how](/welcome#unstructured-ui-quickstart). - **Connectors** allows you to create and manage your [source](/ui/sources/overview) and [destination](/ui/destinations/overview) connectors. - **Jobs** allows you to see the results of your workflows that are run manually (on-demand) and automatically (on a regular time schedule). [Learn more](/ui/jobs). 
@@ -95,7 +103,7 @@ Let's get going! _What does **Build it For Me** do?_ - The **Build it For Me** option creates an [automatic workflow](/ui/workflows#create-an-automatic-workflow) with sensible default settings to enable you to get good-quality results faster. However, + The **Build it For Me** option creates an [automatic workflow](/ui/workflows#create-an-automatic-workflow) with limited, sensible default settings to enable you to get good-quality results faster. However, this option requires that you first have an existing remote source and destination connector to add to the workflow. To speed things up here and keep things simple, this walkthrough only processes files from your local machine and skips the use of connectors. To learn how to use connectors later, see the [next steps](#next-steps) at the end of this walkthrough. @@ -267,41 +275,77 @@ HTML representations of detected tables, and detected entities (such as people a ![Adding an enrichment node](/img/ui/walkthrough/AddEnrichment.png) -3. In the node's settings pane's **Details** tab, select **Image** under **Input Type**, and then click **OpenAI (GPT-4o)** under **Model**. +3. In the node's settings pane's **Details** tab, click: + + - **Image** under **Input Type** + - **OpenAI** under **Provider** + - **GPT-4o** under **Model** The image description enrichment generates a summary description of each detected image. This can help you to more quickly and easily understand what each image is all about without having to stop to manually visualize and interpret the image's content yourself. This also provides additional helpful context about the image for your RAG apps, agents, and models. [Learn more](/ui/enriching/image-descriptions). -4. Repeat this process to add three more nodes between the **Partitioner** and **Destination** nodes. To do this, click +4. Repeat this process to add four more nodes between the **Partitioner** and **Destination** nodes.
To do this, click the add (**+**) icon, and then click **Enrich > Enrichment**, as follows: - a. Add a **Table** (under **Input Type**) enrichment node with **OpenAI (GPT-4o)** (under **Model**) and **Table Description** (under **Task**) selected.

- + a. Add a table description enrichment (click the add (**+**) icon to the right of the preceding node, and then click **Enrich > Enrichment**). + + In the node's settings pane's **Details** tab, click: + + - **Table** under **Input Type**. + - **Anthropic** under **Provider**. + - **Claude Sonnet 4** under **Model**. + - **Table Description** under **Task**. + The table description enrichment generates a summary description of each detected table. This can help you to more quickly and easily understand what each table is all about without having to stop to manually read through the table's content yourself. This also provides additional helpful context about the table for your RAG apps, agents, and models. [Learn more](/ui/enriching/table-descriptions). - b. Add another **Table** (under **Input Type**) enrichment node with **OpenAI (GPT-4o)** (under **Model**) and **Table to HTML** (under **Task**) selected.

+ b. Add a table to HTML enrichment (click the add (**+**) icon to the right of the preceding node, and then click **Enrich > Enrichment**). + + In the node's settings pane's **Details** tab, click: + + - **Table** under **Input Type**. + - **Anthropic** under **Provider**. + - **Claude Sonnet 4** under **Model**. + - **Table to HTML** under **Task**. The table to HTML enrichment generates an HTML representation of each detected table. This can help you to more quickly and accurately recreate the table's content elsewhere later as needed. This also provides additional context about the table's structure for your RAG apps, agents, and models. [Learn more](/ui/enriching/table-to-html). - c. Add a **Text** (under **Input Type**) enrichment node with **OpenAI (GPT-4o)** (under **Model**) selected.

+ c. Add a named entity recognition enrichment (click the add (**+**) icon to the right of the preceding node, and then click **Enrich > Enrichment**). + + In the node's settings pane's **Details** tab, click: + + - **Text** under **Input Type**. + - **Anthropic** under **Provider**. + - **Claude Sonnet 4** under **Model**. The named entity recognition (NER) enrichment generates a list of detected entities (such as people and organizations) and the inferred relationships among these entities. This provides additional context about these entities' types and their relationships for your graph databases, RAG apps, agents, and models. [Learn more](/ui/enriching/ner). + + d. Add a text fidelity optimization enrichment (click the add (**+**) icon to the right of the preceding node, and then click **Enrich > Enrichment**). + + In the node's settings pane's **Details** tab, click: + + - **OCR** under **Input Type**. + - **Anthropic** under **Provider**. + - **Claude Sonnet 4** under **Model**. + + + The text fidelity optimization enrichment improves the accuracy of text blocks that Unstructured initially processed during its partitioning phase. [Learn more](/ui/enriching/ocr). + The workflow designer should now look like this: ![The workflow with enrichments added](/img/ui/walkthrough/EnrichedWorkflow.png) 5. Change the **Source** node to use the "Chinese Characters" PDF file, and then click **Test**. -6. In the **Test output** pane, make sure that **Enrichment (5 of 5)** is showing. If not, click the right arrow (**>**) until **Enrichment (5 of 5)** appears, which will show the output from the last node in the workflow. +6. In the **Test output** pane, make sure that **Enrichment (6 of 6)** is showing. If not, click the right arrow (**>**) until **Enrichment (6 of 6)** appears, which will show the output from the last node in the workflow.
![The final Enrichment node's output](/img/ui/walkthrough/GoToEnrichmentNode.png) @@ -497,18 +541,18 @@ embedding model that is provided by an embedding provider. For the best embeddin Congratulations! You now have an Unstructured workflow that partitions, enriches, chunks, and embeds your source documents, producing context-rich data that is ready for retrieval-augmented generation (RAG), agentic AI, and model fine-tuning. -Right now, your workflow only accepts one local file at a time for input. Your workflow also only sends Unstructured's processed data to your screen or to be save locally as a JSON file. +Right now, your workflow only accepts one local file at a time for input. Your workflow also only sends Unstructured's processed data to your screen or to be saved locally as a JSON file. You can modify your workflow to accept multiple files and data from—and send Unstructured's processed data to—one or more file storage locations, databases, and vector stores. To learn how to do this, try one or more of the following quickstarts: -- [Remote quickstart](/ui/quickstart#remote-quickstart) -- [Dropbox source connector quickstart](/ui/sources/dropbox-source-quickstart) -- [Pinecone destination connector quickstart](/ui/destinations/pinecone-destination-quickstart) +- [Remote quickstart](/ui/quickstart#remote-quickstart) - This quickstart shows you how to begin processing files and semi-structured data from remote source locations at scale, instead of just one local file at a time. +- [Dropbox source connector quickstart](/ui/sources/dropbox-source-quickstart) - If you don't have any remote source locations available for Unstructured to connect to, this quickstart shows you how to set up a Dropbox account to store your documents in, and then connect Unstructured to your Dropbox account.
+- [Pinecone destination connector quickstart](/ui/destinations/pinecone-destination-quickstart) - If you don't have any remote destination locations available for Unstructured to send its processed data to, this quickstart shows you how to set up a Pinecone account to have Unstructured store its processed data in, and then connect Unstructured to your Pinecone account. Unstructured also offers an API and SDKs, which allow you to use code to work with Unstructured programmatically instead of only with the UI. For details, see: -- [Unstructured API quickstart](/api-reference/workflow/overview#quickstart) -- [Unstructured Python SDK](/api-reference/workflow/overview#unstructured-python-sdk) -- [Unstructured API overview](/api-reference/overview) +- [Unstructured API quickstart](/api-reference/workflow/overview#quickstart) - This quickstart uses the Unstructured Workflow Endpoint to programmatically create a Dropbox source connector and a Pinecone destination connector in your Unstructured account. You then programmatically add these connectors to a workflow in your Unstructured account, run that workflow as a job, and then explore the job's results. +- [Unstructured Python SDK](/api-reference/workflow/overview#unstructured-python-sdk) - This article provides an overview of the Unstructured Python SDK and how to use it. +- [Unstructured API overview](/api-reference/overview) - This article provides an overview of the Unstructured API and how to use it. If you are not able to complete any of the preceding quickstarts, contact Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io). 
\ No newline at end of file diff --git a/ui/workflows.mdx b/ui/workflows.mdx index e89d56e4..54b138ac 100644 --- a/ui/workflows.mdx +++ b/ui/workflows.mdx @@ -21,7 +21,7 @@ Unstructured provides two types of workflow builders: ### Create an automatic workflow -import EnrichmentImagesTablesHiResOnly from '/snippets/general-shared-text/enrichment-images-tables-hi-res-only.mdx'; +import EnrichmentImagesTablesOCRHiResOnly from '/snippets/general-shared-text/enrichment-images-tables-ocr-hi-res-only.mdx'; You must first have an existing source connector and destination connector to add to the workflow. @@ -102,7 +102,7 @@ After this workflow is created, you can change any or all of its settings if you source connector, destination connector, partitioning, chunking, and embedding settings. You can also add enrichments to the workflow if you want to. - + To change the workflow's default settings or to add enrichments: @@ -214,7 +214,7 @@ If you did not previously set the workflow to run on a schedule, you can [run th - Click **Connect** to add another **Source** or **Destination** node. You can add multiple source and destination locations. Files will be ingested from all of the source locations, and the processed data will be delivered to all of the destination locations. [Learn more](#custom-workflow-node-types). - Click **Enrich** to add a **Chunker** or **Enrichment** node. [Learn more](#custom-workflow-node-types). - + - Click **Transform** to add a **Partitioner** or **Embedder** node. [Learn more](#custom-workflow-node-types). @@ -304,7 +304,7 @@ import DeprecatedModelsUI from '/snippets/general-shared-text/deprecated-models- - + - **Image** to summarize images. Also select one of the available provider (and model) combinations that are shown. @@ -330,6 +330,11 @@ import DeprecatedModelsUI from '/snippets/general-shared-text/deprecated-models- **Edit & Test Prompt** section to test the prompt. [Learn more](/ui/enriching/ner). 
+ + - **OCR** to optimize the fidelity of text blocks that Unstructured initially processed during its partitioning phase. + Also select one of the available provider (and model) combinations that are shown. + + [Learn more](/ui/enriching/ocr). diff --git a/welcome.mdx b/welcome.mdx index 29c67a4f..4847ab94 100644 --- a/welcome.mdx +++ b/welcome.mdx @@ -51,28 +51,27 @@ You can use Unstructured through a user interface (UI), an API, or both. Read on ## Unstructured UI quickstart -1. If you do not already have an Unstructured account, [sign up for free](https://unstructured.io/?modal=try-for-free). - After you sign up, you are automatically signed in to your new Unstructured **Starter** account, at [https://platform.unstructured.io](https://platform.unstructured.io). -2. Watch the following 2-minute video: +import GetStartedSingleFileUI from '/snippets/general-shared-text/get-started-single-file-ui.mdx'; - - -  [Keep enhancing your workflow](/ui/walkthrough). - -  [Learn more](/ui/overview). + --- ## Unstructured API quickstart +This quickstart shows how you can use the Unstructured API to quickly and easily see Unstructured's +transformation results for a single file that is stored locally. + + + This quickstart uses the Unstructured API's [Partition Endpoint](/api-reference/partition/overview) and focuses on a single, local file for ease-of-use demonstration purposes. This quickstart also + focuses only on a limited set of Unstructured's full capabilities. + + To unlock Unstructured's full feature set, as well as use Unstructured to do + large-scale batch processing of multiple files and semi-structured data that are stored in remote locations, + [skip over](/api-reference/workflow/overview#quickstart) to an expanded, advanced version of this quickstart that uses the + Unstructured API's [Workflow Endpoint](/api-reference/workflow/overview) instead. + + 1. 
If you do not already have an Unstructured account, [sign up for free](https://unstructured.io/?modal=try-for-free). After you sign up, you are automatically signed in to your Unstructured **Starter** account, at [https://platform.unstructured.io](https://platform.unstructured.io). 2. Watch the following 3-minute video: