6 changes: 3 additions & 3 deletions examplecode/codesamples/api/huggingchat.mdx
Original file line number Diff line number Diff line change
@@ -2,7 +2,7 @@
title: Query processed PDF with HuggingChat
---

This example uses the [Unstructured Ingest Python library](/ingestion/overview#unstructured-ingest-python-library) or the
This example uses the [Unstructured Ingest Python library](/ingestion/python-ingest) or the
[Unstructured JavaScript/TypeScript SDK](/platform-api/partition-api/sdk-jsts) to send a PDF file to
the [Unstructured Platform Partition Endpoint](/platform-api/partition-api/overview) for processing. Unstructured processes the PDF and extracts the PDF's content.
This example then sends some of the content to [HuggingChat](https://huggingface.co/chat/), Hugging Face's open-source AI chatbot,
@@ -11,7 +11,7 @@ along with some queries about this content.
To run this example, you'll need:

- The [hugchat](https://pypi.org/project/hugchat/) package for Python, or the [huggingface-chat](https://www.npmjs.com/package/huggingface-chat) package for JavaScript/TypeScript.
- Your Unstructured API key and API URL. [Get an API key and API URL](/platform-api/parition-api/overview).
- Your Unstructured API key and API URL. [Get an API key and API URL](/platform-api/partition-api/overview).
- Your Hugging Face account's email address and account password. [Get an account](https://huggingface.co/join).
- A PDF file for Unstructured to process. This example uses a sample PDF file containing the text of the United States Constitution,
available for download from [https://constitutioncenter.org/media/files/constitution.pdf](https://constitutioncenter.org/media/files/constitution.pdf).
@@ -37,7 +37,7 @@ import HuggingChatTSExampleCode from '/snippets/examples/huggingchat.ts.mdx';
</Accordion>
<Accordion title="JavaScript/TypeScript SDK">
<Warning>
Unstructured recommends that you use the [Unstructured Ingest Python library](/ingestion/overview#unstructured-ingest-python-library) instead.
Unstructured recommends that you use the [Unstructured Ingest Python library](/ingestion/python-ingest) instead.

The Ingest Python library provides faster processing of larger individual files, and faster and easier processing of multiple files at a time in batches.
</Warning>
22 changes: 11 additions & 11 deletions examplecode/tools/langflow.mdx
@@ -246,7 +246,7 @@ To do this, you can:

- [Use Unstructured Ingest to create a pipeline](/ingestion/overview) that relies on any available
[source connector](/ingestion/source-connectors/overview) to connect to
[Astra DB](/ingestion/destination-connector/astradb). Run this pipeline outside of Langflow anytime you have new documents in that non-local source location that
[Astra DB](/ingestion/destination-connectors/astradb). Run this pipeline outside of Langflow anytime you have new documents in that non-local source location that
you want Unstructured to process and then insert the new processed data into Astra DB. Then, back in the Langflow project,
use the **Playground** to ask additonal questions, which will now include the new data when generating answers.

@@ -274,16 +274,16 @@ Or, [use Unstructured Ingest to create a pipeline](/ingestion/overview) that rel
[source connector](/ingestion/source-connectors/overview) to connect to
one of the following available vector stores that Langflow also supports:

- [Chroma DB](/ingestion/destination-connector/chroma)
- [Couchbase](/ingestion/destination-connector/couchbase)
- [Elasticsearch](/ingestion/destination-connector/elasticsearch)
- [Milvus](/ingestion/destination-connector/milvus)
- [MongoDB](/ingestion/destination-connector/mongodb)
- [OpenSearch](/ingestion/destination-connector/opensearch)
- [Pinecone](/ingestion/destination-connector/pinecone)
- [Qdrant](/ingestion/destination-connector/qdrant)
- [Vectara](/ingestion/destination-connector/vectara)
- [Weaviate](/ingestion/destination-connector/weaviate)
- [Chroma DB](/ingestion/destination-connectors/chroma)
- [Couchbase](/ingestion/destination-connectors/couchbase)
- [Elasticsearch](/ingestion/destination-connectors/elasticsearch)
- [Milvus](/ingestion/destination-connectors/milvus)
- [MongoDB](/ingestion/destination-connectors/mongodb)
- [OpenSearch](/ingestion/destination-connectors/opensearch)
- [Pinecone](/ingestion/destination-connectors/pinecone)
- [Qdrant](/ingestion/destination-connectors/qdrant)
- [Vectara](/ingestion/destination-connectors/vectara)
- [Weaviate](/ingestion/destination-connectors/weaviate)

Run this pipeline outside of Langflow anytime you have new documents in the source location that
you want Unstructured to process and then insert the new processed data into the vector store. Then, back in the Langflow project,
6 changes: 3 additions & 3 deletions ingestion/ingest-cli.mdx
@@ -8,7 +8,7 @@ The Unstructured Ingest CLI enables you to use command-line scripts to send file
<Note>
The Unstructured Ingest CLI does not work with the Unstructured Platform API.

For information about the Unstructured Platform API, see the [Unstructured Platform API Overview](/platform/api/overview).
For information about the Unstructured Platform API, see the [Unstructured Platform API Overview](/platform-api/api/overview).
</Note>

## Installation
@@ -34,7 +34,7 @@ For additional installation options, see [Unstructured Ingest CLI](/ingestion/ov
To call the Unstructured Ingest CLI, follow this calling pattern, where:

- `<source>` is the command name for one of the available [source](/ingestion/source-connectors/overview) (input) connectors, such as `local` for a local source location, `azure` for an Azure Storage account source, `s3` for an Amazon S3 bucket source, and so on.
- `<destination>` is the command name for one of the available [destination](/ingestion/destination-connector/overview) (output) connectors, such as `local` for a local destination, `azure` for an Azure Storage account destination, `s3` for an Amazon S3 bucket destination, and so on.
- `<destination>` is the command name for one of the available [destination](/ingestion/destination-connectors/overview) (output) connectors, such as `local` for a local destination, `azure` for an Azure Storage account destination, `s3` for an Amazon S3 bucket destination, and so on.
- `<setting>` is one or more command-line options for specifying how and where Unstructured will ingest the files from the `<source>`, or how and where to deliver the processed data to the `<destination>`.

```bash CLI
@@ -51,6 +51,6 @@ unstructured-ingest \
--<settingN> <valueN>
```

To learn how to use the Unstructured Ingest CLI to work with a specific source (input) and destination (output) location, see the CLI script examples for the [source](/ingestion/source-connectors/overview) and [destination](/ingestion/destination-connector/overview) connectors that are available for you to choose from.
To learn how to use the Unstructured Ingest CLI to work with a specific source (input) and destination (output) location, see the CLI script examples for the [source](/ingestion/source-connectors/overview) and [destination](/ingestion/destination-connectors/overview) connectors that are available for you to choose from.

See also the [ingest configuration](/ingestion/ingest-configuration/overview) settings for command-line options that enable you to further control how batches are sent and processed.
2 changes: 1 addition & 1 deletion ingestion/ingest-dependencies.mdx
@@ -6,7 +6,7 @@ When you install the [Unstructured Ingest CLI](/ingestion/ingest-cli) and the
[Unstructured Ingest Python library](/ingestion/python-ingest) by running the command
`pip install unstructured-ingest` by itself, you get the following by default:

- The [local source connector](/ingestion/source-connectors/local) and the [local destination connector](/ingestion/destination-connector/local).
- The [local source connector](/ingestion/source-connectors/local) and the [local destination connector](/ingestion/destination-connectors/local).
- Support for the following file types:

| File type |
2 changes: 1 addition & 1 deletion ingestion/overview.mdx
@@ -127,7 +127,7 @@ import GeneratePythonCodeExamples from '/snippets/ingestion/code-generator.mdx';

- [Ingest configuration](/ingestion/ingest-configuration/overview) settings enable you to control how batches are sent and processed.
- [Source connectors](/ingestion/source-connectors/overview) enable you to send batches from local or remote locations to be ingested by Unstructured for processing.
- [Destination connectors](/ingestion/destination-connector/overview) enable Unstructured to send the processed data to local or remote locations.
- [Destination connectors](/ingestion/destination-connectors/overview) enable Unstructured to send the processed data to local or remote locations.

## See also

8 changes: 3 additions & 5 deletions ingestion/python-ingest.mdx
@@ -8,7 +8,7 @@ The Unstructured Ingest Python library enables you to use Python code to send fi
<Note>
The Unstructured Ingest Python library does not work with the Unstructured Platform API.

For information about the Unstructured Platform API, see the [Unstructured Platform API Overview](/platform/api/overview).
For information about the Unstructured Platform API, see the [Unstructured Platform API Overview](/platform-api/api/overview).
</Note>

The following 3-minute video shows how to use the Unstructured Ingest Python library to send multiple PDFs from a local directory in batches to be ingested by Unstructured for processing:
@@ -23,8 +23,6 @@ The following 3-minute video shows how to use the Unstructured Ingest Python lib
allowfullscreen
></iframe>

[Learn more](/overview#unstructured-ingest-python).

## Installation

One approach to get started quickly with the Unstructured Ingest Python library is to install Python and then run the following command:
@@ -39,7 +37,7 @@ import AdditionalIngestDependencies from '/snippets/general-shared-text/ingest-d

<AdditionalIngestDependencies />

For additional installation options, and information about v2 and v1 implementations in this library, see the [Unstructured Ingest Python library](/ingestion/overview#unstructured-ingest-python-library) in the [Ingest](/ingestion/overview) section.
For additional installation options, and information about v2 and v1 implementations in this library, see the [Unstructured Ingest Python library](/ingestion/python-ingest) in the [Ingest](/ingestion/overview) section.

<Info>To migrate from older, deprecated versions of the Ingest Python library that used `pip install unstructured`, see the [migration guide](/ingestion/overview#migration-guide).</Info>

@@ -58,6 +56,6 @@ import AzureAPIPyV1 from '/snippets/destination_connectors/azure.v1.py.mdx';

</CodeGroup>

To learn how to use the Unstructured Ingest Python library to work with a specific source (input) and destination (output) location, see the Python code examples for the [source](/ingestion/source-connectors/overview) and [destination](/ingestion/destination-connector/overview) connectors that are available for you to choose from.
To learn how to use the Unstructured Ingest Python library to work with a specific source (input) and destination (output) location, see the Python code examples for the [source](/ingestion/source-connectors/overview) and [destination](/ingestion/destination-connectors/overview) connectors that are available for you to choose from.

See also the [ingest configuration](/ingestion/ingest-configuration/overview) settings that enable you to further control how batches are sent and processed.
14 changes: 7 additions & 7 deletions meta-prompt/llms.txt
@@ -736,7 +736,7 @@ if __name__ == "__main__":
).run()
```

**Reference:** [Azure Destination Connector Documentation](https://docs.unstructured.io/ingestion/destination-connector/azure)
**Reference:** [Azure Destination Connector Documentation](https://docs.unstructured.io/ingestion/destination-connectors/azure)

2. DataBricks Volumes

@@ -882,7 +882,7 @@ if __name__ == "__main__":
).run()
```

**Reference:** [Databricks Volumes Destination Connector Documentation](https://docs.unstructured.io/ingestion/destination-connector/databricks-volumes)
**Reference:** [Databricks Volumes Destination Connector Documentation](https://docs.unstructured.io/ingestion/destination-connectors/databricks-volumes)

3. Weaviate

@@ -948,7 +948,7 @@ The Weaviate destination connector enables you to batch process and store struct
}
```
- [Schema Reference](https://weaviate.io/developers/weaviate/config-refs/schema)
- [Document Elements and Metadata](https://docs.unstructured.io/latform-api/partition-api/document-elements)
- [Document Elements and Metadata](https://docs.unstructured.io/platform-api/partition-api/document-elements)

---

@@ -1054,7 +1054,7 @@ if __name__ == "__main__":
).run()
```

**Reference:** [Weaviate Destination Connector Documentation](https://docs.unstructured.io/ingestion/destination-connector/weaviate)
**Reference:** [Weaviate Destination Connector Documentation](https://docs.unstructured.io/ingestion/destination-connectors/weaviate)

4. Pinecone

@@ -1184,7 +1184,7 @@ if __name__ == "__main__":
- Ensure the Pinecone schema aligns with the data structure produced by Unstructured for smooth ingestion.
- This example uses the local source connector; you can replace it with other supported connectors as needed.

**Reference:** [Pinecone Destination Connector Documentation](https://docs.unstructured.io/ingestion/destination-connector/pinecone)
**Reference:** [Pinecone Destination Connector Documentation](https://docs.unstructured.io/ingestion/destination-connectors/pinecone)

5. S3

@@ -1330,7 +1330,7 @@ if __name__ == "__main__":
- This example uses the local source connector; other connectors can be substituted.
- Use `--anonymous` for anonymous bucket access where applicable.

**Reference:** [S3 Destination Connector Documentation](https://docs.unstructured.io/ingestion/destination-connector/s3)
**Reference:** [S3 Destination Connector Documentation](https://docs.unstructured.io/ingestion/destination-connectors/s3)

# Unstructured Ingest Best Practices

@@ -1539,7 +1539,7 @@ Partitioning strategies in Unstructured are used to preprocess documents like PD

---

**Learn More**: [Document Elements and Metadata](https://docs.unstructured.io/latform-api/partition-api/document-elements)
**Learn More**: [Document Elements and Metadata](https://docs.unstructured.io/platform-api/partition-api/document-elements)

5. Tables as HTML

8 changes: 4 additions & 4 deletions meta-prompt/splits/2.txt
@@ -238,7 +238,7 @@ if __name__ == "__main__":
).run()
```

**Reference:** [Azure Destination Connector Documentation](https://docs.unstructured.io/ingestion/destination-connector/azure)
**Reference:** [Azure Destination Connector Documentation](https://docs.unstructured.io/ingestion/destination-connectors/azure)

2. DataBricks Volumes

@@ -384,7 +384,7 @@ if __name__ == "__main__":
).run()
```

**Reference:** [Databricks Volumes Destination Connector Documentation](https://docs.unstructured.io/ingestion/destination-connector/databricks-volumes)
**Reference:** [Databricks Volumes Destination Connector Documentation](https://docs.unstructured.io/ingestion/destination-connectors/databricks-volumes)

3. Weaviate

@@ -450,7 +450,7 @@ The Weaviate destination connector enables you to batch process and store struct
}
```
- [Schema Reference](https://weaviate.io/developers/weaviate/config-refs/schema)
- [Document Elements and Metadata](https://docs.unstructured.io/latform-api/partition-api/document-elements)
- [Document Elements and Metadata](https://docs.unstructured.io/platform-api/partition-api/document-elements)

---

@@ -556,4 +556,4 @@ if __name__ == "__main__":
).run()
```

**Reference:** [Weaviate Destination Connector Documentation](https://docs.unstructured.io/ingestion/destination-connector/weaviate)
**Reference:** [Weaviate Destination Connector Documentation](https://docs.unstructured.io/ingestion/destination-connectors/weaviate)
6 changes: 3 additions & 3 deletions meta-prompt/splits/3.txt
@@ -126,7 +126,7 @@ if __name__ == "__main__":
- Ensure the Pinecone schema aligns with the data structure produced by Unstructured for smooth ingestion.
- This example uses the local source connector; you can replace it with other supported connectors as needed.

**Reference:** [Pinecone Destination Connector Documentation](https://docs.unstructured.io/ingestion/destination-connector/pinecone)
**Reference:** [Pinecone Destination Connector Documentation](https://docs.unstructured.io/ingestion/destination-connectors/pinecone)

5. S3

@@ -272,7 +272,7 @@ if __name__ == "__main__":
- This example uses the local source connector; other connectors can be substituted.
- Use `--anonymous` for anonymous bucket access where applicable.

**Reference:** [S3 Destination Connector Documentation](https://docs.unstructured.io/ingestion/destination-connector/s3)
**Reference:** [S3 Destination Connector Documentation](https://docs.unstructured.io/ingestion/destination-connectors/s3)

# Unstructured Ingest Best Practices

@@ -481,7 +481,7 @@ Partitioning strategies in Unstructured are used to preprocess documents like PD

---

**Learn More**: [Document Elements and Metadata](https://docs.unstructured.io/latform-api/partition-api/document-elements)
**Learn More**: [Document Elements and Metadata](https://docs.unstructured.io/platform-api/partition-api/document-elements)

5. Tables as HTML

2 changes: 1 addition & 1 deletion open-source/core-functionality/embedding.mdx
@@ -4,7 +4,7 @@ title: Embedding

The Unstructured open-source library does not offer built-in support for calling embedding providers to obtain embeddings for pieces of text.

Alternatively, the [Unstructured Ingest CLI](/ingestion/overview#unstructured-ingest-cli) and the [Unstructured Ingest Python library](/ingestion/overview#unstructured-ingest-python-library)
Alternatively, the [Unstructured Ingest CLI](/ingestion/overview#unstructured-ingest-cli) and the [Unstructured Ingest Python library](/ingestion/python-ingest)
offer built-in support for calling embedding providers as part of an ingest pipeline. [Learn how](/platform-api/partition-api/embedding).

Also, you can use common third-party tools and libraries to get embeddings for document elements' text within JSON files that are
2 changes: 1 addition & 1 deletion open-source/core-functionality/staging.mdx
@@ -4,7 +4,7 @@ title: Staging

<Warning>

The `Staging` brick is being deprecated in favor of the new and more comprehensive `Destination Connectors`. To explore the complete list and usage, please refer to [Destination Connectors documentation](../ingest/destination-connectors/overview).
The `Staging` brick is being deprecated in favor of the new and more comprehensive `Destination Connectors`. To explore the complete list and usage, please refer to [Destination Connectors documentation](/ingestion/destination-connectors/overview).

Note: We are constantly expanding our collection of destination connectors. If you wish to request a specific Destination Connector, you’re encouraged to submit a Feature Request on the [Unstructured GitHub repository](https://github.com/Unstructured-IO/unstructured/issues/new/choose).
</Warning>
2 changes: 1 addition & 1 deletion open-source/introduction/overview.mdx
@@ -30,7 +30,7 @@ and use cases.

* [Chunking](/open-source/core-functionality/chunking): The chunking process in Unstructured is distinct from conventional methods. Instead of relying solely on text-based features to form chunks, Unstructured uses a deep understanding of document formats to partition documents into semantic units (document elements).

* **High-performant Connectors**: The platform includes optimized connectors for efficient data ingestion and output. These comprise [Source Connectors](../ingest/source-connectors/overview) for data input and [Destination Connectors](../ingest/destination-connectors/overview) for data export.
* **High-performant Connectors**: The platform includes optimized connectors for efficient data ingestion and output. These comprise [Source Connectors](/ingestion/source-connectors/overview) for data input and [Destination Connectors](/ingestion/destination-connectors/overview) for data export.


## Common use cases
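
The link fixes across these files reduce to a handful of path substitutions. As a rough illustration only, the sketch below applies the old-to-new path pairs visible in this diff to Markdown link targets. The `fix_links` helper, the `LINK_FIXES` table, and the link-matching regex are hypothetical and are not part of this repository or of any Unstructured tooling; the diff does not say how the changes were actually made.

```python
import re

# Old-path -> new-path pairs copied from the link changes shown in this diff.
# The table and the helper below are hypothetical illustrations.
LINK_FIXES = [
    ("/ingestion/destination-connector/", "/ingestion/destination-connectors/"),
    ("/ingestion/overview#unstructured-ingest-python-library", "/ingestion/python-ingest"),
    ("/platform-api/parition-api/", "/platform-api/partition-api/"),
    ("/latform-api/", "/platform-api/"),
    ("../ingest/destination-connectors/", "/ingestion/destination-connectors/"),
    ("../ingest/source-connectors/", "/ingestion/source-connectors/"),
]

# Matches the "](target)" part of a Markdown link and captures the target.
LINK_TARGET = re.compile(r"\]\(([^)\s]+)\)")


def fix_links(text: str) -> str:
    """Rewrite known-bad link targets; text outside link targets is untouched."""

    def _fix(match: re.Match) -> str:
        target = match.group(1)
        for old, new in LINK_FIXES:
            target = target.replace(old, new)
        return f"]({target})"

    return LINK_TARGET.sub(_fix, text)


if __name__ == "__main__":
    before = "[Astra DB](/ingestion/destination-connector/astradb)"
    print(fix_links(before))
```

Each old path ends in a delimiter that the corrected path does not contain at that position (for example, `connector/` versus `connectors/`), so running the helper on an already-fixed file should leave it unchanged.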