Merged
2 changes: 1 addition & 1 deletion api-reference/partition/chunking.mdx
@@ -3,7 +3,7 @@ title: Chunking strategies
---

Chunking functions use metadata and document elements detected with partition functions to split a document into
-appropriately-sized chunks for uses cases such as Retrieval Augmented Generation (RAG).
+appropriately-sized chunks for use cases such as retrieval-augmented generation (RAG).

If you are familiar with chunking methods that split long text documents into smaller chunks, you'll notice that
Unstructured methods slightly differ, since the partitioning step already divides an entire document into its structural elements.
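The element-based chunking this hunk describes — packing already-partitioned document elements into size-bounded chunks — can be sketched in plain Python. This is an illustrative greedy packer, not the `unstructured` library's actual chunking API; the helper name, the sample element list, and the 300-character budget are all invented for this example:

```python
# Illustrative sketch: combine pre-partitioned element texts into chunks
# no larger than a character budget (hypothetical helper, not the
# unstructured library's real chunking functions).

def chunk_elements(elements, max_characters=500):
    """Greedily pack element texts into chunks of at most max_characters."""
    chunks, current = [], ""
    for text in elements:
        candidate = (current + "\n\n" + text) if current else text
        if len(candidate) <= max_characters:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = text  # an oversized element becomes its own chunk
    if current:
        chunks.append(current)
    return chunks

# Pretend these came out of a partitioning step.
elements = ["Title: Chunking", "A paragraph about chunking." * 10, "Short note."]
chunks = chunk_elements(elements, max_characters=300)
print([len(c) for c in chunks])  # → [300]: all three elements fit one chunk
```

Because partitioning has already split the document into structural elements, the chunker only decides how to group them, rather than where to cut raw text.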
2 changes: 1 addition & 1 deletion api-reference/workflow/overview.mdx
@@ -3,7 +3,7 @@ title: Overview
---

The [Unstructured UI](/ui/overview) features a no-code user interface for transforming your unstructured data into data that is ready
-for Retrieval Augmented Generation (RAG).
+for retrieval-augmented generation (RAG).

The Unstructured Workflow Endpoint, part of the [Unstructured API](/api-reference/overview), enables a full range of partitioning, chunking, embedding, and
enrichment options for your files and data. It is designed to batch-process files and data in remote locations; send processed results to
2 changes: 1 addition & 1 deletion open-source/core-functionality/chunking.mdx
@@ -1,6 +1,6 @@
---
title: Chunking
-description: Chunking functions in `unstructured` use metadata and document elements detected with `partition` functions to post-process elements into more useful "chunks" for uses cases such as Retrieval Augmented Generation (RAG).
+description: Chunking functions in `unstructured` use metadata and document elements detected with `partition` functions to post-process elements into more useful "chunks" for use cases such as retrieval-augmented generation (RAG).
---

## Chunking Basics
2 changes: 1 addition & 1 deletion open-source/core-functionality/overview.mdx
@@ -14,4 +14,4 @@ After reading this section, you should understand the following:

* How to prepare data for downstream use cases using staging functions

-* How to chunk partitioned documents for use cases such as Retrieval Augmented Generation (RAG).
+* How to chunk partitioned documents for use cases such as retrieval-augmented generation (RAG).
2 changes: 1 addition & 1 deletion open-source/how-to/embedding.mdx
@@ -21,7 +21,7 @@ These vectors are stored or _embedded_ next to the data itself.

These vector embeddings allow _vector databases_ to more quickly and efficiently analyze and process these inherent
properties and relationships between data. For example, you can save the extracted text along with its embeddings in a _vector store_.
-When a user queries a retrieval augmented generation (RAG) application, the application can use a vector database to perform a similarity search in that vector store
+When a user queries a retrieval-augmented generation (RAG) application, the application can use a vector database to perform a similarity search in that vector store
and then return the documents whose embeddings are the closest to that user's query.

Learn more about [chunking](https://unstructured.io/blog/chunking-for-rag-best-practices) and
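The retrieval step this hunk describes — ranking stored texts by how close their embeddings are to a query's embedding — can be sketched with plain cosine similarity. The three-dimensional vectors and stored texts below are invented for illustration; a real application would get embeddings from an embedding provider and search a vector database:

```python
# Minimal sketch of similarity search over a tiny in-memory "vector store".
# The 3-dimensional embeddings are made up; real embeddings have hundreds
# or thousands of dimensions and come from an embedding model.
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

store = {
    "Chunking splits documents into pieces.": [0.9, 0.1, 0.0],
    "Embeddings are numeric vectors.":        [0.1, 0.9, 0.2],
    "RAG retrieves supporting documents.":    [0.2, 0.3, 0.9],
}

query_embedding = [0.15, 0.85, 0.25]  # pretend embedding of the user's query
ranked = sorted(store, key=lambda t: cosine_similarity(store[t], query_embedding),
                reverse=True)
print(ranked[0])  # → the text whose embedding points closest to the query's
```

A RAG application would then pass the top-ranked texts to the LLM as context for answering the query.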
2 changes: 1 addition & 1 deletion open-source/introduction/overview.mdx
@@ -35,7 +35,7 @@ and use cases.

* Pretraining models
* Fine-tuning models
-* Retrieval Augmented Generation (RAG)
+* Retrieval-augmented generation (RAG)
* Traditional ETL

<Note>GPU usage is not supported for the Unstructured open source library.</Note>
4 changes: 2 additions & 2 deletions snippets/concepts/glossary.mdx
@@ -36,11 +36,11 @@ High-level overview of available strategies and models in `Unstructured` library

LLMs, like GPT, are trained on vast amounts of data and can comprehend and generate human-like text. They have achieved state-of-the-art results across many NLP tasks and can be fine-tuned to cater to specific domains or requirements.

-## Retrieval augmented generation (RAG)
+## Retrieval-augmented generation (RAG)

Large language models (LLMs) like OpenAI’s ChatGPT and Anthropic’s Claude have revolutionized the AI landscape with their prowess. However, they inherently suffer from significant drawbacks. One major issue is their static nature, which means they’re “frozen in time.” Despite this, LLMs might often respond to newer queries with unwarranted confidence, a phenomenon known as “hallucination.” Such errors can be highly detrimental, mainly when these models serve critical real-world applications.

-Retrieval augmented generation (RAG) is a groundbreaking technique designed to counteract the limitations of foundational LLMs. By pairing an LLM with an RAG pipeline, we can enable users to access the underlying data sources that the model uses. This transparent approach ensures that an LLM’s claims can be verified for accuracy and builds a trust factor among users.
+Retrieval-augmented generation (RAG) is a groundbreaking technique designed to counteract the limitations of foundational LLMs. By pairing an LLM with a RAG pipeline, we can enable users to access the underlying data sources that the model uses. This transparent approach ensures that an LLM’s claims can be verified for accuracy and builds a trust factor among users.

Moreover, RAG offers a cost-effective solution. Instead of bearing the extensive computational and financial burdens of training custom models or fine-tuning existing ones, RAG can, in many situations, serve as a sufficient alternative. This reduction in resource consumption is particularly beneficial for organizations that need more means to develop and deploy foundational models from scratch.

2 changes: 1 addition & 1 deletion snippets/quickstarts/single-file-ui.mdx
@@ -132,7 +132,7 @@ import EnrichmentImagesTablesHiResOnly from '/snippets/general-shared-text/enric
allowfullscreen
></iframe>

-- Add a **Chunker** node after the **Partitioner** node, to chunk the partitioned data into smaller pieces for your retrieval augmented generation (RAG) applications.
+- Add a **Chunker** node after the **Partitioner** node, to chunk the partitioned data into smaller pieces for your retrieval-augmented generation (RAG) applications.
To do this, click the add (**+**) button to the right of the **Partitioner** node, and then click **Enrich > Chunker**. Click the new **Chunker** node and
specify its settings. For help, click the **FAQ** button in the **Chunker** node's pane. [Learn more about chunking and chunker settings](/ui/chunking).
- Add an **Enrichment** node after the **Chunker** node, to apply enrichments to the chunked data such as image summaries, table summaries, table-to-HTML transforms, and
2 changes: 1 addition & 1 deletion ui/embedding.mdx
@@ -9,7 +9,7 @@ These vectors are stored or _embedded_ next to the text itself. These vector emb
an _embedding provider_.

You typically save these embeddings in a _vector store_.
-When a user queries a retrieval augmented generation (RAG) application, the application can use a vector database to perform
+When a user queries a retrieval-augmented generation (RAG) application, the application can use a vector database to perform
a [similarity search](https://www.pinecone.io/learn/what-is-similarity-search/) in that vector store
and then return the items whose embeddings are the closest to that user's query.

2 changes: 1 addition & 1 deletion ui/overview.mdx
@@ -2,7 +2,7 @@
title: Overview
---

-The Unstructured user interface (UI) is a no-code user interface, pay-as-you-go platform for transforming your unstructured data into data that is ready for Retrieval Augmented Generation (RAG).
+The Unstructured user interface (UI) is a no-code, pay-as-you-go platform for transforming your unstructured data into data that is ready for retrieval-augmented generation (RAG).

<Tip>To start using the Unstructured UI right away, skip ahead to the [quickstart](/ui/quickstart).</Tip>

2 changes: 1 addition & 1 deletion welcome.mdx
@@ -5,7 +5,7 @@ sidebarTitle: Overview

![ETL plus for GenAI data banner](/img/welcome/ETL-For-GenAI-Data.png)

-Unstructured provides a platform and tools to ingest and process unstructured documents for Retrieval Augmented Generation (RAG) and model fine-tuning.
+Unstructured provides a platform and tools to ingest and process unstructured documents for retrieval-augmented generation (RAG), agentic AI, and model fine-tuning.

This 60-second video describes more about what Unstructured does and its benefits:
