26 changes: 24 additions & 2 deletions api-reference/ingest/destination-connector/qdrant.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,28 @@
title: Qdrant
---

import SharedQdrant from '/snippets/dc-shared-text/qdrant.mdx';
import NewDocument from '/snippets/general-shared-text/new-document.mdx';

<NewDocument />

import SharedContentQdrant from '/snippets/dc-shared-text/qdrant-cli-api.mdx';
import SharedAPIKeyURL from '/snippets/general-shared-text/api-key-url.mdx';

<SharedContentQdrant/>
<SharedAPIKeyURL/>

Now call the Unstructured CLI or Python SDK. The source connector can be any of the supported source connectors.

This example uses the local source connector:

import QdrantAPISh from '/snippets/destination_connectors/qdrant.sh.mdx';
import QdrantAPIPyV2 from '/snippets/destination_connectors/qdrant.v2.py.mdx';
import QdrantAPIPyV1 from '/snippets/destination_connectors/qdrant.v1.py.mdx';

<CodeGroup>
<QdrantAPISh />
<QdrantAPIPyV2 />
<QdrantAPIPyV1 />
</CodeGroup>


<SharedQdrant />
26 changes: 24 additions & 2 deletions open-source/ingest/destination-connectors/qdrant.mdx
Expand Up @@ -2,6 +2,28 @@
title: Qdrant
---

import SharedQdrant from '/snippets/dc-shared-text/qdrant.mdx';
<NewDocument />

<SharedQdrant />
import SharedContentQdrant from '/snippets/dc-shared-text/qdrant-cli-api.mdx';

<SharedContentQdrant/>

Now call the Unstructured CLI or Python SDK. The source connector can be any of the supported source connectors.

This example uses the local source connector.

This example sends files to Unstructured API services for processing by default. To process files locally instead, see the instructions at the end of this page.

import QdrantAPISh from '/snippets/destination_connectors/qdrant.sh.mdx';
import QdrantAPIPyV2 from '/snippets/destination_connectors/qdrant.v2.py.mdx';
import QdrantAPIPyV1 from '/snippets/destination_connectors/qdrant.v1.py.mdx';

<CodeGroup>
<QdrantAPISh />
<QdrantAPIPyV2 />
<QdrantAPIPyV1 />
</CodeGroup>

import SharedPartitionByAPIOSS from '/snippets/ingest-configuration-shared/partition-by-api-oss.mdx';

<SharedPartitionByAPIOSS/>
9 changes: 9 additions & 0 deletions snippets/dc-shared-text/qdrant-cli-api.mdx
@@ -0,0 +1,9 @@
Batch process all your records to store structured outputs in Qdrant.

You will need:

import SharedQdrant from '/snippets/general-shared-text/qdrant.mdx';
import SharedQdrantCLIAPI from '/snippets/general-shared-text/qdrant-cli-api.mdx';

<SharedQdrant />
<SharedQdrantCLIAPI />
27 changes: 0 additions & 27 deletions snippets/dc-shared-text/qdrant.mdx

This file was deleted.

55 changes: 45 additions & 10 deletions snippets/destination_connectors/qdrant.sh.mdx
@@ -1,19 +1,54 @@
```bash Shell
```bash CLI
#!/usr/bin/env bash

# Chunking and embedding are optional.

# For Qdrant local:
unstructured-ingest \
local \
--input-path $LOCAL_FILE_INPUT_DIR \
--output-dir $LOCAL_FILE_OUTPUT_DIR \
--strategy hi_res \
--chunk-elements \
--chunking-strategy by_title \
--embedding-provider huggingface \
--num-processes 2 \
--verbose \
qdrant \
--collection-name $QDRANT_COLLECTION_NAME \
--location http://localhost:6333 \
--batch-size 80
--partition-by-api \
--api-key $UNSTRUCTURED_API_KEY \
--partition-endpoint $UNSTRUCTURED_API_URL \
--additional-partition-args="{\"split_pdf_page\":\"true\", \"split_pdf_allow_failed\":\"true\", \"split_pdf_concurrency_level\": 15}" \
qdrant-local \
--path $QDRANT_PATH \
--collection-name $QDRANT_COLLECTION \
--batch-size 50 \
--num-processes 1

# For Qdrant client-server:
unstructured-ingest \
local \
--input-path $LOCAL_FILE_INPUT_DIR \
--chunking-strategy by_title \
--embedding-provider huggingface \
--partition-by-api \
--api-key $UNSTRUCTURED_API_KEY \
--partition-endpoint $UNSTRUCTURED_API_URL \
--additional-partition-args="{\"split_pdf_page\":\"true\", \"split_pdf_allow_failed\":\"true\", \"split_pdf_concurrency_level\": 15}" \
qdrant-server \
--url $QDRANT_URL \
--collection-name $QDRANT_COLLECTION \
--batch-size 50 \
--num-processes 1

# For Qdrant Cloud:
unstructured-ingest \
local \
--input-path $LOCAL_FILE_INPUT_DIR \
--chunking-strategy by_title \
--embedding-provider huggingface \
--partition-by-api \
--api-key $UNSTRUCTURED_API_KEY \
--partition-endpoint $UNSTRUCTURED_API_URL \
--additional-partition-args="{\"split_pdf_page\":\"true\", \"split_pdf_allow_failed\":\"true\", \"split_pdf_concurrency_level\": 15}" \
qdrant-cloud \
--url $QDRANT_URL \
--api-key $QDRANT_API_KEY \
--collection-name $QDRANT_COLLECTION \
--batch-size 50 \
--num-processes 1
```
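The hand-escaped JSON passed to `--additional-partition-args` in the commands above is easy to get wrong. As a minimal sketch, the Qdrant Cloud command could instead be assembled in Python, letting `json.dumps` produce the JSON. The helper name `qdrant_cloud_command` and its parameters are hypothetical; the flags are copied from the CLI example, and the use of JSON booleans (as in the Python examples) rather than `"true"` strings is an assumption.

```python
import json
import shlex
from typing import List


def qdrant_cloud_command(
    input_dir: str,
    unstructured_api_key: str,
    unstructured_api_url: str,
    qdrant_url: str,
    qdrant_api_key: str,
    collection: str,
) -> List[str]:
    """Build the argument list for the Qdrant Cloud CLI example (hypothetical helper)."""
    # Assumption: JSON booleans, mirroring the Python examples, not "true" strings.
    partition_args = {
        "split_pdf_page": True,
        "split_pdf_allow_failed": True,
        "split_pdf_concurrency_level": 15,
    }
    return [
        "unstructured-ingest",
        "local",
        "--input-path", input_dir,
        "--chunking-strategy", "by_title",
        "--embedding-provider", "huggingface",
        "--partition-by-api",
        "--api-key", unstructured_api_key,
        "--partition-endpoint", unstructured_api_url,
        # json.dumps handles the quoting that the shell example escapes by hand.
        f"--additional-partition-args={json.dumps(partition_args)}",
        "qdrant-cloud",
        "--url", qdrant_url,
        "--api-key", qdrant_api_key,
        "--collection-name", collection,
        "--batch-size", "50",
        "--num-processes", "1",
    ]
```

Passing the returned list to `subprocess.run(command, check=True)` avoids shell quoting altogether, while `shlex.join(command)` renders it as a single shell-safe string for logging. Note that the rendered string contains the API keys.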
Original file line number Diff line number Diff line change
@@ -1,8 +1,11 @@
```python Python
```python Python Ingest v1
import os

from unstructured_ingest.connector.local import SimpleLocalConfig
from unstructured_ingest.connector.qdrant import (
    QdrantWriteConfig,
    SimpleQdrantConfig,
    QdrantAccessConfig,
)
from unstructured_ingest.interfaces import (
    ChunkingConfig,
Expand All @@ -15,12 +18,14 @@ from unstructured_ingest.runner import LocalRunner
from unstructured_ingest.runner.writers.base_writer import Writer
from unstructured_ingest.runner.writers.qdrant import QdrantWriter

# This example uses Qdrant Cloud.

def get_writer() -> Writer:
    return QdrantWriter(
        connector_config=SimpleQdrantConfig(
            location="http://localhost:6333",
            collection_name="test",
            url=os.getenv("QDRANT_URL"),
            access_config=QdrantAccessConfig(api_key=os.getenv("QDRANT_API_KEY")),
            collection_name=os.getenv("QDRANT_COLLECTION"),
        ),
        write_config=QdrantWriteConfig(batch_size=80),
    )
Expand All @@ -40,7 +45,15 @@ if __name__ == "__main__":
        ),
        read_config=ReadConfig(),
        partition_config=PartitionConfig(
            partition_by_api=True,
            api_key=os.getenv("UNSTRUCTURED_API_KEY"),
            partition_endpoint=os.getenv("UNSTRUCTURED_API_URL"),
            strategy="hi_res",
            additional_partition_args={
                "split_pdf_page": True,
                "split_pdf_allow_failed": True,
                "split_pdf_concurrency_level": 15
            }
        ),
        chunking_config=ChunkingConfig(chunk_elements=True),
        embedding_config=EmbeddingConfig(
Expand Down
99 changes: 99 additions & 0 deletions snippets/destination_connectors/qdrant.v2.py.mdx
@@ -0,0 +1,99 @@
```python Python Ingest v2
import os

from unstructured_ingest.v2.pipeline.pipeline import Pipeline
from unstructured_ingest.v2.interfaces import ProcessorConfig

from unstructured_ingest.v2.processes.connectors.local import (
    LocalIndexerConfig,
    LocalDownloaderConfig,
    LocalConnectionConfig
)
from unstructured_ingest.v2.processes.partitioner import PartitionerConfig
from unstructured_ingest.v2.processes.chunker import ChunkerConfig
from unstructured_ingest.v2.processes.embedder import EmbedderConfig

# For Qdrant local:
# from unstructured_ingest.v2.processes.connectors.qdrant.local import (
#     LocalQdrantConnectionConfig,
#     LocalQdrantAccessConfig,
#     LocalQdrantUploadStagerConfig,
#     LocalQdrantUploaderConfig
# )

# For Qdrant client-server:
# from unstructured_ingest.v2.processes.connectors.qdrant.server import (
#     ServerQdrantConnectionConfig,
#     ServerQdrantAccessConfig,
#     ServerQdrantUploadStagerConfig,
#     ServerQdrantUploaderConfig
# )

# For Qdrant Cloud:
from unstructured_ingest.v2.processes.connectors.qdrant.cloud import (
    CloudQdrantConnectionConfig,
    CloudQdrantAccessConfig,
    CloudQdrantUploadStagerConfig,
    CloudQdrantUploaderConfig
)

# Chunking and embedding are optional.

if __name__ == "__main__":
    Pipeline.from_configs(
        context=ProcessorConfig(),
        indexer_config=LocalIndexerConfig(input_path=os.getenv("LOCAL_FILE_INPUT_DIR")),
        downloader_config=LocalDownloaderConfig(),
        source_connection_config=LocalConnectionConfig(),
        partitioner_config=PartitionerConfig(
            partition_by_api=True,
            api_key=os.getenv("UNSTRUCTURED_API_KEY"),
            partition_endpoint=os.getenv("UNSTRUCTURED_API_URL"),
            additional_partition_args={
                "split_pdf_page": True,
                "split_pdf_allow_failed": True,
                "split_pdf_concurrency_level": 15
            }
        ),
        chunker_config=ChunkerConfig(chunking_strategy="by_title"),
        embedder_config=EmbedderConfig(embedding_provider="huggingface"),

        # For Qdrant local:
        # destination_connection_config=LocalQdrantConnectionConfig(
        #     access_config=LocalQdrantAccessConfig(),
        #     path=os.getenv("QDRANT_PATH")
        # ),
        # stager_config=LocalQdrantUploadStagerConfig(),
        # uploader_config=LocalQdrantUploaderConfig(
        #     collection_name=os.getenv("QDRANT_COLLECTION"),
        #     batch_size=50,
        #     num_processes=1
        # )

        # For Qdrant client-server:
        # destination_connection_config=ServerQdrantConnectionConfig(
        #     access_config=ServerQdrantAccessConfig(),
        #     url=os.getenv("QDRANT_URL")
        # ),
        # stager_config=ServerQdrantUploadStagerConfig(),
        # uploader_config=ServerQdrantUploaderConfig(
        #     collection_name=os.getenv("QDRANT_COLLECTION"),
        #     batch_size=50,
        #     num_processes=1
        # )

        # For Qdrant Cloud:
        destination_connection_config=CloudQdrantConnectionConfig(
            access_config=CloudQdrantAccessConfig(
                api_key=os.getenv("QDRANT_API_KEY")
            ),
            url=os.getenv("QDRANT_URL")
        ),
        stager_config=CloudQdrantUploadStagerConfig(),
        uploader_config=CloudQdrantUploaderConfig(
            collection_name=os.getenv("QDRANT_COLLECTION"),
            batch_size=50,
            num_processes=1
        )
    ).run()
```
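The three Qdrant variants in the Ingest v2 example follow one naming pattern: the module is `qdrant.local`, `qdrant.server`, or `qdrant.cloud`, and each class name carries the matching `Local`, `Server`, or `Cloud` prefix. The following sketch spells that pattern out so the variant can be chosen at runtime instead of by commenting code in and out. The helpers `qdrant_v2_names` and `load_qdrant_v2_configs` are hypothetical and not part of `unstructured_ingest`; the module paths and class names are the ones imported in the example above.

```python
import importlib
from typing import Dict, List, Tuple

# Deployment mode -> class-name prefix, as used in the v2 example's imports.
_PREFIXES = {"local": "Local", "server": "Server", "cloud": "Cloud"}
_SUFFIXES = ("ConnectionConfig", "AccessConfig", "UploadStagerConfig", "UploaderConfig")


def qdrant_v2_names(mode: str) -> Tuple[str, List[str]]:
    """Return the module path and config class names for a deployment mode."""
    if mode not in _PREFIXES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(_PREFIXES)}")
    module = f"unstructured_ingest.v2.processes.connectors.qdrant.{mode}"
    classes = [f"{_PREFIXES[mode]}Qdrant{suffix}" for suffix in _SUFFIXES]
    return module, classes


def load_qdrant_v2_configs(mode: str) -> Dict[str, type]:
    """Import the config classes for a mode (requires unstructured-ingest[qdrant])."""
    module_name, class_names = qdrant_v2_names(mode)
    module = importlib.import_module(module_name)
    return {name: getattr(module, name) for name in class_names}
```

Only `load_qdrant_v2_configs` needs the package to be installed; `qdrant_v2_names` is pure string handling and can be used to check a configuration value before anything is imported.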
20 changes: 20 additions & 0 deletions snippets/general-shared-text/qdrant-cli-api.mdx
@@ -0,0 +1,20 @@
The Qdrant connector dependencies:

```bash
pip install "unstructured-ingest[qdrant]"
```

import AdditionalIngestDependencies from '/snippets/general-shared-text/ingest-dependencies.mdx';

<AdditionalIngestDependencies />

The following environment variables:

- `QDRANT_COLLECTION` - The name of the target collection on the Qdrant local installation,
Qdrant server, or Qdrant Cloud cluster, represented by `--collection-name` (CLI) or `collection_name` (Python).
- For Qdrant local, `QDRANT_PATH` - The path to the local Qdrant installation, represented by `--path` (CLI) or `path` (Python).
- For Qdrant client-server, `QDRANT_URL` - The Qdrant server's URL, represented by `--url` (CLI) or `url` (Python).
- For Qdrant Cloud:

- `QDRANT_URL` - The Qdrant cluster's URL, represented by `--url` (CLI) or `url` (Python).
- `QDRANT_API_KEY` - The Qdrant API key, represented by `--api-key` (CLI) or `api_key` (Python).
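Because each deployment mode needs a different subset of these variables, it can help to check them before starting a run. The following is a minimal sketch using only the standard library; the function name `missing_qdrant_vars` and the mode labels `local`, `server`, and `cloud` are illustrative assumptions, while the variable names are the ones listed above.

```python
import os
from typing import List, Mapping, Optional

# Variables required per deployment mode, taken from the list above.
REQUIRED_VARS = {
    "local": ["QDRANT_COLLECTION", "QDRANT_PATH"],
    "server": ["QDRANT_COLLECTION", "QDRANT_URL"],
    "cloud": ["QDRANT_COLLECTION", "QDRANT_URL", "QDRANT_API_KEY"],
}


def missing_qdrant_vars(mode: str, env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required variables that are unset or empty for the given mode."""
    if mode not in REQUIRED_VARS:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(REQUIRED_VARS)}")
    # Default to the real process environment; a mapping can be passed for testing.
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS[mode] if not env.get(name)]
```

For example, calling `missing_qdrant_vars("cloud")` at the top of a script and exiting when the result is non-empty gives a clearer error than a connection failure partway through the pipeline.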
18 changes: 18 additions & 0 deletions snippets/general-shared-text/qdrant.mdx
@@ -0,0 +1,18 @@
The Qdrant prerequisites are as follows.

- The name of the target [collection](https://qdrant.tech/documentation/concepts/collections) on the Qdrant local installation,
Qdrant server, or Qdrant Cloud cluster.
- For [Qdrant local](https://github.com/qdrant/qdrant), the path to the local Qdrant installation, for example: `/qdrant/local`
- For [Qdrant client-server](https://qdrant.tech/documentation/quickstart/), the Qdrant server URL, for example: `http://localhost:6333`
- For [Qdrant Cloud](https://qdrant.tech/documentation/cloud-intro/):

- A [Qdrant account](https://cloud.qdrant.io/login).
- A [Qdrant cluster](https://qdrant.tech/documentation/cloud/create-cluster/).
- The cluster's URL. To get this URL, do the following:

1. Sign in to your Qdrant Cloud account.
2. On the sidebar, under **Dashboard**, click **Clusters**.
3. Click the cluster's name.
4. Note the value of the **Endpoint** field, for example: `https://<random-guid>.<region-id>.<cloud-provider>.cloud.qdrant.io`.

- A [Qdrant API key](https://qdrant.tech/documentation/cloud/authentication/#create-api-keys).
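A quick shape check can catch a mistyped cluster URL before it reaches the connector. This sketch assumes that Qdrant Cloud endpoints use `https` and a host ending in `.cloud.qdrant.io`, as in the example format above; the function name `looks_like_qdrant_cloud_url` is hypothetical, and the check does not confirm that the cluster exists or is reachable.

```python
from urllib.parse import urlparse


def looks_like_qdrant_cloud_url(url: str) -> bool:
    """Return True if the URL has the https scheme and a *.cloud.qdrant.io host."""
    parsed = urlparse(url)
    # hostname is lowercased and excludes any port, or is None for malformed input.
    host = parsed.hostname or ""
    return parsed.scheme == "https" and host.endswith(".cloud.qdrant.io")
```

A local or client-server URL such as `http://localhost:6333` fails this check by design, so it is only meaningful for the Qdrant Cloud case.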