Commit c859071

[Feature][Connector-V2] Support Qdrant sink and source connector (#7299)
1 parent 657fe69 commit c859071

24 files changed (+1440 -1 lines changed)

.github/workflows/labeler/label-scope-conf.yml

Lines changed: 8 additions & 1 deletion
@@ -257,6 +257,13 @@ activemq:
       - changed-files:
           - any-glob-to-any-file: seatunnel-connectors-v2/connector-activemq/**
           - all-globs-to-all-files: '!seatunnel-connectors-v2/connector-!(activemq)/**'
+
+qdrant:
+  - all:
+      - changed-files:
+          - any-glob-to-any-file: seatunnel-connectors-v2/connector-qdrant/**
+          - all-globs-to-all-files: '!seatunnel-connectors-v2/connector-!(qdrant)/**'
+
 typesense:
   - all:
       - changed-files:
@@ -285,4 +292,4 @@ sls:
   - all:
       - changed-files:
           - any-glob-to-any-file: seatunnel-connectors-v2/connector-sls/**
-          - all-globs-to-all-files: '!seatunnel-connectors-v2/connector-!(sls)/**'
+          - all-globs-to-all-files: '!seatunnel-connectors-v2/connector-!(sls)/**'

config/plugin_config

Lines changed: 1 addition & 0 deletions
@@ -88,5 +88,6 @@ connector-web3j
 connector-milvus
 connector-activemq
 connector-sls
+connector-qdrant
 connector-typesense
 connector-cdc-opengauss

docs/en/connector-v2/sink/Qdrant.md

Lines changed: 70 additions & 0 deletions
@@ -0,0 +1,70 @@
# Qdrant

> Qdrant Sink Connector

## Description

[Qdrant](https://qdrant.tech/) is a high-performance vector search engine and vector database.

This connector can be used to write data into a Qdrant collection.

## Data Type Mapping

| SeaTunnel Data Type | Qdrant Data Type |
|---------------------|------------------|
| TINYINT | INTEGER |
| SMALLINT | INTEGER |
| INT | INTEGER |
| BIGINT | INTEGER |
| FLOAT | DOUBLE |
| DOUBLE | DOUBLE |
| BOOLEAN | BOOL |
| STRING | STRING |
| ARRAY | LIST |
| FLOAT_VECTOR | DENSE_VECTOR |
| BINARY_VECTOR | DENSE_VECTOR |
| FLOAT16_VECTOR | DENSE_VECTOR |
| BFLOAT16_VECTOR | DENSE_VECTOR |
| SPARSE_FLOAT_VECTOR | SPARSE_VECTOR |

The value of the primary key column will be used as the point ID in Qdrant. If no primary key is present, a random UUID will be used.

## Options

| name | type | required | default value |
|-----------------|--------|----------|---------------|
| collection_name | string | yes | - |
| batch_size | int | no | 64 |
| host | string | no | localhost |
| port | int | no | 6334 |
| api_key | string | no | - |
| use_tls | bool | no | false |
| common-options | | no | - |

### collection_name [string]

The name of the Qdrant collection to write data to.

### batch_size [int]

The number of points sent in each upsert request to Qdrant.

### host [string]

The host name of the Qdrant instance. Defaults to "localhost".

### port [int]

The gRPC port of the Qdrant instance.

### api_key [string]

The API key to use for authentication if set.

### use_tls [bool]

Whether to use a TLS (SSL) connection. Required when using Qdrant Cloud (https).

### common options

Sink plugin common parameters. Please refer to [Sink Common Options](../sink-common-options.md) for details.
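
As an illustration only (this block is not part of the commit), a sink configuration built from the options listed in this file might look like the sketch below. The collection name `my_collection` is a placeholder, and the surrounding `sink { Qdrant { ... } }` wrapper assumes the usual SeaTunnel job layout, with the plugin name `Qdrant` taken from the `seatunnel.sink.Qdrant` entry in `plugin-mapping.properties`.

```hocon
sink {
  Qdrant {
    # Placeholder collection name (hypothetical); the only required option.
    collection_name = "my_collection"

    # Optional settings, shown here with their documented defaults.
    host = "localhost"
    port = 6334        # gRPC port of the Qdrant instance
    batch_size = 64    # batch size of each upsert request
    use_tls = false
  }
}
```

Leaving the optional settings out should be equivalent, since the values shown are the documented defaults.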

docs/en/connector-v2/source/Qdrant.md

Lines changed: 81 additions & 0 deletions
@@ -0,0 +1,81 @@
# Qdrant

> Qdrant Source Connector

## Description

[Qdrant](https://qdrant.tech/) is a high-performance vector search engine and vector database.

This connector can be used to read data from a Qdrant collection.

## Options

| name | type | required | default value |
|-----------------|--------|----------|---------------|
| collection_name | string | yes | - |
| schema | config | yes | - |
| host | string | no | localhost |
| port | int | no | 6334 |
| api_key | string | no | - |
| use_tls | bool | no | false |
| common-options | | no | - |

### collection_name [string]

The name of the Qdrant collection to read data from.

### schema [config]

The schema of the table to read data into.

For example:

```hocon
schema = {
  fields {
    age = int
    address = string
    some_vector = float_vector
  }
}
```

Each entry in Qdrant is called a point.

Columns of type `float_vector` are read from the vectors of each point; all other columns are read from the JSON payload associated with the point.

If a column is marked as the primary key, the ID of the Qdrant point is written into it. The column can be of type `string` or `int`, since Qdrant only [allows](https://qdrant.tech/documentation/concepts/points/#point-ids) positive integers and UUIDs as point IDs.

If the collection was created with a single default/unnamed vector, use `default_vector` as the vector name.

```hocon
schema = {
  fields {
    age = int
    address = string
    default_vector = float_vector
  }
}
```

The ID of the point in Qdrant will be written into the column which is marked as the primary key. It can be of type `int` or `string`.

### host [string]

The host name of the Qdrant instance. Defaults to "localhost".

### port [int]

The gRPC port of the Qdrant instance.

### api_key [string]

The API key to use for authentication if set.

### use_tls [bool]

Whether to use a TLS (SSL) connection. Required when using Qdrant Cloud (https).

### common options

Source plugin common parameters. Please refer to [Source Common Options](../source-common-options.md) for details.
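
As an illustration only (this block is not part of the commit), the options in this file could be combined into a source block roughly as sketched below. The host name, API key, collection name, and the `id` column are all placeholders. The `primaryKey { ... }` form is an assumption based on SeaTunnel's general schema syntax; this file only says that a column can be "marked as primary key" without showing how.

```hocon
source {
  Qdrant {
    # Placeholder values (hypothetical); replace with your own.
    collection_name = "my_collection"
    host = "my-cluster.example.com"
    port = 6334                  # gRPC port
    api_key = "<your-api-key>"
    use_tls = true               # required when using Qdrant Cloud (https)

    schema = {
      fields {
        id = string              # receives the point ID (see primaryKey below)
        age = int                # read from the point's JSON payload
        address = string         # read from the point's JSON payload
        some_vector = float_vector   # read from the point's vector named "some_vector"
      }
      # Assumed SeaTunnel schema syntax for marking the primary key column.
      primaryKey {
        name = "id"
        columnNames = [id]
      }
    }
  }
}
```

For a collection with a single unnamed vector, the vector column would be named `default_vector` instead of `some_vector`, as described under the `schema` option.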

docs/zh/connector-v2/sink/Qdrant.md

Lines changed: 68 additions & 0 deletions
@@ -0,0 +1,68 @@
# Qdrant

> Qdrant 数据连接器

[Qdrant](https://qdrant.tech/) 是一个高性能的向量搜索引擎和向量数据库。

该连接器可用于将数据写入 Qdrant 集合。

## 数据类型映射

| SeaTunnel 数据类型 | Qdrant 数据类型 |
|---------------------|---------------|
| TINYINT | INTEGER |
| SMALLINT | INTEGER |
| INT | INTEGER |
| BIGINT | INTEGER |
| FLOAT | DOUBLE |
| DOUBLE | DOUBLE |
| BOOLEAN | BOOL |
| STRING | STRING |
| ARRAY | LIST |
| FLOAT_VECTOR | DENSE_VECTOR |
| BINARY_VECTOR | DENSE_VECTOR |
| FLOAT16_VECTOR | DENSE_VECTOR |
| BFLOAT16_VECTOR | DENSE_VECTOR |
| SPARSE_FLOAT_VECTOR | SPARSE_VECTOR |

主键列的值将用作 Qdrant 中的点 ID。如果没有主键,则将使用随机 UUID。

## 选项

| 名称 | 类型 | 必填 | 默认值 |
|-----------------|--------|----|-----------|
| collection_name | string | 是 | - |
| batch_size | int | 否 | 64 |
| host | string | 否 | localhost |
| port | int | 否 | 6334 |
| api_key | string | 否 | - |
| use_tls | bool | 否 | false |
| common-options | | 否 | - |

### collection_name [string]

要写入数据的 Qdrant 集合的名称。

### batch_size [int]

每个 upsert 请求到 Qdrant 的批量大小。

### host [string]

Qdrant 实例的主机名。默认为 "localhost"。

### port [int]

Qdrant 实例的 gRPC 端口。

### api_key [string]

用于身份验证的 API 密钥(如果设置)。

### use_tls [bool]

是否使用 TLS(SSL)连接。如果使用 Qdrant 云(https),则需要。

### 通用选项

接收插件的通用参数,请参考 [Sink 通用选项](../sink-common-options.md) 了解详情。

docs/zh/connector-v2/source/Qdrant.md

Lines changed: 79 additions & 0 deletions
@@ -0,0 +1,79 @@
# Qdrant

> Qdrant 数据源连接器

[Qdrant](https://qdrant.tech/) 是一个高性能的向量搜索引擎和向量数据库。

该连接器可用于从 Qdrant 集合中读取数据。

## 选项

| 名称 | 类型 | 必填 | 默认值 |
|-----------------|--------|----|-----------|
| collection_name | string | 是 | - |
| schema | config | 是 | - |
| host | string | 否 | localhost |
| port | int | 否 | 6334 |
| api_key | string | 否 | - |
| use_tls | bool | 否 | false |
| common-options | | 否 | - |

### collection_name [string]

要从中读取数据的 Qdrant 集合的名称。

### schema [config]

要将数据读取到的表的模式。

例如:

```hocon
schema = {
  fields {
    age = int
    address = string
    some_vector = float_vector
  }
}
```

Qdrant 中的每个条目称为一个点。

`float_vector` 类型的列从每个点的向量中读取,其他列从与该点关联的 JSON 有效负载中读取。

如果列被标记为主键,Qdrant 点的 ID 将写入其中。它可以是 `"string"` 或 `"int"` 类型,因为 Qdrant 仅[允许](https://qdrant.tech/documentation/concepts/points/#point-ids)使用正整数和 UUID 作为点 ID。

如果集合是用单个默认/未命名向量创建的,请使用 `default_vector` 作为向量名称。

```hocon
schema = {
  fields {
    age = int
    address = string
    default_vector = float_vector
  }
}
```

Qdrant 中点的 ID 将写入标记为主键的列中。它可以是 `int` 或 `string` 类型。

### host [string]

Qdrant 实例的主机名。默认为 "localhost"。

### port [int]

Qdrant 实例的 gRPC 端口。

### api_key [string]

用于身份验证的 API 密钥(如果设置)。

### use_tls [bool]

是否使用 TLS(SSL)连接。如果使用 Qdrant 云(https),则需要。

### 通用选项

源插件的通用参数,请参考[源通用选项](../source-common-options.md)了解详情。

plugin-mapping.properties

Lines changed: 2 additions & 0 deletions
@@ -131,6 +131,8 @@ seatunnel.sink.ObsFile = connector-file-obs
 seatunnel.source.Milvus = connector-milvus
 seatunnel.sink.Milvus = connector-milvus
 seatunnel.sink.ActiveMQ = connector-activemq
+seatunnel.source.Qdrant = connector-qdrant
+seatunnel.sink.Qdrant = connector-qdrant
 seatunnel.source.Sls = connector-sls
 seatunnel.source.Typesense = connector-typesense
 seatunnel.sink.Typesense = connector-typesense
