PostgreSQL + FDW + vector 插件加速向量检索 - 在不确定世界寻找确定答案 (例如图像相似)

作者

digoal

日期

2022-03-02

背景

向量: 简单的理解是一个number数组, 非结构化对象(图像、视频、音频等)经过转换可以成为N维的number数组, 包含了非结构化对象的特征, 针对数组可以进行识别、相似等搜索(向量检索).

向量检索属于比较耗费CPU和IO的查询, 即使有索引也如此, 所以首先会想到的是怎么提升扩展性, 例如怎么用多机资源提高请求处理吞吐能力.

本文介绍PostgreSQL postgres_fdw和vector插件的组合使用. 加速图像识别相似搜索, 但是还有一些并行的问题, 建议再和postgresql社区沟通一下fdw的问题. 目前看来性能最好的还是dblink异步调用的方式.

当然, 如果单机用vector插件和向量索引就能满足你的需求, 就不需要看下去了.

需要提醒一下, 一定要看看vector的ivfflat原理, 在造数据时考虑实际数据集的特征, 不要完全随机, 也不要完全一样的数据. 在创建索引时分几个桶、查询时找几个中心点的桶进行筛选, 都需要注意.

其他方法亦可参考: citus, dblink异步, polardb for postgresql(https://github.com/ApsaraDB/PolarDB-for-PostgreSQL).

postgres_fdw 如何实现select并行

https://www.postgresql.org/docs/devel/postgres-fdw.html

1、关于SELECT语句WHERE条件的下推必须符合:

they use only data types, operators, and functions that are built-in
or belong to an extension that's listed in the foreign server's extensions option.
Operators and functions in such clauses must be IMMUTABLE as well.

2、支持fdw分区表

已支持

3、支持fdw分区异步查询, 并行多fdw server同时查询

《PostgreSQL 14 preview - FDW 支持异步执行接口, postgres_fdw 支持异步append - sharding 性能增强 - 未来将支持更多异步操作》
已支持, 但是不是所有请求都会开启async foreign scan. 详情请研究postgres_fdw代码.

4、支持分区并行开关

已支持

5、sort 下推

已支持

6、merge sort append

已支持

7、limit 下推或 fetch_size (默认是100)设置

已支持

例子

部署vector插件

git clone https://github.com/pgvector/pgvector  
  
export PATH=/Users/digoal/pg14/bin:$PATH  
export PGDATA=/Users/digoal/data14  
export PGUSER=postgres  
export PGPORT=1922  
  
USE_PGXS=1 make  
make install

由于是本机测试, 所以配置一下密码登录认证

pg_hba.conf  
host    all             all             127.0.0.1/32            md5  
  
pg_ctl reload

创建角色

postgres=# create role test login encrypted password 'test123';  
CREATE ROLE

创建几个数据库(真实场景使用多机上不同的实例, 本文测试采用同一个实例中不同的database来模拟).

create database db0;  
create database db1;  
create database db2;  
create database db3;  
create database db4;  
postgres=# grant all on database db0,db1,db2,db3,db4 to test;  
GRANT  


\c db0;  
create extension postgres_fdw;  
create extension vector ;  

\c db1  
create extension vector ;  
\c db2  
create extension vector ;  
\c db3  
create extension vector ;  
\c db4  
create extension vector ;

db0 为查询入口, 所以在db0创建入口表、postgres_fdw插件、vector插件和fdw分区表.

db0=# \c db0 test  
You are now connected to database "db0" as user "test".  
db0=>   
  
CREATE TABLE tbl (id int, c1 vector(32), c2 text, c3 timestamp) PARTITION BY hash (id);  
CREATE INDEX idx_tbl_1 ON tbl USING ivfflat (c1 vector_l2_ops);  
SELECT * FROM tbl ORDER BY c1 <-> '[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32]' LIMIT 5;

db1,db2,db3,db4 / test role 登录

CREATE TABLE tbl (id int, c1 vector(32), c2 text, c3 timestamp);  
CREATE INDEX idx_tbl_1 ON tbl USING ivfflat (c1 vector_l2_ops);

db0 , 创建foreign server 开启异步请求, 设置插件参数(使得op可以下推). 配置user mapping.

\c db0 postgres  
db0=# show port;  
 port   
------  
 1922  
(1 row)  
  
  
create server db1 FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host '127.0.0.1', dbname 'db1', port '1922', async_capable 'true', extensions 'vector', batch_size '200', use_remote_estimate 'true');  
create server db2 FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host '127.0.0.1', dbname 'db2', port '1922', async_capable 'true', extensions 'vector', batch_size '200', use_remote_estimate 'true');  
create server db3 FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host '127.0.0.1', dbname 'db3', port '1922', async_capable 'true', extensions 'vector', batch_size '200', use_remote_estimate 'true');  
create server db4 FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host '127.0.0.1', dbname 'db4', port '1922', async_capable 'true', extensions 'vector', batch_size '200', use_remote_estimate 'true');  
  
grant all on FOREIGN server db1,db2,db3,db4 to test;  
  
  
CREATE USER MAPPING FOR test SERVER db1 OPTIONS (user 'test', password 'test123');  
CREATE USER MAPPING FOR test SERVER db2 OPTIONS (user 'test', password 'test123');  
CREATE USER MAPPING FOR test SERVER db3 OPTIONS (user 'test', password 'test123');  
CREATE USER MAPPING FOR test SERVER db4 OPTIONS (user 'test', password 'test123');

db0, 创建fdw分区表, 参数参考postgres_fdw帮助手册, 不一定要完全参照本文, 一定要理解清楚配置的目的(例如返回量大fetch_size也可以调大, 返回量小可以调小).

\c db0 test  
  
CREATE FOREIGN TABLE tbl_0  
    PARTITION OF tbl FOR VALUES WITH ( MODULUS 4, REMAINDER 0)  
    SERVER db1 OPTIONS (schema_name 'public', table_name 'tbl', async_capable 'true', fetch_size '1');  
  
CREATE FOREIGN TABLE tbl_1  
    PARTITION OF tbl FOR VALUES WITH ( MODULUS 4, REMAINDER 1)  
    SERVER db2 OPTIONS (schema_name 'public', table_name 'tbl', async_capable 'true', fetch_size '1');  
  
CREATE FOREIGN TABLE tbl_2  
    PARTITION OF tbl FOR VALUES WITH ( MODULUS 4, REMAINDER 2)  
    SERVER db3 OPTIONS (schema_name 'public', table_name 'tbl', async_capable 'true', fetch_size '1');  
  
CREATE FOREIGN TABLE tbl_3  
    PARTITION OF tbl FOR VALUES WITH ( MODULUS 4, REMAINDER 3)  
    SERVER db4 OPTIONS (schema_name 'public', table_name 'tbl', async_capable 'true', fetch_size '1');

确认vector的欧式距离计算操作符对应的是immutable函数

\do+  
                                                        List of operators  
 Schema | Name | Left arg type | Right arg type |   Result type    |           Function            |         Description            
--------+------+---------------+----------------+------------------+-------------------------------+------------------------------  
 public | <#>  | vector        | vector         | double precision | vector_negative_inner_product |   
 public | <->  | vector        | vector         | double precision | l2_distance                   |   
 public | <=>  | vector        | vector         | double precision | cosine_distance               |   
  
  
postgres=# \df+ l2_distance  
                                                                               List of functions  
 Schema |    Name     | Result data type | Argument data types | Type | Volatility | Parallel |  Owner   | Security | Access privileges | Language | Source code | Description   
--------+-------------+------------------+---------------------+------+------------+----------+----------+----------+-------------------+----------+-------------+-------------  
 public | l2_distance | double precision | vector, vector      | func | immutable  | safe     | postgres | invoker  |                   | c        | l2_distance |   
(1 row)

查询一下影响fdw 分区并行的参数是否都开启了.

postgres=# show enable_gathermerge ;  
 enable_gathermerge   
--------------------  
 on  
(1 row)  
  
postgres=# show enable_async_append ;  
 enable_async_append   
---------------------  
 on  
(1 row)  
  
db0=> show enable_parallel_append ;  
 enable_parallel_append   
------------------------  
 on  
(1 row)

检查执行计划.

db0=> explain (verbose) SELECT * FROM tbl ORDER BY c1 <-> '[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32]' LIMIT 5;  
                                                                 QUERY PLAN                                                                   
--------------------------------------------------------------------------------------------------------------------------------------------  
 Limit  (cost=400.04..400.36 rows=5 width=84)  
   Output: tbl.id, tbl.c1, tbl.c2, tbl.c3, ((tbl.c1 <-> '[1,2,3]'::vector))  
   ->  Merge Append  (cost=400.04..610.20 rows=3276 width=84)  
         Sort Key: ((tbl.c1 <-> '[1,2,3]'::vector))  
         ->  Foreign Scan on public.tbl_0 tbl_1  (cost=100.00..140.26 rows=819 width=84)  
               Output: tbl_1.id, tbl_1.c1, tbl_1.c2, tbl_1.c3, (tbl_1.c1 <-> '[1,2,3]'::vector)  
               Remote SQL: SELECT id, c1, c2, c3 FROM public.tbl ORDER BY (c1 OPERATOR(public.<->) '[1,2,3]'::public.vector) ASC NULLS LAST  
         ->  Foreign Scan on public.tbl_1 tbl_2  (cost=100.00..140.26 rows=819 width=84)  
               Output: tbl_2.id, tbl_2.c1, tbl_2.c2, tbl_2.c3, (tbl_2.c1 <-> '[1,2,3]'::vector)  
               Remote SQL: SELECT id, c1, c2, c3 FROM public.tbl ORDER BY (c1 OPERATOR(public.<->) '[1,2,3]'::public.vector) ASC NULLS LAST  
         ->  Foreign Scan on public.tbl_2 tbl_3  (cost=100.00..140.26 rows=819 width=84)  
               Output: tbl_3.id, tbl_3.c1, tbl_3.c2, tbl_3.c3, (tbl_3.c1 <-> '[1,2,3]'::vector)  
               Remote SQL: SELECT id, c1, c2, c3 FROM public.tbl ORDER BY (c1 OPERATOR(public.<->) '[1,2,3]'::public.vector) ASC NULLS LAST  
         ->  Foreign Scan on public.tbl_3 tbl_4  (cost=100.00..140.26 rows=819 width=84)  
               Output: tbl_4.id, tbl_4.c1, tbl_4.c2, tbl_4.c3, (tbl_4.c1 <-> '[1,2,3]'::vector)  
               Remote SQL: SELECT id, c1, c2, c3 FROM public.tbl ORDER BY (c1 OPERATOR(public.<->) '[1,2,3]'::public.vector) ASC NULLS LAST  
 Query Identifier: -3107671033622996886  
(17 rows)  
  
  
db0=> explain (verbose) select count(*) from tbl;  
                                          QUERY PLAN                                             
-----------------------------------------------------------------------------------------------  
 Aggregate  (cost=951.95..951.96 rows=1 width=8)  
   Output: count(*)  
   ->  Append  (cost=100.00..917.82 rows=13652 width=0)  
         ->  Async Foreign Scan on public.tbl_0 tbl_1  (cost=100.00..212.39 rows=3413 width=0)  
               Remote SQL: SELECT NULL FROM public.tbl  
         ->  Async Foreign Scan on public.tbl_1 tbl_2  (cost=100.00..212.39 rows=3413 width=0)  
               Remote SQL: SELECT NULL FROM public.tbl  
         ->  Async Foreign Scan on public.tbl_2 tbl_3  (cost=100.00..212.39 rows=3413 width=0)  
               Remote SQL: SELECT NULL FROM public.tbl  
         ->  Async Foreign Scan on public.tbl_3 tbl_4  (cost=100.00..212.39 rows=3413 width=0)  
               Remote SQL: SELECT NULL FROM public.tbl  
 Query Identifier: -7696835127160622742  
(12 rows)

写入100万数据后再检查一下计划.

db0=> insert into tbl select generate_series(1,1000000), '[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32]', 'test', now();  
  
db1=> explain (analyze,verbose) select count(*) from tbl;  
                                                       QUERY PLAN                                                         
------------------------------------------------------------------------------------------------------------------------  
 Aggregate  (cost=8924.86..8924.87 rows=1 width=8) (actual time=51.989..51.991 rows=1 loops=1)  
   Output: count(*)  
   ->  Seq Scan on public.tbl  (cost=0.00..8300.89 rows=249589 width=0) (actual time=0.016..34.891 rows=249589 loops=1)  
         Output: id, c1, c2, c3  
 Query Identifier: 5443052778932622058  
 Planning Time: 2.148 ms  
 Execution Time: 52.154 ms  
(7 rows)  
  
db0=> explain (analyze,verbose) select count(*) from tbl;   
                                                                  QUERY PLAN                                                                     
-----------------------------------------------------------------------------------------------------------------------------------------------  
 Aggregate  (cost=951.95..951.96 rows=1 width=8) (actual time=528.009..528.016 rows=1 loops=1)  
   Output: count(*)  
   ->  Append  (cost=100.00..917.82 rows=13652 width=0) (actual time=6.367..467.490 rows=1000000 loops=1)  
         ->  Async Foreign Scan on public.tbl_0 tbl_1  (cost=100.00..212.39 rows=3413 width=0) (actual time=1.964..85.057 rows=249589 loops=1)  
               Remote SQL: SELECT NULL FROM public.tbl  
         ->  Async Foreign Scan on public.tbl_1 tbl_2  (cost=100.00..212.39 rows=3413 width=0) (actual time=1.430..82.288 rows=250376 loops=1)  
               Remote SQL: SELECT NULL FROM public.tbl  
         ->  Async Foreign Scan on public.tbl_2 tbl_3  (cost=100.00..212.39 rows=3413 width=0) (actual time=1.422..79.067 rows=249786 loops=1)  
               Remote SQL: SELECT NULL FROM public.tbl  
         ->  Async Foreign Scan on public.tbl_3 tbl_4  (cost=100.00..212.39 rows=3413 width=0) (actual time=1.557..76.345 rows=250249 loops=1)  
               Remote SQL: SELECT NULL FROM public.tbl  
 Query Identifier: 6515815319459192952  
 Planning Time: 2.976 ms  
 Execution Time: 582.248 ms  
(14 rows)

并不是50ms左右完成, 而且count没有下推.

db0=> SELECT * FROM tbl ORDER BY c1 <-> '[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32]' LIMIT 5;  
 id |                                            c1                                            |  c2  |             c3               
----+------------------------------------------------------------------------------------------+------+----------------------------  
  1 | [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32] | test | 2022-03-02 15:55:26.214479  
 12 | [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32] | test | 2022-03-02 15:55:26.214479  
 14 | [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32] | test | 2022-03-02 15:55:26.214479  
 16 | [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32] | test | 2022-03-02 15:55:26.214479  
 17 | [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32] | test | 2022-03-02 15:55:26.214479  
(5 rows)  
  
  
db1=> SELECT * FROM tbl ORDER BY c1 <-> '[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32]' LIMIT 5;  
 id |                                            c1                                            |  c2  |             c3               
----+------------------------------------------------------------------------------------------+------+----------------------------  
  1 | [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32] | test | 2022-03-02 15:55:26.214479  
 12 | [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32] | test | 2022-03-02 15:55:26.214479  
 14 | [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32] | test | 2022-03-02 15:55:26.214479  
 16 | [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32] | test | 2022-03-02 15:55:26.214479  
 17 | [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32] | test | 2022-03-02 15:55:26.214479  
(5 rows)  
  
  
db2=> SELECT * FROM tbl ORDER BY c1 <-> '[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32]' LIMIT 5;  
 id |                                            c1                                            |  c2  |             c3               
----+------------------------------------------------------------------------------------------+------+----------------------------  
  3 | [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32] | test | 2022-03-02 15:55:26.214479  
  5 | [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32] | test | 2022-03-02 15:55:26.214479  
  8 | [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32] | test | 2022-03-02 15:55:26.214479  
  9 | [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32] | test | 2022-03-02 15:55:26.214479  
 11 | [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32] | test | 2022-03-02 15:55:26.214479  
(5 rows)

发现了几个问题:

并不是所有请求都用了async foreign scan , 发起请求后需要等待返回第一批结果后再向第二个ftbl发送请求.

\c db1
insert into tbl select generate_series(1,1), '[2,3,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32]', 'test', now();  

\c db2
insert into tbl select generate_series(1,1), '[2,3,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32]', 'test', now();  


db0=> select * from
(select * from tbl_0
union all                                                                                                                                                                    select * from tbl_1 ) t                                                                                                                                                      order by c1 <-> '[2,3,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32]' limit 5;
   id   |                                            c1                                            |  c2  |             c3             
--------+------------------------------------------------------------------------------------------+------+----------------------------
      1 | [2,3,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32] | test | 2022-03-03 11:27:54.132848
 862019 | [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32] | test | 2022-03-03 11:11:52.261664
 862021 | [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32] | test | 2022-03-03 11:11:52.261664
 862024 | [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32] | test | 2022-03-03 11:11:52.261664
 862029 | [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32] | test | 2022-03-03 11:11:52.261664
(5 rows)

Time: 215.103 ms

db0=> select * from
(select * from tbl_0
union all
select * from tbl_1
union all
select * from tbl_2 ) t
order by c1 <-> '[2,3,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32]' limit 5;
   id   |                                            c1                                            |  c2  |             c3             
--------+------------------------------------------------------------------------------------------+------+----------------------------
      1 | [2,3,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32] | test | 2022-03-03 11:27:54.132848
      1 | [2,3,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32] | test | 2022-03-03 11:28:52.980465
 864133 | [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32] | test | 2022-03-03 11:11:52.261664
 864134 | [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32] | test | 2022-03-03 11:11:52.261664
 864136 | [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32] | test | 2022-03-03 11:11:52.261664
(5 rows)

Time: 348.644 ms

count虽然是build-in聚合函数, 并且是parallel safe的, 但是没有下推.

db0=> \df+ count
                                                                                                          List of functions
   Schema   | Name  | Result data type | Argument data types | Type | Volatility | Parallel |  Owner   | Security | Access privileges | Language |   Source code   |                           Description                           
------------+-------+------------------+---------------------+------+------------+----------+----------+----------+-------------------+----------+-----------------+-----------------------------------------------------------------
 pg_catalog | count | bigint           |                     | agg  | immutable  | safe     | postgres | invoker  |                   | internal | aggregate_dummy | number of input rows
 pg_catalog | count | bigint           | "any"               | agg  | immutable  | safe     | postgres | invoker  |                   | internal | aggregate_dummy | number of input rows for which the input expression is not null
(2 rows)

希望fdw的异步越来越完善: 《PostgreSQL 15 preview - postgres_fdw 支持更多异步, 增强基于FDW的sharding能力.例如 DML异步写入、分区表、union all、union》

参考

《PostgreSQL 多维、图像欧式距离、向量距离、向量相似查询优化 - cube,imgsmlr - 压缩、分段、异步并行》
《PostgreSQL 11 相似图像搜索插件 imgsmlr 性能测试与优化 3 - citus 8机128shard (4亿图像)》
《PostgreSQL 11 相似图像搜索插件 imgsmlr 性能测试与优化 2 - 单机分区表 (dblink 异步调用并行) (4亿图像)》
《PostgreSQL 11 相似图像搜索插件 imgsmlr 性能测试与优化 1 - 单机单表 (4亿图像)》
《PostgreSQL 相似搜索插件介绍大汇总 (cube,rum,pg_trgm,smlar,imgsmlr,pg_similarity) (rum,gin,gist)》
《PostgreSQL 开源高维向量相似搜索插件 vector - 关联阿里云rds pg pase, cube, 人脸识别》
《PostgreSQL 开源高维向量相似搜索插件 vector - 关联阿里云rds pg pase, cube, 人脸识别》
《PostgreSQL 应用开发解决方案最佳实践系列课程 - 7. 标签搜索和圈选、相似搜索和圈选、任意字段组合搜索和圈选系统》
《PostgreSQL 应用开发解决方案最佳实践系列课程 - 3. 人脸识别和向量相似搜索》
《PostgreSQL 文本相似搜索 - pg_trgm_pro - 包含则1, 不包含则计算token相似百分比》
《PostgreSQL 在资源搜索中的设计 - pase, smlar, pg_trgm - 标签+权重相似排序 - 标签的命中率排序》
《PostgreSQL 模糊查询、相似查询 (like '%xxx%') pg_bigm 比 pg_trgm 优势在哪?》
《PostgreSQL 向量相似推荐设计 - pase》
《社交、电商、游戏等推荐系统 (相似推荐) - 阿里云pase smlar索引方案对比》
《PostgreSQL ghtree实现的海明距离排序索引, 性能不错(模糊图像) - pg-knn_hamming - bit string 比特字符串相似度搜索》
《PostgreSQL bktree 索引using gist例子 - 海明距离检索 - 短文相似、模糊图像搜索 - bit string 比特字符串相似度搜索》
《阿里云PostgreSQL案例精选2 - 图像识别、人脸识别、相似特征检索、相似人群圈选》
《PostgreSQL+MySQL 联合解决方案 - 第12课视频 - 全文检索、中文分词、模糊查询、相似文本查询》
《PostgreSQL+MySQL 联合解决方案 - 第11课视频 - 多维向量相似搜索 - 图像识别、相似人群圈选等》
《PostgreSQL+MySQL 联合解决方案 - 第9课视频 - 实时精准营销(精准圈选、相似扩选、用户画像)》
《画像系统标准化设计 - PostgreSQL roaringbitmap, varbitx , 正向关系, 反向关系, 圈选, 相似扩选(向量相似扩选)》
《阿里云PostgreSQL 向量搜索、相似搜索、图像搜索插件 palaemon - ivfflat , hnsw , nsg , ssg》
《PostgreSQL 多维、图像欧式距离、向量距离、向量相似查询优化 - cube,imgsmlr - 压缩、分段、异步并行》
《PostgreSQL 相似人群圈选，人群扩选，向量相似使用实践 - cube》
《PostgreSQL 11 相似图像搜索插件 imgsmlr 性能测试与优化 3 - citus 8机128shard (4亿图像)》
《PostgreSQL 11 相似图像搜索插件 imgsmlr 性能测试与优化 2 - 单机分区表 (dblink 异步调用并行) (4亿图像)》
《PostgreSQL 11 相似图像搜索插件 imgsmlr 性能测试与优化 1 - 单机单表 (4亿图像)》
《PostgreSQL 相似搜索插件介绍大汇总 (cube,rum,pg_trgm,smlar,imgsmlr,pg_similarity) (rum,gin,gist)》
《Greenplum 轨迹相似(伴随分析)》
《PostgreSQL 相似文本检索与去重 - (银屑病怎么治？银屑病怎么治疗？银屑病怎么治疗好？银屑病怎么能治疗好？)》
《PostgreSQL 相似搜索分布式架构设计与实践 - dblink异步调用与多机并行(远程游标+记录 UDF实例)》
《PostgreSQL 相似搜索设计与性能 - 地址、QA、POI等文本毫秒级相似搜索实践》
《PostgreSQL 遗传学应用 - 矩阵相似距离计算 (欧式距离,...XX距离)》
《用PostgreSQL 做实时高效搜索引擎 - 全文检索、模糊查询、正则查询、相似查询、ADHOC查询》
《HTAP数据库 PostgreSQL 场景与性能测试之 17 - (OLTP) 数组相似查询》
《HTAP数据库 PostgreSQL 场景与性能测试之 16 - (OLTP) 文本特征向量 - 相似特征(海明...)查询》
《HTAP数据库 PostgreSQL 场景与性能测试之 13 - (OLTP) 字符串搜索 - 相似查询》
《海量数据,海明(simhash)距离高效检索(smlar) - 阿里云RDS PosgreSQL最佳实践 - bit string 比特字符串相似度搜索》
《17种文本相似算法与GIN索引 - pg_similarity》
《PostgreSQL结合余弦、线性相关算法在文本、图片、数组相似等领域的应用 - 3 rum, smlar应用场景分析》
《PostgreSQL结合余弦、线性相关算法在文本、图片、数组相似等领域的应用 - 2 smlar插件详解》
《PostgreSQL结合余弦、线性相关算法在文本、图片、数组相似等领域的应用 - 1 文本(关键词)分析理论基础 - TF(Term Frequency 词频)/IDF(Inverse Document Frequency 逆向文本频率)》
《导购系统 - 电商内容去重\内容筛选应用(实时识别转载\盗图\侵权?) - 文本、图片集、商品集、数组相似判定的优化和索引技术》
《从相似度算法谈起 - Effective similarity search in PostgreSQL》
《聊一聊双十一背后的技术 - 毫秒分词算啥, 试试正则和相似度》
《PostgreSQL 文本数据分析实践之 - 相似度分析》

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

20220302_01.md

20220302_01.md

PostgreSQL + FDW + vector 插件加速向量检索 - 在不确定世界寻找确定答案 (例如图像相似)

作者

日期

标签

背景

postgres_fdw 如何实现select并行

例子

参考

期望 PostgreSQL 增加什么功能?

PolarDB for PostgreSQL云原生分布式开源数据库

PostgreSQL 解决方案集合

德哥 / digoal's github - 公益是一辈子的事.

PolarDB 学习图谱: 训练营、培训认证、在线互动实验、解决方案、生态合作、写心得拿奖品

购买PolarDB云服务折扣活动进行中, 55元起

About 德哥

Files

20220302_01.md

Latest commit

History

20220302_01.md

File metadata and controls

PostgreSQL + FDW + vector 插件加速向量检索 - 在不确定世界寻找确定答案 (例如图像相似)

作者

日期

标签

背景

postgres_fdw 如何实现select并行

例子

参考

期望 PostgreSQL 增加什么功能?

PolarDB for PostgreSQL云原生分布式开源数据库

PostgreSQL 解决方案集合

德哥 / digoal's github - 公益是一辈子的事.

PolarDB 学习图谱: 训练营、培训认证、在线互动实验、解决方案、生态合作、写心得拿奖品

购买PolarDB云服务折扣活动进行中, 55元起

About 德哥