Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
Showing
3 changed files
with
324 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,322 @@ | ||
## PostgreSQL 数组里面的元素,模糊搜索,模糊查询,like,前后百分号,正则查询,倒排索引 | ||
### 作者 | ||
digoal | ||
|
||
### 日期 | ||
2019-03-20 | ||
|
||
### 标签 | ||
PostgreSQL , 数组 , 元素模糊匹配 , 倒排索引 , gin | ||
|
||
---- | ||
## 背景 | ||
PostgreSQL通过倒排索引,支持模糊查询。 | ||
|
||
[《PostgreSQL 模糊查询最佳实践 - (含单字、双字、多字模糊查询方法)》](../201704/20170426_01.md) | ||
|
||
如果你使用了数组类型,那么在数组里面的元素如何实现模糊查询呢? | ||
|
||
同样可以使用PG的GIN倒排索引。 | ||
|
||
## 例子 | ||
1、新建一个数据库,为了支持中文模糊查询,请务必不要使用lc_ctype='C'的数据库,这个无法改,不能支持中文模糊查询。 | ||
|
||
``` | ||
db4=# create database newdb1 with template template0 encoding 'UTF8' lc_collate 'C' lc_ctype 'en_US.utf8'; | ||
CREATE DATABASE | ||
``` | ||
|
||
2、创建pg_trgm插件,支持模糊查询。 | ||
|
||
``` | ||
db4=# \c newdb1 | ||
You are now connected to database "newdb1" as user "postgres". | ||
newdb1=# create extension pg_trgm; | ||
CREATE EXTENSION | ||
``` | ||
|
||
3、新建测试表,文本数组 | ||
|
||
``` | ||
newdb1=# create table t (i text[]); | ||
CREATE TABLE | ||
``` | ||
|
||
4、新建immutable函数,将数组转换为文本类型(因为表达式索引的表达式、函数、操作符等都必须是immutable的) | ||
|
||
``` | ||
newdb1=# create or replace function imm_array_to_string (anyarray,text) returns text as $$ | ||
select array_to_string($1,$2); | ||
$$ language sql strict immutable; | ||
CREATE FUNCTION | ||
``` | ||
|
||
5、创建索引 | ||
|
||
``` | ||
newdb1=# create index idx_t_1 on t using gin ( imm_array_to_string(i,',') gin_trgm_ops); | ||
CREATE INDEX | ||
``` | ||
|
||
6、模糊查询语法测试,已支持索引检索。 | ||
|
||
``` | ||
newdb1=# explain select * from t where imm_array_to_string(i,',') like '你好PostgreSQL%'; | ||
QUERY PLAN | ||
------------------------------------------------------------------------------------ | ||
Bitmap Heap Scan on t (cost=14.35..22.33 rows=7 width=32) | ||
Recheck Cond: (imm_array_to_string(i, ','::text) ~~ '你好PostgreSQL%'::text) | ||
-> Bitmap Index Scan on idx_t_1 (cost=0.00..14.35 rows=7 width=0) | ||
Index Cond: (imm_array_to_string(i, ','::text) ~~ '你好PostgreSQL%'::text) | ||
(4 rows) | ||
newdb1=# explain select * from t where imm_array_to_string(i,',') like '你好PostgreSQL%'; | ||
QUERY PLAN | ||
------------------------------------------------------------------------------------ | ||
Bitmap Heap Scan on t (cost=668.78..1051.63 rows=282 width=312) | ||
Recheck Cond: (imm_array_to_string(i, ','::text) ~~ '你好PostgreSQL%'::text) | ||
-> Bitmap Index Scan on idx_t_1 (cost=0.00..668.71 rows=282 width=0) | ||
Index Cond: (imm_array_to_string(i, ','::text) ~~ '你好PostgreSQL%'::text) | ||
(4 rows) | ||
Time: 0.462 ms | ||
newdb1=# explain select * from t where imm_array_to_string(i,',') like '%你好PostgreSQL%'; | ||
QUERY PLAN | ||
------------------------------------------------------------------------------------- | ||
Bitmap Heap Scan on t (cost=557.68..940.53 rows=282 width=312) | ||
Recheck Cond: (imm_array_to_string(i, ','::text) ~~ '%你好PostgreSQL%'::text) | ||
-> Bitmap Index Scan on idx_t_1 (cost=0.00..557.61 rows=282 width=0) | ||
Index Cond: (imm_array_to_string(i, ','::text) ~~ '%你好PostgreSQL%'::text) | ||
(4 rows) | ||
Time: 0.484 ms | ||
newdb1=# explain select * from t where imm_array_to_string(i,',') like '%你好PostgreSQL'; | ||
QUERY PLAN | ||
------------------------------------------------------------------------------------ | ||
Bitmap Heap Scan on t (cost=612.68..995.53 rows=282 width=312) | ||
Recheck Cond: (imm_array_to_string(i, ','::text) ~~ '%你好PostgreSQL'::text) | ||
-> Bitmap Index Scan on idx_t_1 (cost=0.00..612.61 rows=282 width=0) | ||
Index Cond: (imm_array_to_string(i, ','::text) ~~ '%你好PostgreSQL'::text) | ||
(4 rows) | ||
Time: 0.473 ms | ||
newdb1=# explain select * from t where imm_array_to_string(i,',') ilike '%你好PostgreSQL'; | ||
QUERY PLAN | ||
------------------------------------------------------------------------------------- | ||
Bitmap Heap Scan on t (cost=612.68..995.53 rows=282 width=312) | ||
Recheck Cond: (imm_array_to_string(i, ','::text) ~~* '%你好PostgreSQL'::text) | ||
-> Bitmap Index Scan on idx_t_1 (cost=0.00..612.61 rows=282 width=0) | ||
Index Cond: (imm_array_to_string(i, ','::text) ~~* '%你好PostgreSQL'::text) | ||
(4 rows) | ||
Time: 0.918 ms | ||
newdb1=# explain select * from t where imm_array_to_string(i,',') ilike '你好PostgreSQL'; | ||
QUERY PLAN | ||
------------------------------------------------------------------------------------ | ||
Bitmap Heap Scan on t (cost=723.78..1106.63 rows=282 width=312) | ||
Recheck Cond: (imm_array_to_string(i, ','::text) ~~* '你好PostgreSQL'::text) | ||
-> Bitmap Index Scan on idx_t_1 (cost=0.00..723.71 rows=282 width=0) | ||
Index Cond: (imm_array_to_string(i, ','::text) ~~* '你好PostgreSQL'::text) | ||
(4 rows) | ||
Time: 0.816 ms | ||
newdb1=# explain select * from t where imm_array_to_string(i,',') ilike '你好%PostgreSQL'; | ||
QUERY PLAN | ||
------------------------------------------------------------------------------------- | ||
Bitmap Heap Scan on t (cost=612.68..995.53 rows=282 width=312) | ||
Recheck Cond: (imm_array_to_string(i, ','::text) ~~* '你好%PostgreSQL'::text) | ||
-> Bitmap Index Scan on idx_t_1 (cost=0.00..612.61 rows=282 width=0) | ||
Index Cond: (imm_array_to_string(i, ','::text) ~~* '你好%PostgreSQL'::text) | ||
(4 rows) | ||
Time: 0.796 ms | ||
``` | ||
|
||
|
||
### 包含任意元素:元素以xx开头 | ||
``` | ||
select * from t where imm_array_to_string(i,',') ~ '[,|^]d0dc1' limit 5; | ||
不区分大小写 | ||
select * from t where imm_array_to_string(i,',') ~* '[,|^]d0dc1' limit 5; | ||
``` | ||
|
||
### 包含任意元素:元素以xx结尾 | ||
``` | ||
select * from t where imm_array_to_string(i,',') ~ '19d0b[,|$]' limit 5; | ||
不区分大小写 | ||
select * from t where imm_array_to_string(i,',') ~* '19d0b[,|$]' limit 5; | ||
``` | ||
|
||
### 包含任意元素:元素内包含xx | ||
``` | ||
select * from t where imm_array_to_string(i,',') ~ 'abce' limit 5; | ||
不区分大小写 | ||
select * from t where imm_array_to_string(i,',') ~ 'abce' limit 5; | ||
``` | ||
|
||
## 性能测试 | ||
1、创建函数,生成随机字符串数组 | ||
|
||
``` | ||
create or replace function gen_rand_arr(int) returns text[] as $$ | ||
select array(select md5(random()::text) from generate_series(1,$1)); | ||
$$ language sql strict; | ||
``` | ||
|
||
``` | ||
newdb1=# select gen_rand_arr(5); | ||
-[ RECORD 1 ]+----------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ||
gen_rand_arr | {9d193327808319b0d66c12e29eaa33f6,5d761dad5b6e886eb87480d86d8640fb,07856d9f11b156bde20b527192c00361,35895b270c7665afd34d62f58220a83a,309b4d117f07974cb938bab890a7f327} | ||
``` | ||
|
||
2、写入大量数据 | ||
|
||
``` | ||
vi test.sql | ||
insert into t select gen_rand_arr(8) from generate_series(1,100); | ||
pgbench -M prepared -n -r -P 1 -f ./test.sql -c 8 -j 8 -T 120 newdb1 | ||
``` | ||
|
||
3、记录数 | ||
|
||
``` | ||
newdb1=# select count(*) from t; | ||
count | ||
--------- | ||
2816900 | ||
(1 row) | ||
``` | ||
|
||
4、收集统计信息,将索引的pending list合并。 | ||
|
||
``` | ||
newdb1=# vacuum (analyze,verbose) t; | ||
``` | ||
|
||
5、查询效率 | ||
|
||
|
||
### 包含任意元素:元素以xx开头 | ||
|
||
``` | ||
select * from t where imm_array_to_string(i,',') ~ '[,|^]d0dc1' limit 5; | ||
newdb1=# explain select * from t where imm_array_to_string(i,',') ~ '[,|^]d0dc1' limit 5; | ||
QUERY PLAN | ||
------------------------------------------------------------------------------------ | ||
Limit (cost=224.38..231.17 rows=5 width=312) | ||
-> Bitmap Heap Scan on t (cost=224.38..607.23 rows=282 width=312) | ||
Recheck Cond: (imm_array_to_string(i, ','::text) ~ '[,|^]d0dc1'::text) | ||
-> Bitmap Index Scan on idx_t_1 (cost=0.00..224.31 rows=282 width=0) | ||
Index Cond: (imm_array_to_string(i, ','::text) ~ '[,|^]d0dc1'::text) | ||
(5 rows) | ||
newdb1=# select * from t where imm_array_to_string(i,',') ~ '[,|^]d0dc1' limit 5; | ||
i | ||
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ||
------------------------------ | ||
{1a2dfc55b7b988a2483c6243d8c2ecf9,d0dc1a12d874d6a93fe769982d0095d4,362c933460bf5a5c4920153b4dc7de69,cef755962c451ccb892ff23405763a14,82466b1477f74e136b4c0d38fcee703e,26e04120f3d16a3f6763e5d95bfa6069,10dccf2d7372f88437dbc501c2a7181d,2b9b | ||
8fef33d8ed03178d77b7ab0eabe1} | ||
{2a6beb43a3dcb0f880f564b402931ece,6f623041e6291ddef51c2b353ff9b5c3,d0dc13e47510225b130de68332a5cf82,5abb5ec622a3bcde0e28e8ca9df9b0b2,123440fe5c6d1000223aaa667219fb02,7f25ad33f3114ded94c86c12681450f5,fa6748e69665fa5c1b5b497457bd3e59,40f8 | ||
3894401e278d72ffe923cd106f69} | ||
{5a826d56b29a6d35df7c026875f4d88c,ca998171aeadf1a9aa4ce5cf29216984,27cae196fbf43d352828271b236535d2,8d31bfd2dd50d0321de6d88a441ffdb0,fc405e0388c6420d647421cb31bc29d7,096af6a4e8b9e945392b8419ffa987b6,2243b26416b365ece9287f3c2251a1bd,d0dc | ||
146dff328d79311b120efd91ce59} | ||
{708d002fc1c940ae2f89b0dee1420f26,bcd71f398c104ed22b46afe551822f5c,d0dc1a1929ecf7df838012d40accf4dc,49627dd60c3639fcb5d4e5414a8e9417,5bf98eb3cf0f2740f8f25370972e63e8,df8b198bfe0fc1a465dc875fce4c7651,12525bdd07fcde399a5fe7673ba1f139,61f2 | ||
d94d7a710670f7115bab85502893} | ||
{963c013b3569bf5fbe3f68eeb51117d4,8d572873fd6c828c50be6a6e30c9bba5,d0dc1c96169c37d05b4979e7a4195005,6195d7bebf0a92380c1896ac9f2ad9b6,1ef2fba453aa86e571b338c188ddfea6,683b48f7bfcccd030700f00e28430b07,b413909126a0f427efc7ed64d25c4dd3,6a9f | ||
3ded525226d5d689adc6d611b9db} | ||
(5 rows) | ||
Time: 23.184 ms | ||
``` | ||
|
||
### 包含任意元素:元素以xx结尾 | ||
``` | ||
select * from t where imm_array_to_string(i,',') ~ '19d0b[,|$]' limit 5; | ||
newdb1=# explain select * from t where imm_array_to_string(i,',') ~ '19d0b[,|$]' limit 5; | ||
QUERY PLAN | ||
------------------------------------------------------------------------------------ | ||
Limit (cost=224.38..231.17 rows=5 width=312) | ||
-> Bitmap Heap Scan on t (cost=224.38..607.23 rows=282 width=312) | ||
Recheck Cond: (imm_array_to_string(i, ','::text) ~ '19d0b[,|$]'::text) | ||
-> Bitmap Index Scan on idx_t_1 (cost=0.00..224.31 rows=282 width=0) | ||
Index Cond: (imm_array_to_string(i, ','::text) ~ '19d0b[,|$]'::text) | ||
(5 rows) | ||
newdb1=# select * from t where imm_array_to_string(i,',') ~ '19d0b[,|$]' limit 5; | ||
-[ RECORD 1 ]---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ||
i | {68e92a1eae2aab0283c003bf1abd5f90,f9dc4ce34dd4cac3fa8f26e816705227,41200ccb84d0dc364e0127b681010858,8636f55c5bfc09bbe3ccb2acefa19d0b,eb3579e6b44896bd04e1edade3f44de6,88a6c079f261d45f7ed9793bfd877bff,af4bc563d1be87bf9fb6953269fb7466,da73c8735908799ab1b168b24a7b73ab} | ||
-[ RECORD 2 ]---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ||
i | {0661d60b8dffee5f51c6b6d7403aeceb,ba6434c35e17dc53e0c981136128a3d9,12ecc4a208c48bfc7870efcf01dc0a12,e9ab9565953b828e727dfb62222061d4,41d1d82a7a397de69d0e0436ea05d639,45f6564f866de964599850fc12119d0b,9e6415c9da0c83375926940684b330bd,a5981c3baf438e3bd66df816c3cd85fa} | ||
-[ RECORD 3 ]---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ||
i | {dce1c900a5a4b8ff88694e4dd1f7727b,92d2f6b5e46139f854f0f0f795b04e6f,e6aae989ef89fcbe2aa045a250c19f0f,dea731b2a24ce4e91d9a5b83a0e35bd2,74aec9033979243c149b029a1c84948d,1f0021e7cd8ea1a32d480068d2aac632,bdb3d27aef3d8d9a8c87c52c25c19d0b,cf02176aa82cb369ba55de698047780e} | ||
-[ RECORD 4 ]---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ||
i | {1866ab11ebacdc621c320f3376fe4b5f,efbcd77cb4f44e562bd19a6a50a38960,76cbcbc1c0f05ec4121ef20962baaaba,7ac92fb5785eda268b456675b7bf7dd9,1fb6e2ead6f5c9fa7959572d61e9a746,7b08eac90dcba3857a53eeeb43e19d0b,c1e7c7d29329243cf8c0a1bee25b44bf,40c27c32a3961f2b9182efbd6eb51ec5} | ||
-[ RECORD 5 ]---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ||
i | {fd8103d4db72cdcfb4768febd61abeb4,b128ccb9e419f3b4be6c35b6e3b7ccd8,5e39686fd5c527e6d1905c31d1cabfa6,6e37d73b2838c5f74134a64d683b602b,1d6587aa90385e4f1f421cec319d4f7d,423b4ab4eeb87a15f3b164a8ecc69366,4a723c53547effc20c3a3b5916a19d0b,5d0db5a0b90ec54cc474456576156d87} | ||
Time: 23.950 ms | ||
``` | ||
|
||
### 包含任意元素:元素内包含xx | ||
``` | ||
select * from t where imm_array_to_string(i,',') ~ 'abce' limit 5; | ||
newdb1=# explain select * from t where imm_array_to_string(i,',') ~ 'abce' limit 5; | ||
QUERY PLAN | ||
-------------------------------------------------------------------------------- | ||
Limit (cost=113.28..120.07 rows=5 width=312) | ||
-> Bitmap Heap Scan on t (cost=113.28..496.13 rows=282 width=312) | ||
Recheck Cond: (imm_array_to_string(i, ','::text) ~ 'abce'::text) | ||
-> Bitmap Index Scan on idx_t_1 (cost=0.00..113.21 rows=282 width=0) | ||
Index Cond: (imm_array_to_string(i, ','::text) ~ 'abce'::text) | ||
(5 rows) | ||
newdb1=# select * from t where imm_array_to_string(i,',') ~ 'abce' limit 5; | ||
i | ||
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ||
------------------------------ | ||
{adfa0cebeef5e67f15d1428c1c42f9ec,95abce5cece413de8fbc2a90846e37bc,86b367a1d39b0de39b2259d368d7fe2e,02d1ffd0f89676139b88fcd1422bb9c5,374a49309103cfd1ff4deeb61d650e32,67da44f5ee6ee28f960d847c72bfdea5,78d39fab51e2ee91756c9c5dd955cc85,cca5 | ||
f52519170de192873ac5caf4a4c9} | ||
{7e45e4c81c9f8a184f1dab58adce061a,281bf5ac39c801d65ce5e5939a43d852,42c4a6a28d851653001b24a5296803e2,21cad5d25b736c70369fabcec9336c50,be9260c156f7165781db45419c1b64a8,20ed1a34337781b7bfbd6714d0fe89a5,9df8459f5b68ba52dd20c96baef200a6,d2b5 | ||
052977df6c4377b963d9dfd80914} | ||
{8f99bf47bdbeb6972e564b2ef0abce9f,e705fbcc909792a7ef9585e2a381ed6d,015eafcb49e4558ab99733b95c0a895b,34b2762f16a85f1a10bb4e1c21228591,2e86f833ca4f5208e84a57f0ef1e945e,f1262c8a852ab4e174627cf5e091f91c,888f4f36bfb0e4274e576f1afc831117,fe1c | ||
3638c18cd54da49311c09a279e5d} | ||
{95926e58ed85b5e40a748cf5894b1ed5,571a4aa15aad73036fd570356e16bd60,af23841c7dc79ce8c4a10d21e914f7fe,8306d637609dab1e6d8aced9d29cb172,c8ca176600bffa9de6f1d0e20c0aa490,f29522edfdbcf1b00d62a978ee1182fd,19d77522d9b1c85b419fdca00eb2320c,991d | ||
b9143192dabcee56b0bd5370b523} | ||
{e1378cb77c91879455302cda26cf8c84,0605fb91660d844acf975aa6a2501d91,1013e3eab01c7aea8f4f2ccd1dcffd4f,5f63e3543e8a0850a456c0d83d53f0f3,b3abceebb447b11314e841d26922dd59,27b6139b685aab9cfeb699d12eb536df,7cee2103fc14869cc7e49f41a191da54,9ca8 | ||
6b54ed3f97870f15ddcde27e4de3} | ||
(5 rows) | ||
Time: 28.734 ms | ||
``` | ||
|
||
中文,单字,双字匹配,请参考 | ||
|
||
[《PostgreSQL 模糊查询最佳实践 - (含单字、双字、多字模糊查询方法)》](../201704/20170426_01.md) | ||
|
||
## 参考 | ||
[《PostgreSQL 模糊查询最佳实践 - (含单字、双字、多字模糊查询方法)》](../201704/20170426_01.md) | ||
|
||
|
||
<a rel="nofollow" href="http://info.flagcounter.com/h9V1" ><img src="http://s03.flagcounter.com/count/h9V1/bg_FFFFFF/txt_000000/border_CCCCCC/columns_2/maxflags_12/viewers_0/labels_0/pageviews_0/flags_0/" alt="Flag Counter" border="0" ></a> | ||
|
||
|
||
## [digoal's 大量PostgreSQL文章入口](https://github.com/digoal/blog/blob/master/README.md "22709685feb7cab07d30f30387f0a9ae") | ||
|
||
|
||
## [免费领取阿里云RDS PostgreSQL实例、ECS虚拟机](https://free.aliyun.com/ "57258f76c37864c6e6d23383d05714ea") | ||
|
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.