Skip to content

Commit

Permalink
new doc
Browse files Browse the repository at this point in the history
  • Loading branch information
digoal committed Mar 20, 2019
1 parent f2278ea commit 63b2e2c
Show file tree
Hide file tree
Showing 3 changed files with 324 additions and 0 deletions.
322 changes: 322 additions & 0 deletions 201903/20190320_01.md
@@ -0,0 +1,322 @@
## PostgreSQL 数组里面的元素,模糊搜索,模糊查询,like,前后百分号,正则查询,倒排索引
### 作者
digoal

### 日期
2019-03-20

### 标签
PostgreSQL , 数组 , 元素模糊匹配 , 倒排索引 , gin

----
## 背景
PostgreSQL通过倒排索引,支持模糊查询。

[《PostgreSQL 模糊查询最佳实践 - (含单字、双字、多字模糊查询方法)》](../201704/20170426_01.md)

如果你使用了数组类型,那么在数组里面的元素如何实现模糊查询呢?

同样可以使用PG的GIN倒排索引。

## 例子
1、新建一个数据库,为了支持中文模糊查询,请务必不要使用lc_ctype='C'的数据库,这个无法改,不能支持中文模糊查询。

```
db4=# create database newdb1 with template template0 encoding 'UTF8' lc_collate 'C' lc_ctype 'en_US.utf8';
CREATE DATABASE
```

2、创建pg_trgm插件,支持模糊查询。

```
db4=# \c newdb1
You are now connected to database "newdb1" as user "postgres".
newdb1=# create extension pg_trgm;
CREATE EXTENSION
```

3、新建测试表,文本数组

```
newdb1=# create table t (i text[]);
CREATE TABLE
```

4、新建immutable函数,将数组转换为文本类型(因为表达式索引的表达式、函数、操作符等都必须是immutable的)

```
newdb1=# create or replace function imm_array_to_string (anyarray,text) returns text as $$
select array_to_string($1,$2);
$$ language sql strict immutable;
CREATE FUNCTION
```

5、创建索引

```
newdb1=# create index idx_t_1 on t using gin ( imm_array_to_string(i,',') gin_trgm_ops);
CREATE INDEX
```

6、模糊查询语法测试,已支持索引检索。

```
newdb1=# explain select * from t where imm_array_to_string(i,',') like '你好PostgreSQL%';
QUERY PLAN
------------------------------------------------------------------------------------
Bitmap Heap Scan on t (cost=14.35..22.33 rows=7 width=32)
Recheck Cond: (imm_array_to_string(i, ','::text) ~~ '你好PostgreSQL%'::text)
-> Bitmap Index Scan on idx_t_1 (cost=0.00..14.35 rows=7 width=0)
Index Cond: (imm_array_to_string(i, ','::text) ~~ '你好PostgreSQL%'::text)
(4 rows)
newdb1=# explain select * from t where imm_array_to_string(i,',') like '你好PostgreSQL%';
QUERY PLAN
------------------------------------------------------------------------------------
Bitmap Heap Scan on t (cost=668.78..1051.63 rows=282 width=312)
Recheck Cond: (imm_array_to_string(i, ','::text) ~~ '你好PostgreSQL%'::text)
-> Bitmap Index Scan on idx_t_1 (cost=0.00..668.71 rows=282 width=0)
Index Cond: (imm_array_to_string(i, ','::text) ~~ '你好PostgreSQL%'::text)
(4 rows)
Time: 0.462 ms
newdb1=# explain select * from t where imm_array_to_string(i,',') like '%你好PostgreSQL%';
QUERY PLAN
-------------------------------------------------------------------------------------
Bitmap Heap Scan on t (cost=557.68..940.53 rows=282 width=312)
Recheck Cond: (imm_array_to_string(i, ','::text) ~~ '%你好PostgreSQL%'::text)
-> Bitmap Index Scan on idx_t_1 (cost=0.00..557.61 rows=282 width=0)
Index Cond: (imm_array_to_string(i, ','::text) ~~ '%你好PostgreSQL%'::text)
(4 rows)
Time: 0.484 ms
newdb1=# explain select * from t where imm_array_to_string(i,',') like '%你好PostgreSQL';
QUERY PLAN
------------------------------------------------------------------------------------
Bitmap Heap Scan on t (cost=612.68..995.53 rows=282 width=312)
Recheck Cond: (imm_array_to_string(i, ','::text) ~~ '%你好PostgreSQL'::text)
-> Bitmap Index Scan on idx_t_1 (cost=0.00..612.61 rows=282 width=0)
Index Cond: (imm_array_to_string(i, ','::text) ~~ '%你好PostgreSQL'::text)
(4 rows)
Time: 0.473 ms
newdb1=# explain select * from t where imm_array_to_string(i,',') ilike '%你好PostgreSQL';
QUERY PLAN
-------------------------------------------------------------------------------------
Bitmap Heap Scan on t (cost=612.68..995.53 rows=282 width=312)
Recheck Cond: (imm_array_to_string(i, ','::text) ~~* '%你好PostgreSQL'::text)
-> Bitmap Index Scan on idx_t_1 (cost=0.00..612.61 rows=282 width=0)
Index Cond: (imm_array_to_string(i, ','::text) ~~* '%你好PostgreSQL'::text)
(4 rows)
Time: 0.918 ms
newdb1=# explain select * from t where imm_array_to_string(i,',') ilike '你好PostgreSQL';
QUERY PLAN
------------------------------------------------------------------------------------
Bitmap Heap Scan on t (cost=723.78..1106.63 rows=282 width=312)
Recheck Cond: (imm_array_to_string(i, ','::text) ~~* '你好PostgreSQL'::text)
-> Bitmap Index Scan on idx_t_1 (cost=0.00..723.71 rows=282 width=0)
Index Cond: (imm_array_to_string(i, ','::text) ~~* '你好PostgreSQL'::text)
(4 rows)
Time: 0.816 ms
newdb1=# explain select * from t where imm_array_to_string(i,',') ilike '你好%PostgreSQL';
QUERY PLAN
-------------------------------------------------------------------------------------
Bitmap Heap Scan on t (cost=612.68..995.53 rows=282 width=312)
Recheck Cond: (imm_array_to_string(i, ','::text) ~~* '你好%PostgreSQL'::text)
-> Bitmap Index Scan on idx_t_1 (cost=0.00..612.61 rows=282 width=0)
Index Cond: (imm_array_to_string(i, ','::text) ~~* '你好%PostgreSQL'::text)
(4 rows)
Time: 0.796 ms
```


### 包含任意元素:元素以xx开头
```
select * from t where imm_array_to_string(i,',') ~ '[,|^]d0dc1' limit 5;
不区分大小写
select * from t where imm_array_to_string(i,',') ~* '[,|^]d0dc1' limit 5;
```

### 包含任意元素:元素以xx结尾
```
select * from t where imm_array_to_string(i,',') ~ '19d0b[,|$]' limit 5;
不区分大小写
select * from t where imm_array_to_string(i,',') ~* '19d0b[,|$]' limit 5;
```

### 包含任意元素:元素内包含xx
```
select * from t where imm_array_to_string(i,',') ~ 'abce' limit 5;
不区分大小写
select * from t where imm_array_to_string(i,',') ~ 'abce' limit 5;
```

## 性能测试
1、创建函数,生成随机字符串数组

```
create or replace function gen_rand_arr(int) returns text[] as $$
select array(select md5(random()::text) from generate_series(1,$1));
$$ language sql strict;
```

```
newdb1=# select gen_rand_arr(5);
-[ RECORD 1 ]+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
gen_rand_arr | {9d193327808319b0d66c12e29eaa33f6,5d761dad5b6e886eb87480d86d8640fb,07856d9f11b156bde20b527192c00361,35895b270c7665afd34d62f58220a83a,309b4d117f07974cb938bab890a7f327}
```

2、写入大量数据

```
vi test.sql
insert into t select gen_rand_arr(8) from generate_series(1,100);
pgbench -M prepared -n -r -P 1 -f ./test.sql -c 8 -j 8 -T 120 newdb1
```

3、记录数

```
newdb1=# select count(*) from t;
count
---------
2816900
(1 row)
```

4、收集统计信息,将索引的pending list合并。

```
newdb1=# vacuum (analyze,verbose) t;
```

5、查询效率


### 包含任意元素:元素以xx开头

```
select * from t where imm_array_to_string(i,',') ~ '[,|^]d0dc1' limit 5;
newdb1=# explain select * from t where imm_array_to_string(i,',') ~ '[,|^]d0dc1' limit 5;
QUERY PLAN
------------------------------------------------------------------------------------
Limit (cost=224.38..231.17 rows=5 width=312)
-> Bitmap Heap Scan on t (cost=224.38..607.23 rows=282 width=312)
Recheck Cond: (imm_array_to_string(i, ','::text) ~ '[,|^]d0dc1'::text)
-> Bitmap Index Scan on idx_t_1 (cost=0.00..224.31 rows=282 width=0)
Index Cond: (imm_array_to_string(i, ','::text) ~ '[,|^]d0dc1'::text)
(5 rows)
newdb1=# select * from t where imm_array_to_string(i,',') ~ '[,|^]d0dc1' limit 5;
i
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
------------------------------
{1a2dfc55b7b988a2483c6243d8c2ecf9,d0dc1a12d874d6a93fe769982d0095d4,362c933460bf5a5c4920153b4dc7de69,cef755962c451ccb892ff23405763a14,82466b1477f74e136b4c0d38fcee703e,26e04120f3d16a3f6763e5d95bfa6069,10dccf2d7372f88437dbc501c2a7181d,2b9b
8fef33d8ed03178d77b7ab0eabe1}
{2a6beb43a3dcb0f880f564b402931ece,6f623041e6291ddef51c2b353ff9b5c3,d0dc13e47510225b130de68332a5cf82,5abb5ec622a3bcde0e28e8ca9df9b0b2,123440fe5c6d1000223aaa667219fb02,7f25ad33f3114ded94c86c12681450f5,fa6748e69665fa5c1b5b497457bd3e59,40f8
3894401e278d72ffe923cd106f69}
{5a826d56b29a6d35df7c026875f4d88c,ca998171aeadf1a9aa4ce5cf29216984,27cae196fbf43d352828271b236535d2,8d31bfd2dd50d0321de6d88a441ffdb0,fc405e0388c6420d647421cb31bc29d7,096af6a4e8b9e945392b8419ffa987b6,2243b26416b365ece9287f3c2251a1bd,d0dc
146dff328d79311b120efd91ce59}
{708d002fc1c940ae2f89b0dee1420f26,bcd71f398c104ed22b46afe551822f5c,d0dc1a1929ecf7df838012d40accf4dc,49627dd60c3639fcb5d4e5414a8e9417,5bf98eb3cf0f2740f8f25370972e63e8,df8b198bfe0fc1a465dc875fce4c7651,12525bdd07fcde399a5fe7673ba1f139,61f2
d94d7a710670f7115bab85502893}
{963c013b3569bf5fbe3f68eeb51117d4,8d572873fd6c828c50be6a6e30c9bba5,d0dc1c96169c37d05b4979e7a4195005,6195d7bebf0a92380c1896ac9f2ad9b6,1ef2fba453aa86e571b338c188ddfea6,683b48f7bfcccd030700f00e28430b07,b413909126a0f427efc7ed64d25c4dd3,6a9f
3ded525226d5d689adc6d611b9db}
(5 rows)
Time: 23.184 ms
```

### 包含任意元素:元素以xx结尾
```
select * from t where imm_array_to_string(i,',') ~ '19d0b[,|$]' limit 5;
newdb1=# explain select * from t where imm_array_to_string(i,',') ~ '19d0b[,|$]' limit 5;
QUERY PLAN
------------------------------------------------------------------------------------
Limit (cost=224.38..231.17 rows=5 width=312)
-> Bitmap Heap Scan on t (cost=224.38..607.23 rows=282 width=312)
Recheck Cond: (imm_array_to_string(i, ','::text) ~ '19d0b[,|$]'::text)
-> Bitmap Index Scan on idx_t_1 (cost=0.00..224.31 rows=282 width=0)
Index Cond: (imm_array_to_string(i, ','::text) ~ '19d0b[,|$]'::text)
(5 rows)
newdb1=# select * from t where imm_array_to_string(i,',') ~ '19d0b[,|$]' limit 5;
-[ RECORD 1 ]----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
i | {68e92a1eae2aab0283c003bf1abd5f90,f9dc4ce34dd4cac3fa8f26e816705227,41200ccb84d0dc364e0127b681010858,8636f55c5bfc09bbe3ccb2acefa19d0b,eb3579e6b44896bd04e1edade3f44de6,88a6c079f261d45f7ed9793bfd877bff,af4bc563d1be87bf9fb6953269fb7466,da73c8735908799ab1b168b24a7b73ab}
-[ RECORD 2 ]----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
i | {0661d60b8dffee5f51c6b6d7403aeceb,ba6434c35e17dc53e0c981136128a3d9,12ecc4a208c48bfc7870efcf01dc0a12,e9ab9565953b828e727dfb62222061d4,41d1d82a7a397de69d0e0436ea05d639,45f6564f866de964599850fc12119d0b,9e6415c9da0c83375926940684b330bd,a5981c3baf438e3bd66df816c3cd85fa}
-[ RECORD 3 ]----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
i | {dce1c900a5a4b8ff88694e4dd1f7727b,92d2f6b5e46139f854f0f0f795b04e6f,e6aae989ef89fcbe2aa045a250c19f0f,dea731b2a24ce4e91d9a5b83a0e35bd2,74aec9033979243c149b029a1c84948d,1f0021e7cd8ea1a32d480068d2aac632,bdb3d27aef3d8d9a8c87c52c25c19d0b,cf02176aa82cb369ba55de698047780e}
-[ RECORD 4 ]----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
i | {1866ab11ebacdc621c320f3376fe4b5f,efbcd77cb4f44e562bd19a6a50a38960,76cbcbc1c0f05ec4121ef20962baaaba,7ac92fb5785eda268b456675b7bf7dd9,1fb6e2ead6f5c9fa7959572d61e9a746,7b08eac90dcba3857a53eeeb43e19d0b,c1e7c7d29329243cf8c0a1bee25b44bf,40c27c32a3961f2b9182efbd6eb51ec5}
-[ RECORD 5 ]----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
i | {fd8103d4db72cdcfb4768febd61abeb4,b128ccb9e419f3b4be6c35b6e3b7ccd8,5e39686fd5c527e6d1905c31d1cabfa6,6e37d73b2838c5f74134a64d683b602b,1d6587aa90385e4f1f421cec319d4f7d,423b4ab4eeb87a15f3b164a8ecc69366,4a723c53547effc20c3a3b5916a19d0b,5d0db5a0b90ec54cc474456576156d87}
Time: 23.950 ms
```

### 包含任意元素:元素内包含xx
```
select * from t where imm_array_to_string(i,',') ~ 'abce' limit 5;
newdb1=# explain select * from t where imm_array_to_string(i,',') ~ 'abce' limit 5;
QUERY PLAN
--------------------------------------------------------------------------------
Limit (cost=113.28..120.07 rows=5 width=312)
-> Bitmap Heap Scan on t (cost=113.28..496.13 rows=282 width=312)
Recheck Cond: (imm_array_to_string(i, ','::text) ~ 'abce'::text)
-> Bitmap Index Scan on idx_t_1 (cost=0.00..113.21 rows=282 width=0)
Index Cond: (imm_array_to_string(i, ','::text) ~ 'abce'::text)
(5 rows)
newdb1=# select * from t where imm_array_to_string(i,',') ~ 'abce' limit 5;
i
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
------------------------------
{adfa0cebeef5e67f15d1428c1c42f9ec,95abce5cece413de8fbc2a90846e37bc,86b367a1d39b0de39b2259d368d7fe2e,02d1ffd0f89676139b88fcd1422bb9c5,374a49309103cfd1ff4deeb61d650e32,67da44f5ee6ee28f960d847c72bfdea5,78d39fab51e2ee91756c9c5dd955cc85,cca5
f52519170de192873ac5caf4a4c9}
{7e45e4c81c9f8a184f1dab58adce061a,281bf5ac39c801d65ce5e5939a43d852,42c4a6a28d851653001b24a5296803e2,21cad5d25b736c70369fabcec9336c50,be9260c156f7165781db45419c1b64a8,20ed1a34337781b7bfbd6714d0fe89a5,9df8459f5b68ba52dd20c96baef200a6,d2b5
052977df6c4377b963d9dfd80914}
{8f99bf47bdbeb6972e564b2ef0abce9f,e705fbcc909792a7ef9585e2a381ed6d,015eafcb49e4558ab99733b95c0a895b,34b2762f16a85f1a10bb4e1c21228591,2e86f833ca4f5208e84a57f0ef1e945e,f1262c8a852ab4e174627cf5e091f91c,888f4f36bfb0e4274e576f1afc831117,fe1c
3638c18cd54da49311c09a279e5d}
{95926e58ed85b5e40a748cf5894b1ed5,571a4aa15aad73036fd570356e16bd60,af23841c7dc79ce8c4a10d21e914f7fe,8306d637609dab1e6d8aced9d29cb172,c8ca176600bffa9de6f1d0e20c0aa490,f29522edfdbcf1b00d62a978ee1182fd,19d77522d9b1c85b419fdca00eb2320c,991d
b9143192dabcee56b0bd5370b523}
{e1378cb77c91879455302cda26cf8c84,0605fb91660d844acf975aa6a2501d91,1013e3eab01c7aea8f4f2ccd1dcffd4f,5f63e3543e8a0850a456c0d83d53f0f3,b3abceebb447b11314e841d26922dd59,27b6139b685aab9cfeb699d12eb536df,7cee2103fc14869cc7e49f41a191da54,9ca8
6b54ed3f97870f15ddcde27e4de3}
(5 rows)
Time: 28.734 ms
```

中文,单字,双字匹配,请参考

[《PostgreSQL 模糊查询最佳实践 - (含单字、双字、多字模糊查询方法)》](../201704/20170426_01.md)

## 参考
[《PostgreSQL 模糊查询最佳实践 - (含单字、双字、多字模糊查询方法)》](../201704/20170426_01.md)


<a rel="nofollow" href="http://info.flagcounter.com/h9V1" ><img src="http://s03.flagcounter.com/count/h9V1/bg_FFFFFF/txt_000000/border_CCCCCC/columns_2/maxflags_12/viewers_0/labels_0/pageviews_0/flags_0/" alt="Flag Counter" border="0" ></a>


## [digoal's 大量PostgreSQL文章入口](https://github.com/digoal/blog/blob/master/README.md "22709685feb7cab07d30f30387f0a9ae")


## [免费领取阿里云RDS PostgreSQL实例、ECS虚拟机](https://free.aliyun.com/ "57258f76c37864c6e6d23383d05714ea")

1 change: 1 addition & 0 deletions 201903/readme.md
Expand Up @@ -2,6 +2,7 @@

### 文章列表
----
##### 20190320_01.md [《PostgreSQL 数组里面的元素,模糊搜索,模糊查询,like,前后百分号,正则查询,倒排索引》](20190320_01.md)
##### 20190319_02.md [《PostgreSQL 10 CLogControlLock 等待事件分析与优化 - hint bit, freeze, autovacuum, 风暴》](20190319_02.md)
##### 20190319_01.md [《PostgreSQL 并行计算解说 汇总》](20190319_01.md)
##### 20190318_04.md [《PostgreSQL 复杂SQL优化器优化建议 - pg_plan_advsr》](20190318_04.md)
Expand Down

0 comments on commit 63b2e2c

Please sign in to comment.