地理位置画像、连锁店圈人、地理围栏圈人、多地圈选、multi-polygon圈选、多点圈选

作者

digoal

日期

2019-01-16

背景

基于地理位置的圈选，例如我们有人的位置数据，有门店的位置数据，一家连锁店全国有1000家门店，如何快速的圈选所有门店附近的人群进行透视。

使用PostGIS和gis索引，以及PG的并行计算，可以实现高效率圈选。

如果还要加上其他非空间条件，使用btree_gist，组合搜索更快。

《HTAP数据库 PostgreSQL 场景与性能测试之 47 - (OLTP) 空间应用 - 高并发空间位置更新、多属性KNN搜索并测（含空间索引）末端配送、新零售类项目》

例子

我们有人的位置数据，有门店的位置数据，一家连锁店全国有1000家门店，如何快速的圈选所有门店附近的人群进行透视。

1、创建地理位置信息插件

create extension postgis;

2、用户位置

create unlogged table tbl1 (  
  uid int8 primary key, -- 用户ID  
  pos geometry  -- 用户位置  
);

3、门店位置

create unlogged table tbl2 (  
  bid int,   -- 商家ID  
  pos geometry  -- 商家某门店位置  
  cbid int,  -- 商家某门店ID  
);

4、用户位置索引，

create index idx_tbl1_1 on tbl1 using gist (pos);

5、门店表，商家ID索引

create index idx_tbl2_1 on tbl2 (bid);

6、生成用户位置

vi test1.sql  
\set uid random(1,2000000000)  
insert into tbl1 values (:uid, st_setsrid(st_makepoint(118+2*random(), 30+3*random()), 4326)) on conflict (uid) do nothing;

7、生成门店位置

vi test2.sql  
\set bid random(1,100000)  
insert into tbl2 values(:bid, st_setsrid(st_makepoint(118+2*random(), 30+3*random()), 4326));

8、最终数据量

12.8亿用户

pgbench -M prepared -n -r -P 1 -f ./test1.sql -c 32 -j 32 -t 40000000

10万商家，2亿门店，平均每个商家2000个门店

pgbench -M prepared -n -r -P 1 -f ./test2.sql -c 20 -j 20 -t 10000000

vacuum analyze tbl1;  
vacuum analyze tbl2;

9、查询某个商家所有门店附近的人群。

根据商家ID找到所有门店的地理位置，然后根据地理位置，圈出附近的人群，以下SQL，2179个门店，每个门店附近圈出100人，返回21.79万人，可以使用游标返回。

postgres=# explain analyze select tbl1.uid from tbl1,tbl2 where tbl2.bid=1 and st_dwithin(tbl1.pos, tbl2.pos, 0.005);  
                                                              QUERY PLAN                                                                 
---------------------------------------------------------------------------------------------------------------------------------------  
 Nested Loop  (cost=0.97..23056993.11 rows=336364 width=8) (actual time=0.145..191.189 rows=5347 loops=1)  
   ->  Index Scan using idx_tbl2_1 on tbl2  (cost=0.49..49.26 rows=2144 width=32) (actual time=0.037..4.888 rows=2179 loops=1)  
         Index Cond: (bid = 1)  
   ->  Index Scan using idx_tbl1_1 on tbl1  (cost=0.48..10754.17 rows=2 width=40) (actual time=0.070..0.084 rows=2 loops=2179)  
         Index Cond: (pos && st_expand(tbl2.pos, '0.0001'::double precision))  
         Filter: ((tbl2.pos && st_expand(pos, '0.0001'::double precision)) AND _st_dwithin(pos, tbl2.pos, '0.0001'::double precision))  
         Rows Removed by Filter: 1  
 Planning Time: 0.258 ms  
 Execution Time: 191.620 ms  
(9 rows)  
  
postgres=# explain analyze select tbl1.uid from tbl1,tbl2 where tbl2.bid=1 and st_dwithin(tbl1.pos, tbl2.pos, 0.0006);  
                                                              QUERY PLAN                                                                 
---------------------------------------------------------------------------------------------------------------------------------------  
 Nested Loop  (cost=0.97..23474312.47 rows=342457 width=8) (actual time=0.146..1356.814 rows=195721 loops=1)  
   ->  Index Scan using idx_tbl2_1 on tbl2  (cost=0.49..49.50 rows=2155 width=32) (actual time=0.031..8.108 rows=2179 loops=1)  
         Index Cond: (bid = 1)  
   ->  Index Scan using idx_tbl1_1 on tbl1  (cost=0.48..10892.93 rows=2 width=40) (actual time=0.057..0.609 rows=90 loops=2179)  
         Index Cond: (pos && st_expand(tbl2.pos, '0.0006'::double precision))  
         Filter: ((tbl2.pos && st_expand(pos, '0.0006'::double precision)) AND _st_dwithin(pos, tbl2.pos, '0.0006'::double precision))  
         Rows Removed by Filter: 28  
 Planning Time: 0.152 ms  
 Execution Time: 1367.205 ms  
(9 rows)

10、如果是返回UID进行推广，可以游标式返回。响应速度更快。

postgres=# begin;  
BEGIN  
postgres=# declare cur1 cursor for select tbl1.uid from tbl1,tbl2 where tbl2.bid=1 and st_dwithin(tbl1.pos, tbl2.pos, 0.0006);  
DECLARE CURSOR  
postgres=# \timing  
Timing is on.  
postgres=# fetch 10 from cur1;  
    uid       
------------  
  767330778  
  240805365  
 1753394423  
 1026446114  
   30318931  
  881053980  
   32654928  
 1420031417  
  162509026  
 1019355204  
(10 rows)  
  
Time: 0.385 ms  
  
postgres=# fetch 10 from cur1;  
    uid       
------------  
 1732633872  
 1966961936  
  710467223  
  323584211  
 1138058052  
  778209047  
 1384104188  
  353370951  
 1980956718  
  339078065  
(10 rows)  
  
Time: 0.201 ms