Skip to content

Latest commit

 

History

History
209 lines (149 loc) · 8.66 KB

20210121_04.md

File metadata and controls

209 lines (149 loc) · 8.66 KB

PostgreSQL 用advisory lock 限制每一个分组中最多有多少条记录

作者

digoal

日期

2021-01-21

标签

PostgreSQL , 分组 , limit


背景

例如有一个表, 里面有一个gid字段, 要求每个gid value中最多有5条记录, 这个怎么实现呢?

写入的时候, 只能支持rc事务隔离级别, 同一个gid的写入操作互斥, 检查是满足这个gid只有n条记录, 不满足则不写入, 满足则写入, 写入完成释放锁.

为什么只能用rc隔离级别呢, 因为rr, ssi隔离级别有可能其他gid在你事务开启后写入, 但是你看到的还是一个比较老的快照, 以为还没有写入那么多条.

这个文档应该再说明一下, 仅仅支持rc模式, 或者在trigger中判断, 如果不是rc模式, 报错.

https://www.depesz.com/2021/01/08/how-to-limit-rows-to-at-most-n-per-category-fix/

advisory除了可以实现本例的功能, 还有很多能力用法:

《PostgreSQL 秒杀4种方法 - 增加 批量流式加减库存 方法》
《HTAP数据库 PostgreSQL 场景与性能测试之 30 - (OLTP) 秒杀 - 高并发单点更新》
《聊一聊双十一背后的技术 - 不一样的秒杀技术, 裸秒》
《PostgreSQL 秒杀场景优化》
《PostgreSQL 变态需求实现 - 堵塞式读, 不堵塞写 - 串行读,并行写 - advisory lock》
《Locking issue with concurrent DELETE / INSERT in PostgreSQL - 解法 advisory lock》
《advisory lock 实现高并发非堵塞式 业务锁》
《PostgreSQL 使用advisory lock实现行级读写堵塞》
《PostgreSQL 无缝自增ID的实现 - by advisory lock》
《PostgreSQL 使用advisory lock或skip locked消除行锁冲突, 提高几十倍并发更新效率》
《PostgreSQL SELECT 的高级用法(CTE, LATERAL, ORDINALITY, WINDOW, SKIP LOCKED, DISTINCT, GROUPING SETS, ...) - 珍藏级》
《快速入门PostgreSQL应用开发与管理 - 4 高级SQL用法》

A bit ago I wrote a blog post that was supposed to show how to keep number of rows in table to N per some category.

Unfortunately, I overlooked a problem related to concurrency.

As Mikhail wrote, we will get problems if we'd do operations from multiple connections, at once-ish.

Let's see it:

CREATE TABLE users (  
    id       text NOT NULL,  
    PRIMARY KEY (id)  
);  
INSERT INTO users (id) VALUES ('depesz');  
CREATE TABLE addresses (  
    id      int8 generated BY DEFAULT AS IDENTITY,  
    user_id text NOT NULL REFERENCES users (id),  
    PRIMARY KEY (id)  
);  
CREATE INDEX addresses_user_id ON addresses (user_id);  

Now, let's add limiting triggers, with limit being 3:

CREATE FUNCTION trg_check_addresses_per_user() RETURNS TRIGGER AS $$  
DECLARE  
    v_count int4;  
BEGIN  
    SELECT COUNT(*) INTO v_count FROM addresses WHERE user_id = NEW.user_id;  
    IF v_count >= 3 THEN  
        raise exception  
            'User % would have % addresses, but only 3 allowed.', NEW.user_id, v_count+1  
            USING ERRCODE = 'check_violation';  
    END IF;  
    RETURN NEW;  
END;  
$$ LANGUAGE plpgsql;  
CREATE TRIGGER trg_check_addresses_per_user  
    BEFORE INSERT OR UPDATE ON addresses  
    FOR EACH ROW EXECUTE FUNCTION trg_check_addresses_per_user();  

Now, I will need two separate psql sessions:

psql #1 psql #2
begin; begin;
insert into addresses (user_id) values (‘depesz'), (‘depesz'), (‘depesz');
insert into addresses (user_id) values (‘depesz'), (‘depesz'), (‘depesz');
commit; commit;

Nothing failed, and checking state shows:

$ SELECT * FROM addresses;  
 id | user_id   
----+---------  
  4 | depesz  
  5 | depesz  
  6 | depesz  
  7 | depesz  
  8 | depesz  
  9 | depesz  
(6 ROWS)  

The problem stems from the fact that while checking if there isn't too many rows, trigger can't see rows inserted by another connection in uncommitted transaction.

So, what can we do? We need some kind of locking.

First, simplistic approach, for our case, is to lock the user that we're inserting data for:

CREATE OR REPLACE FUNCTION trg_check_addresses_per_user() RETURNS TRIGGER AS $$  
DECLARE  
    v_count int4;  
BEGIN  
    PERFORM 1 FROM users WHERE id = NEW.user_id FOR UPDATE;  
    SELECT COUNT(*) INTO v_count FROM addresses WHERE user_id = NEW.user_id;  
    IF v_count >= 3 THEN  
        raise exception  
            'User % would have % addresses, but only 3 allowed.', NEW.user_id, v_count+1  
            USING ERRCODE = 'check_violation';  
    END IF;  
    RETURN NEW;  
END;  
$$ LANGUAGE plpgsql;  

Please note the line #5 – this is the locking.

So, let's redo the test. Of course I first had to truncate addresses, but let's see what happens then:

psql #1 psql #2
begin; begin;
insert into addresses (user_id) values (‘depesz'), (‘depesz'), (‘depesz');
insert into addresses (user_id) values (‘depesz'), (‘depesz'), (‘depesz');

Now, insert in 2nd psql session doesn't finish – it hangs waiting for lock on users.

When I issued commit; in psql #1, psql #2 showed:

depesz=*# INSERT INTO addresses (user_id) VALUES ('depesz'), ('depesz'), ('depesz');  
ERROR:  USER depesz would have 4 addresses, but ONLY 3 allowed.  
CONTEXT:  PL/pgSQL FUNCTION trg_check_addresses_per_user() line 8 at RAISE  

This is nice. Solves the problem. But – it will also block all other updates to row in users.

Luckily PostgreSQL has Advisory Locks. I wrote about them earlier, so I'll skip the theory and move to actual code.

CREATE OR REPLACE FUNCTION trg_check_addresses_per_user() RETURNS TRIGGER AS $$  
DECLARE  
    v_count int4;  
BEGIN  
    PERFORM pg_advisory_xact_lock( 123, hashtext( NEW.user_id ) );  
    SELECT COUNT(*) INTO v_count FROM addresses WHERE user_id = NEW.user_id;  
    IF v_count >= 3 THEN  
        raise exception  
            'User % would have % addresses, but only 3 allowed.', NEW.user_id, v_count+1  
            USING ERRCODE = 'check_violation';  
    END IF;  
    RETURN NEW;  
END;  
$$ LANGUAGE plpgsql;  

Functionally – it behaves the same way – INSERT from 2nd psql hangs until psql #1 will not finish transaction.

But the difference is that now I still can update users.

Advisory lock will get automatically removed when transaction ended, so you don't have to worry about it.

In case you wonder – value 123 was picked randomly. I prefer to use two value advisory locks, where first argument is “namespace", and second is what I want to lock. Since I am using text identifiers had to convert it to int somehow – hence the hashtext() function.

Also – the version that kept counts in users table was safe with regards to this issue, as update to users is locking updated row.

Hope it helps.

您的愿望将传达给PG kernel hacker、数据库厂商等, 帮助提高数据库产品质量和功能, 说不定下一个PG版本就有您提出的功能点. 针对非常好的提议,奖励限量版PG文化衫、纪念品、贴纸、PG热门书籍等,奖品丰富,快来许愿。开不开森.

digoal's wechat