
Change comparisons in cursor-based pagination #4287

Closed
wants to merge 14 commits

Conversation

dullbananas (Collaborator)

Currently, posts are compared with the cursor with a condition like this:

(a >= 1) AND (a != 1 OR b >= 2) AND (a != 1 OR b != 2 OR c >= 3)

This PR changes it to:

(a > 1) OR (a = 1 AND b > 2) OR (a = 1 AND b = 2 AND c > 3)

This should make better use of the index. At worst it becomes 3 very efficient index scans whose results are combined. The 3 comparison groups closely resemble the examples of index-friendly conditions in the PostgreSQL manual. Because they are combined with OR, any rows that get scanned but excluded from the final result should come only from other filters that can't use the index, such as deleted = false.
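
As a concrete sketch (using the placeholder columns a, b, and c from above; the real sort keys are columns such as featured_community, comments, and published), the new condition has the classic keyset-pagination shape:

-- Minimal sketch of the new cursor condition; the cursor values (1, 2, 3)
-- and LIMIT 10 are illustrative. Each OR branch is prefix equality plus one
-- range predicate, which a b-tree index on (a, b, c) can satisfy directly.
SELECT *
FROM post_aggregates
WHERE (a > 1)
   OR (a = 1 AND b > 2)
   OR (a = 1 AND b = 2 AND c > 3)
ORDER BY a, b, c
LIMIT 10;

PostgreSQL can answer each branch with its own index scan and combine them with BitmapOr, which is the "3 very efficient index scans" case described above.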

@dullbananas dullbananas marked this pull request as ready for review December 18, 2023 03:16
@dullbananas dullbananas marked this pull request as draft December 18, 2023 03:16
dullbananas (Collaborator, Author)

query plan still dum

phiresky (Collaborator) commented Dec 18, 2023

Thanks so much for your effort. Did you try the tuple comparison? I don't think it should need many new indexes, because almost all the sorts are purely descending (featured_local DESC, scaled_rank DESC, published DESC); only the "old sort" has an ascending column in it, I think.
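
For reference, a tuple (row-value) comparison collapses the three OR branches into a single range predicate. A minimal sketch, assuming a uniformly descending sort and made-up cursor values:

-- Row-value comparison: one predicate instead of three OR branches, so the
-- planner can use a single index scan. It only works when every sort key
-- runs in the same direction. Cursor values here are made up.
SELECT *
FROM post_aggregates
WHERE (featured_local, scaled_rank, published)
    < (true, 0.5, '2023-12-18'::timestamptz)
ORDER BY featured_local DESC, scaled_rank DESC, published DESC
LIMIT 10;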

dullbananas (Collaborator, Author)

Query plan not dum. Me dum.

I increased the number of posts, and now it uses the index for all pagination filters (except post_id which is not in the index).

I tried tuple comparison on one of the sorts, and it uses 1 index scan instead of 3. That's simpler and might be somewhat faster, but it doesn't change the number of index-scanned rows. I don't think it's worth the ugly workarounds needed for tuple comparisons with mixed sorting directions (see the sketch after the query plan below), but I could make the pagination library I'm working on apply tuple comparison automatically whenever possible.

I tried it again without this PR but still with many posts, and confirmed that it previously couldn't use the index in this way, so this PR will improve performance a lot.

->  Bitmap Heap Scan on post_aggregates  (cost=13.07..62.03 rows=16 width=106) (actual time=0.260..1.821 rows=4690 loops=1)
		Recheck Cond: (((community_id = 14) AND (featured_community < false)) OR ((community_id = 14) AND (NOT featured_community) AND (comments < '0'::bigint)) OR ((community_id = 14) AND (NOT featured_community) AND (comments = '0'::bigint)))
		Filter: ((featured_community < false) OR ((comments < '0'::bigint) AND (NOT featured_community)) OR ((post_id < 4712) AND (NOT featured_community) AND (comments = '0'::bigint)))
		Rows Removed by Filter: 1310
		Heap Blocks: exact=117
		->  BitmapOr  (cost=13.07..13.07 rows=18 width=0) (actual time=0.247..0.248 rows=0 loops=1)
			->  Bitmap Index Scan on idx_post_aggregates_featured_community_score  (cost=0.00..4.40 rows=12 width=0) (actual time=0.003..0.003 rows=0 loops=1)
					Index Cond: ((community_id = 14) AND (featured_community < false))
			->  Bitmap Index Scan on idx_post_aggregates_featured_community_most_comments  (cost=0.00..4.36 rows=6 width=0) (actual time=0.002..0.002 rows=0 loops=1)
					Index Cond: ((community_id = 14) AND (featured_community = false) AND (comments < '0'::bigint))
			->  Bitmap Index Scan on idx_post_aggregates_featured_community_most_comments  (cost=0.00..4.29 rows=1 width=0) (actual time=0.242..0.242 rows=6000 loops=1)
					Index Cond: ((community_id = 14) AND (featured_community = false) AND (comments = '0'::bigint))
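
The "ugly stuff" needed for mixed sort directions might look something like this hypothetical workaround (not what this PR does): negate a numeric column so the whole tuple runs in one direction, paired with a matching expression index:

-- Hypothetical sketch for ORDER BY score DESC, published ASC: row comparison
-- requires a single direction, so negate the numeric column and sort
-- everything ascending. Index name and cursor values are illustrative.
CREATE INDEX idx_example_neg_score ON post_aggregates ((-score), published);

SELECT *
FROM post_aggregates
WHERE (-score, published) > (-100, '2023-12-18'::timestamptz)
ORDER BY (-score), published  -- same order as score DESC, published ASC
LIMIT 10;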

@dullbananas dullbananas reopened this Dec 19, 2023
@dullbananas dullbananas marked this pull request as ready for review December 19, 2023 00:38
phiresky (Collaborator) commented Dec 19, 2023

Just FYI, a bitmap index scan is not a "real" index scan (depending on definitions) and can be much worse than a normal index scan.

"The bitmap is one bit per heap page. The bitmap index scan sets the bits based on the heap page address that the index entry points to.

So when it goes to do the bitmap heap scan, it just does a linear table scan, reading the bitmap to see whether it should bother with a particular page or seek over it." https://dba.stackexchange.com/questions/119386/understanding-bitmap-heap-scan-and-bitmap-index-scan

To test, you should probably also put at least 1 million rows in your table, because otherwise the results may be completely different from the real world (PG knows scanning through 1,000 rows doesn't matter, so it just does it).
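
One way to seed that much data, as a rough sketch (the column list is simplified and illustrative, not the project's actual seeding), is generate_series:

-- Seed ~1M rows with distinct timestamps so EXPLAIN ANALYZE reflects
-- realistic planner behavior; columns shown are illustrative only.
INSERT INTO post_aggregates (post_id, community_id, comments, featured_community, published)
SELECT g, 1 + g % 50, g % 100, false, now() - g * interval '1 second'
FROM generate_series(1, 1000000) AS g;

ANALYZE post_aggregates;  -- refresh statistics so row estimates are sane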

dullbananas (Collaborator, Author) commented Dec 20, 2023

The index scans are oddly inconsistent. This is with 999999 posts, each with different timestamps:

->  Index Scan using idx_post_aggregates_featured_community_published on post_aggregates  (cost=0.42..77988.07 rows=311457 width=106) (actual time=5.691..5.701 rows=21 loops=1)
		Index Cond: (community_id = 2)
		Filter: ((featured_community < false) OR ((published < '2023-12-20 19:31:41.673269+00'::timestamp with time zone) AND (NOT featured_community)) OR ((post_id < 673364) AND (NOT featured_community) AND (published = '2023-12-20 19:31:41.673269+00'::timestamp with time zone)))
		Rows Removed by Filter: 19980

The command used (#4285):

scripts/db_perf.sh --read-post-pages 1000 --posts 1000000

phiresky (Collaborator)

An index scan with only community_id in the condition is still really bad, because it means reading all of the community's rows into memory. The Index Cond should be doing 99% of the filtering; the Filter should ideally be empty. You can tell by "Rows Removed by Filter" being the number of posts in community 2 (the one you're looking at) minus 10.

That was exactly why I added that prefetch/upper-bound function: PG was not able to understand the hot posts query well enough, and the index filters were making it fetch all of a community's posts into RAM. (Should be in the discussion in #3872.)

Also remember that the expensive queries are the ones for the subscribed feed, where it has to filter by multiple community IDs simultaneously.
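
To illustrate, here is a minimal sketch where the sort keys follow community_id in the index, so each branch of the cursor condition can land under Index Cond instead of Filter (the index name, literals, and page size are illustrative, not the PR's actual migration):

-- Illustrative index and query shape; not this PR's actual schema change.
CREATE INDEX idx_example_featured_published
    ON post_aggregates (community_id, featured_community DESC, published DESC);

SELECT *
FROM post_aggregates
WHERE community_id = 2
  AND (featured_community < true
       OR (featured_community = true
           AND published < '2023-12-20'::timestamptz))
ORDER BY featured_community DESC, published DESC
LIMIT 10;

-- The subscribed feed is the same shape with many communities, e.g.:
--   WHERE community_id = ANY (ARRAY[2, 14, 27]) AND (...)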

dullbananas (Collaborator, Author)

Replaced by #4320
