Offset-based pagination is slow #3473

Open
romshark opened this issue May 28, 2019 · 2 comments

@romshark
commented May 28, 2019

Since after-based pagination doesn't work (see #3472, #2744, and the forum), I had to fall back to offset-based pagination. I expected it to be slow, since offsets usually can't take advantage of indexes, but I'm not sure whether it's expected to be this slow.

  • What version of Dgraph are you using?

  • Have you tried reproducing the issue with latest release?

  • What is the hardware spec (RAM, OS)?

    • OS: Windows 10 (Linux in docker)
    • RAM: 64 GB
  • Steps to reproduce the issue (command/config used to run Dgraph).

    Post.id: string @index(hash) .
    Post.creation: dateTime .
    Post.title: string .
    Post.contents: string .
    
    • Fill the database with lots of data using this template:
    {
      set {
        _:post1 <Post.id> "00000000000000000000000000000006" .
        _:post1 <Post.title> "post 1" .
        _:post1 <Post.contents> "post 1 contents" .
        _:post1 <Post.creation> "2019-05-28T10:00:00+00:00" .
      }
    }
    

    (I used a dataset of 83,719 nodes)

    • Read the last 10 items of, say, 100k:
    {
      all(
        func: has(Post.id),
        orderasc: Post.id,
        first: 10,
        offset: 99990
      ) {
        uid
        Post.id
        Post.title
        Post.contents
        Post.creation
      }
    }
    
  • Expected behaviour and actual result.

    • expected: as I said, I expected offset to be slow. But since I had no other option left for pagination, I'd expect Dgraph to optimize this query using the hash index; otherwise it's pretty much impossible to make pagination fast.
    • actual: the query takes roughly 2.5 to 5 seconds!
@danielmai

Member

commented Jun 10, 2019

I don't think offset per se is what's slow here. Pagination (first, offset, after) is fairly cheap. According to the trace for the query you shared, run against ~100k Posts built from your example, most of the time is spent sorting. Here's a trace from Jaeger showing that sorting took 1.8 seconds.

[Screenshot: Jaeger UI, 2019-06-10]

Removing the sort criterion (orderasc: Post.id) from the query speeds it up significantly, from >2s down to 300ms. Most of the remaining time is taken up by has(), which doesn't use an index and iterates over the database. There might be some optimizations we can do here with sorting and pagination combined.
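
The split described above, a cheap page cut on top of an expensive sort, can be sketched outside Dgraph. The following is a minimal Python illustration, not Dgraph's implementation: `page_full_sort` and `page_partial_sort` are hypothetical helpers and the ids are made-up data. It contrasts sorting everything before applying first/offset with one possible combined approach that only orders the `offset + first` smallest entries.

```python
import heapq
import random


def page_full_sort(ids, first, offset):
    """Sort every id, then cut out the requested page."""
    return sorted(ids)[offset:offset + first]


def page_partial_sort(ids, first, offset):
    """Order only the offset + first smallest ids, then drop the offset."""
    smallest = heapq.nsmallest(offset + first, ids)
    return smallest[offset:]


# Hypothetical data: 1,000 zero-padded 32-character ids in random order.
random.seed(0)
ids = [f"{n:032d}" for n in random.sample(range(10_000), 1_000)]

page = page_partial_sort(ids, first=10, offset=20)
```

The partial sort only pays off when `offset + first` is small relative to the dataset. For a page near the end, as in the report (offset: 99990), it would still have to order almost every entry.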

@romshark

Author

commented Jun 10, 2019

@danielmai I understand, but how do we do pagination over a sorted dataset then? 😄

What if I wanted to serve a paginated list of 100k+ posts sorted by Post.creation and Post.id (since Post.creation isn't unique)? AFAIK there's no way to define your own index using a sorted edge, something like postListByCreationTime: uid @index(hash) @sort(Post.creation, Post.id), which would allow for fast offset-based pagination.
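
To illustrate what such a sorted edge would buy, here is a minimal in-memory Python sketch. `SortedPostIndex` and its methods are hypothetical names, not a Dgraph feature or API, and the sketch assumes creation timestamps are ISO 8601 strings with the same UTC offset, so that they order correctly as plain strings. Entries are kept sorted by (creation, id) at write time, so reading a page needs no sort at all.

```python
from bisect import bisect_right, insort


class SortedPostIndex:
    """Hypothetical stand-in for a pre-sorted edge; not a Dgraph API."""

    def __init__(self):
        # (creation, post_id) tuples, always kept in ascending order.
        self._keys = []

    def add(self, creation, post_id):
        # Pay the ordering cost once, at insert time.
        insort(self._keys, (creation, post_id))

    def page_by_offset(self, first, offset):
        # No sorting at read time: the page is a plain slice.
        return self._keys[offset:offset + first]

    def page_after(self, first, cursor=None):
        # Cursor-style page: binary-search for the last seen key.
        start = 0 if cursor is None else bisect_right(self._keys, cursor)
        return self._keys[start:start + first]


index = SortedPostIndex()
index.add("2019-05-28T10:00:00+00:00", "b")
index.add("2019-05-28T09:00:00+00:00", "c")
index.add("2019-05-28T10:00:00+00:00", "a")

first_page = index.page_by_offset(first=2, offset=0)
next_page = index.page_after(first=2, cursor=first_page[-1])
```

Because Post.id breaks ties between equal creation times, the order is total, and a cursor made of the last (creation, id) pair always resumes at a well-defined position.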
