Pagination using "after" doesn't respect sort order #3472

Open
romshark opened this issue May 28, 2019 · 4 comments

@romshark

commented May 28, 2019

The following issue is pretty much a clone of #2744 but with a different dataset and based on v1.0.14

  • What version of Dgraph are you using?

  • Have you tried reproducing the issue with latest release?

  • What is the hardware spec (RAM, OS)?

    • OS: Windows 10 (Linux in docker)
    • RAM: 64 GB
  • Steps to reproduce the issue (command/config used to run Dgraph).

    • Set the schema:
    Post.id: string @index(exact) .
    Post.creation: dateTime .
    Post.title: string .
    Post.contents: string .
    
    • Fill the database with test data:
    {
      set {
        _:post1 <Post.id> "00000000000000000000000000000006" .
        _:post1 <Post.title> "post 1" .
        _:post1 <Post.contents> "post 1 contents" .
        _:post1 <Post.creation> "2019-05-28T10:00:00+00:00" .
    
        _:post2 <Post.id> "00000000000000000000000000000005" .
        _:post2 <Post.title> "post 2" .
        _:post2 <Post.contents> "post 2 contents" .
        _:post2 <Post.creation> "2019-05-28T10:30:00+00:00" .
        
        _:post3 <Post.id> "00000000000000000000000000000004" .
        _:post3 <Post.title> "post 3" .
        _:post3 <Post.contents> "post 3 contents" .
        _:post3 <Post.creation> "2019-05-28T11:00:00+00:00" .
        
        _:post4 <Post.id> "00000000000000000000000000000003" .
        _:post4 <Post.title> "post 4" .
        _:post4 <Post.contents> "post 4 contents" .
        _:post4 <Post.creation> "2019-05-28T11:30:00+00:00" .
        
        _:post5 <Post.id> "00000000000000000000000000000002" .
        _:post5 <Post.title> "post 5" .
        _:post5 <Post.contents> "post 5 contents" .
        _:post5 <Post.creation> "2019-05-28T12:00:00+00:00" .
        
        _:post6 <Post.id> "00000000000000000000000000000001" .
        _:post6 <Post.title> "post 6" .
        _:post6 <Post.contents> "post 6 contents" .
        _:post6 <Post.creation> "2019-05-28T12:30:00+00:00" .
      }
    }
    
    • Read order of objects:
    {
      all(
        func: has(Post.id),
        orderasc: Post.id
      ) {
        uid
        Post.id
        Post.title
        Post.contents
        Post.creation
      }
    }
    

    (my results: 0x7, 0xc, 0xb, 0xa, 0x9, 0x8)

    • Try to read a page of 3 posts after the third:
    {
      all(
        func: has(Post.id),
        orderasc: Post.id,
        first: 3,
        after: 0xb
      ) {
        uid
        Post.id
        Post.title
        Post.contents
        Post.creation
      }
    }
    
  • Expected behaviour and actual result.

    • expected: 0xa, 0x9, 0x8
    • actual: 0xc
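
The gap between the expected and actual pages can be reproduced with a small model. The sketch below is only an assumption about the behaviour, not Dgraph's actual implementation: it supposes that `after` keeps nodes whose UID is numerically greater than the given one and that the sort order is applied to whatever remains. The function names are hypothetical; the UID list is the sorted order reported above.

```python
# UIDs in ascending Post.id order, as reported in this issue.
SORTED_UIDS = [0x7, 0xC, 0xB, 0xA, 0x9, 0x8]


def page_after_uid(sorted_uids, after, first):
    """Assumed model: drop every UID <= `after`, keep the sort order, take `first`."""
    remaining = [uid for uid in sorted_uids if uid > after]
    return remaining[:first]


def page_after_position(sorted_uids, after, first):
    """Expected model: continue right behind `after` in the sorted list."""
    start = sorted_uids.index(after) + 1
    return sorted_uids[start:start + first]


print([hex(u) for u in page_after_uid(SORTED_UIDS, 0xB, 3)])       # ['0xc']
print([hex(u) for u in page_after_position(SORTED_UIDS, 0xB, 3)])  # ['0xa', '0x9', '0x8']
```

Under this assumption only 0xc is numerically greater than 0xb, which matches the single node returned.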
@MichelDiz

Member

commented Jun 6, 2019

It turns out that "after" doesn't support ordering (it only works on UIDs), so this will be turned into a feature request.

@manishrjain

Member

commented Jun 6, 2019

You should be able to use offset along with first. Does that not work?

@MichelDiz

Member

commented Jun 6, 2019

offset does indeed work fine. We get the right order even when paging one node at a time (first: 1 with offset: 0, 1, 2, 3, 4, 5).

@romshark

Author

commented Jun 7, 2019

@manishrjain offset doesn't scale (see #3473); it's very slow on relatively large datasets. I've tested it on ~83k nodes, which is far from a million, but even that took almost forever (2.5 to 5 seconds). I had to suspend the pagination feature in my tech demo because of that.

In SQL I'd usually use an indexed cursor-based approach (after: <id>; limit: 100), where the cursor must be unique. Offset usually results in a full table scan, which is obviously slow, and this seems to be what Dgraph is doing. Cursor-based pagination lets the database quickly find the row/node to start reading from and then read the next 100 rows, since the cursor (id) is ordered.

I think it could be possible to simulate such an index with an ordered/indexed edge, but that's probably far from ideal.
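
The difference between the two approaches can be sketched on a plain sorted list. This is a minimal illustration, not how Dgraph or any SQL engine is implemented: the function names are hypothetical, the ids are zero-padded strings in the style of the test data, and a binary search stands in for an index lookup.

```python
from bisect import bisect_right


def page_by_offset(sorted_ids, offset, limit):
    """Offset paging: step over `offset` entries before collecting a page."""
    page = []
    for position, item in enumerate(sorted_ids):
        if position < offset:
            continue  # work grows with the offset
        if len(page) == limit:
            break
        page.append(item)
    return page


def page_after_cursor(sorted_ids, cursor, limit):
    """Cursor paging: binary-search the unique cursor, then read `limit` entries."""
    start = bisect_right(sorted_ids, cursor)
    return sorted_ids[start:start + limit]


# Zero-padded ids sort lexicographically in the same order as numerically.
ids = [f"{n:032d}" for n in range(1, 7)]

second_page_offset = page_by_offset(ids, 3, 3)
second_page_cursor = page_after_cursor(ids, ids[2], 3)
print(second_page_offset == second_page_cursor)  # True
```

Both calls return the same page, but the cursor variant needs only the last id of the previous page and does not revisit the entries before it.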

@mangalaman93 mangalaman93 self-assigned this Jun 10, 2019

@mangalaman93 mangalaman93 removed their assignment Jun 20, 2019
