Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[multistage] perf regression on ORDER BY queries #10555

Closed
walterddr opened this issue Apr 5, 2023 · 11 comments
Closed

[multistage] perf regression on ORDER BY queries #10555

walterddr opened this issue Apr 5, 2023 · 11 comments

Comments

@walterddr
Copy link
Contributor

Example

see:
https://github.com/apache/pinot/actions/runs/4440707776/jobs/7794852115 vs https://github.com/apache/pinot/actions/runs/4441822565/jobs/7797366601

Research

seems like the issue occurs on #10408

  • Did a bit of research with @somandal and it seems like the issue was with the seenMail trigger is being swallowed due to priority queue insertion taking much longer than simply returning the block back.
  • after modifying DEFAULT_SCHEDULER_RELEASE_TIMEOUT_MS = 1_000; // 10_000; for the RoundRobinScheduler it is becoming much faster.

Action items

  • tagging @ankitsultana and @somandal and see what would be the best solution
  • tagging @walterddr to come up with a regression test instead of having to figure it out by looking at the integration test log.
@ankitsultana
Copy link
Contributor

Hmm I see.. this is also related to some other known gaps with how the callbacks are handled.

I haven't gotten a chance to work on it lately: #10424

I think I can prioritize it for next week (we are also planning to finalize our prod build internally so this is aligned with that).

@somandal
Copy link
Contributor

somandal commented Apr 5, 2023

Some logs for the testing done to identify the RC:

One example Send <-> Receive interaction:

Sending data block: {"jobId":"4_2","from":"0@localhost:8422","to":"0@localhost:8422","senderStageId":2,"receiverStageId":1}, split: 0
Sending data block: {"jobId":"4_2","from":"0@localhost:8422","to":"0@localhost:8422","senderStageId":2,"receiverStageId":1}, split: 1
sending EOS mailbox id: {"jobId":"4_2","from":"0@localhost:8422","to":"0@localhost:8422","senderStageId":2,"receiverStageId":1}

...

Time to add 7182 elements to priority queue: 2, time since start of operator: 513, mailbox: {"jobId":"4_2","from":"0@localhost:8422","to":"0@localhost:8422","senderStageId":2,"receiverStageId":1} 
openmailbox count: 1, eosMailboxCount: 0, returning block type: noop
Time to add 2818 elements to priority queue: 1, time since start of operator: 519, mailbox: {"jobId":"4_2","from":"0@localhost:8422","to":"0@localhost:8422","senderStageId":2,"receiverStageId":1}
openmailbox count: 1, eosMailboxCount: 0, returning block type: noop
EOS received after 16545 since operator creation for mailbox: {"jobId":"4_2","from":"0@localhost:8422","to":"0@localhost:8422","senderStageId":2,"receiverStageId":1}
Sorted block construction took: 7, total time to process all data in this receive: 16552, openmailbox: 1, eosmailboxcount: 1
openmailbox count: 0, eosMailboxCount: 0, returning block type: eos

@ankitsultana let's discuss a potential solution for this. As seen above, we added the blocks to the priority queue fairly quickly, then waited almost 15 seconds to get the EOS. The Sender had sent out the blocks and EOS much earlier.

All time calculations are using System.currentTimeMillis()

@ankitsultana
Copy link
Contributor

Is there a way to repro this on local easily?

@somandal
Copy link
Contributor

somandal commented Apr 5, 2023

@ankitsultana I've been using MultiStageEngineIntegrationTest::testGeneratedQueries()
I modified some code to only run the ORDER BY queries as the others finish quickly.

@ankitsultana
Copy link
Contributor

Sounds good. Let me take an initial look and get back. We can sync up later this week on the follow-ups. Does that work?

@somandal
Copy link
Contributor

somandal commented Apr 5, 2023

Yup that works. Let me know if you need any more information

@somandal
Copy link
Contributor

somandal commented Apr 5, 2023

One more thing I tried: I tried returning a No-op block right after adding to the priorityQueue, but still ran into the same issue. So there is some delay caused by adding to the priorityQueue, but it shouldn't result in so much slowness (and after this step is performed, the SortOperator should only be applying LIMIT and FETCH so the overall ORDER BY should still be reasonable).

Never mind the above, returning a no-op of course won't help as it gives the signal to yield which results in the long wait for _seenMail, which doesn't happen since the EOS is already sent out, resulting in the default wait time to reschedule

@ankitsultana
Copy link
Contributor

ankitsultana commented Apr 6, 2023

Took a look and I think this is what's happening:

Say we are sorting in MailboxReceiveOperator and there's a single receiving mailbox. Say these blocks were sent in order: B1, B2, B3, EOS.

Either via callback or during first run of the OpChain, we will consume B1 and add it to the priority queue. However, even though we consumed something, we return a NoOp block to the top.

Since we receive a NoOp block, we will try to yield, in where we'll check whether there's any callback for this OpChain (by looking up _seenMail).

Say there is a callback, in that case we will schedule the OpChain again and clear the _seenMail entry. In the second run we'll consume B2 and try to yield again. This time there's no callback entry so stay yielded and wait for automatic promotion to happen.

A possible fix for this would be to add a new MailboxReceiveOperator, say SortedMailboxReceiveOperator which does something similar to buildBroadcastHashTable in the join operator, i.e. continuously poll mailboxes until you get null or EOS. We could also create a SortOperator > MailboxReceiveOperator combo when we need to sort on receiver side so we don't have to re-implement Mailbox receive logic.

There's a separate issue of missed callbacks btw which I have been punting for a while. I'll invest more time on this ticket next week: #10424

Lmk if I missed something.

@somandal
Copy link
Contributor

somandal commented Apr 6, 2023

Thanks for the analysis, I agree that for the sorting logic we need to keep polling until EOS or null is seen. I’ll take up this fix today. I don’t want to add a SortOperator on top as in the future we want to things like sort on sender and have receiver do k way merge and things like partitioned sort for window functions where we only sort within a partition instead of a full sort. Sort operator also applies an upper limit for limit which needs to be handled very carefully for window functions to avoid correctness issues. Also the sort exchange means that any receiving operators above it can expect sorted data.

@somandal
Copy link
Contributor

somandal commented Apr 7, 2023

The fix for the ORDER BY regression is merged.
Still need some follow up on the improvements around callback handling cc @walterddr @ankitsultana -> If this is tracked in another issue perhaps we can close this one (and reference this one there for context)

@walterddr
Copy link
Contributor Author

Yup sounds good. Closing

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants