[multistage][POC] Move mailbox instance from global map to request context #9836
61yao wants to merge 6 commits into apache:master
Conversation
I haven't figured out a good way to test this. Will follow up with a test.
Thanks @61yao! I think you've identified the right problem to solve, but I think the abstractions in this PR aren't what we need. From a code ownership perspective, I don't think it makes sense for an operator chain to "own" the underlying physical resources it uses to send/receive data. That will make it much more difficult in the future to share those resources, and it makes management of those resources happen via various callbacks/asynchronous actions - that in turn makes it harder to debug and harder to reason about. Instead, I think the current abstraction is pretty good (there's a centralized MailboxService that maintains the mailboxes, and there's only one of them for the lifetime of a Pinot server). The problem you've identified is that we need some way to let that centralized service clean up resources that don't exist. I think there might be a few different triggers for that cleanup:
The design here (decentralizing the mailbox ownership) would make 2/3 much more difficult - and it leaks information into places that don't need that information. As an aside, I don't think contention for the concurrent hashmap is a problem - so long as there isn't key contention (which there should almost never be) it will perform extremely fast (and access to it is almost certainly not a bottleneck compared to all the other things a query needs to do).
I agree centralized resources should live in a central place, but resources created per request should be cleaned up per request instead of leaving that to centralized request management. For example, the shared channel between servers is shared and should be managed globally. I agree it should not live inside the OpChain, but I haven't found a good place to hold the request context yet. It doesn't make sense to put a per-request resource in the central place and clean it up later - that makes it much easier to leak. Ideally, timeout and error should reach the same exit point where we clean up per-request resources. The actual physical resource, say the data block, can live globally, but the instance of the mailbox and the streaming channel should still be managed per request.
Discussed with @agavra offline. We will put more thought into this and figure out the right next step.
I took another look at the fairness scheduling. This PR has nothing to do with that because it only deals with the sending mailbox, not the receiving side. I agree the receiving side needs more thought. We want different connections for different requests for isolation; having one single connection between servers doesn't seem like a good idea. If one request crashes or hits an error on the channel, all following requests will fail.
agavra left a comment
after digesting this for a bit, I actually really like the approach. I think adding a close() to Operator makes a lot of sense, and I think it'll integrate well with the scheduler with only a little bit of added work.
mostly minor comments; the only major comment is that I'd prefer we avoid exposing BlockExchange - (see the two inline comments). Perhaps we can add Operator#close(@Nullable Throwable e) and allow the MailboxSendOperator to send an error message if it's closed with an error?
```java
  }

  @Override
  public void close() {
```
should we clear the queue here?
```java
  if (!_isCompleted.get()) {
    _isCompleted.set(true);
```
there's a race condition here, it should be if (!isCompleted.compareAndSet(false, true))
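The race the comment points out is the classic check-then-act pattern: two threads can both see `get()` return `false` before either calls `set(true)`, so the run-once section runs twice. A minimal standalone sketch of the atomic fix (class and field names here are illustrative, not Pinot code):

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class Main {
  private static final AtomicBoolean _isCompleted = new AtomicBoolean(false);
  private static volatile int _completions = 0;

  // compareAndSet folds the read and the write into one atomic step,
  // so exactly one caller "wins" even under contention.
  static void complete() {
    if (_isCompleted.compareAndSet(false, true)) {
      _completions++;  // the run-once completion work goes here
    }
  }

  public static void main(String[] args) throws InterruptedException {
    Thread[] threads = new Thread[8];
    for (int i = 0; i < threads.length; i++) {
      threads[i] = new Thread(Main::complete);
      threads[i].start();
    }
    for (Thread t : threads) {
      t.join();
    }
    System.out.println("completions=" + _completions);  // always 1
  }
}
```

The reviewer's negated form, `if (!_isCompleted.compareAndSet(false, true))`, is the same idea used as an early-return guard.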
```java
  @Override
  public void onCompleted() {
    finishLatch.countDown();
```
nit: maybe we move this and _isCompleted.set(true) to shutdown so both onError and onCompleted have the same behavior.
```java
    register(operatorChain);
  } else {
    LOGGER.info("Execution time: " + timer.getThreadTimeNs());
    operatorChain.getRoot().close();
```
nit: let's keep both log statements (though looks like I forgot to change this one to debug!)
```java
    }
  } catch (Exception e) {
    LOGGER.error("Failed to execute query!", e);
    operatorChain._context.getExchange().send(TransferableBlockUtils.getErrorTransferableBlock(e));
```
this breaks some abstraction boundaries - this scheduler service should know nothing about the exchange or sending blocks; instead we should consider adding this to MailboxSendOperator (which is always the root operator for these chains). FWIW, I think that's already the case.
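One way to read this suggestion, together with the top-level comment about `Operator#close(@Nullable Throwable e)`: let the root `MailboxSendOperator` translate a failure into an error block itself, so the scheduler only ever calls `close(e)` and never touches the exchange. A hedged sketch with stand-in types (the `Exchange` interface and `String` "blocks" below are illustrative, not the real Pinot classes):

```java
public class Main {
  // Stand-in for BlockExchange#send(TransferableBlock).
  interface Exchange {
    void send(String block);
  }

  static abstract class Operator {
    // Default: closing without an error is a no-op for most operators.
    public void close(Throwable error) {
    }
  }

  // Only the send operator knows how to turn a Throwable into an error block.
  static class MailboxSendOperator extends Operator {
    private final Exchange _exchange;

    MailboxSendOperator(Exchange exchange) {
      _exchange = exchange;
    }

    @Override
    public void close(Throwable error) {
      if (error != null) {
        _exchange.send("ERROR: " + error.getMessage());
      }
    }
  }

  public static void main(String[] args) {
    StringBuilder sent = new StringBuilder();
    Operator root = new MailboxSendOperator(sent::append);
    try {
      throw new RuntimeException("query failed");
    } catch (Exception e) {
      // The scheduler only calls close(e); it never sees the exchange.
      root.close(e);
    }
    System.out.println(sent);
  }
}
```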
```java
  @Override
  public void runJob()
      throws InterruptedException {
```
note: this should never throw as the worker pool threads will just die and we'll be left with a dangling worker pool (these issues are really tough to debug). Instead let's catch any exceptions and handle them
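A runnable sketch of the failure mode the note describes: if `runJob()` lets an exception escape, the pool's worker thread dies; wrapping the job body in try/catch keeps the worker alive for the next job. The names (`runJob`, the latch/counter plumbing) are illustrative, not Pinot code:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class Main {
  static final AtomicInteger handled = new AtomicInteger();

  // Catch exceptions inside the job so they never propagate to the
  // worker thread (handle = log / reschedule / emit an error block).
  static void runJob(Runnable job) {
    try {
      job.run();
    } catch (Exception e) {
      handled.incrementAndGet();
    }
  }

  public static void main(String[] args) throws InterruptedException {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    CountDownLatch done = new CountDownLatch(2);
    // The first job throws; because runJob catches it, the single pool
    // thread is still alive to run the second job.
    pool.execute(() -> {
      runJob(() -> { throw new RuntimeException("boom"); });
      done.countDown();
    });
    pool.execute(() -> {
      runJob(() -> { });
      done.countDown();
    });
    done.await(5, TimeUnit.SECONDS);
    pool.shutdown();
    System.out.println("handled=" + handled.get());
  }
}
```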
```java
  public OpChain(Operator<TransferableBlock> root) {
    _root = root;

  // TODO: refactor this into OpChainContext
  public PlanRequestContext _context;
```
```java
  protected final int _port;
  protected final Map<Integer, StageMetadata> _metadataMap;
  // TODO: Add exchange map if multiple exchanges are needed.
  BlockExchange _exchange;
```
I think it breaks some abstraction barriers to allow any piece of code that has access to the PlanRequestContext to exchange blocks via a BlockExchange. Only the MailboxSendOperator should be able to send blocks IMO - otherwise it can be difficult to debug the ordering of events that are sent.
```java
  new Object[]{"SELECT * FROM b WHERE col3 < 0.5"},

  // Hybrid table
  // new Object[]{"SELECT * FROM b ORDER BY col1, col2 DESC LIMIT 3"},
```
(reminder) I know this is a draft, but let's make sure these pass and uncomment them (or delete them if we don't want them anymore)
```java
  }

  @Override
  public void close()
```
(suggestion) maybe this is the default implementation in BaseOperator? (will make the review a bit easier)
This PR has too many merge conflicts now. I'll just write a new one.
This PR moves the sending mailbox map from the global map (MailboxService) to the per-request context (PlanRequestContext).
The old design has two issues:
The new implementation moves the mailbox instance and the exchange into the per-request context.
Following PRs would be:
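To make the intended lifecycle concrete, here is a hedged sketch of per-request mailbox ownership: mailboxes created for one request live in a per-request context and are all closed at a single exit point when the request finishes (success, error, or timeout). `PlanRequestContext`, `Mailbox`, and the key format are illustrative stand-ins, not the actual implementation:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

public class Main {
  // Stand-in for a sending mailbox; a functional interface so the
  // demo below can create instances with lambdas.
  interface Mailbox extends AutoCloseable {
    @Override
    void close();
  }

  static class PlanRequestContext implements AutoCloseable {
    private final Map<String, Mailbox> _mailboxes = new ConcurrentHashMap<>();

    Mailbox getOrCreateMailbox(String id, Supplier<Mailbox> factory) {
      return _mailboxes.computeIfAbsent(id, k -> factory.get());
    }

    // Single exit point: every mailbox this request created is
    // released here, so nothing lingers in a server-wide map.
    @Override
    public void close() {
      _mailboxes.values().forEach(Mailbox::close);
      _mailboxes.clear();
    }
  }

  public static void main(String[] args) {
    int[] closed = {0};
    try (PlanRequestContext ctx = new PlanRequestContext()) {
      ctx.getOrCreateMailbox("stage1->stage2", () -> () -> closed[0]++);
      ctx.getOrCreateMailbox("stage2->stage3", () -> () -> closed[0]++);
    }  // request ends: both mailboxes are closed here
    System.out.println("closed=" + closed[0]);
  }
}
```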