-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
distsql: change the contract of Next to minimize wasteful allocations and copies #24452
Comments
I think there's an issue with this when using any sort of asynchronous communication between processors as described in #22462. Defensive copies won't solve this issue because if the upstream doesn't create a copy, it could overwrite rows before the downstream processor has had a chance to read & copy. I think that this is a good idea in the general case but it's inevitable that we determine whether a processor is running asynchronously to others at plan time and swap out its |
My thinking is that (I think this is similar to what @asubiotto is describing) |
Such an upstream processor would need to do the copy ahead of time. Downstream processors would be informed of this special case by inspecting the upstream processor, like I was trying to describe in my top level comment. Am I still missing something? As a strawman, let's say
Yes, that could work. |
#24589 is a stab at fixing this. |
24589: distsqlrun: don't allocate between fused processors r=jordanlewis a=jordanlewis distsqlrun: don't allocate between fused processors Previously, `ProcOutputHelper.ProcessRow` (and, by extension, all `RowSource.Next` implementations) always allocated a fresh `EncDatumRow`. This was wasteful - not every processor needs to be able to hold a reference to the output of `RowSource.Next`. Now, `ProcessRow` never allocates a fresh `EncDatumRow`, and the contract of `RowSource.Next` has been changed to say that it's not valid to hang on to a row returned by `Next` past the next call to `Next`. Processors that need to hold on to a row from their upstreams have been modified to make an explicit copy to achieve this safely. Finally, a new `copyingRowReceiver` is introduced that makes a copy of every row that is `Push`'d to it. A `copyingRowReceiver` is inserted before every router, since routers all expect that their inputs will be immutable. This preserves the safety of sending outputs of `RowSource.Next`, which aren't safe to hold on to, to routers, which expect rows that *are* safe to hold on to. Release note: None Fixes #22462. Fixes #24452. 26355: libroach: fix excessive compactions performed by DBCompactRange r=bdarnell,tschottdorf a=petermattis Fix excessive compactions from `DBCompactRange` due to mishandling of the first and last ranges to compact. When a non-empty start or end key is specified, DBCompactRange was previously calling `rocksdb::DB::CompactRange` with a `null` start/end key resulting in compacting from the beginning (or to the end) of the entire key space. See #24029 Co-authored-by: Jordan Lewis <jordanthelewis@gmail.com> Co-authored-by: Peter Mattis <petermattis@gmail.com>
Currently, the contract of
RowSource.Next
requires that it return rows that are usable forever. This requires a lot of extra allocations and copies inProcOutputHelper
that may not be necessary.It would be better to allow
Next
implementations to return containers that can be overwritten by the next call toNext
, the wayplanNode.Next
and many other components work. This would force changes in processors that need to retain references to the values they get from their inputs - those processors would need to start making copies of those values.One complication is that these defensive copies themselves may not always be necessary, depending on the
RowSource
a processor is reading from. For example, if theRowSource
is aRowChannel
, the rows returned fromNext()
will not be ephemeral and therefore do not be copied.To resolve this last situation, a
RowSource
should be queryable to determine whether or not it has this extra property of allowingNext
values to be held on to. This could be a method onRowSource
or just a big type switch in a function somewhere.Ultimately, this change will allow processors to allocate their output
EncDatumRow
just once up front, and copy values into that output container when returning fromNext
. This will massively reduce allocations done per processor, shifting that allocation burden when strictly necessary, and removing it altogether otherwise.The text was updated successfully, but these errors were encountered: