Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DRILL-8372: Unfreed buffers when running a LIMIT 0 query over delimited text #2728

Merged
merged 3 commits into from Feb 8, 2023

Conversation

jnturton
Copy link
Contributor

@jnturton jnturton commented Dec 22, 2022

DRILL-8372: Unfreed buffers when running a LIMIT 0 query over delimited text

Description

With the following data layout

/tmp/foo
 ├──a
 │  └──test_data.csvh
 └──b
    └──test_data.csvh

a LIMIT 0 query over it results in unfreed buffer errors as shown below.

apache drill (dfs.tmp)> select * from `foo` limit 0;
Error: SYSTEM ERROR: IllegalStateException: Allocator[op:0:0:4:EasySubScan] closed with outstanding buffers allocated (3).
Allocator(op:0:0:4:EasySubScan) 1000000/299008/3182592/10000000000 (res/actual/peak/limit)
  child allocators: 0
  ledgers: 3
    ledger[113] allocator: op:0:0:4:EasySubScan), isOwning: true, size: 262144, references: 1, life: 277785186322881..0, allocatorManager: [109, life: 277785186258906..0] holds 1 buffers.
        DrillBuf[142], udle: [110 0..262144]
    ledger[114] allocator: op:0:0:4:EasySubScan), isOwning: true, size: 32768, references: 1, life: 277785186463824..0, allocatorManager: [110, life: 277785186414654..0] holds 1 buffers.
        DrillBuf[143], udle: [111 0..32768]
    ledger[112] allocator: op:0:0:4:EasySubScan), isOwning: true, size: 4096, references: 1, life: 277785186046095..0, allocatorManager: [108, life: 277785185921147..0] holds 1 buffers.
        DrillBuf[141], udle: [109 0..4096]
  reservations: 0 

Documentation

N/A

Testing

TODO

@@ -75,7 +75,7 @@ public IterOutcome innerNext() {
upStream = next(incoming);
}
// If EMIT that means leaf operator is UNNEST, in this case refresh the limit states and return EMIT.
if (upStream == EMIT) {
if (upStream == EMIT || upStream == NONE) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem to be quite the right solution. This block of code is for a very particular case: an UNNEST.

Expanding this code, look at the top of the loop:

    if (!first && !needMoreRecords(numberOfRecords)) {

With a LIMIT 0, we hit the limit on the first batch. I'm not quite sure why the !first is in place. Maybe history would tell us. Perhaps the right answer is something like:

if ( !needMoreRecords(numberOfRecords)) {
     outgoingSv.setRecordCount(0);
     VectorAccessibleUtilities.clear(incoming);
     return super.innerNext();
}
if (!first) {
  ...

I suspect that the logic actually needs more analysis. What does it do on the first batch now? What does super.innerNext() do, and do we want that if we've reached the limit?

Generally, the debugger is the best way to sort this out. Try a LIMIT 0, a LIMIT n where n < size of the first batch, LIMIT n where n > batch size && n < 2 * batch size, etc.

@jnturton jnturton marked this pull request as ready for review February 7, 2023 16:11
Copy link
Contributor

@paul-rogers paul-rogers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nicely done! +1

@jnturton jnturton merged commit 03ba190 into apache:master Feb 8, 2023
@jnturton jnturton deleted the 8372-limit0-unfreed-bufs branch February 8, 2023 07:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants