
Conversation

@tustvold
Contributor

Which issue does this PR close?

Closes #.

Rationale for this change

Simplifies the logic in #6049 to make it a bit easier to see what is actually going on.

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

github-actions bot added the core (Core DataFusion crate) label on Apr 28, 2023
)?))
}
let schema = self.schema.clone();
let state = (data, self.batches[partition % batch_count].clone());
Contributor Author

The previous logic had a special case for when the partitioning matched, which effectively saved an atomic increment per batch. Given we are polling a dyn Stream here, I am very skeptical there is any real performance difference

Contributor

The performance difference is batch_count * acquire_lock; if that is acceptable, we can move forward with this. Do you think it would show up in the benchmarks? I am not quite familiar with that part.

let schema = self.schema.clone();
let state = (data, self.batches[partition % batch_count].clone());

let stream = futures::stream::unfold(state, |mut state| async move {
Contributor Author

unfold is incredibly useful. The only downside is that you can't name the resulting type, at least until we get existential types
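For readers unfamiliar with the pattern, here is a minimal, self-contained sketch (not the PR's code; it assumes only the futures and tokio crates): the unfold combinator's concrete type cannot be written out, so it is usually erased behind Box::pin.

use futures::{stream, Stream, StreamExt};
use std::pin::Pin;

// `unfold` turns a seed state into a stream, but the combinator's concrete
// type is opaque, so the usual workaround is a pinned boxed trait object.
fn countdown(from: u32) -> Pin<Box<dyn Stream<Item = u32> + Send>> {
    let s = stream::unfold(from, |n| async move {
        if n == 0 {
            None // returning None ends the stream
        } else {
            Some((n, n - 1)) // yield `n`, continue with the decremented state
        }
    });
    Box::pin(s) // erase the unnameable combinator type
}

#[tokio::main]
async fn main() {
    assert_eq!(countdown(3).collect::<Vec<_>>().await, vec![3, 2, 1]);
}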

Contributor

Now I get what you mean: if we do not hold the async lock in the state while it is acquired, the unfolding becomes possible. This substantially shrinks the code size. Cool pattern.

Contributor Author

@tustvold tustvold Apr 28, 2023

We could easily hold the lock in the state if we wanted to; I just didn't think it warranted the added complexity. You could definitely do something like

let stream = futures::stream::unfold(state, |mut state| async move {
    // Clone the Arc before `write_owned` (which consumes it) so `state` can
    // still be returned below; the guard must be mutable to push into the Vec.
    let mut locked = state.1.clone().write_owned().await;
    loop {
        let batch = match state.0.next().await {
            Some(Ok(batch)) => batch,
            Some(Err(e)) => return Some((Err(e), state)),
            None => return None,
        };
        locked.push(batch)
    }
});

Or even

let stream = futures::stream::unfold(state, |mut state| async move {
    let mut locked = state.1.write().await;
    loop {
        let batch = match state.0.next().await {
            Some(Ok(batch)) => batch,
            Some(Err(e)) => {
                drop(locked);
                return Some((Err(e), state))
            }
            None => {
                drop(locked);
                return None
            }
        };
        locked.push(batch)
    }
});

Ultimately, an uncontended lock is not going to matter for performance unless it is in a hot loop with no other branches
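For a rough sense of scale, a back-of-envelope sketch along these lines (assuming only the tokio crate; it times lock-plus-push, not the lock alone) typically comes out at tens of nanoseconds per acquisition on modern hardware, which lines up with the per-row difference reported in the benchmark further down:

use std::sync::Arc;
use std::time::Instant;
use tokio::sync::RwLock;

#[tokio::main]
async fn main() {
    let lock = Arc::new(RwLock::new(Vec::<u64>::new()));
    let iterations: u64 = 1_000_000;
    let start = Instant::now();
    for i in 0..iterations {
        // Acquire the write lock, push, and release, once per iteration.
        lock.write().await.push(i);
    }
    let nanos = start.elapsed().as_nanos() as f64 / iterations as f64;
    println!("~{nanos:.0} ns per uncontended write-lock + push");
}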

Contributor

LGTM, both implementations are neat (partitions matching & not matching). Thanks for the PR.

Contributor

@alamb alamb left a comment

Thank you @tustvold

This looks like a great simplification to me. cc @metesynnada and @ozankabak

@ozankabak
Contributor

Will take a look and measure whether the special casing is really useful or not

@alamb
Contributor

alamb commented Apr 30, 2023

Will take a look and measure whether the special casing is really useful or not

@ozankabak have you had a chance to review the performance of this PR? If so, is it acceptable to merge?

@ozankabak
Contributor

Will do tomorrow and let you know

@ozankabak
Contributor

We ran the benchmarks; @metesynnada will share them here today

@metesynnada
Contributor

TL;DR: we recommend retaining both versions of the locking mechanism, to cover both the case where the partitions match and the case where they do not.

So, we ran a benchmark to see how two different locking strategies perform when processing a stream of RecordBatch objects. Here are the two functions we compared:

  1. lock_only_once:
// Assumed imports (not shown in these snippets): futures::StreamExt for `.next()`,
// std::sync::Arc, tokio::sync::RwLock, arrow's RecordBatch and SchemaRef, and
// DataFusion's SendableRecordBatchStream and RecordBatchStreamAdapter.
fn lock_only_once(schema: SchemaRef, state: (SendableRecordBatchStream, Arc<RwLock<Vec<RecordBatch>>>)) -> SendableRecordBatchStream {
    let iter = futures::stream::unfold(state, |mut state| async move {
        let mut locked = state.1.write().await;
        loop {
            let batch = match state.0.next().await {
                Some(Ok(batch)) => batch,
                Some(Err(e)) => {
                    drop(locked);
                    return Some((Err(e), state))
                }
                None => {
                    drop(locked);
                    return None
                }
            };
            locked.push(batch)
        }
    });
    Box::pin(RecordBatchStreamAdapter::new(schema, iter))
}
  2. lock_multiple:
fn lock_multiple(schema: SchemaRef, state: (SendableRecordBatchStream, Arc<RwLock<Vec<RecordBatch>>>)) -> SendableRecordBatchStream {
    let iter = futures::stream::unfold(state, |mut state| async move {
        loop {
            let batch = match state.0.next().await {
                Some(Ok(batch)) => batch,
                Some(Err(e)) => return Some((Err(e), state)),
                None => return None,
            };
            state.1.write().await.push(batch)
        }
    });
    Box::pin(RecordBatchStreamAdapter::new(schema, iter))
}

We tested these functions with an input size of 10,000, and here's what we found:

Multiple lock strategy (lock_multiple function):

  • Mean execution time: 545.97 µs
  • Range: [544.78 µs, 547.26 µs]
  • A couple of outliers: 2 (2.00%) high mild

Single lock strategy (lock_only_once function):

  • Mean execution time: 330.21 µs
  • Range: [329.53 µs, 330.94 µs]
  • A few outliers: 7 (7.00%) in total
    • 1 (1.00%) low mild
    • 3 (3.00%) high mild
    • 3 (3.00%) high severe

So, it looks like the "Single lock" strategy (using the lock_only_once function) works better than the "Multiple lock" strategy (using the lock_multiple function) for the given input size of 10,000, as we expected. The "Single lock" strategy takes about 39.5% less time to complete compared to the "Multiple lock" strategy.
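For anyone who wants to reproduce a rough version of this comparison, a minimal harness along the following lines can drive both functions end to end. This is an illustrative sketch rather than the exact benchmark behind the numbers above, and it assumes the datafusion, futures, and tokio crates plus the two functions defined earlier in this comment.

use std::sync::Arc;
use std::time::Instant;

use datafusion::arrow::array::Int32Array;
use datafusion::arrow::datatypes::{DataType, Field, Schema, SchemaRef};
use datafusion::arrow::record_batch::RecordBatch;
use datafusion::error::DataFusionError;
use datafusion::physical_plan::stream::RecordBatchStreamAdapter;
use datafusion::physical_plan::SendableRecordBatchStream;
use futures::StreamExt;
use tokio::sync::RwLock;

// Build an input stream of `n` tiny batches that all share one schema.
fn input(schema: SchemaRef, n: usize) -> SendableRecordBatchStream {
    let batch = RecordBatch::try_new(
        schema.clone(),
        vec![Arc::new(Int32Array::from(vec![1]))],
    )
    .unwrap();
    let items = (0..n).map(move |_| Ok::<_, DataFusionError>(batch.clone()));
    Box::pin(RecordBatchStreamAdapter::new(
        schema,
        futures::stream::iter(items),
    ))
}

#[tokio::main]
async fn main() {
    let schema: SchemaRef =
        Arc::new(Schema::new(vec![Field::new("a", DataType::Int32, false)]));
    let n = 10_000;

    // lock_only_once: a single write-lock acquisition for the whole stream.
    let sink = Arc::new(RwLock::new(Vec::new()));
    let stream = lock_only_once(schema.clone(), (input(schema.clone(), n), sink.clone()));
    let start = Instant::now();
    let _ = stream.collect::<Vec<_>>().await; // drive the stream to completion
    println!("lock_only_once: {:?}, stored {} batches", start.elapsed(), sink.read().await.len());

    // lock_multiple: one acquisition per batch.
    let sink = Arc::new(RwLock::new(Vec::new()));
    let stream = lock_multiple(schema.clone(), (input(schema.clone(), n), sink.clone()));
    let start = Instant::now();
    let _ = stream.collect::<Vec<_>>().await;
    println!("lock_multiple: {:?}, stored {} batches", start.elapsed(), sink.read().await.len());
}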

@tustvold
Contributor Author

tustvold commented May 2, 2023

I can add the lock multiple approach if people feel strongly, although at roughly 20 ns per row (the ~216 µs difference spread over the 10,000-element input) it seems unlikely this would be visible in any practical workload. I wish our kernels were a similar order of magnitude 😅

@ozankabak
Contributor

Thank you! Since the code difference is small thanks to how you are leveraging unfold, let's add it and then this is good to go from our perspective.

@alamb
Contributor

alamb commented May 4, 2023

I made the change suggested by @metesynnada in #6154 (comment) in 01fa417, and merged up from main to try to nudge this PR along.

Co-authored-by: Metehan Yıldırım <100111937+metesynnada@users.noreply.github.com>
@ozankabak
Contributor

Seems like we have a whitespace issue making cargo fmt fail
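Running cargo fmt --all locally before pushing usually catches this.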

@alamb alamb merged commit 6118a93 into apache:main May 4, 2023
@alamb
Contributor

alamb commented May 4, 2023

Thanks @metesynnada @tustvold and @ozankabak

Here is another PR that I think simplifies and speeds up this code even more: #6236
