Conversation
I tried the reproducer and now it fails like this. However, when I cranked the ulimit via `ulimit -n 10000`, then it fails like this. That being said, this seems like an improvement to me.
alamb left a comment
Thank you @cetra3
FYI @2010YOUY01, as you may be interested in this fix
// Append the batch
if let Some(ref mut writer) = file_shared.writer {
    writer.append_batch(batch)?;
    // make sure we flush the writer for readers
I do wonder if it is ok to flush each batch (will this result in too many small file IOs?)
We need to do this, or we need to ensure the data is written out before we try reading it. Maybe we could do it periodically rather than on each batch.
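As a rough illustration of the periodic option, here is a minimal sketch. It uses `std::io::BufWriter` and raw bytes as stand-ins for the real spill writer and Arrow IPC batches; the `PeriodicFlushWriter` type and `FLUSH_EVERY` constant are hypothetical, not DataFusion APIs:

```rust
// Hypothetical sketch: flush every N appended batches instead of every batch.
use std::fs::File;
use std::io::{BufWriter, Write};

// Assumption: the interval would be tuned to balance IO size against how
// quickly readers need to see freshly spilled data.
const FLUSH_EVERY: usize = 8;

struct PeriodicFlushWriter {
    inner: BufWriter<File>,
    since_flush: usize,
}

impl PeriodicFlushWriter {
    fn append_batch(&mut self, encoded_batch: &[u8]) -> std::io::Result<()> {
        self.inner.write_all(encoded_batch)?;
        self.since_flush += 1;
        // Flush periodically so a concurrent reader eventually sees the data,
        // without paying a small file IO for every single batch.
        if self.since_flush >= FLUSH_EVERY {
            self.inner.flush()?;
            self.since_flush = 0;
        }
        Ok(())
    }

    fn finish(mut self) -> std::io::Result<()> {
        // Whatever interval is chosen, the tail must still be flushed before
        // the file is handed to readers.
        self.inner.flush()
    }
}

fn main() -> std::io::Result<()> {
    let file = File::create("/tmp/spill_flush_demo.bin")?;
    let mut writer = PeriodicFlushWriter {
        inner: BufWriter::new(file),
        since_flush: 0,
    };
    for i in 0u8..20 {
        writer.append_batch(&[i; 16])?;
    }
    writer.finish()
}
```

The trade-off is that a reader may lag by up to `FLUSH_EVERY` batches, so the reader-side coordination would still need to tolerate data that is not yet visible.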
I got it to finish!

andrewlamb@Andrews-MacBook-Pro-3:~/Software/datafusion$ ./target/profiling/datafusion-cli -m 1G -c "SELECT \"UserID\", extract(minute FROM to_timestamp_seconds(\"EventTime\")) AS m, \"SearchPhrase\", COUNT(*) FROM '/Users/andrewlamb/Software/datafusion/benchmarks/data/hits_partitioned' GROUP BY \"UserID\", m, \"SearchPhrase\" ORDER BY COUNT(*) DESC LIMIT 10;"
DataFusion CLI v52.1.0
+---------------------+----+--------------+----------+
| UserID | m | SearchPhrase | count(*) |
+---------------------+----+--------------+----------+
| 1313338681122956954 | 31 | | 589 |
| 1313338681122956954 | 28 | | 578 |
| 1313338681122956954 | 29 | | 572 |
| 1313338681122956954 | 33 | | 567 |
| 1313338681122956954 | 27 | | 557 |
| 1313338681122956954 | 32 | | 554 |
| 1313338681122956954 | 30 | | 552 |
| 1313338681122956954 | 34 | | 546 |
| 1313338681122956954 | 26 | | 540 |
| 1313338681122956954 | 10 | | 539 |
+---------------------+----+--------------+----------+
10 row(s) fetched.
Elapsed 233.198 seconds.

That is a crazy long time though. For comparison, without a memory limit it takes 1s.
I’d guess a file IO per batch is okay. A file per batch, probably not (that's what all of this complexity tries to avoid). But it would be interesting to (1) confirm this PR doesn't make other spilling queries slower and (2) investigate why the spilling version is so slow (which can be done outside of this PR).
@alamb Wdyt of adding a benchmark that forces spilling? Or an option to the current script to run with a memory limit? |
Adjust formatting (force-pushed from 3314b1c to 572cd33)
OK, I have tried to get some benchmark results out using disk spilling, on this branch and on the comparison run. I've attached the JSON of both runs.
This calls for a test; we might extract one from the example.
Here are the results from the ClickBench benchmark with a 1GB memory limit. We still reach the resource limits, but the bug doesn't materialize with the change: fix_spill_read_underrun.log
Here's the comparison: this looks good to me. There are some slowdowns, but overwhelmingly speedups, and queries go from failing to not failing.
Which issue does this PR close?
Rationale for this change
This adjusts the way that the spill channel works. Currently we have a spill writer & reader pairing which uses a mutex to coordinate when a file is ready to be read.
What happens is that, because we were using a `spawn_buffered` call, the read task would race ahead, trying to read a file which is yet to be written out completely. Alongside this, we need to flush each write to the file, as there is a chance that another thread may see stale data.
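To show the stale-read problem in isolation, here is a minimal, self-contained sketch. It is plain `std::io`, not DataFusion code, and the file path is arbitrary; the point is only that bytes sitting in a buffered writer are invisible to anyone else opening the file until the writer is flushed, which is the kind of partial view a reader that races ahead can observe:

```rust
use std::fs::{self, File};
use std::io::{BufWriter, Write};

fn main() -> std::io::Result<()> {
    let path = "/tmp/spill_visibility_demo.bin";
    let mut writer = BufWriter::new(File::create(path)?);

    // "Append a batch" without flushing: the bytes stay in the writer's
    // in-memory buffer (BufWriter defaults to an 8 KiB buffer).
    writer.write_all(&[42u8; 1024])?;

    // Anything inspecting the file now sees none of those bytes yet.
    let before = fs::metadata(path)?.len();

    // After a flush, the data is visible to other readers of the file.
    writer.flush()?;
    let after = fs::metadata(path)?.len();

    println!("visible before flush: {before} bytes, after flush: {after} bytes");
    Ok(())
}
```

This is why the PR both flushes after each append and stops the read task from buffering ahead of the writer.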
What changes are included in this PR?
Adds a flush on write, and converts the read task to not buffer reads.
Are these changes tested?
I haven't written a test, but I have been running the example in the attached issue. While it now fails with allocation errors, the original error goes away.
Are there any user-facing changes?
Nope