I ran some tests with clickbench; reading from local files is roughly 8x to 12x slower after we merged #20823:
[ec2-user@ip-172-31-0-185 datafusion]$ ./benchmarks/bench.sh compare json-test-on-main test-json-improvement
Comparing json-test-on-main and test-json-improvement
--------------------
Benchmark clickbench_2.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query ┃ json-test-on-main ┃ test-json-improvement ┃ Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0 │ 2938.54 ms │ 36468.92 ms │ 12.41x slower │
│ QQuery 1 │ 4189.48 ms │ 36706.26 ms │ 8.76x slower │
│ QQuery 2 │ 3021.24 ms │ 36695.04 ms │ 12.15x slower │
│ QQuery 3 │ FAIL │ FAIL │ incomparable │
│ QQuery 4 │ 3518.24 ms │ 37016.08 ms │ 10.52x slower │
│ QQuery 5 │ 3138.41 ms │ 37131.63 ms │ 11.83x slower │
│ QQuery 6 │ FAIL │ FAIL │ incomparable │
│ QQuery 7 │ 4191.68 ms │ 36874.60 ms │ 8.80x slower │
│ QQuery 8 │ 4405.33 ms │ 37054.97 ms │ 8.41x slower │
│ QQuery 9 │ 3473.41 ms │ 37308.28 ms │ 10.74x slower │
│ QQuery 10 │ 4351.06 ms │ 36934.39 ms │ 8.49x slower │
│ QQuery 11 │ 3306.45 ms │ 37101.39 ms │ 11.22x slower │
│ QQuery 12 │ 3226.21 ms │ 37235.60 ms │ 11.54x slower │
│ QQuery 13 │ 3970.11 ms │ 37244.27 ms │ 9.38x slower │
│ QQuery 14 │ 3246.59 ms │ 37085.69 ms │ 11.42x slower │
│ QQuery 15 │ 4563.53 ms │ 37182.89 ms │ 8.15x slower │
│ QQuery 16 │ 4506.85 ms │ 37391.07 ms │ 8.30x slower │
│ QQuery 17 │ 4377.16 ms │ 37381.49 ms │ 8.54x slower │
│ QQuery 18 │ 3555.18 ms │ 37603.25 ms │ 10.58x slower │
│ QQuery 19 │ 4568.01 ms │ 36996.50 ms │ 8.10x slower │
│ QQuery 20 │ 3193.87 ms │ 37069.19 ms │ 11.61x slower │
│ QQuery 21 │ 4415.33 ms │ 37185.73 ms │ 8.42x slower │
│ QQuery 22 │ 3312.73 ms │ 37190.81 ms │ 11.23x slower │
│ QQuery 23 │ FAIL │ FAIL │ incomparable │
│ QQuery 24 │ 4382.53 ms │ 37093.81 ms │ 8.46x slower │
│ QQuery 25 │ 4339.69 ms │ 37121.90 ms │ 8.55x slower │
│ QQuery 26 │ 4425.42 ms │ 37106.02 ms │ 8.38x slower │
│ QQuery 27 │ 4505.30 ms │ 37059.04 ms │ 8.23x slower │
│ QQuery 28 │ 3582.82 ms │ 37409.12 ms │ 10.44x slower │
│ QQuery 29 │ 4440.96 ms │ 36868.93 ms │ 8.30x slower │
│ QQuery 30 │ 4675.71 ms │ 37081.23 ms │ 7.93x slower │
│ QQuery 31 │ 4276.55 ms │ 37165.64 ms │ 8.69x slower │
│ QQuery 32 │ 3615.42 ms │ 37662.39 ms │ 10.42x slower │
│ QQuery 33 │ 4446.09 ms │ 37558.30 ms │ 8.45x slower │
│ QQuery 34 │ 4521.66 ms │ 37647.72 ms │ 8.33x slower │
│ QQuery 35 │ 4321.41 ms │ 37225.06 ms │ 8.61x slower │
│ QQuery 36 │ FAIL │ FAIL │ incomparable │
│ QQuery 37 │ FAIL │ FAIL │ incomparable │
│ QQuery 38 │ FAIL │ FAIL │ incomparable │
│ QQuery 39 │ FAIL │ FAIL │ incomparable │
│ QQuery 40 │ FAIL │ FAIL │ incomparable │
│ QQuery 41 │ FAIL │ FAIL │ incomparable │
│ QQuery 42 │ FAIL │ FAIL │ incomparable │
└───────────┴───────────────────┴───────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Benchmark Summary ┃ ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ Total Time (json-test-on-main) │ 131002.95ms │
│ Total Time (test-json-improvement) │ 1225857.24ms │
│ Average Time (json-test-on-main) │ 3969.79ms │
│ Average Time (test-json-improvement) │ 37147.19ms │
│ Queries Faster │ 0 │
│ Queries Slower │ 33 │
│ Queries with No Change │ 0 │
│ Queries with Failure │ 10 │
└──────────────────────────────────────┴──────────────┘
The issue is that the `into_stream` function of `object_store`'s `get_result` reads data in 8 KiB chunks for local files, so we need to either replace it with custom code or use a completely separate path for local files, as was done previously.
Originally posted by @ariel-miculas in #20823 (comment)
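To make the effect of the chunk size concrete, here is a minimal std-only sketch. It is not the `object_store` implementation: `read_in_chunks` is a hypothetical helper, and an in-memory slice stands in for a local file. It only counts how many reads it takes to drain a source with a given buffer size, which is the quantity that grows when the chunk size is fixed at 8 KiB.

```rust
use std::io::{self, Read};

/// Hypothetical helper (not part of object_store or DataFusion):
/// drains `reader` using a fixed-size buffer and returns
/// `(total_bytes_read, number_of_non_empty_reads)`.
pub fn read_in_chunks<R: Read>(mut reader: R, chunk_size: usize) -> io::Result<(usize, usize)> {
    assert!(chunk_size > 0, "chunk_size must be non-zero");
    let mut buf = vec![0u8; chunk_size];
    let mut total = 0usize;
    let mut reads = 0usize;
    loop {
        match reader.read(&mut buf) {
            // A zero-length read signals end of input.
            Ok(0) => break,
            Ok(n) => {
                total += n;
                reads += 1;
            }
            // Retry if the read was interrupted by a signal.
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    Ok((total, reads))
}
```

For a 1 MiB in-memory buffer, `read_in_chunks(&data[..], 8 * 1024)` needs 128 reads, while `read_in_chunks(&data[..], 1 << 20)` needs one. With a real file each read is a system call, and in a stream-based path each chunk is also a separate item to allocate and poll, so a small fixed chunk size multiplies the per-chunk overhead. Whether that accounts for the entire regression above is an assumption this sketch does not test.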