Add more measurements for Arc #669
|
Thank you. |
|
That's kind of weird indeed. @xe-nvdk ? |
|
Let me see if I can replicate this. The data is the downloaded parquet file, and we query it without any modification. |
|
Yep, it is indeed not returning data. The result sections for queries 37 through 43 all come back empty: === Query 37 Results === === Query 38 Results === === Query 39 Results === === Query 40 Results === === Query 41 Results === === Query 42 Results === === Query 43 Results === Let me see what we are missing in the queries. |
|
The deviation may be due to your use of integers in the filters instead of dates. I suspect if you use the same queries as DuckDB (parquet, single), you should get the same results. Do you recall why you changed them before? |
Yes, to match how we save the data, which is in Unix time. We were changing a lot of things around then, though, so I'm not 100% sure. I will try the DuckDB (parquet) queries, which I guess are the ones you used. We don't query the parquet file directly with DuckDB; the queries go through an API endpoint, so I expect a little overhead there. I will post results or findings. |
|
They map the EventDate field to an actual date on ingest (view creation): ClickBench/duckdb-parquet/create.sql Line 3 in fb49f11
They also use single-shot CLI calls, so I think it should even out with (or be slower than) your API version: https://github.com/ClickHouse/ClickBench/blob/main/duckdb-parquet/run.sh |
|
Found the issue! The EventDate range in the dataset is 15888 to 15917, but queries 37-43 are searching for 16262-16292 (which is way out of range). That's why they're returning no results! I will run this and update the results for c6a.4xlarge.json |
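For anyone checking the numbers, here is a minimal sketch of the mismatch. It assumes the EventDate integers are days since 1970-01-01 (an assumption, though it is consistent with the DuckDB setup mapping the field to a date). The helper names are hypothetical and not part of Arc or ClickBench.

```python
from datetime import date, timedelta

# Assumption: EventDate is stored as days since the Unix epoch.
EPOCH = date(1970, 1, 1)


def days_to_date(days: int) -> date:
    """Convert a day count since 1970-01-01 to a calendar date."""
    return EPOCH + timedelta(days=days)


def date_to_days(d: date) -> int:
    """Convert a calendar date to a day count since 1970-01-01."""
    return (d - EPOCH).days


def ranges_overlap(a: tuple, b: tuple) -> bool:
    """True if two inclusive (start, end) ranges share at least one value."""
    return a[0] <= b[1] and b[0] <= a[1]


# Values quoted in this thread.
dataset_range = (15888, 15917)  # EventDate range present in the data
query_range = (16262, 16292)    # range the old queries 37-43 filtered on

print("dataset:", days_to_date(dataset_range[0]), "to", days_to_date(dataset_range[1]))
print("queries:", days_to_date(query_range[0]), "to", days_to_date(query_range[1]))
print("overlap:", ranges_overlap(dataset_range, query_range))
```

Under that assumption the data covers 2013-07-02 to 2013-07-31, while the old filters covered 2014-07-11 to 2014-08-10, so the ranges never overlap and the queries return nothing. `date_to_days` can be used to turn a date literal from a date-based query into the matching integer filter.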
|
OK, we have new values, and I'm pushing them now. I re-checked all the queries and every one of them returned values. Cold run: 121.69. Hot run: 36.40. The comparison with DuckDB should make more sense now. Thank you @nwoolmer for bringing this to my attention. |

Added a few more measurements for Arc after the scripts were re-submitted.