Status: Closed
Labels: bug
Description
tl;dr
With this test data stored in Parquet format:
{log_time:2012-10-01T00:00:02Z,client_ip:99.85.61.193,request:"/courses/cs132/2012/",status_code:304(uint16),object_size:213(uint64)}(=bench2)
{log_time:2012-01-01T00:00:00Z,client_ip:25.152.171.147,request:"/books/Six_Easy_Pieces.html",status_code:404(uint16),object_size:271(uint64)}(=bench2)
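In the vector runtime, the where clause in the following aggregation is not applied correctly. The entry with client_ip:25.152.171.147 shows a count of 1 when it should be 0, because its log_time (2012-01-01T00:00:00Z) is earlier than the 2012-10-01T00:00:00Z cutoff.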
$ SUPER_VAM=1 super -c 'from data.parquet | count() where log_time >= 2012-10-01T00:00:00Z by client_ip'
{client_ip:"99.85.61.193",count:1(uint64)}
{client_ip:"25.152.171.147",count:1(uint64)}
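As a point of reference, the expected semantics of `count() where <predicate> by <key>` (the behavior the sequential runtime shows further down) can be modeled in a few lines of Python. This is a hypothetical sketch, not how super implements aggregation; the function name `filtered_count_by` is made up for illustration, and client IPs are treated as plain strings. The key point is that every key seen in the input still gets a group, but only rows that pass the filter increment its count, so a group can legitimately end up with a count of 0.

```python
from datetime import datetime, timezone

# The two test records from this issue, reduced to the fields the query uses.
records = [
    {"log_time": datetime(2012, 10, 1, 0, 0, 2, tzinfo=timezone.utc), "client_ip": "99.85.61.193"},
    {"log_time": datetime(2012, 1, 1, 0, 0, 0, tzinfo=timezone.utc), "client_ip": "25.152.171.147"},
]

CUTOFF = datetime(2012, 10, 1, tzinfo=timezone.utc)


def filtered_count_by(rows, key, pred):
    """Hypothetical reference model of `count() where <pred> by <key>`.

    Each distinct key creates a group even when pred is false for all of
    its rows; only rows satisfying pred add to that group's count.
    """
    counts = {}
    for row in rows:
        k = row[key]
        counts.setdefault(k, 0)  # the group exists regardless of the filter
        if pred(row):
            counts[k] += 1
    return counts


result = filtered_count_by(records, "client_ip", lambda r: r["log_time"] >= CUTOFF)
print(result)  # {'99.85.61.193': 1, '25.152.171.147': 0}
```

The vector runtime over Parquet instead reports 1 for 25.152.171.147, as if the where clause had not been evaluated for that row.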
Details
The repro uses super commit fc8ab65. This is a simplification of the mgbench bench2/q4 query.
Starting with the test data shown above in data.zson.gz, the sequential runtime reports a count of 0 for client_ip:25.152.171.147, as expected given the filter where log_time >= 2012-10-01T00:00:00Z.
$ super -version
Version: v1.18.0-213-gfc8ab655
$ super -c 'from data.zson.gz | count() where log_time >= 2012-10-01T00:00:00Z by client_ip'
{client_ip:99.85.61.193,count:1(uint64)}
{client_ip:25.152.171.147,count:0(uint64)}
However, the problem appears if we convert the data to Parquet and run the query in the vector runtime.
$ super -f parquet -o data.parquet data.zson.gz
$ SUPER_VAM=1 super -c 'from data.parquet | count() where log_time >= 2012-10-01T00:00:00Z by client_ip'
{client_ip:"99.85.61.193",count:1(uint64)}
{client_ip:"25.152.171.147",count:1(uint64)}
The problem does not occur if the same Parquet file is queried with the sequential runtime, or if the data is converted to CSUP and queried with the vector runtime.
$ super -c 'from data.parquet | count() where log_time >= 2012-10-01T00:00:00Z by client_ip'
{client_ip:"99.85.61.193",count:1(uint64)}
{client_ip:"25.152.171.147",count:0(uint64)}
$ super -f csup -o data.csup data.zson.gz
$ SUPER_VAM=1 super -c 'from data.csup | count() where log_time >= 2012-10-01T00:00:00Z by client_ip'
{client_ip:99.85.61.193,count:1(uint64)}
{client_ip:25.152.171.147,count:0(uint64)}