Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Rust] [DataFusion] Can't alias an aggregate expression #25588

Closed
asfimport opened this issue Jul 18, 2020 · 3 comments
Closed

[Rust] [DataFusion] Can't alias an aggregate expression #25588

asfimport opened this issue Jul 18, 2020 · 3 comments

Comments

@asfimport
Copy link
Collaborator

asfimport commented Jul 18, 2020

The following test (on execute) fails:

    #[test]
    fn aggregate_with_alias() -> Result<()> {
        let results = execute("SELECT c1, COUNT(c2) AS count FROM test GROUP BY c1", 4)?;

        assert_eq!(field_names(batch), vec!["c1", "count"]);
        let expected = vec!["0,10", "1,10", "2,10", "3,10"];
        let mut rows = test::format_batch(&batch);
        rows.sort();
        assert_eq!(rows, expected);
        Ok(())
    }

The root cause is that, in sql::planner, we interpret COUNT(c2) AS count as An Expr::Alias, which fails the is_aggregate_expr condition, thus being interpreted as grouped expression instead of an aggregated expression. This raises the Error

General("Projection references non-aggregate values")

The planner could interpret the statement above as two steps: an aggregation followed by a projection. Alternatively, we can allow aliases to be valid aggregation expressions.

Reporter: Jorge Leitão / @jorgecarleitao

Related issues:

PRs and other links:

Note: This issue was originally created as ARROW-9520. Please see the migration documentation for further details.

@asfimport
Copy link
Collaborator Author

Andrew Lamb / @alamb:
Here is another reproducer (using the aggregate_test_100.csv data from the tests):

    CREATE EXTERNAL TABLE aggregate_test_100 (
        c1  VARCHAR NOT NULL,
        c2  INT NOT NULL,
        c3  SMALLINT NOT NULL,
        c4  SMALLINT NOT NULL,
        c5  INT NOT NULL,
        c6  BIGINT NOT NULL,
        c7  SMALLINT NOT NULL,
        c8  INT NOT NULL,
        c9  BIGINT NOT NULL,
        c10 VARCHAR NOT NULL,
        c11 FLOAT NOT NULL,
        c12 DOUBLE NOT NULL,
        c13 VARCHAR NOT NULL
    )
    STORED AS CSV
    WITH HEADER ROW
    LOCATION 'arrow/testing/data/csv/aggregate_test_100.csv';

And then run this query:

> select min(c3) from aggregate_test_100 group by c1;

Actual behavior: Error

> select min(c3) from aggregate_test_100 group by c1;
General("Projection references non-aggregate values")

Expected behavior: Results

+---------+
| min(c3) |
+---------+
| -101    |
| -95     |
| -99     |
| -117    |
| -117    |
+---------+

Note, If you include the group key, c1, in the select list, then it does work:

> select min(c3), c1 from aggregate_test_100 group by c1;
+---------+----+
| min(c3) | c1 |
+---------+----+
| -101    | a  |
| -95     | e  |
| -99     | d  |
| -117    | c  |
| -117    | b  |
+---------+----+
5 row in set. Query took 0 seconds.

Typically handling this kind of query requires that c1 is brought up in the in plan, but is hidden in the final selection.

@asfimport
Copy link
Collaborator Author

Andrew Lamb / @alamb:
Here is another example:

> create external table sales(customer_id varchar, sales bigint) stored as CSV location '/tmp/foo.csv';
0 rows in set. Query took 0 seconds.

> SELECT customer_id, sum(sales) FROM sales ORDER BY sum(sales);
General("Projection references non-aggregate values")

@asfimport
Copy link
Collaborator Author

Andy Grove / @andygrove:
Issue resolved by pull request 8322
#8322

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant